Python Basics for Data Science

Python is the most widely used language in data science workflows. Before working with data, a solid understanding of Python fundamentals is essential. This topic covers variables, data types, control flow, functions, and data structures, the building blocks that data science libraries rely on.

Variables and Data Types

A variable is a name that refers to a value in memory. Python infers the type from the assigned value, so no type declaration is needed.

# Variables store different kinds of information
student_name = "Alice"        # String – text data
student_age  = 22             # Integer – whole number
gpa          = 3.85           # Float – decimal number
is_enrolled  = True           # Boolean – True or False

print(student_name, student_age, gpa, is_enrolled)

Output:

Alice 22 3.85 True

Common Data Types in Python

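+----------+--------------+-----------------------------------+
| Type     | Example      | Used For                          |
+----------+--------------+-----------------------------------+
| int      | 42           | Counting, IDs, rankings           |
| float    | 3.14         | Measurements, prices, percentages |
| str      | "Delhi"      | Names, categories, text           |
| bool     | True / False | Conditions, flags, filters        |
| NoneType | None         | Missing or empty values           |
+----------+--------------+-----------------------------------+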
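
To check which type Python has assigned to a value, the built-in type() function can be called on it. The sketch below is only an illustration: the names samples, type_names, count, price and label are placeholders and not part of the examples above.

```python
# One illustrative value per type from the table (names are hypothetical)
samples = [42, 3.14, "Delhi", True, None]

# type(value).__name__ gives the name of the type as a string
type_names = []
for value in samples:
    type_names.append(type(value).__name__)
print(type_names)

# Conversions between types are explicit
count = int("42")       # str -> int
price = float("3.14")   # str -> float
label = str(101)        # int -> str
print(count + 1, price, label + "A")
```

Values read from files usually arrive as text, so explicit conversions like these are often the first step before any calculation.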

Lists – Ordered Collections

A list holds multiple values in a single variable. Lists maintain insertion order and allow duplicate values.

# A list of monthly sales figures (in thousands)
monthly_sales = [45, 52, 38, 67, 71, 60, 55]

print("First month:", monthly_sales[0])   # Index starts at 0
print("Last month:", monthly_sales[-1])   # Negative index from end
print("First 3 months:", monthly_sales[:3])

# Add a new month
monthly_sales.append(63)
print("Updated list:", monthly_sales)

Output:

First month: 45
Last month: 55
First 3 months: [45, 52, 38]
Updated list: [45, 52, 38, 67, 71, 60, 55, 63]

Diagram – List Indexing

monthly_sales = [45, 52, 38, 67, 71]

Index (forward):  0    1    2    3    4
Index (backward): -5  -4   -3   -2   -1
Values:          [45,  52,  38,  67,  71]

Tuples – Immutable Collections

A tuple works like a list but its values cannot change after creation. Tuples are useful for storing fixed data like coordinates or configuration settings.

# City location – values should never change
city_location = ("New Delhi", 28.6139, 77.2090)

city, latitude, longitude = city_location  # Unpacking
print(f"City: {city}")
print(f"Latitude: {latitude}, Longitude: {longitude}")

Output:

City: New Delhi
Latitude: 28.6139, Longitude: 77.209
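
The following sketch shows what "cannot change" means in practice: assigning to a tuple element raises a TypeError. The name city_with_country is a hypothetical addition used only for illustration.

```python
city_location = ("New Delhi", 28.6139, 77.2090)

try:
    city_location[0] = "Mumbai"   # Tuples do not support item assignment
except TypeError as error:
    print("Cannot modify tuple:", error)

# The original tuple is untouched; extending it means building a new tuple
city_with_country = city_location + ("India",)
print(city_with_country)
```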

Dictionaries – Key-Value Storage

A dictionary stores data as key-value pairs. This structure is widely used in data science for representing a single record or mapping labels to values.

# A dictionary representing one product record
product = {
    "id": 101,
    "name": "Wireless Headphones",
    "price": 1499.99,
    "in_stock": True,
    "category": "Electronics"
}

# Accessing values
print("Product:", product["name"])
print("Price: ₹", product["price"])

# Adding a new key
product["rating"] = 4.5
print("Rating:", product["rating"])

Output:

Product: Wireless Headphones
Price: ₹ 1499.99
Rating: 4.5
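
Accessing a key that does not exist with square brackets raises a KeyError. A minimal sketch of two common alternatives follows, assuming a shortened version of the product record; the "discount" key is hypothetical and deliberately missing.

```python
product = {"id": 101, "name": "Wireless Headphones", "price": 1499.99}

# product["discount"] would raise KeyError because the key is missing;
# .get() returns the supplied default instead
discount = product.get("discount", 0)
print("Discount:", discount)

# Loop over every key-value pair in the record
for key, value in product.items():
    print(f"{key}: {value}")
```

Using .get() with a default is a convenient way to handle records where some fields are missing.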

Sets – Unique Value Collections

A set stores only unique values. Duplicate entries get removed automatically. Sets are fast for membership checks.

# Customer cities – remove duplicates automatically
cities = {"Mumbai", "Delhi", "Bengaluru", "Mumbai", "Delhi", "Pune"}
print("Unique cities:", cities)

# Check membership
print("Chennai in set?", "Chennai" in cities)

Output (the order of set elements may vary between runs):

Unique cities: {'Pune', 'Mumbai', 'Delhi', 'Bengaluru'}
Chennai in set? False
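
Sets also support comparisons between two collections. The sketch below uses two made-up sets, online_cities and store_cities, to show intersection, union and difference.

```python
# Hypothetical customer cities from two sales channels
online_cities = {"Mumbai", "Delhi", "Pune"}
store_cities = {"Delhi", "Chennai", "Pune"}

both = online_cities & store_cities          # Intersection: in both sets
all_cities = online_cities | store_cities    # Union: in either set
online_only = online_cities - store_cities   # Difference: only in the first set

# sorted() returns a list in a predictable order for printing
print("Both channels:", sorted(both))
print("All cities:", sorted(all_cities))
print("Online only:", sorted(online_only))
```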

Control Flow – if / elif / else

Control flow lets code make decisions based on conditions. Data science code often uses it to categorise values or apply rules.

# Classify a student's grade
score = 78

if score >= 90:
    grade = "A"
elif score >= 75:
    grade = "B"
elif score >= 60:
    grade = "C"
else:
    grade = "F"

print(f"Score: {score} → Grade: {grade}")

Output:

Score: 78 → Grade: B

Loops – for and while

Loops repeat a block of code multiple times. In data science, loops are used to process lists, rows in a dataset, or results from a model.

For Loop

temperatures = [22, 31, 27, 19, 35, 28]

# Find days hotter than 30°C
hot_days = 0
for temp in temperatures:
    if temp > 30:
        hot_days += 1

print(f"Days above 30°C: {hot_days}")

Output:

Days above 30°C: 2

While Loop

# Grow an investment by 10% a year until it reaches 10000
amount = 1000
years = 0

while amount < 10000:
    amount = amount * 1.10   # 10% annual growth
    years += 1

print(f"Reaches ₹10,000 in {years} years")
print(f"Final amount: ₹{amount:.2f}")

Output:

Reaches ₹10,000 in 25 years
Final amount: ₹10834.71

Functions – Reusable Code Blocks

A function packages a set of instructions under a name. Calling that name runs the instructions. Functions eliminate repetition and make code easier to maintain.

# Function to calculate Body Mass Index (BMI)
def calculate_bmi(weight_kg, height_m):
    bmi = weight_kg / (height_m ** 2)
    
    if bmi < 18.5:
        category = "Underweight"
    elif bmi < 25:
        category = "Normal"
    elif bmi < 30:
        category = "Overweight"
    else:
        category = "Obese"
    
    return round(bmi, 2), category

# Call the function
bmi_value, bmi_label = calculate_bmi(70, 1.75)
print(f"BMI: {bmi_value} → {bmi_label}")

Output:

BMI: 22.86 → Normal

Diagram – Function Flow

Input Arguments
      |
      v
+----------------------+
|  def calculate_bmi   |
|  weight_kg, height_m |
|  ------------------  |
|  bmi = weight / h^2  |
|  classify bmi        |
+----------------------+
      |
      v
Return Values: (22.86, "Normal")
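
To illustrate how a function removes repetition, the sketch below wraps the grade thresholds from the control flow section in a hypothetical function named classify_score and calls it for several made-up scores.

```python
def classify_score(score):
    """Return a letter grade using the thresholds from the grade example."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 60:
        return "C"
    return "F"

scores = [95, 78, 61, 40]   # made-up sample scores

# The same logic is reused for every score without copying the if/elif block
grades = []
for score in scores:
    grades.append(classify_score(score))

print(grades)
```

If a threshold changes, only the function body needs to be edited, not every place where a grade is computed.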

List Comprehensions – Compact Loops

A list comprehension creates a new list from an existing list (or any other iterable) in a single expression. This pattern appears constantly in data science code.

# Standard loop – convert Celsius to Fahrenheit
celsius = [0, 20, 37, 100]
fahrenheit = []
for c in celsius:
    fahrenheit.append((c * 9/5) + 32)

# Same result using list comprehension
fahrenheit = [(c * 9/5) + 32 for c in celsius]

print("Celsius:   ", celsius)
print("Fahrenheit:", fahrenheit)

Output:

Celsius:    [0, 20, 37, 100]
Fahrenheit: [32.0, 68.0, 98.6, 212.0]
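
A comprehension can also filter values by adding an if clause at the end. As a small sketch, the temperature list from the for loop example is reused here to keep only the hot days.

```python
temperatures = [22, 31, 27, 19, 35, 28]

# Keep only the values that satisfy the condition
hot_days = [temp for temp in temperatures if temp > 30]

print("Hot days:", hot_days)
print("Count:", len(hot_days))
```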

String Operations

Text data is common in data science: product names, customer reviews, city names. Python provides built-in methods to clean and process strings.

review = "  The product Quality is EXCELLENT!  "

# Common string methods
print(review.strip())          # Remove leading/trailing spaces
print(review.strip().lower())  # Convert to lowercase
print(review.strip().upper())  # Convert to uppercase

# Check and replace
print("excellent" in review.lower())
print(review.replace("EXCELLENT", "GOOD"))

Output:

The product Quality is EXCELLENT!
the product quality is excellent!
THE PRODUCT QUALITY IS EXCELLENT!
True
  The product Quality is GOOD!  
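
The same methods can be applied to every item in a list, which is how messy text columns are often tidied. The sketch below uses made-up inputs (raw_cities and sentence) and also shows split() and join(), two further built-in string methods.

```python
raw_cities = ["  mumbai", "DELHI  ", " Pune "]   # made-up messy input

# strip() removes surrounding spaces, title() capitalises each word
clean_cities = [city.strip().title() for city in raw_cities]
print(clean_cities)

# split() breaks a string into words; join() combines them again
sentence = "the product quality is excellent"
words = sentence.split()
print(words)
print(len(words), "words")
print("-".join(words))
```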

f-Strings – Formatted Output

f-strings embed variable values directly inside a string. They produce cleaner and more readable output than older formatting methods.

name     = "Priya"
score    = 94.5
rank     = 1

print(f"Student: {name}")
print(f"Score:   {score:.1f}%")
print(f"Rank:    {rank}")
print(f"Result:  {'Pass' if score >= 50 else 'Fail'}")

Output:

Student: Priya
Score:   94.5%
Rank:    1
Result:  Pass
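
Format specifiers after the colon control how a value is displayed. The following sketch shows a few commonly used ones; the variables revenue and growth hold made-up figures.

```python
revenue = 1234567.891   # made-up figure
growth = 0.8567         # made-up ratio

revenue_text = f"Revenue: {revenue:,.2f}"   # Thousands separator, 2 decimals
growth_text = f"Growth: {growth:.1%}"       # Ratio shown as a percentage
header_text = f"{'Item':<10}|{'Qty':>5}"    # Left- and right-aligned columns

print(revenue_text)
print(growth_text)
print(header_text)
```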

Python Data Structures – Quick Comparison

+------------+----------+----------+-----------+------------+
| Structure  | Ordered  | Mutable  | Duplicates| Syntax     |
+------------+----------+----------+-----------+------------+
| List       |   Yes    |   Yes    |    Yes    | [1, 2, 3]  |
| Tuple      |   Yes    |   No     |    Yes    | (1, 2, 3)  |
| Set        |   No     |   Yes    |    No     | {1, 2, 3}  |
| Dictionary |   Yes    |   Yes    |  Keys: No | {"a": 1}   |
+------------+----------+----------+-----------+------------+
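
The properties in the table can be checked by building each structure from the same values. This is a minimal sketch with a made-up list named values; note that dictionaries keep insertion order in Python 3.7 and later.

```python
values = [3, 1, 3, 2]   # made-up input with one duplicate

as_list = list(values)               # Keeps order and duplicates
as_tuple = tuple(values)             # Keeps order and duplicates, immutable
as_set = set(values)                 # Drops duplicates, no guaranteed order
as_dict = dict.fromkeys(values, 0)   # Unique keys, insertion order kept

print(as_list)
print(as_tuple)
print(sorted(as_set))    # Sorted only to get a predictable printout
print(list(as_dict))     # Keys in the order they were first seen
```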

Summary

  • Variables store values; Python detects the type automatically
  • Lists, tuples, dictionaries, and sets each serve a different purpose in organising data
  • Control flow with if/elif/else applies logic to data values
  • Loops process collections of data efficiently
  • Functions package reusable logic and accept input to return output
  • List comprehensions write compact, readable transformations in one line
  • String methods clean and prepare text data for analysis
