Python Basics for Data Science

Python is central to most data science workflows, so a solid grasp of its fundamentals comes before any work with data. This topic covers variables, data types, control flow, functions, and data structures: the building blocks that data science libraries rely on.

Variables and Data Types

A variable is a name that refers to a value in memory. Python infers the type from the value assigned, so no type declaration is needed.

# Variables store different kinds of information
student_name = "Alice"        # String – text data
student_age  = 22             # Integer – whole number
gpa          = 3.85           # Float – decimal number
is_enrolled  = True           # Boolean – True or False

print(student_name, student_age, gpa, is_enrolled)

Output:

Alice 22 3.85 True

Common Data Types in Python

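+----------+--------------+-----------------------------------+
| Type     | Example      | Used For                          |
+----------+--------------+-----------------------------------+
| int      | 42           | Counting, IDs, rankings           |
| float    | 3.14         | Measurements, prices, percentages |
| str      | "Delhi"      | Names, categories, text           |
| bool     | True / False | Conditions, flags, filters        |
| NoneType | None         | Missing or empty values           |
+----------+--------------+-----------------------------------+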
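
To see type inference in action, the built-in type() function reports the type Python assigned to a value. The sketch below uses made-up sample values (the names samples and reading are illustrative only) and also shows that a name can later be rebound to a value of a different type.

```python
# Illustrative sample values, one per type from the table above
samples = [42, 3.14, "Delhi", True, None]

for value in samples:
    # type(value).__name__ is the short name of the inferred type
    print(repr(value), "->", type(value).__name__)

# A name is not tied to one type; it follows whatever value it refers to
reading = 22        # int
reading = 22.5      # the same name now refers to a float
print(type(reading).__name__)
```

Because types follow values rather than names, checking type() is a quick way to debug a column or variable that holds something unexpected.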

Lists – Ordered Collections

A list holds multiple values in a single variable. Lists maintain insertion order and allow duplicate values.

# A list of monthly sales figures (in thousands)
monthly_sales = [45, 52, 38, 67, 71, 60, 55]

print("First month:", monthly_sales[0])   # Index starts at 0
print("Last month:", monthly_sales[-1])   # Negative index from end
print("First 3 months:", monthly_sales[:3])

# Add a new month
monthly_sales.append(63)
print("Updated list:", monthly_sales)

Output:

First month: 45
Last month: 55
First 3 months: [45, 52, 38]
Updated list: [45, 52, 38, 67, 71, 60, 55, 63]

Diagram – List Indexing

monthly_sales = [45, 52, 38, 67, 71]

Index (forward):  0    1    2    3    4
Index (backward): -5  -4   -3   -2   -1
Values:          [45,  52,  38,  67,  71]
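
The diagram can be checked directly in code. This short sketch reuses the five-value list from the diagram; the variable names same_item, middle, and last_two are illustrative.

```python
monthly_sales = [45, 52, 38, 67, 71]

# Forward index 1 and backward index -4 point at the same position
same_item = monthly_sales[1] == monthly_sales[-4]

# A slice is written start:stop, and the stop position is excluded
middle = monthly_sales[1:4]     # positions 1, 2, 3
last_two = monthly_sales[-2:]   # from the second-to-last item to the end

print(same_item)   # True
print(middle)      # [52, 38, 67]
print(last_two)    # [67, 71]
```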

Tuples – Immutable Collections

A tuple works like a list but its values cannot change after creation. Tuples are useful for storing fixed data like coordinates or configuration settings.

# City location – values should never change
city_location = ("New Delhi", 28.6139, 77.2090)

city, latitude, longitude = city_location  # Unpacking
print(f"City: {city}")
print(f"Latitude: {latitude}, Longitude: {longitude}")

Output:

City: New Delhi
Latitude: 28.6139, Longitude: 77.209
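
Immutability is easy to confirm: assigning to a tuple position raises a TypeError. The sketch below reuses city_location from the example above; the outcome variable is only there to capture the result for printing.

```python
city_location = ("New Delhi", 28.6139, 77.2090)

try:
    city_location[1] = 0.0   # tuples do not support item assignment
except TypeError as error:
    outcome = f"TypeError: {error}"
else:
    outcome = "no error"

print(outcome)
print(city_location)   # the original values are untouched
```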

Dictionaries – Key-Value Storage

A dictionary stores data as key-value pairs. This structure is widely used in data science for representing a single record or mapping labels to values.

# A dictionary representing one product record
product = {
    "id": 101,
    "name": "Wireless Headphones",
    "price": 1499.99,
    "in_stock": True,
    "category": "Electronics"
}

# Accessing values
print("Product:", product["name"])
print("Price: ₹", product["price"])

# Adding a new key
product["rating"] = 4.5
print("Rating:", product["rating"])

Output:

Product: Wireless Headphones
Price: ₹ 1499.99
Rating: 4.5
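
Two dictionary operations come up often when handling records: reading a key that may be missing, and looping over all key-value pairs. The sketch below uses a shortened version of the product record; the "discount" key and the lines variable are assumptions made for illustration.

```python
product = {"id": 101, "name": "Wireless Headphones", "price": 1499.99}

# product["discount"] would raise KeyError because the key is absent;
# .get() returns the supplied default instead
discount = product.get("discount", 0)

# .items() yields (key, value) pairs in insertion order
lines = []
for key, value in product.items():
    lines.append(f"{key}: {value}")

print(discount)   # 0
print(lines)      # ['id: 101', 'name: Wireless Headphones', 'price: 1499.99']
```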

Sets – Unique Value Collections

A set stores only unique values. Duplicate entries get removed automatically. Sets are fast for membership checks.

# Customer cities – remove duplicates automatically
cities = {"Mumbai", "Delhi", "Bengaluru", "Mumbai", "Delhi", "Pune"}
print("Unique cities:", cities)

# Check membership
print("Chennai in set?", "Chennai" in cities)

Output (sets are unordered, so the order of the cities may differ between runs):

Unique cities: {'Pune', 'Mumbai', 'Delhi', 'Bengaluru'}
Chennai in set? False
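
Sets also support comparisons between two collections. The sketch below uses two made-up groups of cities (online_cities and store_cities are hypothetical names) and sorts each result before printing so the output order is predictable.

```python
# Hypothetical data: cities with online orders and cities with a physical store
online_cities = {"Mumbai", "Delhi", "Pune"}
store_cities = {"Delhi", "Chennai", "Pune"}

both = online_cities & store_cities         # intersection: in both sets
either = online_cities | store_cities       # union: in at least one set
online_only = online_cities - store_cities  # difference: only in the first set

print(sorted(both))         # ['Delhi', 'Pune']
print(sorted(either))       # ['Chennai', 'Delhi', 'Mumbai', 'Pune']
print(sorted(online_only))  # ['Mumbai']
```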

Control Flow – if / elif / else

Control flow lets code make decisions based on conditions. Data science uses this frequently for categorising values or applying rules.

# Classify a student's grade
score = 78

if score >= 90:
    grade = "A"
elif score >= 75:
    grade = "B"
elif score >= 60:
    grade = "C"
else:
    grade = "F"

print(f"Score: {score} → Grade: {grade}")

Output:

Score: 78 → Grade: B

Loops – for and while

Loops repeat a block of code multiple times. Data science uses loops to process lists, rows in a dataset, or results from a model.

For Loop

temperatures = [22, 31, 27, 19, 35, 28]

# Find days hotter than 30°C
hot_days = 0
for temp in temperatures:
    if temp > 30:
        hot_days += 1

print(f"Days above 30°C: {hot_days}")

Output:

Days above 30°C: 2
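
When the position of each value matters as well as the value itself, the built-in enumerate() supplies a counter alongside each item. This sketch reuses the temperatures list and records which days were hot; numbering days from 1 is a choice made here for readability.

```python
temperatures = [22, 31, 27, 19, 35, 28]

hot_day_numbers = []
# enumerate(..., start=1) pairs each temperature with its day number
for day, temp in enumerate(temperatures, start=1):
    if temp > 30:
        hot_day_numbers.append(day)

print("Hot days:", hot_day_numbers)   # Hot days: [2, 5]
```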

While Loop

# Grow an investment by 10% a year until it reaches 10000
amount = 1000
years = 0

while amount < 10000:
    amount = amount * 1.10   # 10% annual growth
    years += 1

print(f"Reaches ₹10,000 in {years} years")
print(f"Final amount: ₹{amount:.2f}")

Output:

Reaches ₹10,000 in 25 years
Final amount: ₹10834.71
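
The loop result can be cross-checked with compound-growth arithmetic: the amount after n years is start × (1 + rate)^n, so the smallest n that reaches the target is the logarithm ratio rounded up. The sketch below assumes the same figures as the loop; the names start, target, and rate are illustrative.

```python
import math

start, target, rate = 1000, 10000, 0.10

# Smallest whole number of years n with start * (1 + rate) ** n >= target
years = math.ceil(math.log(target / start) / math.log(1 + rate))
final_amount = start * (1 + rate) ** years

print(years)                   # 25
print(f"{final_amount:.2f}")   # 10834.71
```

Both approaches agree. The loop is easier to adapt when the growth rule changes from year to year, while the formula gives the answer in one step.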

Functions – Reusable Code Blocks

A function packages a set of instructions under a name. Calling that name runs the instructions. Functions eliminate repetition and make code easier to maintain.

# Function to calculate Body Mass Index (BMI)
def calculate_bmi(weight_kg, height_m):
    bmi = weight_kg / (height_m ** 2)
    
    if bmi < 18.5:
        category = "Underweight"
    elif bmi < 25:
        category = "Normal"
    elif bmi < 30:
        category = "Overweight"
    else:
        category = "Obese"
    
    return round(bmi, 2), category

# Call the function
bmi_value, bmi_label = calculate_bmi(70, 1.75)
print(f"BMI: {bmi_value} → {bmi_label}")

Output:

BMI: 22.86 → Normal

Diagram – Function Flow

Input Arguments
      |
      v
+----------------------+
|  def calculate_bmi   |
|  weight_kg, height_m |
|  ------------------- |
|  bmi = weight / h^2  |
|  classify bmi        |
+----------------------+
      |
      v
Return Values: (22.86, "Normal")
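
The benefit of a function shows most clearly when the same logic runs on many inputs. As a sketch, the grade thresholds from the control flow section can be wrapped in a function and applied to a list of scores. The function name classify_score and the sample scores are assumptions for illustration.

```python
def classify_score(score):
    """Return a letter grade using the thresholds from the grade example."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 60:
        return "C"
    return "F"

# Hypothetical scores; one function call replaces a repeated if/elif block
scores = [95, 78, 60, 42]
grades = []
for score in scores:
    grades.append(classify_score(score))

print(grades)   # ['A', 'B', 'C', 'F']
```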

List Comprehensions – Compact Loops

A list comprehension builds a new list from an existing iterable in a single expression. The pattern is common in data science code.

# Standard loop – convert Celsius to Fahrenheit
celsius = [0, 20, 37, 100]
fahrenheit = []
for c in celsius:
    fahrenheit.append((c * 9/5) + 32)

# Same result using list comprehension
fahrenheit = [(c * 9/5) + 32 for c in celsius]

print("Celsius:   ", celsius)
print("Fahrenheit:", fahrenheit)

Output:

Celsius:    [0, 20, 37, 100]
Fahrenheit: [32.0, 68.0, 98.6, 212.0]
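
A comprehension can also filter or label values. The sketch below reuses the temperatures list from the loop section: an if at the end keeps only matching items, while a conditional expression at the front transforms every item. The names hot and labels are illustrative.

```python
temperatures = [22, 31, 27, 19, 35, 28]

# Filter: keep only the values that pass the condition
hot = [t for t in temperatures if t > 30]

# Transform: every value is kept, but mapped to a label
labels = ["hot" if t > 30 else "mild" for t in temperatures]

print(hot)      # [31, 35]
print(labels)   # ['mild', 'hot', 'mild', 'mild', 'hot', 'mild']
```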

String Operations

Text data is common in data science: product names, customer reviews, city names. Python provides built-in methods to clean and process strings.

review = "  The product Quality is EXCELLENT!  "

# Common string methods
print(review.strip())          # Remove leading/trailing spaces
print(review.strip().lower())  # Convert to lowercase
print(review.strip().upper())  # Convert to uppercase

# Check and replace
print("excellent" in review.lower())
print(review.replace("EXCELLENT", "GOOD"))

Output:

The product Quality is EXCELLENT!
the product quality is excellent!
THE PRODUCT QUALITY IS EXCELLENT!
True
  The product Quality is GOOD!
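
The last output line still has its leading spaces because string methods return a new string and leave the original unchanged. The sketch below shows this, then splits the cleaned text into words, a common first step before counting or filtering terms. The names cleaned and words are illustrative.

```python
review = "  The product Quality is EXCELLENT!  "

# Each method returns a new string, so the result must be stored
cleaned = review.strip().lower()
words = cleaned.split()   # split on whitespace

print(review == "  The product Quality is EXCELLENT!  ")   # True: original unchanged
print(cleaned)      # the product quality is excellent!
print(words)        # ['the', 'product', 'quality', 'is', 'excellent!']
print(len(words))   # 5
```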

f-Strings – Formatted Output

f-strings embed variables and expressions directly inside a string. They produce cleaner and more readable output than older formatting methods.

name     = "Priya"
score    = 94.5
rank     = 1

print(f"Student: {name}")
print(f"Score:   {score:.1f}%")
print(f"Rank:    {rank}")
print(f"Result:  {'Pass' if score >= 50 else 'Fail'}")

Output:

Student: Priya
Score:   94.5%
Rank:    1
Result:  Pass
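
The part after the colon inside the braces is a format specification. Beyond .1f, a few other specifications are handy for reports. The values below are made up for illustration.

```python
price = 1499.99
share = 0.256
label = "Total"

formatted_price = f"{price:,.2f}"   # thousands separator, 2 decimals
formatted_share = f"{share:.1%}"    # multiply by 100 and add a % sign
padded_label = f"{label:<8}|"       # left-align in a field 8 characters wide
padded_count = f"{42:>5}"           # right-align in a field 5 characters wide

print(formatted_price)   # 1,499.99
print(formatted_share)   # 25.6%
print(padded_label)      # Total   |
print(padded_count)      #    42
```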

Python Data Structures – Quick Comparison

+------------+----------+----------+-----------+------------+
| Structure  | Ordered  | Mutable  | Duplicates| Syntax     |
+------------+----------+----------+-----------+------------+
| List       |   Yes    |   Yes    |    Yes    | [1, 2, 3]  |
| Tuple      |   Yes    |   No     |    Yes    | (1, 2, 3)  |
| Set        |   No     |   Yes    |    No     | {1, 2, 3}  |
| Dictionary |   Yes    |   Yes    |  Keys: No | {"a": 1}   |
+------------+----------+----------+-----------+------------+
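
The table can be verified by building each structure from the same input. In this sketch the values list is made-up data, and dict.fromkeys() is used only as a convenient way to create a dictionary whose keys come from the list. Dictionaries keep insertion order in Python 3.7 and later.

```python
values = [3, 1, 3, 2]

as_list = list(values)            # keeps order and duplicates
as_tuple = tuple(values)          # keeps order and duplicates, cannot be modified
as_set = set(values)              # duplicates removed, no guaranteed order
as_dict = dict.fromkeys(values)   # keys are unique, insertion order kept

print(as_list)           # [3, 1, 3, 2]
print(as_tuple)          # (3, 1, 3, 2)
print(sorted(as_set))    # [1, 2, 3]
print(list(as_dict))     # [3, 1, 2]
```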

Summary

Variables store values; Python detects the type automatically
Lists, tuples, dictionaries, and sets each serve a different purpose in organising data
Control flow with if/elif/else applies logic to data values
Loops process collections of data efficiently
Functions package reusable logic and accept input to return output
List comprehensions write compact, readable transformations in one line
String methods clean and prepare text data for analysis
