Python Basics for Data Science
Python is one of the most widely used languages in data science workflows. Before working with data, a solid understanding of Python fundamentals is essential. This topic covers variables, data types, control flow, functions, and data structures: the building blocks that data science libraries build on.
Variables and Data Types
A variable is a name that refers to a value in memory. Python infers the type from the assigned value at runtime, so no type declaration is needed.
# Variables store different kinds of information
student_name = "Alice"   # String – text data
student_age = 22         # Integer – whole number
gpa = 3.85               # Float – decimal number
is_enrolled = True       # Boolean – True or False
print(student_name, student_age, gpa, is_enrolled)
Output:
Alice 22 3.85 True
Common Data Types in Python
| Type | Example | Used For |
|---|---|---|
| int | 42 | Counting, IDs, rankings |
| float | 3.14 | Measurements, prices, percentages |
| str | "Delhi" | Names, categories, text |
| bool | True / False | Conditions, flags, filters |
| NoneType | None | Missing or empty values |
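The types in the table can be inspected with the built-in `type()` function, and values can be converted with `int()`, `float()`, and `str()`. The sketch below is a minimal illustration; the variable names and the idea that `raw_age` came from a CSV file are assumptions made for the example.

```python
# Inspect the type Python assigned to each value
values = [42, 3.14, "Delhi", True, None]
type_names = [type(v).__name__ for v in values]
print(type_names)

# Data often arrives as text and must be converted before arithmetic
raw_age = "22"             # hypothetical value read from a CSV file
age = int(raw_age)         # str -> int
price = float("1499.99")   # str -> float
label = str(101)           # int -> str

print(age + 1, price, label)
```

Converting text that is not a valid number, such as `int("abc")`, raises a `ValueError`, so real data usually needs a check or a `try`/`except` around the conversion.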
Lists – Ordered Collections
A list holds multiple values in a single variable. Lists maintain insertion order and allow duplicate values.
# A list of monthly sales figures (in thousands)
monthly_sales = [45, 52, 38, 67, 71, 60, 55]
print("First month:", monthly_sales[0]) # Index starts at 0
print("Last month:", monthly_sales[-1]) # Negative index from end
print("First 3 months:", monthly_sales[:3])
# Add a new month
monthly_sales.append(63)
print("Updated list:", monthly_sales)
Output:
First month: 45
Last month: 55
First 3 months: [45, 52, 38]
Updated list: [45, 52, 38, 67, 71, 60, 55, 63]
Diagram – List Indexing
monthly_sales = [45, 52, 38, 67, 71]

Index (forward):      0    1    2    3    4
Index (backward):    -5   -4   -3   -2   -1
Values:              45   52   38   67   71
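Forward and backward indexes combine with slicing, written as `list[start:stop:step]`, where the `stop` position is excluded. The following sketch reuses the five-value list from the diagram; the result variable names are illustrative.

```python
monthly_sales = [45, 52, 38, 67, 71]

last_two = monthly_sales[-2:]          # from index -2 to the end
middle = monthly_sales[1:4]            # indexes 1, 2, 3 (stop index 4 is excluded)
every_other = monthly_sales[::2]       # indexes 0, 2, 4
reversed_sales = monthly_sales[::-1]   # a reversed copy; the original is unchanged

print(last_two)         # [67, 71]
print(middle)           # [52, 38, 67]
print(every_other)      # [45, 38, 71]
print(reversed_sales)   # [71, 67, 38, 52, 45]
```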
Tuples – Immutable Collections
A tuple works like a list but its values cannot change after creation. Tuples are useful for storing fixed data like coordinates or configuration settings.
# City location – values should never change
city_location = ("New Delhi", 28.6139, 77.2090)
city, latitude, longitude = city_location # Unpacking
print(f"City: {city}")
print(f"Latitude: {latitude}, Longitude: {longitude}")
Output:
City: New Delhi
Latitude: 28.6139, Longitude: 77.209
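Immutability can be checked directly: assigning to a tuple element raises a `TypeError`. This short sketch reuses `city_location` from above; the `changed` flag is only there to make the outcome visible.

```python
city_location = ("New Delhi", 28.6139, 77.2090)

try:
    city_location[1] = 0.0   # attempt to overwrite the latitude
    changed = True
except TypeError as err:
    changed = False
    print("Cannot modify a tuple:", err)

print(changed)            # False
print(city_location[1])   # 28.6139, the original value is intact
```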
Dictionaries – Key-Value Storage
A dictionary stores data as key-value pairs. This structure is widely used in data science for representing a single record or mapping labels to values.
# A dictionary representing one product record
product = {
    "id": 101,
    "name": "Wireless Headphones",
    "price": 1499.99,
    "in_stock": True,
    "category": "Electronics"
}
# Accessing values
print("Product:", product["name"])
print("Price: ₹", product["price"])
# Adding a new key
product["rating"] = 4.5
print("Rating:", product["rating"])
Output:
Product: Wireless Headphones
Price: ₹ 1499.99
Rating: 4.5
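Records often have missing fields, and accessing a missing key with square brackets raises a `KeyError`. A common way around this is `dict.get()` with a default value. The sketch below uses a shortened version of the product record; the `"discount"` key is a hypothetical field that is deliberately absent.

```python
product = {"id": 101, "name": "Wireless Headphones", "price": 1499.99}

# product["discount"] would raise KeyError because the key is absent
discount = product.get("discount", 0)   # fall back to 0 instead

# Check for a key before using it
has_price = "price" in product

# Loop over every key-value pair in the record
for key, value in product.items():
    print(f"{key}: {value}")

print("Discount:", discount)   # Discount: 0
```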
Sets – Unique Value Collections
A set stores only unique values; duplicate entries are removed automatically. Sets are unordered, so the order shown when printing can differ between runs. They are fast for membership checks.
# Customer cities – remove duplicates automatically
cities = {"Mumbai", "Delhi", "Bengaluru", "Mumbai", "Delhi", "Pune"}
print("Unique cities:", cities)
# Check membership
print("Chennai in list?", "Chennai" in cities)
Output:
Unique cities: {'Pune', 'Mumbai', 'Delhi', 'Bengaluru'}
Chennai in list? False
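Sets also support the usual set operations, which are handy for comparing groups of labels. The two city sets below are made-up examples, and `sorted()` is used only to print the results in a stable order.

```python
online_cities = {"Mumbai", "Delhi", "Pune"}      # hypothetical customer cities
store_cities = {"Delhi", "Bengaluru", "Pune"}    # hypothetical store cities

both = online_cities & store_cities          # intersection: in both sets
either = online_cities | store_cities        # union: in at least one set
online_only = online_cities - store_cities   # difference: only in the first set

print(sorted(both))          # ['Delhi', 'Pune']
print(sorted(either))        # ['Bengaluru', 'Delhi', 'Mumbai', 'Pune']
print(sorted(online_only))   # ['Mumbai']
```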
Control Flow – if / elif / else
Control flow lets code make decisions based on conditions. Data science uses this frequently for categorising values or applying rules.
# Classify a student's grade
score = 78
if score >= 90:
    grade = "A"
elif score >= 75:
    grade = "B"
elif score >= 60:
    grade = "C"
else:
    grade = "F"
print(f"Score: {score} → Grade: {grade}")
Output:
Score: 78 → Grade: B
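Python checks the conditions from top to bottom and runs only the first branch that matches, so the order of the thresholds matters. The sketch below wraps the same rules in a hypothetical helper, `grade_for` (functions are covered later in this topic), and tries it on boundary values.

```python
def grade_for(score):
    # Conditions are tested in order; the first match wins
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    elif score >= 60:
        return "C"
    else:
        return "F"

scores = [95, 90, 78, 60, 59]
grades = []
for s in scores:
    grades.append(grade_for(s))

print(grades)   # ['A', 'A', 'B', 'C', 'F']
```

If the `score >= 60` test came first, every score of 60 or more would be graded "C", because the later branches would never be reached.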
Loops – for and while
Loops repeat a block of code multiple times. Data science uses loops to process lists, rows in a dataset, or results from a model.
For Loop
temperatures = [22, 31, 27, 19, 35, 28]
# Find days hotter than 30°C
hot_days = 0
for temp in temperatures:
    if temp > 30:
        hot_days += 1
print(f"Days above 30°C: {hot_days}")
Output:
Days above 30°C: 2
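When the position of each value matters as well as the value itself, the built-in `enumerate()` supplies a counter alongside each item. This sketch assumes the readings are consecutive days numbered from 1; `hot_day_numbers` is an illustrative name.

```python
temperatures = [22, 31, 27, 19, 35, 28]

hot_day_numbers = []
# enumerate yields (counter, value) pairs; start=1 numbers the days from 1
for day, temp in enumerate(temperatures, start=1):
    if temp > 30:
        hot_day_numbers.append(day)

print(hot_day_numbers)   # [2, 5]
```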
While Loop
# Grow an investment by 10% a year until it reaches at least 10000
amount = 1000
years = 0
while amount < 10000:
    amount = amount * 1.10   # 10% annual growth
    years += 1
print(f"Reaches ₹10,000 in {years} years")
print(f"Final amount: ₹{amount:.2f}")
Output:
Reaches ₹10,000 in 25 years
Final amount: ₹10834.71
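The loop result can be cross-checked with the compound growth formula. The amount after n years is 1000 × 1.10^n, so the smallest n that reaches 10000 satisfies n ≥ log(10) / log(1.10) ≈ 24.16, which rounds up to 25. A minimal sketch of that check, with illustrative variable names:

```python
import math

start, target, rate = 1000, 10000, 0.10

# Solve start * (1 + rate) ** n >= target for the smallest whole n
years_needed = math.ceil(math.log(target / start) / math.log(1 + rate))
final_amount = start * (1 + rate) ** years_needed

print(years_needed)            # 25
print(f"{final_amount:.2f}")   # 10834.71
```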
Functions – Reusable Code Blocks
A function packages a set of instructions under a name. Calling that name runs the instructions. Functions eliminate repetition and make code easier to maintain.
# Function to calculate Body Mass Index (BMI)
def calculate_bmi(weight_kg, height_m):
    bmi = weight_kg / (height_m ** 2)
    if bmi < 18.5:
        category = "Underweight"
    elif bmi < 25:
        category = "Normal"
    elif bmi < 30:
        category = "Overweight"
    else:
        category = "Obese"
    return round(bmi, 2), category
# Call the function
bmi_value, bmi_label = calculate_bmi(70, 1.75)
print(f"BMI: {bmi_value} → {bmi_label}")
Output:
BMI: 22.86 → Normal
Diagram – Function Flow
    Input Arguments
           |
           v
+---------------------+
| def calculate_bmi   |
| weight_kg, height_m |
| ------------------- |
| bmi = weight / h^2  |
| classify bmi        |
+---------------------+
           |
           v
Return Values: (22.86, "Normal")
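Functions can also declare default values for parameters and carry a docstring that describes what they do. The sketch below uses a hypothetical helper, `percent_change`, which is not part of the BMI example; it shows a default argument and a keyword argument in use.

```python
def percent_change(old, new, decimals=1):
    """Return the percentage change from old to new, rounded."""
    change = (new - old) / old * 100
    return round(change, decimals)

# decimals falls back to its default of 1 when it is not supplied
default_result = percent_change(45, 52)
# Keyword arguments make the call easier to read
precise_result = percent_change(45, 52, decimals=3)

print(default_result)   # 15.6
print(precise_result)   # 15.556
```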
List Comprehensions – Compact Loops
A list comprehension creates a new list from an existing iterable in a single expression. This pattern is very common in data science code.
# Standard loop – convert Celsius to Fahrenheit
celsius = [0, 20, 37, 100]
fahrenheit = []
for c in celsius:
    fahrenheit.append((c * 9/5) + 32)
# Same result using list comprehension
fahrenheit = [(c * 9/5) + 32 for c in celsius]
print("Celsius: ", celsius)
print("Fahrenheit:", fahrenheit)
Output:
Celsius:  [0, 20, 37, 100]
Fahrenheit: [32.0, 68.0, 98.6, 212.0]
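A comprehension can also filter values with a trailing `if`, or choose between two results with a conditional expression. The sketch below reuses the temperature readings from the loop section; the "hot" and "mild" labels are illustrative.

```python
temperatures = [22, 31, 27, 19, 35, 28]

# Filter: keep only the values that pass the condition
hot = [t for t in temperatures if t > 30]

# Transform: produce one label for every value
labels = ["hot" if t > 30 else "mild" for t in temperatures]

print(hot)      # [31, 35]
print(labels)   # ['mild', 'hot', 'mild', 'mild', 'hot', 'mild']
```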
String Operations
Text data is common in data science: product names, customer reviews, and city names are typical examples. Python provides built-in methods to clean and process strings.
review = " The product Quality is EXCELLENT! "
# Common string methods
print(review.strip()) # Remove leading/trailing spaces
print(review.strip().lower()) # Convert to lowercase
print(review.strip().upper()) # Convert to uppercase
# Check and replace
print("excellent" in review.lower())
print(review.replace("EXCELLENT", "GOOD"))
Output:
The product Quality is EXCELLENT!
the product quality is excellent!
THE PRODUCT QUALITY IS EXCELLENT!
True
 The product Quality is GOOD! 
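Strings are immutable, so each method returns a new string and leaves the original untouched. That is why the last line of output above still has its surrounding spaces. A short sketch of splitting cleaned text into words and joining it back follows; the variable names are illustrative.

```python
review = "  The product Quality is EXCELLENT!  "

cleaned = review.strip().lower().rstrip("!")   # trim spaces, lowercase, drop trailing "!"
words = cleaned.split()                        # split on whitespace into a list of words
rejoined = "_".join(words)                     # join the words with underscores

print(words)       # ['the', 'product', 'quality', 'is', 'excellent']
print(rejoined)    # the_product_quality_is_excellent
print(len(words))  # 5
```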
f-Strings – Formatted Output
f-strings embed variable values directly inside a string. They produce cleaner and more readable output than older formatting methods.
name = "Priya"
score = 94.5
rank = 1
print(f"Student: {name}")
print(f"Score: {score:.1f}%")
print(f"Rank: {rank}")
print(f"Result: {'Pass' if score >= 50 else 'Fail'}")
Output:
Student: Priya
Score: 94.5%
Rank: 1
Result: Pass
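The part after the colon inside the braces is a format specifier. Besides fixed decimals, it can add thousands separators, show percentages, and pad values to a fixed width for simple text tables. The figures below are made up for illustration.

```python
revenue = 1234567.891   # hypothetical figures
share = 0.4567
quarter = "Q1"
units = 42

revenue_text = f"{revenue:,.2f}"          # thousands separator, 2 decimals
share_text = f"{share:.1%}"               # multiply by 100 and add a % sign
row_text = f"{quarter:<6}|{units:>5}|"    # left-align in 6 chars, right-align in 5

print(revenue_text)   # 1,234,567.89
print(share_text)     # 45.7%
print(row_text)       # Q1    |   42|
```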
Python Data Structures – Quick Comparison
+------------+----------+----------+-----------+------------+
| Structure | Ordered | Mutable | Duplicates| Syntax |
+------------+----------+----------+-----------+------------+
| List | Yes | Yes | Yes | [1, 2, 3] |
| Tuple | Yes | No | Yes | (1, 2, 3) |
| Set | No | Yes | No | {1, 2, 3} |
| Dictionary | Yes | Yes | Keys: No | {"a": 1} |
+------------+----------+----------+-----------+------------+
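The properties in the table explain why converting between structures is a common idiom. As a small sketch with made-up city data: a set removes duplicates but does not keep order, while dictionary keys are unique and keep insertion order (guaranteed from Python 3.7 onward).

```python
cities = ["Mumbai", "Delhi", "Mumbai", "Pune", "Delhi"]

unique_unordered = set(cities)                 # duplicates removed, order not guaranteed
unique_ordered = list(dict.fromkeys(cities))   # duplicates removed, first-seen order kept
frozen = tuple(unique_ordered)                 # immutable copy of the result

print(len(unique_unordered))   # 3
print(unique_ordered)          # ['Mumbai', 'Delhi', 'Pune']
print(frozen)                  # ('Mumbai', 'Delhi', 'Pune')
```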
Summary
- Variables store values; Python detects the type automatically
- Lists, tuples, dictionaries, and sets each serve a different purpose in organising data
- Control flow with if/elif/else applies logic to data values
- Loops process collections of data efficiently
- Functions package reusable logic and accept input to return output
- List comprehensions write compact, readable transformations in one line
- String methods clean and prepare text data for analysis
