DS NumPy for Numerical Computing
NumPy (Numerical Python) is the foundation of numerical computing in Python. Major data science libraries such as Pandas and Scikit-learn are built on top of NumPy, and others such as TensorFlow interoperate closely with NumPy arrays. Understanding NumPy arrays, operations, and functions makes every other data science topic easier to learn.
Why NumPy Instead of Python Lists
Python lists can store numbers, but they are slow and limited when performing mathematical operations on large datasets. NumPy arrays solve this problem by storing data in a compact, memory-efficient format and performing operations on entire arrays at once — without writing explicit loops.
import numpy as np
# Speed comparison
python_list = list(range(1_000_000))
numpy_array = np.arange(1_000_000)
# Python list – needs a loop to square each element
squared_list = [x**2 for x in python_list]
# NumPy array – squares all elements in one command
squared_array = numpy_array ** 2
# NumPy is often around 50x faster on large datasets (the exact factor depends on hardware and NumPy version)
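One way to check the speed difference on your own machine is to time both approaches with the standard-library `timeit` module. This is a minimal sketch: the array size and the repeat count are arbitrary choices made for illustration, and the measured ratio will vary from machine to machine.

```python
import timeit

import numpy as np

n = 1_000_000  # arbitrary size chosen for illustration
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

# Run each approach three times and keep the fastest run
list_time = min(timeit.repeat(lambda: [x**2 for x in python_list], number=1, repeat=3))
array_time = min(timeit.repeat(lambda: numpy_array ** 2, number=1, repeat=3))

print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy vectorised:   {array_time:.4f} s")
print(f"Speed-up:           {list_time / array_time:.1f}x")
```

Taking the minimum of several runs reduces the influence of background activity on the measurement.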
Diagram – List vs Array Memory Layout
Python List:
+------+      +---+  +---+  +---+  +---+
| refs |  ->  | 1 |  | 2 |  | 3 |  | 4 |
+------+      +---+  +---+  +---+  +---+
Each element stored at a different memory location

NumPy Array:
+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
All elements stored in one contiguous memory block
→ Faster access, less memory usage
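The memory difference in the diagram can be measured roughly as follows. This sketch assumes CPython, where `sys.getsizeof` reports the size of a single object; the list total is therefore an estimate built by adding the list itself to each integer object it refers to.

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list holds references; every integer is a separate Python object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array holds raw 8-byte integers stored back to back
array_bytes = arr.nbytes

print("Bytes per array element:", arr.itemsize)  # 8
print("Array data bytes:", array_bytes)          # 8000
print("List estimate bytes:", list_bytes)
print("Contiguous:", arr.flags["C_CONTIGUOUS"])  # True
```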
Creating NumPy Arrays
From a Python List
import numpy as np
# 1D array (like a single column of numbers)
scores = np.array([88, 92, 75, 95, 60])
print("Array:", scores)
print("Type:", type(scores))
print("Data type:", scores.dtype)
Output:
Array: [88 92 75 95 60]
Type: <class 'numpy.ndarray'>
Data type: int64
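The default integer type can differ between platforms and NumPy versions, so it can help to set the data type explicitly. The following sketch shows how a `dtype` can be chosen at creation time and changed later with `astype`; the variable names are illustrative only.

```python
import numpy as np

# Choose the data type when creating the array
scores_int = np.array([88, 92, 75], dtype=np.int32)
print(scores_int.dtype)    # int32

# Convert to another type; astype returns a new array
scores_float = scores_int.astype(np.float64)
print(scores_float.dtype)  # float64

# Mixing integers and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64

# Converting floats to integers truncates toward zero
truncated = np.array([1.9, 2.1]).astype(np.int64)
print(truncated)           # [1 2]
```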
Built-in Array Creators
# Zeros array – useful for initialising result containers
zeros = np.zeros(5)
print("Zeros:", zeros) # [0. 0. 0. 0. 0.]
# Ones array – useful for creating masks
ones = np.ones((2, 3))
print("Ones:\n", ones) # 2x3 matrix of 1s
# Range of values – like Python range() but faster
rng = np.arange(0, 20, 5)
print("Range:", rng) # [ 0 5 10 15]
# Evenly spaced values between two numbers
spaced = np.linspace(0, 1, 5)
print("Linspace:", spaced) # [0. 0.25 0.5 0.75 1. ]
# Random values between 0 and 1
np.random.seed(42)
rand = np.random.rand(4)
print("Random:", rand)
Output:
Zeros: [0. 0. 0. 0. 0.]
Ones:
 [[1. 1. 1.]
 [1. 1. 1.]]
Range: [ 0  5 10 15]
Linspace: [0.   0.25 0.5  0.75 1.  ]
Random: [0.37454012 0.95071431 0.73199394 0.59865848]
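Recent NumPy versions also offer a generator-based random API, `np.random.default_rng`, which the NumPy documentation recommends for new code. The sketch below is an alternative to the `np.random.seed` approach shown above, not a replacement for it; the names `rng_a` and `rng_b` are illustrative.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random(4)  # floats in [0, 1)
sample_b = rng_b.random(4)
print("Same values:", np.array_equal(sample_a, sample_b))  # True

# Random integers from 1 to 6 (the upper bound 7 is excluded)
dice = rng_a.integers(1, 7, size=5)
print("Dice rolls:", dice)
```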
Array Shape and Dimensions
NumPy arrays can have one dimension (like a list), two dimensions (like a table), or more. The shape attribute is a tuple that gives the size of each dimension.
# 1D Array – a single row of values
array_1d = np.array([10, 20, 30, 40])
print("Shape:", array_1d.shape) # (4,)
# 2D Array – rows and columns (like a table)
array_2d = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
print("Shape:", array_2d.shape) # (3, 3)
print("Rows:", array_2d.shape[0]) # 3
print("Cols:", array_2d.shape[1]) # 3
Diagram – 1D, 2D, and 3D Arrays
1D Array (shape: 4,):
[10, 20, 30, 40]

2D Array (shape: 3x3) – like a data table:
+---+---+---+
| 1 | 2 | 3 |  ← Row 0
+---+---+---+
| 4 | 5 | 6 |  ← Row 1
+---+---+---+
| 7 | 8 | 9 |  ← Row 2
+---+---+---+
 Col0 Col1 Col2

3D Array (shape: 2x3x3) – like two tables stacked:
[[[1,2,3],        [[[7,8,9],
  [4,5,6],    +     [1,2,3],
  [7,8,9]]]         [4,5,6]]]
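A 3D array can be built by stacking 2D tables. The sketch below uses `np.stack` with a second table derived from the first (multiplied by 10, an arbitrary choice) to show how `ndim`, `shape`, and `size` relate.

```python
import numpy as np

table = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Stack two 3x3 tables along a new first axis
cube = np.stack([table, table * 10])

print("Dimensions:", cube.ndim)   # 3
print("Shape:", cube.shape)       # (2, 3, 3)
print("Total elements:", cube.size)  # 18

# Index order is (table, row, column)
print("Table 1, row 0, col 2:", cube[1, 0, 2])  # 30
```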
Indexing and Slicing
Indexing selects individual elements. Slicing selects a range of elements. Both work on any dimension.
marks = np.array([55, 70, 88, 92, 64, 78])
# Single element
print("3rd student:", marks[2]) # 88
# Slice: first 3 students
print("First 3:", marks[:3]) # [55 70 88]
# Slice: last 2 students
print("Last 2:", marks[-2:]) # [64 78]
# 2D array indexing
exam_results = np.array([
[80, 90, 70], # Student A
[60, 55, 75], # Student B
[95, 88, 92] # Student C
])
# Row 0, Column 2 (Student A, Subject 3)
print("Student A, Subject 3:", exam_results[0, 2]) # 70
# All students, Subject 2 (column 1)
print("All Subject 2 scores:", exam_results[:, 1]) # [90 55 88]
# Students B and C only
print("Student B & C:\n", exam_results[1:3, :])
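One detail worth knowing about slicing is that a basic slice is a view onto the original data, not a copy, so writing to the slice changes the original array. The following sketch demonstrates this with the same exam data and shows `.copy()` as one way to get an independent array.

```python
import numpy as np

exam_results = np.array([
    [80, 90, 70],  # Student A
    [60, 55, 75],  # Student B
    [95, 88, 92]   # Student C
])

# A slice shares memory with the original array
subject_2 = exam_results[:, 1]
print("Shares memory:", np.shares_memory(subject_2, exam_results))  # True

subject_2[0] = 0
print("Original changed:", exam_results[0, 1])  # 0

# .copy() creates independent data
students_bc = exam_results[1:3, :].copy()
students_bc[0, 0] = -1
print("Original unchanged:", exam_results[1, 0])  # 60
```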
Boolean Masking – Filter Without Loops
Boolean masking applies a condition to an entire array and returns only the elements that meet the condition. This is one of the most used techniques in data science.
temperatures = np.array([22, 35, 18, 40, 27, 33, 15, 38])
# Find days hotter than 30°C
hot_mask = temperatures > 30
print("Mask:", hot_mask)
# [False True False True False True False True]
hot_days = temperatures[hot_mask]
print("Hot days:", hot_days) # [35 40 33 38]
# Shortcut: combine in one line
print("Days below 20°C:", temperatures[temperatures < 20]) # [18 15]
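Conditions can also be combined. NumPy uses `&` (and), `|` (or), and `~` (not) for element-wise logic, and each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch on the same temperature data:

```python
import numpy as np

temperatures = np.array([22, 35, 18, 40, 27, 33, 15, 38])

# Days between 20°C and 30°C inclusive
mild = temperatures[(temperatures >= 20) & (temperatures <= 30)]
print("Mild days:", mild)  # [22 27]

# Days below 20°C or above 35°C
extreme = temperatures[(temperatures < 20) | (temperatures > 35)]
print("Extreme days:", extreme)  # [18 40 15 38]

# Count matches without extracting them
hot_count = np.count_nonzero(temperatures > 30)
print("Number of hot days:", hot_count)  # 4
```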
Array Operations – Vectorised Arithmetic
NumPy performs arithmetic on entire arrays without loops. This is called vectorised computing and runs much faster than Python loops.
product_prices = np.array([100, 250, 399, 50, 175])
discount_rate = 0.15 # 15% discount
# Apply discount to all products at once
discounted = product_prices * (1 - discount_rate)
print("Original:", product_prices)
print("Discounted:", discounted)
# Tax calculation (18% GST)
tax = product_prices * 0.18
final = product_prices + tax
print("After GST:", final)
Output:
Original: [100 250 399  50 175]
Discounted: [ 85.   212.5  339.15  42.5  148.75]
After GST: [118.   295.   470.82  59.   206.5 ]
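Vectorised arithmetic also works between two arrays of the same shape, pairing elements by position. In the sketch below, the `quantities` array is hypothetical data added for illustration; the loop version is included only to confirm that both approaches give the same result.

```python
import numpy as np

product_prices = np.array([100, 250, 399, 50, 175])
quantities = np.array([3, 1, 2, 10, 4])  # hypothetical units sold

# Element-wise multiplication: price[i] * quantity[i]
revenue = product_prices * quantities
print("Revenue per product:", revenue)  # [300 250 798 500 700]
print("Total revenue:", revenue.sum())  # 2548

# Equivalent explicit loop, for comparison
revenue_loop = [p * q for p, q in zip(product_prices, quantities)]
print("Same result:", revenue.tolist() == [int(r) for r in revenue_loop])  # True
```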
Statistical Functions
NumPy provides a complete set of statistical functions that work on arrays in one call.
exam_scores = np.array([72, 85, 91, 60, 78, 95, 55, 88, 74, 82])
print("Count :", len(exam_scores))
print("Mean :", np.mean(exam_scores)) # Average
print("Median :", np.median(exam_scores)) # Middle value
print("Std Dev:", np.std(exam_scores)) # Spread (population standard deviation by default)
print("Min :", np.min(exam_scores)) # Lowest
print("Max :", np.max(exam_scores)) # Highest
print("Sum :", np.sum(exam_scores)) # Total
print("25th % :", np.percentile(exam_scores, 25))
print("75th % :", np.percentile(exam_scores, 75))
Output:
Count : 10
Mean : 78.0
Median : 80.0
Std Dev: 12.36 (rounded to two decimals)
Min : 55
Max : 95
Sum : 780
25th % : 72.5
75th % : 87.25
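On 2D data, the same functions accept an `axis` argument to summarise per row or per column, and `np.std` accepts `ddof=1` for the sample standard deviation. The sketch below reuses the exam table from the indexing section; treat it as an illustration of the arguments rather than a complete analysis.

```python
import numpy as np

exam_results = np.array([
    [80, 90, 70],  # Student A
    [60, 55, 75],  # Student B
    [95, 88, 92]   # Student C
])

# axis=1 collapses the columns: one mean per student (row)
per_student = exam_results.mean(axis=1)
print("Mean per student:", per_student)  # roughly [80. 63.33 91.67]

# axis=0 collapses the rows: one mean per subject (column)
per_subject = exam_results.mean(axis=0)
print("Mean per subject:", per_subject)  # roughly [78.33 77.67 79.]

# Sample standard deviation divides by n - 1 instead of n
exam_scores = np.array([72, 85, 91, 60, 78, 95, 55, 88, 74, 82])
population_std = np.std(exam_scores)
sample_std = np.std(exam_scores, ddof=1)
print("Population std:", round(float(population_std), 2))
print("Sample std:", round(float(sample_std), 2))
```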
Reshaping Arrays
Reshape changes the dimensions of an array without changing its data. Machine learning models often require data in a specific shape.
# 12 data points as a flat 1D array
data = np.arange(1, 13)
print("Original shape:", data.shape) # (12,)
print(data) # [ 1 2 3 ... 12]
# Reshape to 3 rows, 4 columns (like a table)
table = data.reshape(3, 4)
print("\nReshaped (3x4):\n", table)
# Reshape to 4 rows, 3 columns
table2 = data.reshape(4, 3)
print("\nReshaped (4x3):\n", table2)
Output:
Original shape: (12,)
[ 1  2  3  4  5  6  7  8  9 10 11 12]

Reshaped (3x4):
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Reshaped (4x3):
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
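A reshape only succeeds when the new shape holds exactly the same number of elements. Passing `-1` for one dimension asks NumPy to work out that dimension from the others. A brief sketch of both behaviours:

```python
import numpy as np

data = np.arange(1, 13)  # 12 elements

# -1 lets NumPy infer the missing dimension: 12 / 3 = 4 columns
auto_cols = data.reshape(3, -1)
print("Inferred shape:", auto_cols.shape)  # (3, 4)

# A single column, a layout many machine learning APIs expect
column = data.reshape(-1, 1)
print("Column shape:", column.shape)  # (12, 1)

# ravel flattens back to 1D
flat = auto_cols.ravel()
print("Flattened shape:", flat.shape)  # (12,)

# 12 elements cannot fill a 5x3 table, so NumPy raises ValueError
try:
    data.reshape(5, 3)
except ValueError as exc:
    print("Cannot reshape:", exc)
```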
Broadcasting – Operations on Different Shapes
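Broadcasting allows NumPy to perform operations between arrays of different shapes by virtually stretching the smaller array to match the larger one, without copying its data. It works only when the shapes are compatible: comparing dimensions from the last one backwards, each pair must be equal or one of them must be 1.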
# Sales data: 3 products x 4 weeks
sales = np.array([
[100, 120, 90, 110], # Product A
[200, 180, 220, 195], # Product B
[50, 60, 55, 70] # Product C
])
# One target per week (the same value for every week in this example)
weekly_target = np.array([150, 150, 150, 150])
# Subtract target from each row (broadcasting handles the shape)
difference = sales - weekly_target
print("Above/Below Target:\n", difference)
Diagram – Broadcasting Rules
Array A shape: (3, 4)        Array B shape: (4,)
+----+----+----+----+        +---+---+---+---+
|100 |120 | 90 |110 |   -    |150|150|150|150|
|200 |180 |220 |195 |        +---+---+---+---+
| 50 | 60 | 55 | 70 |        (expands to 3x4 automatically)
+----+----+----+----+

Result:
+-----+-----+-----+-----+
| -50 | -30 | -60 | -40 |
|  50 |  30 |  70 |  45 |
|-100 | -90 | -95 | -80 |
+-----+-----+-----+-----+
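Broadcasting lines shapes up from the last dimension, so a per-product target of shape (3,) does not match a (3, 4) table directly. Turning it into a column of shape (3, 1) makes the shapes compatible. The `product_target` values below are hypothetical and used only to illustrate the shape handling.

```python
import numpy as np

sales = np.array([
    [100, 120, 90, 110],   # Product A
    [200, 180, 220, 195],  # Product B
    [50, 60, 55, 70]       # Product C
])

product_target = np.array([100, 200, 60])  # hypothetical target per product

# Shapes (3, 4) and (3,) are incompatible: last dimensions are 4 and 3
try:
    sales - product_target
except ValueError as exc:
    print("Shape mismatch:", exc)

# Adding an axis gives shape (3, 1), which stretches across the 4 weeks
difference = sales - product_target[:, np.newaxis]
print("Versus product target:\n", difference)
# [[  0  20 -10  10]
#  [  0 -20  20  -5]
#  [-10   0  -5  10]]
```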
Saving and Loading Arrays
# Save array to disk
data = np.array([10, 20, 30, 40, 50])
np.save("my_data.npy", data)
# Load it back
loaded = np.load("my_data.npy")
print("Loaded:", loaded)
# Save as text (CSV-style); fmt="%d" writes plain integers instead of the default scientific notation
np.savetxt("data.csv", data, delimiter=",", fmt="%d")
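Text files can be read back with `np.loadtxt`, and several arrays can be bundled into one file with `np.savez`. The sketch below writes into a temporary directory so it leaves no files behind; the file names and the `prices` and `doubled` keys are illustrative choices.

```python
import os
import tempfile

import numpy as np

data = np.array([10, 20, 30, 40, 50])

with tempfile.TemporaryDirectory() as tmp:
    # Round trip through a text file
    csv_path = os.path.join(tmp, "data.csv")
    np.savetxt(csv_path, data, fmt="%d", delimiter=",")
    restored = np.loadtxt(csv_path, delimiter=",", dtype=int)

    # Store several named arrays in a single .npz file
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, prices=data, doubled=data * 2)
    with np.load(npz_path) as bundle:
        doubled = bundle["doubled"]

print("Restored from text:", restored)  # [10 20 30 40 50]
print("Loaded from npz:", doubled)      # [ 20  40  60  80 100]
```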
Key NumPy Functions – Quick Reference
| Function | Purpose |
|---|---|
| np.array() | Create an array from a list |
| np.zeros() / np.ones() | Create arrays filled with 0s or 1s |
| np.arange() | Create evenly spaced range of values |
| np.linspace() | Create N evenly spaced values between two numbers |
| np.reshape() | Change array dimensions |
| np.mean() / np.median() | Calculate average or middle value |
| np.std() / np.var() | Calculate spread of values |
| np.min() / np.max() | Find smallest or largest value |
| np.sum() | Add all elements |
| np.sort() | Sort elements |
| np.unique() | Get unique values |
| np.where() | Apply conditions to select values |
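The last three functions in the table are not shown elsewhere in this section, so here is a brief sketch of each. The `grades` array is made-up data for illustration.

```python
import numpy as np

grades = np.array([3, 1, 2, 3, 1, 3])  # made-up data

# np.sort returns a sorted copy; the original is unchanged
sorted_grades = np.sort(grades)
print("Sorted:", sorted_grades)  # [1 1 2 3 3 3]

# np.unique returns distinct values, optionally with their counts
values, counts = np.unique(grades, return_counts=True)
print("Values:", values)  # [1 2 3]
print("Counts:", counts)  # [2 1 3]

# np.where with one argument returns the indices where the condition holds
top_positions = np.where(grades == 3)[0]
print("Positions of grade 3:", top_positions)  # [0 3 5]

# np.where with three arguments picks a value per element
labels = np.where(grades >= 2, "pass", "fail")
print("Labels:", labels)
```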
Summary
- NumPy arrays store numbers in a compact memory block — much faster than Python lists
- Arrays support 1D, 2D, and higher dimensions for different data structures
- Indexing and slicing select specific elements or ranges without loops
- Boolean masking filters data based on conditions in one line of code
- Vectorised operations perform arithmetic on entire arrays simultaneously
- Statistical functions like mean, median, and std run on full arrays in one call
- Broadcasting allows operations between arrays of compatible but different shapes
