DS NumPy for Numerical Computing

NumPy (Numerical Python) is the foundation of numerical computing in Python. Major data science libraries such as Pandas and Scikit-learn are built on top of NumPy, and others such as TensorFlow interoperate closely with its arrays. Understanding NumPy arrays, operations, and functions makes every other data science topic easier to learn.

Why NumPy Instead of Python Lists

Python lists can store numbers, but they are slow and limited when performing mathematical operations on large datasets. NumPy arrays solve this problem by storing data in a compact, memory-efficient format and performing operations on entire arrays at once — without writing explicit loops.

import numpy as np

# Speed comparison
python_list = list(range(1_000_000))
numpy_array = np.arange(1_000_000)

# Python list – needs a loop to square each element
squared_list = [x**2 for x in python_list]

# NumPy array – squares all elements in one command
squared_array = numpy_array ** 2

# On arrays this large NumPy is typically tens of times faster;
# the exact speedup depends on hardware, array size, and the operation
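To measure the difference on your own machine, a minimal sketch using the standard-library timeit module could look like the following. The array size and the repeat count are arbitrary choices made for illustration, and the printed timings will vary from run to run.

```python
import timeit

import numpy as np

n = 1_000_000  # arbitrary size chosen for illustration
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)  # explicit dtype so the squares cannot overflow

# timeit.timeit returns the total time in seconds for `number` runs
list_time = timeit.timeit(lambda: [x**2 for x in python_list], number=3)
array_time = timeit.timeit(lambda: numpy_array ** 2, number=3)

print(f"List comprehension: {list_time:.3f} s")
print(f"NumPy vectorised:   {array_time:.3f} s")
print(f"Speedup:            {list_time / array_time:.1f}x")
```

Running both versions on the same data keeps the comparison fair; only the way the squaring is performed differs.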

Diagram – List vs Array Memory Layout

Python List:
+------+    +---+    +---+    +---+    +---+
| refs | -> | 1 |    | 2 |    | 3 |    | 4 |
+------+    +---+    +---+    +---+    +---+
Each element stored at a different memory location

NumPy Array:
+---+---+---+---+
| 1 | 2 | 3 | 4 |
+---+---+---+---+
All elements stored in one continuous memory block
→ Faster access, less memory usage
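
The memory difference in the diagram can be checked roughly with sys.getsizeof and the array's nbytes attribute. This is only a sketch: the list figure is specific to CPython and approximate (small integers are cached and shared between objects), and the element count of 1000 is an arbitrary choice.

```python
import sys

import numpy as np

n = 1000  # arbitrary element count for illustration
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)

# A list holds pointers; every integer is a separate Python object
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# An array holds the raw 8-byte values side by side in one buffer
array_bytes = numpy_array.nbytes

print("List (pointers + int objects):", list_bytes, "bytes (approximate)")
print("Array data buffer:", array_bytes, "bytes")        # 8000
print("Bytes per array element:", numpy_array.itemsize)  # 8
```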

Creating NumPy Arrays

From a Python List

import numpy as np

# 1D array (like a single column of numbers)
scores = np.array([88, 92, 75, 95, 60])
print("Array:", scores)
print("Type:", type(scores))
print("Data type:", scores.dtype)   # int64 on most platforms; int32 on Windows with NumPy 1.x

Output:

Array: [88 92 75 95 60]
Type: <class 'numpy.ndarray'>
Data type: int64

Built-in Array Creators

# Zeros array – useful for initialising result containers
zeros = np.zeros(5)
print("Zeros:", zeros)          # [0. 0. 0. 0. 0.]

# Ones array – useful for creating masks
ones = np.ones((2, 3))
print("Ones:\n", ones)          # 2x3 matrix of 1s

# Range of values – like Python range() but faster
rng = np.arange(0, 20, 5)
print("Range:", rng)            # [ 0  5 10 15]

# Evenly spaced values between two numbers
spaced = np.linspace(0, 1, 5)
print("Linspace:", spaced)      # [0.   0.25 0.5  0.75 1.  ]

# Random values between 0 and 1
np.random.seed(42)
rand = np.random.rand(4)
print("Random:", rand)

Output:

Zeros: [0. 0. 0. 0. 0.]
Ones:
 [[1. 1. 1.]
  [1. 1. 1.]]
Range: [ 0  5 10 15]
Linspace: [0.   0.25 0.5  0.75 1.  ]
Random: [0.37454012 0.95071431 0.73199394 0.59865848]
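
Recent NumPy versions also offer a Generator-based random API through np.random.default_rng, which keeps the random state in an object instead of a global seed. The sketch below shows a few draws from it; the variable names and the distribution parameters (such as the heights in centimetres) are made up for illustration.

```python
import numpy as np

generator = np.random.default_rng(42)   # seeded Generator object

uniform = generator.random(4)                          # floats in [0, 1)
dice = generator.integers(1, 7, size=5)                # integers 1..6 (upper bound is exclusive)
heights = generator.normal(loc=170, scale=10, size=3)  # hypothetical heights in cm

print("Uniform:", uniform)
print("Dice rolls:", dice)
print("Heights:", heights)

# A fresh generator with the same seed reproduces the same first draw
repeat = np.random.default_rng(42).random(4)
print("Reproducible:", np.array_equal(uniform, repeat))   # True
```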

Array Shape and Dimensions

NumPy arrays can have one dimension (like a list), two dimensions (like a table), or more. The shape attribute is a tuple that gives the number of elements along each dimension.

# 1D Array – a single row of values
array_1d = np.array([10, 20, 30, 40])
print("Shape:", array_1d.shape)    # (4,)

# 2D Array – rows and columns (like a table)
array_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print("Shape:", array_2d.shape)    # (3, 3)
print("Rows:", array_2d.shape[0])  # 3
print("Cols:", array_2d.shape[1])  # 3

Diagram – 1D, 2D, and 3D Arrays

1D Array (shape: (4,)):
[10, 20, 30, 40]

2D Array (shape: 3x3) – like a data table:
+---+---+---+
| 1 | 2 | 3 |    ← Row 0
+---+---+---+
| 4 | 5 | 6 |    ← Row 1
+---+---+---+
| 7 | 8 | 9 |    ← Row 2
+---+---+---+
  Col0 Col1 Col2

3D Array (shape: (2, 3, 3)) – like two tables stacked:
Table 0:          Table 1:
[[1,2,3],         [[7,8,9],
 [4,5,6],    +     [1,2,3],
 [7,8,9]]          [4,5,6]]
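
The 3D case from the diagram can be written out as a short sketch. The name cube is chosen here for illustration; the values are the two tables shown above.

```python
import numpy as np

# Two 3x3 tables stacked along a new first axis
cube = np.array([
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # table 0
    [[7, 8, 9], [1, 2, 3], [4, 5, 6]],   # table 1
])

print("Dimensions:", cube.ndim)               # 3
print("Shape:", cube.shape)                   # (2, 3, 3)
print("Total elements:", cube.size)           # 18
print("Table 1, row 0:", cube[1, 0])          # [7 8 9]
print("Table 0, row 2, col 1:", cube[0, 2, 1])  # 8
```

The shape reads from the outside in: 2 tables, each with 3 rows, each row with 3 columns.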

Indexing and Slicing

Indexing selects individual elements. Slicing selects a range of elements. Both work along every axis of an array, and positions are counted from 0.

marks = np.array([55, 70, 88, 92, 64, 78])

# Single element
print("3rd student:", marks[2])           # 88

# Slice: first 3 students
print("First 3:", marks[:3])              # [55 70 88]

# Slice: last 2 students
print("Last 2:", marks[-2:])              # [64 78]

# 2D array indexing
exam_results = np.array([
    [80, 90, 70],   # Student A
    [60, 55, 75],   # Student B
    [95, 88, 92]    # Student C
])

# Row 0, Column 2 (Student A, Subject 3)
print("Student A, Subject 3:", exam_results[0, 2])   # 70

# All students, Subject 2 (column 1)
print("All Subject 2 scores:", exam_results[:, 1])   # [90 55 88]

# Students B and C only
print("Student B & C:\n", exam_results[1:3, :])
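
One detail worth knowing when slicing: a basic slice of a NumPy array is a view onto the same data, not a copy, so writing to the slice changes the original. A small sketch of that behaviour, reusing the marks values from above (the names view and independent are illustrative):

```python
import numpy as np

marks = np.array([55, 70, 88, 92, 64, 78])

view = marks[:3]     # basic slice: shares memory with marks
view[0] = 99         # this also changes marks[0]
print(marks)         # [99 70 88 92 64 78]

independent = marks[:3].copy()   # explicit copy owns its own data
independent[1] = 0               # marks is not affected
print(marks)                     # [99 70 88 92 64 78]

print(np.shares_memory(marks, view))          # True
print(np.shares_memory(marks, independent))   # False
```

Calling .copy() is the usual way to get an independent array when the original must stay unchanged.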

Boolean Masking – Filter Without Loops

Boolean masking applies a condition to an entire array and returns only the elements that meet the condition. This is one of the most used techniques in data science.

temperatures = np.array([22, 35, 18, 40, 27, 33, 15, 38])

# Find days hotter than 30°C
hot_mask = temperatures > 30
print("Mask:", hot_mask)
# [False  True False  True False  True False  True]

hot_days = temperatures[hot_mask]
print("Hot days:", hot_days)     # [35 40 33 38]

# Shortcut: combine in one line
print("Days below 20°C:", temperatures[temperatures < 20])
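
Conditions can also be combined. The sketch below uses the element-wise operators & (and), | (or), and ~ (not) on the same temperatures array; the thresholds are arbitrary values picked for illustration. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

temperatures = np.array([22, 35, 18, 40, 27, 33, 15, 38])

# AND: mild days between 20 and 30 degrees inclusive
mild = temperatures[(temperatures >= 20) & (temperatures <= 30)]
print("Mild days:", mild)            # [22 27]

# OR: extreme days, either very cold or very hot
extreme = temperatures[(temperatures < 18) | (temperatures > 36)]
print("Extreme days:", extreme)      # [40 15 38]

# NOT: everything that is not hotter than 30
not_hot = temperatures[~(temperatures > 30)]
print("Not hot:", not_hot)           # [22 18 27 15]

# Count matches without extracting them
hot_count = np.count_nonzero(temperatures > 30)
print("Hot day count:", hot_count)   # 4
```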

Array Operations – Vectorised Arithmetic

NumPy performs arithmetic on entire arrays without explicit Python loops. This is called vectorised computing: the looping happens inside NumPy's compiled code, which runs much faster than an equivalent Python loop.

product_prices = np.array([100, 250, 399, 50, 175])
discount_rate  = 0.15   # 15% discount

# Apply discount to all products at once
discounted = product_prices * (1 - discount_rate)
print("Original:", product_prices)
print("Discounted:", discounted)

# Tax calculation (18% GST)
tax       = product_prices * 0.18
final     = product_prices + tax
print("After GST:", final)

Output:

Original:   [100 250 399  50 175]
Discounted: [ 85.  212.5 339.15  42.5 148.75]
After GST:  [118.  295.  470.82  59.  206.5 ]
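
Vectorised arithmetic also works between two arrays of the same shape, pairing elements by position. In the sketch below the quantities array is hypothetical order data invented for the example; the prices are the ones used above.

```python
import numpy as np

product_prices = np.array([100, 250, 399, 50, 175])
quantities = np.array([3, 1, 2, 10, 4])   # hypothetical order quantities

# Element-wise multiply: price[i] * quantity[i] for every i
line_totals = product_prices * quantities
print("Line totals:", line_totals)          # [300 250 798 500 700]
print("Order total:", line_totals.sum())    # 2548

# The equivalent explicit loop, for comparison
loop_totals = [p * q for p, q in zip(product_prices, quantities)]
print("Same result:", loop_totals == line_totals.tolist())   # True
```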

Statistical Functions

NumPy provides a core set of descriptive statistics functions, each of which processes a whole array in one call.

exam_scores = np.array([72, 85, 91, 60, 78, 95, 55, 88, 74, 82])

print("Count  :", len(exam_scores))
print("Mean   :", np.mean(exam_scores))       # Average
print("Median :", np.median(exam_scores))     # Middle value
print("Std Dev:", np.std(exam_scores))        # Spread (population std, ddof=0)
print("Min    :", np.min(exam_scores))        # Lowest
print("Max    :", np.max(exam_scores))        # Highest
print("Sum    :", np.sum(exam_scores))        # Total
print("25th % :", np.percentile(exam_scores, 25))
print("75th % :", np.percentile(exam_scores, 75))

Output:

Count  : 10
Mean   : 78.0
Median : 80.0
Std Dev: 12.36
Min    : 55
Max    : 95
Sum    : 780
25th % : 72.5
75th % : 87.25
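
On 2D data the same functions accept an axis argument that controls which dimension is collapsed. The sketch below applies this to the exam_results table from the indexing section. It also shows the ddof parameter: np.std divides by N by default (population standard deviation), and ddof=1 divides by N - 1 (sample standard deviation).

```python
import numpy as np

exam_results = np.array([
    [80, 90, 70],   # Student A
    [60, 55, 75],   # Student B
    [95, 88, 92],   # Student C
])

# axis=0 collapses the rows: one value per subject (column)
per_subject = exam_results.mean(axis=0)
print("Mean per subject:", np.round(per_subject, 2))   # approx. 78.33, 77.67, 79.0

# axis=1 collapses the columns: one value per student (row)
per_student = exam_results.mean(axis=1)
print("Mean per student:", np.round(per_student, 2))   # approx. 80.0, 63.33, 91.67

best_per_student = exam_results.max(axis=1)
print("Best score per student:", best_per_student)     # [90 75 95]

# Sample standard deviation (divides by N - 1 instead of N)
exam_scores = np.array([72, 85, 91, 60, 78, 95, 55, 88, 74, 82])
sample_std = np.std(exam_scores, ddof=1)
print("Sample std:", round(float(sample_std), 2))
```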

Reshaping Arrays

Reshape changes the dimensions of an array without changing its data. Machine learning models often require data in a specific shape.

# 12 data points as a flat 1D array
data = np.arange(1, 13)
print("Original shape:", data.shape)   # (12,)
print(data)                             # [ 1  2  3 ... 12]

# Reshape to 3 rows, 4 columns (like a table)
table = data.reshape(3, 4)
print("\nReshaped (3x4):\n", table)

# Reshape to 4 rows, 3 columns
table2 = data.reshape(4, 3)
print("\nReshaped (4x3):\n", table2)

Output:

Original shape: (12,)
[ 1  2  3  4  5  6  7  8  9 10 11 12]

Reshaped (3x4):
 [[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

Reshaped (4x3):
 [[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]
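
Two practical details of reshape are sketched below. Passing -1 for one dimension asks NumPy to infer it from the total number of elements, and a shape whose product does not match the element count raises a ValueError. The variable names are illustrative.

```python
import numpy as np

data = np.arange(1, 13)   # 12 elements

# -1 lets NumPy work out the missing dimension: 12 / 3 = 4 columns
auto_cols = data.reshape(3, -1)
print("Inferred shape:", auto_cols.shape)   # (3, 4)

# A single column, a layout many ML libraries expect for one feature
column = data.reshape(-1, 1)
print("Column shape:", column.shape)        # (12, 1)

# Back to a flat 1D array
flat = auto_cols.reshape(-1)
print("Flat shape:", flat.shape)            # (12,)

# 5 x 3 = 15 does not match 12 elements, so this fails
error_message = None
try:
    data.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
    print("Cannot reshape:", error_message)
```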

Broadcasting – Operations on Different Shapes

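Broadcasting allows NumPy to perform operations between arrays of different shapes by virtually stretching the smaller array to match the larger one, without copying its data. Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.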

# Sales data: 3 products x 4 weeks
sales = np.array([
    [100, 120, 90,  110],   # Product A
    [200, 180, 220, 195],   # Product B
    [50,  60,  55,  70]     # Product C
])

# One target per week (here the same value, 150, for all 4 weeks)
weekly_target = np.array([150, 150, 150, 150])

# Subtract target from each row (broadcasting handles the shape)
difference = sales - weekly_target
print("Above/Below Target:\n", difference)

Diagram – Broadcasting Rules

Array A shape: (3, 4)       Array B shape: (4,)
+----+----+----+----+       +---+---+---+---+
|100 |120 | 90 |110 |  -   |150|150|150|150|
|200 |180 |220 |195 |      +---+---+---+---+
| 50 | 60 | 55 | 70 |      (expands to 3x4 automatically)
+----+----+----+----+

Result:
+-----+-----+-----+-----+
| -50 | -30 | -60 | -40 |
|  50 |  30 |  70 |  45 |
|-100 | -90 | -95 | -80 |
+-----+-----+-----+-----+
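
Broadcasting along the other axis needs one extra step. In the sketch below, product_target is a hypothetical array of per-product targets with shape (3,). That shape does not line up with (3, 4), because the last dimensions are 3 and 4, so it is first turned into a column of shape (3, 1).

```python
import numpy as np

sales = np.array([
    [100, 120, 90,  110],   # Product A
    [200, 180, 220, 195],   # Product B
    [50,  60,  55,  70],    # Product C
])

product_target = np.array([100, 200, 60])   # hypothetical per-product targets

# Shapes (3, 4) and (3,) are not compatible: trailing dimensions are 4 and 3
try:
    sales - product_target
except ValueError as err:
    print("Broadcast failed:", err)

# Adding a length-1 axis gives shape (3, 1), which stretches across the 4 weeks
column_target = product_target[:, np.newaxis]
difference = sales - column_target
print("Target shape:", column_target.shape)   # (3, 1)
print("Above/Below Target:\n", difference)
```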

Saving and Loading Arrays

# Save array to disk
data = np.array([10, 20, 30, 40, 50])
np.save("my_data.npy", data)

# Load it back
loaded = np.load("my_data.npy")
print("Loaded:", loaded)

# Save as text (CSV-style); fmt="%d" writes plain integers,
# the default format is scientific notation
np.savetxt("data.csv", data, delimiter=",", fmt="%d")
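
A round trip through both formats can be sketched as follows. The temporary directory is only there so the example cleans up after itself; the file names match the ones used above. Note that np.loadtxt returns floats by default, while the binary .npy format preserves the original data type.

```python
import os
import tempfile

import numpy as np

data = np.array([10, 20, 30, 40, 50])

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "my_data.npy")
    csv_path = os.path.join(tmp_dir, "data.csv")

    # Binary format: exact values and dtype are preserved
    np.save(npy_path, data)
    from_npy = np.load(npy_path)

    # Text format: readable in any editor, but dtype is not stored
    np.savetxt(csv_path, data, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",")

print("From .npy:", from_npy, from_npy.dtype)   # same dtype as data
print("From .csv:", from_csv, from_csv.dtype)   # float64
```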

Key NumPy Functions – Quick Reference

Function                  Purpose
np.array()                Create an array from a list
np.zeros() / np.ones()    Create arrays filled with 0s or 1s
np.arange()               Create evenly spaced range of values
np.linspace()             Create N evenly spaced values between two numbers
np.reshape()              Change array dimensions
np.mean() / np.median()   Calculate average or middle value
np.std() / np.var()       Calculate spread of values
np.min() / np.max()       Find smallest or largest value
np.sum()                  Add all elements
np.sort()                 Sort elements
np.unique()               Get unique values
np.where()                Apply conditions to select values
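
The last three functions in the table are not demonstrated elsewhere in this tutorial, so here is a brief sketch. The scores array and the pass mark of 80 are made up for the example.

```python
import numpy as np

scores = np.array([88, 92, 75, 95, 60, 88, 75])   # illustrative data

# np.sort returns a sorted copy; the original order is untouched
sorted_scores = np.sort(scores)
print("Sorted:", sorted_scores)        # [60 75 75 88 88 92 95]

# np.unique returns the distinct values, optionally with their counts
values, counts = np.unique(scores, return_counts=True)
print("Unique:", values)               # [60 75 88 92 95]
print("Counts:", counts)               # [1 2 2 1 1]

# np.where with three arguments picks one of two values per element
labels = np.where(scores >= 80, "pass", "fail")
print("Labels:", labels)

# np.where with one argument returns the indices where the condition holds
top_indices = np.where(scores >= 90)[0]
print("Indices of scores >= 90:", top_indices)   # [1 3]
```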

Summary

  • NumPy arrays store numbers in a compact memory block — much faster than Python lists
  • Arrays support 1D, 2D, and higher dimensions for different data structures
  • Indexing and slicing select specific elements or ranges without loops
  • Boolean masking filters data based on conditions in one line of code
  • Vectorised operations perform arithmetic on entire arrays simultaneously
  • Statistical functions like mean, median, and std run on full arrays in one call
  • Broadcasting allows operations between arrays of compatible but different shapes
