DS Time Series Analysis
Time Series Analysis deals with data collected at regular intervals over time — daily stock prices, monthly sales figures, hourly temperature readings, or annual rainfall totals. Unlike cross-sectional data, time series data has a natural order that carries crucial information. This topic covers time series components, decomposition, stationarity, ARIMA modelling, and forecasting using Python.
What Is a Time Series
A time series is a sequence of data points indexed in time order. The order matters — removing or rearranging data points changes the meaning. The goal of time series analysis is to understand patterns in past data and use them to forecast future values.
Real-World Time Series Examples
| Domain | Time Series | Frequency |
|---|---|---|
| Finance | Stock price, exchange rate | Minute / Daily |
| Retail | Product sales, website traffic | Daily / Weekly |
| Energy | Electricity consumption, solar generation | Hourly |
| Weather | Temperature, rainfall, humidity | Hourly / Daily |
| Healthcare | ICU patient heart rate, disease case counts | Seconds / Daily |
Components of a Time Series
A time series is commonly modelled as the combination of four underlying components that together explain the observed data.
Diagram – Four Components of Time Series
Observed Data = Trend + Seasonality + Cyclicality + Noise
1. Trend: Long-term upward or downward movement
╱─────────────────────────────────╱
╱ (steady climb)
2. Seasonality: Regular pattern that repeats at a fixed interval
∧ ∧ ∧ ∧ ∧ ∧ ∧
╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲
╱ V V V V V V V
(sales spike every December)
3. Cyclicality: Long, irregular waves – NOT fixed period
╭──────╮ ╭──────╮
──╯ ╰────────╯ ╰──
(business cycles – recession / growth)
4. Noise: Random fluctuations that cannot be explained
↑ ↓ ↑ ↓ ↑ ↓ ↑ ↓ (random jitter)
Additive Model: Y = Trend + Season + Cycle + Noise
Multiplicative Model: Y = Trend × Season × Cycle × Noise
(Use multiplicative when seasonal amplitude grows with trend)
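The choice between the two models can be checked empirically: on the log scale a multiplicative series becomes additive, because log(T × S × N) = log T + log S + log N. A minimal sketch with synthetic data (all numbers below are illustrative, not from the sales example):

```python
import numpy as np

# Hypothetical multiplicative series: the seasonal swing grows with the level
rng = np.random.default_rng(0)
n = 120
trend = np.linspace(100, 300, n)
season = 1 + 0.2 * np.sin(2 * np.pi * np.arange(n) / 12)
noise = rng.normal(1.0, 0.02, n)
y = trend * season * noise                     # Y = T x S x N

# log(Y) = log(T) + log(S) + log(N): additive on the log scale
log_y = np.log(y)

# Peak-to-trough swing over the first and last 12-point seasonal cycle
raw_early = y[:12].max() - y[:12].min()
raw_late = y[-12:].max() - y[-12:].min()
log_early = log_y[:12].max() - log_y[:12].min()
log_late = log_y[-12:].max() - log_y[-12:].min()

print(f"raw swing : early {raw_early:.1f} -> late {raw_late:.1f} (grows with trend)")
print(f"log swing : early {log_early:.2f} -> late {log_late:.2f} (roughly stable)")
```

If the seasonal swing is stable after a log transform, fitting an additive model to the logged series is equivalent to fitting a multiplicative model to the original.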
Setting Up – Creating a Time Series in Pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate 3 years of daily sales data
np.random.seed(42)
n = 365 * 3 # 1095 days
date_range = pd.date_range(start="2021-01-01", periods=n, freq="D")
# Trend: gradually increasing sales
trend = np.linspace(200, 400, n)
# Seasonality: weekly and annual cycles
weekly_season = 30 * np.sin(2 * np.pi * np.arange(n) / 7)
annual_season = 80 * np.sin(2 * np.pi * np.arange(n) / 365 - np.pi/2)
# Noise: random fluctuations
noise = np.random.normal(0, 20, n)
# Combine all components
sales = trend + weekly_season + annual_season + noise
# Create a Pandas Series with datetime index
ts = pd.Series(sales, index=date_range, name="Daily_Sales")
print("Time Series Info:")
print(f" Start : {ts.index[0].date()}")
print(f" End : {ts.index[-1].date()}")
print(f" Points: {len(ts)}")
print(f" Mean : {ts.mean():.2f}")
print(f" Std : {ts.std():.2f}")
print("\nFirst 5 observations:")
print(ts.head())
Visualising the Time Series
plt.figure(figsize=(12, 5))
ts.plot(color="steelblue", linewidth=0.8)
plt.title("Daily Sales – 3 Years (2021–2023)")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.tight_layout()
plt.savefig("time_series_raw.png")
plt.show()
Resampling – Aggregating to Different Time Frequencies
Resampling converts data from one time frequency to another. Downsampling reduces frequency (daily → monthly) by aggregating, while upsampling increases it (daily → hourly) and creates gaps that must be filled.
# Monthly average sales ("M" = month-end; pandas >= 2.2 prefers the alias "ME")
monthly_avg = ts.resample("M").mean()
# Weekly total sales
weekly_total = ts.resample("W").sum()
# Quarterly maximum
quarterly_max = ts.resample("Q").max()
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
ts.plot(ax=axes[0], color="steelblue", linewidth=0.6, title="Daily Sales")
monthly_avg.plot(ax=axes[1], color="tomato", marker="o", title="Monthly Average Sales")
quarterly_max.plot(ax=axes[2], color="green", marker="s", title="Quarterly Peak Sales")
for ax in axes:
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
plt.tight_layout()
plt.savefig("resampled_ts.png")
plt.show()
Rolling Statistics – Smoothing the Series
Rolling statistics compute a metric (mean, standard deviation) over a moving window of the most recent N data points. A 30-day rolling mean smooths out daily noise and makes the underlying trend visible.
# 30-day rolling mean and standard deviation
rolling_mean = ts.rolling(window=30).mean()
rolling_std = ts.rolling(window=30).std()
plt.figure(figsize=(12, 5))
plt.plot(ts.index, ts, color="steelblue", alpha=0.4, linewidth=0.8, label="Daily Sales")
plt.plot(ts.index, rolling_mean, color="tomato", linewidth=2, label="30-Day Rolling Mean")
plt.fill_between(ts.index,
rolling_mean - 2*rolling_std,
rolling_mean + 2*rolling_std,
color="tomato", alpha=0.15, label="±2 Std Dev Band")
plt.title("Daily Sales with 30-Day Rolling Mean")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("rolling_stats.png")
plt.show()
Time Series Decomposition
Decomposition separates a time series into trend, seasonal, and residual (noise) components. Note that classical decomposition does not return cyclicality as a separate term; slow cycles get absorbed into the estimated trend. Visualising each component separately makes patterns and anomalies easier to detect and understand.
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose into components (additive model, period=365 days = 1 year)
decomposition = seasonal_decompose(ts, model="additive", period=365)
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
decomposition.observed.plot(ax=axes[0], title="Observed (Original)")
decomposition.trend.plot(ax=axes[1], title="Trend")
decomposition.seasonal.plot(ax=axes[2], title="Seasonality")
decomposition.resid.plot(ax=axes[3], title="Residuals (Noise)")
for ax in axes:
ax.set_xlabel("")
plt.suptitle("Time Series Decomposition", fontsize=14)
plt.tight_layout()
plt.savefig("ts_decomposition.png")
plt.show()
Diagram – What Decomposition Reveals
After decomposition:
Observed: ╭╮╭╮╭╮╭╮╭╮╭╮ (noisy, hard to read)
Trend: ────────────╱ (clear upward movement)
Seasonality: ╭╮╭╮╭╮╭╮╭╮╭╮ (pure repeating cycle)
Residuals: ↑↓↑↓↑↓↑↓↑↓↑↓ (only noise remains)
(should look random — if a pattern remains,
the model is missing something)
Stationarity – A Required Property for ARIMA
A stationary time series has a constant mean, constant variance, and autocorrelation that depends only on the lag between observations, not on time itself, so it shows no trend and no seasonal pattern. Most time series models, especially ARIMA, require stationarity. A non-stationary series must be transformed before modelling.
Diagram – Stationary vs Non-Stationary
Non-Stationary (Trend + Seasonality):
╱─────────────────────────── (mean changes over time)
╱ ↗ ↗ ↗
╱ ↗ ↗
╱
Stationary (no trend, constant variance):
─────────────────────────────────── (constant mean)
↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↑ ↓ ↑
(fluctuates around constant mean)
Augmented Dickey-Fuller (ADF) Test
The ADF test checks for a unit root: its null hypothesis is that the series is non-stationary. A p-value below 0.05 rejects that null, so the series can be treated as stationary. A p-value at or above 0.05 means differencing or another transformation is required.
from statsmodels.tsa.stattools import adfuller
def adf_test(series, name="Series"):
result = adfuller(series.dropna())
print(f"\nADF Test – {name}")
print(f" ADF Statistic : {result[0]:.4f}")
print(f" P-value : {result[1]:.4f}")
print(f" Critical (5%) : {result[4]['5%']:.4f}")
if result[1] < 0.05:
print(" Result: STATIONARY (p < 0.05)")
else:
print(" Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed")
# Test original series
adf_test(ts, "Original Sales")
# Apply first-order differencing
ts_diff = ts.diff().dropna()
adf_test(ts_diff, "First-Differenced Sales")
Output:
ADF Test – Original Sales
  ADF Statistic : -1.8321
  P-value       : 0.3614
  Critical (5%) : -2.8645
  Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed

ADF Test – First-Differenced Sales
  ADF Statistic : -16.4823
  P-value       : 0.0000
  Critical (5%) : -2.8645
  Result: STATIONARY (p < 0.05)
ARIMA Modelling – Forecast the Future
ARIMA (AutoRegressive Integrated Moving Average) is among the most widely used statistical models for time series forecasting. It captures autocorrelations, the relationship between a value and its own past values, to project the series forward.
ARIMA Parameters Explained
ARIMA(p, d, q)
p = AutoRegressive order
→ How many past values (lags) to use as predictors
→ AR(1): y_t = φ × y_(t-1) + noise
→ Larger p: more past values feed into each prediction
d = Integrated order (Differencing)
→ How many times to difference the series to make it stationary
→ d=0: series already stationary
→ d=1: apply first-order differencing once
→ d=2: difference the already-differenced series
q = Moving Average order
→ How many past forecast errors to include
→ MA(1): y_t = θ × error_(t-1) + noise
→ Larger q: more past forecast errors feed into each prediction
Example: ARIMA(1, 1, 1)
→ Use 1 past value + 1 difference + 1 past error
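The p and d parts above can be made concrete with a short simulation; the φ value, seed, and series length below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
eps = rng.normal(0, 1, n)

# The "p" part: an AR(1) process, y_t = phi * y_(t-1) + noise
phi = 0.7
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# For AR(1), the lag-1 autocorrelation should land near phi
lag1 = np.corrcoef(ar[:-1], ar[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f} (theory: {phi})")

# The "d" part: first differencing turns a trending series into a flat one
trending = np.linspace(0, 50, n) + eps      # non-stationary (mean drifts up)
diffed = np.diff(trending)                  # mean is now roughly constant
print(f"mean, first vs second half, before : "
      f"{trending[:250].mean():.1f} vs {trending[250:].mean():.1f}")
print(f"mean, first vs second half, after  : "
      f"{diffed[:250].mean():.2f} vs {diffed[250:].mean():.2f}")
```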
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Use monthly data for cleaner modelling
monthly = ts.resample("M").mean()
# Split: train on first 80% of months
split = int(len(monthly) * 0.8)
train = monthly[:split]
test = monthly[split:]
print(f"Training months : {len(train)}")
print(f"Testing months : {len(test)}")
# Fit ARIMA(1,1,1)
model = ARIMA(train, order=(1, 1, 1))
result = model.fit()
print(result.summary())
# Forecast over test period
forecast = result.forecast(steps=len(test))
# Evaluate
mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
print(f"\nForecast Evaluation:")
print(f" MAE : {mae:.2f}")
print(f" RMSE : {rmse:.2f}")
Visualising ARIMA Forecasts
plt.figure(figsize=(12, 5))
train.plot(label="Training Data", color="steelblue")
test.plot(label="Actual (Test)", color="green")
forecast.plot(label="ARIMA Forecast", color="tomato", linestyle="--")
plt.title("ARIMA Forecast vs Actual Monthly Sales")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("arima_forecast.png")
plt.show()
Choosing ARIMA Parameters – ACF and PACF Plots
ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots help select the p and q parameters for ARIMA models by showing how correlated the series is with its own past values at each lag.
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(train.diff().dropna(), lags=20, ax=axes[0], title="ACF (Autocorrelation)\n→ Helps choose q")
plot_pacf(train.diff().dropna(), lags=20, ax=axes[1], title="PACF (Partial Autocorrelation)\n→ Helps choose p")
plt.tight_layout()
plt.savefig("acf_pacf.png")
plt.show()
Reading ACF and PACF
- ACF plot (choose q): if the ACF cuts off sharply after lag k → use q = k for the MA term
- PACF plot (choose p): if the PACF cuts off sharply after lag k → use p = k for the AR term
- Blue shaded region = 95% confidence interval
  - Bars INSIDE the band → not statistically significant
  - Bars OUTSIDE the band → significant lag; worth including in the model
- Example:
  - PACF has a significant spike at lag 1, then drops into the band → AR(1) is appropriate → p = 1
  - ACF has a significant spike at lag 1, then drops → MA(1) is appropriate → q = 1
  - Combined with one round of differencing → ARIMA(1, 1, 1)
Auto-ARIMA – Automatic Parameter Selection
# Install: pip install pmdarima
import pmdarima as pm
auto_model = pm.auto_arima(
train,
seasonal=True,
m=12, # Monthly seasonality (12 months per cycle)
stepwise=True,
trace=True, # Print all models tested
error_action="ignore",
suppress_warnings=True
)
print("\nBest Model:", auto_model)
print("Best Parameters (p,d,q):", auto_model.order)
# Forecast
auto_forecast = auto_model.predict(n_periods=len(test))
auto_mae = mean_absolute_error(test, auto_forecast)
print(f"Auto-ARIMA MAE: {auto_mae:.2f}")
Forecasting with Prophet (Facebook)
Prophet is a forecasting library from Meta (Facebook) designed for business time series with strong seasonality and occasional holidays. It requires minimal parameter tuning and handles missing values gracefully.
# Install: pip install prophet
from prophet import Prophet
# Prophet requires columns named 'ds' (date) and 'y' (value)
df_prophet = pd.DataFrame({
"ds": monthly.index,
"y": monthly.values
}).reset_index(drop=True)
# Split
train_p = df_prophet[:-len(test)]
test_p = df_prophet[-len(test):]
# Fit
model_p = Prophet(
yearly_seasonality=True,
weekly_seasonality=False,
daily_seasonality=False,
seasonality_mode="additive"
)
model_p.fit(train_p)
# Forecast
future = model_p.make_future_dataframe(periods=len(test), freq="M")
forecast_p = model_p.predict(future)
# Plot
fig1 = model_p.plot(forecast_p)
plt.title("Prophet Forecast")
plt.savefig("prophet_forecast.png")
plt.show()
# Component plot
fig2 = model_p.plot_components(forecast_p)
plt.savefig("prophet_components.png")
plt.show()
ARIMA vs Prophet Comparison
| Aspect | ARIMA | Prophet |
|---|---|---|
| Parameter selection | Manual (ACF/PACF) or Auto-ARIMA | Automatic — minimal configuration |
| Handles seasonality | SARIMA variant required | Built-in, multiple seasonalities |
| Handles holidays | Manual adjustment required | Built-in holiday support |
| Missing values | Must be imputed before modelling | Handles automatically |
| Interpretability | Strong (statistical foundation) | Moderate (component plots) |
| Best for | Stationary series, short-term forecasts | Business data, long-term forecasts |
Forecasting Evaluation Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| MAE (Mean Absolute Error) | Mean of \|actual − forecast\| | Average error in original units (₹, units, etc.) |
| RMSE (Root Mean Squared Error) | √(Mean of (actual − forecast)²) | Penalises large errors more than MAE |
| MAPE (Mean Absolute Percentage Error) | Mean of \|actual − forecast\| ÷ actual × 100 | Error as a percentage, comparable across scales; undefined when actual = 0 |
# Calculate all three metrics
actual = test.values
predicted = forecast.values
mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
print("Forecast Evaluation Metrics:")
print(f" MAE : {mae:.2f} units")
print(f" RMSE : {rmse:.2f} units")
print(f" MAPE : {mape:.2f}%")
Time Series Best Practices
- Always plot the data first – Visual inspection reveals trends, seasonality, and outliers before any modelling
- Test for stationarity – Run the ADF test and difference the series if needed before applying ARIMA
- Split chronologically – Never shuffle time series data; always split by time (first 80% = train, last 20% = test)
- Match the forecasting horizon – A model trained on monthly data forecasts months; it cannot reliably forecast individual days
- Validate with walk-forward testing – Re-train the model as each new observation arrives to simulate real forecasting conditions
- Inspect residuals – Residuals from a good model should look like white noise (no pattern remaining)
Summary
- A time series is data indexed in time order — the order of observations carries critical information
- Every time series contains four components: trend, seasonality, cyclicality, and noise
- Decomposition separates these components for individual analysis and pattern detection
- Stationarity is required for ARIMA — use the ADF test and differencing to achieve it
- ARIMA(p,d,q) models past values (AR), differencing (I), and past errors (MA) to forecast future values
- ACF and PACF plots guide the selection of p and q parameters for ARIMA
- Auto-ARIMA automates parameter selection; Prophet handles business seasonality and holidays automatically
- Evaluate forecasts using MAE, RMSE, and MAPE — always on a hold-out test period that the model never saw
