DS Time Series Analysis

Time Series Analysis deals with data collected at regular intervals over time — daily stock prices, monthly sales figures, hourly temperature readings, or annual rainfall totals. Unlike cross-sectional data, time series data has a natural order that carries crucial information. This topic covers time series components, decomposition, stationarity, ARIMA modelling, and forecasting using Python.

What Is a Time Series

A time series is a sequence of data points indexed in time order. The order matters — removing or rearranging data points changes the meaning. The goal of time series analysis is to understand patterns in past data and use them to forecast future values.

Real-World Time Series Examples

Domain       Time Series                                    Frequency
Finance      Stock price, exchange rate                     Minute / Daily
Retail       Product sales, website traffic                 Daily / Weekly
Energy       Electricity consumption, solar generation      Hourly
Weather      Temperature, rainfall, humidity                Hourly / Daily
Healthcare   ICU patient heart rate, disease case counts    Seconds / Daily

Components of a Time Series

Every time series is composed of four underlying components that together explain the observed data.

Diagram – Four Components of Time Series

Observed Data = Trend + Seasonality + Cyclicality + Noise

1. Trend:          Long-term upward or downward movement
                   ╱─────────────────────────────────╱
                  ╱                                   (steady climb)

2. Seasonality:    Regular pattern that repeats at a fixed interval
                   ∧   ∧   ∧   ∧   ∧   ∧   ∧
                  ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲
                 ╱   V   V   V   V   V   V   V
                  (sales spike every December)

3. Cyclicality:    Long, irregular waves – NOT fixed period
                   ╭──────╮         ╭──────╮
                 ──╯       ╰────────╯       ╰──
                   (business cycles – recession / growth)

4. Noise:          Random fluctuations that cannot be explained
                   ↑ ↓ ↑ ↓ ↑ ↓ ↑ ↓ (random jitter)

Additive Model:     Y = Trend + Season + Cycle + Noise
Multiplicative Model: Y = Trend × Season × Cycle × Noise
(Use multiplicative when seasonal amplitude grows with trend)
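
The difference between the two models is easiest to see with synthetic numbers. Below is a minimal sketch (all values invented purely for illustration): in the additive series the seasonal swing stays constant, while in the multiplicative series it grows with the level.

import numpy as np

t      = np.arange(730)                         # two years of daily points
trend  = np.linspace(100, 300, t.size)          # rising level
season = np.sin(2 * np.pi * t / 365)            # one annual cycle in [-1, 1]

additive       = trend + 30 * season            # seasonal swing stays around ±30 throughout
multiplicative = trend * (1 + 0.3 * season)     # seasonal swing grows with the level (±30 → ±90)

# Seasonal deviation from the trend in the first vs the last half of the series
for name, series in [("Additive", additive), ("Multiplicative", multiplicative)]:
    dev = np.abs(series - trend)
    print(f"{name:<14} deviation: {dev[:365].max():.0f} → {dev[365:].max():.0f}")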

Setting Up – Creating a Time Series in Pandas

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate 3 years of daily sales data
np.random.seed(42)
n = 365 * 3   # 1095 days

date_range = pd.date_range(start="2021-01-01", periods=n, freq="D")

# Trend: gradually increasing sales
trend = np.linspace(200, 400, n)

# Seasonality: weekly and annual cycles
weekly_season   = 30 * np.sin(2 * np.pi * np.arange(n) / 7)
annual_season   = 80 * np.sin(2 * np.pi * np.arange(n) / 365 - np.pi/2)

# Noise: random fluctuations
noise = np.random.normal(0, 20, n)

# Combine all components
sales = trend + weekly_season + annual_season + noise

# Create a Pandas Series with datetime index
ts = pd.Series(sales, index=date_range, name="Daily_Sales")

print("Time Series Info:")
print(f"  Start : {ts.index[0].date()}")
print(f"  End   : {ts.index[-1].date()}")
print(f"  Points: {len(ts)}")
print(f"  Mean  : {ts.mean():.2f}")
print(f"  Std   : {ts.std():.2f}")
print("\nFirst 5 observations:")
print(ts.head())

Visualising the Time Series

plt.figure(figsize=(12, 5))
ts.plot(color="steelblue", linewidth=0.8)
plt.title("Daily Sales – 3 Years (2021–2023)")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.tight_layout()
plt.savefig("time_series_raw.png")
plt.show()

Resampling – Aggregating to Different Time Frequencies

Resampling converts data from one time frequency to another. Downsampling reduces frequency (daily → monthly), while upsampling increases it (monthly → daily) and requires a fill or interpolation rule for the newly created time points.

# Monthly average sales
monthly_avg = ts.resample("M").mean()

# Weekly total sales
weekly_total = ts.resample("W").sum()

# Quarterly maximum
quarterly_max = ts.resample("Q").max()

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

ts.plot(ax=axes[0], color="steelblue", linewidth=0.6, title="Daily Sales")
monthly_avg.plot(ax=axes[1], color="tomato", marker="o", title="Monthly Average Sales")
quarterly_max.plot(ax=axes[2], color="green", marker="s", title="Quarterly Peak Sales")

for ax in axes:
    ax.set_xlabel("Date")
    ax.set_ylabel("Sales")

plt.tight_layout()
plt.savefig("resampled_ts.png")
plt.show()

Rolling Statistics – Smoothing the Series

Rolling statistics compute a metric (mean, standard deviation) over a moving window of the most recent N data points. A 30-day rolling mean smooths out daily noise and makes the underlying trend visible.

# 30-day rolling mean and standard deviation
rolling_mean = ts.rolling(window=30).mean()
rolling_std  = ts.rolling(window=30).std()

plt.figure(figsize=(12, 5))
plt.plot(ts.index, ts,           color="steelblue", alpha=0.4, linewidth=0.8, label="Daily Sales")
plt.plot(ts.index, rolling_mean, color="tomato",    linewidth=2,              label="30-Day Rolling Mean")
plt.fill_between(ts.index,
                 rolling_mean - 2*rolling_std,
                 rolling_mean + 2*rolling_std,
                 color="tomato", alpha=0.15, label="±2 Std Dev Band")

plt.title("Daily Sales with 30-Day Rolling Mean")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("rolling_stats.png")
plt.show()

Time Series Decomposition

Decomposition separates a time series into its underlying components. statsmodels' seasonal_decompose extracts trend, seasonality, and residuals (noise); any longer cyclical movement is absorbed into the trend and residual terms. Visualising each component separately makes patterns and anomalies easier to detect and understand.

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose into components (additive model, period=365 days = 1 year)
decomposition = seasonal_decompose(ts, model="additive", period=365)

fig, axes = plt.subplots(4, 1, figsize=(12, 10))

decomposition.observed.plot(ax=axes[0], title="Observed (Original)")
decomposition.trend.plot(ax=axes[1], title="Trend")
decomposition.seasonal.plot(ax=axes[2], title="Seasonality")
decomposition.resid.plot(ax=axes[3], title="Residuals (Noise)")

for ax in axes:
    ax.set_xlabel("")

plt.suptitle("Time Series Decomposition", fontsize=14)
plt.tight_layout()
plt.savefig("ts_decomposition.png")
plt.show()

Diagram – What Decomposition Reveals

After decomposition:

Observed:    ╭╮╭╮╭╮╭╮╭╮╭╮ (noisy, hard to read)

Trend:       ────────────╱  (clear upward movement)

Seasonality: ╭╮╭╮╭╮╭╮╭╮╭╮  (pure repeating cycle)

Residuals:   ↑↓↑↓↑↓↑↓↑↓↑↓  (only noise remains)
             (should look random — if a pattern remains,
              the model is missing something)
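
One way to make the "should look random" check concrete is a Ljung-Box test on the residuals. A brief sketch, assuming the decomposition object from the code above is still in scope:

from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box tests whether the residual autocorrelations are jointly zero.
# Large p-values are consistent with white noise; small p-values suggest
# the decomposition has missed some structure.
resid = decomposition.resid.dropna()
lb = acorr_ljungbox(resid, lags=[10, 20], return_df=True)
print(lb)    # inspect the lb_pvalue column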

Stationarity – A Required Property for ARIMA

A stationary time series has constant mean, constant variance, and no seasonal pattern over time. Most time series models — especially ARIMA — require stationarity. A non-stationary series must be transformed before modelling.

Diagram – Stationary vs Non-Stationary

Non-Stationary (Trend + Seasonality):
     ╱─────────────────────────── (mean changes over time)
    ╱     ↗   ↗   ↗
   ╱  ↗       ↗
  ╱

Stationary (no trend, constant variance):
─────────────────────────────────── (constant mean)
  ↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↑ ↓ ↑
(fluctuates around constant mean)

Augmented Dickey-Fuller (ADF) Test

The ADF test checks for a unit root. Its null hypothesis is that the series is non-stationary, so a p-value below 0.05 rejects the null and indicates stationarity. A p-value of 0.05 or above means differencing or another transformation is required before modelling.

from statsmodels.tsa.stattools import adfuller

def adf_test(series, name="Series"):
    result = adfuller(series.dropna())
    print(f"\nADF Test – {name}")
    print(f"  ADF Statistic : {result[0]:.4f}")
    print(f"  P-value       : {result[1]:.4f}")
    print(f"  Critical (5%) : {result[4]['5%']:.4f}")
    if result[1] < 0.05:
        print("  Result: STATIONARY (p < 0.05)")
    else:
        print("  Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed")

# Test original series
adf_test(ts, "Original Sales")

# Apply first-order differencing
ts_diff = ts.diff().dropna()
adf_test(ts_diff, "First-Differenced Sales")

Output:

ADF Test – Original Sales
  ADF Statistic : -1.8321
  P-value       : 0.3614
  Critical (5%) : -2.8645
  Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed

ADF Test – First-Differenced Sales
  ADF Statistic : -16.4823
  P-value       : 0.0000
  Critical (5%) : -2.8645
  Result: STATIONARY (p < 0.05)
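
First differencing handles the trend. When strong seasonality remains after differencing, or the variance grows with the level, two further transformations are common; here is a short sketch reusing ts and the adf_test helper defined above:

# Log transform: stabilises variance when the seasonal swing grows with the level
ts_log = np.log(ts)

# Seasonal differencing: subtract the value one full season (365 days) earlier
ts_seasonal_diff = ts.diff(365).dropna()
adf_test(ts_seasonal_diff, "Seasonally Differenced Sales")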

ARIMA Modelling – Forecast the Future

ARIMA (AutoRegressive Integrated Moving Average) is one of the most widely used statistical models for time series forecasting. It captures autocorrelations — the relationship between a value and its own past values — to project the series forward.

ARIMA Parameters Explained

ARIMA(p, d, q)

p = AutoRegressive order
    → How many past values (lags) to use as predictors
    → AR(1): y_t = φ × y_(t-1) + noise
    → Higher p: the model looks further back into the past

d = Integrated order (Differencing)
    → How many times to difference the series to make it stationary
    → d=0: series already stationary
    → d=1: apply first-order differencing once
    → d=2: difference the already-differenced series

q = Moving Average order
    → How many past forecast errors to include
    → MA(1): y_t = θ × error_(t-1) + noise
    → Higher q: more past errors feed into each forecast

Example: ARIMA(1, 1, 1)
→ Use 1 past value + 1 difference + 1 past error

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Use monthly data for cleaner modelling
monthly = ts.resample("M").mean()

# Split: train on first 80% of months
split   = int(len(monthly) * 0.8)
train   = monthly[:split]
test    = monthly[split:]

print(f"Training months : {len(train)}")
print(f"Testing months  : {len(test)}")

# Fit ARIMA(1,1,1)
model = ARIMA(train, order=(1, 1, 1))
result = model.fit()

print(result.summary())

# Forecast over test period
forecast = result.forecast(steps=len(test))

# Evaluate
mae  = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))

print(f"\nForecast Evaluation:")
print(f"  MAE  : {mae:.2f}")
print(f"  RMSE : {rmse:.2f}")

Visualising ARIMA Forecasts

plt.figure(figsize=(12, 5))
train.plot(label="Training Data", color="steelblue")
test.plot(label="Actual (Test)", color="green")
forecast.plot(label="ARIMA Forecast", color="tomato", linestyle="--")

plt.title("ARIMA Forecast vs Actual Monthly Sales")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("arima_forecast.png")
plt.show()

Choosing ARIMA Parameters – ACF and PACF Plots

ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots help select the p and q parameters for ARIMA models by showing how correlated the series is with its own past values at each lag.

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# With only ~28 monthly training points, keep lags below half the sample size
plot_acf(train.diff().dropna(), lags=12, ax=axes[0], title="ACF (Autocorrelation)\n→ Helps choose q")
plot_pacf(train.diff().dropna(), lags=12, ax=axes[1], title="PACF (Partial Autocorrelation)\n→ Helps choose p")

plt.tight_layout()
plt.savefig("acf_pacf.png")
plt.show()

Reading ACF and PACF

ACF Plot (choose q):
If the ACF cuts off sharply after lag k → use q = k for MA term

PACF Plot (choose p):
If the PACF cuts off sharply after lag k → use p = k for AR term

Blue shaded region = 95% confidence interval
Bars INSIDE the band → Not statistically significant
Bars OUTSIDE the band → Significant lag, should be included in model

Example:
PACF has a significant spike at lag 1, then drops into the band
→ AR(1) is appropriate → p = 1
ACF has a significant spike at lag 1, then drops
→ MA(1) is appropriate → q = 1
Combined → ARIMA(1, 1, 1)
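
The same rule can be applied numerically. Below is a rough sketch using statsmodels' acf and pacf functions to list which lags fall outside the 95% band; the plots remain the primary tool, this just automates the reading:

from statsmodels.tsa.stattools import acf, pacf

diffed = train.diff().dropna()
nlags  = 12                                   # at most ~half the sample size for PACF

acf_vals,  acf_ci  = acf(diffed,  nlags=nlags, alpha=0.05)
pacf_vals, pacf_ci = pacf(diffed, nlags=nlags, alpha=0.05)

# A lag is significant when zero lies outside its confidence interval
sig_q = [k for k in range(1, nlags + 1) if not (acf_ci[k][0]  <= 0 <= acf_ci[k][1])]
sig_p = [k for k in range(1, nlags + 1) if not (pacf_ci[k][0] <= 0 <= pacf_ci[k][1])]

print("Significant ACF lags  (q candidates):", sig_q)
print("Significant PACF lags (p candidates):", sig_p)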

Auto-ARIMA – Automatic Parameter Selection

# Install: pip install pmdarima
import pmdarima as pm

auto_model = pm.auto_arima(
    train,
    seasonal=True,
    m=12,                  # Monthly seasonality (12 months per cycle)
    stepwise=True,
    trace=True,            # Print all models tested
    error_action="ignore",
    suppress_warnings=True
)

print("\nBest Model:", auto_model)
print("Best Parameters (p,d,q):", auto_model.order)

# Forecast
auto_forecast = auto_model.predict(n_periods=len(test))
auto_mae = mean_absolute_error(test, auto_forecast)
print(f"Auto-ARIMA MAE: {auto_mae:.2f}")

Forecasting with Prophet (Facebook)

Prophet is a forecasting library from Meta (Facebook) designed for business time series with strong seasonality and occasional holidays. It requires minimal parameter tuning and handles missing values gracefully.

# Install: pip install prophet
from prophet import Prophet

# Prophet requires columns named 'ds' (date) and 'y' (value)
df_prophet = pd.DataFrame({
    "ds": monthly.index,
    "y":  monthly.values
}).reset_index(drop=True)

# Split
train_p = df_prophet[:-len(test)]
test_p  = df_prophet[-len(test):]

# Fit
model_p = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=False,
    daily_seasonality=False,
    seasonality_mode="additive"
)
model_p.fit(train_p)

# Forecast
future    = model_p.make_future_dataframe(periods=len(test), freq="M")
forecast_p = model_p.predict(future)

# Plot
fig1 = model_p.plot(forecast_p)
plt.title("Prophet Forecast")
plt.savefig("prophet_forecast.png")
plt.show()

# Component plot
fig2 = model_p.plot_components(forecast_p)
plt.savefig("prophet_components.png")
plt.show()

ARIMA vs Prophet Comparison

Aspect                 ARIMA                                       Prophet
Parameter selection    Manual (ACF/PACF) or Auto-ARIMA             Automatic — minimal configuration
Handles seasonality    SARIMA variant required                     Built-in, multiple seasonalities
Handles holidays       Manual adjustment required                  Built-in holiday support
Missing values         Must be imputed before modelling            Handled automatically
Interpretability       Strong (statistical foundation)             Moderate (component plots)
Best for               Stationary series, short-term forecasts     Business data, long-term forecasts
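
A quick way to compare the two on this dataset is to score both forecasts against the same hold-out months. A sketch, assuming test, forecast (ARIMA) and forecast_p (Prophet) from the earlier blocks are still in scope:

# Prophet's prediction frame covers training + future dates;
# the last len(test) rows are the forecast horizon
prophet_pred = forecast_p["yhat"].tail(len(test)).values

arima_mae   = mean_absolute_error(test.values, forecast.values)
prophet_mae = mean_absolute_error(test.values, prophet_pred)

print(f"ARIMA   MAE: {arima_mae:.2f}")
print(f"Prophet MAE: {prophet_mae:.2f}")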

Forecasting Evaluation Metrics

Metric                                  Formula                                        Interpretation
MAE (Mean Absolute Error)               Mean of |actual − forecast|                    Average error in original units (₹, units, etc.)
RMSE (Root Mean Squared Error)          √(Mean of (actual − forecast)²)                Penalises large errors more than MAE
MAPE (Mean Absolute Percentage Error)   Mean of |actual − forecast| / actual × 100     Error as a percentage — easy to interpret across scales

# Calculate all three metrics
actual = test.values
predicted = forecast.values

mae  = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

print("Forecast Evaluation Metrics:")
print(f"  MAE  : {mae:.2f} units")
print(f"  RMSE : {rmse:.2f} units")
print(f"  MAPE : {mape:.2f}%")

Time Series Best Practices

  • Always plot the data first – Visual inspection reveals trends, seasonality, and outliers before any modelling
  • Test for stationarity – Run the ADF test and difference the series if needed before applying ARIMA
  • Split chronologically – Never shuffle time series data; always split by time (first 80% = train, last 20% = test)
  • Match the forecasting horizon – A model trained on monthly data forecasts months; it cannot reliably forecast individual days
  • Validate with walk-forward testing – Re-train the model as each new observation arrives to simulate real forecasting conditions (see the sketch after this list)
  • Inspect residuals – Residuals from a good model should look like white noise (no pattern remaining)
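
A minimal walk-forward (expanding-window) sketch for the monthly ARIMA(1,1,1) model used earlier: at each step the model is re-fit on everything seen so far and asked for a one-step-ahead forecast.

from statsmodels.tsa.arima.model import ARIMA

history, abs_errors = list(train), []

for i in range(len(test)):
    step_model = ARIMA(history, order=(1, 1, 1)).fit()
    pred   = step_model.forecast(steps=1)[0]    # one-step-ahead forecast
    actual = test.iloc[i]
    abs_errors.append(abs(actual - pred))
    history.append(actual)                      # reveal the true value, then roll forward

print(f"Walk-forward MAE: {np.mean(abs_errors):.2f}")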

Summary

  • A time series is data indexed in time order — the order of observations carries critical information
  • Every time series contains four components: trend, seasonality, cyclicality, and noise
  • Decomposition separates these components for individual analysis and pattern detection
  • Stationarity is required for ARIMA — use the ADF test and differencing to achieve it
  • ARIMA(p,d,q) models past values (AR), differencing (I), and past errors (MA) to forecast future values
  • ACF and PACF plots guide the selection of p and q parameters for ARIMA
  • Auto-ARIMA automates parameter selection; Prophet handles business seasonality and holidays automatically
  • Evaluate forecasts using MAE, RMSE, and MAPE — always on a hold-out test period that the model never saw
