DS Time Series Analysis
Time Series Analysis deals with data collected at regular intervals over time — daily stock prices, monthly sales figures, hourly temperature readings, or annual rainfall totals. Unlike cross-sectional data, time series data has a natural order that carries crucial information. This topic covers time series components, decomposition, stationarity, ARIMA modelling, and forecasting using Python.
What Is a Time Series
A time series is a sequence of data points indexed in time order. The order matters — removing or rearranging data points changes the meaning. The goal of time series analysis is to understand patterns in past data and use them to forecast future values.
Real-World Time Series Examples
| Domain | Time Series | Frequency |
|---|---|---|
| Finance | Stock price, exchange rate | Minute / Daily |
| Retail | Product sales, website traffic | Daily / Weekly |
| Energy | Electricity consumption, solar generation | Hourly |
| Weather | Temperature, rainfall, humidity | Hourly / Daily |
| Healthcare | ICU patient heart rate, disease case counts | Seconds / Daily |
Components of a Time Series
A time series is commonly modelled as the combination of four underlying components that together explain the observed data.
Diagram – Four Components of Time Series
Observed Data = Trend + Seasonality + Cyclicality + Noise
1. Trend: Long-term upward or downward movement
╱─────────────────────────────────╱
╱ (steady climb)
2. Seasonality: Regular pattern that repeats at a fixed interval
∧ ∧ ∧ ∧ ∧ ∧ ∧
╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲
╱ V V V V V V V
(sales spike every December)
3. Cyclicality: Long, irregular waves – NOT fixed period
╭──────╮ ╭──────╮
──╯ ╰────────╯ ╰──
(business cycles – recession / growth)
4. Noise: Random fluctuations that cannot be explained
↑ ↓ ↑ ↓ ↑ ↓ ↑ ↓ (random jitter)
Additive Model: Y = Trend + Season + Cycle + Noise
Multiplicative Model: Y = Trend × Season × Cycle × Noise
(Use multiplicative when seasonal amplitude grows with trend)
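The choice between the two models can be checked empirically: on the log scale a multiplicative series becomes additive, because log(T × S × N) = log T + log S + log N. A minimal sketch with synthetic data (all numbers below are illustrative, not from the sales example):

```python
import numpy as np

# Hypothetical multiplicative series: the seasonal swing grows with the level
rng = np.random.default_rng(0)
n = 120
trend = np.linspace(100, 300, n)
season = 1 + 0.2 * np.sin(2 * np.pi * np.arange(n) / 12)
noise = rng.normal(1.0, 0.02, n)
y = trend * season * noise                     # Y = T x S x N

# log(Y) = log(T) + log(S) + log(N): additive on the log scale
log_y = np.log(y)

# Peak-to-trough swing over the first and last 12-point seasonal cycle
raw_early = y[:12].max() - y[:12].min()
raw_late = y[-12:].max() - y[-12:].min()
log_early = log_y[:12].max() - log_y[:12].min()
log_late = log_y[-12:].max() - log_y[-12:].min()

print(f"raw swing : early {raw_early:.1f} -> late {raw_late:.1f} (grows with trend)")
print(f"log swing : early {log_early:.2f} -> late {log_late:.2f} (roughly stable)")
```

If the seasonal swing is stable after a log transform, fitting an additive model to the logged series is equivalent to fitting a multiplicative model to the original.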
Setting Up – Creating a Time Series in Pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate 3 years of daily sales data
np.random.seed(42)
n = 365 * 3 # 1095 days
date_range = pd.date_range(start="2021-01-01", periods=n, freq="D")
# Trend: gradually increasing sales
trend = np.linspace(200, 400, n)
# Seasonality: weekly and annual cycles
weekly_season = 30 * np.sin(2 * np.pi * np.arange(n) / 7)
annual_season = 80 * np.sin(2 * np.pi * np.arange(n) / 365 - np.pi/2)
# Noise: random fluctuations
noise = np.random.normal(0, 20, n)
# Combine all components
sales = trend + weekly_season + annual_season + noise
# Create a Pandas Series with datetime index
ts = pd.Series(sales, index=date_range, name="Daily_Sales")
print("Time Series Info:")
print(f" Start : {ts.index[0].date()}")
print(f" End : {ts.index[-1].date()}")
print(f" Points: {len(ts)}")
print(f" Mean : {ts.mean():.2f}")
print(f" Std : {ts.std():.2f}")
print("\nFirst 5 observations:")
print(ts.head())
Visualising the Time Series
plt.figure(figsize=(12, 5))
ts.plot(color="steelblue", linewidth=0.8)
plt.title("Daily Sales – 3 Years (2021–2023)")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.tight_layout()
plt.savefig("time_series_raw.png")
plt.show()
Resampling – Aggregating to Different Time Frequencies
Resampling converts data from one time frequency to another. Downsampling reduces frequency (daily → monthly) by aggregating, while upsampling increases it (daily → hourly) and creates gaps that must be filled.
# Monthly average sales ("M" = month-end; pandas >= 2.2 prefers the alias "ME")
monthly_avg = ts.resample("M").mean()
# Weekly total sales
weekly_total = ts.resample("W").sum()
# Quarterly maximum
quarterly_max = ts.resample("Q").max()
fig, axes = plt.subplots(3, 1, figsize=(12, 10))
ts.plot(ax=axes[0], color="steelblue", linewidth=0.6, title="Daily Sales")
monthly_avg.plot(ax=axes[1], color="tomato", marker="o", title="Monthly Average Sales")
quarterly_max.plot(ax=axes[2], color="green", marker="s", title="Quarterly Peak Sales")
for ax in axes:
ax.set_xlabel("Date")
ax.set_ylabel("Sales")
plt.tight_layout()
plt.savefig("resampled_ts.png")
plt.show()
Rolling Statistics – Smoothing the Series
Rolling statistics compute a metric (mean, standard deviation) over a moving window of the most recent N data points. A 30-day rolling mean smooths out daily noise and makes the underlying trend visible.
# 30-day rolling mean and standard deviation
rolling_mean = ts.rolling(window=30).mean()
rolling_std = ts.rolling(window=30).std()
plt.figure(figsize=(12, 5))
plt.plot(ts.index, ts, color="steelblue", alpha=0.4, linewidth=0.8, label="Daily Sales")
plt.plot(ts.index, rolling_mean, color="tomato", linewidth=2, label="30-Day Rolling Mean")
plt.fill_between(ts.index,
rolling_mean - 2*rolling_std,
rolling_mean + 2*rolling_std,
color="tomato", alpha=0.15, label="±2 Std Dev Band")
plt.title("Daily Sales with 30-Day Rolling Mean")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("rolling_stats.png")
plt.show()
Time Series Decomposition
Decomposition separates a time series into trend, seasonal, and residual (noise) components. Note that classical decomposition does not return cyclicality as a separate term; slow cycles get absorbed into the estimated trend. Visualising each component separately makes patterns and anomalies easier to detect and understand.
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose into components (additive model, period=365 days = 1 year)
decomposition = seasonal_decompose(ts, model="additive", period=365)
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
decomposition.observed.plot(ax=axes[0], title="Observed (Original)")
decomposition.trend.plot(ax=axes[1], title="Trend")
decomposition.seasonal.plot(ax=axes[2], title="Seasonality")
decomposition.resid.plot(ax=axes[3], title="Residuals (Noise)")
for ax in axes:
ax.set_xlabel("")
plt.suptitle("Time Series Decomposition", fontsize=14)
plt.tight_layout()
plt.savefig("ts_decomposition.png")
plt.show()
Diagram – What Decomposition Reveals
After decomposition:
Observed: ╭╮╭╮╭╮╭╮╭╮╭╮ (noisy, hard to read)
Trend: ────────────╱ (clear upward movement)
Seasonality: ╭╮╭╮╭╮╭╮╭╮╭╮ (pure repeating cycle)
Residuals: ↑↓↑↓↑↓↑↓↑↓↑↓ (only noise remains)
(should look random — if a pattern remains,
the model is missing something)
Stationarity – A Required Property for ARIMA
A stationary time series has a constant mean, constant variance, and autocorrelation that depends only on the lag between observations, not on time itself, so it shows no trend and no seasonal pattern. Most time series models, especially ARIMA, require stationarity. A non-stationary series must be transformed before modelling.
Diagram – Stationary vs Non-Stationary
Non-Stationary (Trend + Seasonality):
╱─────────────────────────── (mean changes over time)
╱ ↗ ↗ ↗
╱ ↗ ↗
╱
Stationary (no trend, constant variance):
─────────────────────────────────── (constant mean)
↑ ↓ ↑ ↑ ↓ ↑ ↓ ↓ ↑ ↓ ↑ ↓ ↑
(fluctuates around constant mean)
Augmented Dickey-Fuller (ADF) Test
The ADF test checks for a unit root: its null hypothesis is that the series is non-stationary. A p-value below 0.05 rejects that null, so the series can be treated as stationary. A p-value at or above 0.05 means differencing or another transformation is required.
from statsmodels.tsa.stattools import adfuller
def adf_test(series, name="Series"):
result = adfuller(series.dropna())
print(f"\nADF Test – {name}")
print(f" ADF Statistic : {result[0]:.4f}")
print(f" P-value : {result[1]:.4f}")
print(f" Critical (5%) : {result[4]['5%']:.4f}")
if result[1] < 0.05:
print(" Result: STATIONARY (p < 0.05)")
else:
print(" Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed")
# Test original series
adf_test(ts, "Original Sales")
# Apply first-order differencing
ts_diff = ts.diff().dropna()
adf_test(ts_diff, "First-Differenced Sales")
Output:
ADF Test – Original Sales
  ADF Statistic : -1.8321
  P-value       : 0.3614
  Critical (5%) : -2.8645
  Result: NON-STATIONARY (p ≥ 0.05) → Differencing needed

ADF Test – First-Differenced Sales
  ADF Statistic : -16.4823
  P-value       : 0.0000
  Critical (5%) : -2.8645
  Result: STATIONARY (p < 0.05)
ARIMA Modelling – Forecast the Future
ARIMA (AutoRegressive Integrated Moving Average) is among the most widely used statistical models for time series forecasting. It captures autocorrelations, the relationship between a value and its own past values, to project the series forward.
ARIMA Parameters Explained
ARIMA(p, d, q)
p = AutoRegressive order
→ How many past values (lags) to use as predictors
→ AR(1): y_t = φ × y_(t-1) + noise
→ Larger p: more past values feed into each prediction
d = Integrated order (Differencing)
→ How many times to difference the series to make it stationary
→ d=0: series already stationary
→ d=1: apply first-order differencing once
→ d=2: difference the already-differenced series
q = Moving Average order
→ How many past forecast errors to include
→ MA(1): y_t = θ × error_(t-1) + noise
→ Larger q: more past forecast errors feed into each prediction
Example: ARIMA(1, 1, 1)
→ Use 1 past value + 1 difference + 1 past error
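The p and d parts above can be made concrete with a short simulation; the φ value, seed, and series length below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
eps = rng.normal(0, 1, n)

# The "p" part: an AR(1) process, y_t = phi * y_(t-1) + noise
phi = 0.7
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# For AR(1), the lag-1 autocorrelation should land near phi
lag1 = np.corrcoef(ar[:-1], ar[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f} (theory: {phi})")

# The "d" part: first differencing turns a trending series into a flat one
trending = np.linspace(0, 50, n) + eps      # non-stationary (mean drifts up)
diffed = np.diff(trending)                  # mean is now roughly constant
print(f"mean, first vs second half, before : "
      f"{trending[:250].mean():.1f} vs {trending[250:].mean():.1f}")
print(f"mean, first vs second half, after  : "
      f"{diffed[:250].mean():.2f} vs {diffed[250:].mean():.2f}")
```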
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Use monthly data for cleaner modelling
monthly = ts.resample("M").mean()
# Split: train on first 80% of months
split = int(len(monthly) * 0.8)
train = monthly[:split]
test = monthly[split:]
print(f"Training months : {len(train)}")
print(f"Testing months : {len(test)}")
# Fit ARIMA(1,1,1)
model = ARIMA(train, order=(1, 1, 1))
result = model.fit()
print(result.summary())
# Forecast over test period
forecast = result.forecast(steps=len(test))
# Evaluate
mae = mean_absolute_error(test, forecast)
rmse = np.sqrt(mean_squared_error(test, forecast))
print(f"\nForecast Evaluation:")
print(f" MAE : {mae:.2f}")
print(f" RMSE : {rmse:.2f}")
Visualising ARIMA Forecasts
plt.figure(figsize=(12, 5))
train.plot(label="Training Data", color="steelblue")
test.plot(label="Actual (Test)", color="green")
forecast.plot(label="ARIMA Forecast", color="tomato", linestyle="--")
plt.title("ARIMA Forecast vs Actual Monthly Sales")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.tight_layout()
plt.savefig("arima_forecast.png")
plt.show()
Choosing ARIMA Parameters – ACF and PACF Plots
ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots help select the p and q parameters for ARIMA models by showing how correlated the series is with its own past values at each lag.
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(train.diff().dropna(), lags=20, ax=axes[0], title="ACF (Autocorrelation)\n→ Helps choose q")
plot_pacf(train.diff().dropna(), lags=20, ax=axes[1], title="PACF (Partial Autocorrelation)\n→ Helps choose p")
plt.tight_layout()
plt.savefig("acf_pacf.png")
plt.show()
Reading ACF and PACF
- ACF plot (choose q): if the ACF cuts off sharply after lag k → use q = k for the MA term
- PACF plot (choose p): if the PACF cuts off sharply after lag k → use p = k for the AR term
- Blue shaded region = 95% confidence interval
  - Bars INSIDE the band → not statistically significant
  - Bars OUTSIDE the band → significant lag; worth including in the model
- Example:
  - PACF has a significant spike at lag 1, then drops into the band → AR(1) is appropriate → p = 1
  - ACF has a significant spike at lag 1, then drops → MA(1) is appropriate → q = 1
  - Combined with one round of differencing → ARIMA(1, 1, 1)
Auto-ARIMA – Automatic Parameter Selection
# Install: pip install pmdarima
import pmdarima as pm
auto_model = pm.auto_arima(
train,
seasonal=True,
m=12, # Monthly seasonality (12 months per cycle)
stepwise=True,
trace=True, # Print all models tested
error_action="ignore",
suppress_warnings=True
)
print("\nBest Model:", auto_model)
print("Best Parameters (p,d,q):", auto_model.order)
# Forecast
auto_forecast = auto_model.predict(n_periods=len(test))
auto_mae = mean_absolute_error(test, auto_forecast)
print(f"Auto-ARIMA MAE: {auto_mae:.2f}")
Forecasting with Prophet (Facebook)
Prophet is a forecasting library from Meta (Facebook) designed for business time series with strong seasonality and occasional holidays. It requires minimal parameter tuning and handles missing values gracefully.
# Install: pip install prophet
from prophet import Prophet
# Prophet requires columns named 'ds' (date) and 'y' (value)
df_prophet = pd.DataFrame({
"ds": monthly.index,
"y": monthly.values
}).reset_index(drop=True)
# Split
train_p = df_prophet[:-len(test)]
test_p = df_prophet[-len(test):]
# Fit
model_p = Prophet(
yearly_seasonality=True,
weekly_seasonality=False,
daily_seasonality=False,
seasonality_mode="additive"
)
model_p.fit(train_p)
# Forecast
future = model_p.make_future_dataframe(periods=len(test), freq="M")
forecast_p = model_p.predict(future)
# Plot
fig1 = model_p.plot(forecast_p)
plt.title("Prophet Forecast")
plt.savefig("prophet_forecast.png")
plt.show()
# Component plot
fig2 = model_p.plot_components(forecast_p)
plt.savefig("prophet_components.png")
plt.show()
ARIMA vs Prophet Comparison
| Aspect | ARIMA | Prophet |
|---|---|---|
| Parameter selection | Manual (ACF/PACF) or Auto-ARIMA | Automatic — minimal configuration |
| Handles seasonality | SARIMA variant required | Built-in, multiple seasonalities |
| Handles holidays | Manual adjustment required | Built-in holiday support |
| Missing values | Must be imputed before modelling | Handles automatically |
| Interpretability | Strong (statistical foundation) | Moderate (component plots) |
| Best for | Stationary series, short-term forecasts | Business data, long-term forecasts |
Forecasting Evaluation Metrics
| Metric | Formula | Interpretation |
|---|---|---|
| MAE (Mean Absolute Error) | Mean of \|actual − forecast\| | Average error in original units (₹, units, etc.) |
| RMSE (Root Mean Squared Error) | √(Mean of (actual − forecast)²) | Penalises large errors more than MAE |
| MAPE (Mean Absolute Percentage Error) | Mean of \|actual − forecast\| ÷ actual × 100 | Error as a percentage, comparable across scales; undefined when actual = 0 |
# Calculate all three metrics
actual = test.values
predicted = forecast.values
mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
print("Forecast Evaluation Metrics:")
print(f" MAE : {mae:.2f} units")
print(f" RMSE : {rmse:.2f} units")
print(f" MAPE : {mape:.2f}%")
Time Series Best Practices
- Always plot the data first – Visual inspection reveals trends, seasonality, and outliers before any modelling
- Test for stationarity – Run the ADF test and difference the series if needed before applying ARIMA
- Split chronologically – Never shuffle time series data; always split by time (first 80% = train, last 20% = test)
- Match the forecasting horizon – A model trained on monthly data forecasts months; it cannot reliably forecast individual days
- Validate with walk-forward testing – Re-train the model as each new observation arrives to simulate real forecasting conditions
- Inspect residuals – Residuals from a good model should look like white noise (no pattern remaining)
Summary
- A time series is data indexed in time order — the order of observations carries critical information
- Every time series contains four components: trend, seasonality, cyclicality, and noise
- Decomposition separates these components for individual analysis and pattern detection
- Stationarity is required for ARIMA — use the ADF test and differencing to achieve it
- ARIMA(p,d,q) models past values (AR), differencing (I), and past errors (MA) to forecast future values
- ACF and PACF plots guide the selection of p and q parameters for ARIMA
- Auto-ARIMA automates parameter selection; Prophet handles business seasonality and holidays automatically
- Evaluate forecasts using MAE, RMSE, and MAPE — always on a hold-out test period that the model never saw
