DS Data Visualization with Matplotlib

Data visualization converts numbers into visual stories. A chart can communicate a trend or pattern in seconds, while the same pattern is much harder to spot in a table of numbers. Python's Matplotlib and Seaborn libraries together cover most chart types needed in data science, from simple bar charts to complex statistical visualisations.

Matplotlib – The Foundation

Matplotlib is the foundational plotting library in Python. Several other libraries, including Seaborn and the plotting methods built into Pandas, build on top of it. Matplotlib gives fine-grained control over every element of a chart.

Anatomy of a Matplotlib Figure

+----------------------------------------------+
|                   Figure                     |
|  +----------------------------------------+ |
|  |                Axes (Plot Area)         | |
|  |   Title                                | |
|  |   Y-axis label |                       | |
|  |                |    ●                  | |
|  |                |  ●   ●                | |
|  |                | ●      ●              | |
|  |                +──────────────────     | |
|  |                     X-axis label       | |
|  +----------------------------------------+ |
|   Legend: ● Data Points                     |
+----------------------------------------------+

Figure  = the entire canvas (can hold multiple plots)
Axes    = one individual plot inside the figure
Axis    = the X or Y number line
Title   = descriptive name of the chart
Labels  = descriptions of X and Y axes
Legend  = key explaining colours / markers
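
The parts listed above map onto objects in Matplotlib's object-oriented interface. The following is a minimal sketch with made-up data points; the variable names fig and ax are common conventions rather than requirements.

```python
import matplotlib.pyplot as plt

# Illustrative values only (not from a real dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]

fig, ax = plt.subplots(figsize=(6, 4))    # Figure = canvas, Axes = one plot
ax.plot(x, y, "o", label="Data Points")   # markers only, as in the diagram
ax.set_title("Title")                     # Title
ax.set_xlabel("X-axis label")             # Labels
ax.set_ylabel("Y-axis label")
ax.legend()                               # Legend

# Each Axes owns two Axis objects: the X and Y number lines
print(type(fig).__name__, type(ax.xaxis).__name__, type(ax.yaxis).__name__)
```

Calling plt.show() or fig.savefig("anatomy.png") afterwards displays or saves the result. The plt.title() and plt.xlabel() calls used in the rest of this tutorial act on the current Axes and are equivalent to the ax.set_* methods shown here.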

Setting Up

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set a consistent style for all plots
plt.rcParams["figure.figsize"] = (9, 5)
plt.rcParams["axes.grid"] = True
sns.set_theme(style="whitegrid", palette="Set2")

Line Chart – Trends Over Time

Line charts show how a value changes over a continuous sequence — most commonly over time.

months  = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
revenue = [42, 47, 55, 52, 60, 58, 65, 70, 68, 74, 78, 83]
costs   = [30, 32, 36, 35, 40, 38, 42, 45, 44, 47, 49, 52]

plt.figure(figsize=(10, 5))
plt.plot(months, revenue, marker="o", label="Revenue", color="steelblue", linewidth=2)
plt.plot(months, costs,   marker="s", label="Costs",   color="tomato",    linewidth=2)

plt.title("Monthly Revenue vs Costs (₹ Lakhs)")
plt.xlabel("Month")
plt.ylabel("Amount (₹ Lakhs)")
plt.legend()
plt.tight_layout()
plt.savefig("line_chart.png")
plt.show()

When to Use a Line Chart

Use Line Chart When                        | Avoid Line Chart When
Data has a time component                  | Comparing unrelated categories
Showing trends, rises, or falls            | Data is not continuous
Comparing two series over the same period  | Too many overlapping lines (use small multiples)
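
"Small multiples" means giving each series its own small panel on a shared scale instead of stacking many lines in one plot. A minimal sketch, assuming four hypothetical regions with invented revenue figures:

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Hypothetical regional revenue, for illustration only
regions = {
    "North": [12, 14, 13, 16, 18, 17],
    "South": [10, 11, 13, 12, 15, 16],
    "East":  [8, 9, 9, 11, 10, 12],
    "West":  [14, 13, 15, 17, 16, 19],
}

# One small panel per series; sharey=True keeps the scales comparable
fig, axes = plt.subplots(1, len(regions), figsize=(12, 3), sharey=True)
for ax, (name, values) in zip(axes, regions.items()):
    ax.plot(months, values, marker="o", color="steelblue")
    ax.set_title(name)
    ax.tick_params(axis="x", rotation=45)

axes[0].set_ylabel("Revenue (₹ Lakhs)")
fig.suptitle("Monthly Revenue by Region")
fig.tight_layout()
```

Because the Y-axis is shared, a taller line in one panel really does mean a larger value than in its neighbours. Add plt.show() or fig.savefig(...) to display or save the figure.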

Bar Chart – Comparing Categories

Bar charts compare discrete categories against a numeric value. They clearly show which category is largest or smallest.

departments = ["IT", "Finance", "HR", "Marketing", "Sales"]
avg_salary  = [85000, 78000, 52000, 61000, 67000]

# Vertical bar chart
plt.figure(figsize=(8, 5))
bars = plt.bar(departments, avg_salary, color=["#2196F3","#4CAF50","#FF5722","#9C27B0","#FF9800"], edgecolor="black")

# Add value labels on top of each bar
for bar in bars:
    plt.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 500,
        f"₹{bar.get_height():,.0f}",
        ha="center", fontsize=9
    )

plt.title("Average Salary by Department")
plt.xlabel("Department")
plt.ylabel("Average Salary (₹)")
plt.tight_layout()
plt.savefig("bar_chart.png")
plt.show()

Horizontal Bar Chart – Better for Long Labels

cities  = ["Mumbai", "Delhi", "Bengaluru", "Hyderabad", "Chennai", "Pune"]
startups = [1250, 980, 1100, 650, 580, 420]

plt.figure(figsize=(8, 5))
plt.barh(cities, startups, color="steelblue", edgecolor="black")
plt.title("Number of Startups by City")
plt.xlabel("Number of Startups")
plt.tight_layout()
plt.savefig("horizontal_bar.png")
plt.show()

Histogram – Distribution of a Single Column

A histogram groups values into buckets (called bins) and shows how many values fall into each bucket. It reveals the shape, spread, and skew of a distribution.

np.random.seed(10)
ages = np.random.normal(loc=35, scale=8, size=500)

plt.figure(figsize=(8, 4))
plt.hist(ages, bins=25, color="steelblue", edgecolor="white")
plt.axvline(ages.mean(),   color="red",    linestyle="--", label=f"Mean: {ages.mean():.1f}")
plt.axvline(np.median(ages), color="green", linestyle="--", label=f"Median: {np.median(ages):.1f}")

plt.title("Age Distribution of Employees")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.savefig("histogram.png")
plt.show()
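
To see the bucketing behind a histogram without drawing anything, the counts can be computed directly with np.histogram, which plt.hist relies on for its binning. This sketch regenerates the same ages array and uses 5 bins (instead of 25) only to keep the printout short:

```python
import numpy as np

np.random.seed(10)
ages = np.random.normal(loc=35, scale=8, size=500)

# counts[i] is the number of values in [edges[i], edges[i + 1]);
# the last bin also includes its right edge
counts, edges = np.histogram(ages, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count} employees")
```

With N bins there are always N + 1 edges, and the counts add up to the number of values in the column.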

Scatter Plot – Relationship Between Two Variables

Scatter plots place one variable on the X-axis and another on the Y-axis. Each point represents one observation. The pattern of points reveals whether a relationship exists between the two variables.

np.random.seed(5)
experience = np.random.randint(0, 20, 100)
salary     = experience * 3000 + np.random.normal(40000, 8000, 100)

plt.figure(figsize=(8, 5))
plt.scatter(experience, salary, alpha=0.6, color="teal", edgecolors="black", s=60)
plt.title("Experience vs Salary")
plt.xlabel("Years of Experience")
plt.ylabel("Annual Salary (₹)")
plt.tight_layout()
plt.savefig("scatter_plot.png")
plt.show()

Scatter Plot Pattern Guide

Positive Correlation:      Negative Correlation:    No Correlation:
●                           ●●●                        ● ●  ●
  ●  ●                        ●   ●                   ●     ●
    ●  ●                        ●  ●                ●    ●
      ●  ●                        ●  ●             ●  ●    ●
         ●●                          ●●           ●  ●  ●
As X increases,            As X increases,         No pattern visible
Y also increases           Y decreases
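
The visual pattern can be backed by a number. The sketch below computes the Pearson correlation coefficient r for the experience and salary data generated earlier. The helper describe_correlation and its 0.3 cut-off are illustrative assumptions, not a standard rule, and r only captures linear relationships.

```python
import numpy as np

np.random.seed(5)
experience = np.random.randint(0, 20, 100)
salary = experience * 3000 + np.random.normal(40000, 8000, 100)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r = np.corrcoef(experience, salary)[0, 1]


def describe_correlation(r, threshold=0.3):
    """Map r to one of the three patterns in the guide (hypothetical helper)."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none or weak"


print(f"r = {r:.2f} -> {describe_correlation(r)} correlation")
```

A value near +1 matches the left-hand pattern, a value near -1 the middle pattern, and a value near 0 the right-hand pattern.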

Pie Chart – Proportion of a Whole

categories = ["Electronics", "Clothing", "Food", "Books", "Other"]
sales_pct  = [35, 25, 20, 12, 8]

plt.figure(figsize=(7, 6))
plt.pie(
    sales_pct,
    labels=categories,
    autopct="%1.1f%%",
    startangle=140,
    colors=["#2196F3","#4CAF50","#FF5722","#9C27B0","#FF9800"]
)
plt.title("Sales by Product Category")
plt.tight_layout()
plt.savefig("pie_chart.png")
plt.show()

Subplots – Multiple Charts in One Figure

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Chart 1: Histogram
axes[0].hist(ages, bins=20, color="steelblue", edgecolor="white")
axes[0].set_title("Age Distribution")
axes[0].set_xlabel("Age")

# Chart 2: Bar chart
axes[1].bar(departments, avg_salary, color="coral", edgecolor="black")
axes[1].set_title("Avg Salary by Dept")
axes[1].set_xlabel("Department")
axes[1].tick_params(axis="x", rotation=30)

# Chart 3: Scatter plot
axes[2].scatter(experience, salary, alpha=0.5, color="teal")
axes[2].set_title("Experience vs Salary")
axes[2].set_xlabel("Experience (Years)")

plt.tight_layout()
plt.savefig("subplots.png")
plt.show()

Seaborn – Statistical Visualisations

Seaborn builds on Matplotlib and adds statistical chart types that would take many lines of Matplotlib code to produce. It integrates directly with Pandas DataFrames.

Box Plot – Distribution and Outliers

np.random.seed(42)
df = pd.DataFrame({
    "Department": np.random.choice(["IT","HR","Finance","Marketing"], 200),
    "Salary": np.random.normal(65000, 15000, 200).round(0)
})

plt.figure(figsize=(8, 5))
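# Note: seaborn 0.13+ deprecates palette without hue; on those versions add
# hue="Department", legend=False (also applies to the violin and count plots below)
sns.boxplot(data=df, x="Department", y="Salary", palette="Set2")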
plt.title("Salary Distribution by Department")
plt.xlabel("Department")
plt.ylabel("Salary (₹)")
plt.tight_layout()
plt.savefig("boxplot_seaborn.png")
plt.show()
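
The elements a box plot draws can also be computed by hand. This sketch uses a separate salary sample (not the df above) and assumes the default whisker rule of 1.5 × IQR; the names lower_fence and upper_fence are introduced here for illustration.

```python
import numpy as np

np.random.seed(42)
salaries = np.random.normal(65000, 15000, 200).round(0)

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(f"Q1 = {q1:,.0f}, median = {median:,.0f}, Q3 = {q3:,.0f}")
print(f"Fences: {lower_fence:,.0f} to {upper_fence:,.0f}")
print(f"Outliers: {len(outliers)}")
```

The whiskers extend to the most extreme data points that still lie inside the fences, so they may stop short of the fence values themselves.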

Violin Plot – Distribution Shape + Box Plot Combined

plt.figure(figsize=(8, 5))
sns.violinplot(data=df, x="Department", y="Salary", palette="pastel")
plt.title("Salary Distribution (Violin Plot)")
plt.tight_layout()
plt.savefig("violin_plot.png")
plt.show()

Heatmap – Correlation Matrix

numeric_df = pd.DataFrame({
    "Age":        np.random.randint(22, 58, 100),
    "Salary":     np.random.normal(65000, 15000, 100),
    "Experience": np.random.randint(0, 30, 100),
    "Rating":     np.random.randint(1, 6, 100)
})

corr = numeric_df.corr()

plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0, linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.tight_layout()
plt.savefig("heatmap.png")
plt.show()
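
A correlation matrix is symmetric, so every pair appears twice, and the diagonal is always 1. To read the same information as a ranked list, one option is to keep only the upper triangle and sort by absolute value. The data below is hypothetical: Experience is deliberately derived from Age so that one pair stands out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.integers(22, 58, 100)

# Hypothetical data: experience is tied to age, rating is independent
experience = np.clip(age - 22 - rng.integers(0, 5, 100), 0, None)
numeric_df = pd.DataFrame({
    "Age": age,
    "Experience": experience,
    "Rating": rng.integers(1, 6, 100),
})

corr = numeric_df.corr()

# Boolean mask that is True strictly above the diagonal
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Keep each pair once, then rank by strength regardless of sign
pairs = corr.where(upper).stack().dropna()
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

The columns at the top of the ranked list are the cells that would appear darkest in the heatmap.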

Count Plot – Frequency of Categories

plt.figure(figsize=(7, 4))
sns.countplot(data=df, x="Department", palette="Set3", edgecolor="black")
plt.title("Employee Count by Department")
plt.xlabel("Department")
plt.ylabel("Count")
plt.tight_layout()
plt.savefig("countplot.png")
plt.show()

Pair Plot – All Numeric Columns

sns.pairplot(numeric_df, diag_kind="kde", corner=True)
plt.suptitle("Pair Plot – All Numeric Features", y=1.02)
plt.savefig("pairplot.png", bbox_inches="tight")  # keeps the title at y=1.02 from being cropped
plt.show()

Chart Selection Guide

Question                                              | Best Chart
How does a value change over time?                    | Line chart
Which category is largest?                            | Bar chart
What is the distribution of a numeric column?         | Histogram
Is there a relationship between two numeric columns?  | Scatter plot
What proportion does each category contribute?        | Pie chart (max 5 slices)
How does a numeric column compare across categories?  | Box plot
Are two columns correlated?                           | Heatmap
All pairwise relationships in one view?               | Pair plot
Count of each category?                               | Count plot

Chart Formatting Best Practices

  • Always add a title – The reader should understand the chart without reading surrounding text
  • Label axes – Always include units (₹, %, years) in axis labels
  • Use consistent colours – Avoid rainbow palettes; use a two-colour scheme for comparisons
  • Remove chart junk – Avoid 3D charts, heavy gridlines, and decorative elements that add no information
  • Use tight_layout() – Adjusts spacing so titles, axis labels, and tick labels are not clipped and do not overlap
  • Save in high resolution – Use plt.savefig("name.png", dpi=150) for reports
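
The checklist above can be bundled into a small helper so every chart gets the same treatment. The function finish_chart below is a hypothetical convenience wrapper, not part of Matplotlib, and its styling choices (light horizontal gridlines, hidden top and right spines) are one reasonable interpretation of "remove chart junk".

```python
import matplotlib.pyplot as plt


def finish_chart(ax, title, xlabel, ylabel, path=None, dpi=150):
    """Apply the formatting checklist to an existing Axes (hypothetical helper)."""
    ax.set_title(title)                      # always add a title
    ax.set_xlabel(xlabel)                    # label axes, including units
    ax.set_ylabel(ylabel)
    ax.grid(axis="y", alpha=0.3)             # light gridlines only
    ax.spines["top"].set_visible(False)      # drop borders that carry no data
    ax.spines["right"].set_visible(False)
    ax.figure.tight_layout()
    if path is not None:
        ax.figure.savefig(path, dpi=dpi)     # high resolution for reports
    return ax.figure


departments = ["IT", "Finance", "HR", "Marketing", "Sales"]
avg_salary = [85000, 78000, 52000, 61000, 67000]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(departments, avg_salary, color="steelblue")
finish_chart(ax, "Average Salary by Department", "Department", "Average Salary (₹)")
```

Passing path="bar_chart.png" to finish_chart would also write the file at 150 dpi.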

Summary

  • Matplotlib provides full control over every chart element and is the base for Seaborn and many other Python visualization libraries
  • Seaborn adds statistical chart types with cleaner default aesthetics
  • Line charts track trends over time; bar charts compare categories
  • Histograms reveal the shape and spread of a numeric column's distribution
  • Scatter plots expose relationships between two numeric variables
  • Box plots show distribution, median, quartiles, and outliers simultaneously
  • Heatmaps display correlations across all numeric columns at once
  • Choosing the right chart type for the question produces clearer, more actionable insights
