DS Data Visualization with Matplotlib
Data visualization converts numbers into visual stories. A chart can communicate a trend or pattern in seconds, something a table of numbers rarely does. Python's Matplotlib and Seaborn libraries together cover most chart types used in data science, from simple bar charts to complex statistical visualisations.
Matplotlib – The Foundation
Matplotlib is the base visualization library in Python. Many other libraries, including Seaborn and the Pandas plotting interface, build on top of it. Matplotlib gives fine-grained control over every element of a chart.
Anatomy of a Matplotlib Figure
+----------------------------------------------+
|  Figure                                      |
|  +----------------------------------------+  |
|  |  Axes (Plot Area)                      |  |
|  |               Title                    |  |
|  |  Y-axis label                          |  |
|  |  |                ●                    |  |
|  |  |           ●  ●                      |  |
|  |  |       ●  ●                          |  |
|  |  +──────────────────                   |  |
|  |         X-axis label                   |  |
|  +----------------------------------------+  |
|  Legend: ● Data Points                       |
+----------------------------------------------+

Figure = the entire canvas (can hold multiple plots)
Axes   = one individual plot inside the figure
Axis   = the X or Y number line
Title  = descriptive name of the chart
Labels = descriptions of X and Y axes
Legend = key explaining colours / markers
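The parts named in the diagram map onto objects that Matplotlib exposes directly. The following is a minimal sketch, assuming the object-oriented interface (`plt.subplots`) rather than the `plt.*` shortcuts used elsewhere in this tutorial; the data points are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt

# Figure = the whole canvas, Axes = one plot area inside it
fig, ax = plt.subplots(figsize=(6, 4))

# Illustrative data, not taken from the text
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 8]
ax.scatter(x, y, label="Data Points")

ax.set_title("Title")           # Title
ax.set_xlabel("X-axis label")   # Labels
ax.set_ylabel("Y-axis label")
ax.legend()                     # Legend

# Each Axes owns two Axis objects: the X and Y number lines
x_axis, y_axis = ax.xaxis, ax.yaxis
print(type(fig).__name__, type(x_axis).__name__, type(y_axis).__name__)
```

Holding on to `fig` and `ax` explicitly tends to pay off once a figure contains more than one plot, because every call states which Axes it changes.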
Setting Up
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set a consistent style for all plots
plt.rcParams["figure.figsize"] = (9, 5)
plt.rcParams["axes.grid"] = True
sns.set_theme(style="whitegrid", palette="Set2")
Line Chart – Trends Over Time
Line charts show how a value changes over a continuous sequence — most commonly over time.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
revenue = [42, 47, 55, 52, 60, 58, 65, 70, 68, 74, 78, 83]
costs = [30, 32, 36, 35, 40, 38, 42, 45, 44, 47, 49, 52]
plt.figure(figsize=(10, 5))
plt.plot(months, revenue, marker="o", label="Revenue", color="steelblue", linewidth=2)
plt.plot(months, costs, marker="s", label="Costs", color="tomato", linewidth=2)
plt.title("Monthly Revenue vs Costs (₹ Lakhs)")
plt.xlabel("Month")
plt.ylabel("Amount (₹ Lakhs)")
plt.legend()
plt.tight_layout()
plt.savefig("line_chart.png")
plt.show()
When to Use a Line Chart
| Use Line Chart When | Avoid Line Chart When |
|---|---|
| Data has a time component | Comparing unrelated categories |
| Showing trends, rises, or falls | Data is not continuous |
| Comparing two series over the same period | Too many overlapping lines (use small multiples) |
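The table suggests small multiples when too many lines would overlap. One way to sketch that is a grid of subplots with shared axes, one series per panel. The six "Region" series below are hypothetical random walks generated only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
rng = np.random.default_rng(0)

# Hypothetical data: six series that would clutter a single line chart
series = {f"Region {i + 1}": 50 + rng.normal(0, 3, 12).cumsum() for i in range(6)}

# Shared axes keep the panels directly comparable
fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True, sharey=True)
for ax, (name, values) in zip(axes.flat, series.items()):
    ax.plot(months, values, color="steelblue", linewidth=2)
    ax.set_title(name)

# Label only the outer edges to avoid repeating the same text in every panel
for ax in axes[-1, :]:
    ax.set_xlabel("Month")
for ax in axes[:, 0]:
    ax.set_ylabel("Value")

fig.tight_layout()
```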
Bar Chart – Comparing Categories
Bar charts compare discrete categories against a numeric value. They clearly show which category is largest or smallest.
departments = ["IT", "Finance", "HR", "Marketing", "Sales"]
avg_salary = [85000, 78000, 52000, 61000, 67000]
# Vertical bar chart
plt.figure(figsize=(8, 5))
bars = plt.bar(departments, avg_salary, color=["#2196F3","#4CAF50","#FF5722","#9C27B0","#FF9800"], edgecolor="black")
# Add value labels on top of each bar
for bar in bars:
    plt.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 500,
        f"₹{bar.get_height():,.0f}",
        ha="center", fontsize=9
    )
plt.title("Average Salary by Department")
plt.xlabel("Department")
plt.ylabel("Average Salary (₹)")
plt.tight_layout()
plt.savefig("bar_chart.png")
plt.show()
Horizontal Bar Chart – Better for Long Labels
cities = ["Mumbai", "Delhi", "Bengaluru", "Hyderabad", "Chennai", "Pune"]
startups = [1250, 980, 1100, 650, 580, 420]
plt.figure(figsize=(8, 5))
plt.barh(cities, startups, color="steelblue", edgecolor="black")
plt.title("Number of Startups by City")
plt.xlabel("Number of Startups")
plt.tight_layout()
plt.savefig("horizontal_bar.png")
plt.show()
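Horizontal bars are often easier to read when sorted by value. The sketch below reuses the city data above and sorts it before plotting; it relies on `barh` drawing the first item at the bottom, so an ascending sort puts the largest bar on top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt

cities = ["Mumbai", "Delhi", "Bengaluru", "Hyderabad", "Chennai", "Pune"]
startups = [1250, 980, 1100, 650, 580, 420]

# Sort (count, city) pairs in ascending order of count
pairs = sorted(zip(startups, cities))
sorted_counts = [count for count, _ in pairs]
sorted_cities = [city for _, city in pairs]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(sorted_cities, sorted_counts, color="steelblue", edgecolor="black")
ax.set_title("Number of Startups by City (sorted)")
ax.set_xlabel("Number of Startups")
fig.tight_layout()
```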
Histogram – Distribution of a Single Column
A histogram groups values into buckets (called bins) and shows how many values fall into each bucket. It reveals the shape, spread, and skew of a distribution.
np.random.seed(10)
ages = np.random.normal(loc=35, scale=8, size=500)
plt.figure(figsize=(8, 4))
plt.hist(ages, bins=25, color="steelblue", edgecolor="white")
plt.axvline(ages.mean(), color="red", linestyle="--", label=f"Mean: {ages.mean():.1f}")
plt.axvline(np.median(ages), color="green", linestyle="--", label=f"Median: {np.median(ages):.1f}")
plt.title("Age Distribution of Employees")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.legend()
plt.tight_layout()
plt.savefig("histogram.png")
plt.show()
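To see what the histogram is counting, the same binning can be reproduced numerically. This sketch uses `np.histogram` on the same seeded sample; with an integer `bins` argument and default settings it should produce the counts that `plt.hist` draws.

```python
import numpy as np

np.random.seed(10)
ages = np.random.normal(loc=35, scale=8, size=500)

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(ages, bins=25)

bin_width = edges[1] - edges[0]
busiest = counts.argmax()

print(f"bins: {len(counts)}, edges: {len(edges)}")
print(f"bin width: {bin_width:.2f} years")
print(f"values counted: {counts.sum()}")
print(f"busiest bin: {edges[busiest]:.1f} to {edges[busiest + 1]:.1f}")
```

Twenty-five bins always need twenty-six edges, and every value falls into exactly one bin, so the counts sum to the sample size.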
Scatter Plot – Relationship Between Two Variables
Scatter plots place one variable on the X-axis and another on the Y-axis. Each point represents one observation. The pattern of points reveals whether a relationship exists between the two variables.
np.random.seed(5)
experience = np.random.randint(0, 20, 100)
salary = experience * 3000 + np.random.normal(40000, 8000, 100)
plt.figure(figsize=(8, 5))
plt.scatter(experience, salary, alpha=0.6, color="teal", edgecolors="black", s=60)
plt.title("Experience vs Salary")
plt.xlabel("Years of Experience")
plt.ylabel("Annual Salary (₹)")
plt.tight_layout()
plt.savefig("scatter_plot.png")
plt.show()
Scatter Plot Pattern Guide
Positive Correlation: Negative Correlation: No Correlation:
● ●●● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ●
●● ●● ● ● ●
As X increases, As X increases, No pattern visible
Y also increases Y decreases
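The three patterns in the guide can also be expressed as a number. A common choice is the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below computes it with `np.corrcoef` for the experience and salary data above; the negated and pure-noise series are constructed here only to illustrate the other two patterns.

```python
import numpy as np

np.random.seed(5)
experience = np.random.randint(0, 20, 100)
salary = experience * 3000 + np.random.normal(40000, 8000, 100)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
r_positive = np.corrcoef(experience, salary)[0, 1]

# Illustrative constructions, not part of the original data
r_negative = np.corrcoef(experience, -salary)[0, 1]   # flipping the sign flips the slope
noise = np.random.normal(0, 1, 100)                   # unrelated to experience
r_none = np.corrcoef(experience, noise)[0, 1]

print(f"positive: {r_positive:.2f}")
print(f"negative: {r_negative:.2f}")
print(f"none:     {r_none:.2f}")
```

A coefficient near zero only rules out a linear relationship, so the scatter plot is still worth checking for curved patterns.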
Pie Chart – Proportion of a Whole
categories = ["Electronics", "Clothing", "Food", "Books", "Other"]
sales_pct = [35, 25, 20, 12, 8]
plt.figure(figsize=(7, 6))
plt.pie(
    sales_pct,
    labels=categories,
    autopct="%1.1f%%",
    startangle=140,
    colors=["#2196F3","#4CAF50","#FF5722","#9C27B0","#FF9800"]
)
plt.title("Sales by Product Category")
plt.tight_layout()
plt.savefig("pie_chart.png")
plt.show()
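The chart selection guide later in this tutorial caps pie charts at five slices. When the data has more categories than that, one option is to merge the smallest ones before plotting. The helper `collapse_small_slices` and the seven-category data below are hypothetical, shown only as a sketch of that idea.

```python
def collapse_small_slices(values, max_slices=5, other_label="Other"):
    """Keep the largest categories and merge the rest into one slice.

    Hypothetical helper for illustration; not part of Matplotlib.
    """
    if len(values) <= max_slices:
        return dict(values)
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[: max_slices - 1])
    remainder = sum(value for _, value in ranked[max_slices - 1:])
    # Add to an existing "Other" entry instead of overwriting it
    kept[other_label] = kept.get(other_label, 0) + remainder
    return kept


# Illustrative data, not from the text
sales = {
    "Electronics": 30, "Clothing": 22, "Food": 18, "Books": 12,
    "Toys": 8, "Garden": 6, "Music": 4,
}
collapsed = collapse_small_slices(sales)
print(collapsed)
```

The keys and values of `collapsed` can then be passed to `plt.pie` as `labels` and sizes in the same way as `categories` and `sales_pct` above.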
Subplots – Multiple Charts in One Figure
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
# Chart 1: Histogram
axes[0].hist(ages, bins=20, color="steelblue", edgecolor="white")
axes[0].set_title("Age Distribution")
axes[0].set_xlabel("Age")
# Chart 2: Bar chart
axes[1].bar(departments, avg_salary, color="coral", edgecolor="black")
axes[1].set_title("Avg Salary by Dept")
axes[1].set_xlabel("Department")
axes[1].tick_params(axis="x", rotation=30)
# Chart 3: Scatter plot
axes[2].scatter(experience, salary, alpha=0.5, color="teal")
axes[2].set_title("Experience vs Salary")
axes[2].set_xlabel("Experience (Years)")
plt.tight_layout()
plt.savefig("subplots.png")
plt.show()
Seaborn – Statistical Visualisations
Seaborn builds on Matplotlib and adds statistical chart types that would take many lines of Matplotlib code to produce. It integrates directly with Pandas DataFrames.
Box Plot – Distribution and Outliers
np.random.seed(42)
df = pd.DataFrame({
    "Department": np.random.choice(["IT","HR","Finance","Marketing"], 200),
    "Salary": np.random.normal(65000, 15000, 200).round(0)
})
plt.figure(figsize=(8, 5))
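# Seaborn 0.13+ deprecates palette without hue, so hue is mapped to the x variable
sns.boxplot(data=df, x="Department", y="Salary", hue="Department", palette="Set2", legend=False)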
plt.title("Salary Distribution by Department")
plt.xlabel("Department")
plt.ylabel("Salary (₹)")
plt.tight_layout()
plt.savefig("boxplot_seaborn.png")
plt.show()
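The numbers a box plot encodes can be computed directly. The sketch below assumes the default whisker rule used by Matplotlib and Seaborn (`whis=1.5`), where points farther than 1.5 times the interquartile range from the box are drawn as outliers. The salary sample is generated here for illustration and is not the same draw as the DataFrame above.

```python
import numpy as np

np.random.seed(42)
salaries = np.random.normal(65000, 15000, 200).round(0)  # illustrative sample

# The box spans Q1 to Q3; the line inside it is the median
q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are plotted individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(f"Q1={q1:.0f}, median={median:.0f}, Q3={q3:.0f}, IQR={iqr:.0f}")
print(f"fences: {lower_fence:.0f} to {upper_fence:.0f}")
print(f"outliers: {len(outliers)}")
```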
Violin Plot – Distribution Shape + Box Plot Combined
plt.figure(figsize=(8, 5))
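# Seaborn 0.13+ deprecates palette without hue, so hue is mapped to the x variable
sns.violinplot(data=df, x="Department", y="Salary", hue="Department", palette="pastel", legend=False)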
plt.title("Salary Distribution (Violin Plot)")
plt.tight_layout()
plt.savefig("violin_plot.png")
plt.show()
Heatmap – Correlation Matrix
# Columns are drawn independently, so the correlations below will be close to zero
numeric_df = pd.DataFrame({
    "Age": np.random.randint(22, 58, 100),
    "Salary": np.random.normal(65000, 15000, 100),
    "Experience": np.random.randint(0, 30, 100),
    "Rating": np.random.randint(1, 6, 100)
})
corr = numeric_df.corr()
plt.figure(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0, linewidths=0.5)
plt.title("Feature Correlation Heatmap")
plt.tight_layout()
plt.savefig("heatmap.png")
plt.show()
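Because the columns above are generated independently, their heatmap shows values near zero everywhere except the diagonal. The sketch below builds a hypothetical DataFrame (`demo_df`) in which salary and age depend on experience, so a strong correlation is visible. It also uses the `mask` argument of `sns.heatmap` to hide the upper triangle, which only mirrors the lower one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
experience = rng.integers(0, 30, 100)

# Hypothetical data: Age and Salary are derived from Experience, Rating is not
demo_df = pd.DataFrame({
    "Experience": experience,
    "Age": experience + rng.integers(22, 30, 100),
    "Salary": 40000 + experience * 3000 + rng.normal(0, 8000, 100),
    "Rating": rng.integers(1, 6, 100),
})

corr = demo_df.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, vmin=-1, vmax=1, linewidths=0.5, ax=ax)
ax.set_title("Correlation Heatmap (lower triangle)")
fig.tight_layout()
```

Fixing `vmin=-1` and `vmax=1` keeps the colour scale comparable between heatmaps drawn from different datasets.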
Count Plot – Frequency of Categories
plt.figure(figsize=(7, 4))
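# Seaborn 0.13+ deprecates palette without hue, so hue is mapped to the x variable
sns.countplot(data=df, x="Department", hue="Department", palette="Set3", edgecolor="black", legend=False)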
plt.title("Employee Count by Department")
plt.xlabel("Department")
plt.ylabel("Count")
plt.tight_layout()
plt.savefig("countplot.png")
plt.show()
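The bar heights in a count plot are plain category frequencies, which Pandas can report directly. This sketch rebuilds only the `Department` column with the same seed and tallies it with `value_counts`.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    "Department": np.random.choice(["IT", "HR", "Finance", "Marketing"], 200),
})

# value_counts sorts from most to least frequent by default
counts = df["Department"].value_counts()
print(counts)
```

Passing `order=counts.index` to `sns.countplot` should draw the bars in this sorted order rather than in order of first appearance.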
Pair Plot – All Numeric Columns
sns.pairplot(numeric_df, diag_kind="kde", corner=True)
plt.suptitle("Pair Plot – All Numeric Features", y=1.02)
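# bbox_inches="tight" keeps the title placed above the figure (y=1.02) from being cropped
plt.savefig("pairplot.png", bbox_inches="tight")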
plt.show()
Chart Selection Guide
| Question | Best Chart |
|---|---|
| How does a value change over time? | Line chart |
| Which category is largest? | Bar chart |
| What is the distribution of a numeric column? | Histogram |
| Is there a relationship between two numeric columns? | Scatter plot |
| What proportion does each category contribute? | Pie chart (max 5 slices) |
| How does a numeric column compare across categories? | Box plot |
| How strongly are many numeric columns correlated? | Heatmap (correlation matrix) |
| All pairwise relationships in one view? | Pair plot |
| Count of each category? | Count plot |
Chart Formatting Best Practices
- Always add a title – The reader should understand the chart without reading surrounding text
- Label axes – Always include units (₹, %, years) in axis labels
- Use consistent colours – Avoid rainbow palettes; use a two-colour scheme for comparisons
- Remove chart junk – Avoid 3D charts, heavy gridlines, and decorative elements that add no information
- Use tight_layout() – Prevents labels from overlapping
- Save in high resolution – Use plt.savefig("name.png", dpi=150) for reports
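Several of these practices can be bundled into one small function so that every chart gets the same finishing steps. The helper `finish_chart` below is hypothetical, a sketch of one possible convention rather than a Matplotlib feature.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt


def finish_chart(ax, title, xlabel, ylabel, filename=None, dpi=150):
    """Apply title, axis labels, and layout; optionally save at report resolution.

    Hypothetical helper for illustration.
    """
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)          # include units in the label text
    ax.figure.tight_layout()       # prevents labels from overlapping
    if filename is not None:
        ax.figure.savefig(filename, dpi=dpi)
    return ax


fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(["IT", "Finance", "HR"], [85000, 78000, 52000], color="steelblue")
finish_chart(ax, "Average Salary by Department", "Department", "Average Salary (₹)")
```

Passing a `filename` such as `"bar_chart.png"` would also write the figure to disk at 150 dpi.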
Summary
- Matplotlib provides full control over every chart element and underlies many Python plotting libraries, including Seaborn
- Seaborn adds statistical chart types with cleaner default aesthetics
- Line charts track trends over time; bar charts compare categories
- Histograms reveal the shape and spread of a numeric column's distribution
- Scatter plots expose relationships between two numeric variables
- Box plots show distribution, median, quartiles, and outliers simultaneously
- Heatmaps display correlations across all numeric columns at once
- Choosing the right chart type for the question produces clearer, more actionable insights
