SQL Normalization
Normalization is the process of organizing a database into a clean, structured design that eliminates redundant (repeated) data and ensures that data is stored logically. The goal is to build a database where each piece of information is stored in only one place, and every table has a clear, single purpose.
Think of normalization like organising a wardrobe. Instead of dumping everything in one pile, clothes are separated into drawers and sections — shirts in one place, trousers in another, accessories separately. Everything has a home, nothing is duplicated, and finding things becomes easy.
Why Normalize a Database?
- Reduces redundancy — The same data is not stored in multiple places.
- Prevents update anomalies — Changing one piece of data does not require updating it in dozens of rows.
- Prevents insertion anomalies — A record cannot be added only when other data is available.
- Prevents deletion anomalies — Deleting one record does not accidentally remove other unrelated information.
- Improves data integrity — Consistent, reliable data.
The Three Anomalies (Problems Normalization Solves)
The Problem Table (Unnormalised)
| StudentID | StudentName | CourseName | TeacherName | City |
|---|---|---|---|---|
| 1 | Ravi Sharma | Mathematics | Prof. Singh | Delhi |
| 1 | Ravi Sharma | Physics | Prof. Gupta | Delhi |
| 2 | Priya Mehta | Mathematics | Prof. Singh | Mumbai |
- Update anomaly: If Prof. Singh's name changes, it must be updated in every row where Mathematics is listed.
- Insertion anomaly: A new course cannot be added without also adding a student.
- Deletion anomaly: If Priya's record is deleted, information about the Mathematics + Prof. Singh relationship is partially lost.
Normal Forms
Normalization is achieved in stages called Normal Forms (NF). Each level builds upon the previous one. The most commonly applied levels in practice are 1NF, 2NF, and 3NF.
First Normal Form (1NF)
A table is in First Normal Form when:
- Every column contains atomic (indivisible) values — no lists or multiple values in a single cell.
- Every column contains values of a single type.
- Each row is unique (there is a primary key).
Violates 1NF (Multiple values in one cell)
| StudentID | StudentName | Courses |
|---|---|---|
| 1 | Ravi Sharma | Maths, Physics |
| 2 | Priya Mehta | Maths |
The Courses column contains multiple values in one cell — this violates 1NF.
Converted to 1NF (One value per cell)
| StudentID | StudentName | Course |
|---|---|---|
| 1 | Ravi Sharma | Maths |
| 1 | Ravi Sharma | Physics |
| 2 | Priya Mehta | Maths |
Each cell now has exactly one value. A composite primary key (StudentID + Course) uniquely identifies each row.
Second Normal Form (2NF)
A table is in Second Normal Form when:
- It is already in 1NF.
- Every non-key column is fully dependent on the entire primary key — not just part of it.
This applies only when the table has a composite primary key. If a non-key column depends on only part of the composite key (called a partial dependency), it violates 2NF.
Violates 2NF
Assume the composite primary key is (StudentID, CourseID).
| StudentID | CourseID | StudentName | CourseName | Grade |
|---|---|---|---|---|
| 1 | 101 | Ravi Sharma | Mathematics | A |
| 1 | 102 | Ravi Sharma | Physics | B |
| 2 | 101 | Priya Mehta | Mathematics | A+ |
Partial dependencies:
- StudentName depends only on StudentID (not on CourseID).
- CourseName depends only on CourseID (not on StudentID).
- Only Grade depends on the full composite key (StudentID + CourseID).
Converted to 2NF (Split into separate tables)
Students table:
| StudentID (PK) | StudentName |
|---|---|
| 1 | Ravi Sharma |
| 2 | Priya Mehta |
Courses table:
| CourseID (PK) | CourseName |
|---|---|
| 101 | Mathematics |
| 102 | Physics |
Enrollments table (only fully dependent data):
| StudentID (FK) | CourseID (FK) | Grade |
|---|---|---|
| 1 | 101 | A |
| 1 | 102 | B |
| 2 | 101 | A+ |
Third Normal Form (3NF)
A table is in Third Normal Form when:
- It is already in 2NF.
- There are no transitive dependencies — a non-key column should not depend on another non-key column.
A transitive dependency exists when Column C depends on Column B, which depends on Column A (the primary key). Column C should be in a separate table.
Violates 3NF (Transitive Dependency)
| StudentID (PK) | StudentName | ZipCode | City |
|---|---|---|---|
| 1 | Ravi Sharma | 110001 | Delhi |
| 2 | Priya Mehta | 400001 | Mumbai |
| 3 | Arjun Nair | 110001 | Delhi |
Here, City depends on ZipCode (not directly on StudentID). ZipCode → City is a transitive dependency. Delhi is stored twice because ZipCode 110001 is the same — if the city name changes, it must be updated in every row with that zip code.
Converted to 3NF
Students table:
| StudentID (PK) | StudentName | ZipCode (FK) |
|---|---|---|
| 1 | Ravi Sharma | 110001 |
| 2 | Priya Mehta | 400001 |
| 3 | Arjun Nair | 110001 |
ZipCodes table:
| ZipCode (PK) | City |
|---|---|
| 110001 | Delhi |
| 400001 | Mumbai |
Now, city is stored only once per zip code. If a city name changes, only one row in the ZipCodes table needs to be updated.
Summary of Normal Forms
| Normal Form | Rule | Problem It Solves |
|---|---|---|
| 1NF | Atomic values only; each cell has one value; has a primary key | Multi-valued cells, repeating groups |
| 2NF | In 1NF + no partial dependencies (non-key columns depend on the full key) | Redundancy caused by partial composite key dependencies |
| 3NF | In 2NF + no transitive dependencies (non-key columns depend only on the primary key) | Redundancy caused by non-key columns depending on other non-key columns |
De-normalization
In some cases — especially for read-heavy applications and reporting databases — de-normalization is intentionally applied. This means deliberately adding some redundancy back into the database to reduce the number of JOINs needed and improve read performance. De-normalization is a deliberate design trade-off, not a mistake.
Key Points to Remember
- Normalization organises data to reduce redundancy and improve data integrity.
- 1NF — each cell holds one atomic value; primary key must exist.
- 2NF — eliminate partial dependencies (when a non-key column depends on only part of a composite key).
- 3NF — eliminate transitive dependencies (non-key columns depending on other non-key columns).
- Each normal form includes all the rules of the forms before it.
- For most business applications, achieving 3NF is sufficient.
Summary
Normalization is a disciplined approach to database design that eliminates data redundancy and ensures each table has a single, well-defined purpose. By applying 1NF, 2NF, and 3NF rules, a poorly structured database is transformed into one where data changes are clean, consistent, and easy to maintain. It is one of the most important skills for anyone involved in designing or working with relational databases.
