MySQL Normalization
Normalization is the process of organizing a database's tables and columns to reduce data redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining proper relationships between them. Normalization follows a series of rules called Normal Forms.
Why Normalize?
- Eliminates redundancy — The same data is not stored in multiple places.
- Prevents anomalies — Inserting, updating, or deleting data does not create inconsistencies.
- Saves storage — No duplicate data means less disk usage.
- Improves maintainability — Updating one record updates it everywhere it is referenced.
Anomalies in Unnormalized Data
Consider this single flat table:
student_id | student_name | course_name | teacher_name | teacher_phone
-----------+--------------+-------------+--------------+--------------
1 | Riya Sharma | Maths | Mr. Verma | 9876543210
2 | Anil Joshi | Maths | Mr. Verma | 9876543210
3 | Riya Sharma | Science | Ms. Pillai | 9123456789Problems:
- Update anomaly — Changing Mr. Verma's phone requires updating multiple rows.
- Insert anomaly — A new teacher cannot be added without assigning them a student.
- Delete anomaly — Deleting the only student in Science removes Ms. Pillai's record entirely.
First Normal Form (1NF)
A table is in 1NF when:
- Each column contains atomic (indivisible) values — no lists or multiple values in one cell.
- Each column contains values of a single type.
- Each row is unique (has a primary key).
Violation Example
student_id | student_name | courses
-----------+--------------+------------------
1 | Riya Sharma | Maths, Science1NF Compliant
student_id | student_name | course
-----------+--------------+---------
1 | Riya Sharma | Maths
1 | Riya Sharma | ScienceSecond Normal Form (2NF)
A table is in 2NF when it is in 1NF and every non-key column depends on the entire primary key, not just part of it. This applies to tables with composite primary keys.
Violation (Partial Dependency)
-- Primary key: (student_id, course_id)
student_id | course_id | student_name | course_name
-----------+-----------+--------------+------------
1 | 101 | Riya Sharma | Mathsstudent_name depends only on student_id, not on course_id. This is a partial dependency — a 2NF violation.
2NF Solution
-- Students table
student_id | student_name
-----------+-------------
1 | Riya Sharma
-- Courses table
course_id | course_name
----------+------------
101 | Maths
-- Enrollments table
student_id | course_id
-----------+----------
1 | 101Third Normal Form (3NF)
A table is in 3NF when it is in 2NF and every non-key column depends directly on the primary key — not on another non-key column. This removes transitive dependencies.
Violation (Transitive Dependency)
emp_id | emp_name | dept_id | dept_name
-------+-------------+---------+----------
1 | Ravi Kumar | 10 | ITdept_name depends on dept_id, which depends on emp_id. That is a transitive dependency.
3NF Solution
-- Employees table
emp_id | emp_name | dept_id
-------+------------+--------
1 | Ravi Kumar | 10
-- Departments table
dept_id | dept_name
--------+----------
10 | ITSummary of Normal Forms
| Normal Form | Requirement |
|---|---|
| 1NF | Atomic values, no repeating groups, unique rows |
| 2NF | 1NF + no partial dependencies on composite keys |
| 3NF | 2NF + no transitive dependencies |
| BCNF | Stricter version of 3NF for edge cases |
Key Points
- Normalization removes data redundancy and prevents update, insert, and delete anomalies.
- 1NF requires atomic values and a unique primary key per row.
- 2NF removes partial dependencies — all columns must depend on the full primary key.
- 3NF removes transitive dependencies — columns must depend directly on the primary key only.
- Most production databases target 3NF; some reporting systems intentionally denormalize for read performance.
