Database normalization is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.
Wikipedia
It’s basically the steps to design a new database or alter an existing one so that it isn’t prone to the anomalies discussed in a previous lesson. You do this by reducing a complex structure into multiple simple structures that are related to one another.
Normal forms
There are many normal forms. From least normalized to most normalised they are:
- UNF: Unnormalised form
- 1NF: First normal form
- 2NF: Second normal form
- 3NF: Third normal form
- EKNF: Elementary key normal form
- BCNF: Boyce–Codd normal form
- 4NF: Fourth normal form
- ETNF: Essential tuple normal form
- 5NF: Fifth normal form
- DKNF: Domain-key normal form
- 6NF: Sixth normal form
Above 4NF is only really dealt with in theoretical terms rather than practical applications.
When someone describes a database as normalised they are referring to the third form 3NF.
UNF to 1NF
To reach 1NF each attribute must be composed of a single value.


1NF to 2NF
Remember the key is the column or columns that uniquely identify the row. To satisfy the 2NF requirements, every non-key column of a table must rely on a single key to be identified. Imagine that there are other games listed in the table below:

The Price depends on the Platform, so the Game and Platform make a candidate key. But the Release Date and Rating are only dependent on the Game.


2NF to 3NF
To satisfy 3NF you remove any transitive functional dependencies. Transitive functional dependencies are when one attribute gets their value from another one.

The country of development field is dependent on the developer field so can be split into their own table. Now all fields depend solely on their primary key.


Conclusion
The aim of a normalised database is to avoid data duplication. Any change to data should only have to be done to one row. Everything else should update via the data cascading through foreign keys.
- One value per attribute
- Entity uniquely identified by a single attribute (1 column per key)
- No transitive dependencies
Un-normalised data should only be used for either tables with multi-value attributes or big data where the performance impact of indexing the data is too high.