Database normalisation – FA Software Development

Database normalization is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.
Wikipedia

It’s basically the steps to design a new database or alter an existing one so that it isn’t prone to the anomalies discussed in a previous lesson. You do this by reducing a complex structure into multiple simple structures that are related to one another.

Normal forms

There are many normal forms. From least normalized to most normalised they are:

UNF: Unnormalised form
1NF: First normal form
2NF: Second normal form
3NF: Third normal form
EKNF: Elementary key normal form
BCNF: Boyce–Codd normal form
4NF: Fourth normal form
ETNF: Essential tuple normal form
5NF: Fifth normal form
DKNF: Domain-key normal form
6NF: Sixth normal form

Above 4NF is only really dealt with in theoretical terms rather than practical applications.

When someone describes a database as normalised they are referring to the third form 3NF.

UNF to 1NF

To reach 1NF each attribute must be composed of a single value.

1NF to 2NF

Remember the key is the column or columns that uniquely identify the row. To satisfy the 2NF requirements, every non-key column of a table must rely on a single key to be identified. Imagine that there are other games listed in the table below:

The Price depends on the Platform, so the Game and Platform make a candidate key. But the Release Date and Rating are only dependent on the Game.

2NF to 3NF

To satisfy 3NF you remove any transitive functional dependencies. Transitive functional dependencies are when one attribute gets their value from another one.

The country of development field is dependent on the developer field so can be split into their own table. Now all fields depend solely on their primary key.

Conclusion

The aim of a normalised database is to avoid data duplication. Any change to data should only have to be done to one row. Everything else should update via the data cascading through foreign keys.

One value per attribute
Entity uniquely identified by a single attribute (1 column per key)
No transitive dependencies

Un-normalised data should only be used for either tables with multi-value attributes or big data where the performance impact of indexing the data is too high.

Next: Microsoft Access