58: SQL Normalization
In the previous exercise we explored SQL basics using the European Central Bank's historic Euro data set. In this exercise I'm going to teach you about data modeling by reshaping this data into multiple tables to "normalize" it.
What is Normalization
Normalization is about reducing redundancy in your data set. You see some form of redundancy, move it into a separate table, and then link the two tables via an
id column. It gets far more complex and theoretical, but this is the general idea. Doing this has a few advantages:
- It reduces the size of your data, and reduced size generally improves performance (but not always).
- It helps you understand the structure of the data possibly giving you better insights into better analysis.
- It makes many queries faster because you can narrow searches to specific data you want, rather than always searching all of it (but not always).
- It makes it easier to augment the data later since you can change the contents of a small isolated table rather than trying to change a giant table.
- It helps find errors in analysis since you're forced to explain how two pieces of data should be related. Does a User have one purchase or many purchases? Does that mean a purchase has many users or only one user? Normalization highlights these kinds of mistakes and forces you to formalize an answer.
- It makes you look like a real professional because you know what the word "normalization" means.
When you normalize a database you follow a process that goes through different "levels" or "normal forms" of quality:
- First Normal Form (1NF) has the goal of making 1 row and 1 column for every type and piece of data.
- Second Normal Form (2NF) has the goal of moving redundant discrete data into separate tables based on their relationship to keys in the table.
- Third Normal Form (3NF) requires that every piece of information in a row is only about the key of that row. This is where most people stop with normalization as further normalization can make the data more complicated than it needs to be for your application.
Let's take the ECB table and walk through normalizing it to second normal form (2NF). Going to third normal form (3NF) is not too useful in this data set.
Register for Learn Python the Hard Way, 5th Edition (2023-2024)
Register today for the course and get the all currently available videos and lessons, plus all future modules for no extra charge.