Hopefully, we are all familiar with the structure of a table to hold data: a grid of columns and rows. Most people first experience tables with spreadsheet software like Microsoft Excel, LibreOffice Calc, or Google Sheets. Database tables function on same principle wherein each row is a record with data identified by a column. However, for those of you thinking ‘Why learn databases when I already know spreadsheets?’ there are many key differences that give databases a lot of functionality that cannot be achieved with spreadsheets and with much higher efficiency.
An example that I always like to use with students is people and emails. Today, most people do not have a single email address, but instead have many. At the time of writing I can easily think of five email addresses that I use on a daily basis. If we were wanting to record people and their email addresses in spreadsheet software, we would do something like the following:
|First Name||Last Name||Email 1||Email 2||Email 3|
This is all well and good for small tasks but you can see how it would increasingly get messy. Additionally, what if you wanted to track when users acquired and removed emails? What if you wanted to do the same for phone numbers?
This is where the database can become very handy. Instead of trying to organize everything into a single row, we will separate the data into two tables: A Person table and an Email table. Each of the tables will have a key that can be used to match the data:
As you can see, the design is a little bit different. Rather than put all of the data into a single row, we have logically split the data into two tables. We add the column person_id to the email where it becomes a foreign key to connect Person to Email. This allows us to have a many emails per person as well as add more email data if we would like without getting messy. Like discussed in 1.1, the data is organized logically so that data is easy to connect and find.