SQL allows you to write queries against structured data in a relational database, by specifying how you want the data in the database to be related, and the parameters of the data you want out of it.
Relational databases have tables, that you can think about like individual sheets within a spreadsheet document.
This database currently has a single table, called band_members:
| id | first_name | last_name | age |
|---|---|---|---|
| 1 | Daniela | Avanzini | 21 |
| 2 | Lara | Raj | 19 |
| 3 | Manon | Bannerman | 22 |
| 4 | Megan | Skiendiel | 20 |
| 5 | Sophia | Laforteza | 22 |
| 6 | Yoonchae | Jeung | 17 |
To get some data from this table, execute a query:
SELECT *
FROM band_members
WHERE age > 20;
Returns:
| id | first_name | last_name | age |
|---|---|---|---|
| 1 | Daniela | Avanzini | 21 |
| 3 | Manon | Bannerman | 22 |
| 5 | Sophia | Laforteza | 22 |
SELECT with the * wildcard returns all columns of data, for the rows matching the WHERE clause.
To only get a subset of the data, such as id, and first_name, alter the query:
SELECT id, first_name
FROM band_members
WHERE age > 20;
Returns:
| id | first_name |
|---|---|
| 1 | Daniela |
| 3 | Manon |
| 5 | Sophia |
That's cool, but the real power of a relational database is that you can have many tables, and establish relationships between them, to synthesize any sort of dataset that might be useful for a particular situation.
In a real DB, we're gonna have multiple tables that are related to each other:
band_members:
| id | first_name | last_name | age |
|---|---|---|---|
| 1 | Daniela | Avanzini | 21 |
| 2 | Lara | Raj | 19 |
| 3 | Manon | Bannerman | 22 |
| 4 | Megan | Skiendiel | 20 |
| 5 | Sophia | Laforteza | 22 |
| 6 | Yoonchae | Jeung | 17 |
band_member_roles:
| role_id | role_name |
|---|---|
| 1 | Main Dancer |
| 2 | Vocalist |
| 3 | Rapper |
| 4 | Maknae |
role_assignments:
| member_id | role_id |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |
| 4 | 3 |
| 5 | 2 |
| 6 | 2 |
| 6 | 4 |
The third table there establishes the relationship between each member and their role(s). We use a separate table to allow for "many to many" relationships, where each member could have more than one role.
To make use of those relationships, we need to use a JOIN statement in our query. There are a number of different types of joins (that can be expanded on later), but here, we're gonna use INNER JOIN. Within the JOIN, we specify the relationships we want using the ON keyword:
SELECT
band_members.id,
band_members.first_name,
band_member_roles.role_name
FROM band_members
INNER JOIN role_assignments
ON band_members.id = role_assignments.member_id
INNER JOIN band_member_roles
ON role_assignments.role_id = band_member_roles.role_id;
Returns:
| id | first_name | role_name |
|---|---|---|
| 1 | Daniela | Main Dancer |
| 2 | Lara | Vocalist |
| 3 | Manon | Vocalist |
| 4 | Megan | Rapper |
| 5 | Sophia | Vocalist |
| 6 | Yoonchae | Vocalist |
| 6 | Yoonchae | Maknae |
Yoonchae appears twice, because she has two roles. Depending on what you want to do with this data, you could handle that in different ways in your output/report (including just concatenating it to "Vocalist, Maknae")
This might seem overcomplicated, instead of just adding roles, or other items to the member table, but it becomes more important, and powerful, when the data is more complicated.