Using SQL JOINs Statements
March 21, 2020
Sometimes you won’t find all the information you need from a single “normalized” table.
Normalized (or normalization) is referred to a database design process in which tables are restructured and reorganized over several levels called normal forms (NF). Typically, online transaction processing (OLTP) databases are normalized to the third normal form (3NF). A table normalized to the 3NF usually does not have data redundancies and the table represents only a single entity.
A tedious process is to pull data from one table, and store them in a flat file like an Excel sheet. You’d continue pulling more related data from another table and add them to the spreadsheet. Repeat the process until you get all the data you need. Imagine you’re dealing with not just two or three tables but a hundred tables and pulling thousands or millions of data. That would be a DBA’s nightmare!
Fortunately, there’s an easier and more efficient way. RDBMS allows you to combine tables into a virtual table through a process call JOIN
but using four kinds of JOIN
statements or clauses:
INNER JOIN
(orJOIN
) – joins two tables based on a common field between them and returns all rows where there is a match between them.OUTER JOIN
(bothLEFT
andRIGHT
)LEFT JOIN
– Returns all rows from left table, and only matched rows from right.RIGHT JOIN
– Returns all rows from right table, and only matched rows from left.
FULL JOIN
orFULL OUTER JOIN
– Returns all rows where there is a match in one table.CROSS JOIN
– Joins every record on one side of theJOIN
with every record on the other side of theJOIN
resulting in a Cartesian product (ie. n X m = nm unique records).
When the LEFT JOIN
and RIGHT JOIN
are used together they yield the result as an OUTER JOIN
, so it’s redundant. It’s better to just use OUTER JOIN
to accomplish both. Thus, they’re listed under OUTER JOIN
as alternatives.
Let’s see how JOIN
works. Take the following two tables for this example.
TABLE: FILMS
FilmID (PK) | FilmName | YearMade |
1 | My Fair Lady | 1964 |
2 | Terminator | 1984 |
3 | Super-Man III | 1983 |
TABLE: ACTORS
FilmID (FK) | FirstName | LastName |
1 | Rex | Harrison |
1 | Audrey | Hepburn |
2 | Arnold | Schwarzenegger |
3 | Christopher | Reeve |
This is a small table, but it’s what a one-to-one relationship would look like. A one-to-one (or 1:1) means there are no duplicates in either table, and one record in one table matches exactly to one record in the other. If you recall your algebra or calculus, it’s like a one-to-one function.
Notice also the “FilmID” column in both tables. They both contain the same film ID of “1”. One of them is called a PRIMARY KEY
(or PK) and the other is a FOREIGN KEY
(or FK).
Say, you want to pull information based on a film, and you want My Fair Lady.
You could do this:
SELECT * FROM Films
WHERE FilmID = '1'
This will give you information the film’s ID, Title, and released date.
xxxxxxxxxx
SELECT * FROM Actors
WHERE FilmID = '1'
You get a list of all the actors who were in the film with a matching ID of “1”. In this case it would include only Rex Harrison and Audrey Hepburn.
The problem is they’re pulled separately and you’ll have to manually cut and paste them to a table. That’s laborious work! What if you can achieve it by pulling data from both tables at the same time? That’s exactly what JOIN
s do. By using a JOIN
clause, you can combine only the necessary data from both tables into a third table, a temporary table (virtual table) that you can now use as shown below:
TABLE: FILMS
FilmID (PK) | FilmName | YearMade |
1 | My Fair Lady | 1964 |
2 | The Terminator | 1984 |
3 | Super-Man III | 1983 |
TABLE: ACTORS
ActorID (PK) | FilmID (FK) | FirstName | LastName |
1 | 1 | Rex | Harrison |
2 | 1 | Audrey | Hepburn |
3 | 2 | Arnold | Schwarzenegger |
4 | 3 | Christopher | Reeve |
TABLE: TEMPORARY JOINED TABLE
FilmID | FilmName | YearMade | FirstName | LastName |
1 | My Fair Lady | 1964 | Rex | Harrison |
1 | My Fair Lady | 1964 | Audrey | Hepburn |
Since this table is not unique but has overlapping data, it is no longer a one-to-one but a one-to-many (1:M). There’s one film which maps to many actors, thus one-to-many. Nonetheless, this would be something more useful and more efficient!