When working with databases, you often need to retrieve data from multiple tables. This is where SQL JOINS come in handy. JOINS allow you to combine rows from two or more tables based on a related column between them. Here are a few of the basic 'JOINS' that are most often used on a daily basis.
INNER JOIN
An INNER JOIN returns only the rows from two (or more) tables that have matching values in the column being joined on. Take, for instance, a shop that has a table of customers and a table of purchases (called 'orders' in the queries below). To track the purchases made by each customer, both tables need a column that we can reference in our JOIN clause. Most often this will be the PRIMARY KEY from our customers table (customer_id), which appears again as a FOREIGN KEY in the orders table.
A PRIMARY KEY is a column (or a set of columns) in a table that uniquely identifies each row in that table. The values in the PRIMARY KEY columns must be unique and cannot be NULL.
A FOREIGN KEY is a column (or a set of columns) in one table that refers to the PRIMARY KEY in another table. The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. It also enforces referential integrity between the child and parent tables.
While declared keys are not required for a JOIN to work (any columns with comparable values can be matched), it is generally best practice to join on the PRIMARY and FOREIGN KEY columns that relate the two tables.
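To make the key definitions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types and sample rows are assumptions for illustration, not the schema of any real shop database. It shows a FOREIGN KEY rejecting an order that points at a customer who does not exist.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces FOREIGN KEY constraints when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema and sample rows, for illustration only.
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,  -- uniquely identifies each customer
        customer_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (100, 1);
""")


def try_insert_orphan_order(connection):
    """Try to add an order for customer 999, who is not in the customers table."""
    try:
        connection.execute("INSERT INTO orders VALUES (101, 999)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


orphan_result = try_insert_orphan_order(conn)
print(orphan_result)
```

Other database systems enforce foreign keys by default; the pragma is specific to SQLite.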
From our customer example above, this is the basic syntax for using an INNER JOIN in your query:
SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;
Breaking this down...
SELECT orders.order_id, customers.customer_name : We want to retrieve the order_id column from our 'orders' table and the customer_name column from our 'customers' table.
FROM orders : The first table that we are referencing and retrieving data from.
INNER JOIN customers ON orders.customer_id = customers.customer_id; : We are joining the 'customers' table to 'orders', and the ON clause names the columns whose values must match. In our case, that is the customer_id column that appears in both tables.
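The query above can be tried end to end with Python's built-in sqlite3 module. This is a small sketch; the three customers and three orders are made-up sample rows, chosen so that one customer ('Linus') has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Linus has no orders.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
""")

# Same INNER JOIN as in the article, with an ORDER BY for a stable row order.
inner_rows = conn.execute("""
    SELECT orders.order_id, customers.customer_name
    FROM orders
    INNER JOIN customers ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id
""").fetchall()

for row in inner_rows:
    print(row)
```

Because an INNER JOIN keeps only matching rows, 'Linus' does not appear in the output at all.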
Here is another example from a current project where you can see the actual data returned.
We are looking at a mean_income table and a black_male_income table. The SELECT clause returns the year, lowest_fifth, and middle_fifth (both national annual income averages) from the mean_income table, followed by bm_avg_annual (black male average annual income). Because the source data is given as a weekly average, I multiplied it by 52 to get the annual figure.
So, by referencing the 'year' columns from both tables, we are able to retrieve the specific data from two or more tables that we want displayed in the resulting table.
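A query of that shape might look like the sketch below. The table names and the year, lowest_fifth, middle_fifth, and bm_avg_annual columns come from the description above; the weekly_avg column name is an assumption, and every number is a deliberately tiny placeholder rather than real income data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Placeholder figures for illustration only, NOT real income data.
    -- weekly_avg is a hypothetical column name.
    CREATE TABLE mean_income (year INTEGER PRIMARY KEY, lowest_fifth INTEGER, middle_fifth INTEGER);
    CREATE TABLE black_male_income (year INTEGER PRIMARY KEY, weekly_avg INTEGER);
    INSERT INTO mean_income VALUES (1999, 1000, 4000), (2000, 1100, 4100), (2001, 1200, 4200);
    INSERT INTO black_male_income VALUES (2000, 10), (2001, 20);
""")

# Join on year and convert the weekly average to an annual figure (x 52).
income_rows = conn.execute("""
    SELECT mean_income.year,
           mean_income.lowest_fifth,
           mean_income.middle_fifth,
           black_male_income.weekly_avg * 52 AS bm_avg_annual
    FROM mean_income
    INNER JOIN black_male_income ON mean_income.year = black_male_income.year
    ORDER BY mean_income.year
""").fetchall()

for row in income_rows:
    print(row)
```

The 1999 row is dropped because only mean_income has that year, which is the INNER JOIN behaviour described above.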
LEFT JOIN
A LEFT JOIN returns all rows from the left table, along with the matched rows from the right table. If a left row has no match, it is still returned, with NULL in every column that comes from the right table. Going back to our customer data, here is the basic syntax for a LEFT JOIN:
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
Here, all customers will be listed, along with their orders. If a customer has no orders, the order_id will be NULL. Referring back to my project on mean income, we can see below that because the 'black_male_income' table did not have values for the years 1991 through 1999, the values are returned as 'NULL'.
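For the customers and orders query, the NULL behaviour can be seen with a short sqlite3 sketch. The sample rows are made up for illustration, with 'Linus' as a customer who has no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Linus has no orders.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
""")

# Same LEFT JOIN as in the article, with an ORDER BY for a stable row order.
left_rows = conn.execute("""
    SELECT customers.customer_name, orders.order_id
    FROM customers
    LEFT JOIN orders ON customers.customer_id = orders.customer_id
    ORDER BY customers.customer_id, orders.order_id
""").fetchall()

for row in left_rows:
    print(row)
```

Python's sqlite3 module returns SQL NULL as None, so the row for 'Linus' carries None in the order_id position.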
RIGHT JOIN
A RIGHT JOIN returns all rows from the right table, along with the matched rows from the left table. If a right row has no match, it is still returned, with NULL in every column that comes from the left table. It is the mirror image of the LEFT JOIN: you are now keeping all rows from the right table instead of all rows from the left. Here is the basic syntax, referencing the customers and orders tables:
SELECT orders.order_id, customers.customer_name
FROM orders
RIGHT JOIN customers ON orders.customer_id = customers.customer_id;
The table referenced in the FROM clause is considered the 'left' table, and the table listed in the JOIN clause is considered the 'right' table. RIGHT JOINS are not seen as often as LEFT JOINS, but they can be helpful when you need to pull in all of your data from the 'right' table. Below is another example from the mean income project, where the only change is using a RIGHT JOIN instead of a LEFT JOIN. All of the data is pulled from the 'right' table, and any row that has no match in the 'left' table (in this case, we are matching on the year) comes back with NULL in the 'left' table's columns.
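Since left and right are decided purely by table order, a RIGHT JOIN can be rewritten as a LEFT JOIN with the two tables swapped. The sketch below checks that on made-up sample rows. SQLite only gained RIGHT JOIN in version 3.39.0, so the native form is run only when the bundled library is new enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Linus has no orders.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
""")

RIGHT_JOIN_SQL = """
    SELECT orders.order_id, customers.customer_name
    FROM orders
    RIGHT JOIN customers ON orders.customer_id = customers.customer_id
    ORDER BY customers.customer_id, orders.order_id
"""

# Same result, written as a LEFT JOIN with the tables swapped.
SWAPPED_LEFT_JOIN_SQL = """
    SELECT orders.order_id, customers.customer_name
    FROM customers
    LEFT JOIN orders ON orders.customer_id = customers.customer_id
    ORDER BY customers.customer_id, orders.order_id
"""

swapped_rows = conn.execute(SWAPPED_LEFT_JOIN_SQL).fetchall()

# RIGHT JOIN needs SQLite 3.39.0 or newer; skip it on older builds.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_rows = conn.execute(RIGHT_JOIN_SQL).fetchall()
else:
    right_rows = None

print(swapped_rows)
print(right_rows)
```

This interchangeability is one reason many people stick to LEFT JOINS and reorder the tables instead.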
FULL OUTER JOIN
A FULL JOIN (also written FULL OUTER JOIN) returns all rows from both the left and the right table. Where the join condition matches, the rows are combined; rows without a match in the other table are still included, with NULLs in the columns from the table that has no match. FULL JOINS are most commonly used when you need a comprehensive report with all rows from each table, need to audit your data, or are simply merging datasets. Here is the basic syntax for a FULL JOIN query:
SELECT customers.customer_name, orders.order_id
FROM customers
FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;
This query returns all customers and all orders. Where there is no match, the result will include NULLs. Referring back to the mean income project, here is what the FULL JOIN looks like, with NULL values on either side where the data does not match (no data for the 'left' table after 2022, and no data for the 'right' table, bm_avg_annual, prior to 2000):
It is a bit of a large screenshot, but you can see the NULL values at the top right and also at the bottom of the returned data.
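For the customers and orders query, a FULL JOIN can be sketched with sqlite3 as follows. The sample rows are made up: 'Linus' has no orders, and order 103 has no customer. Because SQLite only supports FULL OUTER JOIN from version 3.39.0, the sketch also builds the same result from a LEFT JOIN plus the unmatched orders, which is a common workaround and not something the queries above rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Linus has no orders, order 103 has no customer.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (100, 1), (102, 2), (103, NULL);
""")

FULL_JOIN_SQL = """
    SELECT customers.customer_name, orders.order_id
    FROM customers
    FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id
"""

# Workaround: every customer (LEFT JOIN), plus the orders that matched nobody.
EMULATED_SQL = """
    SELECT customers.customer_name, orders.order_id
    FROM customers
    LEFT JOIN orders ON customers.customer_id = orders.customer_id
    UNION ALL
    SELECT customers.customer_name, orders.order_id
    FROM orders
    LEFT JOIN customers ON customers.customer_id = orders.customer_id
    WHERE customers.customer_id IS NULL
"""

# Row order is not guaranteed without ORDER BY, so compare as sets.
emulated_rows = set(conn.execute(EMULATED_SQL).fetchall())

# FULL OUTER JOIN needs SQLite 3.39.0 or newer; skip it on older builds.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    full_rows = set(conn.execute(FULL_JOIN_SQL).fetchall())
else:
    full_rows = None

print(emulated_rows)
print(full_rows)
```

The unmatched customer and the unmatched order each appear once, with None (SQL NULL) standing in for the missing side.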
CONCLUSION
Understanding SQL joins is crucial for efficient data retrieval in relational databases. INNER JOINs are used when you need only the matched records, while LEFT and RIGHT JOINs include all records from one side and the matched records from the other. FULL JOINs combine all records from both tables, providing a comprehensive view of the data. Please feel free to leave a comment or feedback! I would love to hear about projects you are currently working on.