Getting Started with SQL Queries

The Basics

We are going to start at the very beginning with the language of relational databases: Structured Query Language (SQL). SQL is a powerful tool that lets you interact with databases, retrieve data, and perform various operations to get the results you need. This post introduces some of the fundamental SQL clauses: SELECT, FROM, WHERE, and ORDER BY.

The SELECT and FROM clauses are the foundation of any SQL query. The SELECT clause specifies the columns you want to retrieve, and the FROM clause specifies the table from which to retrieve them. For any table, you can use the following query to retrieve all of its data (you can think of a table as something like an Excel spreadsheet):

SELECT *
FROM <table_name>

For example, I have a table called 'housing_price'. This table contains just two columns: 'date' and 'avg_price'. Using the above query, I can retrieve all of the data from my table, as shown below:

[Screenshot: PostgreSQL query with the corresponding result set below it]

This query uses the SELECT statement to retrieve data from a database table. Here’s what each part of the query does:

• SELECT * : This clause selects all columns from the table. The asterisk (*) is a wildcard character that means “all columns”.

• FROM housing_price: This clause specifies the table from which to retrieve the data, in this case, housing_price.
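To try this query without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory SQLite database. The column types and the four sample rows are assumptions made up for illustration; they are not the data from the screenshot.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing_price (date TEXT, avg_price INTEGER)")

# Hypothetical sample rows, invented for this example.
conn.executemany(
    "INSERT INTO housing_price (date, avg_price) VALUES (?, ?)",
    [
        ("2023-01-01", 185000),
        ("2023-02-01", 240000),
        ("2023-03-01", 255000),
        ("2023-04-01", 310000),
    ],
)

# SELECT * returns every column of every row in the table.
all_rows = conn.execute("SELECT * FROM housing_price").fetchall()
for row in all_rows:
    print(row)
```

Each row comes back as a tuple with one value per column, here the 'date' followed by the 'avg_price'.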

You can also specify which columns you want to see by simply replacing the '*' with the column names:

[Screenshot: PostgreSQL query selecting columns by name, with the resulting data below it]

The example above returns the same information from the 'housing_price' table as SELECT *. Selecting only 'avg_price' instead returns just that one column.

While this query may suffice for smaller datasets, it is often more helpful to filter the data when retrieving it from a much larger dataset. This is where the 'WHERE' clause comes in handy!
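The difference between naming every column and naming just one can be sketched as follows. This again assumes an in-memory SQLite database through Python's sqlite3 module, with invented sample rows standing in for the real table.

```python
import sqlite3

# In-memory SQLite stand-in for the article's PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing_price (date TEXT, avg_price INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO housing_price (date, avg_price) VALUES (?, ?)",
    [("2023-01-01", 185000), ("2023-02-01", 240000), ("2023-03-01", 255000)],
)

# Naming both columns returns the same data as SELECT *.
both_columns = conn.execute("SELECT date, avg_price FROM housing_price").fetchall()

# Naming a single column returns only that column; each row is a 1-tuple.
prices_only = conn.execute("SELECT avg_price FROM housing_price").fetchall()

print(both_columns)
print(prices_only)
```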

Filtering your Data

The 'WHERE' clause allows you to filter records from the table so that only those meeting a certain condition are returned. This is useful when you only want to retrieve specific data from your table. Here is the basic query structure that includes a 'WHERE' clause:

SELECT *
FROM <table_name>
WHERE <condition>

If we apply this to the 'housing_price' table, we could specify a condition that returns only the rows where the 'avg_price' is greater than 250,000. This would be done with the following query:

SELECT *
FROM housing_price
WHERE avg_price > 250000

The result contains only the rows where our 'avg_price' is greater than 250000. Filtering like this not only helps us sift through much larger datasets, it also reduces the amount of data returned, which usually means results come back faster. We also have the ability to determine how our data is ordered when it is returned, using the ORDER BY clause.
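As a quick check of the filter, here is a small sketch that runs the WHERE query above against an in-memory SQLite database via Python's sqlite3 module. The five sample rows are invented; one of them sits exactly at 250000 to show that '>' excludes the boundary value.

```python
import sqlite3

# In-memory SQLite stand-in for the article's PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing_price (date TEXT, avg_price INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO housing_price (date, avg_price) VALUES (?, ?)",
    [
        ("2023-01-01", 185000),
        ("2023-02-01", 240000),
        ("2023-03-01", 250000),  # exactly 250000, so '>' filters it out
        ("2023-04-01", 255000),
        ("2023-05-01", 310000),
    ],
)

# Only rows whose avg_price is strictly greater than 250000 are returned.
filtered = conn.execute(
    "SELECT * FROM housing_price WHERE avg_price > 250000"
).fetchall()
print(filtered)
```

To include the row at exactly 250000, the condition would use '>=' instead of '>'.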

Ordering your Data

Without an 'ORDER BY' clause, the database does not guarantee any particular row order. Rows often appear in the order that they are stored, but you should not rely on that. When there is a specific need to order data (e.g., sorting by date, dividing results into pages, or ranking), we can use the 'ORDER BY' clause on one or more columns. The basic syntax when using the 'ORDER BY' clause is as follows:

SELECT * (specific column names can also be used)
FROM <table_name>
ORDER BY <column_name_1> [ASC | DESC], <column_name_2> [ASC | DESC]

• SELECT column1, column2, ...: Specifies the columns to be retrieved.

• FROM table_name: Specifies the table from which to retrieve the data.

• ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...: Specifies the columns by which to sort the result set. Each column can be sorted either in ascending (ASC) or descending (DESC) order.

The first column listed in the ORDER BY clause determines how the data is primarily sorted. If several rows share the same value in that column, those rows are sorted secondarily by the next column, and so on. If the first column has all unique values, the secondary column will have no impact on the order. Each column can be sorted in ascending (ASC) or descending (DESC) order. By default, ORDER BY sorts in ascending order, so we can simply state the name of the column, as shown below:

[Screenshot: PostgreSQL query with an ORDER BY clause, and the resulting data table below the query]

Using 'DESC' after the column name sorts all of our data in descending order according to the 'avg_price' column:

[Screenshot: PostgreSQL query with an ORDER BY ... DESC clause, and the resulting data returned]
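The three ordering behaviors described above (ascending by default, descending with DESC, and a second column as a tie-breaker) can be sketched like this. As before, this assumes an in-memory SQLite database via Python's sqlite3 module, and the sample rows are invented; two of them deliberately share the same 'avg_price'.

```python
import sqlite3

# In-memory SQLite stand-in for the article's PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing_price (date TEXT, avg_price INTEGER)")
# Hypothetical sample rows; two rows share the price 240000.
conn.executemany(
    "INSERT INTO housing_price (date, avg_price) VALUES (?, ?)",
    [
        ("2023-03-01", 240000),
        ("2023-01-01", 185000),
        ("2023-02-01", 240000),
        ("2023-04-01", 310000),
    ],
)

# No keyword after the column name means ascending order.
ascending = conn.execute(
    "SELECT date, avg_price FROM housing_price ORDER BY avg_price"
).fetchall()

# DESC reverses the sort: highest price first.
descending = conn.execute(
    "SELECT date, avg_price FROM housing_price ORDER BY avg_price DESC"
).fetchall()

# Rows with the same avg_price are then ordered by date (ascending).
tie_broken = conn.execute(
    "SELECT date, avg_price FROM housing_price ORDER BY avg_price DESC, date ASC"
).fetchall()

print([price for _, price in ascending])
print([price for _, price in descending])
print(tie_broken)
```

Without the second sort column, the relative order of the two 240000 rows is not guaranteed; adding 'date ASC' makes it deterministic.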

Putting them all together...

  • SELECT avg_price: selecting only the avg_price column to be returned

  • FROM housing_price: the table where our data is stored

  • WHERE avg_price > 190000: filtering to return only prices that are greater than 190000

  • ORDER BY avg_price DESC: sort our results in descending order of the avg_price column
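The four clauses listed above combine into a single query. Here is a minimal sketch that runs it, assuming an in-memory SQLite database via Python's sqlite3 module and made-up sample rows in place of the real 'housing_price' data.

```python
import sqlite3

# In-memory SQLite stand-in for the article's PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE housing_price (date TEXT, avg_price INTEGER)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO housing_price (date, avg_price) VALUES (?, ?)",
    [
        ("2023-01-01", 185000),
        ("2023-02-01", 240000),
        ("2023-03-01", 255000),
        ("2023-04-01", 310000),
    ],
)

# One query using all four clauses from this post.
query = """
    SELECT avg_price
    FROM housing_price
    WHERE avg_price > 190000
    ORDER BY avg_price DESC
"""
result = conn.execute(query).fetchall()
print(result)  # [(310000,), (255000,), (240000,)]
```

The 185000 row is filtered out by the WHERE clause, and the remaining prices come back from highest to lowest.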

Conclusion

SQL is a powerful language for managing and retrieving data from relational databases. By understanding and using these basic SQL clauses, you can perform a wide range of data operations. As you become more familiar with SQL, you’ll be able to write more complex queries to meet your data retrieval needs. Here are a few resources that will help get you started:

Kaggle - datasets that can be downloaded and used for various SQL projects

Alex the Analyst - great content creator that has tons of data analytics tutorials

Luke Barousse - another great content creator that also has a data analytics course

Thanks for making it to the end and good luck in your SQL journey!