SQL is a declarative English-like language. This means you specify the result set you want returned and it’s up to the SQL query engine to figure out how best to retrieve the result set from the database. This is opposed to a imperative language which also defines the steps to achieve what you want. That said, understanding the order in which the SQL engine processes your query is extremely important in writing better and more efficient queries. This is where logical query processing comes in
Originally SQL was called SEQUEL which stood for “structured English query Language” but had to be renamed to SQL because of a trademark dispute with an airline. Still, the point is that you provide your instructions in an English-like manner. For example, consider the instruction, “Bring me all the cake from the cupboard”. Observe that in the instruction in English, the object comes before the location. A similar query in SQL would take the form of:
Select chocolate from Kitchen.cupboard
Now if you wanted to relay this request to a three year old (promise I don’t do this 😉) or a robot, you would need to define logical order in which to execute the steps. It makes sense that the instructions would be something like “Go into the Kitchen; go to the cupboard, open the door, get the chocolate and bring it to me”. Similarly with SQL, in order for the logical query processor to know where to retrieve the data points, it must first know which table(s) to pull them from. So contrary to the keyed order of the query above, the logical processing order is as follows:
From Kitchen.cupboard
Select chocolate
Logical Query Processing Order
If you know the concept of logical query processing well, you will understand many things about the way the language behaves.
# | Keyed Order | Logical Order |
---|---|---|
1 | Select | From |
2 | From | Where |
3 | Where | Group By |
4 | Group By | Having |
5 | Having | Select |
6 | Order By | Order |
Each logical processing step receives a table as an input and produces a table as an output. Let’s illustrate each step with using the below Customer Order table and query:
CustomerName | Address | OrderDate | OrderAmount |
---|---|---|---|
John Barret | New York | 19/02/2021 | 21 |
Kevin Clifford | Kansas | 23/01/2021 | 45 |
Mary Thompson | New Jersey | 25/07/2020 | 96 |
Denise Winfrey | New York | 04/04/2020 | 51 |
Matt Parker | California | 24/02/2021 | 33 |
Jessica Smith | California | 14/02/2021 | 75 |
Kevin Clifford | Kansas | 14/02/2021 | 178 |
Denise Winfrey | Cereal | 25/02/2021 | 37 |
John Barret | New York | 14/02/2021 | 20 |
Select CustomerName, sum(OrderAmount) as TotalSpend
From Customer
Where OrderDate > "01/01/2021"
Group By CustomerName
Having sum(OrderAmount) > 50
Order By TotalSpend
1. From Clause
The first step evaluates the FROM clause. In this case it’s “FROM Customer”. The input table and output table will be the Customer Order table above.
2. Where Clause
The second step is the WHERE clause and it takes the Customer Order table output by the FROM clause and applies the filter “OrderDate > 01/01/2021” to it to generate the following output table:
CustomerName | Address | OrderDate | OrderAmount |
---|---|---|---|
John Barret | New York | 19/02/2021 | 21 |
Kevin Clifford | Kansas | 23/01/2021 | 45 |
Matt Parker | California | 24/02/2021 | 33 |
Jessica Smith | California | 14/02/2021 | 75 |
Kevin Clifford | Kansas | 14/02/2021 | 178 |
Denise Winfrey | Cereal | 25/02/2021 | 37 |
John Barret | New York | 14/02/2021 | 20 |
3. Group By Clause
This step takes the filter table from the WHERE Clause and groups the table by customer. Within the 7 rows produced in the output table from the WHERE clause, the GROUP BY clause identifies 5 customer groups:
CustomerName | Address | OrderDate | OrderAmount |
---|---|---|---|
John Barret | New York | 19/02/2021 | 21 |
New York | 19/02/2021 | 20 | |
Kevin Clifford | Kansas | 23/01/2021 | 45 |
Kansas | 14/02/2021 | 178 | |
Matt Parker | California | 24/02/2021 | 33 |
Jessica Smith | California | 14/02/2021 | 75 |
Denise Winfrey | Cereal | 25/02/2021 | 37 |
Since we have included a GROUP BY, the final result of the query will have 1 row representing each group. Therefore, expressions in all phases that take place after the current grouping phase are
somewhat limited and must guarantee a single value per group.
4. Having Clause
The HAVING clause takes each group as a whole and applies the defined predicate ( sum(OrderAmount) > 50) to each group. This produces the resulting table:
CustomerName | Address | OrderDate | OrderAmount |
---|---|---|---|
Kevin Clifford | Kansas | 23/01/2021 | 45 |
Kansas | 14/02/2021 | 178 | |
Jessica Smith | California | 14/02/2021 | 75 |
5. SELECT Clause
Now that we the result set we want, the SELECT clause gets executed to return only the data points or columns required. As mentioned above, since we included the GROUP BY clause any expressions in the SELECT clause must guarantee 1 row per customer group. We can’t therefore include OrderAmount in the SELECT clause as it would produce 2 rows for Kevin Clifford. We can however sum across the order amounts for Kevin and get his total spend: The SELECT clause sum(OrderAmount) as TotalSpend produces the resulting table:
CustomerName | TotalSpend |
---|---|
Kevin Clifford | 223 |
Jessica Smith | 75 |
6. Order By
The sixth and final step takes the table above and orders it according to the expression in the ORDER BY clause, in this case TotalSpend. It should be noted that the ORDER BY clause is the only clause that can refer to the column alias defined in the SELECT clause. That’s because it is the only clause to be executed after the SELECT clause where the alias is created.
CustomerName | TotalSpend |
---|---|
Jessica Smith | 75 |
Kevin Clifford | 223 |
If you liked this article or found it useful please check out some of our other great articles. New articles added all the time!