Logical Query Processing - The Data Engineering Academy

SQL is a declarative English-like language. This means you specify the result set you want returned and it’s up to the SQL query engine to figure out how best to retrieve the result set from the database. This is opposed to a imperative language which also defines the steps to achieve what you want. That said, understanding the order in which the SQL engine processes your query is extremely important in writing better and more efficient queries. This is where logical query processing comes in

Originally SQL was called SEQUEL which stood for “structured English query Language” but had to be renamed to SQL because of a trademark dispute with an airline. Still, the point is that you provide your instructions in an English-like manner. For example, consider the instruction, “Bring me all the cake from the cupboard”. Observe that in the instruction in English, the object comes before the location. A similar query in SQL would take the form of:

Select chocolate from Kitchen.cupboard

Now if you wanted to relay this request to a three year old (promise I don’t do this 😉) or a robot, you would need to define logical order in which to execute the steps. It makes sense that the instructions would be something like “Go into the Kitchen; go to the cupboard, open the door, get the chocolate and bring it to me”. Similarly with SQL, in order for the logical query processor to know where to retrieve the data points, it must first know which table(s) to pull them from. So contrary to the keyed order of the query above, the logical processing order is as follows:

From Kitchen.cupboard 
Select chocolate

Logical Query Processing Order

If you know the concept of logical query processing well, you will understand many things about the way the language behaves.

#	Keyed Order	Logical Order
1	Select	From
2	From	Where
3	Where	Group By
4	Group By	Having
5	Having	Select
6	Order By	Order

Key Order vs Logical Processing Order

Each logical processing step receives a table as an input and produces a table as an output. Let’s illustrate each step with using the below Customer Order table and query:

CustomerName	Address	OrderDate	OrderAmount
John Barret	New York	19/02/2021	21
Kevin Clifford	Kansas	23/01/2021	45
Mary Thompson	New Jersey	25/07/2020	96
Denise Winfrey	New York	04/04/2020	51
Matt Parker	California	24/02/2021	33
Jessica Smith	California	14/02/2021	75
Kevin Clifford	Kansas	14/02/2021	178
Denise Winfrey	Cereal	25/02/2021	37
John Barret	New York	14/02/2021	20

Customer Order table

Select CustomerName, sum(OrderAmount) as TotalSpend
From Customer
Where OrderDate > "01/01/2021"
Group By CustomerName
Having sum(OrderAmount) > 50
Order By TotalSpend

1. From Clause

The first step evaluates the FROM clause. In this case it’s “FROM Customer”. The input table and output table will be the Customer Order table above.

2. Where Clause

The second step is the WHERE clause and it takes the Customer Order table output by the FROM clause and applies the filter “OrderDate > 01/01/2021” to it to generate the following output table:

CustomerName	Address	OrderDate	OrderAmount
John Barret	New York	19/02/2021	21
Kevin Clifford	Kansas	23/01/2021	45
Matt Parker	California	24/02/2021	33
Jessica Smith	California	14/02/2021	75
Kevin Clifford	Kansas	14/02/2021	178
Denise Winfrey	Cereal	25/02/2021	37
John Barret	New York	14/02/2021	20

Customer Order table filtered on OrderDate > 01/01/2021

3. Group By Clause

This step takes the filter table from the WHERE Clause and groups the table by customer. Within the 7 rows produced in the output table from the WHERE clause, the GROUP BY clause identifies 5 customer groups:

CustomerName	Address	OrderDate	OrderAmount
John Barret	New York	19/02/2021	21
	New York	19/02/2021	20
Kevin Clifford	Kansas	23/01/2021	45
	Kansas	14/02/2021	178
Matt Parker	California	24/02/2021	33
Jessica Smith	California	14/02/2021	75
Denise Winfrey	Cereal	25/02/2021	37

Customer Order table filtered on OrderDate > 01/01/2021, grouped by Customer

Since we have included a GROUP BY, the final result of the query will have 1 row representing each group. Therefore, expressions in all phases that take place after the current grouping phase are
somewhat limited and must guarantee a single value per group.

4. Having Clause

The HAVING clause takes each group as a whole and applies the defined predicate ( sum(OrderAmount) > 50) to each group. This produces the resulting table:

CustomerName	Address	OrderDate	OrderAmount
Kevin Clifford	Kansas	23/01/2021	45
	Kansas	14/02/2021	178
Jessica Smith	California	14/02/2021	75

Customer Order table filtered on OrderDate > 01/01/2021, grouped by customers having a total spend greater than 50

5. SELECT Clause

Now that we the result set we want, the SELECT clause gets executed to return only the data points or columns required. As mentioned above, since we included the GROUP BY clause any expressions in the SELECT clause must guarantee 1 row per customer group. We can’t therefore include OrderAmount in the SELECT clause as it would produce 2 rows for Kevin Clifford. We can however sum across the order amounts for Kevin and get his total spend: The SELECT clause sum(OrderAmount) as TotalSpend produces the resulting table:

CustomerName	TotalSpend
Kevin Clifford	223
Jessica Smith	75

6. Order By

The sixth and final step takes the table above and orders it according to the expression in the ORDER BY clause, in this case TotalSpend. It should be noted that the ORDER BY clause is the only clause that can refer to the column alias defined in the SELECT clause. That’s because it is the only clause to be executed after the SELECT clause where the alias is created.

CustomerName	TotalSpend
Jessica Smith	75
Kevin Clifford	223

If you liked this article or found it useful please check out some of our other great articles. New articles added all the time!