Common Table Expressions (CTEs) provide a way to write auxiliary statements for use in a larger query. These auxiliary statements, essentially named subqueries, can be referenced later in the FROM clause. They make queries easier to understand by breaking them up into smaller, simpler parts.
CTEs are defined within the statement using the WITH keyword. After the WITH keyword, the CTEs are named, and a query is defined that will populate the CTE. Once defined, a CTE can be used in the main query wherever a table could be used.
Here is the basic syntax:
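As a minimal sketch of that shape, the snippet below runs a single CTE through Python's built-in sqlite3 module so that it is self-contained. The orders table, its columns, and the large_orders name are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only to make the example runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(50.0,), (150.0,), (300.0,)],
)

# General shape: WITH <cte_name> AS (<query>) <main query that reads cte_name>
query = """
WITH large_orders AS (
    SELECT id, amount
    FROM orders
    WHERE amount > 100
)
SELECT id, amount
FROM large_orders
ORDER BY amount;
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The CTE name (large_orders here) is visible only inside the statement that defines it.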
Let’s consider an example where we have a sales table, which contains sales data including the salesperson’s id, the sale date, and the sale amount.
Now, let’s say we want to find the total sales for each salesperson. We could use a CTE to calculate the total sales for each salesperson, and then reference that in our main query.
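One way this query might look is sketched below, again run through Python's sqlite3 module. The column names salesperson_id, sale_date, and amount, and the sample rows, are assumptions; the text only says the table holds an id, a date, and an amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column names for the sales table described above.
conn.execute(
    "CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-05", 100.0),
        (2, "2023-01-06", 250.0),
        (1, "2023-01-07", 200.0),
        (3, "2023-01-08", 50.0),
    ],
)

query = """
WITH sales_summary AS (
    -- One row per salesperson with the sum of their sales
    SELECT salesperson_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY salesperson_id
)
SELECT salesperson_id, total_sales
FROM sales_summary
ORDER BY total_sales DESC;
"""

summary_rows = conn.execute(query).fetchall()
print(summary_rows)
```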
In this example, sales_summary is a CTE that summarizes the total sales for each salesperson. We then use this CTE in the main query to order salespeople by their total sales.
You can also use multiple CTEs in a single query:
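A sketch of a query with two CTEs follows, under the same assumed column names as before and with generated sample data for twelve hypothetical salespeople. It uses LIMIT, which SQLite, PostgreSQL, and MySQL accept; other systems may need a different row-limiting syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount REAL)"
)
# Twelve hypothetical salespeople; salesperson i sold i * 100 in total.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, "2023-01-01", i * 100.0) for i in range(1, 13)],
)

query = """
WITH sales_summary AS (
    SELECT salesperson_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY salesperson_id
),
top_salespeople AS (
    -- A later CTE may read from an earlier one
    SELECT salesperson_id
    FROM sales_summary
    ORDER BY total_sales DESC
    LIMIT 10
)
SELECT s.salesperson_id, s.total_sales
FROM sales_summary s
JOIN top_salespeople t ON t.salesperson_id = s.salesperson_id
ORDER BY s.total_sales DESC;
"""

top_rows = conn.execute(query).fetchall()
print(top_rows)
```

CTEs are separated by commas after a single WITH keyword, and each one can refer to those defined before it.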
In this example, the first CTE (sales_summary) is the same as before. The second CTE (top_salespeople) gets the ids of the top 10 salespeople by total sales. The main query then joins these two CTEs to get the total sales for the top 10 salespeople.
How are CTEs different from subqueries?
Common Table Expressions (CTEs) and subqueries are similar in that both let you nest one query inside another and use its result in the outer query. However, a CTE is named and defined up front, while a subquery is written inline where it is used, and several differences follow that can affect their usage:
- Readability and Maintenance: CTEs can make your SQL more readable and easier to maintain, especially for complex queries. This is because they allow you to separate parts of your SQL into named, logically distinct units. Subqueries can also achieve similar effects, but they can become difficult to read and maintain when they’re nested and complex.
- Reuse: A CTE can be referenced multiple times in the same query, which is useful when the same complex subquery is needed in several places. A subquery, in contrast, must be written out each time it’s used, and that duplication makes the SQL harder to read and maintain.
- Recursive Queries: CTEs can be recursive, which means they can reference themselves. This makes it possible to write queries that can do things like traverse hierarchical data, which would be much more difficult to achieve with subqueries.
- Performance: There’s a common misconception that CTEs improve query performance. In general, CTEs and subqueries have similar performance characteristics. Some database systems treat CTEs as optimization fences, meaning the CTE query is executed separately from the main query and its results are stored temporarily, which can hurt performance for large result sets. PostgreSQL did this for every CTE before version 12; since version 12 it inlines a non-recursive, side-effect-free CTE that is referenced only once, unless the CTE is marked MATERIALIZED. It’s always a good idea to test your queries for performance.
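To illustrate the reuse point, here is a small sketch in which one CTE is referenced twice: once as the main source and once inside a scalar subquery that computes the average. The sales columns and sample rows are assumptions carried over for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (salesperson_id INTEGER, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-05", 100.0),
        (2, "2023-01-06", 250.0),
        (1, "2023-01-07", 200.0),
        (3, "2023-01-08", 50.0),
    ],
)

query = """
WITH sales_summary AS (
    SELECT salesperson_id, SUM(amount) AS total_sales
    FROM sales
    GROUP BY salesperson_id
)
SELECT salesperson_id, total_sales
FROM sales_summary                                   -- first reference
WHERE total_sales > (
    SELECT AVG(total_sales) FROM sales_summary       -- second reference
)
ORDER BY total_sales DESC;
"""

above_average = conn.execute(query).fetchall()
print(above_average)
```

Written with plain subqueries, the GROUP BY query would have to appear twice in the statement.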
So while CTEs and subqueries are similar in many ways, the choice between them often comes down to the specific use case and the complexity of the query you’re writing.
What are Recursive CTEs?
Recursive Common Table Expressions (CTEs) are a powerful feature of SQL that allow you to perform complex tasks that are otherwise difficult or impossible to do with traditional SQL queries. A common example of where recursive CTEs can be useful is dealing with hierarchical or tree-structured data.
Imagine you have an employees table that includes each employee’s id, their name, and the id of their manager, and you want to find the reporting structure under a certain employee.
A recursive CTE can help to determine the reporting hierarchy. Let’s assume that we want to find all the subordinates under an employee with the id 1.
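The sketch below shows what such a query could look like, using the employee_hierarchy CTE name and the id, name, and manager_id columns referred to in this section. The sample employees are invented, and the WITH RECURSIVE spelling is the one SQLite, PostgreSQL, and MySQL accept; some systems omit the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
# Invented sample data: two separate reporting trees rooted at ids 1 and 6.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Alice", None),
        (2, "Bob", 1),
        (3, "Carol", 1),
        (4, "Dave", 2),
        (5, "Eve", 4),
        (6, "Frank", None),
        (7, "Grace", 6),
    ],
)

query = """
WITH RECURSIVE employee_hierarchy AS (
    -- Anchor member: the starting employee
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 1

    UNION ALL

    -- Recursive member: direct reports of rows found so far
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN employee_hierarchy eh ON eh.id = e.manager_id
)
SELECT id, name, manager_id
FROM employee_hierarchy
ORDER BY id;
"""

hierarchy = conn.execute(query).fetchall()
ids = [row[0] for row in hierarchy]
print(hierarchy)
```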
The recursive CTE consists of two parts, separated by a UNION or UNION ALL operator:
- The anchor member (SELECT id, name, manager_id FROM employees WHERE id = 1), which is a non-recursive SELECT statement that forms the base result set of the CTE.
- The recursive member (SELECT e.id, e.name, e.manager_id FROM employees e INNER JOIN employee_hierarchy eh ON eh.id = e.manager_id), which is a SELECT statement that includes a reference to the CTE itself, forming the recursive part of the CTE.
In this case, the query starts with the employee with id 1, then in each subsequent recursive step, it finds and adds the employees who report directly to the employees added in the previous step. This continues until a step finds no new employees, at which point the CTE returns the original employee together with everyone who reports to them, directly or indirectly.