What are Common Table Expressions or CTEs?

Common Table Expressions (CTEs) provide a way to write auxiliary statements for use in a larger query. These auxiliary statements, essentially named subqueries, can be referenced later in the FROM clause. They make queries easier to understand by breaking them up into smaller, simpler parts.

CTEs are defined within the statement using the WITH keyword. After the WITH keyword, the CTEs are named, and a query is defined that will populate the CTE. Once defined, a CTE can be used in the main query wherever a table could be used.

Here is the basic syntax:

WITH cte_name AS (
    -- Subquery goes here
)
-- Main query that can reference the CTE goes here

SQL

Let’s consider an example where we have a sales table, which contains sales data including the salesperson’s id, the sale date, and the sale amount.

CREATE TABLE sales (
    salesperson_id INT,
    sale_date DATE,
    sale_amount DECIMAL(10, 2)
);

SQL

Now, let’s say we want to find the total sales for each salesperson. We could use a CTE to calculate the total sales for each salesperson, and then reference that in our main query.

WITH sales_summary AS (
    SELECT salesperson_id, SUM(sale_amount) as total_sales
    FROM sales
    GROUP BY salesperson_id
)
SELECT salesperson_id, total_sales
FROM sales_summary
ORDER BY total_sales DESC;

SQL

In this example, sales_summary is a CTE that summarizes the total sales for each salesperson. We then use this CTE in the main query to order salespeople by their total sales.

You can also use multiple CTEs in a single query:

WITH sales_summary AS (
    SELECT salesperson_id, SUM(sale_amount) as total_sales
    FROM sales
    GROUP BY salesperson_id
), top_salespeople AS (
    SELECT salesperson_id
    FROM sales_summary
    ORDER BY total_sales DESC
    LIMIT 10
)
SELECT s.salesperson_id, s.total_sales
FROM sales_summary s
INNER JOIN top_salespeople t ON s.salesperson_id = t.salesperson_id;

SQL

In this example, the first CTE (sales_summary) is the same as before. The second CTE (top_salespeople) gets the ids of the top 10 salespeople by total sales. The main query then joins these two CTEs to get the total sales for the top 10 salespeople.

How are CTEs different from subqueries?

Common Table Expressions (CTEs) and subqueries are similar in that they both allow you to create named queries that can be referenced elsewhere in your SQL. However, there are several differences between them that can affect their usage:

Readability and Maintenance: CTEs can make your SQL more readable and easier to maintain, especially for complex queries. This is because they allow you to separate parts of your SQL into named, logically distinct units. Subqueries can also achieve similar effects, but they can become difficult to read and maintain when they’re nested and complex.
Reuse: A CTE can be referenced multiple times in the same query. This can be very useful when you need to use the same complex subquery multiple times. Without CTEs, you would have to duplicate the subquery, which can make your SQL hard to read and maintain. In contrast, a subquery must be written each time it’s used.
Recursive Queries: CTEs can be recursive, which means they can reference themselves. This makes it possible to write queries that can do things like traverse hierarchical data, which would be much more difficult to achieve with subqueries.
Performance: There’s a common misconception that CTEs can improve query performance. In general, CTEs and subqueries have similar performance characteristics. Some database systems, like PostgreSQL, treat CTEs as optimization fences, meaning the CTE query is executed separately from the main query and its results stored temporarily. This could lead to performance issues for large result sets. It’s always a good idea to test your queries for performance.

So while CTEs and subqueries are similar in many ways, the choice between them often comes down to the specific use case and the complexity of the query you’re writing.

What are Recursive CTEs

Recursive Common Table Expressions (CTEs) are a powerful feature of SQL that allow you to perform complex tasks that are otherwise difficult or impossible to do with traditional SQL queries. A common example of where recursive CTEs can be useful is dealing with hierarchical or tree-structured data.

Imagine you have an employees table that includes each employee’s id, their name, and the id of their manager, and you want to find the reporting structure under a certain employee.

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT
);

SQL

A recursive CTE can help to determine the reporting hierarchy. Let’s assume that we want to find all the subordinates under an employee with the id 1.

WITH RECURSIVE employee_hierarchy AS (
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 1 -- Starting point of the hierarchy
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN employee_hierarchy eh ON eh.id = e.manager_id
)
SELECT * FROM employee_hierarchy;

SQL

The recursive CTE consists of two parts, separated by the UNION ALL:

The anchor member (SELECT id, name, manager_id FROM employees WHERE id = 1), which is a non-recursive SELECT statement that forms the base result set of the CTE.
The recursive member (SELECT e.id, e.name, e.manager_id FROM employees e INNER JOIN employee_hierarchy eh ON eh.id = e.manager_id), which is a SELECT statement that includes a reference to the CTE itself, forming the recursive part of the CTE.

In this case, the query starts with the employee with id 1, then in each subsequent recursive step, it finds and adds the employees who report directly to the employees added in the previous step. This continues until no more employees can be found, at which point the CTE returns the full set of employees who are subordinates of the original employee, directly or indirectly.

What are Common Table Expressions or CTEs?

How are CTEs different from subqueries?

What are Recursive CTEs

Share this: