Skip to content
Home » Unleashing the Power of EXPLAIN in PostgreSQL

Unleashing the Power of EXPLAIN in PostgreSQL

PostgreSQL logo

As a PostgreSQL developer, the optimization of your SQL queries is a vital task that can greatly enhance the performance of your database. This is where the commands EXPLAIN and EXPLAIN ANALYZE come into play. They provide detailed insights into how your SQL statements are executed, allowing you to spot areas for improvement.

In this article, we’ll delve into how you can use these commands and how indexing impacts your query performance.

Deciphering EXPLAIN Command

This command in PostgreSQL reveals the query planner’s execution path of a SQL statement. It outlines a map of operations, including the following key details:

  • The order of table scans.
  • The type of scans (sequential, index, etc.)
  • The type of joins used (nested loop, hash join, merge join, etc.)
  • The estimated costs and number of rows.

Exploring EXPLAIN with an Example

Let’s consider an example where we have two tables: products and orders. The goal is to find all orders along with their associated product information for a particular product.

Our SQL query would be:

SELECT orders.order_id, products.product_name, orders.order_date 
FROM orders 
INNER JOIN products ON orders.product_id = products.product_id 
WHERE products.product_id = 5;
SQL

To determine the execution plan for this query, we prepend it with the EXPLAIN command:

EXPLAIN SELECT orders.order_id, products.product_name, orders.order_date 
FROM orders 
INNER JOIN products ON orders.product_id = products.product_id 
WHERE products.product_id = 5;
SQL

This could return something like:

Hash Join  (cost=8.27..45.67 rows=120 width=58)
  Hash Cond: (orders.product_id = products.product_id)
  ->  Seq Scan on orders  (cost=0.00..32.60 rows=1260 width=36)
  ->  Hash  (cost=8.26..8.26 rows=1 width=36)
        ->  Seq Scan on products  (cost=0.00..8.26 rows=1 width=36)
              Filter: (product_id = 5)
SQL

Step-by-step breakdown

The query plan above is a representation of how the database will execute this SQL query, which involves joining the orders table with the products table on the product_id field. The database planner generates this query plan, and it is crucial to understanding the performance implications of the query.

Here’s a step-by-step breakdown of the query plan:

Line 3 – Seq Scan on orders (cost=0.00..32.60 rows=1260 width=36): This operation represents a full table scan on the orders table. The cost of this operation is estimated to be between 0.00 and 32.60. This cost is a measure of how much computational effort is required to perform the operation. The lower the cost, the less effort needed. The estimate suggests that the operation will return 1,260 rows, each row being 36 bytes wide. The database will go through each row in the orders table, without any index to speed up the process.

Line 5 – Seq Scan on products (cost=0.00..8.26 rows=1 width=36): This operation is another full table scan, but this time on the products table. The estimated cost is between 0.00 and 8.26. Here, it’s expected to return just 1 row of 36 bytes. The Filter: (product_id = 5) line indicates that this operation is looking for rows where product_id is equal to 5.

Line 4 – Hash (cost=8.26..8.26 rows=1 width=36): The rows resulting from the sequential scan on the products table are put into a hash table. This hash table allows for faster lookup during the hash join operation. The estimated cost of this operation is 8.26.

Line 1 – Hash Join (cost=8.27..45.67 rows=120 width=58): This is the final step, which joins the rows from the orders table and the hash table of products. It uses the condition specified in Hash Cond: (orders.product_id = products.product_id). The expected cost of this operation is between 8.27 and 45.67, producing an estimated 120 rows, each 58 bytes wide.

Remember that the costs are not absolute measurements, but relative values used by the PostgreSQL query planner to compare different plans and select the most efficient one. Also, these are estimates; actual rows returned or scanned may be different.

Going Beyond with EXPLAIN ANALYZE

While the EXPLAIN command gives you the planned execution path, EXPLAIN ANALYZE actually runs the query and provides real runtime statistics. This can be invaluable when trying to determine the performance of a query.

However, remember that EXPLAIN ANALYZE executes the query, so exercise caution when using it with UPDATE, DELETE, or INSERT commands, or any SELECT that may have a long execution time.

Impact of Indexing on Query Performance

Indexes are crucial for query performance. They allow the database to find and retrieve data much faster than it would if it had to scan the entire table. The influence of indexes can be seen when we analyze queries with the EXPLAIN or EXPLAIN ANALYZE commands.

Let’s suppose we create an index on the product_id field of the orders table:

CREATE INDEX idx_orders_product_id ON orders (product_id);
SQL

If we run the EXPLAIN command for the earlier query, PostgreSQL might now opt to use an index scan instead of a sequential scan, significantly reducing the query cost:

EXPLAIN SELECT orders.order_id, products.product_name, orders.order_date 
FROM orders 
INNER JOIN products ON orders.product_id = products.product_id 
WHERE products.product_id = 5;
SQL

You might see something like:

Nested Loop  (cost=0.29..16.66 rows=1 width=22)
  ->  Index Scan using idx_orders_product_id on orders  (cost=0.29..8.31 rows=1 width=16)
        Index Cond: (product_id = 5)
  ->  Materialize  (cost=0.00..8.27 rows=1 width=6)
        ->  Seq Scan on products  (cost=0.00..8.27 rows=1 width=6)
              Filter: (product_id = 5)
SQL

In line number 2, it can be observed that the index idx_orders_product_id is being looked up instead of a sequential scan. 

Conclusion

EXPLAIN and EXPLAIN ANALYZE are indispensable tools in your PostgreSQL arsenal for understanding and improving your query performance. Pair them with the right indexing strategy, and you’ll be well on your way to achieving a robust and efficient database system.

In this follow-up post, we discuss how to interpret EXPLAIN plans in greater detail.