SQL EXPLAIN Explained: Stop Guessing Why Your Queries Are Slow

Reading a SQL EXPLAIN plan for the first time is a bit like looking at a digital junk drawer—it’s full of cryptic labels and numbers that don’t seem to make sense. But once you know what to look for, it’s the ultimate "cheat sheet" for making your queries run faster.

Think of EXPLAIN as the database’s way of saying: "Here’s exactly how I plan to find your data, and here’s how much it’s going to cost me."


1. The "Big Two": Explain vs. Explain Analyze

Before you dive in, you need to know which version you're looking at:

  • EXPLAIN: The "Weather Forecast." The database estimates what it will do based on statistics. It doesn't actually run the query.
  • EXPLAIN ANALYZE: The "Actual Weather." The database executes the query, measures the time, and gives you the real numbers. Warning: Don't run this on a DELETE or UPDATE statement in production unless you want those changes to actually happen!

2. Reading the "Tree"

Execution plans are usually hierarchical (a tree structure). You read them from inside-out and bottom-to-top. The most indented operations happen first, and their results are passed "up" to the next level.

Key Terms to Know:

  • Cost: This is a unitless value representing the "effort" (CPU and Disk I/O) the database thinks it needs. Don't obsess over the raw number; focus on the relative difference between steps to find the heaviest lifting.
  • Rows: This shows the number of rows the database expects to process (in EXPLAIN) or actually processed (in EXPLAIN ANALYZE). If this number is huge, you’ve found your bottleneck.
  • Width: The average size in bytes of the rows being passed through that specific step. If the width is high, you might be selecting too many columns (the "SELECT *" trap).

3. Red Flags (What to look for)

When you're scanning the plan, your goal is to find the "bottlenecks." Look for these specific operations:

The Scans

  • Seq Scan (Sequential Scan): The database is reading the entire table from start to finish. This is fine for small tables, but on a million-row table, it’s a performance killer.
  • Index Scan: The database is using an index to jump straight to the data. This is usually what you want!
  • Index Only Scan: Even better—the database found everything it needed in the index itself and didn't even have to touch the main table.

The Joins

  • Nested Loop: Good for small sets of data. It takes one row from Table A and looks for matches in Table B.
  • Hash Join: Good for large datasets. It builds a "map" of one table in memory to quickly find matches in the other.
  • Merge Join: Used when both datasets are already sorted. It's very efficient but requires that sorting step.

4. SQL EXPLAIN in Action: A Real-World Example

Imagine we have a users table with 100,000 rows. We want to find a specific person by their email.

The Query:

EXPLAIN SELECT name, email FROM users WHERE email = 'alex@example.com';

The Output:

Seq Scan on users  (cost=0.00..2240.00 rows=1 width=32)
  Filter: (email = 'alex@example.com'::text)

Breaking Down the Result:

  • Seq Scan on users: This is the "Ouch" moment. The database is doing a Sequential Scan, meaning it’s checking every single row in the table one by one to find Alex.
  • cost=0.00..2240.00:
    • 0.00 is the startup cost (how long it took to start the engine).
    • 2240.00 is the total estimated cost to complete the scan.
  • rows=1: The database expects to find only one row that matches this filter.
  • width=32: The average size of each row being passed up is 32 bytes. Since we are only selecting name and email, the database only calculates the storage space for those two specific columns.
  • Filter: This shows the specific logic being used to exclude data.

5. The "After" (The Optimization)

Now, let's say we add an index to the email column: CREATE INDEX idx_users_email ON users(email);. When we run the same EXPLAIN, the output changes drastically:

The New Output:

Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=32)
  Index Cond: (email = 'alex@example.com'::text)

Why this is better:

  • Index Scan: Instead of reading the whole book, the database used the "Index" at the back to jump straight to the correct page.
  • Total Cost (8.44 vs 2240.00): The cost dropped from 2240 to 8.44. That is a massive performance gain (roughly 265x more efficient).
  • Index Cond: This tells you exactly which index was used to satisfy the WHERE clause.

Pro Tip: If the estimated rows in a plain EXPLAIN is wildly different from the actual rows in an EXPLAIN ANALYZE, your database statistics are stale. Running ANALYZE table_name can often "fix" a slow query without you changing a single line of code.

Read more