Menu

SQL Query explain plan and Optimization

Query optimization is determination by relational database the most efficient way to execute a given query by considering the possible query plans. Mostly query optimizer is used by databases for optimizing the query in the best possible way. User doesn't work directly with query optimizer. But some queries can be optimized by exploring its execution plan.

Prepare test data

Before we dive into queries optimizations and explain plan, let's prepare a table with test data. In our example we'll be using vehicle table which stores make, model, year and price. The vehicle table looks into the following way:

Explain command

EXPLAIN is a command that shows how tables are scanned and cost of the query. It helps to figure out why a certain query is slow. E.g. EXPLAIN can show how changing IN statement to JOIN will affect on query performance.
The EXPLAIN command shows the execution plan of a statement, how tables are scanned, how tables are joined, estimated number of rows and which JOIN method is used.

The basic syntax for EXPLAIN command is:

EXPLAIN sql_statement

where sql_statement is any SQL query for which we want to get execution plan.

Explain example for simple select

Let's select all records form the vehicle table and take a look at explain plan:

EXPLAIN SELECT * FROM vehicle;

The output will look like this:

Seq Scan on vehicle (cost=0.00..184.00 rows=10000 width=32)

rows

rows=10000 is number of selected rows

cost

cost is information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. Generally speaking it's an effort required to execute the query.
Cost for retrieving the first row is 0.00 and cost for retrieving all rows is 184.00. Cost for retrieving all rows calculates into the following way:

wherepage_cost and cpu_tuple_cost are planner-related configuration settings.seq _ page _ cost and number_of_relation_pages could be found into pg_class table.

We can retrieve the same results into the following way:

SELECT relpages*current_setting('seq_page_cost')::numeric +
reltuples*current_setting('cpu_tuple_cost')::numeric as query_cost
FROM pg_class WHERE relname='vehicle';

The output will look like:

query_cost
184

width

width is the average width of the tuple in bytes. This information is stored into pg_stats table and could be retrieved into the following way:

SELECT sum(avg_width) FROM pg_stats WHERE tablename = 'vehicle';

Average width for each column could be retrieved into the following way:

SELECT attname, avg_width FROM pg_stats WHERE tablename = 'vehicle';

The output will look into the following way:

attname

avg_width

vehicle_id

4

make

8

model

9

price

6

year

5

Explain with Analyze

Analyze collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
It should be kept in mind that EXPLAIN ANALYZE shows the actual runtime of the query and can take a long time for large tables and complex queries.
Full syntax for Analyze looks into the following way:

ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]

Detailed explanation of parameters could be found in official documentation: ANALYZE

In the output we can the the additional information such as Planning and Execution times are present.Planning time is how much time potentially it will take to run the query.Execution time how much time actually required for running the query.

How to optimize queries

Here is a list of useful materials related to optimization queries on different relational databases: