7/17/2011

In the first part of our optimization story ("How to ruin Firebird performance...") we mentioned that indices should be analyzed before dropping, even if they appeared to be very bad in IBAnalyst in terms of garbage and number of unique values.

In essence, we need to log all queries used in applications, get their execution plans and check, how indices are used.

For this we need to log all SQL traffic between applications and Firebird database.

There are 2 ways to do it in Firebird 2.5 and later: FBScanner and Trace API, and the only way in early versions of Firebird: FBScanner.

FBScanner (Firebird Scanner) is a tool that can monitor and view all traffic between Firebird and InterBase servers and their client applications.

FBScanner can log all SQL traffic to text files and external Firebird database, it includes FBScanner LogAnalyzer module to analyze SQL performance.

Why not use Trace API?

For Firebird 2.5 users using TraceAPI could be good solution, but for high-load system TraceAPI can add too much load to Firebird. FBScanner add only 5-10% overhead to general operations and, moreover, it can be installed on separate [debug] server, so users and applications will not see any difference.
Also, as you can see below, FBScanner offers convenient user interface to analyze plans and queries.

FBScanner stores all information into the single table. It uses self-links to reduce the amount of stored information and it makes raw log hard to read and understand.

To facilitate log analysis we have created new module in FBScanner 3.0 - LogAnalyzer.

At first start LogAnalyzer will create necessary indices, it can take several minutes.

After that LogAnalyzer will show the last available day in the log at the “Server Load” tab:

“Server Load” tab shows how many SQL queries were run per minute, and how much time they took to execute. Effectively it shows server load, i.e., number of queries and their execution times.

Zoom in (button in the top left corner of the tab “Server load”), drag graph by holding right-button of the mouse and select the peak you are interested to investigate – click right-button to show popup-menu

It will show you tab “All statements”, where you can browse SQL queries

Select any query to see its text and, if plan logging feature is enabled, its plan.

To follow the execution flow, you can right-click on the query and look for connection and transactions for this query

LogAnalyzer marks bold queries in the same transaction:

You can sort queries and, for example, find query with the longest execution time:

To know more about this query - double-click on it and see more details

To perform it, we can use the following script (to run it immediately after database creation script) :

set stat on; /*show statistics*/

select * from ins1mln;

update tableind1 SET MALEORFEMALE = 3;

delete from tableind1;

select count(*) from tableind1;

and store result for future analysis.

Then we will create database with the same structure - but add index for MALEORFEMAIL column

CREATE INDEX TABLEIND1_IDX1 ON TABLEIND1 (MALEORFEMALE);

As you can see, we insert to this column only 0 or 1 integer values.

Then we repeat the script with this index and compare results.

Results are in the following table:

Without index for MALEORFEMAIL

With index for MALEORFEMALE

SQL> set stat on; /*show statistics*/

SQL> select * from ins1mln;

INSERTED_CNT

============

1000000

Current memory = 10487216

Delta memory = 80560

Max memory = 12569996

Elapsed time= 13.33 sec

Buffers = 2048

Reads = 0

Writes 18756

Fetches = 7833503

SQL> update tableind1 SET MALEORFEMALE = 3;

Current memory = 76551788

Delta memory = 66064572

Max memory = 111442520

Elapsed time= 15.04 sec

Buffers = 2048

Reads = 16166

Writes 15852

Fetches = 6032307

SQL> delete from tableind1;

Current memory = 76550240

Delta memory = -1548

Max memory = 111442520

Elapsed time= 3.27 sec

Buffers = 2048

Reads = 16147

Writes 16006

Fetches = 5032277

SQL> select count(*) from tableind1;

COUNT

============

0

Current memory = 76552064

Delta memory = 1824

Max memory = 111442520

Elapsed time= 1.35 sec

Buffers = 2048

Reads = 16021

Writes 1791

Fetches = 2032278

SQL> set stat on; /*show statistics*/

SQL> select * from ins1mln;

INSERTED_CNT

============

1000000

Current memory = 10484140

Delta memory = 75524

Max memory = 12569996

Elapsed time= 23.94 sec

Buffers = 2048

Reads = 1

Writes 23942

Fetches = 11459599

SQL> update tableind1 SET MALEORFEMALE = 3;

Current memory = 76548712

Delta memory = 66064572

Max memory = 111439444

Elapsed time= 29.30 sec

Buffers = 2048

Reads = 16167

Writes 19492

Fetches = 10035948

SQL> delete from tableind1;

Current memory = 76547164

Delta memory = -1548

Max memory = 111439444

Elapsed time= 3.41 sec

Buffers = 2048

Reads = 16147

Writes 15967

Fetches = 5032277

SQL> select count(*) from tableind1;

COUNT

============

0

Current memory = 76548988

Delta memory = 1824

Max memory = 111439444

Elapsed time= 0.69 sec

Buffers = 2048

Reads = 16021

Writes 1901

Fetches = 2032278

So, bad index decreases performance by approximately 2 times while inserting or updating. Also we can see that non-optimal index greatly increases number of writes and record fetches.

Let’s get statistics for this sample database (with bad index for MALEORFEMALE) and try to find some details.

To gather statistics, we run the following command:

gstat -r e:\testfb25index.fdb > e:\teststat.txt

TABLEIND1 table and indices statistics section looks intriguing, but what useful information it gives to us?

TABLEIND1 (128)

Primary pointer page: 166, Index root page: 167

Average record length: 0.00, total records: 1000000

Average version length: 27.00, total versions: 1000000, max versions: 1

Data pages: 16130, data page slots: 16130, average fill: 93%

Fill distribution:

0 - 19% = 1

20 - 39% = 0

40 - 59% = 0

60 - 79% = 0

80 - 99% = 16129

Index RDB$PRIMARY1 (0)

Depth: 3, leaf buckets: 1463, nodes: 1000000

Average data length: 1.00, total dup: 0, max dup: 0

Fill distribution:

0 - 19% = 0

20 - 39% = 0

40 - 59% = 0

60 - 79% = 1

80 - 99% = 1462

Index TABLEIND1_IDX1 (1)

Depth: 3, leaf buckets: 2873, nodes: 2000000

Average data length: 0.00, total dup: 1999997, max dup: 999999

Fill distribution:

0 - 19% = 0

20 - 39% = 1

40 - 59% = 1056

60 - 79% = 0

80 - 99% = 1816

To understand the meaning of shown numbers and percentage values, we can use IBAnalyst tool, which offers visual interpretation of database statistics:

By clicking Reports/View recommendations we can find the appropriate explanation for this index:

Bad indices count: 1.

By `bad` we name indices with many duplicate keys (90% of all keys)

and big groups of equal keys (30% of all keys). Big groups of equal

keys slowdown garbage collection - MaxEquals here is % of max groups

of keys having equal values. Index search for such an index is not

efficient. You can drop such indices (if they are not on FK

constraints).

Index ( Relation) Duplicates MaxEquals

TABLEIND1_IDX1 (TABLEIND1) : 100%, 50%

In production databases often we can see many bad indices, which can greatly affect database performance. In this example we can see table with 13 millions of records which have 7 bad indices, which are (most likely) useless and greatly decrease Firebird performance.

Usually the first move of developer is to drop these bad indices. On other hand, it’s possible that bad index was used in some particular SQL query, which required specific combination of indices (including bad one) in order to run fast enough.

Firebird SQL optimizer has complex rules, and it often happens that dropping of bad index can lead to significant performance degradation in some queries.

Certainly, these SQL queries should be rewritten to use more optimal execution plan without bad index, but there is a still a problem: how to find SQL queries which use bad index? The answer is in the next episode of our Optimization series of articles.