Category Archives: Postgresql

The biggest hurdle I have had to overcome in order to use Tsung for load-testing Postgresql servers has been a conceptual mismatch between Tsung and what I wanted to do. Tsung’s model probably originates in the load-testing of web servers: everything is described in terms of user arrival rate, hits, pages, transactions, thinktimes. Database usage may not be readily described in these terms.

Before going any further, I should probably make clear I didn’t need Tsung to do performance testing. Performance testing may be easily done by throwing a specific set of SQL queries at the database server (in controlled conditions) and checking/timing the results (this could be a separate tutorial :-). Tsung gives you the tools to model proper user interaction and real-life usage, and what I have been trying to determine is a server’s load capacity.

In other words, how many multiples of our typical or target load could a particular server/set-up handle?

And this load had to be expressed in a Tsung-compatible xml file describing mainly:

alternative user sessions (with associated probabilities)

user arrival rate

Here’s a quick reminder of what Tsung transactions mean:

Different parts of a session may be grouped into transactions (Tsung-speak — nothing to do with your normal database transactions) for statistical monitoring of SQL groups. Transactions are characterised by their name, and names may be shared across sessions. This way, there are tremendous reporting possibilities, as all sessions may have a “connection” transaction offering global connection statistics, while transactions with unique names produce statistics on a specific use-case basis (e.g. complex data search, typical page load etc.).

For simplicity, I have opted to include only two “transactions” in each alternative user “session” (use-case):

a connection transaction (identified as “connection” in all “sessions”)

a SQL block transaction (with a unique, “session”-specific name)

Know your (target) usage

Here comes the obvious but important bit: you need to know your real-life or your target usage to proceed! Expressing your (target) usage in Tsung values is the only thing that binds your experiment to real life and allows some conclusions to be drawn from the tests.

The defined “sessions” should, of course, reflect your usage profile. This boils down to including a representative variety of use-cases, with the right probability factor assigned to each case.

But you also need to express the number of new “sessions” per second Tsung initiates against your system, i.e. the Tsung user arrival rate.
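As a rough illustration of turning a usage figure into an arrival rate, here is a minimal Python sketch. The target of 900 sessions per hour is a made-up number, chosen only because it works out to one new user every 4 seconds; the function name and the 1x/2x/4x/8x multipliers are likewise assumptions for the example, not something Tsung provides.

```python
# Hypothetical figures: replace with your own measured or target usage.
TARGET_SESSIONS_PER_HOUR = 900      # assumed target load (1x)
PHASE_DURATION_S = 1800             # half-hour load phases


def phase_plan(sessions_per_hour, multipliers, duration_s):
    """Return (multiplier, interarrival seconds, expected new users) per phase."""
    plan = []
    for m in multipliers:
        rate_per_s = sessions_per_hour * m / 3600.0   # new users per second
        interarrival_s = 1.0 / rate_per_s             # seconds between new users
        expected_users = rate_per_s * duration_s      # users started in the phase
        plan.append((m, interarrival_s, expected_users))
    return plan


if __name__ == "__main__":
    for m, gap, users in phase_plan(TARGET_SESSIONS_PER_HOUR, [1, 2, 4, 8], PHASE_DURATION_S):
        print(f"{m}x: one new user every {gap:.2f}s, about {users:.0f} users per phase")
```

The interarrival value is what you would then carry over, by hand, into each load phase of the scenario file.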

Adapting the scenario file

This is a quick summary of what you should edit in your Tsung scenario file to specify the desired load:

allocate different probabilities to your alternative “sessions” (do they add up to 100?)

make sure you wrap the important bits of each session into unique “transactions”
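The “do they add up to 100?” check is easy to automate. The sketch below is only an illustration: it parses a trimmed, made-up fragment (the session names and probabilities are invented) and sums the probability attribute of each session element. A real scenario file carries more around the sessions, so treat this as a starting point rather than a validator.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: session names and probabilities are made up.
SCENARIO = """
<sessions>
  <session name="complexSearch1" probability="30" type="ts_pgsql"></session>
  <session name="pageLoad1" probability="70" type="ts_pgsql"></session>
</sessions>
"""


def session_probabilities(xml_text):
    """Map each session name to its probability attribute."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): float(s.get("probability")) for s in root.iter("session")}


def check_probabilities(xml_text):
    """Return the total, raising if the sessions do not add up to 100."""
    total = sum(session_probabilities(xml_text).values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"session probabilities add up to {total}, not 100")
    return total


if __name__ == "__main__":
    print(session_probabilities(SCENARIO))
    print("total:", check_probabilities(SCENARIO))
```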

Analyzing the results

Assuming you have managed to run your tests, now comes the tricky part of interpreting your results. The Tsung helper perl script generates a multitude of graphs, but here’s a quick shortcut. The files which have been most useful to me are the following:

report.html

images/graphes-Transactions-max_sample.png

images/graphes-Transactions-mean.png

images/graphes-Users-simultaneous.png

When looking at these graphs, the two most important things to remember are the length (in seconds) of each load phase and what each phase represents. For example, the following graph (manually colored for convenience) may be divided into four sections, each representing a particular load phase (each phase lasted 1800 seconds, i.e. half an hour). This graph basically tells us things start to fall apart at 8x our target load.

The reason the interpretation of this graph is easy is that we are not using any loops in each user “session”. Each Tsung “user” simply connects, sends a particular SQL block to the server, receives some results and exits. The user arrival rate stays constant throughout a particular load phase. Statistically speaking, if the server is responding properly, the number of new users in the system is always matched by the number of users exiting. Therefore, you only get simultaneous Tsung users if things start going wrong, when the server’s response times are increasing. And when you see the green and red lines splitting, things have gotten out of hand: Tsung is introducing new users which are not even able to connect!
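One way to put numbers on this reasoning is Little's law: at steady state, the average number of users in the system equals the arrival rate multiplied by the average time each user spends in it. The sketch below uses hypothetical values (2 new users per second, which would be 8x a 4-second interarrival, and a range of invented session durations) just to show how quickly simultaneous users pile up once sessions slow down.

```python
def expected_simultaneous_users(arrival_rate_per_s, mean_session_s):
    """Little's law: average users in the system = arrival rate x mean time in system."""
    return arrival_rate_per_s * mean_session_s


if __name__ == "__main__":
    # Hypothetical numbers: 2 new users per second, varying session durations.
    for session_s in (0.2, 1.0, 5.0, 30.0):
        users = expected_simultaneous_users(2.0, session_s)
        print(f"mean session {session_s:>5.1f}s -> about {users:.1f} simultaneous users")
```

With sub-second sessions the figure stays below one, which is why a healthy server shows hardly any simultaneous users in this kind of test.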

We should always, of course, check whether the server’s performance was acceptable while it was “coping” with our load. In addition to the numbers in report.html, you could get the big picture by simply looking at images/graphes-Transactions-max_sample.png. The horizontal line for each “session” corresponds to the longest response time ever recorded for a particular use-case.

Armed with this knowledge, you may start experimenting further. Does your server recover from brief spikes of activity (e.g. long 4x phase, brief 16x phase, 4x phase etc.)? What effect do particular server configuration changes have on load capacity? And so on… This could easily turn into a full-time job 🙂

Tsung has a “proxy mode” which records SQL statements and produces an appropriate Tsung scenario file. What could be simpler? I shall just point my web application to speak to the Tsung proxy instead of the database and I will use it to generate “typical usage” cases.

Unfortunately, this is not an option if, say, your application uses a web framework which maintains several open connections to the database server. The Tsung proxy can only handle one connection at a time. So your application does not function properly and you are not able to use it to generate the “typical usage” scenarios.

Then there is pgFouine, a PostgreSQL log analyzer which shows some promise and produces Tsung-compatible output on demand. But pgFouine principally analyzes log files to group and rank statements according to how well they perform in the database, and this approach has spilled over to Tsung scenario file generation: the order of the SQL statements is not preserved! This, by itself, perhaps would not be a problem, but I often record multiple use-cases in one go and pgFouine mixes them up.

The best way to create our test cases, therefore, is to use the log files from an idle Postgresql server, after enabling the logging of all SQL statements in the server. I have written a few scripts which help with the process, but this was after already changing the logging format of our Postgresql server to pgFouine’s requirements (syslog). Thus, the Postgresql server needs to log in this particular style:

For the changes to take effect, you need to restart the syslog service (/etc/init.d/syslog restart) and Postgresql.

You are now ready to start capturing SQL statements in the Postgresql log file. To make sure you shall be able to filter the log file into separate use-cases, you should choose a unique string identifier (e.g. ‘complex search 001’) to throw at the database server at the beginning and end of a particular use-case. You may do this by connecting to the server via ssh and typing:

echo "SELECT 'complex search 001';" | psql -U postgres

… before using your web application (which must be configured to talk to this particular Postgresql server). At the end of this use-case (‘complex search 001’) all you need to do is repeat the line above.
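To show why the marker trick makes the log easy to split later, here is a minimal Python sketch of the idea. It is not the perl script discussed in this tutorial and it does not parse syslog format; it simply keeps the lines found between the first two lines containing the marker. It assumes each marker statement shows up on exactly one log line, which may not hold for every logging configuration. The sample log lines are invented.

```python
def extract_use_case(lines, marker):
    """Return the lines logged between the first two occurrences of marker."""
    inside = False
    kept = []
    for line in lines:
        if marker in line:
            if inside:
                return kept      # closing marker: the use-case is complete
            inside = True        # opening marker: start collecting after it
            continue
        if inside:
            kept.append(line)
    return kept                  # no closing marker found: return what we have


if __name__ == "__main__":
    # Invented sample lines standing in for a real log file.
    log = [
        "statement: SELECT 1;",
        "statement: SELECT 'complex search 001';",
        "statement: SELECT * FROM items WHERE name LIKE 'a%';",
        "statement: SELECT count(*) FROM items;",
        "statement: SELECT 'complex search 001';",
        "statement: SELECT 2;",
    ]
    for line in extract_use_case(log, "complex search 001"):
        print(line)
```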

When you have finished recording all batches (use-cases) of SQL statements, you need to locate the postgresql log file (e.g. /var/log/postgresql/postgresql.log) and use it as input for the perl script below:

I have created syslog-filter, a simple perl script you may run from the command line, like so:

… assuming the script has permission to be executed and is located in the same directory as the postgresql.log file. This command creates complex-search-001.log, which contains only those SQL statements that belong to this use-case.

This generates a partial Tsung file in the proper format. This process needs to be repeated for every different use-case we would like to include. The resulting xml files may be concatenated into a single file, like so:

cat *.xml > my-tsung-scenario.xml

The resulting file (my-tsung-scenario.xml) will be completed into a full, valid Tsung scenario file in section 2.4. In order to run the above scripts, you obviously need a working Perl environment and the Parse::Syslog perl module, which may be installed by typing (as root):

cpan Parse::Syslog

Before proceeding any further, you may want to manually edit all occurrences of

<transaction name="requests">

…in my-tsung-scenario.xml, changing the name each time to reflect the use-case which follows. E.g.

<transaction name="complexSearch1">

Another required manual edit concerns the probability factors assigned to each use-case (session). You need to adjust the probability settings of all such occurrences:

… to get a head-tsung-scenario.xml file which we can then edit according to our needs. If we keep the existing settings, Tsung will attempt to load-test a server called myserver (the name needs to be resolvable, so please check your DNS service and/or your /etc/hosts file) from a single client, myclient, while trying to monitor hardware load on both machines. In the load section, two load phases have been defined, starting at “new user every 4 seconds” and then doubling the rate. Each of these phases is meant to last half an hour (1800s), but once the server reaches its breaking point, user sessions do not terminate properly and the duration of the current load phase is extended, as Tsung waits for all users to finish before proceeding to the next one. Once you have changed head-tsung-scenario.xml according to your needs, you may complete the generation of a new scenario file by typing:

This file (temp-tsung-scenario.xml) is actually a full valid scenario file which may be used for testing. But you probably want to tweak one or two things to make this testing relevant to your system, which is what we shall discuss in the next installment of this tutorial.

If you suddenly needed a cronnable Postgresql database update command for SQL text files, you would probably just type:

cat /path/to/some/dir/*.sql | psql -U postgres someDatabase

So, I am asking myself, have I created something pointless?

As it turns out:

pgBee keeps track of the update process. If a pgBee instance is killed, the next invocation will carry on from where the previous one has stopped. And if it finds SQL errors, it will report how far it got in the input files before quitting.

pgBee is actually faster than psql when executing SQL statements from a text file. psql took 112m (with one transaction for each statement), psql -1 took 97m (with one transaction for the entire file) but pgBee finished in 21m !!! (with one transaction per batch) That’s a whopping 898 operations per second. All tests were run on the same database server (localhost), pgBee was batching groups of 100 statements at a time and a real data file was used, with 1131753 SQL statements in total (511335 DELETEs and 567577 INSERTs).
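For anyone wanting to check the arithmetic, the per-second figures follow directly from the statement count and the run times quoted above. This small Python sketch only restates those numbers; the function name is arbitrary.

```python
TOTAL_STATEMENTS = 1131753

# Wall-clock minutes reported above for each tool.
RUNS = {"psql": 112, "psql -1": 97, "pgBee": 21}


def statements_per_second(total, minutes):
    """Average throughput over the whole run."""
    return total / (minutes * 60)


if __name__ == "__main__":
    for tool, minutes in RUNS.items():
        print(f"{tool}: {statements_per_second(TOTAL_STATEMENTS, minutes):.0f} statements/s")
```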

In a previous post, I promised some examples/tutorials on load-testing Postgresql servers with Tsung. Well, I have tried to develop a database performance testing methodology that may be: a. application-specific, and b. easily applied to different servers and configurations, to assess their relative performance.

Tsung is ideally suited for application-specific Postgresql testing, as it supports a “proxy mode” to record SQL sessions, which are then turned into a scenario file and replayed any number of times. It also supports including alternative sessions in the same scenario file, so that each simulated new user may send a different set of SQL statements, according to the probability assigned to each session.

Different parts of a session may be grouped into transactions (Tsung-speak — nothing to do with your normal database transactions) for statistical monitoring of SQL groups. Transactions are characterised by their name, and names may be shared across sessions. This way, there are tremendous reporting possibilities, as all sessions may have a “connection” transaction offering global connection statistics, while transactions with unique names produce statistics on a specific use-case basis (e.g. complex data search, typical page load etc.).

I’d say there are two main preparation stages for meaningful Postgresql load-testing with Tsung:

pgBee is a set of Java classes I wrote for automating bulk updates of Postgresql databases on Linux servers. It requires Java (doh!) and Ant (as a build/execute front-end), it is cronnable and performs very well, especially in multi-threaded mode, which takes full advantage of multi-core CPUs in modern servers. The source of inspiration for pgBee has been previously described.

All configuration is done in the settings.xml file, but some options may be set through the command line, e.g.

ant -f /path/to/build.xml -Dlock=yes -Dthreads=8 -Dparallel=yes run

pgBee processes all files it finds in a particular (in) directory and moves them to either a done directory or a rejects directory, if there were SQL errors. You’ll need to create the right directory structure and configure pgBee settings before starting. The pgBee process catches SIGTERM, SIGHUP etc. signals and exits gracefully, ready to resume from where it stopped the next time it is run. So, it should be quite reliable, in the absence of hard resets and kill -9. Having said that, I am supplying no guarantees of fitness for any purpose of any kind 🙂 Please use at your own risk.

If you need to make sure a particular set of statements is processed in the same transaction, you only have to include all statements in the same line of an input file, separated by semi-colons. There’s no limit to how many SQL statements you may include in a single line. More information about input file format, usage and configuration may be found in the downloadable tarball.

Data models are good and they are clear, if you’re the person writing the application and devising the model. Hell, sometimes, they are not clear even then! So, imagine what happens when you get someone from the street to connect to your database and read your schema in order to understand it. No chance!

Now, this is not about some poor wardriver who doesn’t know how to read the implicit relationships between tables in your model – they had it coming! But what about your legit users, working on a particular aspect of your infrastructure or application, such as developers, DBAs etc. ? How on earth do they make sense of it all when they first start?

Yes, yes, in an ideal world everything’s properly documented, but when was the last time you saw that in a real life situation? Real IT people don’t write helpful comments when they create their tables, views, functions etc. Referential integrity? Don’t make me laugh! Most developers avoid database constraints, to keep the application portable between database systems and database error messages to a bare minimum. Integrity rules are usually enforced at the application level. From a DBA’s perspective, most enterprise-level databases are big collections of seemingly unrelated tables, with no business logic in the DB system itself.

Yet another work-related post. I have been asked to write a better automatic database update system and against my natural tendencies toward Perl and Python I have opted to do it in Java. Now, previous attempts in Java had been abandoned because they were not performing very well, but I wanted to build something with potential for integration with the company’s infrastructure, so I rolled up my sleeves and decided to investigate.

A quick Google search produced some interesting discussions (please see the Interesting Links below). In summary, the official JDBC Postgresql driver does not support COPY operations and people complain that it’s slow for bulk updates. However, our update sql files are not very structured and, in fact, may contain any valid SQL code (different each time). So, COPY is not what I’d use, anyway.

Some hope for reasonable performance appeared in the form of the driver’s batch mode. So, I wrote some Java classes which read multiple lines of sql statements from an sql text file into a String buffer of configurable size. When this size is reached, these sql statements are added to the reused Statement object with addBatch() and are executed in their own transaction (I have set auto-commit to off) through executeBatch().
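The batching pattern itself is independent of Java. As a language-neutral illustration, here is a minimal Python sketch that groups statements into batches of a configurable size and commits once per batch. It is not the Java code described above: it does not use JDBC’s addBatch()/executeBatch(), and an in-memory SQLite database stands in for Postgresql purely so the example can run anywhere. The table and function names are made up.

```python
import sqlite3


def run_in_batches(conn, statements, batch_size):
    """Execute statements in groups, committing one transaction per group."""
    batches = 0
    for start in range(0, len(statements), batch_size):
        batch = statements[start:start + batch_size]
        try:
            for sql in batch:
                conn.execute(sql)
            conn.commit()            # one commit per batch
        except sqlite3.Error:
            conn.rollback()          # discard the whole failed batch
            raise
        batches += 1
    return batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")        # stand-in database, not Postgresql
    conn.execute("CREATE TABLE t (n INTEGER)")
    statements = [f"INSERT INTO t VALUES ({i})" for i in range(250)]
    print("batches:", run_in_batches(conn, statements, 100))
    print("rows:", conn.execute("SELECT count(*) FROM t").fetchone()[0])
```

Larger batches mean fewer commits, which is where most of the saving tends to come from; the price is that one bad statement throws away its whole batch.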

Now, I have tried inserting one million rows into a table using a different buffer size each time, i.e. grouping sql statements in batches of one, ten, hundred and thousand statements per transaction. The results are quite promising, don’t you think? (low spec machine, btw)

One of the nicest things I have done recently was to attend the First International Erlang eXchange in London (http://www.erlang-exchange.com/). It was jam packed with exciting information on a variety of topics, and I expect a lot of this information will be popping up over the next few weeks one way or the other. Now, one of the things I discovered at the eXchange was Tsung, a distributed performance-load-stress testing tool for http and postgresql servers written in Erlang – loads of scaling-up potential there. Now, this happens to be an important part of my new job, so please expect more on the topic very soon (real examples, tutorials etc.)

Apparently, there are subtle differences between the terms performance, load and stress testing; you may read an opinion here: