MariaDB parameters matter!

While performing the technical audit for a new Client (eCommerce website, Bitrix CMS) we quickly noticed that the average load time for their customers facing website was slow:

Top page load speed (initial readings).

attempt #1 (11.5 sec)

attempt #2 (12.9 sec)

attempt #3 (12.1 sec)

Products group page load speed (initial readings).

attempt #1 (10.7 sec)

attempt #2 (13.2 sec)

attempt #3 (12.5 sec)

Obviously those readings were beyond the acceptance levels for a website with the main intention to sell products (and to keep aggressively increasing the sales volumes). Therefore, this issue needed the immediate attention!

When this performance issue was brought up with the Client, they confirmed they're aware of it as it had been there for a long while. The Client's was pushing for the migration to a new more powerful but more expensive hardware to improve the load time. But the Roki Team were reluctant to agree with the migration plans until the proper investigation is conducted to confirm whether the hardware is really outdated or doesn't keep up with a users volume.

A better hardware would almost surely improve the figures, however, spending more money instead of addressing the route cause is not what we like doing in Roktech. Instead, the request to perform the deeper technical investigation was issued, and the Client approved such request.

Initial investigation

The starting point for the investigation was to look at the site utilization stats and to correlate those figures with the capabilities of the hosting there the front-end system was placed.

Looking at the Google Analytics statistics and grepping the server's logs quickly gave us the picture of the website utilization in terms of users/connections:

Daily average: 750 users

Simulations users: 15 (at peak)

Then the actual hosting hardware was inspected. The investigated site was hosted on a dedicated server (with no other sites being hosted there nor resources sharing). The server itself turned out to be a very decent powerful machine:

And a number of various hardware test ruled out any HW defects/issues with the server.

By having these figures collected and by looking at the performance of similar systems (using information available via public sources) the Team drew a conclusion that under the current load the existing hardware should perform 3-5 times faster, i.e. allowing to load the pages within a timeframe of 2-4 seconds.

At this point the migration to a new hardware was not the option anymore as it became obvious that the hardware was sufficient and was unlikely causing any bottlenecks.

What was it then?

The next step was to have a closer look at the actual page load process and its nature. Bitrix CMS has the built-in debugger which was very handy for this purpose. A few refreshes of the top page with the enabled debugger revelaed that the significant portion (10-11 secs) of the total load time was generated by execution of SQL queries.

This fact alone eliminated many other possible scenarios and set the fundamental vector for further investigation - we needed to look at the DB side of the system, i.e. queries, MariaDB setting etc.

And it wasn't long before we identified the type of the queries taking that long to execute:

In a short while we were already running those SQLs in the Staging environment (created as a very close copy of the Production environment) just to confirm a similarly long execution figures:

Attempt #1: Empty set (10.88978 sec)

Attempt #2: Empty set (10.55892 sec)

Attempt #3: Empty set (10.91749 sec)

So that was the problem. But why? What was the root cause?

Root cause

Before diving in deep waters of DB design and queries structures, we decided to have a quick Google shortcut/search to see if anyone had ever documented any issues with similar DB queries against MySQL/MariaDB. And... after checking a number of no-use forum threads the luck was finally on our side!

it's a known bug they have with MariaDB, just had to comment out the config about join_buffer_size, apparently anything higher than the default value somehow causes problem to that specific query.

To make it explicitly clear, here we're talking about the join_buffer_sizei parameter located inside the my.cnf configuration file for the MariaDB. And then it was not a surprise for us to find out that the value of this parameter was way higher than the default value (128K) in our case:

Commenting it out was the fix which went to the Staging first, then further deployet to the Production after the performance improvement had been verified:

Attempt #1: Empty set (1.992351 sec)

Attempt #2: Empty set (1.511927 sec)

Attempt #3: Empty set (2.027851 sec)

Top page load speed ( initial vs. post-fix readings).

attempt #1 (11.5 sec)

attempt #2 (12.9 sec)

attempt #3 (12.1 sec)

Conclusion

Not optimal configuration of MySQL/MariaDB has been known for causing performance issues. In this example changing one value in the configuration file helped to significantly speed up the load time of the poorly performing pages, and, as a result, helped to avoid of the risky and costly migration in the immediate term.

There were still a number of things in that system to address from the performance point of view to keep the load time constantly under 1.5 seconds, and we'll mention them in our future articles.