As we have recently rolled out the new 8.4.0 version of the CUBRID Database, one of our CUBRID users has approached us with a proposal to test the new CUBRID on one of their live services which currently uses MySQL as a back-end database. It is a SNS (Social Networking Service) which operates a base of over 6 million active users. Their major queries mostly comprised of IN and UNION operators, where UNION prevailed. So they wanted to see if CUBRID's IN and UNION operators perform faster or not, and which one is actually more efficient.

This article explains the entire flow of the test we have conducted to determine the performance benefit of the new CUBRID 8.4.0 over MySQL 5.1 deployed at present on their live servers. The test environment, scenarios as well as results are discussed below.

Test Database Information

The tables used in the test

friends

posts

The indexes used in the test

friends table

unique(user_id,friend_id)

posts table

primary key(id)

index(reg_date)

index(author_id,reg_date)

Scenario

The following is a list of scenarios where users are separated based on the number of friends they have:

T1: users with 50 or less friends

T2: users with 51~2000 friends

T3: users with 2001 or more friends

T4: this scenario contains 40% of T1, 50% of T2, 10% of T3 users

T5: this scenario contains 10% of T1, 50% of T2, 40% of T3 users

When the scenarios are executed one by one, we retrieve the list of all friends and their posts in one transaction.

Then all transactions used in the scenarios are divided into those which use IN query and those which use UNION query.

Test User Groups

In this test users are divided into three groups based on the number of friends they have. Based on the real proportion of these groups retrieved from the live SNS service, each group is stored in a separate test database table as explained below.

user_group_1 table

stores 1,442,329 user IDs who have 50 or less friends

user_group_2 table

stores 85,568 user IDs who have 51~2000 friends

user_group_3 table

stores 836 user IDs who have 2001 or more friends

After each user ID is mapped with a sequentially generated ID in each of the tables, we randomly choose one of the sequentially generated ID and retrieve the mapped user ID.

The following is the SQL statements used to create and populate these user group tables.

Statistics Comparison

The major goal of this experiment was to obtain the performance difference of IN and UNION operators in the new CUBRID 8.4.0 and MySQL 5.1. As we expected, the new CUBRID 8.4.0 can leverage its improved database engine along with its new index structure and algorithm very well.

Besides this experiment, we have conducted a comparison test of the resource usage pattern between the new CUBRID 8.4.0 and previous CUBRID 8.3.1. With the same configurations we have executed the following query.

The workload was generated with 40 threads. We have been monitoring the resource usage patterns for 10 minutes. As a result the following figures were obtained (the less, the better).

CUBRID 8.4.0

CUBRID 8.4.0 without in-place sorting

CUBRID 8.3.1

Num_data_page_fetches

378

2836

229316

Num_data_page_dirties

16

2279

206182

Num_data_page_ioreads

135

2103

121851

In the above table we can see that the new CUBRID 8.4.0 uses much less resources than the previous version.

CUBRID 8.4.0 fetches over 600 times less data pages than 8.3.1.

As a result CUBRID 8.4.0 leaves 1290 times less dirty data pages than 8.3.1.

Moreover, CUBRID 8.4.0 performs for over 900 times less IO operations than 8.3.1.

More than that we can evidently notice that the new in-place sorting feature we have implemented in 8.4.0 has a significant impact on resource usage. It is compared in the second and third column figures obtain when in-place sorting was enabled, then disabled. In fact, it is not possible for a user to enable or disable it, as it is an internal feature in CUBRID, which is always ON. We have tested this ourselves to see the impact of the new in-place sorting feature.

Results Analysis

In case of the T4 test, which is the most realistic scenario, CUBRID’s IN query has performed twice faster than MySQL UNION query and 8 times faster than MySQL IN query.

In case of the T3 scenario, the performance was much higher when compared to other scenario cases because the buffer hit ratio was very close to 100%.

CUBRID 2008 R4.0 performance analysis:

The effect of Key limit has decreased the IO operations for about 50 times

When Key limit is applied the in-place sorting is resulted in approximately 10 times decrease of the IO operations.

If in-place sorting is not used, the T4 scenario results in 6 TPS which will be similar to the test results of the UNION query.

In case of the UNION query, when sorting, the partial results were repeatedly created in a temp file which had an impact on the performance which also resulted in high number of dirty pages.