How Big Can Your Galera Transactions Be?

While we should be aiming for small and fast transactions with Galera, at some point you may still want to run a single large transaction. What is involved in making that work?

First, this is supposed to be controlled by two settings, wsrep_max_ws_rows and wsrep_max_ws_size. The first variable is not yet enforced and has no effect – see here and here – so don’t bother tuning this knob just yet. In my opinion, only one of the two is worth implementing anyway: a row-based limit is hard to reason about as a DBA, since row sizes can vary widely between workloads.

The second variable restricts the writeset size in bytes and gives better control over cluster performance. If your network and CPU can only process N bytes per second, this is a good variable to enforce. Note that the maximum allowed value for this setting is 2GB, with a default of 1GB. You can actually set it higher than 2GB, but in my tests only 2GB was ever enforced.
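To make the cap concrete, here is a small sanity check of the arithmetic (plain Python, no server needed; `effective_ws_size` is a made-up helper for illustration, not a server API – the 2GB clamp is the behavior observed in the tests above):

```python
# wsrep_max_ws_size defaults to 1 GB and is effectively capped at 2 GB:
# you can SET it higher, but only 2 GB is enforced in practice.
GALERA_WS_SIZE_CAP = 2 * 1024 * 1024 * 1024  # 2147483648 bytes

def effective_ws_size(requested_bytes):
    """Return the writeset limit the cluster will actually enforce
    (hypothetical helper modeling the observed 2 GB clamp)."""
    return min(requested_bytes, GALERA_WS_SIZE_CAP)

print(effective_ws_size(1024 ** 3))      # the 1 GB default stays as-is
print(effective_ws_size(4 * 1024 ** 3))  # a 4 GB request is clamped to 2 GB
```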

Let’s see what happens when we raise this setting to large values. Below I have a sysbench table with 10M rows and will update all of its rows in a single transaction.
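Before attempting an update like this, a rough back-of-the-envelope check is worth doing. The sketch below is an estimate only: the per-row figure is a made-up example, and real writeset sizes depend on row content and binlog_row_image (measure your own workload via wsrep_replicated_bytes):

```python
def fits_in_writeset(rows, avg_row_bytes, ws_limit=2 * 1024 ** 3):
    """Rough check: would a single-transaction update stay under
    wsrep_max_ws_size? avg_row_bytes should account for the row
    images written to the binlog; this is an estimate, not exact."""
    return rows * avg_row_bytes <= ws_limit

# 10M rows at ~200 bytes of row-image data each is ~2 GB: borderline.
print(fits_in_writeset(10_000_000, 200))
```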

If you really must process a large number of rows, one way to reduce the writeset size is to set binlog_row_image to minimal. Let’s test this again with a 2GB wsrep_max_ws_size and see how it goes:

```shell
node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          2147483648 |
+---------------------+
1 row in set (0.00 sec)

node1 mysql> SET GLOBAL binlog_row_image=minimal;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@binlog_row_image;
+--------------------+
| @@binlog_row_image |
+--------------------+
| MINIMAL            |
+--------------------+
1 row in set (0.00 sec)

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13211964556 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> update t set k=k+1;
Query OK, 10000000 rows affected (11 min 18.33 sec)
Rows matched: 10000000  Changed: 10000000  Warnings: 0

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13402597135 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> select (13402597135-13211964556)/1024/1024 as ws_size;
+--------------+
| ws_size      |
+--------------+
| 181.80139446 |
+--------------+
1 row in set (0.01 sec)
```

In our tests, switching to minimal row images reduced the writeset size by at least 10x. Thanks to my colleague Przemyslaw for pointing this out.
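Reproducing the arithmetic from the session above makes the reduction easy to see; with a minimal row image the 10M-row update replicated only about 182 MiB, or roughly 19 bytes per row:

```python
# wsrep_replicated_bytes before and after the 10M-row UPDATE,
# taken from the session output above.
before = 13211964556
after = 13402597135

ws_mib = (after - before) / 1024 / 1024      # writeset size in MiB
per_row_bytes = (after - before) / 10_000_000  # replicated bytes per row

print(round(ws_mib, 2))        # ~181.8 MiB
print(round(per_row_bytes, 2))  # ~19.06 bytes/row
```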

Now we know how big we can go in terms of writeset size, and how to go further with a minimal row image, although this does not mean you can or should allow transactions that large. Set a sane limit on transaction size based on your workload and realistic performance expectations. Large transactions will not only cause unexpected performance issues in your cluster, but will also hurt usability through increased deadlocks. Lastly, make sure to review the limitations that apply when switching to Galera-based clusters here for an enjoyable experience :).
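If you do need to touch every row, one common pattern for keeping each transaction well under the writeset limit is to split the work into bounded id ranges and commit per chunk. A minimal sketch of the chunking logic (the table, column, and chunk size are hypothetical; each range would run as its own `UPDATE ... WHERE id BETWEEN lo AND hi` followed by a COMMIT):

```python
def chunk_ranges(min_id, max_id, chunk_size):
    """Yield inclusive (start, end) id ranges covering [min_id, max_id],
    so each UPDATE touches at most chunk_size rows."""
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield (start, end)
        start = end + 1

# Each range becomes its own small transaction, e.g.:
#   UPDATE t SET k = k + 1 WHERE id BETWEEN %s AND %s;  -- then COMMIT
for lo, hi in chunk_ranges(1, 10_000_000, 1_000_000):
    pass  # execute one UPDATE per (lo, hi) here
```

This keeps every writeset a predictable fraction of wsrep_max_ws_size and lets other write traffic interleave between chunks.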


As Senior Consultant, Jervin partners with Percona’s customers on building reliable and highly performant MySQL infrastructures while also doing other fun stuff like watching cat videos on the internet. Jervin joined Percona in Apr 2010.

Comments

I think there is another interesting thing to test when it comes to large transaction sizes.
If I’m not mistaken, write set certification is serialized, which means that if you have a transaction which modifies 10M rows and takes 2 sec to certify (for example), the cluster will have to pause all other write traffic for that duration. Or was this changed in a more recent version?

Really, none to be specific. You can check flow control and wsrep_cert_deps_distance, but these are only approximations. Based on this idea I opened https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1512940

Perhaps we can also extend slow logging to include certification time.
