He could then examine the checksum.checksum table and see that all was well; however, there were several tables with differing crc values.

```
db: mydb
tbl: Foo_History
chunk: 0
boundaries: 1=1
this_crc: 30627c76fe658fd9b77eaddf1ea8c03a
this_cnt: 2593
master_crc: bdbadd7dae2636a8cf515bb886fb1295
master_cnt: 2593
ts: 2008-09-24 04:50:05
```

So, now I needed to find out what was updating the table. Here is where tools like mysqlsla and Maatkit’s mk-log-parser come into their own, as both let you quickly parse the binary log files and extract the relevant statements.

```shell
mysqlbinlog bin_log.000001 | mysqlsla -lt binary -
```

Check out http://hackmysql.com/mysqlsla_filters for how to filter by statement.

Looking through the binary logs I could see this table is actually an audit table for changes to the Foo table. The trail is kept using two triggers on that table.

```
select trigger_name,
       event_object_table,
       Event_Manipulation
from information_schema.triggers
where trigger_schema = 'mydb'
and action_statement like '%Foo_History%'\G
*************************** 1. row ***************************
      trigger_name: Foo_Update
event_object_table: Foo
Event_Manipulation: UPDATE
*************************** 2. row ***************************
      trigger_name: Foo_Delete
event_object_table: Foo
Event_Manipulation: DELETE
```
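The trigger bodies themselves aren’t shown in this post, but the shape of such an audit trigger is easy to sketch. Here is a minimal stand-in using Python’s sqlite3 (column names are taken from the Foo_History output later in the post; the table definitions and values are otherwise assumptions, and SQLite trigger syntax differs slightly from MySQL’s):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Foo (
    Field_Id INTEGER PRIMARY KEY,
    Name     TEXT,
    Value    INTEGER
);
CREATE TABLE Foo_History (
    Foo_History_Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Name           TEXT,
    Value          INTEGER,
    Field_Id       INTEGER
);
-- Sketch of an audit trigger in the spirit of Foo_Update:
-- archive the old row image on every update.
CREATE TRIGGER Foo_Update AFTER UPDATE ON Foo
BEGIN
    INSERT INTO Foo_History (Name, Value, Field_Id)
    VALUES (OLD.Name, OLD.Value, OLD.Field_Id);
END;
""")

con.execute("INSERT INTO Foo VALUES (3, 'maxlength', 7)")
con.execute("UPDATE Foo SET Value = 8 WHERE Field_Id = 3")
print(con.execute("SELECT * FROM Foo_History").fetchall())
# [(1, 'maxlength', 7, 3)]
```

The key point is that the history row’s id comes from the history table’s own auto-increment counter, not from anything in the updated row — which is what makes the ordering problem below possible.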

So what’s the problem with that? Well, there is a situation where two overlapping transactions updating the Foo table can be reordered once serialized on the slave.

Here is an example:

I recreated the tables and triggers, populating the Foo table with a handful of rows and then ran the following.

Now when these statements get run on the slave they will be serialized, thus changing the order of the inserts made by the trigger. The Foo_History table is now out of sync.

Master:

```
*************************** 1. row ***************************
Foo_History_Id: 1
          Name: maxlength
         Value: 7
      Field_Id: 3
*************************** 2. row ***************************
Foo_History_Id: 2
          Name: maxlength
         Value: 7
      Field_Id: 51
```

Slave:

```
*************************** 1. row ***************************
Foo_History_Id: 1
          Name: maxlength
         Value: 7
      Field_Id: 51
*************************** 2. row ***************************
Foo_History_Id: 2
          Name: maxlength
         Value: 7
      Field_Id: 3
```

As you can see from the above, the updates were performed in a different order, with the inserts being assigned a different Foo_History_Id. This is because the statements are written to the binary log in commit order.
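The commit-order effect can be simulated without a MySQL server. Here is a minimal sketch using Python’s sqlite3 (the table layout is modeled on the Foo_History output above; the two-row workload is an invented stand-in for the trigger inserts of the two overlapping transactions): replaying the same inserts in execution order versus commit order hands out different auto-increment ids.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Foo_History (
    Foo_History_Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Name           TEXT,
    Value          INTEGER,
    Field_Id       INTEGER
)
"""

def replay(rows):
    """Replay trigger inserts in the given order; ids are handed out sequentially."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany(
        "INSERT INTO Foo_History (Name, Value, Field_Id) VALUES (?, ?, ?)", rows)
    return con.execute(
        "SELECT Foo_History_Id, Field_Id FROM Foo_History ORDER BY Foo_History_Id"
    ).fetchall()

# On the master, the transaction touching Field_Id 3 ran its trigger first,
# but the transaction touching Field_Id 51 committed first.
master_order = [("maxlength", 7, 3), ("maxlength", 7, 51)]  # execution order on the master
binlog_order = [("maxlength", 7, 51), ("maxlength", 7, 3)]  # commit order, replayed on the slave

print(replay(master_order))  # [(1, 3), (2, 51)]
print(replay(binlog_order))  # [(1, 51), (2, 3)]
```

Same rows, same data — but the id-to-row mapping diverges, which is exactly what mk-table-checksum flagged.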


Comments (10)

I think your subject line might be a bit misleading. I mean, replication is technically behaving exactly as it should be. If MVCC rules are correct and we do not have dirty reads, then from a client perspective, the order of events on the slave is more correct.

Depending on how the replica will be used, this might be fine. From a historical perspective, the order can’t truly conflict or you’d have gotten a deadlock in the above process. So historical analysis shouldn’t be broken. If your replica exists for emergency backup/restore, then having a few lines out of order shouldn’t cause any problems.

One obvious way around this is to find a way not to have an auto-incremented key on the history table. If you can make a unique, combined key as primary key, it should be the same regardless of the insert order. If that can’t be done, then consider skipping a primary key altogether – maybe the history table isn’t accessed enough for this to be a performance issue. If you can’t do either of those, then I would suggest that you disable the crc check on this table and just have mk-table-checksum do a row count.

I admit I had problems inventing a title for this post; the truth is that there is slightly more to the story. I also saw some statements in the binary logs that were deleting rows based on the value of the auto-increment column, and therefore deleting different row data. You are correct in stating that replication is behaving as expected; however, from the point of view of both the developers and the system architect, things are a little different. They both expect the slave to contain the same data as the master.

Good points all. Another major thing here is that the client doesn’t want to have any checks showing “orange lights” that are to be ignored, because that leads to bad things — people get numb to any sort of warnings that the slave has different data.

I think this table probably doesn’t need a primary key. I had pondered how to avoid the problem but there’s no candidate key at all in the data (Ewen didn’t show the real client data), so if there’s to be a PK it has to be a surrogate.

I guess this is one of those situations where not being able to see the full story can lead to all kinds of incorrect assumptions. For instance, when I saw there were deletions on the auto_increment value I immediately jumped to the conclusion that the application is deleting audit data, which would be a major major no no.

I’m sorry, but can you clarify where OLD.Foo_History_ID comes from? Looking at your trigger definition, it looks like it comes from the table being updated, but from the result it looks like an auto_increment column on the history table.

If the former is the case, then they should be the same on the master and the slave.

If the latter is the case, then even without replication there might be a problem if the updated table also has an auto_increment column (which, as I understand it, is not the case here), as I believe MySQL doesn’t support having two auto_increments in the same “statement” (including triggers).

Of course, one should remember that the exact same issue exists, if the master is reloaded using the binlog.

I think that actually Jesper is right. In the case of statement-based replication there is also a restriction on the number of auto-increment columns used by a statement as a whole.
Please see http://bugs.mysql.com/bug.php?id=19630. Using more than one auto-increment column in a statement should be fine for row-based or mixed-mode replication, though.

Very similar to what we are doing today, although order doesn’t matter since our application orders by timestamp. Giuseppe Maxia posted something recently on Revision Engine (http://www.ddengine.org/versioneng) and we started looking at this to gain logging without writing scripts to build triggers on every table.