Timezone and pt-table-checksum

Timezone and pt-table-checksum

I recently worked through an issue with a client trying to detect data drift across some servers that were located in different timezones. Unfortunately, several of the tables had timestamp fields and were set to a default value of CURRENT_TIMESTAMP. From the manual, here is how MySQL handles timezone locality with timestamp fields:

Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval.

This can result in every row being deemed out of sync when each server is using in it’s own timezone. Fortunately, there is a workaround for this as a result of how pt-table-checksum/pt-table-sync calculate their checksums. Lets look at an example:

Shell

1

2

3

4

5

6

7

8

9

Node1>PDT>select timediff(now(),convert_tz(

->now(),

->@@session.time_zone,'+00:00'

->))astz_offset;

+-----------+

|tz_offset|

+-----------+

|-07:00:00|

+-----------+

Shell

1

2

3

4

5

6

7

8

9

Node2>EDT>select timediff(now(),convert_tz(

->now(),

->@@session.time_zone,'+00:00'

->))astz_offset;

+-----------+

|tz_offset|

+-----------+

|-04:00:00|

+-----------+

Node1 is configured in PDT time while node2 is configured in EDT time. Now, lets insert some rows into a table with timestamp field configured to use CURRENT_TIMESTAMP as the default and verify that they look different when no timezone manipulation takes place:

Shell

1

2

3

4

5

6

7

8

Node1>PDT>select*from foo.bar;

+----+-------------+---------------------+

|id|data|date_created|

+----+-------------+---------------------+

|1|node1,row1|2012-09-2409:38:49|

|2|node1,row2|2012-09-2409:38:49|

|3|node1,row3|2012-09-2409:38:49|

+----+-------------+---------------------+

Shell

1

2

3

4

5

6

7

8

Node2>EDT>select*from foo.bar;

+----+-------------+---------------------+

|id|data|date_created|

+----+-------------+---------------------+

|1|node1,row1|2012-09-2412:38:49|

|2|node1,row2|2012-09-2412:38:49|

|3|node1,row3|2012-09-2412:38:49|

+----+-------------+---------------------+

So, when we run pt-table-checksum against node1, we see that even though MySQL is storing these values in UTC internally, we have “data drift” on both nodes:

This is where we can use the @@session.time_zone variable to our advantage. If both servers were set up using UTC as the timezone (which is the recommended practice – store everything in UTC and then perform the timezone logic in the application), we wouldn’t see any data drift. And as explained above, the timestamps are actually stored in UTC on the server regardless of the timezone. So, to overcome the timezone inconsistencies, we can use the –set-vars option with pt-table-checksum like this:

This effectively sets both servers to UTC (in terms of display logic) and the checksums are calculated in the same locality:

Shell

1

2

3

4

5

6

Node1>PDT>select this_crc,master_crc from percona.checksums;

+----------+------------+

|this_crc|master_crc|

+----------+------------+

|4cdfe786|4cdfe786|

+----------+------------+

Shell

1

2

3

4

5

6

Node2>EDT>select this_crc,master_crc from percona.checksums;

+----------+------------+

|this_crc|master_crc|

+----------+------------+

|4cdfe786|4cdfe786|

+----------+------------+

This same option can and should be applied to pt-table-sync as well, especially if the checksums were created without it. Since it uses the checksum table to find the chunks but still recalculates the checksums on the fly to get the actual rows in need of syncing, it will not find those rows even though pt-table-checksum reported the chunk as out of sync. You can verify this yourself looking at the actual SQL generated by pt-table-sync when looking at rows:

When you update your session timezone (SET @@session.time_zone = “+01:00” for example), you will see different output for this. Experiment with different timezones and each time, the checksum will change. However, the underlying data isn’t touched and remains the same.

In general, it is safest to always use UTC as your timezone for MySQL to prevent this extra logic from being required, but the use of –set-vars can be a very powerful tool when you need to influence the session variables on different servers for whatever reason.

Related

Author

Michael joined Percona in 2012 as a US based consultant. Prior to joining Percona, Michael spent several years in a DevOps role maintaining a SaaS application specializing in social networking. His experiences include application development and scaling, systems administration, along with database administration and design. He enjoys designing extensible and flexible solutions to problems.
When not working, he enjoys golfing, grilling, watching sports, and spending time with the family.

Share this post

Comment (1)

Hey Mike, thanks for sharing this experience. I’ve never had to manage cross-timezone servers, but this is an apt reminder to ‘plan for the future’ and so I’ll be making sure my server config files include the default_time_zone option…just in case I forgot to set the server locale to be UTC 😉