WL#3970: Safe slave positions using XA Position Participant

SUMMARY
=======
- To fix BUG#26540
- To ensure that the slave always has a correct position, we
need to update the position using two-phase commit (2PC).

IMPLEMENTATION
==============
- Create handlerton for relay-log.info file ("rli file"). Register it
for slave transactions so that it handles all updates of the rli file.
- On Prepare:
- Add a new line with XA to rli file (something like this):
"prepared: pos12456 Xid123456"
- On Commit:
- When called as a handlerton, then
remove the "prepared" line and update position.
Something like this:
rli->group_relay_log_pos= rli->group_relay_log_pos_prepared;
rli->group_xid.null();
strmov(rli->group_relay_log_name, rli->group_relay_log_name_prepared);
flush_relay_log_info(rli);
- On Rollback
- When called as a handlerton, then
remove the "prepared" line and update position
same as for commit.
(There can be non-transactional updates that
have been logged, so the position needs to be updated
for rollbacks as well.)
COMMENT BY GUILHEM:
Yes and no, there can be two types of rollbacks in STATEMENT mode:
1) on master, transaction was only about transactional engines,
and slave had to stop in the middle of it (STOP SLAVE,
mysqladmin shutdown, or some unexpected duplicate key error),
then a rollback is going to happen, but next time slave
should restart from the transaction's start.
So we must NOT update the position.
2) on master, transaction updated a non-transactional table
and then rolled back so was binlogged as:
BEGIN;
UPDATE innodb;
UPDATE myisam;
ROLLBACK;
Then this ROLLBACK, when executed on slave, must update the position.
In ROW and MIXED modes, any update on a non-transactional table will be
logged outside the context of a transaction. So there will be no need to
log rollbacks in such modes.
- On Recovery
- On recovery() - return the Xid from the relay-log.info
- On commit_by_xid() - update the position
if xid matches, otherwise ignore
(or assert, perhaps - it should not happen)
- On rollback_by_xid() - remove the prepared line
without updating the position, or assert if the xid does not match
rli->group_xid.null();
flush_relay_log_info(rli);
- Remove the old calls that update the position in relay log info
through slave.cc:flush_relay_log_info().
Updating the position now needs to be handled by the handlerton
(as described above).
- slave.cc:flush_relay_log_info() needs to have the
following lines added to store the prepared info:
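if (!rli->group_xid.is_null()) {
/* Append after the lines already written; pos points past them. */
pos=strmov(pos, rli->group_relay_log_name_prepared);
*pos++='\n';
pos=longlong2str(rli->group_relay_log_pos_prepared, pos, 10);
*pos++='\n';
/* Pseudocode: the XID has to be written in a textual form here. */
pos=strmov(pos, rli->group_xid);
*pos++='\n';
}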
- If the server decides not to use two-phase commit
(this happens when at least one participant can't do XA),
then the code needs to update the position in the
commit/rollback calls without writing the prepare line.
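
The prepare/commit/rollback/recovery steps above can be sketched as a
small state machine. This is only an illustration under stated
assumptions: RliState, flush_rli() and the rli_*() functions are
hypothetical names, the structure is an in-memory stand-in for the rli
file, and the advance_position flag is an assumed way to model the two
rollback cases from Guilhem's comment. It is not the real handlerton
interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical in-memory model of the rli file (not the real Relay_log_info).
struct RliState {
  std::string log_name;               // committed relay log name
  std::uint64_t log_pos = 0;          // committed relay log position
  std::string prepared_log_name;      // name stored on the "prepared" line
  std::uint64_t prepared_log_pos = 0; // position stored on the "prepared" line
  bool has_xid = false;               // true while a "prepared" line exists
  std::uint64_t xid = 0;              // XID stored on the "prepared" line
  int flushes = 0;                    // number of simulated flushes
};

// Stand-in for flush_relay_log_info(): only counts the call.
inline void flush_rli(RliState &rli) { ++rli.flushes; }

// Prepare: write the "prepared" line (target position plus XID).
inline void rli_prepare(RliState &rli, std::uint64_t xid,
                        const std::string &name, std::uint64_t pos) {
  assert(!rli.has_xid);  // assumption: one prepared group at a time
  rli.prepared_log_name = name;
  rli.prepared_log_pos = pos;
  rli.xid = xid;
  rli.has_xid = true;
  flush_rli(rli);
}

// Commit: remove the "prepared" line and advance the position.
inline void rli_commit(RliState &rli) {
  assert(rli.has_xid);
  rli.log_name = rli.prepared_log_name;
  rli.log_pos = rli.prepared_log_pos;
  rli.has_xid = false;
  flush_rli(rli);
}

// Rollback: remove the "prepared" line. The position advances only when
// the caller asks for it (a ROLLBACK event read from the relay log), not
// when the slave stopped in the middle of a transaction.
inline void rli_rollback(RliState &rli, bool advance_position) {
  if (advance_position && rli.has_xid) {
    rli.log_name = rli.prepared_log_name;
    rli.log_pos = rli.prepared_log_pos;
  }
  rli.has_xid = false;
  flush_rli(rli);
}

// recover(): report the prepared XID, if any.
inline bool rli_recover(const RliState &rli, std::uint64_t *xid_out) {
  if (!rli.has_xid) return false;
  *xid_out = rli.xid;
  return true;
}

// commit_by_xid(): advance the position only if the XID matches.
inline bool rli_commit_by_xid(RliState &rli, std::uint64_t xid) {
  if (!rli.has_xid || rli.xid != xid) return false;  // mismatch is ignored
  rli_commit(rli);
  return true;
}

// rollback_by_xid(): drop the "prepared" line, position unchanged.
inline bool rli_rollback_by_xid(RliState &rli, std::uint64_t xid) {
  if (!rli.has_xid || rli.xid != xid) return false;
  rli.has_xid = false;
  flush_rli(rli);
  return true;
}
```

In this sketch a mismatching XID is ignored rather than asserted on,
which is one of the two options mentioned above.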

TESTING
=======
- Testing transactional recovery
Tests should ideally cover:
a) "Crashing" (actually DBUG_EXECUTE_IFs) slave at various points
b) Testing with and without XA engines
c) Testing with mixed XA and non-XA engines
d) Testing with transactional and non-transactional engines
e) Testing with all three kinds of groups that we have:
1) statement groups (e.g. intvar + statement)
2) RBR group (table map + rbr event)
3) transaction (BEGIN, statement, COMMIT)
The text below is a modified description of the test controls that
already exist in the code. It is taken from two emails written by Serg:
Use the test control injections that exist in
handler.cc:
% grep crash_ handler.cc
DBUG_EXECUTE_IF("crash_commit_before", abort(););
DBUG_EXECUTE_IF("crash_commit_after_prepare", abort(););
DBUG_EXECUTE_IF("crash_commit_after_log", abort(););
DBUG_EXECUTE_IF("crash_commit_before_unlog", abort(););
DBUG_EXECUTE_IF("crash_commit_after", abort(););
You can crash at any specific point in two phase commit.
Two possibilities, either you start mysqld as
mysqld -#d,crash_commit_after_prepare
and it'll crash at this point on the first commit. Or you can
activate it at runtime:
SET SESSION debug="d,crash_commit_after_prepare"
for the current connection or
SET GLOBAL debug="d,crash_commit_after_prepare"
to do it for all connections. The next commit will crash at the
specified point. On restart the recovery must happen automatically,
and the transaction should be rolled back or committed, depending on
where you crash - first two "crash-points" are before a transaction is
written to binlog, meaning a rollback on recovery; others are after
binlog write, meaning a commit. So, Falcon will always be consistent
with the binlog. After a crash, even before the restart and recovery
you can examine binlog independently with mysqlbinlog to see if a
transaction was logged or not.
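
A test driver could encode the expected recovery result for each crash
point as described above. The following is a minimal sketch; the enum
and the function name are hypothetical, and only the crash-point strings
come from handler.cc.

```cpp
#include <string>

// Expected state of the transaction after restart and recovery.
enum class RecoveryOutcome { Rollback, Commit, Unknown };

// Maps a crash point to the outcome described in the text: the first two
// points are hit before the binlog write, the others after it.
inline RecoveryOutcome expected_outcome(const std::string &crash_point) {
  if (crash_point == "crash_commit_before" ||
      crash_point == "crash_commit_after_prepare")
    return RecoveryOutcome::Rollback;  // not yet in the binlog
  if (crash_point == "crash_commit_after_log" ||
      crash_point == "crash_commit_before_unlog" ||
      crash_point == "crash_commit_after")
    return RecoveryOutcome::Commit;    // already in the binlog
  return RecoveryOutcome::Unknown;     // crash point not covered by the text
}
```

A test would then compare this expectation with what mysqlbinlog and
the storage engine show after the restart.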
It is also important to be able to crash when some storage engines
have already committed and others have not, that is, in the middle of
the commit loop. Serg didn't have a "crash-point" there, as there was
only one XA-able storage engine anyway.
Here's a patch (untested):
===== handler.cc 1.313 vs edited =====
--- 1.313/sql/handler.cc 2007-06-28 00:55:29 +02:00
+++ edited/handler.cc 2007-06-28 00:54:47 +02:00
@@ -781,6 +781,7 @@ int ha_commit_one_phase(THD *thd, bool a
my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
error=1;
}
+ DBUG_EXECUTE_IF("crash_commit_between_engines", assert(ht == trans->ht););
status_var_increment(thd->status_var.ha_commit_count);
*ht= 0;
}

NOTES
=====
- Note that in the future we would like it to be possible to select
storage media for rli file (WL#2775) - either use file or use a table.
The solution above is for file, but it should be relatively easy to
change the flush to flush to table instead (if the user has
configured his system to use table instead of file). The
code to store the rli file into a table is not part of this
fix.

DRAWBACK OF THIS SOLUTION
=========================
- The rli needs to be synced twice to make it safe.
This could be controlled via options
SET SYNC_RLI_TIME=COMMIT, PREPARE_AND_COMMIT
SET SYNC_RLI_PERIOD=10 (example for every 10th transaction, not very safe)
This new sync will make a transaction sync to file
five times or so:
1. At prepare rli
2. At commit rli
3. At prepare storage engine
4. At commit storage engine (perhaps this is not needed)
5. At prepare/commit binlog (one-phase participant)
With WL#2775 the storage engine sync and the rli sync
can be done at once, provided that rli is stored in the
same storage engine as the affected tables of the transaction.
13:50 <serg> binlog needs to be synced too (once) and all affected storage
engines (innodb) - twice
13:52 <serg> there's a trivial optimization that can help immensely without
sacrificing the transactional safety
13:52 <serg> (unlike SET SYNC_RLI_PERIOD=10)
13:53 <serg> simply ignore some commits. like, issue only 1 out of 10 commits
13:53 <serg> it will make transaction larger
13:53 <serg> who cares
13:54 <serg> and more of a binlog to re-execute in a crash
13:54 <serg> ah, yes. one problem
13:54 <serg> in begin ... commit; begin ... commit; begin .. rollback;
13:55 <serg> the second commit cannot be ignored, naturally
13:55 <serg> but until rollback you don't know that
13:55 <serg> the solution could be:
13:55 <serg> instead of commit do savepoint "COMMIT_WAS_HERE"
13:56 <serg> then on rollback you rollback to savepoint
13:56 <serg> and every 10th commit you do a real commit
13:56 <serg> or every 100th
13:56 <serg> reduces the number of syncs everywhere, not only in rli
13:56 <serg> and still completely safe in case of a crash
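
Serg's idea of replacing most commits with a savepoint can be sketched
as follows. This is a toy model under stated assumptions: the class
BatchingCommitter and its members are hypothetical, changes are plain
integers, and the "durable" vector stands in for whatever has actually
been synced. It only shows the bookkeeping, not real engine calls.

```cpp
#include <cstddef>
#include <vector>

// Toy model: only every Nth logical COMMIT becomes a real commit; the
// others just move a savepoint inside one long-running transaction.
class BatchingCommitter {
 public:
  explicit BatchingCommitter(std::size_t every_n) : every_n_(every_n) {}

  // A change made inside the current logical transaction.
  void apply(int change) { pending_.push_back(change); }

  // Logical COMMIT: set the savepoint (SAVEPOINT "COMMIT_WAS_HERE") and
  // do a real commit only for every Nth call.
  void commit() {
    savepoint_ = pending_.size();
    if (++deferred_ >= every_n_) real_commit();
  }

  // Logical ROLLBACK: roll back to the savepoint, so work from earlier
  // deferred commits is kept.
  void rollback() { pending_.resize(savepoint_); }

  const std::vector<int> &durable() const { return durable_; }
  std::size_t pending() const { return pending_.size(); }
  std::size_t real_commits() const { return real_commits_; }

 private:
  // The expensive, synced commit.
  void real_commit() {
    durable_.insert(durable_.end(), pending_.begin(), pending_.end());
    pending_.clear();
    savepoint_ = 0;
    deferred_ = 0;
    ++real_commits_;
  }

  std::size_t every_n_;
  std::size_t savepoint_ = 0;     // size of pending_ at the last logical commit
  std::size_t deferred_ = 0;      // logical commits since the last real one
  std::size_t real_commits_ = 0;  // number of synced commits so far
  std::vector<int> pending_;      // changes not yet durably committed
  std::vector<int> durable_;      // changes that survived a real commit
};
```

With every_n set to 3, the sequence "commit, commit, rollback, commit"
produces one real commit and keeps the work of all three logical
commits, while the rolled-back changes are dropped.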

COMPANION TASK (Guilhem's suggestion)
=====================================
- At Recovery, the relay log could be corrupted, master.info too,
or master.info not in sync with relay log, and so we need to
automatically re-fetch the pieces of relay log starting
from the "safe position".
Otherwise it's not crash-safe (WL#3970 makes
relay-log.info trustable but not master.info...).
- If we don't do it, this can happen:
- write event to relay log; crash;
- so master.info does not account for this event,
and when slave is restarted, it fetches the
event a second time, and so event will be executed twice.
- So the idea is that if slave crashed we throw away
master.info and "replace" it with relay-log.info
which is now safe thanks to WL#3970.
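
The restart decision described above can be sketched as follows. All
names (MasterCoords, RecoveryPlan, plan_io_restart) are hypothetical,
and the sketch assumes that relay-log.info also records the master log
coordinates of the last applied group, so that they can stand in for
master.info after a crash.

```cpp
#include <cstdint>
#include <string>

// Master binlog coordinates (file name and offset).
struct MasterCoords {
  std::string log_name;
  std::uint64_t log_pos;
};

// What the slave should do when it starts.
struct RecoveryPlan {
  MasterCoords fetch_from;  // where to resume fetching from the master
  bool discard_relay_logs;  // throw away possibly corrupt relay logs
};

// After a crash, ignore master.info and restart from the position stored
// in relay-log.info (the "safe position"); otherwise trust master.info.
inline RecoveryPlan plan_io_restart(bool crashed,
                                    const MasterCoords &from_master_info,
                                    const MasterCoords &from_relay_log_info) {
  RecoveryPlan plan;
  if (crashed) {
    plan.fetch_from = from_relay_log_info;
    plan.discard_relay_logs = true;   // events after this point are re-fetched
  } else {
    plan.fetch_from = from_master_info;
    plan.discard_relay_logs = false;  // existing relay logs remain usable
  }
  return plan;
}
```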