Why does MySQL replication fail?

When considering active-active multi-master, you must consider it’s foundation technology. Although MySQL replication is straightforward to setup, it can fail in a myriad of ways. Most of those are known and well understood. We can solve them only if we use the technology in the standard way.

* No built in checksums – pt-table-checksum is a must
–> There is from 5.6, but pt-table-checksum solves a different need and is still required. You could change this point to “no built in consistency checking”.

* temp tables disappear after restart
–> This applies only to statement based. Row based doesn’t replicate temp tables. And you are correct, if you modify data via temp tables – don’t use statement based.

* even with row-based mysql can fall back to statement
–> This is the case for DDL, but it’s fairly deterministic in how it works. I don’t recommend MIXED myself for this reason, because the behaviors are different (ROW: triggers execute on master, STATEMENT: triggers execute on slave).

* row-based does not include SQL in binlogs
–> Available in 5.6 as informational events.

* harder to do online schema changes by switching masters
–> The binlog_format is a session variable, pt-online-schema-change changes to STATEMENT itself for this to work. The specific issue is “not possible to do online schema change with a tiered replication architecture” since the slave in the middle breaks things.