- Replication means taking data from one machine and copying it to one or more separate machines.
- Why would we want that? It serves many purposes: as part of the foundation for larger high-performance systems, keeping a “hot” spare of your server, providing a place to generate backups away from the production system, and providing a development area with real data.

-

Who knows what the binary log is? Who knows what the relay log is? Replication is based on:
- the master server keeping track of all changes in its binary log.
- the binary log serves as a written record of all events that modify database structure or content (data).
- the relay log is a log kept on the slave that consists of the events read from the master's binary log.
- the implementation is one-way and asynchronous.
  * slaves pull the information from the master
  * they do not have to be connected to the master all the time, so updates can occur over long-distance connections and even over temporary or intermittent connections such as a dial-up service.
Not too bad, and pretty easy to understand. But each step is actually multiple complex steps.

bullet 1 sub-bullet 1: Right before a transaction that alters data commits on the master...
bullet 1 sub-bullet 1 sub-sub-bullet 2: even if the transactions are interwoven on the master during execution
- Can you see any problems with this? Potentially a binary log entry could be written but never committed on the master... How? (Answer: a server crash between the write to the binary log and the commit of the transaction. When the server comes back up the transaction is rolled back, even though it is already in the binary log. This can leave master and slave out of sync.)
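
The crash window described above can be sketched with a toy model. This is only an illustration: the `Master` class, its attributes, and `replay_on_slave` are hypothetical names, and the model ignores any crash-recovery safeguards a real server may apply.

```python
class Master:
    """Toy master: writes to the binary log first, then commits."""

    def __init__(self):
        self.binlog = []      # events that slaves can read
        self.committed = []   # rows the storage engine has committed

    def run_transaction(self, row, crash_before_commit=False):
        # Step 1: the event is written to the binary log.
        self.binlog.append(row)
        # Step 2: a crash at this point means the commit never happens,
        # and the transaction is rolled back when the server restarts.
        if crash_before_commit:
            return False
        self.committed.append(row)
        return True


def replay_on_slave(binlog):
    """The slave applies every event found in the master's binary log."""
    return list(binlog)


master = Master()
master.run_transaction("row-1")
master.run_transaction("row-2", crash_before_commit=True)
slave_rows = replay_on_slave(master.binlog)

print("master:", master.committed)
print("slave: ", slave_rows)
```

In this model the slave ends up with a row the master never committed, which is the out-of-sync case from the answer above.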

The slave pulls the data from the master, rather than the master pushing the data to the slave. This happens for each slave.
bullet 1: The state of this thread is shown as Slave_IO_Running in the output of SHOW SLAVE STATUS or as Slave_running in the output of SHOW STATUS.
bullet 3: This thread is identified in the output of SHOW PROCESSLIST on the master as the Binlog Dump thread. It acquires a lock on the master's binary log for reading each event that is to be sent to the slave. As soon as the event has been read, the lock is released, even before the event is sent to the slave.

bullet 4: You need to know about this for security reasons. If a statement makes it into the relay log, it will be executed.
bullet 5: The master can be writing to the binary log from N threads (in parallel), but the slave has only one thread to replay all the commands run on the master (serially). The slave should be more powerful than the master: it does everything the master does *and* its own workload.

-

also called logical replication
bullet 1: replicates entire SQL statements
bullet 2 sub-bullet 2: the SQL is written to the log, not all the rows changed and how they are changed
bullet 2 sub-bullet 3: the logs contain all statements that made any changes
bullet 3 sub-bullet 1: Definition of deterministic: guaranteed output for a given input. Unfortunately there are quite a few nondeterministic cases. Examples:
1) DELETE and UPDATE statements that use a LIMIT clause without an ORDER BY
2) using any of the following functions: UUID(), UUID_SHORT(), USER(), FOUND_ROWS(), LOAD_FILE(), MASTER_POS_WAIT(), SLEEP(), VERSION(), etc.
bullet 3 sub-bullet 2: Examples: INSERT ... SELECT requires a greater number of row-level locks; UPDATE statements that require a table scan (because no index is used in the WHERE clause) must lock a greater number of rows
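
To make the first nondeterministic example concrete, here is a minimal sketch of why a LIMIT without an ORDER BY is unsafe when only the statement is replayed. The helper `delete_with_limit` and the row orders are assumptions for illustration; the sketch models "delete the first N matching rows in whatever order the server happens to scan them".

```python
def delete_with_limit(rows, predicate, limit):
    """Mimic DELETE ... WHERE ... LIMIT n without ORDER BY:
    remove the first `limit` matching rows in scan order."""
    kept = []
    deleted = 0
    for row in rows:
        if deleted < limit and predicate(row):
            deleted += 1          # this row is deleted
        else:
            kept.append(row)      # this row survives
    return kept


def is_even(value):
    return value % 2 == 0


# Same data on both servers, but scanned in a different order (assumption).
master_rows = [1, 2, 3, 4]
slave_rows = [4, 3, 2, 1]

master_after = delete_with_limit(master_rows, is_even, limit=1)
slave_after = delete_with_limit(slave_rows, is_even, limit=1)

print(sorted(master_after))
print(sorted(slave_after))
```

The same statement removes a different row on each server, so the two copies diverge even though both executed it successfully.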

bullet 1: Row-based binary logging logs changes in individual table rows. The master writes events to the binary log that indicate how individual table rows are changed.
bullet 2 sub-bullet 4:
- On the Master: INSERT ... SELECT, INSERT statements with AUTO_INCREMENT, UPDATE or DELETE statements with WHERE clauses that do not use keys or do not change most of the examined rows.
- On the Slave: INSERT, UPDATE, or DELETE statements
bullet 3 sub-bullet 1: SBR logs just the UPDATE statement. RBR logs each row changed by that UPDATE.
- More data means it may take longer to use the binary logs to recover the server, and the binary log is locked for longer while the data is written to it
bullet 3 sub-bullet 2:
- Examples:
- Until 5.1.29 you couldn’t read the actual statements that caused changes. After that you can use --base64-output=DECODE-ROWS and --verbose with mysqlbinlog
- Prior to 5.1.24, it was possible to get different results on the slave than on the master. This was caused by a bug in how rows were locked as they were accessed. Corrected now.
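
The size difference between the two formats can be sketched as follows. This is a deliberately simplified model: the function names are hypothetical, and real row events are binary row images rather than Python tuples.

```python
def statement_based_events(sql, affected_row_ids):
    """SBR: one event holding the SQL text, however many rows it touches."""
    return [sql]


def row_based_events(sql, affected_row_ids):
    """RBR (simplified): one change record per modified row."""
    return [("update", row_id) for row_id in affected_row_ids]


sql = "UPDATE accounts SET balance = balance * 1.01"
affected = range(1000)  # assume the statement changes 1000 rows

sbr = statement_based_events(sql, affected)
rbr = row_based_events(sql, affected)

print(len(sbr), "event(s) under SBR")
print(len(rbr), "event(s) under RBR")
```

This is the trade-off in the notes above: the row-based log carries far more data for a wide UPDATE, but each record states exactly what changed.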

bullet 3: Some examples:
- UUID() is used
- one or more tables with AUTO_INCREMENT columns are updated and a trigger or stored function is invoked
- any INSERT DELAYED is executed
- a call to a UDF is involved
- individual engines can also determine the logging format used when information in a table is updated

-

- slaves should be more powerful than the master, since they have to do all the work from the master plus all the reads sent to the slave
- the master can only expand so much. For each slave, it has to handle the connection and the sending of the binlog
- multiple layouts: Master/Slave, Master/Master (not recommended unless Hot Master/Cold Master), Pyramid, etc.

Having a copy of the master's data:
bullet 2: you can stop the slave to get a clean backup of the master's data without interfering with the availability of the public-facing system
bullet 3: using MMM you can handle failover to a “hot swap” system that has been updating to keep up with the original. No single point of failure.
bullet 4: allows you to test with real-world data to get a better idea of how your application interacts with it
bullet 5: use different storage engines on the master and the slave for the same tables to take advantage of a specific engine's abilities (full-text searching, support for transactions)

Reporting queries tend to be very different from the queries run by the application. This also gives the DBA a place to query the data and learn about it, which helps with query tuning or with spotting trends in the data (data mining). All of this is separate from the master production server, so it doesn’t interfere with its work.

- take network latency into account: the copy will not be completely “up-to-date”, but something may be better than nothing.
- Office/branch/developers/contractors can have a local copy without having access to the master

-

bullet 1: If this has not already been done, this part of master setup requires a server restart.
bullet 2: If this has not already been done, this part of slave setup requires a server restart.
bullet 3: Each slave must connect to the master using a MySQL user name and password, so there must be a user account on the master that the slave can use to connect. A dedicated replication account is not required, but be aware that the user name and password are stored in plain text in the master.info file, so it is better to create an account solely for the purposes of replication.

bullet1 sub-bullet 1: Look for File and Position in the output of SHOW MASTER STATUS.
bullet1 sub-bullet 2: Pick your poison for how you want to do this. Both methods have manual pages describing how to do it. Maybe you want to use this to test your backup procedures and see whether they work...
bullet 2 step 4: mysql> UNLOCK TABLES;
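
The File and Position values noted above are what the slave is later pointed at with CHANGE MASTER TO. As a rough sketch, assuming a hypothetical helper named `build_change_master` and made-up host, account, and coordinates, the statement could be assembled like this:

```python
def build_change_master(host, user, password, log_file, log_pos):
    """Build a CHANGE MASTER TO statement from replication coordinates.

    Hypothetical helper for illustration; quoting is kept deliberately simple.
    """
    def quote(value):
        # Escape backslashes and single quotes for a SQL string literal.
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    position = int(log_pos)
    if position < 0:
        raise ValueError("log position must not be negative")

    return (
        "CHANGE MASTER TO "
        "MASTER_HOST=" + quote(host) + ", "
        "MASTER_USER=" + quote(user) + ", "
        "MASTER_PASSWORD=" + quote(password) + ", "
        "MASTER_LOG_FILE=" + quote(log_file) + ", "
        "MASTER_LOG_POS=" + str(position) + ";"
    )


# All values below are placeholders, not taken from a real server.
statement = build_change_master(
    "master.example.com", "repl", "secret", "mysql-bin.000003", 107
)
print(statement)
```

The log file and position arguments correspond to the File and Position columns; the remaining arguments identify the master and the replication account.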

Bold is all that is really required. Items in [] are optional configs if you need them. Not all options are shown.

-

Known Gotcha: default database and qualified tables (database.table) can cause a query to not be replicated when you think it should.
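
A minimal sketch of this gotcha, assuming statement-based logging and a replicate-do-db filter: the database-level check looks at the default database (the one selected with USE), not at the database named in a qualified table. The function name and the database names are hypothetical.

```python
def passes_do_db_filter(default_db, replicate_do_db):
    """Statement-based check: only the default database is compared
    against the replicate-do-db list; qualified names are not inspected."""
    if not replicate_do_db:
        return True               # no filter configured: replicate everything
    return default_db in replicate_do_db


do_db = {"sales"}

# USE prices; UPDATE sales.january SET amount = amount + 1000;
# The table is in `sales`, but the default database is `prices`.
skipped = passes_do_db_filter("prices", do_db)

# USE sales; UPDATE prices.march SET amount = amount - 25;
# The table is in `prices`, but the default database is `sales`.
replicated = passes_do_db_filter("sales", do_db)

print("update to sales.january replicated:", skipped)
print("update to prices.march replicated:", replicated)
```

Both outcomes are the opposite of what the qualified table name suggests, which is why the notes flag it.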

-

Slave_IO_State: a copy of the State field of the SHOW PROCESSLIST output for the slave I/O thread.
Master_Log_File: master binlog file from which the I/O thread is currently reading.
Read_Master_Log_Pos: position in the current master binlog file that the I/O thread has read to.
Relay_Log_File: relay log file from which the SQL thread is currently *reading* and executing.
Relay_Log_Pos: position in the relay log file up to which the SQL thread has read and executed.
Relay_Master_Log_File: name of the master binlog containing the most recent event executed by the SQL thread.

Exec_Master_Log_Pos: position in the binlog up to which the SQL thread has read and executed.
- The coordinates given by (Relay_Master_Log_File, Exec_Master_Log_Pos) in the master's binary log correspond to the coordinates given by (Relay_Log_File, Relay_Log_Pos) in the relay log.
Relay_Log_Space: total combined size of all existing relay log files.
Seconds_Behind_Master: In essence, this field measures the time difference in seconds between the slave SQL thread and the slave I/O thread. This field is an indication of how “late” the slave is:
- When the slave SQL thread is actively processing updates, this field is the number of seconds that have elapsed since the timestamp of the most recent event on the master executed by that thread.
- When the SQL thread has caught up to the slave I/O thread and is idle waiting for more events from the I/O thread, this field is zero.
Gotcha: If the network is slow, this is not a good approximation; the slave SQL thread may quite often be caught up with the slow-reading slave I/O thread, so Seconds_Behind_Master often shows a value of 0, even if the I/O thread is late compared to the master. In other words, this column is useful only for fast networks.
Last_IO_Errno/Last_IO_Error: error number and error message of the last error that caused the I/O thread to stop.
Last_SQL_Errno/Last_SQL_Error: error number and error message of the last error that caused the SQL thread to stop.
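
One possible way to act on these fields is sketched below. It assumes the SHOW SLAVE STATUS row has already been fetched into a dictionary keyed by column name; `assess_slave`, the 60-second threshold, and the sample values are all illustrative. To cover the slow-network gotcha, it optionally compares the I/O thread's coordinates with the master's own File and Position.

```python
def assess_slave(status, master_file=None, master_pos=None, max_lag=60):
    """Return a list of problems found in one SHOW SLAVE STATUS row."""
    problems = []
    if status.get("Slave_IO_Running") != "Yes":
        problems.append("I/O thread stopped: " + str(status.get("Last_IO_Error", "")))
    if status.get("Slave_SQL_Running") != "Yes":
        problems.append("SQL thread stopped: " + str(status.get("Last_SQL_Error", "")))

    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        problems.append("lag unknown (Seconds_Behind_Master is NULL)")
    elif lag > max_lag:
        problems.append("SQL thread is %d seconds behind" % lag)

    # Seconds_Behind_Master can be 0 while the I/O thread itself is late,
    # so compare what the I/O thread has read with the master's position.
    if master_file is not None and master_pos is not None:
        read_coords = (status.get("Master_Log_File"), status.get("Read_Master_Log_Pos"))
        if read_coords != (master_file, master_pos):
            problems.append("I/O thread has not read up to the master's position")
    return problems


sample = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
    "Master_Log_File": "mysql-bin.000003",
    "Read_Master_Log_Pos": 500,
}

print(assess_slave(sample))
print(assess_slave(sample, master_file="mysql-bin.000003", master_pos=9000))
```

On a busy master the two positions rarely match exactly, so a real check would probably tolerate a small gap rather than require equality.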

6.
At a high level <ul><li>On the master </li></ul><ul><ul><li>makes a change to the data </li></ul></ul><ul><ul><li>writes it to the binary log </li></ul></ul><ul><li>On the slave </li></ul><ul><ul><li>copies the master's binary log events to the relay log </li></ul></ul><ul><ul><li>replays the relay log, applying the changes </li></ul></ul>

7.
Nitty Gritty of Master Side <ul><li>Master makes a change and writes the binlog entry </li></ul><ul><ul><li>Details: </li></ul></ul><ul><ul><ul><li>it writes the changes to the binary log </li></ul></ul></ul><ul><ul><ul><li>writes the transactions serially </li></ul></ul></ul><ul><ul><ul><li>After writing to the binary log, the master tells the storage engine to commit the transaction. </li></ul></ul></ul>

8.
Enter the Slave IO thread <ul><li>Slave creates an I/O thread which connects to the master </li></ul><ul><li>Slave connects to the master just like any other client, then starts a binlog dump process </li></ul><ul><li>Master then creates a thread to send the binary log contents to a slave when the slave connects </li></ul><ul><li>Slave IO thread writes the binary log events to the slave's relay log </li></ul><ul><li>Once the slave catches up with the master, the IO thread goes to sleep and waits for the master to signal that it has new events </li></ul>

9.
Slave SQL Thread <ul><li>Separates the actual execution of the binary log events from their retrieval from the master </li></ul><ul><li>Reads and replays the events from the relay log </li></ul><ul><li>Updates the slave's data to match the master's </li></ul><ul><li>Has all privileges so it can run any query that is sent </li></ul><ul><li>Potential bottleneck </li></ul>
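
The division of labour between the two slave threads can be sketched with a toy model. The `Slave` class and its method names are hypothetical, events are plain (key, value) pairs, and both "threads" are ordinary method calls rather than real threads.

```python
class Slave:
    """Toy slave with separate I/O and SQL steps."""

    def __init__(self):
        self.relay_log = []  # events copied from the master's binary log
        self.read_pos = 0    # how far the I/O step has read in the binlog
        self.exec_pos = 0    # how far the SQL step has executed in the relay log
        self.data = {}

    def io_thread_step(self, master_binlog):
        """Copy any new binlog events into the relay log."""
        new_events = master_binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos = len(master_binlog)
        return len(new_events)

    def sql_thread_step(self):
        """Replay relay log events one at a time, in order."""
        applied = 0
        while self.exec_pos < len(self.relay_log):
            key, value = self.relay_log[self.exec_pos]
            self.data[key] = value
            self.exec_pos += 1
            applied += 1
        return applied


master_binlog = [("a", 1), ("b", 2)]
slave = Slave()

fetched = slave.io_thread_step(master_binlog)
data_before_replay = dict(slave.data)   # events fetched but not yet applied
applied = slave.sql_thread_step()

print(fetched, data_before_replay, applied, slave.data)
```

Between the two steps the events sit in the relay log without being visible in the slave's data, which is the gap that replication lag measures.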

11.
Basic Info <ul><li>3 binary log formats: </li></ul><ul><ul><li>Statement Based Replication (SBR) </li></ul></ul><ul><ul><li>Row Based Replication (RBR) </li></ul></ul><ul><ul><li>Mixed </li></ul></ul><ul><li>The format of the binary log has no relevance to how the slave handles it. The SQL thread on the slave can and will handle any binary log format given to it </li></ul><ul><li>controlled by setting the binlog_format </li></ul><ul><li>each format has its pros and cons </li></ul>

12.
Statement Based Replication (SBR) <ul><li>Has been used by all previous versions of replication </li></ul><ul><li>Pros: </li></ul><ul><ul><li>Proven </li></ul></ul><ul><ul><li>Less data written to log files. </li></ul></ul><ul><ul><li>Can be used for audit purposes </li></ul></ul><ul><li>Cons: </li></ul><ul><ul><li>Some statements are unsafe </li></ul></ul><ul><ul><ul><li>Any nondeterministic behavior is difficult to replicate </li></ul></ul></ul><ul><ul><li>More locking may be needed than with Row Based </li></ul></ul><ul><ul><li>Complex statements will have to be evaluated and executed </li></ul></ul><ul><ul><li>Deterministic UDFs must be applied on the slaves </li></ul></ul><ul><ul><li>InnoDB: INSERT statement using AUTO_INCREMENT blocks other nonconflicting INSERT statements. </li></ul></ul>

13.
Row Based Replication (RBR) <ul><li>Replicates only the changed rows </li></ul><ul><li>Pros: </li></ul><ul><ul><li>All changes can be replicated </li></ul></ul><ul><ul><li>Safest form </li></ul></ul><ul><ul><li>Same as most other RDBMS </li></ul></ul><ul><ul><li>Fewer locks required </li></ul></ul><ul><li>Cons: </li></ul><ul><ul><li>Generally more data to be logged </li></ul></ul><ul><ul><li>Some problems with older versions but fixed now </li></ul></ul><ul><ul><li>large BLOBs can take longer to replicate </li></ul></ul>

14.
Mixed Replication <ul><li>Uses both SBR and RBR </li></ul><ul><li>Statement-based logging is used by default </li></ul><ul><li>Automatically switches to row-based logging in particular cases </li></ul><ul><ul><li>http://dev.mysql.com/doc/refman/5.1/en/binary-log-mixed.html </li></ul></ul><ul><li>Can provide best of both worlds - but requires testing </li></ul>

16.
ScaleOut <ul><li>Very common use case </li></ul><ul><li>Scale out load to multiple servers </li></ul><ul><ul><li>Reads can be sent to slaves </li></ul></ul><ul><ul><li>Writes are done on the Master </li></ul></ul><ul><li>Good for high read workloads </li></ul><ul><li>Some improvement to writes if Master only writes </li></ul>
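
A read/write split of this kind is often done in the application or a proxy. The sketch below is one naive way to route statements, assuming a hypothetical `ReplicationRouter` class and server names given as plain strings; it sends SELECT statements to the slaves in round-robin order and everything else to the master.

```python
import itertools


class ReplicationRouter:
    """Naive router: reads go to slaves, writes go to the master."""

    def __init__(self, master, slaves):
        self.master = master
        # Cycle through the slaves in round-robin order (None if there are none).
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReplicationRouter("master", ["slave1", "slave2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("select * from customers"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("SELECT 1"),
]
print(targets)
```

Because replication is asynchronous, a read that must see a write made a moment ago may still need to go to the master; this sketch does not handle that case.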

29.
Table Filters 1 (flowchart)
Start (following the DB options): Any replicate-*-table options?
No: execute UPDATE and exit.
Yes: Which logging format?
Statement: for each statement that performs an update...
Row: for each update of a table row...

30.
Table Filters 2 (do/ignore) (flowchart)
Any replicate-do-table options? If yes: does the table match any of them? Yes: execute UPDATE and exit. No: continue.
Any replicate-ignore-table options? If yes: does the table match any of them? Yes: ignore UPDATE and exit. No: continue.

31.
Table Filters 3 (wild do/wild ignore) (flowchart)
Any replicate-wild-do-table options? If yes: does the table match any of them? Yes: execute UPDATE and exit. No: continue.
Any replicate-wild-ignore-table options? If yes: does the table match any of them? Yes: ignore UPDATE and exit. No: continue.

32.
Table Filters 4 (flowchart)
Is there another table to be tested? Yes: test that table against the same options.
No: Any replicate-do-table or replicate-wild-do-table options? Yes: ignore UPDATE and exit. No: execute UPDATE and exit.
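
The four table-filter flowcharts can be read as a single decision procedure. The sketch below is one interpretation of that flow, not server code: the function names are hypothetical, tables are given as "db.table" strings, and wildcard patterns treat % and _ as in LIKE without supporting escaped wildcards.

```python
import re


def _wild_match(pattern, table):
    """Match a replicate-wild-* pattern: % is any run of characters, _ is one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), table) is not None


def table_filter_decision(tables, do_table=(), ignore_table=(),
                          wild_do_table=(), wild_ignore_table=()):
    """Return "execute" or "ignore" for an update touching `tables`."""
    # Table Filters 1: with no table options at all, execute.
    if not (do_table or ignore_table or wild_do_table or wild_ignore_table):
        return "execute"
    for table in tables:
        # Table Filters 2: exact do/ignore lists.
        if table in do_table:
            return "execute"
        if table in ignore_table:
            return "ignore"
        # Table Filters 3: wildcard do/ignore lists.
        if any(_wild_match(p, table) for p in wild_do_table):
            return "execute"
        if any(_wild_match(p, table) for p in wild_ignore_table):
            return "ignore"
    # Table Filters 4: no table matched; "do" lists act as a whitelist.
    if do_table or wild_do_table:
        return "ignore"
    return "execute"


print(table_filter_decision(["sales.orders"], do_table={"sales.orders"}))
print(table_filter_decision(["sales.tmp_1"], wild_ignore_table=["sales.tmp%"]))
print(table_filter_decision(["hr.staff"], wild_do_table=["sales.%"]))
```

The last step explains a common surprise: once any do-table or wild-do-table option is set, an update that matches none of the options is ignored rather than executed.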