How Logs Work On MySQL With InnoDB Tables - What Happens With Data-Change Statements?(Page 2 of 4 )

When you change data with UPDATE, INSERT or DELETE queries you're changing the data in two places: the log buffer and the data buffers. Buffers are fixed-length, typically a multiple of 512 bytes. And they are in memory – InnoDB won't write them to disk... yet.

For example, after "INSERT INTO Jobs VALUES (1,2,3)" the log buffer will have one new log record -- call it Log Record #5 -- containing a row identifier and the new contents. Meanwhile, the data buffer will have one new row, but it will also have a mark in the page header saying "the latest log record for this page is Log Record #5". In this example "#5" is the Log Sequence Number (LSN), and it's vital for scheduling later operations.

Some details about the data-change:

(a) An INSERT log record contains only new data, which is enough so that the procedure can be repeated on the same page if necessary. This is called a "redo" entry.

(b) The LSN isn't a field in the log record, instead it's an absolute address for a file and byte offset.

After InnoDB has changed the log buffer and the data buffers, it's all over but the disk writing. But that's where things get complex. There are several threads monitoring buffer activity, and three situations -- overflow, checkpoint, and commit -- that result in disk writes.

What Happens With Overflows?Overflow is rare because InnoDB takes pro-active measures to prevent buffers from filling up (see "what happens with checkpoints" below). Still, let's discuss the two possible cases.

One: if the log buffer gets full, InnoDB writes the buffer at the "end" of the log. I've put the word "end" inside quotes because a log file, or more precisely a group of log files, looks like a snake swallowing its tail. If there's only room for four log records and we're writing #5, then it will have to go at the start of the file.

There's no such thing as a log that grows forever. Even though InnoDB uses some compression tricks, the log files would get too big to fit on any disk drive. So InnoDB writes "in a circle" and that means it must overwrite old log records. This circular logging policy has implications which we'll come back to later.

Two: if the data buffers get full, InnoDB writes the least recently used buffer to the database -- but not too quickly! This is where the page header's LSN becomes useful. First, InnoDB checks whether it's greater than the LSN of the last log record in the log file. If it's greater, then InnoDB must write out log records first, until the log catches up with the data, and only then can it write the data. In other words data page writing cannot occur until writing of the corresponding log record has occurred. That is the "Write-Ahead Logging" principle which is common to all important DBMSs except InterBase.

What Happens With Checkpoints?I said that InnoDB takes some pro-active measures against overflows, and the most important of these measures is checkpointing. There is a separate thread, or a combination of threads that are separate from the thread that changes the buffers. At fixed intervals the checkpointer will wake, look for buffer changes, and ensure that writes happen.

By my count, most DBMS's would write everything at this time so there are no buffers left which are changed but unwritten. To use the usual jargon, the DBMS would flush all "dirty" buffers with a "Sharp Checkpoint". But InnoDB only ensures that (a) log and data buffers aren't getting fuller than a fixed threshold point, (b) log writing stays ahead of data page writing, (c) there is no data buffer whose page-header LSN corresponds to a log record that's about to be overwritten. In the jargon, this means InnoDB is a "Fuzzy Checkpoint" aficionado.

At checkpoint time it's possible to write another log record which says, in effect: at this moment a checkpoint occurred, so it's certain that the database is up to date except for a few dirty pages, and here is a list of the dirty pages. This information can be useful during a recovery, so I'll mention it again later.