Just something whacky to get you thinking. At least it got me pulling apart my hair, and now, I'm half bald.

Saturday, October 11, 2014

Deamortizing disk writes during log rotation

According to this page on deamortization, "deamortization refers to the process of converting an algorithm with an amortized bound into one with a worst-case bound."

Log Rotation refers to the automated process used in system administration in which dated log files are archived. logrotate is one such UNIX command that lets you perform log rotation.

Many services (daemons in the UNIX world) write out log files that need to be periodically rotated and compressed. This is achieved using the logrotate command (mentioned above). Certain tools such as multilog also assist in augmenting the log with extra information such as the timestamp, etc...

When log files are rotated and compressed, it ends up reading the complete log file from disk (if not present in the kernel caches), and using a lot of CPU for the duration of compressing the log file. Depending upon the size of the log file, this could take anywhere from a fraction of a second to 10s of seconds. If the log file is large (which is usually the case), it ends up starving other processes that need to use the disk or CPU. If the supervising process (performing the job of log rotation and compression) is synchronously reading from the daemon service's stdout/stderr, and is taking too long to rotate/compress the logs, then it will block writes to the daemon's stdout/stderr file descriptors since the pipes between the supervisor and the daemon will be full.

This can be mitigated by deamortizing the costs of log rotation/compression by writing out the compressed file incrementally instead of compressing the uncompressed log file when it is rotated and a new log file is started. This not only deamortizes the I/O and CPU costs, but also ensures that we log a lot lesser in terms of bytes (on disk). This is because we were earlier writing out the uncompressed file and subsequently compressing it and writing the compressed log file, whereas we will now only write out the compressed file (eliminating writes of the uncompressed file, which is a large fraction of total log file writes). If we write out just the compressed log file incrementally, we're saving greatly on the IO cost associated with writing out the uncompressed log file. This simple change can go a long way in helping you scale your services to be able to handle a larger number of overall requests.