Tag: speed

Here’s a simple trick that I recently rediscovered when I worked on a boot time reduction project for a customer. It’s not rocket science, but you may not be aware of it.

Our customer was using fbv to display its logo right after the system booted. This is a way to show that the system is available while you’re starting the system’s main application:

fbv -d 1 /root/logo.bmp > /dev/null 2>&1

With Grabserial and using simple instrumentation with messages issued on the serial console before and after running the command, we found that this command was taking 878 ms to execute. The customer’s system had an AT91SAM9263 ARM SOC from Atmel, running at 200 MHz.

Even if fbv is a simple program (22 KB on ARM, compiled with shared libraries), decoding the logo image is still expensive. Here’s a way to get this compute cost out of your boot sequence. All you have to do is display your logo on your framebuffer, and then capture the framebuffer contents in a file:

fbv -d 1 /root/logo.bmp
cp /dev/fb0 /root/logo.fb

The new file is now a little bigger, 230400 bytes instead of 76990. However, displaying your boot logo can now be done by a simple copy:

dd if=/root/logo.fb of=/dev/fb0 bs=230400 count=1 > /dev/null 2>&1

This command now runs in only 54 ms. That’s only 6% of the initial execution time! The advantage of this approach is that it works with any kind of framebuffer pixel format, as long as you have at least one program that knows how to write to your own framebuffer.

Note that the dd command was used to read and write the logo in one shot, rather than copying in multiple chunks. We found that the equivalent cp and cat commands were slightly slower. Of course, the benchmark results will vary from one system to another. Our customer had heavily optimized their NOR flash access time. If you run this on a very slow storage device, using a much faster CPU, the time to display the logo may be several impacted by the time taken to read a bigger file from slower storage.

To get even better performance, another trick is to compress the framebuffer contents with LZO (supported by BusyBox), which is very fast at decompressing, and requires very little memory to run:

lzop -9 /root/logo.fb

The new /root/logo.fb.lzop file is now only 2987 bytes big. Of course, the compression rate will depend on your logo image. In our case, the splashscreen contains mostly white space and a simple monochrome company logo. The new command to put in your startup scripts is now:

lzopcat /root/logo.fb.lzo > /dev/fb0

The execution time is now just 52.5 ms! With a faster CPU, the time reduction would have been even bigger.

The ultimate trick for having a real and possibly animated splashscreen would be to implement your own C program, directly writing to the framebuffer memory in mmap() mode. Here’s a nice tutorial showing how easy it can be.

As Michael stated in his review of the interesting features in Linux 2.6.30, new compression options have been recently added to the kernel. We therefore decided to have a look at those compression methods, from a compression ratio and decompression speed point of view.

This comparison will be based on “self-extractible kernels”, that is, kernel images containing bootstrap code allowing them to extract a compressed image. As underlined in the previous article, this approach is not used on all architectures. Blackfin, notably, chose a different path and compresses the whole kernel image, without including bootstrap code. While this has the clear advantage of making compression much simpler with respect to kernel code, it forces decompression out to the bootloader code.

Each of those methods has its advantages. Indeed, the Blackfin approach relies on the bootloader to provide the necessary functions, so that may be a problem to do things like bypassing u-boot to reduce the boot time. On the other hand, implementing it only once in the bootloader (as architecture-independent code) makes it unnecessary to write the low-level bootstrap code for each architecture in the kernel, which is surely interesting on virtually all architectures, the notable exceptions being x86 and ARM.

Gzip (also known as Zlib or inflate) has been the traditional (and, as a matter of fact, only) method used to compress kernels. Consequently, we’ll use it as the reference in the following tests. Our test environment is as follows:

This comparison includes figures for LZO, a new kernel image compression method that I have contributed to the Linux sources, and which hopefully will make its way into the mainline kernel. LZO support in the kernel is only new for kernel decompression, as it is already used by JFFS2 and UBIFS. LZO is a stream-oriented algorithm, and although its compression ratio is lower than that of gzip, decompression is lightning-fast, as we will soon find out.

So, here are the figures, average on 20 boots with each compression method:

Uncompressed

3.24Mo

–

–

200%

LZO

1.76Mo

0.552s

70%

109%

Gzip

1.62Mo

0.775s

100%

100%

LZMA

1.22Mo

5.011s

646%

75%

Bzip2 has not been tested here: the low-level bootstrap file, head.S, only allocates 64Kb for use by malloc() on ARM. Some quick tests showed that the kernel would not extract with less than 3.5Mib of malloc() space. That would require to modify head.S so that malloc can use more memory, which we will not do here. However, given that enough memory is usable on the system, one could well use bzip2. All the other algorithms performed the extraction using the standard 64Kib malloc space.

From the results, we can clearly see that LZMA is nearly unsuitable for our system, and should be considered only if the space constraints for storing the kernel are so tight that we can’t afford to use more space that was is strictly necessary.

LZO looks like a good candidate when it comes to speeding up the boot process, at the expense of some (almost neglectable) extra space. Gzip is close to LZO when it comes to size, although extraction is not as fast. That means that unless you’re hitting corner cases, like only having enough space for a Gzip compressed image but not for one made with LZO, choosing the latter is probably a safe bet.

Besides, the LZO-compressed kernel size is about 54% the size of the uncompressed kernel. As the kernel load time varies linearly with its size, load time for an uncompressed kernel doubles. While 0.55s are won because there’s no need to run a decompression algorithm, you spend twice as much time loading the kernel. This time is not negligible at all compared to the decompression time. Indeed, loading the uncompressed image takes roughly 0.8s. That means that at the cost of slowing down the boot process by 0.15s (compared to an uncompressed kernel), one gets a kernel image which is roughly twice as small. Rather nice, isn’t it?