Henrik Mühe

A while ago, I stumbled upon a
guide to writing optimized C code.
I had read a few of the ideas before and previously discounted them as “something the
optimizer should really do for you”. In this and the next few posts, I will take
individual pieces of advice and fact-check whether rewriting your code as
suggested actually helps or not.

Let’s start with a simple proposition: Counting down to zero is cheaper
than counting up. A related claim, which this post investigates as well,
is sometimes encountered: Comparing != (not equal) is faster than
< (less than) in a loop.

This should be easy enough to code, benchmark and look at the assembly. Please
scroll down to the end of the post for a full disclaimer and test environment
discussion.

Both implementations execute the same number of nops, but the up
implementation counts up and compares using less than, while the down
implementation counts down and compares not equal with zero.
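A minimal sketch of what such a pair of loops could look like, assuming gcc-style inline assembly for the nop. The names loop_up, loop_down and seconds_for are made up for this illustration and are not taken from the original benchmark; the iteration count is the constant 0x5f5e100 that shows up in the assembly discussed in this post. The returned counter only exists so the sketch can be checked.

```c
#include <time.h>

/* 0x5f5e100 iterations, i.e. 100,000,000 */
#define ITERATIONS 100000000UL

/* Counts up and compares with "less than". */
static unsigned long loop_up(unsigned long n) {
    unsigned long executed = 0;
    for (unsigned long i = 0; i < n; ++i) {
        /* volatile asm keeps the compiler from deleting the loop */
        __asm__ volatile("nop");
        ++executed;
    }
    return executed;
}

/* Counts down and compares "not equal" with zero. */
static unsigned long loop_down(unsigned long n) {
    unsigned long executed = 0;
    for (unsigned long i = n; i != 0; --i) {
        __asm__ volatile("nop");
        ++executed;
    }
    return executed;
}

/* CPU seconds spent in one call of fn(n), measured with clock(). */
static double seconds_for(unsigned long (*fn)(unsigned long),
                          unsigned long n) {
    clock_t start = clock();
    fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing seconds_for(loop_up, ITERATIONS) with seconds_for(loop_down, ITERATIONS) is then the whole benchmark, and objdump -S on the compiled binary shows what the optimizer made of the two loops.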

Yep, it looks like I objdump -Sed the same file twice; however, I did not.
Both versions actually compile to the same assembly. Even cooler, the compiler generates
completely different assembly for the two with -O0 but optimizes them to the exact same
assembly. Given identical machine code, our benchmarking results make a lot of sense.

Therefore, both

counting down to zero is cheaper than counting up, and

comparing != (not equal) is faster than < (less than) in a loop

are busted for cases where the compiler is free to make this optimization.
One must not discount that compilers have advanced quite a bit
in the 10 years since the article I referenced earlier
was written, but this is still great to know. I remember at least one occasion when someone
suggested rewriting a somewhat hot loop this way, and this basically tells us
that, as long as the loop is changed the way it was changed here, it’s
probably not gonna lead to anything whatsoever. Rewriting
a loop construct like this is only sensible for the absolute hottest of loops anyway,
and even there counting down is not a bulletproof solution.

Before we look at the case where the compiler cannot optimize the loop, let’s
quickly check the results on more common platforms.

Interestingly, on this machine the compiler counts down, but it also optimizes both
programs to the exact same assembly. My guess as to why is that mov can
take the constant 0x5f5e100 as an immediate, so the compiler saves one
register.

Wisely, Thomas Neumann suggested
going beyond whether writing your C code nop loop in a “count up” or “count down” manner
matters. This adds a much more important question that has so far been
neglected in this writeup: What if the compiler simply cannot change
the way your loop is counting, because you are actually using the loop variable inside
the loop’s body in a way that does not give the compiler the freedom to change
the direction of the loop?

To generate a sensible benchmark for this deeper investigation, we examine
the instructions required to count up versus
down and stop gcc from optimizing one into the other. From the previous
examples, we assume that there is in fact a difference in terms of execution time
of the instructions used to count up versus those used for counting down. If there
were none, gcc would likely not go through the hassle of optimizing one into the
other (and in different directions as well: ARM gcc converts to loops counting up,
while the two x86 platforms examined convert to instructions counting down).

In C, we go with Thomas’s suggestion (see code on GitHub, shortened here for
readability) and implement the benchmarks along the lines of this C code snippet:
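One way such a dependency on the loop variable can be sketched, assuming gcc-style inline assembly. This is an illustration of the idea only and not the code from the GitHub repository; observe, dependent_up and dependent_down are hypothetical names. Each iteration hands the current value of the loop variable to an empty volatile asm statement, so the compiler has to produce the values in program order and cannot quietly flip the direction of the loop.

```c
/* Hands `value` to an empty asm statement. The "r" constraint forces the
 * value into a register at this point, and volatile keeps the compiler
 * from deleting the statement. */
static inline void observe(unsigned long value) {
    __asm__ volatile("" : : "r"(value));
}

/* Visits 0, 1, ..., n-1 in that order. */
static unsigned long dependent_up(unsigned long n) {
    unsigned long checksum = 0;
    for (unsigned long i = 0; i < n; ++i) {
        observe(i);
        checksum += i; /* only used to check the sketch */
    }
    return checksum;
}

/* Visits n, n-1, ..., 1 in that order. */
static unsigned long dependent_down(unsigned long n) {
    unsigned long checksum = 0;
    for (unsigned long i = n; i != 0; --i) {
        observe(i);
        checksum += i; /* only used to check the sketch */
    }
    return checksum;
}
```

The two functions visit different values (0 to n-1 versus n to 1), which is exactly the point: the body now cares about the counter, so one loop can no longer be rewritten into the other for free.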

This looks really great; however, that is on x86. For ARM, the same C code leads to
terrible code being generated. For the purposes of this evaluation, we will
therefore not look at ARM for now, as the ARM benchmark would require manually
written (inline) assembly.

Measuring this on the same Atom CPU used above, we get the following results:

Quite obviously, counting down is the better choice on this machine (Intel Atom)
when it comes to machine code. This is also visible in the fact that insn_down
requires one instruction less, as seen above. Also, gcc makes the right decision
and optimizes both ways of expressing the loop in C to the faster option.

Running the same benchmark on the Intel Xeon mentioned before, we see a completely
different picture, first the assembly:

All implementations are equally fast. Regardless of whether the loop is expressed in
C as counting up or down, and regardless of whether the machine code used counts
up from zero or down to zero, the performance of the loop is exactly the same.

Summary:

In summary, we can say the following: If the compiler is free to choose the
direction in which the loop counts, you don’t need to worry, as it will do the
right thing. If a data dependency stops your loop from being optimizable in this
fashion, the question becomes more interesting. On a fairly wimpy core like my
Intel Atom, counting down and comparing for (in-)equality was faster. On a premium
CPU like the Intel Xeon tested here, it did not matter at all: all implementations
of the for loop were equally fast.

As a general takeaway, one should also remember that the difference (if there is one
on your architecture) between
the directions of the loop only matters significantly if there’s next to no work
being done in the loop’s body. For instance, if you were to access memory in the loop, that would likely completely
dominate the cost of executing the loop.
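To make the memory access case concrete, here is a small sketch of the same two loop shapes with an array read in the body. The function names sum_up and sum_down are made up for this illustration, and nothing here was measured for this post; the expectation is merely that for arrays that do not fit into the cache, the loads cost far more than the counter handling.

```c
#include <stddef.h>

/* Walks the array front to back, counting up and comparing with <. */
static long sum_up(const int *data, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

/* Walks the array back to front, counting down and comparing with != 0. */
static long sum_down(const int *data, size_t n) {
    long sum = 0;
    for (size_t i = n; i != 0; --i) {
        sum += data[i - 1]; /* i runs from n down to 1 */
    }
    return sum;
}
```

Both functions return the same sum, so they can be swapped freely when timing them against each other on a given machine.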

Disclaimer and test environment:

Unless otherwise noted, all measurements quoted here were taken with this setup:

I am trying to write down everything that I remembered to do before leaving for the US so that posterity can benefit from it. I will add what I forgot later on.

DMV Report, Car insurance report

I have requested a report from my file with the German equivalent of the DMV, just in case. I also asked my car insurance provider to send a letter stating that I am in good standing with them and have not had them pay for anything in the last 11 years. Apparently, some auto insurance carriers in the US will give you a discount if you can provide such information. Whether it turns out to be useful or not, things like this are easier to do before leaving.

I wanted to register for Zipcar back when I was in Atlanta and needed the letter from the DMV which was excruciatingly hard to get since I had to mail things back and forth using USPS and my parents had to forward the letter back to the US.

Cancellations

We had to cancel our newspaper, the internet provider and lots of insurance. Many of these are tricky as there might be a mandatory term on the contract. In Germany, though, most contracts can be canceled at any point in time if you move, especially if the service you were subscribed to is not available at your new address. Debating availability of service when moving from Germany to the US is usually quick. The required documents, though, suck. You have to provide a confirmation from the government that you gave up your residence address (notice of departure confirmation). This document is only issued to you when or after you move so for us, my parents will probably have to mail it to everyone concerned by us leaving. Some people can be persuaded to accept a letter confirming cancellation of your lease but not everyone.

Weekly newspaper: Cancellation at any time.

M-Net Internet: Three months notice, must provide cancellation of lease AND official notice of departure letter.

HUK Medical Insurance: Official notice of departure letter.

HUK Personal liability insurance: Three months notice, revoked cancellation as the insurance works in the US as well.

Luckily, car insurance automatically ends with selling a car in Germany so that’s taken care of.

Taxes

I prepared my tax filing for last year as early as possible this year. First, I had all the documentation handy, and second, I received all returned documents at my home address. For next year, I have already scanned everything that might be important from 2014 and will leave the original documents with a friend. For the year you move, you will have to prepare tax filings for every country you have earned money in. Germany has a double taxation agreement with the US, so it will likely not be too bad.

Insurance companies

I opted for getting a prospective entitlement with many insurance carriers. You pay a small fee for keeping your contract running without any actual insurance coverage. This allows you to reinstate your original contract with no questions asked should you return. In Germany, health insurance is very good and prospective entitlement is cheap. I literally asked my carrier if the entitlement contract means that “as long as I make it back across the border I can be unbelievably sick and will still be covered as soon as I return” and they agreed. Also, in Germany, you will not be able to get back into the mandatory insurance program if you earn more than a certain amount of money once you return. Although there are loopholes everywhere, I prefer knowing that I could return at any time without a problem.

We use Docker for various teaching webservices (Codematch, Xquery, Datalog) we want to offer to students but which should not cause our webserver to be more exposed. Docker has been good so far and we have already used the architecture to migrate all containers from the original dev host to the production webserver host. Here, we’ll talk about switching from AUFS to Devicemapper as a storage backend.

Why device mapper?

I cannot attest to either storage backend being strictly better or worse than the other. However, for us the benefit is being able to run IBM DB2 without resorting to mounting an external volume for the DB2 container. This is beneficial, as using external storage breaks the versioned architecture of docker, while keeping all data inside the container’s filesystem yields a nice separation of concerns.

1) Exporting all important images

We first committed each image we care about so that we had the most recent version tagged somewhere with all changes included. This can be done roughly like this:

This gives you loadable copies of each of your important images. Alternatively, you could also export the container using export and import, but I had less success with this: import failed to load a 1500MB export (I killed it after 15 hours of “importing”). Your mileage may vary.

2) Switching storage backends

On Ubuntu, this is simple. You want to add an argument to the docker daemon when it’s launched on system startup. Storage selection is done using the --storage-driver=x flag. AUFS seems to be the default, so I changed /etc/default/docker to enable device mapper:

# Use DOCKER_OPTS to modify the daemon startup options.
DOCKER_OPTS="--storage-driver=devicemapper"

and restarted docker. With this, all containers and images should appear to be gone: there are none in devicemapper storage yet, and if you removed the command line option again, you’d see everything in AUFS as before. Now we import the original images like this:

This should actually allow you to start each image just like before except that you are running using the device mapper backend now. Of course, once you are confident that everything works as expected, you can get rid of the original images and containers stored in AUFS.

My Stripe-CTF Writeup actually made it to 4th place on HackerNews and I just wanted to share how that looks in Google Analytics. Suffice it to say I was very happy to be hosting a static, Jekyll-generated blog on Github.io when I saw the realtime analytics numbers climb beyond 300 concurrent visitors. Also, my Alexa rank jumped about 25,000,000 places to 300,000th most visited website in the US. Good times, thank you.