This is quite subtle. The explanation lies in the LOCK_SECTION_START() and LOCK_SECTION_END() macros. These macros put the code between them in another section, far away from the current one. So, in fact, what you have in memory after the 1: is not the "lea", but the first instruction that follows the down().

So, if you write some C code like:

a = 2;
down ();
c = 1;

it will end up like this:

- some assembly code to set a to 2
- LOCK decl %0
- js 2f /* Jump only in the contended case */
- some assembly code to set c to 1

and then, far away :

- lea %0, %%eax
- call __down_failed

The idea is to optimize the non-contended case (when nobody else is holding the semaphore). Moving the slow path out of line avoids polluting the i-cache with instructions that are useless most of the time, and it probably also helps pipeline usage by making sure that the prefetched instructions are the ones most likely to be executed.
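
If it helps, the same hot/cold split can be sketched in plain C with GCC/Clang extensions. This is only an illustration, not the kernel code: toy_sem, toy_down and toy_down_failed are made-up names, the decrement is not atomic (no LOCK prefix), and the slow path just counts its calls instead of sleeping. The "cold" attribute asks the compiler to put the function in a separate section, away from the hot code, and __builtin_expect marks the branch as unlikely, which is roughly what the LOCK_SECTION macros do by hand.

```c
/* Toy semaphore for illustration only; not the kernel's struct semaphore. */
struct toy_sem {
    int count;      /* >= 0 after a decrement means we got the semaphore */
    int slow_calls; /* number of times the slow path ran (demo only) */
};

/*
 * Slow path. "cold" asks GCC/Clang to place this function in a separate
 * section (typically .text.unlikely), and "noinline" keeps it out of line,
 * so it does not sit next to the hot code.
 */
__attribute__((cold, noinline))
static void toy_down_failed(struct toy_sem *sem)
{
    sem->slow_calls++;
    /* Real code would put the caller to sleep here. */
}

/*
 * Fast path: one decrement and one rarely-taken branch, like
 * "decl %0; js 2f". NOT atomic, unlike the LOCK-prefixed original.
 */
static inline void toy_down(struct toy_sem *sem)
{
    if (__builtin_expect(--sem->count < 0, 0))
        toy_down_failed(sem);
}
```

With count starting at 1, the first toy_down() stays on the fast path (count becomes 0) and only the second one, which drives count to -1, calls toy_down_failed().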