In the first three parts, I gave an overview, explained a bit of the ABI used by Objective-C, and took a near instruction-by-instruction tour of what happens on the fast path of Objective-C method dispatch. By fast path, I mean what happens 99.9% of the time: a very fast, no-overhead, no-function-call, no-locking set of instructions that grabs the method implementation from the cache and does a tail-call jump to that implementation.

The slow path is used rarely: as little as once per unique selector invoked per class, with the goal of filling the cache such that the slow path is never used again for that selector/class combination. A handful of operations will cause a class’s cache to be flushed: method swizzling, category loading, and the like.
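To make the "miss once, then hit forever (until a flush)" behavior concrete, here is a minimal C sketch of a per-class method cache. It is a toy model, not the runtime's code: every `mc_` name is hypothetical, and selectors are compared with `strcmp()` for simplicity where the real cache compares selector pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*mc_imp)(void);

struct mc_method { const char *name; mc_imp imp; };

enum { MC_CACHE_SIZE = 8 };

struct mc_class {
    const struct mc_method *methods;        /* full method list, slow to search */
    size_t method_count;
    struct mc_method cache[MC_CACHE_SIZE];  /* name == NULL marks an empty slot */
    int slow_lookups;                       /* how many times the slow path ran */
};

static size_t mc_hash(const char *name) {
    size_t h = 0;
    while (*name) h = h * 31u + (unsigned char)*name++;
    return h % MC_CACHE_SIZE;
}

/* Slow path: search the method list, then fill the cache for next time. */
static mc_imp mc_slow_lookup(struct mc_class *cls, const char *name) {
    cls->slow_lookups++;
    for (size_t i = 0; i < cls->method_count; i++) {
        if (strcmp(cls->methods[i].name, name) != 0) continue;
        size_t slot = mc_hash(name);
        for (size_t probes = 0; probes < MC_CACHE_SIZE; probes++) {
            if (cls->cache[slot].name == NULL) {
                cls->cache[slot] = cls->methods[i];
                break;
            }
            slot = (slot + 1) % MC_CACHE_SIZE;
        }
        return cls->methods[i].imp;
    }
    return NULL;  /* this toy has no forwarding */
}

/* Fast path: probe the cache; hitting an empty slot means a cache miss. */
static mc_imp mc_lookup(struct mc_class *cls, const char *name) {
    size_t slot = mc_hash(name);
    for (size_t probes = 0; probes < MC_CACHE_SIZE; probes++) {
        const struct mc_method *entry = &cls->cache[slot];
        if (entry->name == NULL) break;
        if (strcmp(entry->name, name) == 0) return entry->imp;
        slot = (slot + 1) % MC_CACHE_SIZE;
    }
    return mc_slow_lookup(cls, name);
}

/* What swizzling or category loading would trigger in this model. */
static void mc_flush_cache(struct mc_class *cls) {
    for (size_t i = 0; i < MC_CACHE_SIZE; i++) {
        cls->cache[i].name = NULL;
        cls->cache[i].imp = NULL;
    }
}

static int mc_answer(void) { return 42; }

/* Usage: returns the number of slow-path trips taken. */
static int mc_demo(void) {
    static const struct mc_method methods[] = { { "answer", mc_answer } };
    struct mc_class cls = { methods, 1, { { NULL, NULL } }, 0 };

    assert(mc_lookup(&cls, "answer")() == 42);  /* miss: slow path fills the cache */
    assert(mc_lookup(&cls, "answer")() == 42);  /* hit: fast path only */
    assert(cls.slow_lookups == 1);

    mc_flush_cache(&cls);
    assert(mc_lookup(&cls, "answer")() == 42);  /* miss again after the flush */
    return cls.slow_lookups;
}
```

In this model the slow path runs once per selector until something flushes the cache, which is the property the real messenger is optimizing for.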

Note that during +initialize, methods won’t always be cached. Yet another reason to not do any real work during +initialize!

In part 3, the cache lookup loop contained a NULL check and, if NULL was encountered in the cache, then the code jumped to a cache miss label. It looked like this (with the original source interleaved):

Note that the disassembly shows that the cache miss label is located at address 0x0000515c. Not at all coincidentally, that is exactly where this particular post’s tour starts. Namely, what happens when the cache lookup misses and the cache must be filled.

When I referred to this code path as the slow path, I wasn’t kidding! This is the one spot where the messenger actually makes a call into a C function which is then responsible for traipsing about the runtime metadata to resolve the method and fill the cache. Beyond that, the C function — _class_lookupMethodAndLoadCache() — is also responsible for ensuring that any +initialize methods of the class (and superclasses) are invoked prior to the method itself being invoked. This also implies that objc_msgSend() must effectively be recursively safe across this particular call site.
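As a rough illustration of the "+initialize before the method itself" guarantee, here is a small C sketch. It is an assumption-laden model, not the actual body of _class_lookupMethodAndLoadCache(): the `init_` names are hypothetical, and recording a class name in a log stands in for actually sending +initialize.

```c
#include <assert.h>
#include <stddef.h>

struct init_class {
    const char *name;
    struct init_class *superclass;
    int initialized;
};

enum { INIT_LOG_MAX = 8 };
static const char *init_log[INIT_LOG_MAX];  /* order in which classes were initialized */
static size_t init_count;

/* Make sure cls and all of its superclasses have been "initialized". */
static void init_ensure(struct init_class *cls) {
    if (cls == NULL || cls->initialized) return;
    init_ensure(cls->superclass);  /* superclasses go first */
    cls->initialized = 1;          /* set before the "call" so nested sends do not recurse */
    if (init_count < INIT_LOG_MAX) init_log[init_count++] = cls->name;
}

/* Stand-in for the slow path: initialize first, then resolve the method. */
static const char *init_send(struct init_class *cls, const char *selector) {
    init_ensure(cls);
    return selector;  /* a real lookup would return an implementation here */
}

/* Usage: returns how many classes were initialized. */
static size_t init_demo(void) {
    struct init_class root = { "Root", NULL, 0 };
    struct init_class sub  = { "Sub", &root, 0 };

    init_send(&sub, "description");
    init_send(&sub, "description");  /* second send initializes nothing */

    assert(init_log[0] == root.name);  /* superclass was initialized first */
    assert(init_log[1] == sub.name);
    return init_count;
}
```

Because real +initialize methods send messages of their own, the slow path can be re-entered while it is still running, which is why the call site has to be recursion-safe.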

And that requirement leads to the need to preserve all of the various registers and push a stack frame for the purposes of making the call. This is actually considerably more involved than a normal call site because the runtime is effectively hijacking the method invocation call to call something totally alien to the original method’s implementation!

The movq (%rdi),%rdi instruction dereferences the contents of %rdi and shoves the result into %rdi. Effectively, it grabs the isa of the targeted object and passes it as the first parameter to the function. In the case of an instance, this will be the class of the instance. In the case of a class, it will be the metaclass of the class.
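In C terms, that dereference is just "read the first pointer-sized field of the receiver". The sketch below assumes the isa pointer is the first field of both objects and classes; the struct layouts and `isa_` names are hypothetical simplifications, not the runtime's real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct isa_class {
    struct isa_class *isa;  /* a class's isa points at its metaclass */
    const char *name;
};

struct isa_object {
    struct isa_class *isa;  /* an instance's isa points at its class */
};

/* Equivalent of movq (%rdi),%rdi: load the pointer stored at the receiver's address. */
static struct isa_class *isa_of(void *receiver) {
    return *(struct isa_class **)receiver;
}

static void isa_demo(void) {
    struct isa_class meta = { NULL, "meta Widget" };
    struct isa_class cls = { &meta, "Widget" };
    struct isa_object instance = { &cls };

    assert(isa_of(&instance) == &cls);  /* message to an instance: look up in the class */
    assert(isa_of(&cls) == &meta);      /* message to a class: look up in the metaclass */
}
```

The same load works for instance methods and class methods alike, which is why the messenger needs no special case for class methods.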

Frankly, the movq %rsi,%rsi instruction makes no sense to me. It is clearly loading the second argument, but it is effectively a no-op since the source and destination are the same and the movq instruction doesn’t set or reset any of the processor’s status flags. Then again, I could easily be missing something.

The movq %rsi,%rsi instruction looks like nonsense in this context. It is here, but I forgot that this is actually an expanded macro (thanks, Greg, for the explanation!). If we return to the original source and grab the line from the macro, you will see:

movq $0, %a1
movq $1, %a2
call __class_lookupMethodAndLoadCache

The movq $1, %a2 instruction generates the movq %rsi,%rsi instruction when expanded. Note that the source is a parameter to the macro, though! For other variants of objc_msgSend() — for the ones that return values on the stack and not in registers — the “self” argument is actually in %a2 and “_cmd” is in %a3. Thus, the reason for the sometimes meaningless instruction.

If this particular codepath were performance sensitive to the degree where one or two instructions matter, the assembly macro language does have conditionals that could be used to eliminate the instruction when source and destination are the same.

Finally, the return value is passed back in register %rax and is stowed away into register %r11 for use shortly.

And… all these instructions are simply to restore all the registers back to the state they were in prior to the call to _class_lookupMethodAndLoadCache().

0x00005215 cmpq %r11,%r11
0x00005218 jmp *%r11d

Finally, dispatch! The cmpq sets the processor’s status flags to indicate a non-structure return value (comparing a register with itself always yields “equal”). Note that the above two instructions are no longer part of the method lookup macro; they are found in the messenger itself (and will differ in the other messengers).

Note that _class_lookupMethodAndLoadCache() will never return NULL and, hence, there is no need for a NULL check above. If a method isn’t found, then _class_lookupMethodAndLoadCache() returns the address of the forwarding handler. Because forwarding may actually come into play often, the forwarding handler is actually put into the cache such that future invocations can leverage the fast path.
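A minimal C sketch of that "never returns NULL" contract follows. It is illustrative only: the `fw_` names, the single-method table, and the tiny linear cache are assumptions made for the example, not the runtime's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*fw_imp)(void);
struct fw_entry { const char *sel; fw_imp imp; };

static const char *fw_hello(void)   { return "hello"; }
static const char *fw_forward(void) { return "forwarded"; }  /* stand-in forwarding handler */

static const struct fw_entry fw_methods[] = { { "hello", fw_hello } };

enum { FW_CACHE_SIZE = 4 };
static struct fw_entry fw_cache[FW_CACHE_SIZE];
static size_t fw_cached;
static int fw_slow_calls;

/* Slow path: never returns NULL; unknown selectors resolve to the forwarding handler. */
static fw_imp fw_lookup_and_load_cache(const char *sel) {
    fw_imp imp = fw_forward;
    fw_slow_calls++;
    for (size_t i = 0; i < sizeof fw_methods / sizeof fw_methods[0]; i++) {
        if (strcmp(fw_methods[i].sel, sel) == 0) imp = fw_methods[i].imp;
    }
    if (fw_cached < FW_CACHE_SIZE) {  /* cache the result, forwarding handler included */
        fw_cache[fw_cached].sel = sel;
        fw_cache[fw_cached].imp = imp;
        fw_cached++;
    }
    return imp;
}

/* Fast path first; fall back to the slow path on a miss. No NULL check needed on the result. */
static fw_imp fw_dispatch(const char *sel) {
    for (size_t i = 0; i < fw_cached; i++) {
        if (strcmp(fw_cache[i].sel, sel) == 0) return fw_cache[i].imp;
    }
    return fw_lookup_and_load_cache(sel);
}

static void fw_demo(void) {
    assert(strcmp(fw_dispatch("missing")(), "forwarded") == 0);
    assert(fw_slow_calls == 1);
    assert(strcmp(fw_dispatch("missing")(), "forwarded") == 0);
    assert(fw_slow_calls == 1);  /* the forwarding handler came from the cache */
    assert(strcmp(fw_dispatch("hello")(), "hello") == 0);
}
```

Caching the forwarding handler means a heavily forwarded selector pays for the slow path only once, just like any other selector.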

That is it; that is both the fast and the slow path of method invocation.

But what are these instructions?? There appear to be some leftovers!?!

Way back as the very first two instructions to objc_msgSend() there was a nil check. If the target of method invocation is nil, then jeq 0x0000521b.

This is the nil handling code. The last five instructions (the two movq, two xorps, and ret) take care of actually zeroing out all of the return value registers and returning control to the caller. This also explains why some types of return values are undefined on message-to-nil. Message-to-nil can only safely zero out values that are returned in the return registers. There isn’t enough metadata in the C ABI to know how much of the stack to zero for values returned on the stack.
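The effect can be modeled in C by treating the return registers as a small struct. This is a sketch under assumptions: the `nil_` names are hypothetical, and the four fields are taken to be the x86-64 return registers %rax, %rdx, %xmm0, and %xmm1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the register-returned portion of a return value. */
struct nil_regs {
    uint64_t rax, rdx;   /* integer and pointer results */
    double xmm0, xmm1;   /* floating point results */
};

typedef struct nil_regs (*nil_imp)(void *self);

/* Toy messenger: a nil receiver gets all-zero return registers and no call at all. */
static struct nil_regs nil_send(void *self, nil_imp imp) {
    if (self == NULL) {
        struct nil_regs zeroed = { 0, 0, 0.0, 0.0 };
        return zeroed;
    }
    return imp(self);
}

static struct nil_regs nil_forty_two(void *self) {
    struct nil_regs r = { 42, 0, 0.0, 0.0 };
    (void)self;
    return r;
}

static void nil_demo(void) {
    int object = 0;
    assert(nil_send(&object, nil_forty_two).rax == 42);

    struct nil_regs r = nil_send(NULL, nil_forty_two);
    assert(r.rax == 0 && r.rdx == 0 && r.xmm0 == 0.0 && r.xmm1 == 0.0);
}
```

Anything larger than what fits in those registers is written through a caller-supplied stack buffer, which this zeroing never touches; that is the undefined case described above.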

The first three instructions deal with the nil receiver. The movq 0x000b32be(%rip),%rdi instruction loads the value contained in the _objc_nilReceiver global into the register %rdi. The testq %rdi,%rdi instruction sets the processor’s zero flag in the status register if the value contained in %rdi is zero. The jneq 0x0000510a instruction will jump if the value is not zero. That address is actually back in objc_msgSend and, in particular, the code there basically does a dispatch to the nil message receiver with the same selector.

These final two instructions are what happens when you invoke one of the ignored selectors under GC. It effectively causes the method to return self by moving the target of the method call from the first argument register %rdi into the return value register %rax.

Note that this is also the reason why -retainCount returns such an outrageous value under GC; it is the address of the object.
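In C, the ignored-selector behavior amounts to a function that hands its first argument straight back. The sketch below uses hypothetical `gc_` names; reading the result as an integer shows where the strange -retainCount value comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the shared "ignored selector" implementation: return self unchanged. */
static void *gc_ignored(void *self) {
    return self;
}

static void gc_demo(void) {
    int object = 0;

    /* A caller expecting an integer count reads the returned pointer bits as a number. */
    uintptr_t count = (uintptr_t)gc_ignored(&object);
    assert(count == (uintptr_t)&object);
}
```

Returning self also keeps chained calls such as [[obj retain] autorelease] working, since each ignored message still evaluates to the object.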

There you have it. That is every instruction that may be executed in objc_msgSend() from beginning to end.

Frankly, the movq %rsi,%rsi instruction makes no sense to me. It is clearly loading the second argument, but it is effectively a no-op since the source and destination are the same and the movq instruction doesn’t set or reset any of the processor’s status flags.

The instruction does nothing. It’s a side effect of the macros used to generate the code; in some other variants of objc_msgSend(), the selector is not yet in %rsi, and the instruction at that point puts it there.

Thanks for a really useful series. Do you have a pointer to more information about the “method triplet” that you allude to? Looking at the code, the cache seems made up of a method_name pointer (offset 0) and a method_imp pointer (offset 16). I assume that the signature string-pointer is at offset 8 and is just excluded in the structure definition because it’s not needed in this file? Or am I misreading it? This seems to match a method_t. Is a method_t the same as a “method triplet?” (I’m having trouble finding how or if method_t relates to Method; I don’t immediately see how runtime.h compiles for OBJC2 since objc_method appears to be #ifdef’d out.)

Excellent article! I was scanning for an explanation why various sources discourage the use of objc message calls in CoreAudio render callbacks. Are there any additional implications with multiple threads? Perhaps it’s just because of the number of cycles required in the slow case. This is easily remedied by ensuring the IMP cache is populated which I’ve read is faster than even C++ method calls. Any thoughts?