Saturday, May 3, 2014

The sp(id)y subset, or Avoiding Copland 2010 with Objective-C 1984

In my recent post on Cargo Cult Typing, I mentioned a
concept I called the id subset. Briefly, it is the subset of
Objective-C that deals only with object pointers, or id's.
There has been some misunderstanding that I am opposed to types. I am
not, but more on that another time.

One of the many nice properties of the (transitive) id subset is that it
is dynamically (memory) safe, just like Smalltalk. That is, as long as all arguments and return values
of your messages are objects, you can never dereference a pointer incorrectly;
the worst that can happen is a "Message not understood" that can
be caught and handled by the object in question or raised as an exception.
The reason this is safe is that objc_msgSend() will make sure that methods
will only ever be invoked on objects of the correct class, no matter what the
(possibly incorrect, or unavailable) static type says.

So no de-referencing an incorrect pointer, no scribbling over random bits
of memory.
In fact, this is the vaunted "pointer safety" that John Siracusa says requires
ditching native compiled languages like Objective-C for VM-based languages. The idea
that a VM with an interpreter or a JIT was required for pointer safety
was never true, of course, and it's interesting that both Google and
Microsoft are turning to Ahead of Time (AOT) compilation in their newest
SDKs, for performance reasons.

Did someone mention "performance"? :-)

Another nice aspect of the id subset is that it makes reflective code
a lot simpler. And simplicity usually also translates to speed. How
much speed? Apple's NSInvocation class has to deal with
interpreting C type information at runtime to then construct proper stack
frames dynamically for all possible C types. I think it uses libffi, though
it may be some equivalent library. This is slow, around 340.1 ns
per message send on my 13" MBPR. By restricting itself to the id subset,
my own MPWFastInvocation class's dispatch is
much simpler, just a switch invoking objc_msgSend() with
a different number of arguments.
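
To make the "just a switch" idea concrete, here is a minimal sketch in plain C (which is also valid Objective-C). It is a model only, not the MPWFastInvocation source: the names obj_t, sel_t, fast_invocation and fast_invoke are made up for illustration, and a stored function pointer stands in for objc_msgSend(). The point is that when every argument and the return value are object pointers, the argument count is the only thing left to switch on.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Objective-C's id and SEL (assumptions, not the real runtime types). */
typedef void *obj_t;
typedef const char *sel_t;

/* One function type per argument count, mirroring how objc_msgSend() is cast. */
typedef obj_t (*send0_fn)(obj_t, sel_t);
typedef obj_t (*send1_fn)(obj_t, sel_t, obj_t);
typedef obj_t (*send2_fn)(obj_t, sel_t, obj_t, obj_t);

enum { MAX_ARGS = 2 };

/* Hypothetical invocation record: target, selector, object-only arguments. */
typedef struct {
    obj_t target;
    sel_t selector;
    int   argc;
    obj_t args[MAX_ARGS];
    void (*send)(void);   /* generic function pointer, cast back per arity */
} fast_invocation;

/* The whole dispatch: no type-encoding parsing, no stack-frame construction. */
obj_t fast_invoke(const fast_invocation *inv)
{
    assert(inv != NULL && inv->send != NULL);
    switch (inv->argc) {
    case 0:
        return ((send0_fn)inv->send)(inv->target, inv->selector);
    case 1:
        return ((send1_fn)inv->send)(inv->target, inv->selector, inv->args[0]);
    case 2:
        return ((send2_fn)inv->send)(inv->target, inv->selector,
                                     inv->args[0], inv->args[1]);
    default:
        return NULL;      /* unsupported argument count in this sketch */
    }
}

/* Example "methods" used to exercise the dispatcher. */
obj_t return_self(obj_t self, sel_t cmd)
{
    (void)cmd;
    return self;
}

obj_t pick_second(obj_t self, sel_t cmd, obj_t a, obj_t b)
{
    (void)self; (void)cmd; (void)a;
    return b;
}
```

Filling in a fast_invocation with pick_second and two arguments and passing it to fast_invoke() returns the second argument. Supporting more arguments only means adding cases, which is why this style of dispatch stays cheap.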

The simplicity of MPWFastInvocation also pays off in
speed: 6.2 ns per message send on the same machine. That's roughly 55 times
faster than NSInvocation and only 2-3x slower than
a normal message send. In fact, once you're that close, things like
IMP-caching (4 ns) start to make sense, especially since they can
be hidden behind a nice interface. Using a C macro and the IMP
stashed in a public instance var takes the time down to 3 ns, making
the reflective call via an object effectively as fast as the
non-reflective code emitted by the compiler. Which is nice, because
it makes reflective techniques much more feasible for wider varieties
of code, which would be a good thing.
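
As a rough illustration of IMP-caching behind a macro, here is a plain-C sketch. It is not the real Objective-C runtime or the MPWFastInvocation code: lookup_imp(), cached_invocation and INVOKE_WITH are hypothetical names, and the lookup is a simple table scan by selector name that ignores the target's class. It only shows the shape of the trick: pay for the lookup once, stash the function pointer in a public field, and let a macro call straight through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for id, SEL and IMP (assumptions, not the runtime's own types). */
typedef void *obj_t;
typedef const char *sel_t;
typedef obj_t (*imp_t)(obj_t self, sel_t cmd, obj_t arg);

/* Example method: returns its argument. */
obj_t identity_method(obj_t self, sel_t cmd, obj_t arg)
{
    (void)self; (void)cmd;
    return arg;
}

/* Toy method table standing in for the class's method list. */
typedef struct { sel_t name; imp_t imp; } method_entry;
static const method_entry method_table[] = {
    { "identity:", identity_method },
};

/* Slow path: find the implementation by selector name. */
imp_t lookup_imp(sel_t name)
{
    for (size_t i = 0; i < sizeof method_table / sizeof method_table[0]; i++) {
        if (strcmp(method_table[i].name, name) == 0) {
            return method_table[i].imp;
        }
    }
    return NULL;
}

/* Invocation with the implementation cached in a public field. */
typedef struct {
    obj_t target;
    sel_t selector;
    imp_t cached_imp;     /* public on purpose so the macro can reach it */
} cached_invocation;

/* Do the lookup once; returns 1 if an implementation was found, else 0. */
int cached_invocation_init(cached_invocation *inv, obj_t target, sel_t selector)
{
    assert(inv != NULL);
    inv->target = target;
    inv->selector = selector;
    inv->cached_imp = lookup_imp(selector);
    return inv->cached_imp != NULL;
}

/* Fast path: a direct call through the cached pointer, no lookup, no switch. */
#define INVOKE_WITH(inv, arg) \
    ((inv)->cached_imp((inv)->target, (inv)->selector, (arg)))
```

After cached_invocation_init() succeeds, INVOKE_WITH(&inv, x) is a single indirect call. The trade-off is the usual one for cached lookups: the cached pointer goes stale if the method is replaced after it was looked up.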

The speed improvement is not because MPWFastInvocation is better
than NSInvocation (it is decidedly not) but because it is solving
a much, much simpler problem by sticking to the safe id subset.

4 comments:

I don't think I said that a VM is required for memory safety. (Just look at Perl, for example: no VM, not interpreted, memory-safe.) Also, you should read my Copland 2010 Revisited article from (appropriately) 2010. I also did a long, rambling podcast on this topic with Guy English last month.

Thanks, I both read the "revisited" article (it's actually one of the links above) and listened to the podcast, all good stuff! Heck, Guy and I had a little Twitter exchange about him mangling my last name :-)

The, er, point stands that the pointer-safe language you are looking for is already present inside Objective-C, we just need to give it a chance to come out. And I have reason to believe that it isn't even that hard...not necessarily to make such errors impossible, but to make you have to really go out of your way to get them, which seems sufficient to me.

Yes, the "Objective-C without the C" idea. As I think I mentioned on the podcast, that sure appears to be where Apple is going, but it still seems like a bit of an uncomfortable truce between order and chaos to me.

I think you're right that Apple's current direction is an uncomfortable truce between order and chaos.

I wouldn't characterize it as "Objective-C without the C", though. In fact, it seems more the direction of Java or C++, keeping the intermingling of objects and structs/pointers that is the source of the problems and then trying to get out of the mess through static analysis and static restrictiveness, but with the default still being raw pointer access that can and will crash.

As examples, see CoreAnimation and more recently SceneKit, which both use structs fairly randomly in unexpected places where they could easily have used objects instead. And performance really isn't the issue here: these abstractions move sufficient other information that the object overheads shouldn't be significant (as also demonstrated by Quartz's use of equally heavy CF-style objects).

Or the "NSInteger" madness, which causes great confusion because programmers who don't have the history don't understand that these are primitives that behave differently from objects such as NSNumber (and how could they?)