Tuesday, March 26, 2013

It's mostly pretty easy to eliminate null in a language: just replace it with Maybe/Option types. Unfortunately, in class-based OO languages it's only mostly easy. There's one corner where null is surprisingly stubborn unless you're willing to limit the language: initialization. I'll demonstrate using Java, but I'll also show that the same or a similar problem can manifest in most mainstream class-based OO languages.
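As a minimal sketch of the kind of initialization problem in question (the names `InitOrderDemo`, `message`, and `constructingSubThrows` are illustrative assumptions, not necessarily the original listing): a `Base` constructor calls an overridable method, and the `Sub` override reads a field that `Sub`'s own constructor hasn't assigned yet.

```java
class InitOrderDemo {
    static class Base {
        Base() {
            // Calls an overridable method before any subclass constructor body has run.
            System.out.println(message().length());
        }

        String message() {
            return "Hello";
        }
    }

    static class Sub extends Base {
        private final String msg;

        Sub() {
            // The implicit super() call runs first, so msg is still null
            // when Base's constructor dispatches to Sub.message().
            msg = "HI!";
        }

        @Override
        String message() {
            return msg;
        }
    }

    // Returns true if constructing a Sub trips over the uninitialized field.
    static boolean constructingSubThrows() {
        try {
            new Sub();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(constructingSubThrows());
    }
}
```

Note that `msg` has a non-nullable "feel" here: it's final and always assigned a real string. An Option type on the field wouldn't help, because the problem is that the field is observed before it is assigned at all.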

Once again we get a null-related (well, nil-related) exception. The difference is that in Ruby and Python you have to be explicit about calling the superclass constructor, and you have more flexibility about where that super call goes, so the Ruby code can be fixed by writing

If you don't feel like compiling and running that then I'll cut to the chase: it prints "Hello". That's because in C++ 'this' is constructed in stages, and during the Base constructor 'this' is only a Base and not yet a Sub. Even a variant which explicitly passes 'this' around will print "Hello"

C++'s rule works very well to prevent many uninitialized 'this' problems, but the downside is that it prevents some perfectly good code from working polymorphically. The following still gets "Hello" even though "HI!" would cause no problems.

The Big Hole

C++'s rule goes far, but it doesn't go far enough. If you're lucky the following code will segfault. Formally, its behavior is completely undefined. In other, safer languages, it would be a null pointer exception.
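For the "safer languages" case, here is a rough Java sketch of the same hole. No polymorphism is involved this time: the constructor simply lets 'this' escape before its own field is assigned. The names `LeakingThisDemo`, `Greeter`, and `shout` are invented for illustration.

```java
class LeakingThisDemo {
    static class Greeter {
        private final String msg;

        Greeter() {
            // 'this' escapes to outside code before msg has been assigned.
            shout(this);
            msg = "HI!";
        }

        String message() {
            return msg;
        }
    }

    // Hypothetical helper that trusts its argument to be fully initialized.
    static String shout(Greeter g) {
        return g.message().toUpperCase();
    }

    // Returns true if constructing a Greeter hits the unassigned field.
    static boolean constructingGreeterThrows() {
        try {
            new Greeter();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(constructingGreeterThrows());
    }
}
```

The compiler accepts this because it only tracks direct reads of `msg` inside the constructor; once 'this' is handed to other code, nothing stops that code from observing the field early.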

or only let a 'this' that leaks from a constructor represent the part of the object that has been fully initialized, e.g. a 'this' leaking from Sub's constructor must be a Base, just as it is during Base construction.

or require that all fields be initialized immediately at the declaration site (or use an equivalent mechanism like C++'s initializer lists) (2)

or do expensive whole-program analysis to ensure that a leaking 'this' isn't a problem

That's what it would take to make null go away. But it would also prevent perfectly good code from either working as desired or compiling at all.

Footnotes

(1) I'm avoiding initializer lists and needlessly using "new" and pointers to C++ strings to illustrate my point. If that bothers you then pretend I'm using something where pointers are actually useful. I should also be doing copy constructors, assignment operators, virtual destructors and all that other fun C++ stuff, but all that boilerplate would be a distraction from my point here.

(2) If you squint just right, the 'every field initialized at declaration site' rule is exactly how many statically typed functional languages like Haskell and ML deal with 'records' and algebraic data type constructors without requiring a null-like construct for uninitialized fields.