NSNumber is a deceptively simple class with some interesting implementation details. In today's edition of Friday Q&A, I'll explore how to build a class that works like NSNumber, a topic suggested by Jay Tamboli.

OverviewLike many (but not all) object-oriented languages, Objective-C has a divide between objects and non-objects. Objects respond to messages, can be queried at runtime without knowing their exact type, placed in collections, compared for equality, and share a common set of behavior. Non-objects are largely compile-time constructs, with all of their type information essentially gone at runtime. In Objective-C, these non-objects are everything that comes from C, from the integer 42 to the string "Hello, world" to complicated structs.

Boxing is the process of placing these non-objects into an object so that they can be used like other objects, typically so that they can be placed in a collection. NSNumber is the Cocoa class used to box C numbers. You can't have an NSArray of int, but you can have an NSArray of NSNumber. NSNumber shows up a lot in Cocoa programming. Just about any place a Cocoa collection is used to store a number, NSNumber is there. Among many other places, NSNumber objects are what NSUserDefaults stores and retrieves when you ask it to save a number.

InterfaceOur surrogate NSNumber will be called MANumber. Unlike the Cocoa version, which is a subclass of the more general boxing class NSValue, this one will directly subclass NSObject:

@interfaceMANumber : NSObject

There are a lot of methods for initializing an instance. There's one initializer for each C numeric type, plus some extra ones for types specific to Cocoa:

Implementation StrategyMANumber will use a union to store the underlying numeric value. union is a rarely-seen feature of standard C. It looks just like a struct, but works differently. A struct stores many values together in one spot. A union does this as well, but you can only access the last one you stored. When you store a value in a union, the value of all other fields becomes undefined.

In typical unhelpful-but-efficient C fashion, the compiler doesn't enforce that rule, nor does it help you follow it by, say, letting you query which field was the last one set. You have to keep track of this yourself, typically with an accompanying enum.

The union could be used to hold every C numeric type, with a big enum to say which one is in use. However, this is unnecessarily complex. All we really need is three fields: the largest possible integer type, the largest possible unsigned integer type, and the largest possible floating-point type. From the types we have to handle, these are long long, unsigned long long, and double. Everything else can be converted to and from those without loss.

This implementation does not precisely match that of NSNumber, which keeps track of the specific type used to create it. However, using these three types is plenty close enough, and eliminates a lot of extra repetitive code. The fact that NSNumber precisely tracks the original type isn't visible most of the time, and only shows up when using a method like -descriptionWithLocale: or -objCType.

The _type variable holds an anonymous enum saying whether the value is an INT (long long), UINT (unsigned long long), or DOUBLE (guess). The _value variable then holds the actual number, using a union so that it only ends up storing one.

The code will set _type and the corresponding _value in the initializers. The getters can then check the _type and extract the value accordingly.

InitializersThere's a ton of boilerplate to deal with all of the different types. All of the signed integer types just call through to initWithLongLong:, and the unsigned types call through to initWithUnsignedLongLong:

Those initialisers then simply set the _type, _value, and return self. (Note that I'm leaving out the traditional call to [super init] for brevity, as it's not strictly necessary when your superclass is NSObject, although still a good idea.)

GettersThe getters are even more similar then the initializers. They all check the _type, then return the appropriate field of _value. The compiler will handle the final conversion from the active field of _value to the requested return type.

Since these methods all contain the same code, this is a perfect candidate for a macro to encapsulate the identical bits. Here's a macro that checks _type and then returns the corresponding field of _value:

The one exception to this uniform sea of macro invocations is the -boolValue method. Since BOOL pretends to be a real boolean value, this method should always return YES for any non-zero value stored in the MANumber object. The compiler's built-in conversion won't do this. For example, the integer 256 will return NO if converted to a BOOL, since BOOL is just a signed char, which is an 8-bit integer. Because of that, -boolValue replicates the macro logic, but with an explicit check for zero:

-descriptionWithLocale: then uses -[NSString initWithFormat:locale:] to build the string. There's no fancy way to deal with the different numeric types here, so it simply checks _type and uses a different format string for each case:

Note that I'm using ARC, which is why there are no autorelease calls here.

ComparisonThe comparison methods get interesting, because they need to work between MANumber objects of different types. For example, the double value -1.1 should compare less than the unsigned integer value 99999.

There are nine permutations of the types, so nine different cases to handle. This can be reduced to only six cases by enforcing an order. If the two objects have types INT and UINT, the two cases for that can be reduced to one by only handling the case where self is INT and the other object is UINT, and swapping the two objects if they show up the other way around.

To help with comparison between the different types, I wrote a simple macro that takes two numbers and returns the appropriate NSComparisonResult. All it does is take two arguments, save them into temporary variables to avoid multiple evaluation, then return the appropriate constant depending on how they're ordered. There's also a bit of floating-point trickery here. With floating-point numbers, NAN (not a number) never compares equal to anything, and all comparisons with it are false. Since NSComparisonResult has no way to represent an ordering which means, "this number is not equal to anything, not even itself," I arbitrarily decide to make NAN equal to itself and less than any other number, for the purposes of MANumber comparison:

If the two types aren't in order, we reverse the comparison by calling compare: again with the arguments reversed, and returning the inverse of the result. Since NSComparisonResult is just -1, 0, or 1, we can invert its meaning by negating it:

if(selfType>otherType)return-[otherNumbercompare:self];

Now we're left with sorted types. There are six cases. If selfType is INT, then otherType could be anything. If selfType is UINT, then otherType can only be UINT or DOUBLE. If selfType is DOUBLE, then otherType must be DOUBLE as well.

Let's look at the cases where selfType is INT. If both values are INT, the code is easy:

If otherType is UINT, there's a bit of extra work. Directly comparing with [otherNumber unsignedLongLongValue] won't work. C will promote [self longLongValue] to unsigned before the comparison, turning negative numbers into positive numbers and wrecking the comparison. -1 will compare greater than 1 because of this. To prevent that, we make a special check for negative numbers, then compare their unsigned values if both are known to be positive:

Next comes the case for DOUBLE. This gets pretty complicated, because floating-point numbers work fairly differently from integers. There are several different subcases here, which I'll take one by one. However, the first thing it does is extract the doubleValue from the other number to make it more convenient to work with:

else{doubleother=[otherNumberdoubleValue];

double can hold a much larger range than long long. The first subcase is to figure out the largest possible number a long long can hold, and see if other is beyond it. If it is, it's obviously larger than self, since self is a long long.

The built-in macro LLONG_MAX gives us the largest number a long long can hold. However, we can't directly convert this to a double. That number is equal to 263-1, which can't be represented in a double. Due to the internal format of double, it can only represent even numbers when it gets beyond 254. To perform the comparison accurately, we calculate one number beyond the largest long long, careful to use an unsigned one when adding, and compare against that:

We also check in the negative direction. This is a bit easier, as the smallest possible long long can be directly represented in a double:

if(other<LLONG_MIN)returnNSOrderedDescending;

If we're still running at this point, then the double is within the range of a long long and they need to be compared directly. However, we can't just whip out the > operator, because there are a lot of doubles that can't be represented in long long (e.g. 1.5), and there are a lot of long longs that can't be represented as a double (e.g. any odd number above a threshold, as mentioned above).

Beyond a certain threshold, double can only represent integer values, as the magnitude of the value exceeds the precision of the representation. When beyond that threshold, and below the maximum possible long long, the double can safely be converted to a long long with no loss of precision. The two values can then be compared as long longs. Below that threshold, double can represent any integer, and so the long long can safely be converted to a double with no loss of precision, and the two values compared as doubles.

The location of that threshold is actually fairly easy to figure out. C provides a macro, DBL_MANT_DIG, which gives the precision of the double type. By raising that to a power of two (since double is a binary representation), we get the threshold:

doublepureIntegerStart=1LL<<DBL_MANT_DIG;

Then we simply compare based on where other lies relative to that. Note that the threshold applies equally for negative numbers, so we must check it in both directions:

Note that we don't have to handle INT, due to the type sorting performed above. We move on to DOUBLE, which is once again complicated. As before, we fetch the value of otherNumber into a local variable:

else{doubleother=[otherNumberdoubleValue];

The first thing we do is see if other is negative. If it is, then we know the order, as self is unsigned (and thus either zero or positive):

if(other<0)returnNSOrderedDescending;

Otherwise, we do the same basic threshold calculations as before. This time we have to compare other against the largest possible unsigned long long. Doing this is a bit tricky. Just like with long long, we have to add 1 to get a number that works as a double. However, we can't represent anything greater than the largest possible unsigned long long as an integer, since unsigned long long is the largest integer type we have. Instead, we calculate (LLONG_MAX + 1) * 2, which gives one greater than the largest unsigned long long, carefully doing so with all the right types to avoid overflow or imprecision:

Finally, since we override isEqual:, we must also override hash. The implementation of hash gets mildly tricky due to the semantics of floating-point numbers. For non-floats, we can simply return the straight integer value as the hash:

-(NSUInteger)hash{if(_type!=DOUBLE)return[selfunsignedIntegerValue];

For floats that are integer values, we want to do the same thing. Since our isEqual: considers an integer-valued DOUBLE equal to an INT or UINT of the same value, we must return the same hash as the INT and UINT equivalent. To accomplish this, we check to see if the DOUBLE value is actually an integer, and return the integer value if so:

if(_value.d==floor(_value.d))return[selfunsignedIntegerValue];

Beyond this, we have non-integer values. The ultimate goal is to simply return the bit pattern of the double, which will give a nice hash. However, this only works for numbers where bit pattern equality implies isEqual:. This is not true for all doubles. First is NAN, which we made compare equal to itself, but which has many different possible bit representations. To handle that, we check for NAN explicitly and return a constant hash for it:

if(isnan(_value.d))return0;

The other special case is a bit weirder. IEEE 754 floats (the kind used by just about any modern CPU) have two possible values for zero: positive and negative. These are typically indistinguishable, as they compare equal and produce the same results for most calculations. However, they have different bit patterns, so we have to special-case them. I take advantage of the fact that negative zero compares equal to positive zero to make a simple check and return a constant hash for both zeroes:

if(_value.d==0.0)return0;

Having ruled out all the special cases, if the code reaches this point then the number must be one where numerical equality is the same as bit pattern equality. Thus we simply return the bit pattern for the hash. We do this by returning the u field of the union:

return_value.u;}

But wait! Previously I said that you're not allowed to access any field in a union besides the one that was last set, so this is clearly not allowed. While technically correct according to the language spec, C compilers have generally settled on allowing it and simply reinterpreting the existing value. This code takes the double that's stored in the union and reinterprets its bits as an unsigned long long, which is exactly what we want. Technically this relies on undefined behavior, but it's officially blessed by the compilers we're actually using.

ConclusionNSNumber is a conceptually simple class which mainly exists so that we can stuff numeric values into Cocoa collections, but its flexibility implies a fair amount of underlying complication. By implementing a workalike MANumber class, we can see what kinds of things NSNumber has to be doing on the inside. Automatic conversion to different integer types requires a fair amount of boilerplate code, and reliable conversion between number of different types can get pretty complicated.

That's it for today. Come back next time for yet another Friday Q&A. As always, Friday Q&A is driven by reader suggestions, so if you have a topic you'd like to see covered, please send it in!

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Chris L: Good point on that. I'm not sure if there's even a public way to obtain that information nicely, though. The documentation explicitly states, "Note that number objects do not necessarily preserve the type they are created with." And yet, preserving things like BOOL and float/int are important for plist and, as you point out, JSON serialization. It's a bit of an odd situation.

Charlie Monroe: I'm thinking of doing that as a followup, so you may get it in a couple of weeks!

Even older necro, but as someone who type-puns a lot I had to repond that accessing a different member of a union then the last one asssigned is allowed. The one caveat is that the value may be a trap representation.

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ``type punning''). This might be a trap representation.