Sunday, March 13, 2011

Platform Dependent Versus Universal Numeric Types

Implementing numeric types in a programming language is surprisingly difficult. At this time, Crack supports Universal Numeric Types (UNTs - for example int32, uint64, byte) and Platform-Dependent Numeric Types (PDNTs - int, uint, float). The PDNT names are just aliases for the corresponding UNTs in the target compiler - so, for example, if the C++ compiler you use to build Crack has a 32-bit int, Crack's "int" will be an alias for "int32". Additionally, the philosophy behind implicit type conversions has been to allow them only if they do not result in loss of precision.

This overall approach is not without its problems:

It makes Crack code platform-dependent because there are expressions that will work on one platform but will produce a compile-time error on another. For example, "int32 i = int(v);" works on platforms with 32-bit integers, but breaks on platforms with 64-bit integers.

You end up writing a lot of explicit type conversions in places where you really don't care that much (like when using a signed integer value for a function argument of type "uint"). For a scripting language valuing terse syntax, this is kind of lame.

So after some discussion on IRC, we've decided to change our approach a little bit. The general philosophy now is that you should use a PDNT in situations where you care about performance or interoperability with C/C++ code and you should use a UNT in situations where you care about precision. The manifestations of this decision are:

PDNTs will be promiscuous: any numeric type will implicitly convert to any PDNT type. So "int i = float64(v);" will be perfectly legal on any platform.

There will be max-size, min-size assumptions about PDNTs. In particular, they will all be at least 32 bits but no more than 64 bits in size. Like everything else in the language, these assumptions are subject to change across major versions of the language.

UNTs will continue to apply the strict conversion rules. However, because of the min-size/max-size assumptions, certain conversions from PDNTs will always be legal. An example of this is "int64 i = int(v)".

There's still a platform dependency problem here because expressions like "int i = int64(v);" will vary in behavior at runtime depending on the size of an integer on the platform. So we've essentially converted a compile-time portability issue to a runtime portability issue :-/.

To mitigate this effect, there will be a warning flag that allows you to identify the places where you could potentially lose precision with something like this. We are also considering allowing the generation of a runtime check that would throw an exception when a specific value would be truncated.