Learning C++

Friday, April 02, 2010

Floating point comparisons can be a painful work to achieve. And always must one avoid laziness in doing equality comparisons using inbuilt operator== for floating point numbers.

There could be roundings involved and/or infinite decimal expansions that can lead to inaccurate representations or better say, losing of some precision off the number. Owing to that, even though you would expect a number to match a definite expansion that the calculation may have lead to, it may not.

An easy way to handle that is to choose a reasonable tolerance in the comparison. For example, we could choose a tolerance of 0.0000001 when comparing two doubles for equality such that the absolute value of their difference if less than this value then, you could consider them to be equal or better say, 'close enough' for most reasonable cases. This tolerance can, however, be insufficient if:

1) The numbers are very large or if the numbers are very small in value for the tolerance to be able to compare to their difference. 2) If there have been multiple arithematic operations performed that could lead to compounded incorrectness (owing to multiple roundings/loss of precision involved) in the result.

In C++, there is a value std::numeric_limits::epsilon() that is referred to sometimes as machine epsilon value/tolerance or machine accuracy. It gives the difference between 1 and the smallest value greater than 1 that is representable for data type for which the template's instantiated, which is double here. This helps us with the standardization of the tolerance value that we previously had chosen as 0.0000001. But doesn't help us with the limitations 1) and 2) pointed out above.

There could more factors than that mentioned in 2) that can cause the rounding error. Apart from arithematic operations, it could be type promotions or binary conversions that can cause more than epsilon() rounding error. So, combination of 2 arithematic operations could lead to more than 2*epsilon. I don't know how quantifiable that is, but by a safer choice of N (number of arithematic operations involved), we hope to be able to overcome the problem in most practical cases. :-)

For the short-coming mentioned in 1), we would need to bring the epsilon to the degree of large-ness or small-ness of the numbers involved. Considering this, you might have something like:

NaN expands to 'not a number'. Various floating point operations can lead to this value. IEEE-754 is the most widely used floating point computation standards. It defines 5 possible exceptions that can occur during floating point arithematic:

1) Invalid Operation: when an invalid operation is performed on a number, for example, taking square root of a negative value, taking log of 0 or a negative number, etc. 2) Division by Zero: when a number is divided by zero 3) Overflow: when the result is too large to be representable 4) Underflow: when the result is too small to be representable 5) Inexact: when the result is inexact meaning the number was rounded and caused discarding of any non-zero digits

The cases falling under invalid operation would result in a value of NaN as per the above mentioned standard. (I get -INF with gcc 3.4.4, but NaN for any negative number's logarithm). You would check for NaNs and Infinities resulting out of computations in order to determine the success and validity of the operation but that can lead to code that is cluttered with ifs and elses and can become almost un-readable. You can, however, do the checks at every logical chunk of the computation. However, NaNs and INFs can get lost mid-calculations. The standard dictates that an exceptional value NaN or INF should get propagated throughout the calculations so that the checks at the end of a computation could catch them but certain operations can lose them, for example:

1) NaN operation with INF value would result in INF, as in NaN + INF = INF. 2) INF can become 0, as in x/INF = 0

The way to catch these would be to check the sticky bits that are essentially the exception flags that are set on occurence of floating point exceptions anywhere during the calculation. But let's not get too carried away and leave it to the credibility of the programmers and users to be able to handle the various cases in the most sensible way. :-)

To conclude, here is a function that will help check if a value is a NaN or not, based on the fact that any comparison involving NaNs would result in 'false' result:

template<typename T>bool myIsNaN(T value){ return value != value;}

C99 has a function isNaN() and TR1 for C++ has included the support for this function but just in case your compiler doesn't support it. Similarly, you have isinf() that will help catch both positive and negative infinities. If you are feeling too experimental, try your own version of isinf() using numeric limits. :-)

Tuesday, July 28, 2009

Here's a short one on string literals in C++. Ask yourself: what is their type? It is 'array of n const char', correct! So, we might think:

char* literal = "Hello World!";

would be "invalid"/"illegal" in C++. But you'd be surprised it is not and even Comeau online compiles it successfully without even a warning.

The C++ standard, however, tries to protect you hinting that the above is wrong by stating that it is a deprecated feature in C++ that probably was allowed to keep backward compatibility with C.

Here is what the standard says as part of section [2.13.4/2]:

[quote]A string literal that does not begin with u, U, or L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char”, where n is the size of the string as defined below; it has static storage duration (3.7) and is initialized with the given characters.[/quote]

So, the following would have definitely been invalid in C++:

char* literal = "Hello World!";

"Hello World!" is an array of 13 [spooky :-)] constant characters. 'literal' is the pointer to the first element of the array and since that element is const, the pointer cannot be declared as a non-const char*. The pointer has to be of the type 'pointer to a const char'.

But as mentioned above, to have backward compatibility with C where the above works an implicit conversion is defined for array to pointer conversion where a string literal would be converted to an r-value of type "pointer to a char". The standard mentions this in section [4.2/2]:

[quote]A string literal (2.13.4) with no prefix, with a u prefix, with a U prefix, or with an L prefix can be converted to an rvalue of type “pointer to char”, “pointer to char16_t”, “pointer to char32_t”, or “pointer to wchar_t”, respectively. In any case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [ Note: this conversion is deprecated. See Annex D. —end note ][/quote]

But, the thing to be happy about is the note above, that is re-iterated in Annexure D section [D.4/1] as:

It is also allowed to be able to access the static members via an instance/object as below: (modifying the above main function and keeping reset the same)

int main(){ Test testObject; std::cout << testObject.getClassName();}

Usually though, it is better to use the previous notation using the scope resolution operator (op '::'). That makes the code more readable in the sense that you know it is a static member and not a non-static one. It speaks for itself.