Saturday, May 1, 2010

C Language Quirks

Background

During the years, I’ve had the privilege to work with prototypes of processors still under development that sometimes existed only in a handful of copies in the world. My task has been to port operating systems to these new processors. Similar the processors, the accompanying tools were also early beta versions. As with most beta software, the compilers I used were somewhat buggy and they required a quite deep knowledge about the C language in order to debug issues introduced by incorrect compilations.

Learning the ins and outs of the C language was something I really enjoyed and at the time I knew more or less the entire C language standard with all its weird behavior that are pretty unknown to most regular developers. In this article I’ll try to describe some of the more interesting things I learned over the years.

Shift Operator Oddities

Many developers interested in the details of the C language know that shift operators don’t always have an intuitive behavior. For example, shifting down a negative number may on some targets be a division while it isn’t a division on others. This is called undefined behavior and the result of the operation depends on whether the CPU fills the high bits with the sign bit or not. Basically it is strongly recommended not to use shift operators on signed operands (unless you are absolutely sure you are only working with positive numbers, which you probably can’t be).

My favorite quirks on shift operators are on unsigned operands, for which shift operations generally work fine. Consider the following example and try to guess the output:

The example is performing left and right shift on a byte and a word, and the amount shifted is bigger than the width of the type, in this case 9 for the byte and 33 for the word. The idea is to perform the same operation on two different types and highlight some artifacts. The example actually shows a couple of different quirks of the C language, and the result also dependent on the processor it is executing on.

First off, the result of the shift operation is undefined if the right hand operand is greater or equal to the width of the left hand operand. This means that if the amount shifted is larger than 32 on a 32 bit type, the result depends on the compiler and the processor. Power PC and ARM processors typically yield the result 0 for both left and right shift of more than 32 bits, which is what most humans would think is logical. But most Intel processors since 80286 use only the low five bits in the right hand operand, so the expressions above are in reality shifting the integers only one bit.

The second issue is that shift operations always promote the left hand operand to (16 or 32 bit) integer. Although it looks like the bytes in the example are shifted 9 bits which is more than a byte, it is not affected the undefined behavior described above. The reason is that the byte is promoted (i.e. converted) to an integer before the shift operation is executed, and 9 bits is of course a perfectly valid amount to shift an integer value without any side effects.

The result of the small program is shown in the table below. As we are dealing with undefined behavior, I captured the results of the most common architectures to highlight the difference:

Intel PPC & ARM c >> 9 0 0 i >> 33 1 0 c << 9 1024 1024 i << 33 4 0

More Shift Operator Confusion

Shift operators also have another unexpected behavior. The shift operators work logically much like multiplications or divisions and most programmers would probably expect the same precedence as the multiplicative operators. That is, the expectation may be that shift operations are executed before additions. This is however a big mistake. The shift operators have in fact very low precedence, and most arithmetic operators including additive operators have higher precedence. This can easily lead to bugs for example:

int i = a << 2 + 1; /* Calculate a * 4 + 1 */

But because of the weaker precedence of the shift operator, the expression above is actually equivalent to

int i = a << (2 + 1);

which is far from what the programmer intended.

Bitwise Operator Mistakes

Many developers that are programming embedded systems and applications that are close to hardware are well aware of the powers of bitwise operators and use them frequently. There is really nothing strange with these operators, but sometimes there are bugs caused by not understanding their behavior or just too speedy programming. The example below shows one case where the behavior isn’t what the programmer intended, and it is quite well hidden:

The function is nicely written, well commented, and the logic is straight forward. The developer designed it to test one bit at the time by masking other bits, and if the result is non-zero, the bit is set. So for example invoking the function like

printBits(0x05);

should ideally show that bit 0 and 2 is set and bit 1 is cleared. Somewhat surprising though is that the output is

Bit 0 is set Bit 1 is set Bit 2 is set

The reason is however quite simple. Most mistakes with bitwise operators are caused by not understanding the precedence rules. In this case the inequality operator has higher precedence than the bitwise AND operator. What this means is that the expressions in fact look like:

if (val & (1 != 0))

A logical expression X != Y returns 0 if X and Y are equal and non zero if they are different. Most compilers always generate 1 as result if the test is true, which means that all if statements in the function turns out to be the same:

if (val & 1)

and basically test if bit 0 is set or cleared regardless of the numeric value, which wasn’t at all what the developer had in mind when writing the function.

Bitwise Not Operator

The bitwise not is a unary operator that calculates the bitwise complement of the operand and is a handy operator when performing bitwise arithmetic. Below is an actual real life example on when the behavior may not be exactly as one would expect. The code sets the bitwise complement of two bytes into an integer:

unsigned char a = 0x44;unsigned char b = 0x88;

unsigned short val = (((~a) << 8) | (~b));

This looks like a safe operation, the code carefully left shifts one of the values to avoid overlap, and then OR the two bitwise complements into the integer. The expression is well parenthesized to ensure there are no precedence issues. The expected value is naturally 0xBB77, with the high 8 bits being the bitwise complement of variable a, and the low 8 bits being the bitwise complement of variable b.

There is a small catch though; the developer writing this program overlooked that the bitwise complement operator performs integer promotion of the operand. In the example above, this means that the bitwise complement of 0x88 is not 0x77 as the developer assumed. The integer promotion is done prior to the operator being executed, and the result is actually 0xFFFFFF77. When OR-ing with the bitwise complement of the other byte, and after truncation to unsigned short, the resulting value turns out to be 0xFF77.

Bizarre Promotions

Some of the examples so far have shown unexpected consequences of automatic integer promotion. Sometimes these cause bugs that are quite hard to find. The following example shows a perfectly valid and commonly used test to see if an addition of two positive integers overflows.

This function could for example be called with two large numbers, and the function would print “overflow”, e.g.

testOverflow(INT_MAX, INT_MAX);

It is easy to think that this piece of code can be reused with different data types. Assume that someone needs to do the exact same test, but he is working with signed bytes instead. It may seem to be straightforward to just take the code and change the types:

Considering the only change that was made was to change the type, the different output may be a bit confusing. But there is a very good reason for the behavior, and you guessed, it is spelled integer promotion, and the variables a and b are promoted to integer before the addition is executed, and thus the result is always greater or equal to the right hand operand in the comparison.

More Bizarre Promotions

Another example when the promotion rules play some tricks is shown below. Consider the code:

signed char a = 127;

printf("%d\n", (a + 1)); printf("%d\n", ++a);

In this example it looks like both print statements would show the same output. Since the variable a is a signed character with the value 127, the natural expectation is that when adding one, the value would overflow and wrap around, and become -128.

This is however only true for one of the statements. The ++ operator adds one to the operand and the result is stored in the operand. Since the variable is a char, the addition will overflow and give the result -128. The printf function takes an integer as argument, and the char value will be promoted to integer. Promoting a negative char value to integer, keeps the sign and value, and thus the output of the printf function will be -128.

For the first statement the integer promotion works differently. The variable a is promoted to integer prior to executing the addition, which results in an integer addition 127 + 1, which equals 128. The result is already an integer so no further promotion is needed for the printf call, and the output is 128.

Even More Bizarre Promotions

Integer promotion can also be a problem in conditions. The following example illustrates this:

signed char cs = -1;unsigned char cu = cs;

signed int is = -1;unsigned int iu = is;

if (is == iu) printf("equal\n");else printf("not equal\n");

if (cs == cu) printf("equal\n");else printf("not equal\n");

Again, we are comparing the behavior of the same operations made on an integer and on a byte type and as you may expect we get different result and the cause is not surprisingly integer promotion.

The first if statement compares a signed and an unsigned version of the same integer value. Both operands are promoted to the biggest type larger or equal to int. In this case unsigned int is consider biggest and both operands are thus converted to unsigned int. The conversion is trivial, and since we assigned the variables to the same value the comparison will show the values are equal.

More interesting is the second comparison. The same rules apply, both operands are promoted to the biggest type larger or equal to int. In this case both types are smaller than int, so both operands are promoted to int before the comparison is executed. Here comes the interesting part. The variable cs is a signed char with the value -1.

Promoting it to integer preserves the sign and the resulting iteger is thus -1. The variable iu on the other hand has the value 255 (same binary representation) but promoting unsigned characters to int also preserves the sign, and the resulting integer will be 255. It is clear then that the test will show that the two values are not equal.

Lexical Ambiguities

The C language has many lexical and syntactical issues including case statements without breaks, if – else statements without brackets, and other weak constructions that allow for mistakes. There are however a couple of cases, where the language actually is ambiguous and you can write a statement that doesn’t do what you expect. One of my favorites is shown the following example:

The program is very straight forward with a a method, inverse, that is implemented as a macro and does just what it suppose to do, calculating the inverse of the input argument. The output is not as straight forward though. You would expect the two print statements to output the same value, 0.5, since the program is passing the value 2.0 to inverse in both cases. At least that what it looks like and the output is correct for the second statement. But the first statement outputs 1.0 (!). So did we exploit some weird floating point bug here or what is going on?

Well, the answer may get clearer if we perform macro expansion on the first statement:

printf("%.1f\n", 1.0/*d /* print inverse */ );

Maybe not entirely clear yet, we all know that the indirection has higher precedence than the division, so the program should fetch the first value of the array which is 2.0.

The problem is however not at all related to any pointer indirection issues or floating point bugs. What happens here is that the division followed by the indirection is parsed as a multi-character token in the C language, in this case the start of a comment. As you know a comment ends with the */ token, and everything in between is treated as a comment and not code. So the division and the dividend are interpreted as part of a comment and are ignored.

Fun with unary operators

Although the following example is really not that weird and it’s a perfectly valid and well specified C expression, I find it somewhat amusing.

int i = 2 - - - - - - 2;

printf("i = %d\n", i);

Most programmers know that unary operators have higher precedence than additive operators which basically means that e.g. 2 + (-1) equals 1. Usually the unary operator is placed immediately before the operand, but this is not required by the language. Furthermore, the parentheses aren’t needed either, because the unary operator has higher precedence than the additive operator. A more explicit version of the same expression would look like

int i = 2 – (-(-(-(-(-2)))));

and it is more clear that it is a subtraction of two values where the latter has a series of negations. The output of the small program would thus be 4.

Conclusion

I tried to show a couple of interesting cases where the C language may not be as clear and straight forward as it perhaps should be. These issues have resulted in many bugs and I think a stricter definition had helped a lot, but I’m sure there are good reasons for all the undefined and unintuitive behavior in the C language. The solution is rather simple; spend a lot of time to learn the specification. This may sound harsh, but I’m really happy that I was forced to learn it the hard way. I’ve had plenty of fun with the knowledge over the years and I know I share it with others that also had the privilege to dig deep into the language.

Correction: Right shifting signed integers is not undefined behavior. If the number is negative, it's implementation-defined behavior, otherwise it's defined. Left shifting signed integers is undefined behavior if an overflow occurs or if the number is negative.

Your overflow examples are not valid code in the specification sense. Integer overflow is undefined behaviour . For example a compiler could rightfully optimize the if away, because (a+b < a) is always false for positive ints. Unfortunately many programs rely on this behaviour. Thus, a compiler writer not only has to know the C standard, but also the expectations.

The bug with the inverse macro can only happen on a broken compiler, or maybe a very old one. The compiler has to remove comments and replace them with spaces before it executes preprocessor directives. So when it's doing macro substitution, comments no longer exist.

Thanks for great comments. I am aware that the overflow example is a bit misleading and the behavior with integers is not fully described. I wanted to show the different behavior of a 'simple' type change but I realize after reading comments that the initial behavior of the integers probably are more interesting.

The inverse macro example is also more interesting than I first thought and if someone have some more insight I'd be happy to learn more. I tested the code on several different compilers, and some remove the comments before macro expansion and some after. Interesting is that even two different versions of gcc got different result.

this is very good nice article. this is very useful for C Language students.

==========================================================This is very nice article. This is very use ful for C Language Learners.http://www.bytesonlinetraining.com/c-language-online-training/===========================================================

Packers and movers in pathankot@http://www.movingsolution7.in/packers-and-movers-pathankot/Packers and movers in ambala@http://www.movingsolution7.in/packers-and-movers-ambala/Packers and movers in bareilly@http://www.movingsolution7.in/packers-and-movers-bareilly/Packers and movers in allahabad@http://www.movingsolution7.in/packers-and-movers-allahabad/Packers and movers in dehradun@http://www.movingsolution7.in/packers-and-movers-dehradun/Packers and movers in jammu@http://www.movingsolution7.in/packers-and-movers-jammu/

packers and movers in india@https://quicksoluction.compackers and movers in delhi@http://quicksoluction.com/packers-and-movers-in-delhi.htmlpackers and movers in mumbai@http://quicksoluction.com/packers-and-movers-in-mumbai.htmlpackers and movers in pune@http://quicksoluction.com/packers-and-movers-in-pune.htmlpackers amd movers in gurgaon@http://quicksoluction.com/packers-and-movers-in-gurgaon.html

About Me

Daniel Vik is a software engineer specialized in embedded software architecture and design. He is currently a staff software engineer at Google Inc.
Daniel holds a Master of Science Degree from Uppsala University with a major in Technology. His professional career includes platform design, device drivers, wireless networking and security, analytics, software and system architecture.
In 2003, Daniel founded an open source project a multi system computer simulator targeting systems from the 80's and early 90's.