Data often occupies a low position in the mind of a programmer. It just isn't as exciting as writing the code that does something with that data. In fact nothing could be further from the truth and C in particular is a language that was designed to have data at its core - but not for the same reasons that most modern languages do.

One of the big differences between a language like C and more abstract languages like Java or C# is that C was designed to be close to the way that the machine works in terms of data. C creates abstract constructs that make writing code easier than writing in assembler. It gives you for loops, while loops, if statements and so on, which are much simpler to use than the lower level sequences of assembly language needed to do the same tasks. However, when it comes to data, C stays very close to the addressing and the organization of RAM that you find in a real machine.

This is a big plus point if you are looking to make effective use of memory and it is essential if you are writing code which interacts with hardware that is represented as particular areas of memory. The problem is that with this flexibility and realism come some major responsibilities. It is up to you to organize and use memory in sensible ways. It is all too easy to write programs that stray into areas of memory they were never intended to access. This is the reason why C code has a reputation for being buggy and dangerous. It is undeniably low level code and as such the only way to write safe, high quality code is to understand how C works and know what it is you are trying to achieve.

Being so close to the hardware means that not only can you write programs that work with it in ways that other languages make difficult, but you learn how the hardware works and this is a valuable education in itself.

Memory Basics

Computer memory is organized into chunks of storage which are fixed in size, typically 16 or 32 bits. Generally each chunk of storage has a unique address which is used to identify it. Back in the days when C was being invented the standard machine of the day, the PDP-11, organized its memory as 16-bit words. What this means is that if you specified an address you read or wrote a single 16-bit word. When C was created its fundamental data type was int, an integer, and this was assumed to be a 16-bit or 2-byte storage location.

So what is the size of an int today?

The answer might surprise you - it was and still is decided by the compiler implemented for the machine. Many C programmers believe that C data types have a fixed size - they don't - and they vary according to the machine you are using.

This might seem like madness if you are familiar with other higher level languages, but C is designed to be close to the machine it is running on. When you say you want to use an int, you are asking for a variable that is the fundamental access unit of the machine. That is, reading or writing the variable involves one memory access. For example, suppose the C standard defined int to be 32 bits in size and you were working on a machine that had a memory organized into 16-bit words. Now when you store or retrieve something from an int, 32 bits have to be transferred and this would mean two memory accesses on a machine with a 16-bit word. This would slow your programs down significantly.

When you ask for an int variable you get a word size that corresponds to the most efficient memory access the machine can offer.
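To see what your own compiler has decided, you can simply ask it. The following is a minimal sketch rather than a definitive tool. The function names int_bits and report_int are invented for this illustration, and the only thing the C standard promises is that an int has at least 16 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Number of storage bits in an int on this implementation.
   sizeof gives the size in bytes and CHAR_BIT is the number of
   bits in a byte, as reported by <limits.h>. */
size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Print what this particular compiler has chosen for int. */
void report_int(void)
{
    /* The standard only guarantees a minimum of 16 bits. */
    assert(int_bits() >= 16);
    printf("int occupies %zu bytes, which is %zu bits\n",
           sizeof(int), int_bits());
}
```

Calling report_int() from main on a typical desktop compiler is likely to report 4 bytes, but a compiler for a small 16-bit microcontroller may well report 2.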

Of course, even this rule is likely to be broken because it is up to the compiler writer to implement whatever makes sense in the circumstances, but this is the intent.

The same is also true of the other data types and this is the reason that their sizes are not fixed.

The vagueness about int extends beyond its size. An int should be capable of holding a signed value, i.e. both positive and negative values, but the format used to store this isn't specified. There are two common ways to represent negative numbers, one's complement and two's complement. Most common hardware uses two's complement and this is what you encounter in a C int, but it wasn't mandated by the language until the C23 standard.

Even so many programmers think that when they declare an int they get a 2- or 4-byte memory location holding a two's-complement value but this doesn't have to be so.

An int should be the natural size for the machine in use and the numeric format is whatever the machine uses when it does arithmetic.
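If you are curious about which format your machine uses, a well-known trick is to look at the lowest two bits of -1. The sketch below assumes nothing beyond standard C. The function name int_representation is hypothetical, and the third case, sign and magnitude, is a further format that older C standards also permitted.

```c
#include <assert.h>
#include <stdio.h>

/* Classify how negative ints are stored by examining the two
   lowest bits of -1:
     two's complement   ...1111  low bits are 11 (3)
     one's complement   ...1110  low bits are 10 (2)
     sign and magnitude 1..0001  low bits are 01 (1) */
const char *int_representation(void)
{
    switch (-1 & 3) {
    case 3:  return "two's complement";
    case 2:  return "one's complement";
    case 1:  return "sign and magnitude";
    default: return "unknown";
    }
}

/* Print the result for the machine the program is running on. */
void report_representation(void)
{
    printf("negative ints use %s\n", int_representation());
}
```

On practically every machine you are likely to meet, this reports two's complement.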

All in all this is very vague.

So how do C programmers cope with this vagueness?

Sometimes it doesn't matter because the program would work just as well with a 2-byte int as with a 4-byte int and the numeric representation doesn't matter either.

Sometimes it does matter and in these cases you need to use data types that are guaranteed to be particular sizes - more of this later.
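As a brief preview, and only as a sketch, C99 added the header stdint.h, which defines types such as int16_t and int32_t. Where an implementation provides them they are exactly 16 and 32 bits wide. The function name report_fixed_width is made up for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Show that the exact-width types have the same size and range
   no matter what the compiler has chosen for plain int. */
void report_fixed_width(void)
{
    int16_t small = INT16_MAX;   /* always 32767 */
    int32_t large = INT32_MAX;   /* always 2147483647 */

    assert(sizeof(int16_t) * CHAR_BIT == 16);
    assert(sizeof(int32_t) * CHAR_BIT == 32);

    /* PRId16 and PRId32 are the matching printf formats
       supplied by <inttypes.h>. */
    printf("int16_t max = %" PRId16 "\n", small);
    printf("int32_t max = %" PRId32 "\n", large);
}
```

The point of the design is that code using int32_t behaves the same whether int happens to be 2 or 4 bytes on the target machine.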

The often overlooked fact is that C is a language that targets specific machines and if you want to know what the data types are you have to ask what the target machine is. For nearly all machines with a 16-bit architecture, a C int is a two's-complement 16-bit value; for 32-bit architectures int is a two's-complement 32-bit value. However, for a machine with a 64-bit architecture int is still usually 32 bits because this is more efficient than using 64-bit integers.

The Numeric Data Types

C has a range of numeric integer data types but how they are implemented depends, just like the basic int, on the architecture of the machine. However, the most commonly encountered sizes are:

Type        Typical size    Typical range
char        1 byte          -128 to 127 or 0 to 255
short       2 bytes         -32,768 to 32,767
int         2 or 4 bytes    -32,768 to 32,767 or -2,147,483,648 to 2,147,483,647
long        4 bytes         -2,147,483,648 to 2,147,483,647
long long   8 bytes         -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

The long long type was defined in C99.
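Rather than trusting the sizes and ranges listed above, you can get the compiler to tell you what it actually uses via the constants in limits.h. This is a minimal sketch and the function name report_integer_types is an invention for this example. Expect some differences from the typical values. For instance, on most 64-bit Linux systems long is 8 bytes rather than 4.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Print the size and range of each signed integer type as
   implemented by the compiler in use. */
void report_integer_types(void)
{
    /* Each type must be able to hold every value of the one before it. */
    assert(SHRT_MAX <= INT_MAX);
    assert(INT_MAX <= LONG_MAX);
    assert(LONG_MAX <= LLONG_MAX);

    printf("char      %zu byte(s)  %d to %d\n",
           sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("short     %zu byte(s)  %d to %d\n",
           sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("int       %zu byte(s)  %d to %d\n",
           sizeof(int), INT_MIN, INT_MAX);
    printf("long      %zu byte(s)  %ld to %ld\n",
           sizeof(long), LONG_MIN, LONG_MAX);
    printf("long long %zu byte(s)  %lld to %lld\n",
           sizeof(long long), LLONG_MIN, LLONG_MAX);
}
```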

The char type is, by definition, exactly 1 byte in C, so sizeof(char) is always 1, but a byte is not guaranteed to be 8 bits. An 8-bit byte is just its most common implementation. The reason it is called char is that it has to be capable of storing the basic character set codes of the machine. Notice that it can be a signed value or an unsigned value; again this depends on the machine.

If you want to be certain that char is signed you can use the qualifier signed in front of char to give signed char.

If you want the values stored in a variable to be unsigned, i.e. restricted to zero and positive integers, you can use the qualifier unsigned in front of the type. For example, unsigned char means that a single 8-bit byte can store values in the range 0 to 255.
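The effect of the signed and unsigned qualifiers can be explored with a short sketch. The two function names are hypothetical. The first reports whether plain char is signed on your machine and the second shows that an unsigned char wraps round to 0 when it passes its maximum value.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is signed on this implementation and
   0 if it is unsigned. CHAR_MIN is 0 when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Unsigned values wrap around: adding 1 to the largest unsigned
   char, UCHAR_MAX, gives 0. */
unsigned char next_unsigned_char(unsigned char c)
{
    return (unsigned char)(c + 1);
}

/* Print the ranges so that the three kinds of char can be compared. */
void report_char_types(void)
{
    printf("plain char is %s\n",
           plain_char_is_signed() ? "signed" : "unsigned");
    printf("signed char   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char 0 to %d\n", UCHAR_MAX);
}
```

If the sign of a char matters to your program, it is safer to state signed char or unsigned char explicitly than to rely on what plain char happens to be.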

There is one final complication. You can put int after any of the length qualifiers. So a short can be declared as short int, a long as long int and a long long as long long int. This is mostly a matter of preference. Many C programmers prefer the shortest form of a declaration so instead of long long int they would use long long.

Notice that while most implementations of C have these data types it might not be wise to make use of them if they are not efficiently supported by the hardware. For example, the ARM11 architecture is 32-bit and hence short actually uses as much storage as an int and takes longer to do arithmetic. It is not necessarily true that smaller is better.

Floating Point

As with integer types, C pays attention to the hardware that is available in its implementation of floating point numbers. So much so that the basic float, i.e. a number with a fractional part, is whatever the hardware offers. Things are, however, more standardized than you might imagine because in practice most hardware implements the IEEE 754 standard for single-precision binary floating point arithmetic. This means that in C float is a four-byte, 32-bit single-precision floating point value with the range -3.4E38 .. 3.4E38 with between 6 and 9 digits of precision.

For higher precision there is the double data type, which is generally implemented as an IEEE 754 double-precision floating point number. This means that in C double is an eight-byte 64-bit floating point value with the range -1.7E308 .. 1.7E308 with between 15 and 17 digits of precision.

There is also a long double type which is much more varied in its implementation and on some compilers is simply the same as double. Typically long double is mapped to the 80-bit extended-precision format of x86 hardware but there are also machines where it is mapped to a 128-bit floating point value.
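As with the integer types, you can ask the compiler what it provides. The constants in float.h give the largest value and the number of decimal digits that each type is guaranteed to preserve. This is only a sketch, and the function name report_floating_types is invented for the purpose.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Print the size, largest value and guaranteed decimal digits of
   each floating point type on this implementation. */
void report_floating_types(void)
{
    /* Minimums required by the C standard. */
    assert(FLT_DIG >= 6);
    assert(DBL_DIG >= 10);

    printf("float       %zu bytes  max %e  %d digits\n",
           sizeof(float), FLT_MAX, FLT_DIG);
    printf("double      %zu bytes  max %e  %d digits\n",
           sizeof(double), DBL_MAX, DBL_DIG);
    printf("long double %zu bytes  max %Le  %d digits\n",
           sizeof(long double), LDBL_MAX, LDBL_DIG);
}
```

On hardware that follows IEEE 754 you should see 6 digits for float and 15 for double, which are the lower ends of the precision ranges quoted above. The long double line is the one most likely to differ between machines.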

It is worth knowing that many small CPUs don't have floating point in hardware and in this case you either cannot use floating point types or you have to use a software implementation which is very slow. In such situations it is usually possible to avoid the need for floating point by implementing arithmetic using a fixed point scheme. This will be covered in a later chapter.
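To give a flavor of the idea ahead of the full treatment, here is a minimal sketch of one possible fixed point scheme. Everything in it is an assumption made for illustration: the type name fixed_t, the helper functions and the choice of a 32-bit value with 16 bits of fraction. It is not necessarily the scheme used later.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: a 32-bit integer in which the low 16 bits hold
   the fractional part, so the stored value is the real value
   multiplied by 65536. */
typedef int32_t fixed_t;
#define FIXED_ONE 65536L   /* the value 1.0 in this format */

/* Convert a small whole number to fixed point. */
fixed_t fixed_from_int(int n)
{
    return (fixed_t)(n * FIXED_ONE);
}

/* Multiply two fixed point values. The product of two scaled
   values is scaled twice, so it is computed in 64 bits to avoid
   overflow and then divided by the scale factor once. */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) / FIXED_ONE);
}

/* Convert back to a whole number, discarding the fraction. */
int fixed_to_int(fixed_t x)
{
    return (int)(x / FIXED_ONE);
}
```

For example, 1.5 is stored as FIXED_ONE + FIXED_ONE / 2, and multiplying it by fixed_from_int(2) gives fixed_from_int(3) using nothing but integer arithmetic.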