“Remember, When You Point a Finger at Someone, There Are Three More Pointing Back at You”
— Unknown

It’s easy to meet even long-time C programmers who don’t fully grok pointers, let alone beginners. Because of this and the fact that pointers play such a crucial role in the C programming language, I’ve decided to launch a new series of blog posts on pointers. I want to start off with an episode that sheds some light on similarities and — more importantly — differences between pointers and arrays.

POINTERS AND ARRAYS: THE BASICS

An array is a sequence of same-sized objects, integers, for instance:

    int array[] = {
        0xA,
        0xBBBB,
        0xCC000000
    };

On a big-endian machine, ‘array’ could be stored like this (that it starts at memory address 0xB00010 is just an example):

    array  0x00B00010: 00 00 00 0A  // First integer.
           0x00B00014: 00 00 BB BB  // Second integer.
           0x00B00018: CC 00 00 00  // Third integer.

The compiler (or rather the linker) places the array at a fixed memory location. Thus, when you think array, think memory.

By contrast, a pointer is an object that holds a memory address. Pointers are used to refer to memory where an object of a specific type (like ‘int’) resides.

So as you can see, pointers and arrays use different ways to access memory and hence are fundamentally different beasts.
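
To make the distinction tangible, here is a minimal sketch in standard C. The names ‘demo_array’, ‘demo_pointer’ and the three helper functions are invented for this illustration. It shows that an array object is as large as all of its elements, while a pointer object is only as large as an address, and that a pointer can be re-aimed at a different element.

```c
#include <stddef.h>

/* Hypothetical names, chosen only for this illustration. */
static int demo_array[3] = { 0xA, 0xBBBB, 0xCC };
static int *demo_pointer = demo_array;  /* Holds the address of demo_array[0]. */

/* An array object IS the memory of its elements ... */
size_t array_object_size(void)
{
    return sizeof demo_array;           /* 3 * sizeof(int) */
}

/* ... whereas a pointer object only stores an address. */
size_t pointer_object_size(void)
{
    return sizeof demo_pointer;         /* sizeof(int *) */
}

/* A pointer can be re-aimed at another element; an array cannot be assigned to. */
int read_via_pointer(size_t index)
{
    demo_pointer = &demo_array[index];
    return *demo_pointer;
}
```

Calling ‘read_via_pointer(1)’ first stores the address of the second element in ‘demo_pointer’ and then follows it, which is exactly the extra step that an array access does not need.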

WHEN POINTERS LOOK LIKE ARRAYS AND VICE VERSA

Nevertheless, there are cases where pointers and arrays appear to be the same thing.

The C language comes with a little bit of syntactic sugar. In certain situations you can use an array like you would use a pointer:

    int x = *array;  // Get first element of 'array'.

This looks like you are dereferencing a pointer named ‘array’, but looks can be deceiving. What this really compiles to is this:

    int x = array[0];

Why? According to the C language standard, in most expressions the name of an array is converted to a pointer to the first array element (the operands of ‘sizeof’ and unary ‘&’ are the notable exceptions). Hence, the compiler really sees this:

    int x = *(&array[0]);

which is equivalent to

    int x = array[0];

Similarly, you can dereference pointers not just by using the ‘*’ operator but also by using the subscript operator [], which is another form of syntactic sugar — one that makes you believe you are accessing an array instead of a pointer:

    // Plain pointer access:
    int x1 = *pointer;        // Indirectly access first element.
    int x2 = *(pointer + 2);  // Indirectly access third element.
    int x3 = *(2 + pointer);  // Ditto (commutative law).

    // Array-like access:
    int x4 = pointer[0];      // Indirectly access first element.
    int x5 = pointer[2];      // Indirectly access third element.
    int x6 = 2[pointer];      // Ditto (commutative law, who knew?).

All this syntactic sugar makes C code involving pointers and arrays easier on the eyes; the compiler does some access magic behind the scenes. The downside is that it deludes people into believing that pointers and arrays are the same, which is not the case: arrays employ direct access, pointers indirect access.

Contrary to expressions, such syntactic sugar is not available in declarations. If you define an array in one translation unit (file):

    const int VALUES[4] = {
        0x1111,
        0x2222,
        0x3333,
        0x4444,
    };

and foolishly attempt to import it into another translation unit via this forward declaration:

    extern const int *VALUES;  // Import 'VALUES' into translation unit.
    int x = *VALUES;           // Indirect access, undefined behavior!

you risk a crash because dereferencing ‘VALUES’ will indirectly access memory when a direct access was required. Let’s assume that the array is stored like this, as defined in the first translation unit:

    VALUES  0x00B00210: 00 00 11 11
            0x00B00214: 00 00 22 22
            0x00B00218: 00 00 33 33
            0x00B0021C: 00 00 44 44

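Now, dereferencing ‘VALUES’ declared as a pointer will lead to these steps:

1. The memory at 0x00B00210 is treated as a pointer object, so its contents, 0x00001111, are loaded as the address to access.
2. An ‘int’ is then read from address 0x00001111, which is not where the array lives.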

What this means in practice depends on whether the address 0x00001111 is a valid address or not. If it is, arbitrary data will be read; otherwise, the memory management unit (MMU) will raise an exception. Therefore, make sure that your array declarations exactly match your definitions:

    extern const int VALUES[4];  // Matches definition.
    int x = VALUES[0];           // Direct access.
    int y = *VALUES;             // Ditto, syntactic sugar.
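
The mismatch can be simulated safely inside a single file. The following is only a sketch: the function names are made up, and it assumes that ‘uintptr_t’ has the size of a data pointer and is no larger than the array. Instead of actually declaring ‘VALUES’ with the wrong type, it copies the first bytes of the array into an integer, which is the bit pattern a mismatched pointer declaration would treat as an address.

```c
#include <stdint.h>
#include <string.h>

static const int VALUES[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };

/* Correct view: the array's own storage is read directly. */
int first_element_direct(void)
{
    return VALUES[0];
}

/* Simulated wrong view: the first sizeof(uintptr_t) bytes of the array
 * are taken to be a pointer value. Assumes uintptr_t is pointer-sized
 * and not larger than the array. */
uintptr_t misread_pointer_bits(void)
{
    uintptr_t bits = 0;
    memcpy(&bits, VALUES, sizeof bits);
    return bits;
}
```

The value returned by ‘misread_pointer_bits’ is built from element data such as 0x1111, not from the array's address, which is why following it as a pointer ends up somewhere arbitrary.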

PASSING ARRAYS TO FUNCTIONS

So far so good (or bad). Another source of confusion is the fact that arrays are the only objects in C that are implicitly passed by reference:* You always provide a pointer to the first array element to get an array into a function:

    int sum(int *nums, size_t len) {
        size_t i;
        int sum = 0;
        for (i = 0; i < len; ++i) {
            sum += nums[i];  // Indirect access, syntactic sugar.
        }
        return sum;
    }

At the caller’s site, the code looks like this:

    int total1 = sum(array, 3);      // Pass pointer to 1st elem, syntactic sugar.
    int total2 = sum(&array[0], 3);  // Ditto, but explicitly.
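
Because the callee receives a pointer to the caller's elements, any write through that pointer changes the original array. The sketch below illustrates this with two made-up functions, ‘double_all’ and ‘demo_pass_by_reference’. It also uses the common ‘sizeof array / sizeof array[0]’ idiom to compute the element count at the call site, where the array type is still known.

```c
#include <stddef.h>

/* Doubles every element in place; 'nums' points into the caller's storage. */
void double_all(int *nums, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        nums[i] *= 2;
    }
}

int demo_pass_by_reference(void)
{
    int array[3] = { 1, 2, 3 };

    /* The element count must travel separately; the pointer does not carry it. */
    double_all(array, sizeof array / sizeof array[0]);

    return array[0] + array[1] + array[2];  /* 2 + 4 + 6 = 12 */
}
```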

TYPE-SAFETY THAT ISN’T

Sometimes, you want to ensure at compile-time that only arrays of certain sizes can enter your function. Imagine you have a function that builds a 64-bit random value in an array of eight bytes:

    void get_random(uint8_t *random) {
        for (size_t i = 0; i < 8; ++i) {
            random[i] = *get_random_byte();
        }
    }

‘get_random’ assumes that it is passed the address of eight bytes of memory, but nobody prevents the caller from passing an array that is not big enough:

    uint8_t myrand[4];   // Short by 4 bytes.
    get_random(myrand);  // but compiles fine...

Which will — of course — lead to a dreaded buffer overrun.

Is it possible to make ‘get_random’ type-safe, such that arrays with a length different to eight lead to compile-time errors?

One (ill-fated) approach is to employ a C feature that allows you to declare arguments using array-like notation:

    void get_random(uint8_t random[8]) {
        ...
    }

However, this doesn’t give you any extra type safety. To the compiler, ‘random’ is still a pointer to a ‘uint8_t’ and if you ask for the size of ‘random’ (via sizeof(random)) in the body of the function, you will still get the value returned by sizeof(uint8_t*). Few developers are aware of this fact. To me, it’s a source of nasty bugs.

Since this array-ish syntax fools people into believing that a real array was passed to a function (by value), I don’t recommend using it.
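
If you want to convince yourself that the ‘[8]’ is ignored, a C11 compiler can be asked directly. The sketch below assumes C11 (for ‘_Generic’), and all names in it are invented for the illustration. It checks the type of the parameter and of the function itself without relying on ‘sizeof’, which many compilers warn about when applied to an array-style parameter.

```c
#include <stdint.h>

/* Declared with array-like notation ... */
int param_is_plain_pointer(uint8_t random[8])
{
    /* ... but _Generic sees the adjusted type: uint8_t *. */
    return _Generic(random, uint8_t *: 1, default: 0);
}

/* Two spellings of what is really one and the same function type. */
typedef void array_style(uint8_t random[8]);
typedef void pointer_style(uint8_t *random);

int signatures_match(void)
{
    return _Generic((array_style *)0, pointer_style *: 1, default: 0);
}
```

Both functions return 1, which means the size written between the brackets takes no part in type checking.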

TYPE-SAFETY DONE RIGHT

You can get real type-safety for your “array” arguments through so-called “pointers to arrays”. Alas, this C feature tends to confuse the heck out of programmers.

In the previous examples, we passed an array (conceptually) by passing a pointer to the first element:

    uint8_t randval[8];
    get_random(randval);      // Implicitly.
    get_random(&randval[0]);  // Explicitly.

The real type of the array and the size of the array are lost in this process; the called function only sees a pointer to a ‘uint8_t’. By contrast, the following syntax allows you to obtain a pointer to an array that preserves the full type information:

    typedef uint8_t RANDVAL[8];
    RANDVAL randval;
    RANDVAL *pointer = &randval;  // note the '&'

This ‘pointer’ is completely type-safe:

    int *p = pointer;         // Doesn't compile, incompatible pointers.
    get_random(pointer);      // Ditto.
    int x = (*pointer)[7];    // OK: extract 8th (last) element.

To add type-safety to our ‘get_random’ function, we could define it like this:

    void get_random_type_safe(RANDVAL *random) {
        for (size_t i = 0; i < sizeof(*random); ++i) {
            (*random)[i] = *get_random_byte();
        }
    }

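With this change, ‘get_random_type_safe’ only accepts pointers to 8-element arrays of ‘uint8_t’. Passing any other kind of pointer will result in a compile-time diagnostic: an error on some compilers, and a warning on others unless you promote it with an option such as ‘-Werror’.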
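
Here is a compilable sketch of the same idea. Since the random source is not shown here, the sketch substitutes a made-up, predictable helper called ‘placeholder_byte’; ‘fill_type_safe’ is likewise a hypothetical name. The rejected call is left in a comment so that the example still builds.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t RANDVAL[8];

/* Hypothetical stand-in for a random source: predictable on purpose. */
static uint8_t placeholder_byte(size_t i)
{
    return (uint8_t)(0xA0 + i);
}

/* Accepts only a pointer to an array of exactly eight uint8_t. */
void fill_type_safe(RANDVAL *out)
{
    /* sizeof *out is the size of the whole array (8), not of a pointer. */
    for (size_t i = 0; i < sizeof *out; ++i) {
        (*out)[i] = placeholder_byte(i);
    }
}

/*
 * uint8_t small[4];
 * fill_type_safe(&small);   // Diagnosed: uint8_t (*)[4] is not uint8_t (*)[8].
 */
```

Note that the loop bound comes from the type itself, so the function cannot silently get out of step with the array length.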

We know that in expressions, using an array’s name like ‘array’ is short for “pointer to first element in array” but that doesn’t mean that ‘&array’ is a pointer to a pointer to the first element — the ‘&’ operator doesn’t create another level of indirection, even though it looks like it did. In the previous example, the value stored in ‘pointer’ is still the address of the first element of the array. Hence, this assertion holds:

    assert((uintptr_t)array == (uintptr_t)&array);  // Casting to 'uintptr_t' obtains
                                                    // the numeric value of the address.
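
Although the two addresses compare equal, the pointer types differ, and that shows up as soon as you do pointer arithmetic. The following sketch (the function names are invented) measures how far ‘+ 1’ moves each kind of pointer.

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t randval[8];

/* Same numeric address ... */
int same_address(void)
{
    return (uintptr_t)randval == (uintptr_t)&randval;
}

/* ... but 'randval + 1' steps over one element ... */
size_t element_stride(void)
{
    return (size_t)((uint8_t *)(randval + 1) - (uint8_t *)randval);    /* 1 */
}

/* ... while '&randval + 1' steps over the whole array. */
size_t array_stride(void)
{
    return (size_t)((uint8_t *)(&randval + 1) - (uint8_t *)&randval);  /* 8 */
}
```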

Since the actual pointer values are the same, you can still use legacy APIs that only accept pointers to ‘uint8_t’s (like the original ‘get_random’ function), if you apply type casts:

    uint8_t *p = (uint8_t *)pointer;  // OK, but type-safety lost.
    get_random(p);                    // Fine.

You don’t need typedefs like ‘RANDVAL’ if you want to employ pointers to arrays. I mainly used it to avoid overwhelming you with the hideous pointer-to-array syntax. Without typedefs, you would need to type in things like this:

    uint8_t randval[8];
    uint8_t (*pointer)[8] = &randval;

    void get_random_type_safe(uint8_t (*random)[8]) {
        for (size_t i = 0; i < sizeof(*random); ++i) {
            (*random)[i] = *get_random_byte();
        }
    }

The syntax to declare pointers to arrays is similar to the syntax to declare pointers to functions and takes a little getting used to. If in doubt, ask the Linux tool ‘cdecl’ which is also available online:

    cdecl> explain int (*x[10])[42]
    declare x as array 10 of pointer to array 42 of int

Do I recommend using pointers to arrays? No, at least not in general. It confuses way too many developers and leads to ugly casts in order to access plain pointer interfaces. Still, pointers to arrays make sense every now and then and it’s always good to know your options.

This concludes my first installment on pointers. There is more to come. Stay tuned!

________________________________

*) The language designers of C believed that passing an array by value (e.g. as a copy via the stack) would be extremely inefficient and dangerous (think: stack overflow), so there is no direct way to do it. However, they were not so fearful regarding structs (which can also get quite large and overflow the stack), so you can pass an array by value if you wrap it inside a struct:

    typedef struct {
        int data[3];
    } MY_ARRAY;

    void some_func(MY_ARRAY the_array) {
        the_array.data[0] = ...
        ...
    }

    MY_ARRAY array2 = { 1, 2, 3 };
    some_func(array2);  // Pass by value, i.e. duplicate array2 on the stack.
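
The practical consequence of the struct wrapper is that the callee works on its own copy. A minimal sketch, with invented function names, contrasts it with the usual pointer-based passing:

```c
typedef struct {
    int data[3];
} MY_ARRAY;

/* Receives a copy of the whole struct, embedded array included. */
static int modify_copy(MY_ARRAY the_array)
{
    the_array.data[0] = 99;  /* Changes the copy only. */
    return the_array.data[0];
}

/* Receives a pointer to the caller's elements. */
static void modify_original(int *nums)
{
    nums[0] = 99;            /* Changes the caller's array. */
}

int demo_by_value(void)
{
    MY_ARRAY wrapped = { { 1, 2, 3 } };
    modify_copy(wrapped);
    return wrapped.data[0];  /* Still 1. */
}

int demo_by_reference(void)
{
    int plain[3] = { 1, 2, 3 };
    modify_original(plain);
    return plain[0];         /* Now 99. */
}
```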