C code optimisation

This tutorial will discuss various aspects of code optimisation (I'm a Brit so we use an 's' not a 'z' - let's just get that out of the way now!). This is an enormous, and maybe emotive, subject so be prepared to do a lot of reading!

We will cover items such as:-

* How optimisation works (using a pretend processor and pretend compiler)
* My 'post-boy', 'post-boy-supervisor' and 'wash-your-hands' analogies
* The register, volatile and const keywords
* How to change optimisation via your makefile
* Considerations when interrupts or multiple threads are accessing the same data
* How to minimise the amount of code you write by using structures for 'data-driven programming' - ie like C++ but in C.

Although we will be looking specifically at the AVR microcontrollers and the free AVR C compiler, many of these principles apply to other environments and languages, but you may have to dig around in the documentation for your specific compiler to find the equivalent.

Code optimisation covers a myriad of things but can be split down into two basics:-

* How to get the compiler to help you optimise the code you have written, by changing command-line arguments in your makefile. This means you don't need to change your code, but you will benefit from having a basic understanding of what the compiler is doing on your behalf. Obviously the compiler can only 'do so much' - and is no replacement for...

* How to write more optimal code in the first place.

Also 'optimisation' can mean different things to different people, and can also mean something different depending upon what it is that you are trying to do:-

* Sometimes we want the code to run as fast as possible (even at the expense of requiring extra memory), or,
* We want the code to take up as little space as possible (at the expense of speed).

Sometimes we want to use both of these methods. For example: you may want an interrupt service routine to run as fast as possible, whereas your main code may be optimised for space so that it fits in the limited memory available on a specific chip.

It may appear strange that you have options for both speed and size as you probably think that smaller code will run faster and so small=fast. As a general rule this is normally correct – but not always. More later.

01 - Segmentation

Before we go any further it will be useful to understand the make-up of an executable program and how that maps to the hardware. This description is fairly generic and so won't get into the ins-and-outs of .hex, .obj, .exe, .com files etc but attempts to explain the raw principles.

An executable program file is normally made up of the following items or 'segments':-

* Your code. This is machine code produced by your compiler for the target chip and is 'read-only' - ie your program doesn't re-write itself when it is running. It is also of a fixed size (ie a constant number of bytes). This is normally known as the 'Code Segment' or 'Text Segment'.

* Any global data/variables. These are of a known size and are 'read-write'. In a C program this is any global variable defined outside of any of your methods. These variables can be split down into two separate types:-

  * Initialised. If your program creates a variable with an initial value, ie 'int foo=10;', then the compiler creates this variable in the data section with an initial value of 10 - so no code is required to assign it the value of 10. These sorts of variables are normally stored in the 'Data Segment'.

  * Un-initialised. If your program creates a global variable such as 'char buffer[256]' then you are reserving 256 bytes of data but you aren't actually assigning any values to any of the 256 bytes. The compiler will not generally put this into the 'Data Segment' as that would make the runnable file 256 bytes larger. Instead it adds 256 to the size of what is often called the 'BSS Segment'. Once the program loads it zaps the entire BSS Segment to a known value - normally zeroes - so all of your numeric variables are automatically initialised to a value of zero. This can be compiler specific so 'best practice' says that you should not rely on this happening - if you want to assume a variable has an initial value then you should initialise it yourself in your source code.

* The stack. This is never stored in a program file but is an area of memory used when running the program. It is used by the microcontroller to store things like return addresses and variables that are defined within a method. When you call a method the chip will first of all store the return address (ie where in your code it needs to go back to afterwards) onto the stack. It will then create enough space on the stack to store all of the variables defined within the method that is being called. Once the method finishes, all of this space is recovered from the stack and made available again. In a multi-threaded environment (say on 'Windows') each program and thread will need its own stack. Since a microcontroller is normally used with a single thread, we will assume that this is the case and that there is thus only one stack.

* The heap. This is used if your compiler allows you to create new variables at runtime by calling 'malloc', 'calloc' or 'new'. Note that most microcontrollers don't allow you to do this, or will implement it in a limited way.

So your file will have: a) a Code Segment of a fixed size, and b) a Data Segment of a fixed size. It will also store the total number of bytes required for the BSS Segment.

Let's now look at the different sorts of memory in a typical microcontroller (in this case an ATMega8 as used in the $50 Robot):-

* Flash Memory is where you upload your program using your ISP programmer. It is used to store read-only information (such as your code) and its contents survive reboots.

* SRAM is read-write memory that is used to store changing information - like your variables and the stack. Its contents are lost every time you reboot.

* EEPROM is used to store read-write information but, unlike SRAM, its contents are retained between reboots. This is useful for storing information such as user preferences (eg the current volume for my text-to-speech amplifier - so that next time I turn on the robot it keeps the same value).

This is the layout of your HEX file and the whole thing is loaded into Flash memory when you upload your program:-

The 'Vector Table' is set up to contain the addresses of all of the Interrupt Service Routines (ISRs) specified in your code. Most of the unused vectors will point to a piece of code that behaves as if the power has been switched off then back on. The big exception is that the first entry in the table holds the address of the start of the code to be executed at power on. You may think that this is your 'main' function - but you'd be wrong. It actually points to the start of the 'Stub Code'.

The 'Stub Code' is automatically linked in at compile time. Its purpose is to set up the microcontroller into a known 'ready' state before any of your code is run. We will have a look at what it does in more detail in a minute.

The 'Code Segment' contains all of your code and any of your unchanging data that you have told the compiler to place in program memory.

The 'Data Segment' contains all of your initialised data - ie variables, arrays, etc, that you have assigned an initial value to.

Now let's look at how your program sits in memory when it is being run. As mentioned above, the first piece of code to be executed is the 'Stub Code'. This will set up the Stack Pointer to the end (ie top) of your SRAM memory. Next it will copy all of your Data Segment from the Flash memory to the start of the SRAM memory. Next it will carve out the number of bytes of SRAM required by your BSS Segment (ie for un-initialised variables, arrays etc), starting immediately after the end of the memory used for the Data Segment. This memory will be zapped to zeroes so that all these un-initialised variables start with a value of zero. It then stores the address of the end of the BSS Segment as the start address of 'available' memory for use by the heap. The heap will grow upwards towards the stack and the stack will grow downwards towards the heap. If the two overlap then you get the dreaded 'Stack Overflow' or 'Out of Memory' - or the program just goes berserk. Having done all this the Stub Code will call your 'main' function. At which point the SRAM is organised as follows: the Data Segment at the bottom, the BSS Segment immediately above it, then the heap growing upwards, with the stack at the top growing downwards towards it.

So why don't most microcontrollers let you allocate and free heap memory at will? The answer comes down to managing the heap. A very simple heap manager will just keep a 'current end of heap' marker. If you then ask for
20 bytes then the marker will be incremented by 20. If you free the 20 bytes,
and they are at the end of the heap, then the marker will move back by 20
bytes. But what if you grab three separate lots of 20 bytes and then free the
middle 20? Something has to remember that these are now free so that if the
last 20 bytes are freed then the pointer will move back by 40 (so that only
first 20 bytes are still in use). So the heap is often stored as a linked list
of memory blocks with, at least, a variable to indicate if they are currently
used or unused. Or may be as one linked list of ‘used’ heap and another list of
‘available’ space. But the problem gets worse with what is called
‘fragmentation’. Assume that your middle 20 bytes are ‘unused’ and your program
now asks for 5 bytes – should it carve this out of the available 20 or should
it increase the heap pointer further? This constant allocating and
de-allocating of different sized heap memory means that the heap becomes more
and more fragmented – ie has lots of small lumps of memory available. This
could mean that if you ask for 100 bytes then the system cannot give you an
answer. It may have a total of 100 bytes available but they are scattered all
over the heap and there isn’t one continuous lump of 100 bytes available. Bang
and crash!! More powerful computers use hardware that allows you to shuffle all
the available chunks into one contiguous lump so that your program can
continue.

So that is a very quick, and very dirty, explanation as to
why microcontrollers don’t allow you to dynamically allocate and de-allocate
heap memory – it just doesn’t have the necessary hardware and software to do it
very well.

Some microcontroller compilers do allow you to dynamically allocate memory - but with one enormous proviso: the memory is automatically freed when the method that allocated it in the first place exits. How does that work? Well, it's then very similar to the stack. It will just use the heap pointer to allocate new memory and then move it back to its original position when the method exits. Therefore you can never have any fragmentation in the heap. However, its usage is fairly limited.

Consider this code:-

void myMethod(){
    char buffer[256];
    for(int i=0; i<256; i++){
        buffer[i]=i;
    }
}

This will allocate space for ‘buffer’ by moving the stack
pointer down by 256 bytes.

Whereas this code:-

void myMethod(){
    char *buffer = malloc(256);
    for(int i=0; i<256; i++){
        buffer[i]=i;
    }
}

will allocate space for ‘buffer’ by moving the heap pointer
up by 256 bytes. So they are almost identical.

But you must be careful not to do the following:-

char *buffer;

void myMethod(){
    buffer = malloc(256);
    for(int i=0; i<256; i++){
        buffer[i]=i;
    }
}

since 'buffer' is assigned a value in the method but - unlike with a 'proper' C compiler - once the method exits, 'buffer' is referring to a lump of memory that is now being used by other things. So changing some of its values will corrupt something else. Bad!

So dynamic memory allocation should generally be avoided and
we will not discuss it any further!!

Special consideration of Segments in a micro-controller

The main difference from the generic description of segments is that a micro-controller has different sorts of memory: 'Flash' memory, which you can write to using your ISP and which survives reboots but otherwise doesn't really change, and 'RAM' or 'SRAM', which is used as a temporary scratch pad when the program is running.

The ‘Code Segment’ is written into the Flash memory and the
RAM is used to store the runtime data ie the Stack Segment, BSS Segment and
Data Segment.

Since the Stack and BSS segments don’t need to hold any
initial values then that’s fine but there is a problem with the Data Segment.
If we turn off the controller and turn it back on then it must ‘somehow’ know
what values to store into the initialised global variables in the Data Segment.
It can only do this by holding the Data Segment in Flash memory, and then
copying it to RAM before your ‘main’ method is called. So the Flash memory needs to be big enough
to store your Code segment and a read-only copy of your Data Segment. When the
program runs then your RAM must be big enough to hold your Data, BSS and Stack
Segments.

Generally a micro-controller has a lot more Flash
memory than it has RAM and so we sometimes need to change our program to reduce
the amount of RAM required. A common technique is to store read-only data (such
as text for logging messages or lookup tables) only in the Flash area. It is more complicated,
and slower, to access but frees up more RAM for variable data.

02 - Optimisation fundamentals

Now that we know how memory is used within a micro-controller, let's look at the options a compiler can use for optimising your code.

We will start by considering a very simple (but useless!)
piece of code:

int j=0;
for(int i=0; i<100; i++){
    j+=i;
}
postIt(++j);

Most compilers will decompose this C syntax by replacing all of the constructs (for, while, do-while, ++, --, etc) to produce a simpler language made up only of assignments, arithmetic, labels and gotos:-

int i;
int j;
j = 0;
i = 0;
goto while;
loop:
i = i + 1;
while:
if i >= 100 then goto end
j = j + i;
goto loop
end:
j = j + 1;
postIt(j);

Why does it do this? Well, you can decompose the syntax of most languages into such simple steps. Having done so, the process of converting this 'simple' language into machine code can then be common. So it makes life easier when creating, say, a Basic and a C compiler - once a program is in this 'common' language then the same back end can be used to convert it into machine code.

If you compile the original program with no optimisation
then you will get something similar to the above.

How can the compiler do it better?

The variables 'i' and 'j' in our program are stored in RAM (either in the Data Segment if they are global, or in the Stack Segment if they are local to a method). So the compiler will have reserved enough space to store them there. Let's assume for now that an 'int' is stored as a single byte (this is not the real-world case!).

The ‘post-boy’ analogy

Think of memory as a long line of pigeon-holes in the post room. Each slot has its own unique location (or 'address' in geek speak) and can store, in our case, one piece of paper (or a byte) that has a current value written on it. So our program allocates two new pigeon-holes - one for 'i' and one for 'j' - and stores a blank sheet of paper in each one (ie the variables start off un-initialised).

So if I told the 'post boy' that 'j = j + i' he would have to run/walk to get the paper in slot 'i', look at the value written on it and put it back, go to slot 'j', look at the value on it and put it back, add the two values together in his head, then go back to slot 'j' and write the answer on its sheet of paper.

Let’s convert our program into ‘post-boy’:

Our program, with the un-optimised Post-Boy code indented beneath each line:

int i;
    Find an empty pigeon-hole, label it as 'i', and insert a blank sheet of paper - ie the variable is 'un-initialised'.

int j;
    Find an empty pigeon-hole, label it as 'j', and insert a blank sheet of paper - ie the variable is 'un-initialised'.

j = 0;
    Walk to 'j', write 0 on the paper, and put it back. Now the variable has an initial value.

i = 0;
    Walk to 'i', write 0 on the paper, and put it back. Now the variable has an initial value.

goto while;
    Goto while:

loop:
    Loop:

i = i + 1;
    Walk to 'i', add 1 to the value, and put it back.

while:
    While:

if i >= 100 then goto end
    Walk to 'i', look at the value, and put it back. If it's >= 100 then goto end.

j = j + i;
    Walk to 'j', look at the value, walk to 'i', look at the value, walk back to 'j', write the result, and put it back.

goto loop
    Goto loop

end:
    End:

j = j + 1;
    Walk to 'j', add one to it, and put it back.

postIt(j);
    Walk to 'j', get the paper, and 'postIt' it.

Phew! That's slow - and has a lot of pointless running around. Welcome to 'non-optimised code'. It does what you've asked it to do - but not very well. We'll come back to this in a minute.

Just like the post-boy a processor doesn’t like having to
access RAM – all that running around is slow.

To make things faster the processor has a very few other
items of memory called registers. These sit right at the heart of the chip, and
are very fast to access. But there ain’t many of them and some are allocated to
special things like stack pointers, program pointers etc. Since there are a
finite number of them available then they are given names like: A,X,Y or
R1,R2,R3 etc depending on the specific processor. Since each processor has a
different number of registers then this is one of the reasons why you have to
tell the compiler what sort of processor you are compiling the program to run
on. We will assume that there are two registers and that they are called R1 and
R2.

N.B. If you look at the instructions available on an AVR - and on most processors - anything that requires memory access (ie a walk to the 'pigeon-hole') typically takes twice as many cycles as operations that only use registers. Also, there are normally no machine-code instructions that let you fetch something from memory, do something to it (like adding a fixed value) and then write it back to the same, or a different, memory location without using a register. Everything has to go via a register: you load from memory into a register, add/subtract etc registers together, then write the answer back to a memory variable. So all processors must have at least one register.

So back to our post-boy…

He has two registers as well but he calls them his ‘left
hand’ and his ‘right hand’ – on which he can write a value with a pen. Once he
has copied the value of a pigeon-hole onto his hand then he can see its value
very quickly without having to run back to the pigeon-hole every time. Therefore it makes sense if he keeps both
hands full at any given time.

If he then needs to access a third pigeon hole he will have
to ‘dump’ one of the ones he has written on his hands so that he can write down
the new value from the third pigeon hole.

Obviously: the decision as to which hand he should free up
is an important decision.

A common technique is 'least recently used'. This means that he puts back the value from the hand that he hasn't needed to look at for the longest time - by writing it back down in the slot where it belongs.

Other solutions may take the variable size into account – ie
a ‘byte’ only needs one hand but a ‘long’ may require two hands so freeing a
‘long’ may release more ‘hands’.

Alternatively: the value on one hand may not have changed –
in which case he can just rub it out without having to go back to the pigeon
hole to save the new value.

Before we proceed, there is one more very important thing the compiler needs to be able to do. In 'post-boy' speak I will call this 'wash-your-hands'. This means: copy any changed values written on your hands back into the pigeon-holes where they belong. Why do we need this? If you have a loop in your code (for, while etc) then every time the loop restarts we have no idea what information was on which hand at the end of the previous pass, and so we need to start with a clean pair of hands. Also, if we are going to call another method then we have no idea what will be in our hands when we call it (since it may be called from 100 different places in our code), so we need to 'wash our hands' before calling a method. For the same reason a method will normally 'wash its hands' when it exits - so that any modified values are written back, since the 100 places that call it cannot depend on what may be in each hand. The only exception is a function - ie a method that returns a value - in which case it is assumed that the return value is always in a given register - your left hand, say. Other branches such as goto need to do the same thing since, for the code where we end up, we don't know what will be in each hand.

So now let's look at the previous example, but this time we want to optimise it so that we don't do as much slow walking about:-

Our program, with the optimised Post-Boy code indented beneath each line; the state of each hand is shown in brackets after each step:

int i;
    Find an empty pigeon-hole, label it as 'i', and insert a blank sheet of paper - ie the variable is 'un-initialised'.
    [Left: Unused | Right: Unused]

int j;
    Find an empty pigeon-hole, label it as 'j', and insert a blank sheet of paper - ie the variable is 'un-initialised'.
    [Left: Unused | Right: Unused]

j = 0;
    Find a free hand (Left) and write '0' on it, remembering that it should be stored back in j.
    [Left: j=0 (changed) | Right: Unused]

i = 0;
    Find a free hand (Right) and write '0' on it, remembering that it should be stored back in i.
    [Left: j=0 (changed) | Right: i=0 (changed)]

goto while;
    About to do a goto so: wash-your-hands. Left hand has changed so walk to 'j' and put the value 0. Right hand has changed so walk to 'i' and put the value 0. Goto while:
    [Left: j=0 (unchanged) | Right: i=0 (unchanged)]

loop:
    Loop: start of a loop, so assume both hands are empty.
    [Left: Empty | Right: Empty]

i = i + 1;
    Walk to 'i', find a free hand (Left) and write it down.
    [Left: i (unchanged) | Right: Empty]
    Add 1 to the value.
    [Left: i+1 (changed) | Right: Empty]
    About to do a goto so: wash-your-hands. Left hand has changed so walk to 'i' and put its new value of i=i+1. The left hand now stores the new value of i. Goto while.
    [Left: i (unchanged) | Right: Empty]

while:
    While: start of a loop, so assume both hands are empty.
    [Left: Empty | Right: Empty]

if i >= 100 then goto end
    Walk to 'i', find a free hand (Left) and write it down.
    [Left: i (unchanged) | Right: Empty]
    If it's >= 100 then goto end. The goto will wash-your-hands but, since no value has changed, nothing needs to be written; the left hand keeps the same value.
    [Left: i (unchanged) | Right: Empty]

j = j + i;
    Walk to 'j', find a free hand (Right) and write it down.
    [Left: i (unchanged) | Right: j (unchanged)]
    Add left hand to right hand.
    [Left: i (unchanged) | Right: j=j+i (changed)]

goto loop
    Wash-your-hands. Left hand is unchanged so do nothing. Right hand has changed so walk to 'j' and store the right hand. Goto loop.
    [Left: Empty | Right: Empty]

end:
    End: could have got here via various gotos (which will have washed-their-hands) so assume everything is empty.
    [Left: Empty | Right: Empty]

j = j + 1;
    Walk to 'j', find a free hand (Left) and write it down.
    [Left: j (unchanged) | Right: Empty]
    Add one to it.
    [Left: j=j+1 (changed) | Right: Empty]

postIt(j);
    Calling a method so wash-your-hands: save the left hand to j.
    [Left: j (unchanged) | Right: Empty]
    Call 'postIt' passing variable j.
    [Left: j (unchanged) | Right: Empty]
    Since 'postIt' doesn't return a value, assume all hands are empty upon return.
    [Left: Empty | Right: Empty]

I know it seems more complex - but believe me, we have saved a small amount of walking and so it runs a bit faster. But it's still not very efficient - ie there's still a lot more 'walking' than we need.

How can we do better?

Examining the code above, the variable 'i' is used quite a lot. We only have to walk to/from it after/before each 'wash-your-hands'. Wouldn't it be great if we could tell the compiler that we actually always wanted to keep it in our hands and avoid all that walking about? Well, you can do this by using the keyword 'register' before the name of the variable. Ie change

int i;

To

register int i;

This will then try to reserve the first register (left hand) to be used exclusively for storing this variable, and so the normal wash-your-hands can ignore it. The only exception is when calling another method, which may itself use any of the registers (for example, it may define its own 'register' variables). So before calling a method your routine will still save the current value of any register variables, and will assume that the register is empty upon return. So now our code looks like this:-

Our program, with the optimised Post-Boy code indented beneath each line; the state of each hand is shown in brackets after each step:

register int i;
    Find an empty pigeon-hole, label it as 'i', and insert a blank sheet of paper - ie the variable is 'un-initialised'. If there is an available register (yes - Left Hand) then reserve it for variable i.
    [Left: i (unchanged) | Right: Unused]

int j;
    Find an empty pigeon-hole, label it as 'j', and insert a blank sheet of paper - ie the variable is 'un-initialised'.
    [Left: i (unchanged) | Right: Unused]

j = 0;
    Find a free hand (Right) and write '0' on it, remembering that it should be stored back in j.
    [Left: i (unchanged) | Right: j=0 (changed)]

i = 0;
    i is in the left hand so set it to zero.
    [Left: i=0 (changed) | Right: j=0 (changed)]

goto while;
    About to do a goto so: wash-your-hands. Left hand is a 'register' variable so leave it alone. Right hand has changed so walk to 'j' and put the value 0. Goto while:
    [Left: i=0 (changed) | Right: j=0 (unchanged)]

loop:
    Loop: the left hand always holds i; start of a loop, so assume the right hand is empty.
    [Left: i (changed) | Right: Empty]

i = i + 1;
    We have i in our left hand so just add 1 to it.
    [Left: i=i+1 (changed) | Right: Empty]
    About to do a goto so: wash-your-hands. Left hand is a register variable so leave it alone. Right hand is empty so do nothing.
    [Left: i (changed) | Right: Empty]

while:
    While: start of a loop, so assume the left hand has the current value of i and the right hand is empty.
    [Left: i (changed) | Right: Empty]

if i >= 100 then goto end
    i is already in the left hand.
    [Left: i (changed) | Right: Empty]
    If it's >= 100 then goto end. The goto will wash-your-hands but, since the left hand is a register variable and the right hand is empty, do nothing.
    [Left: i (changed) | Right: Empty]

j = j + i;
    Walk to 'j', find a free hand (Right) and write it down.
    [Left: i (changed) | Right: j (unchanged)]
    Add left hand to right hand.
    [Left: i (changed) | Right: j=j+i (changed)]

goto loop
    Wash-your-hands. Left hand is a register variable so do nothing. Right hand has changed so walk to 'j' and store the right hand. Goto loop.
    [Left: Empty | Right: Empty]

end:
    End: could have got here via various gotos (which will have washed-their-hands) so assume everything is empty, except the left hand always stores i.
    [Left: i (changed) | Right: Empty]

j = j + 1;
    Walk to 'j', find a free hand (Right) and write it down.
    [Left: i (changed) | Right: j (unchanged)]
    Add one to it.
    [Left: i (changed) | Right: j=j+1 (changed)]

postIt(j);
    Calling a method so wash-your-hands, including any register variables. Walk to 'i' and save the left hand; walk to 'j' and save the right hand.
    [Left: Empty | Right: Empty]
    Call 'postIt' passing variable j.
    [Left: Empty | Right: Empty]
    Since 'postIt' doesn't return a value, assume all hands are empty upon return. But reload Left with register variable 'i' since it's always meant to be there.
    [Left: i (unchanged) | Right: Empty]

What you should find is that almost all of those expensive walk-to-i steps have vanished. So this code will be:-

a) smaller - as it doesn't have to include the instructions to keep loading from i, and saving to j

b) faster - because all of the instructions in a) are quite slow and are no longer used.

So now you may think that putting 'register' in front of EVERY variable will be great. The problem is that you may have 16 variables but only 3 registers - in which case the compiler cannot honour your request. So you should consider the 'register' keyword as a 'hint' to the compiler - ie you are trying to help it optimise the code, but it may not be able to honour what you have asked, in which case it may ignore some or all of your 'register' keywords. So use them sparingly. In fact most modern compilers will inspect the code and automatically decide which variables are best kept in registers, and may disregard the 'register' keyword altogether. But whether your compiler does it automatically, or respects your 'register' keywords, the fact is: the code is much faster.

So, in summary, we can make the program smaller and thus
faster by reducing the number of round trips to the RAM to read and/or write
variables un-necessarily.

What is different then when you optimise for speed?

The speed at which the program runs will depend not only on
the number of bytes of program code that it has to execute but may also be
affected by something called pipelining.

We already have post-boy running around doing things - but how does he know what he should be doing next? He's controlled by the program. But the program is also stored in Flash memory, and someone or something needs to be reading the next command from the program in order to tell post-boy what to do next. Most processors have a dedicated register called the Program Counter that contains the offset into the program of the next instruction. Of course we could potentially tell post-boy that, after he has done something, he should look at the program counter and run to that memory location to get the next command, and do what it asks him to do. Again this is slightly inefficient. Why not have a post-boy-supervisor who tells him what to do? Then whilst post-boy is running around doing what he has been told to do (such as: walk to 'j' and come back with its current value in your left hand), post-boy-supervisor could, simultaneously, be running to the program to find out what he should tell post-boy to do next. Makes sense. Ooh, and by the way - the processor just does this for you, so take it as read.

So where does ‘pipelining’ come in?

Well, let's assume post-boy is very busy running around with bits of data. In the meantime post-boy-supervisor may already have come back with the next instruction. Rather than waiting for post-boy to come back, the supervisor goes and gets as many more program instructions as he can. So he reads ahead (called prefetch) and thus predicts the next instructions that post-boy will need.
This is fine until supervisor reaches a branching instruction (if, for, while,
do, switch, goto, call another method etc). Since most of these will depend
upon the contents of the variables that post-boy comes back with then supervisor
will not know where the program will actually go until post-boy comes back with
the actual answers. So all he can do is keep pre-fetching the next instruction
in the program. When you come to branches then, more often than not, the
supervisor has got it wrong and the program actually ends up going somewhere different.
In which case all of the prefetched instructions are from the wrong part of the
program and must be discarded. Now supervisor has to start getting the instructions from the new position.

Optimising for speed can therefore generate code which tries to reduce the number of branches. This often means duplicating some code into both the 'then' and 'else' parts of an 'if' statement, for example. So the program is bigger, but because there are fewer branches the supervisor can be more efficient at pre-fetching, and so post-boy doesn't spend as much time waiting for the supervisor to come back with the next command for him to do.

So if you tell your compiler to ‘optimise for speed’ then it
will re-arrange your code for you automatically. Just be tolerant of the fact
that it may actually make your program bigger in size.
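As promised at the start, this choice is made from your makefile. A hedged sketch of the relevant fragment for avr-gcc follows - your own makefile may spell it differently, and the MCU flag here is just an example:

```make
# Optimisation is chosen with gcc's -O family of flags in CFLAGS:
#   -Os  optimise for size (usually the right choice on small AVRs)
#   -O2  optimise for speed, even if the code grows
#   -O0  no optimisation (easiest to debug)
CFLAGS = -mmcu=atmega8 -Wall -Os
```

Rebuild everything after changing the flag, since every object file is affected.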

Things to avoid so that post-boy and post-boy-supervisor can work at a reasonable speed

Here is an example of where trying to optimise stuff yourself by changing your code can have hidden problems. Let's assume that your method does 'i = i + 1' quite a lot. You may be tempted to create a method:

int increment(int i){
    return i+1;
}

and change the rest of your code so that, instead of:

i = i + 1;

it does:

i = increment(i);

This is bad for a number of reasons:

Post-boy-supervisor
cannot pre-fetch very effectively as the program keeps calling your method
so he has to keep throwing away the instructions he has already retrieved.

Post-boy
keeps having to ‘wash-your-hands’ since it has no idea what registers the
method is going to need. So if he has ‘i’ as a register variable then he
will have to keep storing it back into its pigeon hole every time the
routine is called and reloading it upon return.

Every
method has some code automatically inserted by your compiler at the start
and at the end. So this additional code, plus the code caused by
‘wash-your-hands’ whenever it’s called, will probably mean that
your program is bigger and runs slower.

A lot of very small methods also tends to make your code harder for the reader
to understand.

Methods should do something sensible and not just
be one-line bits of code. If you really want to do something like this then see
‘#define’ later on.

Simple things that make a big difference - use the correct variable type

Although the compiler can help to optimise the code based on
what you have written – it is very difficult for it to correct any mistakes you
have made. So the big thing you can do to help is to choose your variable types
sensibly. Most microcontroller code deals with either ‘numbers’ or
‘characters’. Characters are ‘text’ and are normally sent out to an LCD, UART,
computer etc. The standard definition of a ‘char’ is an 8 bit byte and it is the
smallest lump of data we can deal with – so it cannot be optimised further. But
‘Numbers’ can be a minefield. C, like most languages, allows you to define
‘number’ variables that can store different ranges of numbers. Hence: a ‘long’
can store a bigger integer number than you can store in an ‘int’. But, in my
opinion, the biggest problem with C is that the size of an ‘int’ depends on the
hardware it is running on – (before the purists beat me up I know ‘why’ this is
the case). As a programmer you normally know what the range of legal values
will be and so you need to make sure your program will cope with this range no
matter what platform it actually runs on. The ‘avr-lib’ helps you out here by
creating some type definitions for different sized variables. I recommend that
you use them. They include:-

int8_t – Can store a number from –128 to +127 (requires 1
byte)

uint8_t – Can store a number from 0 to 255 (requires 1 byte)

int16_t – Can store a number from –32,768 to +32,767
(requires 2 bytes)

uint16_t – Can store a number from 0 to 65,535 (requires 2
bytes)

int32_t – Can store a number from –2,147,483,648 to + 2,147,483,647 (requires 4 bytes)

uint32_t – Can store a number from 0 to +4,294,967,295 (requires 4 bytes)

Then there are the other standard ‘C’ types:-

float – stores a floating point number.

double – like a float but can store the number with greater precision.

Both of these should be avoided whenever possible. They both require a lot of
space (say 8 bytes or more). More importantly: on platforms without dedicated floating point hardware, such as microcontrollers, a whole bunch of extra ‘floating point’ support code gets added to your program – and it is quite big. Note that this code is added as soon as you use a single 'float' or 'double'; adding further floats/doubles adds no more code.

Make sure that you choose one of the above types that can
store the range of numbers you need but requires the fewest bytes of memory. Bear in mind that ‘post-boy’ has hands of
a fixed size – they can only store one byte. So if you use a 2 byte variable
type then you are limiting his options to keep things in his hands and will
cause him to do more walking and slow down your program.

Summary

This section has tried to describe the basics of how
optimisation works – across any processor. This has included descriptions of
our ‘post-boy’ and ‘post-boy-supervisor’ analogies to try and explain what the compiler is doing to make your program run differently. A later section will show you how to tell the compiler what optimisation method you want it to use.

Optimise for speed may make your program bigger but makes
post-boy-supervisor more efficient at pre-fetching the next instructions, by
minimising any branching, meaning post-boy spends less time waiting for him to
get back with the next instruction.

Optimising for size, and by using register variables, means
that the compiler can issue fewer ‘walks’ for the post-boy.

Always try to choose the ‘smallest’ data type for each of
your variables. Failure to do so will make your program bigger and slower.

Once we start to get our hands dirty with real examples then
we can show the actual effects of these options.

Of course you may not be very interested in all this detail
and just want to ‘make stuff work better’. This, of course, is an option – but
to get the most out of your compiler and hardware, the greater your
understanding, the better the results you can achieve.

03 - Telling your compiler how you would like it to optimise

The best way to tell the compiler how you would like it to perform optimisation is via your 'makefile'. This way you can change one line in your makefile and re-compile the entire program with a new global optimisation setting.

Some compilers allow you to explicitly change the optimisation by placing directives (such as a '#pragma') to change the default optimisation level. For example: you may have a certain method that you never want to be optimised - in which case you can surround that method with directives that explicitly say 'Don't optimise' regardless of the global settings in your makefile. Unfortunately - I've not found a way to do this with avr-gcc.

At the end of this page you will find my 'generic' makefile for our test programs. Please use this, for now, rather than trying to add my changes to your makefile and then bombarding me with questions as to why your file doesn't work. The answer will always be: 'have you tried it with my makefile'!

Create a scratch directory somewhere for your code. Cut and paste my makefile and save it to that folder. The makefile expects a single C file called 'main.c' and a single H file called 'global.h' (since some of the AVRLIB files expect it to be called global.h). For the purposes of our tests I suggest you also cut and paste my global.h, which just links in some avr-lib routines. Then, for each of our tests, you will just need to edit 'main.c'.

Here is the makefile:-

# On command line:
#
# make all = Make software.
#
# make clean = Clean out built project files.
#
#
# make filename.s = Just compile filename.c into the assembler code only
#
# To rebuild project do "make clean" then "make all".
#

# Where your AVRLIB is located
AVRLIB = "C:/Program Files/AVRlib"

# MCU name
MCU = atmega8

# Processor frequency.
# This will define a symbol, F_CPU, in all source code files equal to the
# processor frequency. You can then use this symbol in your source code to
# calculate timings. Do NOT tack on a 'UL' at the end, this will be done
# automatically to create a 32-bit value in your source code. ie 8000000 = 8MHz
F_CPU = 8000000

# List Assembler source files here.
# Make them always end in a capital .S. Files ending in a lowercase .s
# will not be considered source files but generated files (assembler
# output from the compiler), and will be deleted upon "make clean"!
# Even though the DOS/Win* filesystem matches both .s and .S the same,
# it will preserve the spelling of the filenames, and gcc itself does
# care about how the name is spelled on its command-line.
ASRC =

# List any extra directories to look for include files here.
# Each directory must be separated by a space. This means you cannot use any directories
# that include a space such as 'C:/Program Files/AVRlib'
# In that case use the CINCS below and add a -I to each directory
EXTRAINCDIRS =

# Place -I options here to include directories with spaces in their names
CINCS = -I ${AVRLIB}

# Assembler flags.
#  -Wa,...: tell GCC to pass this to the assembler.
#  -ahlms: create listing
#  -gstabs: have the assembler create line number information; note that
#  for use in COFF files, additional information about filenames
#  and function names needs to be present in the assembler source
#  files -- see avr-libc docs [FIXME: not yet described there]
ASFLAGS = -Wa,-adhlns=$(<:.S=.lst),-gstabs

MCU = atmega8

This line indicates the target processor type that the compiler should generate instructions for. The make process also shows the percentage of memory used based on the amount of memory contained in the specified device.

F_CPU = 8000000

This sets the processor speed ie 8000000 = 8MHz. This value is passed to your code as a symbol so that if you have code, like a delay loop, that depends on the processor speed then it can use this symbol. So you only need to change the value in the makefile and recompile your code.

OPT = 0

This sets the optimisation level and is the setting we are going to experiment with the most. The valid values are the numbers 0, 1, 2, 3 and the letter s. 0 will not optimise the code at all. The higher numbers apply more and more optimisation. The 's' option sets the various optimisation flags to optimise for the smallest size.
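Inside the makefile, the OPT value is typically spliced into the compiler flags along these lines (a sketch - the other CFLAGS options in the real makefile differ):

```make
OPT = s

# -O$(OPT) becomes -O0, -O1, -O2, -O3 or -Os on the avr-gcc command line
CFLAGS = -g -O$(OPT) -Wall -mmcu=$(MCU)
```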

And here is the default contents for 'global.h' that just includes some of the most basic files from avr-lib. Note that this shows how the value of F_CPU, passed from the makefile, can be used within your code. In this case it creates another variable/macro that holds the number of cycles that are executed per microsecond.
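Based on the stock AVRlib global.h, the file looks something like the following sketch (treat the exact include names as assumptions if your AVRlib version differs):

```c
#ifndef GLOBAL_H
#define GLOBAL_H

// basic AVRlib includes
#include "avrlibdefs.h"   // global AVRlib defines
#include "avrlibtypes.h"  // global AVRlib type definitions

// CPU clock speed in cycles per microsecond, derived from the
// F_CPU value passed in from the makefile
#define CYCLES_PER_US ((F_CPU+500000)/1000000)

#endif
```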

Let's compile it with no optimisation (OPT = 0 in your makefile). Run the makefile and you should see the following output:-

avr-gcc (GCC) 4.2.2 (WinAVR 20071221)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The final few lines tell us that the program will require 228 bytes of program flash memory to store the program, and 0 bytes of SRAM memory at runtime. Note the 'cool' percentage figures - this is because the compiler knows how much memory an ATMega8 has.

Now we will edit the makefile and change the line saying 'OPT=0' to say 'OPT=s' - ie we want the compiler to optimise for size. Note that whenever we change the optimisation level it is imperative to run 'make clean' to force the files to be re-compiled with the new setting.

So run the makefile again - and the last part of the output should now say:-

Size after:
AVR Memory Usage
----------------
Device: atmega8

Program: 104 bytes (1.3% Full)(.text + .data + .bootloader)

Data: 0 bytes (0.0% Full)(.data + .bss + .noinit)

So our program now takes up less than HALF the space! You may now be tempted to just add the OPT=s option to your own makefile and stop reading this tutorial. Do so at your peril - a lot of the SoR code will no longer run! You need to keep reading to find out why, and how to fix it.

So the size reporting is really useful. Note how each element that makes up the size was discussed in our earlier section: ie .text, .data, .bss. So let's make some changes to our code and see if the results are as we expected. First of all change the makefile to disable optimisation ie 'OPT = 0' and run a 'make clean'.

You may have wondered why the makefile says that the Data size is 0 bytes. The reason is that we don't have any global variables. The only variables we have are local to their methods and only exist whilst that method is running, and are therefore stored on the stack - they don't require any memory to be permanently allocated to them.

So let's add in some 'dummy' global variables just to see what happens:-
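Something along these lines (the initial values are illustrative - the point is that both globals are initialised):

```c
#include <stdint.h>

uint16_t fred = 1;   // a dummy initialised global
uint16_t jim  = 2;   // another dummy initialised global
```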

We have added two variables. Each of them is a 'uint16_t' and so requires 16 bits (ie 2 bytes) each - a total of 4 bytes.

Recompile your program. You should find that the 'Data' area now requires 4 bytes and the 'Program' has also grown by 4 bytes. This is because the variables have been given initial values. So the program has grown in order to store these initial values so they are remembered between power-ups, and the Data has grown to store the actual versions at runtime.

Now replace the two lines you just added for 'fred' and 'jim' with a line saying:

char buffer[256];

ie a 256 byte buffer that could, say, be used to build up lines of text to send out via a UART. Note that although we have asked for 256 bytes - we haven't actually given any value to them. So when we compile the program we notice that:

The Data area is now 256 (ie we need 256 bytes at runtime for our buffer) - but the Program area hasn't grown at all (because there are no initial values that it needs to remember).

This illustrates the difference between initialised and uninitialised variables. The motto is 'don't initialise variables if you don't need to - as it will make your program bigger'. See this example:

#include "main.h"

uint8_t sensor = 0;

int main(void){
    while(1){
        sensor = readSensor();
        ... do stuff ..
    }
}

There is no point initialising sensor with 0 as your main program assigns it a real value before it is ever read back. By removing the initialisation your program will be slightly smaller. (In fact most modern compilers, including avr-gcc, spot that an initial value is zero and put the variable in the zero-filled .bss area anyway - so the saving is most obvious for non-zero initial values.)

So I suggest that you play with the various optimisation settings and see what happens. One thing I've noticed is that the 'register' keyword doesn't seem to make any difference - I guess the AVR compiler either ignores it or automatically selects the best variables to keep in registers. Which is good.

05 - The perils of optimisation! (volatile and interrupts)

As already mentioned - if you just compile your program with optimisation turned on then your program may no longer work.

Why is this? Surely the compiler shouldn't corrupt my program?

The biggest issue you will come across is pieces of code such as this:-
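A sketch of a typical busy-wait delay - the exact shape of code that invites the optimiser to delete it:

```c
// Waste roughly 'cycles' iterations doing nothing useful.
// Note that 'cycles' is only ever decremented and never used for
// anything else - which is exactly what the optimiser notices.
void delay_cycles(unsigned long int cycles)
{
    while(cycles > 0)
    {
        cycles--;
    }
}
```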

The purpose of this method is to introduce a delay. ie to waste time and not do anything else!

When optimising, the compiler looks at the code and says: OK, there is one variable which we decrement but it is never used in any other way. Hence the code is just wasting time. So your code gets optimised as follows:-

void delay_cycles(unsigned long int cycles)
{ }

In fact it may get rid of the method altogether and remove any calls to it. Since you may be using these delays to control servos or to read sensors, those will stop working as intended.

And here is a big hint with optimisation: when writing new code, always compile it without optimisation first. If it works correctly then recompile with optimisation and make sure it still works. You are better off doing this frequently - ie don't write an enormous program and then turn optimisation on, as you will then have loads of code to inspect to work out what isn't working properly.

Q. So how do we fix this problem and make the compiler introduce the delay loop?

A. There are 3 potential solutions:-

Some compilers allow you to use what are called 'pragma directives' so that you can change the level of optimisation at any point. In the above case you would tell the compiler to turn off optimisation for the 'delay_cycles' method. Unfortunately - I haven't found a way to do this with the avr-gcc compiler. This compiler has loads of individual optimisation settings that can be turned on and off individually - and the O1, O2, O3, Os settings are just shorthands that turn on different groups of these settings. Although you can turn individual settings on/off inside your code there is no easy way to turn them all off. If anyone knows otherwise then I would love to know...

You could place all your time critical code into separate C files which you always compile with optimisation turned off. These could then be placed into a library that the rest of your code incorporates at link time.

We can fool the compiler by using the keyword volatile (see below)

The volatile keyword indicates that a variable can be modified by another thread or by an interrupt. Let's look at that. Assume your program is like this rough example:-

uint32_t timer = 0;

void delay_ms(int ms)
{
    uint32_t endtime = timer + ms;
    while(timer < endtime)
    {
        // keep waiting
    }
}

// This method is called once every ms
void timer_interrupt()
{
    timer++; // add 1 to timer
}

int main(void)
{
    ..... set up timer interrupt to happen every 1ms
    ..... do stuff
    delay_ms(10); // wait for 10ms
    ..... do stuff
}

The big problem will be noticed in 'delay_ms'. This calculates the endtime correctly but then goes into the while loop. With optimisation enabled, the value of 'timer' is loaded once into register(s) and then compared against 'endtime'. But since the compiler can see that your while loop never changes the value of timer, its value is never re-read from the variable - so the loop effectively becomes infinite and will never return. In post-boy speak: the compiler doesn't know that it needs to keep doing a 'wash-your-hands'.

YOU know that the value of 'timer' is being changed under interrupts every 1ms, BUT the compiler doesn't know that.

So to fix the problem we just change the first line to say:-

volatile uint32_t timer = 0;

The addition of the volatile keyword tells the compiler that the variable may be being changed by something else and so it should never keep it in a register. ie if it is referenced then it should ALWAYS be reloaded from memory, and if changed then should ALWAYS be written back out to the memory variable.

But what about our 'delay_cycles' routine - that doesn't use anything that is changed under interrupts?

Well YOU know that, but the compiler doesn't. So you can fix the optimisation by adding the keyword volatile :-
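A sketch of the fixed version - the only change from the earlier busy-wait sketch is marking the parameter volatile:

```c
// Marking 'cycles' volatile forces the compiler to re-read, decrement
// and write back the variable on every iteration, so the loop survives
// optimisation.
void delay_cycles(volatile unsigned long int cycles)
{
    while(cycles > 0)
    {
        cycles--;
    }
}
```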

Since we have told the compiler that 'cycles' is 'volatile' then it has to keep re-reading it, decrementing it, and writing it back. So the compiler cannot get rid of the code.

However: there is still one potential problem with volatile variables when they are modified within an interrupt, or in another thread (if you have a pre-emptive multi-tasking kernel).

To understand the problem let's return to our example above for 'delay_ms', where the variable 'timer' is being changed under interrupts.

The 'timer' variable is stored as a 32 bit (ie 4 byte) variable. Since we have marked it as 'volatile' then it needs to be re-read whenever it is referenced; however your microprocessor is probably unable to fetch all 4 bytes from memory in one atomic (ie un-interruptable) instruction. An 8 bit processor may only be able to load one byte at a time, a 16 bit processor may only be able to read 2 bytes at a time, etc. So reading the variable will require more than one instruction.

Let's assume that the variable currently holds the following hexadecimal four byte value (from high byte to low byte): 00, 01, FF, FF

In a non-interrupt situation then your 'delay_ms' may read in the low 2 bytes FF and FF, and then read the upper 2 bytes 00 and 01 - all is well - it has got the correct value.

But what happens if an interrupt happens half way through:-

delay_ms reads in the low 2 bytes (FF and FF)

but then there is a timer interrupt and your interrupt routine adds 1 to the timer variable. Then timer is now set to 00, 02, 00, 00

the delay_ms routine now reads the high 2 bytes (which are now 00,02)

So your delay_ms has read the variable as: 00, 02, FF, FF - which is a mixture of its old value and its new value.

In interrupt driven code this is a common mistake, and in a multi-threaded environment it is even more of a problem. These problems are horrendously difficult to detect and fix. The above case would only happen 1 time in 65,536 and ONLY if the interrupt happens at the exact moment. So it will only happen once in a blue moon. So is it important? Well, it depends on the side effects. If the code is reading a sonar to decide which way to turn and one reading in 65,536 is wrong then it may not be too important, as the effect will be dampened by all the other valid readings. But what if the outcome is to press the red button and launch the big bomb!?

The easiest way to avoid this mistake is to disable interrupts whilst the variable is being read. Since this could be a common requirement then I use a '.h' file to define a macro to do this for me:-
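On the AVR the macros are commonly defined along these lines (a sketch assuming avr-libc's SREG register and cli() call; the #else branch is just a no-op fallback so the sketch also compiles on a desktop machine):

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
// Save the status register (which holds the global interrupt enable
// flag), then disable interrupts
#define CRITICAL_SECTION_START  uint8_t _sreg_save = SREG; cli()
// Restore the saved status register - interrupts are re-enabled only
// if they were enabled before
#define CRITICAL_SECTION_END    SREG = _sreg_save
#else
// no-op fallbacks for non-AVR builds
#define CRITICAL_SECTION_START  do {} while(0)
#define CRITICAL_SECTION_END    do {} while(0)
#endif
```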

The CRITICAL_SECTION_START macro will remember whether interrupts are currently enabled or not, and will then disable them. The CRITICAL_SECTION_END macro will return the interrupt-enable flag back to how it was. So now you can bracket your code with these two macros. eg

volatile uint32_t timer = 0;

uint32_t getTimer()
{
    uint32_t rtn;
    CRITICAL_SECTION_START; // make sure 'timer' doesn't get changed
    rtn = timer;
    CRITICAL_SECTION_END; // allow 'timer' to be changed again
    return rtn;
}

void delay_ms(int ms)
{
    uint32_t endtime = getTimer() + ms;
    while(getTimer() < endtime)
    {
        // keep waiting
    }
}

// This method is called once every ms
void timer_interrupt()
{
    timer++; // add 1 to timer. We don't need the CRITICAL_SECTION macros since we are in an interrupt routine and interrupts are already disabled
    // We could also write
    // CRITICAL_SECTION_START;
    // timer++;
    // CRITICAL_SECTION_END;
    // but the result would be the same
}

int main(void)
{
    ..... set up timer interrupt to happen every 1ms
    ..... do stuff
    delay_ms(10); // wait for 10ms
    ..... do stuff
}

In a true multi-tasking environment, whereby you can have loads of threads executing simultaneously, then disabling all interrupts could become too restrictive. Since the majority of users will not be in this situation then I will kind of skip it - other than to say that the use of 'semaphores' could be a solution.

The last BIG caveat with compiler optimisation settings is that they sometimes get it wrong and turn your code into nonsense. If your program stops working, or is doing something unexpected, then try compiling with all optimisation disabled. If this fixes the problem then it is normally a good idea to contact the compiler manufacturer to report a bug. Unless we all do this then the compilers will never get fixed and we will just accept that optimisation "doesn't work".

06 - Macros #define, #ifdef etc

I will keep this section short - because it is covered by most C/C++ tutorials.

#define can be used to prevent you having to write the same code over and over.

For example:-

#define max(a,b) ((a)>(b)?(a):(b))

in your code you can then say:-

int val = max( read_sensor_left(), read_sensor_right() );

the 'max' macro is expanded (where 'a'='read_sensor_left()' and 'b'='read_sensor_right()') and is therefore just as if you had written:-

int val = (read_sensor_left() > read_sensor_right()) ? read_sensor_left() : read_sensor_right();

Note the side effect here: one sensor is read once but the other is read twice.

#define can be used in conjunction with #ifdef to conditionally include or exclude code. Let's assume you are building a maze solving bot for a competition. Whilst developing the bot you may have a whole bunch of debugging info that may, say, be written out to a serial port for logging to a PC. However, for the competition, all of this code needs to be removed. This can easily be done via the makefile where you can create '#define's that are passed to the compiler (just like F_CPU is for the processor speed). So, using my makefile from earlier, you could change the line for CDEFS and append '-D logging=1'. In your code you can then add code all over the place to say:

#ifdef logging

add code for logging

#endif

When you are ready for the competition then remove the '-D logging=1' from the makefile and recompile. All of the logging code will have disappeared.

You can also create code that changes depending on the value of the #define. This is used a lot in the avr-lib files. For example we are already passing the processor speed, from the makefile, to the compiler. So when setting baud rates, PWM speeds etc then the processor speed is critical in knowing what our code should do. For example:-
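A sketch of the idea, computing a UART baud-rate divisor for a couple of known clock speeds (the UART_UBRR name and the set of supported speeds are my own choices for illustration):

```c
#ifndef F_CPU
#define F_CPU 8000000  // normally passed in from the makefile
#endif

// UBRR value for 9600 baud: F_CPU / (16 * 9600) - 1
#if F_CPU == 8000000
  #define UART_UBRR 51
#elif F_CPU == 16000000
  #define UART_UBRR 103
#else
  #error "F_CPU is not a supported value - add a case for your clock speed"
#endif
```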

Note the '#error' to output an error if F_CPU is not an expected value.

So this is like if..then...else...endif in C except that it is telling the compiler what code should be compiled.

#define is also often used to prevent errors if you include the same header (.h) file more than once. So a header file called 'test.h' would normally have all of its code surrounded by:-

#ifndef _TEST_H_
#define _TEST_H_ 1

... the contents ....

#endif

So the first time it is included '_TEST_H_' has not been defined, so we define the symbol and then compile all of the file contents. If it is later included again then '_TEST_H_' has already been defined, so the contents of the file are ignored.

07 - Reducing code with better design (struct)

The compiler can only optimise the code you have given it. If you make your code 'better' then there will be less of it in the first place.

That's obvious.

Yes - but we aren't very good at it!

"Most software-beginners will write code that works on data. More advanced software folk will write data and then the associated code that operates on that data."

Hmm - that's confusing. But think of it this way: there's no point just writing code - it has to DO something, otherwise what's the point? That 'something' is normally making a servo/motor turn, reading a sensor etc.

So the software-beginner may say: I need a method that sets the speed of my left-servo and another to do the same with my right-servo. I need to do this because each servo is being driven via a different I/O pin etc. This is sort of okay unless you then have 16 servos and you now have loads of code.

The more advanced author will say: I have a generic thing called a servo which has certain properties (like which I/O pin it is connected to). Given this data - I can now write a generic routine that is passed this servo data to set the speed of the servo.

So we are now starting to talk about Object Orientated Programming (OOP), where an object is a servo, a motor, a sonar sensor, a bumper switch, an LCD etc. Write the code once and then the same code can deal with any number of the devices. Oh no - you're talking about me learning an OOP language like C++ or Java! No. You can write a lot of these sorts of things in C. Let's see how.

C has a useful thing called a 'struct', which is short for 'structure'. It allows us to group a whole bunch of variables into one structure. We can also use the 'typedef' keyword to create a new datatype for the structure, which is much easier to use in the rest of our code. We can use this to define all of the variables that we need to record about an object. For example: let's assume we want to drive a servo object. A servo will need the following information to be stored against it:-

The I/O pin it is connected to

The current speed. Setting this will change the servo speed, but we could also read it back to find the last requested speed. Let's assume we can set the speed to any value between -127 and +127

Is the servo 'inverted'. ie if you have a left servo and a right servo then if speed="Full forward" for both then one will turn clockwise whilst the other will turn anti-clockwise.

Here is the C code to create a new type called 'SERVO' that contains variables to store whatever the values of the above things might be:-
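A minimal sketch based on the fields described above (the field names other than 'inverted', which the next paragraph refers to, are my guesses):

```c
#include <stdint.h>

typedef struct {
    uint8_t pin;         // the I/O pin the servo is connected to
    int8_t  speed;       // last requested speed: -127 to +127
    uint8_t inverted:1;  // 1 if this servo turns the opposite way
} SERVO;
```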

Look at the last entry 'uint8_t inverted:1'. The ':1' means that it only requires one bit to store the value as it is either true or false (ie 1 or 0). The ability to set the number of bits for each variable within the structure allows us to minimise the amount of required space by combining several small variables into the same byte. Looking at my makefile you will notice an entry for '-fpack-struct' which means that structures should be packed down into the smallest possible number of bytes.

int main(void){
    int8_t speed = 0; // The speed to set the servos to
    int8_t diff = 1;  // Are we going up (1), or down (-1)

    while(1){
        // If we've hit the end stops then move the other way
        if(speed==127){
            diff = -1;
        }else if (speed==-127){
            diff = 1;
        }

        // Increment or decrement the speed
        speed += diff;

        // Set the speed we want the motors to go at
        left.speed = speed;
        right.speed = speed;

        // Send commands to the servos to set their current speed
        servo_drive(&left);
        servo_drive(&right);

        // loop pause
        delay_cycles(200);
    }
}

Notice that instead of having separate methods to set the speed of each servo we now have one generic routine called 'servo_drive'. We pass the data about a specific servo to that method so that it can use that data to drive that servo. So if we added another 30 servos we wouldn't need any more code - we would just need to declare them at the top of the file to say what port they were connected to.

Of course the big benefit is that if you choose to replace your 30 servos with 30 DC motors then you've only got one routine to re-write, not 30.

Also notice that structures can contain other structures. So we could have a generic structure for an IOPin and this can then be embedded in other structures such as our SERVO. Here's an example that does this with the previous code:-
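A sketch of the nesting idea (the names and fields are illustrative):

```c
#include <stdint.h>

// A generic description of a single I/O pin
typedef struct {
    volatile uint8_t* port;  // e.g. &PORTB on an AVR
    uint8_t bit;             // bit number within that port
} IOPin;

// The SERVO now embeds an IOPin rather than a bare pin number,
// so any pin-related code can be shared with other devices too
typedef struct {
    IOPin   pin;         // where this servo is connected
    int8_t  speed;       // -127 to +127
    uint8_t inverted:1;
} SERVO;
```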

08 - H files versus C files

So we seem to be able to place code into both .h and .c files - what's the difference?

In the 'good old days' the difference was that anything in the .h files was placed 'inline' whereas anything in your .c file was only compiled once. 'Inline' means that the code is substituted into your main code every time you use it. So let's assume you have a header file that defines:-

int increment(int v){
    return v + 1;
}

Then in your main code you can do:

int x=0;
int y=0;

x = increment(x);
y = increment(y);

The compiler would then substitute the 'increment' code as if you had written:-

int x=0;
int y=0;

x = x+1;
y = y+1;

This 'inlining' is fine if the code is small - but what if it's 100s of lines long? Then your program may grow by several hundred bytes every time you use it. So then you would place the code into a .c file and compile it into a library. The header file would then change to say:

extern int increment(int v);

to indicate that there is a method called 'increment' which accepts an integer and returns an integer result. This means the code for the body of the function is only included once, and any code that references it calls that single copy.

This meant that if you wanted to share your code with a 3rd party (but not let them change it) you could just give them the .h files (which contain some very simple code) and the pre-compiled libraries (with all the complicated stuff already compiled so that it cannot be changed).

Nowadays, with the avr-gcc compiler, this no longer appears to be the case. If you define a method in a .h file then it is normally only compiled once and any references to it end up calling it. So the difference between .h and .c is small. Even if you define a method in a .h file that is never called, it still gets compiled. Equally: if you compile a .c file into a library and the rest of your code only accesses one of the methods in that file then the entire compiled .c file will be added to your program.

So the differences between .h and .c are narrowing.

BUT - here is a big difference. If you have code in a .c file that is compiled and placed into a library then it is compiled using the makefile settings at that time. So if you compile the file using a makefile saying that it's for a 1MHz ATMega8 with no optimisation then that is how it is stored in the library. If you then write another program for an 8MHz ATMega168, optimised for speed, which links in code from the library, then the library code will still be for a 1MHz ATMega8 with no optimisation. So if you keep changing platforms then you will need to keep rebuilding the libraries.

However: a .h file is always interpreted at compile time. So anything that refers to variables such as F_CPU (the processor clock speed) should be placed into a .h file so that the code picks up the correct value every time.

If you look at the avr-lib then you will see that they don't really offer much in the way of a pre-compiled library. You are encouraged to re-compile their .h and .c files within your own project - this is to make sure that it picks up the latest settings from your makefile. I think that this is a common approach with microcontrollers where there are so many different makes/speeds etc that it is difficult to supply any pre-compiled library that works in every situation.

If you examine my 'makefile' then you will see that there is an option to specify where your avr-lib is installed. This makes the compiler search this folder for any files as well as your main code directory. So you don't have to cut and paste the file from avr-lib into your own project.

09 - Reducing SRAM requirements

As we have seen previously: a microcontroller normally has a lot more Flash (read only memory) than it has SRAM (read/write memory). So if our program has a lot of global variables, lookup tables or text strings then we can quickly use up all the SRAM or, at least, leave so little that there is no space for the stack.

In this section we will look at some different methods of reducing the amount of SRAM our program requires at runtime.

Use local variables

Global variables are the variables you normally define at the start of your source code - ie they are declared outside of any method. These variables are global because they are accessible by all code. The downside is that they take up space for the duration of the whole program. A local variable, on the other hand, is one that is defined inside a method. For example:

void doSomething(){
    int counter;
    for( counter=0; counter<10; counter++){
        ...
    }
}

The variable called 'counter' is local because it is only visible to its containing method called 'doSomething'. Other methods may declare their own variables called 'counter' which are distinct from this one. The space required by this variable need only be allocated for the duration of the 'doSomething' method. Once the method has finished the space can be reclaimed so that it can be re-used for something else. This is normally done by moving the current stack pointer down by the required number of bytes so that the variable is stored within the stack. Once the method terminates the stack pointer is restored to its previous position so that the space is 'popped' back off the stack. So make good use of local variables and only use global variables for data that must live for the duration of the program.
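To emphasise the point, here is a small sketch (the method names are invented for illustration) where two methods each declare their own local 'counter' - the two never interfere, and each one's stack space is reclaimed as soon as its method returns:

```c
int sumUpwards(void){
    int counter;            /* local: exists only while sumUpwards runs */
    int total = 0;
    for( counter=0; counter<10; counter++){
        total = total + counter;
    }
    return total;           /* counter's stack space is reclaimed here */
}

int sumDownwards(void){
    int counter;            /* a different 'counter', distinct from the one above */
    int total = 0;
    for( counter=9; counter>=0; counter--){
        total = total + counter;
    }
    return total;
}
```

Both methods return 45, and because the two locals never exist at the same time the compiler can re-use the same stack space for each in turn.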

Passing arguments by reference

When passing some data to a method we have two ways of doing it: by value or by reference.

Here is an example of passing by value:-

#include <stdio.h>

void doSomething(int value){
    value = value + 1;
    printf("The value is %d",value);
}

void main(){
    int value = 0;
    doSomething(value);
    printf("The final value is %d",value);
}

This will print the following:

The value is 1

The final value is 0

Is this what you expected?

The reason is that the main routine creates the variable with a value of 0. But when it calls 'doSomething' it passes the variable by value. This is the equivalent of the 'doSomething' method defining its own local variable called 'value' and initialising it with the passed value. So 'doSomething' is working with its own copy of the variable, and when it returns to the calling method (main) the original variable is left unchanged and will still be 0.

Compare this with passing by reference:-

#include <stdio.h>

void doSomething(int* value){
    *value = *value + 1;
    printf("The value is %d",*value);
}

void main(){
    int value = 0;
    doSomething(&value);
    printf("The final value is %d",value);
}

This will print the following:

The value is 1

The final value is 1

The difference is that the 'doSomething' method has an asterisk before the name of the parameter and in the main method we prefix the variable name with an ampersand. This means that rather than being passed a copy of the variable it is instead passed the address of the existing variable from main. When it adds one to the variable it is actually changing the original variable so, on return to main, the variable has been changed.

Why will that make a difference to the space requirements? Well, in this simple case it won't make much difference at all. However: don't forget that you can use structures (discussed earlier) to create much more complex (and much larger) variables. Let's assume that we have used 'typedef' to define a structure called SERVO to store all the relevant data we need about a servo (ie port, pin, current speed etc). Let's also assume that we have a method called 'setServoSpeed' that we can pass one of these structures to in order to set the current speed for the servo. Without worrying about how these methods communicate with the servo, or how the servo structures are initialised, our code may look like this if we are passing by value:-
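The exact contents of the SERVO structure don't matter for this discussion, but as a hypothetical sketch (the field names here are invented for illustration) it might look something like:

```c
typedef struct {
    int port;       /* which I/O port the servo is attached to */
    int pin;        /* which pin on that port */
    int speed;      /* the current speed setting */
    int position;   /* current position - perhaps updated under interrupt */
} SERVO;
```

Even this modest structure is several times larger than a pointer to it, which is the crux of what follows.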

SERVO left;  // global variable for left servo
SERVO right; // global variable for right servo

SERVO setServoSpeed(SERVO servo, int speed){
    servo.speed = speed;
    ....
    return servo;
}

void main(){
    while(1){
        left = setServoSpeed(left,100);
        right = setServoSpeed(right,100);
    }
}

Since we are passing by value, when we call the 'setServoSpeed' method a new copy of the SERVO data will be created on the stack. Hence we make 'setServoSpeed' return this copy so that the caller, main, can then store it back into the left or right variable. This is messy for a number of reasons.

1. When we call the 'setServoSpeed' method we must remember to assign the result back into the correct variable otherwise the results of the method will get lost.

2. When we call the 'setServoSpeed' method the compiler will add in some extra code for us so that it can copy the contents of the servo variable into the working copy that 'setServoSpeed' uses and also to copy the result back into the global variable on return. This means that the code is bigger (as it has extra code to perform the copying) and will also run slower as it is constantly copying all this data backwards and forwards.

3. But here is the consideration as far as space is concerned... Since the 'setServoSpeed' method has its own copy of the servo data then, whilst 'setServoSpeed' is being executed, our SRAM will hold two copies of the servo data: one in the global variable and another on the stack. So if the servo structure is quite big then this could cause the stack to overflow and start zapping some of our data segment.

4. Another consideration to watch out for is if any of the data in the servo structure is modified under interrupts. You may, for example, be using an encoder which is firing interrupts to update a 'current position' in the global variable. When 'setServoSpeed' returns and its copy of the servo data is written back over the top of the global variable then these sorts of variables could rewind back to their value at the time that 'setServoSpeed' was called!!

So with large data structures that are passed as parameters it is much better to pass them by reference. This means that the only extra info on the stack is the pointer to the original servo data and so only requires two bytes (for a 16-bit address space) irrespective of how big the servo data actually is. It will also mean that the compiler doesn't need all the extra code to copy the data backwards and forwards since 'setServoSpeed' will be working directly on the data in the global variable. Last, but not least, the 'setServoSpeed' method doesn't need to return the new data so that it can be reassigned to the global variable.

So here is how we would do the same thing by reference:

SERVO left;  // global variable for left servo
SERVO right; // global variable for right servo

void setServoSpeed(SERVO* servo, int speed){
    servo->speed = speed;
    ....
}

void main(){
    while(1){
        setServoSpeed(&left,100);
        setServoSpeed(&right,100);
    }
}

Data that never changes

It is quite common for a program to contain data that never changes. For example:-

A lookup table to convert an angle into its sine or cosine

A lookup table to convert a number from 0 to 15 into its hexadecimal character '0' to 'F'

String constants used in printf statements for logging values back to the PC or to an LCD display

By default the compiler will always put code into the Code Segment and everything else into the Data Segment. So all of the above would be placed into the Data Segment and hence take up valuable SRAM at runtime. But hold on - this data is different to other variables in that it will never change - it is constant. So it would be great if we could keep it in Flash memory instead as we have a lot more of that. We can achieve this by doing two things:-

When we define the table/string/etc we tell the compiler that it should be stored in the Code Segment instead of the Data Segment

When we access the table/string/etc we remind the compiler that it is stored in the Code Segment instead of the Data Segment

Failure to do both of these things will cause unexpected outcomes.

Let's take an example of a method that converts a number to its hexadecimal character - note that these examples use commands defined within AVRlib and require you to #include <avr/pgmspace.h> :-
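A minimal sketch of such a method might look like this (the names 'hexChars' and 'toHexChar' are invented for illustration):

```c
#include <avr/pgmspace.h>
#include <stdint.h>

/* The 16-entry lookup table. PROGMEM asks the compiler to keep these
   16 bytes in the Code Segment (Flash) instead of the Data Segment. */
const char hexChars[16] PROGMEM = {
    '0','1','2','3','4','5','6','7',
    '8','9','A','B','C','D','E','F'
};

/* Convert a number from 0 to 15 into its hexadecimal character.
   pgm_read_byte fetches the entry from Flash - a plain hexChars[value]
   would wrongly try to read from SRAM. */
char toHexChar(uint8_t value){
    return pgm_read_byte(&hexChars[value & 0x0F]);
}
```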

Note we have added PROGMEM to the declaration and used the 'pgm_read_byte' function when it is accessed. If your table held 16 bit values then you would use 'pgm_read_word' instead.

The size of your HEX file will be unchanged as the 16 bytes have been taken out of the Data Segment and placed in the Code Segment instead but it will now require ZERO bytes of SRAM at runtime.

Here is another example that uses the AVRlib rprintf routines to output logging info. This demonstrates a common example whereby a method (in this case 'rprintf') has a matching method which works with data from program memory (in this case 'rprintfProgStr'). If our original code looked like this:-

void doSomething(int value){
    ...
    rprintf("Value = %d\n", value);
    ...
}

Then we could re-write this so that the string "Value = %d\n" is stored in program memory as follows:-

void doSomething(int value){
    ...
    rprintfProgStr( PSTR("Value = %d\n"), value);
    ...
}

Here we have used the PSTR() macro when the string is declared so that the compiler knows that it should be stored in program memory, and we have used the rprintfProgStr method rather than the rprintf method since rprintfProgStr expects the format string to be held in program memory.

Using EEPROM memory

This memory is used to hold the contents of variables that should retain their values when the power is removed. Think of it as battery-backed SRAM. You could use it to hold some user preference such as, say, the current volume control setting for a sound output. Although it is useful for this purpose it doesn't really help with saving runtime memory, as the number of variables that make sense to store in this way is normally limited. Admittedly, if your robot performed an element of 'learning' then you may want it to remember what it has learned between reboots (for example: remember the map it has built up of the room it has explored). However: this kind of data is generally quite large and there is normally a lot less EEPROM memory than SRAM, so it would be more practical to interface to something like an SD card to save this kind of information.
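As a brief sketch of how the volume-control idea might look using the avr-libc EEPROM routines from <avr/eeprom.h> (the variable and method names are invented for illustration):

```c
#include <avr/eeprom.h>
#include <stdint.h>

/* EEMEM places the variable in EEPROM, so it survives power cycles
   and takes up no SRAM at runtime. */
uint8_t EEMEM savedVolume;

/* Read the setting back - eg once at startup after a reboot. */
uint8_t loadVolume(void){
    return eeprom_read_byte(&savedVolume);
}

/* eeprom_update_byte only performs a write if the value has actually
   changed, which helps preserve the EEPROM's limited write endurance. */
void saveVolume(uint8_t volume){
    eeprom_update_byte(&savedVolume, volume);
}
```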