Most embedded applications are initially
developed in a prototype environment with resources that differ from those
available in the final product. As such, it is important to consider the
processes that enable your application to run on your target hardware.

The aim of this application note is to examine
the processes involved in moving an embedded application from one that relies
on the facilities of the development/debugging environment to a system that
runs standalone on target hardware. In particular, this document illustrates
some of the features of the ARM Developer Suite (ADS) v1.2, and suggests how
they might be effectively used in this regard.

With
ADS, several issues must be considered to move from an “out-of-the-box” build
to a standalone embedded application:

· C library use of hardware: some C library functionality executes by
using debug environment resources. If used, this functionality must be
re-implemented to make use of target hardware.

· ADS has no inherent knowledge of the memory map of any given target. The
image memory map must be tailored to the memory layout of the target hardware.

· An embedded application must perform some initialization before the main
application can be run. A complete initialization sequence requires
user-implemented code as well as ADS C library initialization routines.

This application note addresses each of the
above issues. Also, some further considerations regarding the image memory map
are highlighted.

To illustrate the topics covered in this
application note, associated example projects are provided.

The Dhrystone benchmarking program provides the
code base for the example projects. Dhrystone was chosen because it provides a
simple, but non-trivial, main application that illustrates the topics described
in this application note.

Included in the example are several
directories, each containing a distinct build of Dhrystone. Each build
provides an example of the techniques discussed in each successive chapter of
the document. Specific information regarding each build can be found in the
sections labeled Example Code in this document.

The example projects are tailored to run on the
ARM Integrator development platform. However, the principles illustrated by
the examples are applicable to any target hardware.

Note: The focus of this application note is not
the Dhrystone program itself, but the steps that must be taken to enable it on
a fully standalone system. For further discussion of Dhrystone as a
benchmarking tool, see Application Note 93 – Benchmarking with ARMulator.

When beginning to develop software for an
embedded application, a user of ADS might not have technical specifications of
their target hardware. Details of target peripheral devices, the memory map,
or perhaps even the processor itself might be unknown or undecided.

To enable software development before such
details are known or considered, the ADS tools have a default behavior that
enables the user to start building and debugging application code immediately.
It is useful to recognize this default behavior, in order to appreciate the
steps necessary to move from a default build to a fully standalone
application.

2.1.1 Semihosting

In the ADS C Library, support for some ANSI C
functionality is provided by the host debugging environment. The mechanism by
which this is provided is termed semihosting.

Semihosting is implemented by a set of defined
software interrupt (SWI) operations. When a semihosting SWI is executed, the
debug agent identifies it and briefly suspends program execution. The
semihosting operation is then serviced by the debug agent before code execution
is resumed. Therefore, the task performed by the host itself is transparent to
the program.

Figure 2‑1 shows an
example of semihosting operation, which prints a string to the debugger
console.

Note: For more information on semihosting, see the
ADS Debug Target Guide Section 5.

2.1.2 C Library Structure

Conceptually, the
C library can be divided into functions that are part of the ANSI C Language
specification, and functions that provide support to this ANSI C level. This
is illustrated in Figure 2‑2.

Support for some
ANSI C functionality is provided by the host debugging environment via a device
driver level of support functions.

For example, the
ADS C library implements the ANSI C printf() family of functions by writing to
the debugger console window. This functionality is provided by calling
__sys_write(), a support function that executes a semihosting SWI, which
results in a string being written to the console.

Within input
sections, code and data from individual objects are placed in the order the
object files are specified on the linker command line.

The user is not
advised to rely on these rules if precise placement of code and data is
required. Full control of placement of code and data is available via the
scatterloading mechanism (discussed in Section 4.1).

Note: See section
3.2 of the ADS Linker and Utilities Guide for more information on placement
rules.

At a
high level, the initialization sequence can be divided into two functional
blocks. __main is responsible for setting the run-time image memory
map, whereas __rt_entry is responsible for initializing the
C library.

__main carries out code and data copying, and zeroing of ZI
data. This step is only significant when the run-time location of code and
data differs from that at load time (see section 4.1).

__main branches to __rt_entry (run-time entry). This
sets up the application stack and heap, initializes library functions and static
data, and calls any constructors of globally declared objects (C++ only).
__rt_entry then branches to main(), the entry to the user’s
application. When the main application has finished executing, __rt_entry
hands control back to the debugger.
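The copy and zero step performed by __main can be pictured with the following host-side C model. It is only a sketch: the table layouts, the structure names and the function name setup_runtime_memory() are hypothetical and are not the ADS library's own data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: one region to copy from its load address
 * to its execution address. */
struct copy_region {
    const void *load_addr;   /* where the bytes sit at load time */
    void       *exec_addr;   /* where they must be at run time   */
    size_t      size;        /* number of bytes to copy          */
};

/* Hypothetical table entry: one zero-initialized (ZI) region. */
struct zero_region {
    void  *exec_addr;        /* start of the ZI data at run time */
    size_t size;             /* number of bytes to clear         */
};

/* Copy every region whose execution address differs from its load
 * address, then clear every ZI region. */
void setup_runtime_memory(const struct copy_region *copy, size_t n_copy,
                          const struct zero_region *zero, size_t n_zero)
{
    size_t i;

    assert(copy != NULL || n_copy == 0);
    assert(zero != NULL || n_zero == 0);

    for (i = 0; i < n_copy; i++) {
        /* Regions that already execute where they are loaded need no copy. */
        if (copy[i].exec_addr != copy[i].load_addr) {
            memmove(copy[i].exec_addr, copy[i].load_addr, copy[i].size);
        }
    }
    for (i = 0; i < n_zero; i++) {
        memset(zero[i].exec_addr, 0, zero[i].size);
    }
}
```

Regions whose load and execution addresses match are skipped, which is why this step only matters when the two views of memory differ.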

The
function label main() has a special significance in ADS.
The presence of a main() function forces the linker to link
in the initialization code in __main and __rt_entry. Without a function
labeled main() the initialization sequence is not linked in, and as a
result, some standard C library functionality is not supported.

You can provide
your own implementation of C Library functions that make use of target
hardware, which are automatically linked in to your image in favor of the C
library implementations. This process, known as retargeting the C library, is
shown in Figure 3‑1.

For example, you
might have a peripheral I/O device such as a UART, in which case you want to
override the library implementation of fputc(), which writes to the debugger
console, with one that outputs to the UART. Since this implementation of
fputc() is linked in to the final image, the entire printf() family of
functions prints to the UART.

This example
simply redirects the input character parameter of fputc() to a serial output function sendchar(), which is assumed to be implemented in
a separate source file. In this way, fputc() acts as an abstraction layer between target
dependent output and the C library standard output functions.
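A retargeted fputc() along these lines might look like the sketch below. So that the sketch is self-contained, sendchar() is stubbed to record characters in a buffer (uart_log and uart_count are hypothetical names); on real hardware it would live in a separate driver source file and write to the UART data register. Redefining fputc() on a desktop toolchain relies on the linker preferring your definition, so treat this as an illustration of the ADS technique rather than portable hosted C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the UART: characters are recorded instead of transmitted. */
static char   uart_log[32];
static size_t uart_count = 0;

/* Serial output routine. On target hardware this would be implemented
 * in a separate driver file. */
void sendchar(char *ch)
{
    assert(ch != NULL);
    if (uart_count < sizeof uart_log) {
        uart_log[uart_count++] = *ch;
    }
}

/* Retargeted fputc(): forwards each character to the serial output. */
int fputc(int ch, FILE *f)
{
    char tempch = (char)ch;

    (void)f;               /* single output device: the stream is ignored */
    sendchar(&tempch);
    return ch;
}
```

Only sendchar() needs to change when the output device changes; fputc() and everything layered on it stay the same.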

In a standalone
application, you cannot support semihosting SWI operations. Therefore you must
be certain that no C library semihosting functions are being linked into your
application.

To ensure that no
functions which use semihosting SWIs are linked in from the C library, you must
import the symbol __use_no_semihosting_swi. This can be done in any C or assembler source
file in your project.

· In a C module, use the #pragma directive.

#pragma import(__use_no_semihosting_swi)

· In an assembler module, use the IMPORT directive.

        IMPORT __use_no_semihosting_swi

If there are
still SWI-using functions being linked in, the linker reports the following
error:

Error: Symbol __semihosting_swi_guard multiply defined

To identify which
SWI-using functions are still being linked in, link with the -verbose switch. In the resulting output, C
library SWI-using functions are tagged with __I_use_semihosting_swi.

It is important
to note that the linker does not report any semihosting SWI-using functions in
the user’s own application code. An error only occurs if a semihosting
SWI-using function is linked in from the C library.

Build 2 of the example uses the Integrator
platform’s hardware for clocking and string I/O.

The following changes were made to Build 1 of
the example project:

·C Library Retargeting

A retargeted layer
of ANSI C functions has been added. These include standard input/output
functionality, clock functionality, as well as some additional error signaling
and program exit.

·Target Dependent Device Driver

A device driver
layer that interacts directly with target hardware peripherals has been added.

To run this build on an Integrator:

·ROM/RAM remapping must have been performed.
This can easily be achieved by running the Boot Monitor (Switches 1 & 4
on).

·Set $top_of_memory to 0x40000, or fit a DIMM
memory module. If this is not done, the stack (which defaults to 0x80000) may
not be in valid memory.

Note: The symbol __use_no_semihosting_swi is not imported into this project.
This is because a semihosting-SWI is executed during C library initialization
to set up the application stack and heap location. Retargeting stack and heap
setup is covered in detail in section 4.2.

Note: To see the
output, a terminal or terminal emulator (such as Hyperterminal) must be
connected to serial port A. The serial port settings should be set to 38400
baud, no parity, 1 stop bit and no flow control. The terminal should be
configured to append line feeds to incoming line ends, and echo typed
characters locally.

In a real
embedded system, you almost certainly do not want to adhere to the default
memory map provided by ADS. Your target hardware usually has several memory
devices located at different address ranges. To make the best use of these
devices, you will want to have separate views of memory at load and run-time.

Scatterloading
enables the user to describe the load-time and run-time location of code and
data in memory in a textual description file known as a scatter file. The
scatter file is passed to the linker on the command line using the -scatter switch. For example:

armlink -scatter scat.scf file1.o file2.o

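The scatter file
describes to the linker the desired location of code and data at both load-time
and run-time, in terms of addressed memory regions. Scatterloading regions
fall into two categories: load regions, which hold code and data at load time,
and execution regions, which hold code and data at run time.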

4.1.1 Scatter File Syntax

A region is
defined by a header tag that contains, as a minimum, a name for the region and
a start address. Optionally, a maximum length and various attributes can be
added. Open and closed curly braces delimit the contents of a region.

The
contents of the region depend on the type of region.

·Load
regions must contain at least one execution region. In practice, there are
usually several execution regions per load region.

·Execution
regions must contain at least one code or data section. These are usually
source or library object files. The wildcard (*) syntax can be used to group
all sections of a given attribute not specified elsewhere in the scatter file.

Note: For a more
detailed description of scatter file syntax, see Chapter 5 of the ADS Linker
and Utilities Guide.

4.1.2 Simple Scatterloading Example

This
example has one load region containing all code and data, starting at address
zero. From this load region we create two execution regions. One contains all
RO code and data, which executes at the same address at which it is loaded. We
also have an execution region at address 0x10000, which contains all of our RW
and ZI data.

Below is the
scatter description file that describes the above memory map.
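A scatter file for this memory map might look like the following sketch. The region names LOAD_ROM, EXEC_ROM and RAM are illustrative, and the optional maximum lengths are omitted because the text does not give region sizes.

```
; load region: the whole image is loaded starting at address zero
LOAD_ROM 0x0000
{
    ; execution region at its load address: all read-only code and data
    EXEC_ROM 0x0000
    {
        * (+RO)
    }

    ; execution region for all RW and ZI data
    RAM 0x10000
    {
        * (+RW, +ZI)
    }
}
```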

4.1.3 Placing Objects in a Scatter File

In
most images, you will want to control the placement of specific code and data
sections, rather than grouping all attributes together as in the previous
example. This can be done by specifying individual objects directly in the
scatter file, rather than relying only on the wildcard syntax.

Note: The ordering
of objects within a scatter file execution region does not affect their
ordering in the output image. The linker placement rules described in Section
2.3 apply to each execution region.

To override the
standard linker placement rules, we can use the +FIRST and +LAST scatterloading
directives. A typical example is placing the vector table at the beginning of
an execution region:
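A minimal sketch of such a scatter file is shown here. The region names are illustrative; the object vectors.o and the area name Vect are taken from the text, and any further execution regions are left out.

```
LOAD_ROM 0x0000
{
    EXEC_ROM 0x0000
    {
        ; place the area Vect from vectors.o first in this region
        vectors.o (Vect, +FIRST)
        * (+RO)
    }
    ; further execution regions (for example, RAM for RW and ZI data)
}
```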

In this scatter
file, we ensure that the area Vect in vectors.o is placed at address 0x0000.

4.1.4 Root Regions

A
root region is an execution region whose load address is equal to its execution
address. Each scatter file must have at least one root region.

One
restriction placed on scatterloading is that the code and data responsible for
creating execution regions (that is, copying and zeroing code and data) cannot
itself be copied to another location. As a result, the following sections must
be included in a root region:

· __main.o - contains the code that copies code/data

· Region$$Table and ZISection$$Table - sections which contain the addresses
of the code/data to be copied.

Because the above
sections are attributed as read-only, they are grouped by the * (+RO) wildcard syntax. As a result, if * (+RO) is specified in a non-root region, the
above must be explicitly declared in a root region.

Scatterloading
provides a method for specifying the placement of code and statically allocated
data in your image. We now look at how to place the application stack and
heap.

4.2.1 Retargeting __user_initial_stackheap()

The application
stack and heap are set up during C library initialization. We are able to
tailor stack and heap placement by retargeting the routine responsible for
stack and heap setup. In the ADS C library, this routine is __user_initial_stackheap().

The diagram below
shows the C library initialization process with a retargeted __user_initial_stackheap().

Figure 4‑3 Retargeting __user_initial_stackheap()

__user_initial_stackheap can be coded in C or ARM assembler. It must return the following
parameters:

· heap base in r0

· stack base in r1

· heap limit in r2 (if required)

· stack limit in r3 (if required)

You must
re-implement __user_initial_stackheap if you are scatterloading your image.
Otherwise, the link step fails with an error.

Note: In ADS v1.1, no error message is generated.
Instead, the heap base is located (often inappropriately) at address 0x0000.

4.2.2 Run-time Memory Models

ADS provides two
possible run-time memory models. In the default model, the application stack
and heap grow towards each other in the same region of memory. This is called
the one-region model. In this case, the heap is checked against the value of
the stack pointer when new heap space is allocated (that is, when malloc() is
called).

On the other
hand, your system design might require the stack and heap to be placed in
separate regions of memory. For instance, you might have a small block of fast
RAM that you want to reserve for stack use only. To inform ADS that you
wish to use a two-region model, you must import the symbol
__use_two_region_memory. The heap is then checked against a
dedicated heap limit, which is set up by __user_initial_stackheap.
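The difference between the two heap checks can be modelled on a host machine as follows. This is a sketch only: struct memory_model and heap_extend() are hypothetical names used for illustration and do not correspond to ADS library internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the state consulted when the heap grows. */
struct memory_model {
    unsigned long heap_top;      /* next free heap address (heap grows up)   */
    unsigned long stack_pointer; /* current stack pointer (stack grows down) */
    unsigned long heap_limit;    /* dedicated limit, two-region model only   */
    int           two_region;    /* non-zero selects the two-region model    */
};

/* Try to extend the heap by 'bytes'. Returns the start address of the
 * new block, or 0 if the request would cross the applicable limit. */
unsigned long heap_extend(struct memory_model *m, unsigned long bytes)
{
    unsigned long limit;
    unsigned long start;

    assert(m != NULL);

    /* One-region: check against the stack pointer.
     * Two-region: check against the dedicated heap limit. */
    limit = m->two_region ? m->heap_limit : m->stack_pointer;
    start = m->heap_top;

    if (start > limit || bytes > limit - start) {
        return 0;                /* heap would collide with stack or limit */
    }
    m->heap_top = start + bytes;
    return start;
}
```

In the one-region case the available heap shrinks as the stack deepens, whereas in the two-region case it is fixed by the heap limit.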

In both run-time
memory models, the stack grows unchecked by default. You can optionally enable
software stack checking in your image by compiling all modules with the
compiler switch -apcs /swst. If you are using a two-region model, you must also specify a stack
limit in your implementation of __user_initial_stackheap.

Note: Enabling
software stack checking introduces a substantial code size and performance
overhead, since the value of the stack pointer must be checked against the
stack limit with each function call.

4.2.3 Example Implementations

One-region model

Figure 4‑4 One Region Model

The above example
of __user_initial_stackheap implements a simple one-region model, where the stack grows down from
address 0x40000, and the heap grows up from 0x20000. The routine simply loads
the appropriate values into the registers r0 and r1, and then returns. r2 and
r3 remain unchanged, because a heap limit and stack limit are not used in a
one-region model.

Two-region model

Figure 4‑5 Two Region Model

The above example
implements a two-region model. The stack grows down from 0x40000 towards a
limit of 0x20000. To make use of this stack limit, all modules using this
implementation must be compiled for software stack checking. The heap grows up
from 0x28000000 to 0x28080000.

Note: __use_two_region_memory is imported using the assembler IMPORT directive.
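The values returned in the two models above can also be written down in C. The sketch below is a host-side model, not a drop-in ADS implementation: struct initial_stackheap is a local stand-in for the four values returned in r0 to r3, and the two function names are hypothetical. In an ADS project the routine would be named __user_initial_stackheap and would use the return type declared in the library header rt_misc.h.

```c
#include <assert.h>

/* Local stand-in for the four values returned in r0-r3. */
struct initial_stackheap {
    unsigned long heap_base;    /* r0 */
    unsigned long stack_base;   /* r1 */
    unsigned long heap_limit;   /* r2, two-region model only */
    unsigned long stack_limit;  /* r3, software stack checking only */
};

/* One-region model: heap grows up from 0x20000, stack grows down
 * from 0x40000. The limits are not used, so they are left at zero
 * here (the assembler version simply leaves r2 and r3 unchanged). */
struct initial_stackheap one_region_stackheap(void)
{
    struct initial_stackheap config = { 0UL, 0UL, 0UL, 0UL };

    config.heap_base  = 0x20000UL;
    config.stack_base = 0x40000UL;
    assert(config.heap_base < config.stack_base);
    return config;
}

/* Two-region model: stack from 0x40000 down to a limit of 0x20000,
 * heap from 0x28000000 up to 0x28080000. */
struct initial_stackheap two_region_stackheap(void)
{
    struct initial_stackheap config;

    config.heap_base   = 0x28000000UL;
    config.heap_limit  = 0x28080000UL;
    config.stack_base  = 0x40000UL;
    config.stack_limit = 0x20000UL;
    assert(config.heap_base < config.heap_limit);
    assert(config.stack_limit < config.stack_base);
    return config;
}
```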

Build 3 of the
example implements scatterloading, and contains a retargeted __user_initial_stackheap.

The following
modifications were made to build 2 of the example project:

·Scatterloading

A simple scatter description file is passed to the linker.

·Retargeted __user_initial_stackheap

You have the option of selecting either a one-region or a two-region
implementation. The default build is one-region. The two-region
implementation can be selected by defining two_region at the build step.

· Avoiding C library Semihosting

To run this build on an Integrator, ROM/RAM
remapping must have been performed. This can easily be achieved by running the
Boot Monitor (Switches 1 & 4 on).

The symbol __use_no_semihosting_swi is imported into this build, because
there are no longer any C library semihosting functions present in the image.

Note: To avoid using semihosting for clock(), the function is retargeted to
read the Real Time Clock (RTC) on the Integrator AP. The RTC has a resolution
of one second, so the results from Dhrystone are not precise. This mechanism
is improved in Build 4.

Note: It is important to disable all ‘Vector
Catch’ and semihosting if you are using an ARM7 core based target. Otherwise
the debugger will interpret the execution of instructions between addresses 0x0
and 0x1C as exceptions, and report this in a dialogue box. This can be set via
the ‘Options -> Configure Processor’ menu.

Until now, we
have assumed that execution begins at __main, the entry point to the C library initialization
routine. In fact, any real embedded application performs some system-level
initialization at startup. This section discusses this in more detail.

To Figure 5‑1 we
have added a reset handler, which executes immediately at system startup. We
also have a block of code labeled $Sub$$main(), which executes immediately before entering the
main application.

The reset handler
is a short module coded in assembler that is executed on system reset. As a
minimum, your reset handler initializes stack pointers for the modes that your
application is running in. For cores with local memory systems, (that is,
cache and/or tightly coupled memory), some configuration must be done at this
stage in the initialization process. After executing, the reset handler
typically branches to __main to begin the C library initialization sequence.

There are some
components of system initialization, for example the enabling of interrupts,
which are generally performed after the C library initialization code has
finished executing. The block of code labeled $Sub$$main() performs such tasks immediately
before the main application begins executing.

Section 5.2
describes the various components of the initialization sequence in more
detail.

The above code
imports the various exception handlers, presumably coded in other modules. The
table itself is simply a list of branch instructions to the various exception
handlers.

The FIQ handler
is placed at address 0x1C directly. In this way, we avoid having to execute a
branch to the FIQ handler, which improves FIQ response time.

Note: The vector
table is marked with the ENTRY directive. This effectively tells ADS that this
code is a possible entry point, and so it cannot be removed from the image at
link time. You must select one of the possible image entry points as the true
entry point to your application using the -entry linker option. See the ADS
Linker and Utilities Guide section 3.1.4 for more information.

5.3.1 ROM/RAM Remapping

Note: We assume in
this section that the ARM core begins fetching instructions at 0x0000, which is
the norm for ARM core based systems. Some ARM cores can be configured to begin
fetching instructions from 0xFFFF0000.

One important
consideration to make is what sort of memory your system has at 0x0000, the
address of the first instruction executed.

Clearly, you
require a valid instruction at 0x0000 at startup, so you have to have
non-volatile memory located at 0x0000 at the moment of reset.

A simple way to
achieve this is to have ROM located at 0x0000. However, there are some
drawbacks to this configuration. Access speeds to ROM are generally slower
than to RAM, and your system might suffer if there is too great a performance
penalty when branching to exception handlers. Also, locating the vector table
in ROM does not enable you to modify it at run time.

Figure 5‑2 ROM/RAM Remap

Another possible
solution is shown above. ROM is located at address 0x10000, but this memory is
aliased to zero by the memory controller at reset. Following reset, code in
the reset handler branches to the real address of ROM. The memory controller
then removes the aliased ROM, so that RAM is shown at address 0x0000. In __main, the vector table is copied into RAM
at 0x0000, so that exceptions can be serviced.

; RAM is now at 0x0.
; The exception vectors must be copied from ROM to RAM (in __main)

; Reset_Handler follows on from here

The above code
shows how you might implement ROM/RAM remapping in an ARM assembler module.
The constants shown here are specific to the Integrator platform, but the same
method is applicable to any platform that implements ROM/RAM remapping in a
similar way.

The first
instruction is a jump from aliased ROM to real ROM. This can be done because
the label instruct_2 is
located at the real ROM address.

After this step,
the alias of ROM is removed by flipping the remap bit of the Integrator Core
Module control register.

The above code is
normally executed immediately after system reset. Remapping must be completed
before C library initialization code can be executed.
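The register update that removes the ROM alias is a read-modify-write, which can be sketched in C as shown below. This is an illustration only: in practice the sequence is written in assembler in the reset handler, because no stack exists at that point. The function name remove_rom_alias() is hypothetical, the register address and bit mask are passed in as parameters rather than stated, and the assumption that setting (rather than clearing) the bit removes the alias must be checked against your platform documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Set the remap bit in a memory controller control register using a
 * read-modify-write, so that the other control bits are preserved.
 * Assumption: writing 1 to the remap bit removes the ROM alias. */
void remove_rom_alias(volatile unsigned int *control_reg,
                      unsigned int remap_mask)
{
    unsigned int value;

    assert(control_reg != NULL);

    value  = *control_reg;    /* read the current control settings   */
    value |= remap_mask;      /* request RAM at address 0x0000       */
    *control_reg = value;     /* write back: the ROM alias is removed */
}
```

Code that performs this step must already be executing from the real ROM address, as described above, because the aliased copy disappears as soon as the write completes.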

Note: In systems
with memory management units (MMUs), remapping can be implemented through MMU
configuration at system startup.

5.3.2 Local Memory Setup Considerations

Many
ARM processor cores have on-chip memory systems, such as caches, tightly
coupled memories (TCMs), memory management units (MMUs) and memory protection
units (MPUs). Such devices are normally set up and enabled during system
startup. As such, the initialization sequence of cores with local memory
systems requires special consideration.

As we
have seen, C library initialization code in __main is responsible for
setting up the execution time memory map of the image. Therefore, the
processor core’s run-time memory view must be set up before branching to __main. Essentially, this means
that any MMU or MPU must be set up and enabled in the reset handler.

TCMs
must also be enabled before branching to __main (normally before MMU/MPU
setup), because you generally want to scatterload code and data into TCMs. As
a side issue, you must be careful that you do not have to access memory that is
masked by the TCMs when they are enabled.

One
final issue to note is that you run the risk of cache coherency issues if
caches are enabled before branching to __main. Code in __main copies code regions from
their load address to their execution address – essentially treating
instructions as data. As a result, some instructions can be cached in the
data cache, in which case they are not visible to the instruction path.

You
can avoid such coherency issues easily if you simply enable caches after the C
library initialization sequence finishes executing.

5.3.3 Scatterloading and Memory Setup

In a system where the reset-time memory
view of the core is altered, either through ROM/RAM remapping or MMU
configuration, the scatterloading description file must describe the image
memory map after remapping has taken place.

In the above
example, the stacks are located at stack_base. This symbol can be a hard coded address, or it
can be defined in a separate assembler source file and located by a scatter
file. Details of how this is done are given in section 6.2.

The example
allocates 256 bytes of stack for FIQ and IRQ mode, and you can do the same for
any other execution mode. To set up the stack pointers, simply enter each mode
(interrupts disabled) and assign the appropriate value to the stack pointer.
If you want to make use of software stack checking, you must set up a stack
limit here as well.
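The address arithmetic behind such a stack layout can be sketched in C. The sketch assumes that 256 bytes are reserved for each of FIQ and IRQ mode, that the stacks are descending, and that the FIQ stack sits at the top with the IRQ and application stacks below it; the ordering, struct mode_stacks and layout_stacks() are illustrative assumptions. The mode switches themselves can only be done in assembler.

```c
#include <assert.h>

#define LEN_FIQ_STACK 256UL   /* bytes reserved for FIQ mode */
#define LEN_IRQ_STACK 256UL   /* bytes reserved for IRQ mode */

/* Initial stack pointer values for each mode (descending stacks). */
struct mode_stacks {
    unsigned long fiq_sp;   /* FIQ mode stack pointer          */
    unsigned long irq_sp;   /* IRQ mode stack pointer          */
    unsigned long app_sp;   /* application mode stack pointer  */
};

/* Carve the per-mode stacks out of one block ending at stack_base. */
struct mode_stacks layout_stacks(unsigned long stack_base)
{
    struct mode_stacks s;

    assert(stack_base >= LEN_FIQ_STACK + LEN_IRQ_STACK);

    s.fiq_sp = stack_base;                  /* top of the block       */
    s.irq_sp = s.fiq_sp - LEN_FIQ_STACK;    /* below the FIQ stack    */
    s.app_sp = s.irq_sp - LEN_IRQ_STACK;    /* below the IRQ stack    */
    return s;
}
```

With a stack base of 0x40000 this gives 0x40000 for FIQ, 0x3FF00 for IRQ and 0x3FE00 for the application mode.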

Stack pointer and/or stack limit values set up
in the reset handler are automatically passed as parameters to __user_initial_stackheap by C library initialization code. These values therefore must not
be modified by __user_initial_stackheap.

The following implementation of __user_initial_stackheap can be used with the stack pointer setup routine above.

        IMPORT  heap_base
        EXPORT  __user_initial_stackheap

In
general, it is beneficial to separate all system initialization code from the
main application. However, some components of system initialization, for
example enabling of caches and interrupts, must occur after executing C library
initialization code.

We can make use
of the $Sub and $Super function wrapper symbols to effectively insert a routine
that is executed immediately before entering the main application.
Essentially, this mechanism enables us to extend functions without altering the
source code itself.

Above is an
example of how $Sub and $Super can be used in this way. The linker replaces
the function call to main() with a call to $Sub$$main(). From there we can call a routine that enables caches, and another to
enable interrupts.

The
code branches to the real main() by calling $Super$$main().
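The call order that $Sub and $Super produce can be modelled with ordinary functions, as in the host-side sketch below. In an ADS build the wrapper would be named $Sub$$main() and would reach the application through $Super$$main(), with the linker performing the substitution; here sub_main() and super_main() are hypothetical stand-ins so that the sketch runs anywhere. cache_enable() and int_enable() are stubs that only record that they were called.

```c
#include <assert.h>

enum init_step { STEP_CACHE_ENABLE, STEP_INT_ENABLE, STEP_MAIN };

/* Record of the order in which the routines ran. */
static enum init_step call_log[3];
static int call_count = 0;

static void record(enum init_step step)
{
    assert(call_count < 3);
    call_log[call_count++] = step;
}

/* Hypothetical system routines, stubbed for the sketch. */
void cache_enable(void) { record(STEP_CACHE_ENABLE); }
void int_enable(void)   { record(STEP_INT_ENABLE); }

/* Stand-in for the application's main(), which the wrapper
 * would reach as $Super$$main() in an ADS build. */
int super_main(void)
{
    record(STEP_MAIN);
    return 0;
}

/* Stand-in for $Sub$$main(): runs late system initialization,
 * then hands over to the real main application. */
int sub_main(void)
{
    cache_enable();
    int_enable();
    return super_main();
}
```

The application source is untouched: only the wrapper knows that extra initialization happens before main() runs.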

Note: More
information on $Sub and $Super can be found in the ADS 1.2 Linker and Utilities
Guide.

It is
important to consider what mode the main application will run in. Your choice
affects how you implement system initialization.

A lot
of the functionality that you are likely to implement at startup (both in the
reset handler and $Sub$$main) can only be done while executing in
privileged modes. For example, cache/MMU/MPU/TCM manipulation, and enabling
interrupts.

If
you wish to run your application in a privileged mode (for example,
Supervisor), this is not an issue. Simply be sure to change to the appropriate
mode before exiting your reset handler.

If
you wish to run your application in User mode, you can only change to User mode
after completing the necessary tasks in a privileged mode. The most likely
place to do this is in $Sub$$main().

An
important issue to note is that __user_initial_stackheap must set up the
application mode stack. Because of this, you must exit your reset handler in
system mode (which uses the User mode registers). __user_initial_stackheap then executes in system
mode, and so the application stack and heap are still set up when User mode is
entered.

Build
4 of the example can be run standalone on the Integrator platform.

The
following modifications were made to build 3 of the example project:

· Vector Table

A vector table was added to the project, and placed by
the scatter file.

· Reset Handler

The reset handler is added in init.s. Two separate
modules, responsible for TCM and MMU setup respectively, are included in the
ARM926EJ-S build. These are excluded from the ARM7TDMI build, which will run
on Integrator systems with any core. ROM/RAM remapping occurs immediately
after reset.

· $Sub$$main()

For the ARM926EJ-S build, caches are enabled in
$Sub$$main() before entering the main application.

·Embedded Scatter File

An embedded scatter
file is used, which reflects the post-remapping view of memory.

The batch files for both of these builds
produce a binary file suitable for downloading into the Integrator AP
application Flash at address 0x24000000. This can be achieved via the ‘File
-> Flash Download’ menu. A separate application note is available which
describes this process.

A precise timer is implemented using a timer on
the AP motherboard. This generates an IRQ, and a handler is installed which
increments a counter every 1/100 second.

So far, we have described the placement of code
and data in a scatter file, but the location of target hardware peripherals and
the stack and heap limits are assumed to be hard coded in source or header
files. It would be beneficial to locate all information pertaining to the
target’s memory map in our scatter file, thereby removing all references to
absolute addresses from our source code.

6.1.1 Locating Target Peripherals in the Scatter File

Conventionally,
addresses of peripheral registers are hard-coded in project source or header
files. One can also declare structures that map on to peripheral registers,
and place these structures in the scatter file.

For example, a
target could have a timer peripheral with two memory mapped 32-bit registers.
Below is a C structure that maps on to these registers.
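A structure of this kind might be declared as in the following sketch. The type name, the member names and the two accessor functions are hypothetical, and the scatter file fragment in the comment assumes that the structure is the only ZI data in an object called timer_regs.o. The sketch also assumes that unsigned int is 32 bits wide, as it is with the ARM compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Two consecutive 32-bit memory-mapped registers. */
struct timer_regs_t {
    volatile unsigned int control;  /* offset 0x0: timer control */
    volatile unsigned int value;    /* offset 0x4: timer value   */
};

/* An uninitialized object is ZI data, so a scatter file can place it
 * in an execution region of its own, for example:
 *
 *     TIMER 0x40000000 UNINIT
 *     {
 *         timer_regs.o (+ZI)
 *     }
 */
struct timer_regs_t timer_regs;

/* Write the control register through the structure. */
void timer_write_control(struct timer_regs_t *t, unsigned int bits)
{
    assert(t != NULL);
    t->control = bits;
}

/* Read the current timer value through the structure. */
unsigned int timer_read_value(const struct timer_regs_t *t)
{
    assert(t != NULL);
    return t->value;
}
```

The volatile qualifiers stop the compiler from caching or removing register accesses, and the source code contains no absolute address.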

The above
scatter file locates the timer_regs structure at 0x40000000.

It is important
that the contents of these registers are not initialized to zero during
application startup, because this is likely to alter the state of your system.
Marking an execution region with the UNINIT attribute prevents ZI data in that region from
being zero-initialized.

Section 5.4 refers to the symbols stack_base
and heap_base as reference symbols that can be placed in a scatter file. To do
this, create symbols labeled stack_base and heap_base in an assembler module (the same can be done for the stack and heap
limits in a two-region memory model).

        AREA    stacks, DATA, NOINIT
        EXPORT  stack_base

stack_base      SPACE   1

        AREA    heap, DATA, NOINIT
        EXPORT  heap_base

heap_base       SPACE   1

        END

These symbols can be located each in their own
execution region in the scatter file.
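The relevant part of such a scatter file might look like the fragment below. The execution region names and the object name stackheap.o are assumptions (the text does not name the assembler module); the area names heap and stacks, and the two addresses, are those used in this section.

```
    ; execution regions inside an existing load region
    HEAP 0x20000 UNINIT
    {
        stackheap.o (heap)
    }

    STACKS 0x40000 UNINIT
    {
        stackheap.o (stacks)
    }
```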

The
above scatter description file places the heap base at 0x20000 and the stack
base at 0x40000. The stack and heap base locations can now be easily altered
by editing the addresses of the respective execution regions.

With ADS v1.2, this
method requires that the stack and heap size are specified in an object file.
To some extent this negates the advantages described in section 6.2. However this is the easiest method to
migrate to the recommended method for future ARM tools, where additional
features will avoid this requirement.

First, define areas of an appropriate size
for the stack and heap in an assembler source file, for example, stackheap.s.
The SPACE directive can be used to reserve a zeroed block of memory. Setting
the ‘NOINIT’ area attribute prevents this zeroing (during development, you
might choose to zero-initialize the stack so that the maximum stack usage can
be seen). Note that labels are not required in this source file.

        AREA    stack, DATA, NOINIT
        SPACE   0x3000          ; Reserve stack space

        AREA    heap, DATA, NOINIT
        SPACE   0x3000          ; Reserve heap space

        END

These sections can
then be placed in their own execution region in the
scatter file.

The
linker generates symbols that point to the base and limit of each execution
region, which can be imported into the retargeting code to be used by __user_initial_stackheap. This code can be made more readable by
using the DCD
directive to give these values more meaningful names:

The above files are suitable to place the heap base at
0x15000 and the stack base at 0x40000. The stack and heap base locations can
now be easily altered by editing the addresses of the respective execution
regions.