Update: 01 Aug 06:
I see the folks at Qualcomm just released a brand-new, super-duper
elf2mod utility that looks to be the future. Unfortunately,
I don't have time to investigate this right now, but I would like
to think this web-site shamed them into doing something! (Along with
the need to support BREW 4, control their executable format, etc.)

Anyway, I will leave this info here for a while, but remember, what is described
here only works with the OLD BREWelf2mod.

Update: 29 Apr 07:
Hey, global variables are nice, but global static C++ objects are nicer. Richard
Willis over at Openwave Systems was generous enough to contribute a short C++
module and modified armelf.brew file that arranges to get all the constructors
of your global objects called before your GCC compiled BREW applet starts running
(and the destructors fired on the way out).
The three files (and a zip of the three files) are here. The comments in the
CPP file have a nice description of how it all works. Many thanks to Richard
and Openwave.

Update: 11 Feb 2010: I am working with BREW 4 using RVCT 3, and
have just put up an article about how to fix an armlink error about a missing
main() function (along with other RVCT notes).

This post is a HOWTO describing how I was able to build
(several) working BREW MOD files using the WinARM 4.1.0 toolchain. It
provides a step-by-step guide, a custom BREW-specific linker script
for the GNU ld linker, and other tools.

I hope this information saves you headaches, but unfortunately I can't
support you and the standard disclaimers apply. I've tried my best to
be accurate, but a lot of this area is poorly documented or undocumented, so
sometimes I guess and sometimes guess wrong (or only half-right).

Caveat developer!

We do C++ work in the native Win32 environment, so my steps are skewed
to that environment. I have compiled a project of over 400 source
files resulting in a MOD of over 1 MB.

So, onward, through the fog...

WHY WOULD YOU WANT TO DO THIS?

The gnude toolchain most folks seem to be using (as BREWelf2mod was
originally developed against it) is now 3 years old, and its ARM code
generation is, by today's standards, poor. This (and the way BREWelf2mod
packages up the output MOD file) means that GNU mod files have gotten
a reputation "for being at least 20% larger than ADS!" Yeah, but this
situation changes if you use a recent compiler, plus:

You might get a smaller mod file than with ADS. At least one person has.
GNU is neck-and-neck with ADS now.

You can take advantage of the LightBlue project's clever work that maps
BREW interface structures directly to C++ vtables. (http://lightblue.tigris.org/)

You get more efficient vtable dispatching.

You can have tables of function pointers, and tables of string pointers.

It is free.

At the end of this post I'll review how module loading works and the
current state of BREW C++ toolchains (as I understand it).

WHAT DOES NOT WORK?

When I originally wrote this, THUMB and -O2 and -Os optimization did
not work. These problems now seem to be solved. So now the only things
that don't work are the usual suspects: RTTI, GNU exceptions, maybe
floating point depending on your setup. That said, I've heard talk that
some folks are using exceptions -- and have modified the linker
script available here to support it...but I don't know the details.

SIZE BENCHMARKS

On 10May06 I grabbed every Win32 GNU ARM cross-compiler I could find
and used it to compile our monster load of legacy C++. This is a big
project that consists of about 480 source files and makes a MOD that
is around 1MB in size. A big program seemed like a good test of
compiler performance.

The compilers used were the gnude 3.3.1 that everyone knows and loves,
the 2005q5 3.4.4 sourcery release and the (new) 2006q1 4.1.0 sourcery
release, and WinARM. I linked against the libraries that came with
each package, and linked with the armelf.brew custom linker script.

The winner (for me at least) is WinARM 4.1.0 thumb, and this is probably
your best bet right now if you want to use the GNU toolchain. Using a
4.x series GCC also gives you the possibility of greatest improvements
to the code generator and compiler down the road, since these
compilers use the new parse-tree machinery that provides a much
richer intermediate program representation to the compiler back-end.

I can't use ADS without making code changes since key parts of this
program use function pointer tables (that are not -ropi compliant).
Thus I have never been able to get an ADS 1.2 thumb version of this
program to link. I would love to have that number to see how GNU 4.1.0
is comparing, but it would be a side-trip now since I've decided to
go with GNU for our work here.

This just (11Jul06) in from one of the folks I've corresponded
with privately about this:

Hi Ward,
Long time no talk. I've been busy with lots of build system-related
tasks around here. Yesterday my lead and a team member were saying how
some problem they needed to fix with our build could be done with global
vars, but since they were on the old build system, they were stuck with
ADS. So I decided to bump up getting your linker script working with
WinARM (even downloaded the new 20060606 distro).
Happy to say that it works great... and our .mod file is actually
smaller than it was with ADS (165k vs 170k). The only thing you may be
missing on your Web page is mention of the '-Os' flag for compiling
(without it, the mod size was 310k). Got it running on a Kyocera KX18
phone, even. So I know it's good!

This is great news and the first confirmation I have that 1) the
linker script works with the latest WinARM, and 2) THE OUTPUT MOD FILE
CAN BE SMALLER THAN THE EQUIVALENT IN ADS

But, back to my benchmark: gnude won't compile our program in thumb
mode, so I can't compare gnude thumb to WinARM thumb, which is
disappointing. The best I can do is compare gnude 3.3.1 ARM optimized
(1,283,688 bytes) to WinARM ARM 4.1.0 optimized (1,280,064 bytes): a whopping 3,624
byte saving (under 0.3%).

Here are the results. The columns are toolchain name, GCC version,
instruction set used, optimization settings, size of exe/elf file
output, size of mod file created, number of text relocations, number of
data relocations, and the total number of relocations. (Relocations
are worth looking at because GNU mod files carry a table of them
around and I wanted to know about any radical deltas here.)

When you turn off optimization on a 4.x series GCC, you REALLY turn
it off. The resulting output is huge.

The sourcery chains are still an also-ran, though I want to like
them better because I think those folks are doing great work. The
problem is they track the EABI closely and their tools emit all kinds
of fancy new relocations BREWelf2mod does not understand. Using my
objdump filter (that I updated yesterday) I can re-map these to (what
are probably) equivalent relocations for BREWelf2mod, but I HAVEN'T
ACTUALLY RUN ANY OF THE SOURCERY-PRODUCED MODULES so I don't know if
they really work. Since the sourcery output was (slightly) larger too,
I did not pursue it further.
It is interesting to speculate why the sourcery 4.1.0 output is larger
than the WinARM 4.1.0 output. If I had to guess, I would assume it
has to do with the way the runtime libraries are compiled and any
unique EABI support they put in their back-end.

When you turn on thumb, the number of data relocations spikes, but
the overall total is about the same and the module size overall is
still smaller than any ARM version.

So that's it. The smallest module that I have verified actually
seems to run OK comes from WinARM 4.1.0 thumb.

Now it's off to keep working and developing with this compiler and
discover its other bugs and idiosyncrasies!

\winarm\bin\arm-elf-g++.exe
-mlittle-endian // or -mbig-endian if that's your target
-mcpu=arm7tdmi // or the right CPU for you, this is common
-mapcs-frame // use the "standard" arm procedure call standard
// (linked calling frames on the stack)
-fno-builtin // Don't generate inline versions of memcpy(), memset()
// etc. Both because they may make the code larger, and
// because you need to call the BREW versions MEMCPY()
// and so forth.
-ffunction-sections // put each function in a separate .text._foo section
// this lets the linker throw away ones that are not
// called when it "garbage collects" (drops empty)
// sections. The default behavior for GCC ld 4.x seems
// to be to garbage collect, so the old --gc-sections switch
// is no longer necessary or recognized. (But, the
// new --no-gc-sections switch is, if you need to
// keep an uncalled section around for some reason.
// But for this, changing the linker script is
// probably a better choice.)
// Having each function in a separate section also
// lets the linker move callers and callees closer
// together (locality improvement) so shorter
// instructions can be used. (I don't know how much
// of an improvement this is in non -fpic code, tho)
-fno-exceptions // nope, you can't have C++ exceptions. And if you
// try it you'll suck in a bunch of library stuff
// that won't link.
-fno-unwind-tables // When exceptions are thrown the stack needs to
// be "unwound" and destructors called. These tables
// are not supported.
-fno-rtti // Nope, no runtime type identification either, so
// no new-fangled dynamic_cast<>()-ing either.
// All of these -fno-xxx things stop the compiler from pulling in library stuff and
// using globals. It may be possible to support them at a later time with open-source
// loaders and a good understanding of run-time internals.
-DDEBUG // If you need it
-DDYNAMIC_APP // Everyone includes this in applets, but I never
// see it used in the BREW headers (I think) Is it
// vestigial? Or does it do something?
-I "C:\Program Files\BREW 3.1.5\sdk\inc" // and other include paths...
// repeat the -I for each one
-o applet.o // Your output (EABI ELF) file
-c applet.c // Your input source

-Os // Optimize for size (may make debugging harder)
// And turn on thumb and compile everything as thumb. Including AEEAppGen
// and AEEModGen.
-mthumb // enable thumb instruction set
-mtpcs-frame // this sets the thumb stack frame, use this INSTEAD of
// -mapcs-frame above
-mcallee-super-interworking // this is magic that makes thumb work

You should use the C compiler for both AEEModGen.o and AEEAppGen.o.
Throw in the same -Os, -mthumb, -mcallee-super-interworking and
-mtpcs-frame instead of -mapcs-frame if you are "going small."

If you are compiling thumb, my current recommendation is to compile
everything as thumb (including AEEModGen.c and AEEAppGen.c) and use
the -mcallee-super-interworking switch to transition to thumb mode.
(I could not get the traditional method of interwork building and
compiling AEEModGen and AEEAppGen in ARM mode to work.)

To expand a bit, most folks seem to recommend doing something like
this:

...the thinking being that AEE would call the ARM code in AEEModGen
and we'd flip over to thumb on the call to (the C++)
AEEClsCreateInstance(). The return from AEEClsCreateInstance would
then be smart enough to return to ARM mode on exit.

Well, it seems that call (first switch to thumb) was failing and
nothing I did could make it go.

The "callee-super-interworking" switch inserts some preamble code to
switch into thumb mode at the start of the public functions AEE calls,
so you don't have to compile AEEAppGen or AEEModGen in ARM mode. In
fact, everything in the applet can be in thumb.

Callbacks from BREW seem to be working too (we do some socket stuff).
I was worried about this.

There is one trick to make a mod. As I will explain, BREWelf2mod
does not understand R_ARM_THM_CALL. To get around this, I used the
gnude objdump utility (that emits the old names):

BREWelf2mod test.elf test.mod \gnude\bin\arm-elf-objdump.exe

Or, you can use my wrapper
program, which will also make your mod file a little smaller.

So....I'm not an expert, and I don't have a debugger for my target. I
don't understand why the more "traditional" approach doesn't work, but
this workaround seems to go OK at the moment.

5. Link.

This is where things get tricky. The idea is to make an ELF file that
can be fed to BREWelf2mod. Now, BREWelf2mod needs to see some things
in the ELF file, or it will gack. Specifically:

It needs access to every relocation in the ELF file.

It needs your code (the .text section) to be based at an address of
zero (so the simple-minded tiny loader BREWelf2mod sticks on your MOD
file can "fix up" (locate) your program easily).

It needs your file to have a .data and .bss section that follow your
.text section -- or it won't be able to figure out how big things are
(and will happily make a multi-megabyte MOD file...)

It needs AEEMod_Load() to be the first thing in your ELF file's
.text section.
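If you want to check an ELF file against these requirements before feeding it to BREWelf2mod, something like the following C++ sketch could help. It assumes you have already pulled the section names, addresses and sizes out of the file (for example from the map file or from arm-elf-objdump -h) into the hypothetical Section struct, and that you know where AEEMod_Load() sits inside .text. It also assumes the output of --emit-relocs shows up as sections whose names start with ".rel". None of this is part of BREWelf2mod; it just restates the list above as code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one output section (not a real API).
struct Section {
    std::string name;
    std::uint32_t vma;   // address assigned by the linker
    std::uint32_t size;  // size in bytes
};

// Restates the BREWelf2mod requirements listed above. Returns "" if the
// layout looks acceptable, otherwise a description of the first problem.
// entryOffset is the (assumed known) offset of AEEMod_Load() in .text.
std::string checkBrewLayout(const std::vector<Section>& sections,
                            std::uint32_t entryOffset) {
    const Section* text = nullptr;
    const Section* data = nullptr;
    const Section* bss = nullptr;
    bool haveRelocs = false;
    for (const Section& s : sections) {
        if (s.name == ".text") text = &s;
        else if (s.name == ".data") data = &s;
        else if (s.name == ".bss") bss = &s;
        else if (s.name.compare(0, 4, ".rel") == 0) haveRelocs = true;
    }
    if (!text) return "missing .text";
    if (text->vma != 0) return ".text is not based at address zero";
    if (entryOffset != 0) return "AEEMod_Load() is not first in .text";
    if (!haveRelocs) return "no relocation sections (was --emit-relocs used?)";
    if (!data) return "missing .data";
    if (!bss) return "missing .bss";
    // Both .data and .bss must come after the end of .text.
    const std::uint32_t textEnd = text->vma + text->size;
    if (data->vma < textEnd) return ".data does not follow .text";
    if (bss->vma < textEnd) return ".bss does not follow .text";
    return "";
}
```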

There is an unconfirmed report that BREWelf2mod can't process functions
with more than perhaps 255 characters in the name. This is because it
consumes the relocation list as ASCII output from a program it exec()s
(objdump), and may only have a fixed buffer inside to hold the lines.
Nicely, it just outputs a broken file if this happens.

So...if you do something mad like trying to use templates in your C++,
this might bite you. Be careful.
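One cheap precaution is to scan the relocation listing yourself (for example the output of arm-elf-objdump -r) before running BREWelf2mod. This C++ sketch assumes the limit applies to whole output lines and uses 255 only because that is the number in the unconfirmed report; the function name and threshold are illustrative.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers in objdump-style text whose length
// exceeds maxLen (255 is only the suspected limit, not a confirmed one).
std::vector<std::size_t> findLongLines(const std::string& objdumpText,
                                       std::size_t maxLen = 255) {
    std::vector<std::size_t> hits;
    std::istringstream in(objdumpText);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        if (line.size() > maxLen) hits.push_back(lineNo);
    }
    return hits;
}
```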

Anyway, back to linking. To get the relocations to be placed in the
ELF output, use --emit-relocs.

Now, traditionally, to get the .text section loaded at zero you had
to use "-Ttext 0". And to quiet a linker warning you had to provide an
entry point, AEEMod_Load(), with "--entry AEEMod_Load". And to get all
the relocations together, sorted and in the right place in the
output, you either had to use the right (.xc) linker script -- which,
thankfully, gnude used by default -- OR force it to use the (.xc) script
with a "-z combreloc" switch when using other toolchains (Sourcery).

And to get AEEModGen.o first you had to stick it first on the command
line with something like this:

AEEModGen.o AEEAppGen.o -( onelib twolib threelib -)

Note: unlike a lot of other linkers, GNU ld only searches libraries once
unless you "group" them -- that's what the -( and -) switches above do
to the example libraries "onelib", "twolib" and "threelib."
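To see why grouping matters, here is a toy C++ model of the search. It is not ld's code, just the behaviour described above: each library is rescanned until it yields no new members, a member is pulled in only if it defines a symbol that is still undefined at that moment, and then the search moves on and never looks back unless the libraries are grouped. In the example, a member of onelib references a symbol in twolib, which in turn needs another member of onelib.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy model of an archive: each member defines and references symbols.
struct Member {
    std::set<std::string> defines;
    std::set<std::string> references;
};
using Archive = std::vector<Member>;

// Pulls in members of one archive that define currently-undefined
// symbols, rescanning it until nothing new is found.
static bool scanArchive(const Archive& ar, std::set<std::string>& undefined,
                        std::set<std::string>& defined,
                        std::set<const Member*>& loaded) {
    bool any = false, progress = true;
    while (progress) {
        progress = false;
        for (const Member& m : ar) {
            if (loaded.count(&m)) continue;
            bool needed = false;
            for (const std::string& s : m.defines)
                if (undefined.count(s)) needed = true;
            if (!needed) continue;
            loaded.insert(&m);
            progress = any = true;
            for (const std::string& s : m.defines) { defined.insert(s); undefined.erase(s); }
            for (const std::string& s : m.references)
                if (!defined.count(s)) undefined.insert(s);
        }
    }
    return any;
}

// Returns the symbols still undefined after searching the archives either
// once in order (plain library list) or repeatedly as a group (-( ... -)).
std::set<std::string> resolve(std::set<std::string> undefined,
                              const std::vector<Archive>& archives, bool grouped) {
    std::set<std::string> defined;
    std::set<const Member*> loaded;
    bool progress = true;
    while (progress) {
        progress = false;
        for (const Archive& ar : archives)
            if (scanArchive(ar, undefined, defined, loaded)) progress = true;
        if (!grouped) break;  // ungrouped: a single pass only
    }
    return undefined;
}
```

With onelib = { defines a, needs b; defines c } and twolib = { defines b, needs c }, resolving a without the group leaves c undefined; with the group everything resolves.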

AND...if you turned on -Os or -O2 optimization, GNU would reorder
your sections (if you compiled with -ffunction-sections, as you
should) to improve "locality." So folks wound up putting
AEEMod_Load() in a separate file and linking it first to get around this.

Finally, to be sure a .bss and .data section were in your output, you
didn't have to do ANYTHING -- if you were a good BREW programmer they
would both be empty, but the old gnude chain would keep them around
anyway. Not so with 4.1.0, it drops them faster than you can say
"empty section" -- so BREWelf2mod would get confused and make ENORMOUS
output files.....

The way to address this is with a "linker script": a script that
tells ld how to take the input files and arrange them in the output.
(Equivalent to ADS scatter files). In that script, you can do
everything you can do on the command line PLUS, keep the .data and
.bss sections around no matter what. Very handy.

And...as long as I was writing a linker script, I figured I'd
automatically load AEEModGen.o first, set the text section offset to
zero, and provide an entry point so you don't have to.

So, given we have a custom linker script, the link command becomes:

\winarm\bin\arm-elf-ld.exe
--script armelf.brew // req: use custom script
--emit-relocs // req: BREWelf2mod needs the relocation info
--verbose // opt: be verbose, handy
--no-warn-mismatch // opt: you shouldn't have mismatch warnings, esp
// if not generating thumb code. Usually this warns
// when you try to glue an interworking-supporting object
// to one that does not, or ld _thinks_ does not...
-Map "wardtest.map" --cref // opt: make a cross referenced map if you need it
-L \winARM\lib\gcc\arm-elf\4.1.0\ // this is where to look if building ARM. If thumb
// you'll have mixed ARM and thumb calling the library
// and need to go down into the interworking and
// interworking/thumb directories.
-lgcc // This is the only one I've needed to link everything
-o "wardtest.elf" // output file
AEEAppGen.o // input file
applet.o // input file

Note that I did not specify AEEModGen.o. The linker script takes care
of this. See next section.

About __cxa_pure_virtual, memcpy(), memset(): The compiler generates
references to these on its own, and they need to be satisfied. Qualcomm provides a
module called GCCResolver.o to do this, or you can just define them in
your code. __cxa_pure_virtual needs to do nothing. memcpy(), memset(),
strlen() (or any others) need to call their BREW AEEStdLib.h
equivalents.

That's right, GCCResolver is not from GNU and has nothing to do with
DNS!

This says that somewhere in "mylibpaths" I have four libraries,
libONE.a, libTWO.a, libTHREE.a and libFOUR.a, plus libgcc.a.

Note again I did not specify AEEModGen.o -- don't have to. The
armelf.brew linker script brings it in for me.

6. Run BREWelf2mod and pray.

I talk about BREWelf2mod below. The key thing is that it strips all
the ELF stuff off your image, squirts out your code and a table of
relocations and slaps a 0x9C-byte startup routine onto your file
before AEEMod_Load().

To get the relocations, it exec()s arm-elf-objdump and processes the
ASCII list of relocations that come out.

Now, BREWelf2mod and the loader only understand a couple of
relocation types, specifically R_ARM_PC24 and R_ARM_ABS32. If you
have compiled straight ARM code all the way, that should be all that
is in your objdump output. So you can use the objdump that comes with
the WinARM toolchain:

BREWelf2mod myapp.elf myapp.mod \winARM\bin\arm-elf-objdump.exe

If you build thumb, the new toolchain's objdump emits an
R_ARM_THM_CALL relocation instead of the old name for it,
R_ARM_THM_PC22. You can get around this by specifying an old
version of objdump -- like the one from gnude:

BREWelf2mod myapp.elf myapp.mod \gnude\bin\arm-elf-objdump.exe

Or I have just written a filter program to 1) transform the output of
WinARM's objdump so that BREWelf2mod can understand it and 2) drop the
__cxa_pure_virtual relocations. The (simple) source code is
here.
Just stick it in Visual Studio and compile it as a Win32 console app.
Instructions for how to deploy it are in the comment at the start of
the source. (If you trust me not to be giving you malware, my
pre-compiled debug version (from VS Express 2005) is
here. Just save this link to disk.)

...or you could write a wrapper program, or maybe just patch the string
in BREWelf2mod.exe directly.
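The heart of such a filter is a per-line rewrite. The C++ sketch below is not the filter program mentioned above, just a minimal illustration. It assumes objdump's relocation lines contain the type name verbatim (something like "00000010 R_ARM_THM_CALL    foo"); it renames R_ARM_THM_CALL to the old R_ARM_THM_PC22 and drops any line that mentions __cxa_pure_virtual.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Rewrites one line of objdump relocation output for the old BREWelf2mod.
// Returns false if the line should be dropped entirely.
bool filterRelocLine(std::string& line) {
    // Drop relocations against __cxa_pure_virtual.
    if (line.find("__cxa_pure_virtual") != std::string::npos)
        return false;
    // Map the new Thumb call name to the old one. Both are 14 characters,
    // so objdump's column alignment is preserved.
    const std::string newName = "R_ARM_THM_CALL";
    const std::string oldName = "R_ARM_THM_PC22";
    const std::string::size_type pos = line.find(newName);
    if (pos != std::string::npos)
        line.replace(pos, newName.size(), oldName);
    return true;
}

// Copies 'in' to 'out' line by line, applying filterRelocLine().
void filterStream(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line))
        if (filterRelocLine(line)) out << line << '\n';
}

// Convenience wrapper for text already held in memory.
std::string filterText(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    filterStream(in, out);
    return out.str();
}
```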

My philosophy was to start with nothing and add stuff I needed or
seemed "nice to have" -- this way I understand the reason for
everything in there. The default GNU linker scripts are so terrifying
because they include everything for every platform. We don't need
that.

Here is the linker script. Save this link to disk in a file called
"armelf.brew" and call it with the --script switch to ld as shown
above.

DEVELOPMENT TOOLS OVERVIEW

I already posted this elsewhere, but I've updated it, and here is my read as of early May
2006:

TUTORIAL ON LINKING, LOCATING AND LOADING IN BREW AND ELSEWHERE

When you compile a program with GCC ARM, the resulting file is in a
format called ELF or ARM-ELF. This stands for "Executable and Linkable
Format" -- the Unix successor to older formats such as a.out and COFF.
It plays the same role as the COFF-derived "Portable Executable" (PE)
format still used on Win32, which superseded earlier formats such as
the New Executable (NE), the Linear Executable (LE) and old MS-DOS
files (MZ -- "Mark Zbikowski"). ELF files are structured as a header
and then a series of logical sections that the linker makes up. Some
of these are common, and the operating system loader relies on being
able to find them to get a program into memory, set it up for
execution, and run it. For example, the .text section contains the
program code (and is generally marked read-only). The .rodata section
contains "read-only" or rommable data -- string constants, tables,
"const" stuff. This is the only kind of data we like in BREW. On a
desktop machine the .data section contains initialized variables, and
the .bss section (an anachronistic Unix name, usually expanded as
"block started by symbol") holds the uninitialized global variables,
which start out as zeros. (Actually, it only records how big such a
section should be; the bytes aren't in the file. The OS loader
allocates that memory and zeros it when it is preparing the program
to run.)
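To make the "header, then sections" idea concrete, here is a small C++ sketch that reads a few fields from the 52-byte ELF32 file header. It assumes a little-endian file (the -mlittle-endian case above); a big-endian build would need the byte order flipped. The offsets follow the standard Elf32_Ehdr layout, and 40 is EM_ARM, the ARM machine number. The Elf32Summary struct is an illustrative name, not a system header type.

```cpp
#include <cstddef>
#include <cstdint>

// A few fields from an ELF32 file header (illustrative struct).
struct Elf32Summary {
    std::uint16_t machine;  // e_machine: 40 (EM_ARM) for ARM
    std::uint32_t entry;    // e_entry: entry point address
    std::uint32_t shoff;    // e_shoff: file offset of the section header table
    std::uint16_t shnum;    // e_shnum: number of section headers
};

// Little-endian readers (assumption: the file is ELFDATA2LSB).
static std::uint16_t rd16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t rd32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true and fills 'out' if buf starts with a 32-bit little-endian ELF header.
bool readElf32Header(const unsigned char* buf, std::size_t len, Elf32Summary& out) {
    if (len < 52) return false;  // sizeof(Elf32_Ehdr)
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') return false;
    if (buf[4] != 1) return false;  // EI_CLASS must be ELFCLASS32
    if (buf[5] != 1) return false;  // EI_DATA must be ELFDATA2LSB
    out.machine = rd16(buf + 18);
    out.entry = rd32(buf + 24);
    out.shoff = rd32(buf + 32);
    out.shnum = rd16(buf + 48);
    return true;
}
```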

The idea of ELF as a "standard" is that many operating systems can
load and run those files (Linux, Symbian, BREW). ELF is part of a
larger notion of an Application Binary Interface (ABI) that specifies
how executables, object files and libraries/archive files should be
structured so compilers/linkers/debuggers (collectively: "toolchains")
can interoperate with each other. ARM Limited blessed an ABI around
1998. This is called the "Extended/Embedded ABI" or EABI. Tool makers
(including GNU) have been modifying their tools to track this. (It
appears even this spec is slightly revised over time, since the latest
EABI paper I have from ARM is dated January 2006.) Early GNU and early
ARM compilers (including ADS 1.2) did not follow the EABI spec and
thus do not interoperate. The latest RVDS 2.x series of ARM compilers
and recent GNU compilers (late v3 and v4) do support it and, in
theory, interoperate.

On an operating system with a Memory Management Unit (MMU) (most
Linux, all Win32) an application can be run at what seems (to the
application) to be a fixed address. Thanks to the miracle of virtual
address space mapping to physical hardware, each application can
believe it is running at the same address in memory (0x80000000 for
example). This makes a linker's job very easy: all it has to do is
start assigning addresses at that number and count up from there. Further,
it can figure out the absolute addresses of every symbol in the
program (both code and data) and stomp absolute values into the code
where required. This process is called "locating" the executable.

Now with libraries (DLL, dynamic shared objects, etc.) and on BREW it
gets more complicated. In the case of BREW, there is no MMU and the
application will run in real memory at possibly any address. The BREW
applet loader would like to be as simple and stupid as possible and
just allocate some heap memory, read the applet file into it, and call
the first byte. This is, in fact, what it does. The problem is, if the
linker does not know the run address, how does the executable image
get located -- or as we say "fixed up" -- so it can run without
crashing?

We now enter the thrilling realm of relocations. When the linker
slams together all the object modules, it can also output relocation
sections (for example .rel.text) that list the places that need to
have their address adjusted in some way. A simple approach (and one
kind, the ABS32 reloc) is to add the 32-bit base load address of
the applet to the location. Other relocations involve calculating
offsets and adding them. It all depends on the addressing mode of the
instruction being fixed up and what is required to let it find the
data or code it is after.
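As an example of "calculating offsets", an ARM-state B or BL instruction (the R_ARM_PC24 case) stores the distance to its target in words, measured from the instruction's address plus 8 (where the ARM pipeline sees the PC), in a signed 24-bit field. The C++ sketch below computes that field. Because only the distance is encoded, moving the whole image to a new base leaves it unchanged, which is presumably why such branches can be settled at link time while ABS32 words still need a fix-up at load time. Thumb BL uses a different encoding and is not covered here.

```cpp
#include <cstdint>

// Computes the imm24 field of an ARM-state B/BL at address 'from' that
// branches to 'to'. Returns false if the target is not word aligned
// relative to the branch or lies outside the +/-32 MB range.
bool armBranchImm24(std::uint32_t from, std::uint32_t to, std::uint32_t& imm24) {
    // The PC reads as the instruction address + 8 in ARM state.
    std::int64_t delta = static_cast<std::int64_t>(to) - (static_cast<std::int64_t>(from) + 8);
    if (delta % 4 != 0) return false;
    std::int64_t words = delta / 4;
    if (words < -(1 << 23) || words > (1 << 23) - 1) return false;
    imm24 = static_cast<std::uint32_t>(words) & 0x00FFFFFFu;  // two's complement, 24 bits
    return true;
}
```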

Recall that an ELF file has a header and all the "good stuff" is
tucked away in sections. This is not something you can just suck into
memory and jump to. To solve this, Qualcomm (QC) provides a utility in
the VSAddins directory of the SDK called BREWelf2mod. This utility
does several things:

It strips off the ELF header of the input file

It secretly runs the GNU objdump utility to get an ASCII list of
relocations in the ELF file.

It runs through the relocations and "cooks them" so they are easy
to apply. In fact, it gets it down to a sorted list of places in the
image that need to have the actual loaded address added to them (the
addend).
Note that in a big program, this data can be of fair size and is one
of the reasons GNU gets a rap for big images on ARM. (There are tricks
that could fix this with more development.)

It slaps a short (0x9C-byte) startup routine on the front of the
output image. This is the code that first gets control when BREW
jumps to location zero in an applet. It fixes up the relocations in
the image and, after that, winds up executing AEEMod_Load().
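The cook-then-apply idea can be sketched in C++ as follows. The list format and function names are assumptions for illustration; this does not reproduce BREWelf2mod's real table or its 0x9C-byte startup routine. The sketch keeps a sorted, de-duplicated list of image offsets of 32-bit words that were linked as if the image sat at address zero, then adds the real load address to each one.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// "Cooking": sort and de-duplicate the offsets of ABS32-style words
// (hypothetical format; offsets are from the start of the image).
std::vector<std::uint32_t> cookAbs32(std::vector<std::uint32_t> offsets) {
    std::sort(offsets.begin(), offsets.end());
    offsets.erase(std::unique(offsets.begin(), offsets.end()), offsets.end());
    return offsets;
}

// Load-time fix-up: add the actual load address to each listed word.
// memcpy avoids unaligned access; assumes host and image share byte order.
void applyFixups(unsigned char* image, std::uint32_t loadAddress,
                 const std::vector<std::uint32_t>& offsets) {
    for (std::uint32_t off : offsets) {
        std::uint32_t word;
        std::memcpy(&word, image + off, sizeof word);
        word += loadAddress;
        std::memcpy(image + off, &word, sizeof word);
    }
}
```

For example, a word holding 0x8 (an address linked relative to zero) becomes 0x10008 when the image is loaded at 0x10000.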

If you are curious, the layout of this startup code looks like this
(from my limited poking around):

Note there is no need to explicitly compile position-independent code,
because this scheme is position independent. In fact, if you do compile
position independent (-fPIC), it won't work, because GNU implements
that through a global offset table (.got) and related sections, which
the elf2mod startup code does not support. (Thumb "interworking"
calls can generate thunks too -- to switch from 16- to 32-bit
instructions and back -- but that's a different thing -- and something
I have not gotten to work yet anyway.)

By contrast, the ADS toolchain only compiles position independent
(-ropi). This means no startup code is required to fix it up at
runtime, AEE can just call the first byte and off it goes, but you
can't have tables of function pointers, or mutable global variables.
It also means vtable dispatching and long jumps are inefficient,
having to go through "jump veneers" if the target is too far away.
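For example, both tables in the following C++ fragment hold absolute addresses, so in a GNU-built (non -ropi) image each slot becomes an ABS32-style relocation that the startup code adjusts at load time. The handler names and table contents are made up; the point is only that this kind of ordinary C++ is what the fixed-up scheme allows.

```cpp
#include <cstddef>

// Illustrative handlers; each table slot below stores an absolute address.
static int onKey(int x)   { return x + 1; }
static int onTimer(int x) { return x * 2; }

// A const table of function pointers and a const table of string pointers.
static int (*const kHandlers[])(int) = { onKey, onTimer };
static const char* const kNames[] = { "key", "timer" };

// Dispatch through the function pointer table.
int dispatch(std::size_t which, int arg) {
    return kHandlers[which](arg);
}

// Look up a name through the string pointer table.
const char* handlerName(std::size_t which) {
    return kNames[which];
}
```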

Note that the LightBlue project intends to make their own elf2mod
utility that will allow compiling applets that are not -ropi and
fixing them up at load time as is done with GNU. This will bring
LightBlue vtable support, global variables, function pointer tables,
string tables and efficient vtable dispatching to ADS just as is now
available with GNU.

For the really curious, here is my disassembled and annotated
version of the BREW gnu startup code: