Assembly HOWTO
François-René Rideau fare@tunes.org
v0.4q, 22 June 1999
This is the
Linux Assembly HOWTO.
This document describes how to program in assembly
using FREE programming tools,
focusing on development for or from the Linux Operating System
on i386 platforms.
Included material may or may not be applicable
to other hardware and/or software platforms.
Contributions about these would be gladly accepted.
keywords:
assembly, assembler, free, macroprocessor, preprocessor,
asm, inline asm, 32-bit, x86, i386, gas, as86, nasm
INTRODUCTION
Legal Blurb

This is an interactively evolving document: you are especially invited
to ask questions,
to answer questions,
to correct given answers,
to add new FAQ answers,
to give pointers to other software,
to point the current maintainer to bugs or deficiencies in the pages.
If you're motivated, you could even
take over the maintenance of the HOWTO.
In one word, contribute!
To contribute, please contact whoever appears to maintain
the Assembly-HOWTO. At the time of this writing, it's me,
François-René Rideau (fare@tunes.org).
However, for some time now I've been looking for a serious hacker
to replace me as maintainer of this document. The disadvantages are that
you must spend some time updating and correcting the document,
and learning the LDP publication tools. The advantages are that you get some fame
and you can receive complimentary copies of HOWTO compendiums.
Foreword

This document aims at answering frequently asked questions of people
who program or want to program 32-bit x86 assembly using
free programming tools,
particularly under the Linux operating system.
It may also point to other documents about
non-free, non-x86, or non-32-bit assemblers,
though such is not its primary goal.
Because the main interest of assembly programming is to write
the guts of operating systems, interpreters, compilers, and games,
where a C compiler fails to provide the needed expressiveness
(performance is less and less often an issue),
we stress the development of such software.
How to use this document

This document contains answers to some frequently asked questions.
At many places, Uniform Resource Locators (URLs) are given for some
software or documentation repository.
Please note that the most useful repositories are mirrored,
and that by accessing a nearer mirror site,
you relieve the whole Internet from unneeded network traffic,
while saving your own precious time.
In particular, there are large repositories all over the world
that mirror other popular repositories.
You should learn and note which of those places are near you (networkwise).
Sometimes the list of mirrors is given in a file
or in a login message. Please heed the advice.
Otherwise, you should ask archie about the software you're looking for...
The most recent version of this document sits in
but what's in the Linux HOWTO repositories should be fairly up to date, too
(I can't know):
.
A French translation of this HOWTO can be found around
.
Other related documents

If you don't know what free software is,
please read the GNU General Public License carefully,
which is used in a lot of free software,
and is a model for most of their licenses.
It generally comes in a file named COPYING,
with a library version in a file named COPYING.LIB.
Literature from the FSF
(Free Software Foundation) might help you, too.
In particular, the interesting kind of free software
comes with sources that you can consult and correct,
or sometimes even borrow from.
Read your particular license carefully, and do comply with it.
There is a FAQ for comp.lang.asm.x86 that answers generic questions
about x86 assembly programming, and questions about some commercial
assemblers in a 16-bit DOS environment.
Some of it applies to free 32-bit asm programming, so you may want
to read this
...
FAQs and docs exist about programming on your favorite platform,
whichever it is; you should consult them for platform-specific issues
not directly related to programming in assembler.
History

Each version includes a few fixes and minor corrections,
which need not be mentioned every time.
Version 0.1 23 Apr 1996
Francois-Rene "Faré" Rideau <fare@tunes.org>
creates and publishes the first mini-HOWTO,
because ``I'm sick of answering ever the same questions
on comp.lang.asm.x86''
Version 0.2 4 May 1996 *
Version 0.3c 15 Jun 1996 *
Version 0.3f 17 Oct 1996 *
Version 0.3g 2 Nov 1996
Created the History. Added pointers in cross-compiling section.
Added section about I/O programming under Linux (particularly video).
Version 0.3h 6 Nov 1996
more about cross-compiling -- See on sunsite: devel/msdos/
Version 0.3i 16 Nov 1996
NASM is getting pretty slick
Version 0.3j 24 Nov 1996
point to french translated version
Version 0.3k 19 Dec 1996
What? I had forgotten to point to terse???
Version 0.3l 11 Jan 1997 *
Version 0.4pre1 13 Jan 1997
text mini-HOWTO transformed into a full linuxdoc-sgml HOWTO,
to see what the SGML tools are like.
Version 0.4 20 Jan 1997
first release of the HOWTO as such.
Version 0.4a 20 Jan 1997
CREDITS section added
Version 0.4b 3 Feb 1997
NASM moved: it now comes before AS86
Version 0.4c 9 Feb 1997
Added section "DO YOU NEED ASSEMBLY?"
Version 0.4d 28 Feb 1997
Vapor announcement of a new Assembly-HOWTO maintainer.
Version 0.4e 13 Mar 1997
Release for DrLinux
Version 0.4f 20 Mar 1997 *
Version 0.4g 30 Mar 1997 *
Version 0.4h 19 Jun 1997
still more on "how not to use assembly";
updates on NASM, GAS.
Version 0.4i 17 July 1997
info on 16-bit mode access from Linux.
Version 0.4j 7 September 1997 *
Version 0.4k 19 October 1997 *
Version 0.4l 16 November 1997
release for LSL 6th edition.
Version 0.4m 23 March 1998
corrections about gcc invocation
Version 0.4o 1 December 1998 *
Version 0.4p 6 June 1999
clean up and updates.
Version 0.4q 22 June 1999
process argument passing (argc,argv,environ) in assembly.
This is yet another
``last release by Faré before new maintainer takes over''.
Only nobody knows who the new maintainer might be.
Credits

I would like to thank the following people, in order of appearance:
for Linux
for bcc from which as86 is extracted
and
for NASM
and now
for maintaining HOWTOs
for his FAQ
for his translation of the mini-HOWTO into French
(sad thing for the original author to be French and write in English)
and
for helping me, if not for taking over the HOWTO.
for his insight on GCC invocation.
for helping me figure out the process argument passing convention
All the people who have contributed ideas, remarks, and moral support.
DO YOU NEED ASSEMBLY?

Well, I wouldn't want to interfere with what you're doing,
but here is some advice from hard-earned experience.
Pros and Cons

The advantages of Assembly

Assembly can express very low-level things:
you can access machine-dependent registers and I/O.
you can control the exact behavior of code
in critical sections that might otherwise involve deadlock
between multiple software threads or hardware devices.
you can break the conventions of your usual compiler,
which might allow some optimizations
(like temporarily breaking rules about memory allocation,
threading, calling conventions, etc).
you can build interfaces between code fragments
using such incompatible conventions
(e.g. produced by different compilers,
or separated by a low-level interface).
you can get access to unusual programming modes of your processor
(e.g. 16-bit mode to interface startup, firmware, or legacy code
on Intel PCs)
you can produce reasonably fast code for tight loops
to cope with a bad non-optimizing compiler
(but then, there are free optimizing compilers available!)
you can produce code where
(but only on CPUs with known instruction timings,
which generally excludes all current ....
you can produce hand-optimized code
that's perfectly tuned for your particular hardware setup,
though not to anyone else's.
you can write some code for your new language's
optimizing compiler
(that's something few will ever do, and even they, not often).

The disadvantages of Assembly

Assembly is a very low-level language
(the lowest above hand-coding the binary instruction patterns).
This means
it's long and tedious to write initially,
it's very bug-prone,
your bugs will be very difficult to chase,
it's very difficult to understand and modify,
i.e. to maintain.
the result is very non-portable to other architectures,
existing or future,
your code will be optimized only for a certain implementation
of the same architecture:
for instance, among Intel-compatible platforms,
each CPU design and its variations
(relative latency, throughput, and capacity,
of processing units, caches, RAM, bus, disks,
presence of FPU, MMX extensions, etc)
implies potentially completely different optimization techniques.
CPU designs already include
Intel 386, 486, Pentium, PPro, Pentium II;
Cyrix 5x86, 6x86; AMD K5, K6.
New designs keep popping up, so don't expect either this listing
or your code to be up-to-date.
your code might also be unportable across different
OS platforms on the same architecture, for lack of proper tools
(well, GAS seems to work on all platforms;
NASM seems to work or be workable on all Intel platforms).
you spend more time on a few details,
and can't focus on small and large algorithmic design,
which is known to bring the largest part of the speedup.
[e.g. you might spend some time building very fast
list/array manipulation primitives in assembly,
when only a hash table would have sped up your program much more;
or, in another context, a binary tree;
or some high-level structure distributed over a cluster of CPUs]
a small change in algorithmic design might completely
invalidate all your existing assembly code,
so either you're ready (and able) to rewrite it all,
or you're tied to a particular algorithmic design;
On code that isn't too far from what's in standard benchmarks,
commercial optimizing compilers outperform hand-coded assembly
(well, that's less true on the x86 architecture
than on RISC architectures,
and perhaps less true for widely available/free compilers;
anyway, for typical C code, GCC is fairly good);
And in any case, as moderator John Levine says on comp.compilers,
``compilers make it a lot easier to use complex data structures,
and compilers don't get bored halfway through
and generate reliably pretty good code.''
They will also correctly propagate code transformations
throughout the whole (huge) program
when optimizing code between procedures and module boundaries.

Assessment

All in all, you might find that
though using assembly is sometimes needed,
and might even be useful in a few cases where it is not,
you'll want to:
minimize the use of assembly code,
encapsulate this code in well-defined interfaces
have your assembly code automatically generated
from patterns expressed in a higher-level language
than assembly (e.g. GCC inline assembly macros).
have automatic tools translate these programs
into assembly code
have this code be optimized if possible
All of the above,
i.e. write (an extension to) an optimizing compiler back-end.
Even in cases when Assembly is needed (e.g. OS development),
you'll find that not so much of it is,
and that the above principles hold.
See the sources for the Linux kernel about it:
as little assembly as needed,
resulting in a fast, reliable, portable, maintainable OS.
Even a successful game like DOOM was almost entirely written in C,
with only a tiny part written in assembly for speed.
How to NOT use Assembly

General procedure to achieve efficient code

As Charles Fiterman says on comp.compilers
about human vs computer-generated assembly code,
``The human should always win and here is why.
First the human writes the whole thing in a high level language.
Second he profiles it to find the hot spots where it spends its time.
Third he has the compiler produce assembly for those small
sections of code.
Fourth he hand tunes them looking for tiny improvements over
the machine generated code.
The human wins because he can use the machine.''
Languages with optimizing compilers

Languages like
Objective Caml, SML, Common Lisp, Scheme, Ada, Pascal, C, C++,
among others,
all have free optimizing compilers
that'll optimize the bulk of your programs,
and often do better than hand-coded assembly even for tight loops,
while allowing you to focus on higher-level details,
and without forbidding you to grab
a few percent of extra performance in the above-mentioned way,
once you've reached a stable design.
Of course, there are also commercial optimizing compilers
for most of these languages, too!
Some languages have compilers that produce C code,
which can be further optimized by a C compiler.
LISP, Scheme, Perl, and many others
are such languages.
Speed is fairly good.
General procedure to speed your code up

As for speeding code up,
you should do it only for parts of a program
that a profiling tool has consistently identified
as being a performance bottleneck.
Hence, if you identify some code portion as being too slow, you should
first try to use a better algorithm;
then try to compile it rather than interpret it;
then try to enable and tweak optimization from your compiler;
then give the compiler hints about how to optimize
(typing information in LISP; register usage with GCC;
lots of options in most compilers, etc).
then possibly fall back to assembly programming.
Finally, before you end up writing assembly,
you should inspect generated code,
to check that the problem really is with bad code generation,
as this might really not be the case:
compiler-generated code might be better than what you'd have written,
particularly on modern multi-pipelined architectures!
Slow parts of a program might be intrinsically so.
The biggest problems on modern architectures with fast processors
are due to delays from memory access, cache misses, TLB misses,
and page faults;
register optimization becomes useless,
and you'll more profitably rethink data structures and threading
to achieve better locality in memory access.
Perhaps a completely different approach to the problem might help, then.
Inspecting compiler-generated code

There are many reasons to inspect compiler-generated assembly code.
Here is what you'll do with such code:
check whether generated code
can be obviously enhanced with hand-coded assembly
(or by tweaking compiler switches)
when that's the case,
start from generated code and modify it
instead of starting from scratch
more generally, use generated code as stubs to modify,
which at least gets right the way
your assembly routines interface to the external world
track down bugs in your compiler (hopefully rarer)
The standard way to have assembly code be generated
is to invoke your compiler with the -S flag.
This works with most Unix compilers,
including the GNU C Compiler (GCC), but YMMV.
As for GCC, it will produce more understandable assembly code with
the -fverbose-asm command-line option.
Of course, if you want to get good assembly code,
don't forget your usual optimization options and hints!
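For example, take a hypothetical hot spot like the one below (sum.c and sum_array() are made-up names). Running gcc -O2 -S -fverbose-asm sum.c should leave the annotated assembly in sum.s. You can then compare it with what you would write by hand, or use it as a starting stub.

```c
/* sum.c: a hypothetical hot spot to inspect.
 * Produce assembly instead of an object file with:
 *     gcc -O2 -S -fverbose-asm sum.c
 * -S stops after generating assembly (written to sum.s);
 * -fverbose-asm adds explanatory comments to that output. */
unsigned sum_array(const unsigned *a, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++)
        total += a[i];       /* the loop you would study in sum.s */
    return total;
}
```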
ASSEMBLERS

GCC Inline Assembly

The well-known GNU C/C++ Compiler (GCC),
an optimizing 32-bit compiler at the heart of the GNU project,
supports the x86 architecture quite well,
and includes the ability to insert assembly code in C programs,
in such a way that register allocation can be either specified or left to GCC.
GCC works on most available platforms,
notably Linux, *BSD, VSTa, OS/2, *DOS, Win*, etc.
Where to find GCC

The original GCC site is the GNU FTP site
together with all the released application software from the GNU project.
Linux-configured and precompiled versions can be found in
There exist a lot of FTP mirrors of both sites
everywhere around the world, as well as CD-ROM copies.
GCC development split into two branches some time ago,
but they should merge back soon.
See more about the experimental version, egcs, at
Sources adapted to your favorite OS, and binaries precompiled for it,
should be found at your usual FTP sites.
The most popular DOS port of GCC is named DJGPP,
and can be found in directories of that name on FTP sites. See:
There is also a port of GCC to OS/2 named EMX,
which also works under DOS,
and includes lots of Unix-emulation library routines.
See around the following site:
.
Other URLs listed in previous versions of this HOWTO
seem to be as dead as OS/2.
Where to find docs for GCC Inline Asm

The documentation of GCC includes documentation files in texinfo format.
You can compile them with TeX and print the result,
or convert them to .info and browse them with Emacs,
or convert them (with the right tools) to .html or nearly whatever you like,
or just read them as is.
The .info files are generally found on any good installation of GCC.
The right section to look for is:
C Extensions::Extended Asm::
The section
Invoking GCC::Submodel Options::i386 Options::
might help too.
In particular, it gives the i386-specific constraint names for registers:
abcdSDB correspond to
%eax, %ebx, %ecx, %edx, %esi, %edi, %ebp
respectively (no letter for %esp).
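For instance, here is a minimal sketch, assuming an x86 CPU and a GCC-compatible compiler (add_in_eax() is a hypothetical name). The a and c constraint letters force the operands into %eax and %ecx, so the template can name those registers directly. The #else branch is only there so the file still builds on other targets.

```c
/* Adds y to x with an explicit addl; the "a" and "c" constraint
 * letters pin the operands to %eax and %ecx respectively. */
unsigned add_in_eax(unsigned x, unsigned y)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %%ecx, %%eax"   /* AT&T order: %eax += %ecx */
            : "+a"(x)             /* x lives in %eax, read and written */
            : "c"(y)              /* y is loaded into %ecx */
            : "cc");              /* addl changes the condition flags */
    return x;
#else
    return x + y;                 /* non-x86 fallback */
#endif
}
```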
The DJGPP Games resource (not only for game hackers) had this page
specifically about assembly, but it's down.
Its data have nonetheless been recovered on the
,
that contains a mine of other useful information:
GCC depends on GAS for assembling and follows its syntax (see below);
do mind that in extended inline asm, literal percent characters must be doubled (%%)
so that a single % is passed to GAS.
See the section about GAS below.
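Here is a minimal sketch of that quoting rule, under the same assumptions (x86, GCC-compatible compiler, hypothetical shift_left()). In an extended asm statement, %0 refers to an operand, so a literal register must be written %%cl to reach GAS as %cl. A basic asm statement with no operands, such as asm(".code16\n"), does not need the doubling.

```c
/* Shifts x left by n bits (the CPU uses only the low 5 bits of n).
 * %0 is an operand chosen by GCC; %%cl is the literal register %cl. */
unsigned shift_left(unsigned x, unsigned n)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("shll %%cl, %0"   /* x <<= (n & 31) */
            : "+r"(x)         /* %0: any general register */
            : "c"(n)          /* shift counts must come from %cl */
            : "cc");
    return x;
#else
    return x << (n & 31);     /* same masking as the x86 instruction */
#endif
}
```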
Find lots of useful examples in the linux/include/asm-i386/
subdirectory of the sources for the Linux kernel.
Invoking GCC to have it properly inline assembly code?

Because assembly routines from the kernel headers
(and most likely your own headers,
if you try making your assembly programming as clean
as it is in the linux kernel)
are embedded in extern inline functions,
GCC must be invoked with the -O flag (or -O2, -O3, etc),
for these routines to be available.
If not, your code may compile, but not link properly,
since it will be looking for non-inlined extern functions
in the libraries against which your program is being linked!
Another way is to link against libraries that include fallback
versions of the routines.
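Here is a minimal sketch of that arrangement, assuming GCC or a compiler that honours GCC's gnu_inline attribute; next_value() is hypothetical. In real code the first definition would sit in a header and the second in a library. They are shown in one file here, which GCC permits for gnu_inline extern inline functions. Without -O, the call is simply not inlined and resolves to the out-of-line copy instead of failing to link.

```c
/* Header part: a GNU-style extern inline definition. gnu_inline keeps
 * the traditional GCC meaning under any -std: this body is used only
 * for inlining and is never emitted as a standalone function. */
extern inline __attribute__((gnu_inline)) unsigned next_value(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("incl %0" : "+r"(x) : : "cc");   /* x = x + 1 */
    return x;
#else
    return x + 1;
#endif
}

/* Library part: the ordinary out-of-line fallback that the linker
 * uses for any call that was not inlined (e.g. when built without -O). */
unsigned next_value(unsigned x)
{
    return x + 1;
}
```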
Inline assembly can be disabled with -fno-asm,
which will have the compiler die when using extended inline asm syntax,
or else generate calls to an external function named asm()
that the linker can't resolve.
To counter this flag, -fasm restores treatment of the asm keyword.
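A related precaution, assuming GCC or a compatible compiler: GCC also accepts the spelling __asm__, which stays a keyword even under -fno-asm, -ansi or -std=c99. Headers meant to be included everywhere therefore generally use it. swap_bytes() below is a hypothetical example; bswap requires a 486 or later.

```c
/* Byte-swaps a 32-bit value. Writing __asm__ instead of asm keeps this
 * compiling when plain "asm" is not a keyword (-fno-asm, -ansi, -std=c99). */
unsigned swap_bytes(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("bswap %0" : "+r"(x));   /* reverse the four bytes of x */
    return x;
#else
    return (x >> 24) | ((x >> 8) & 0xff00u)
         | ((x << 8) & 0xff0000u) | (x << 24);
#endif
}
```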
More generally, good compile flags for GCC on the x86 platform are
gcc -O2 -fomit-frame-pointer -W -Wall
-O2 is the right optimization level in most cases.
Optimizing beyond it takes longer and yields code that is a lot larger,
but only a bit faster;
such overoptimization might be useful for tight loops only (if any),
which you may be doing in assembly anyway.
In cases when you need really strong compiler optimization for a few files,
do consider using up to -O6.
-fomit-frame-pointer allows generated code to skip the stupid
frame pointer maintenance, which makes code smaller and faster,
and frees a register for further optimizations.
It precludes the easy use of debugging tools (gdb),
but when you use these,
you just don't care about size and speed anymore anyway.
-W -Wall enables many useful warnings and helps you catch obvious stupid errors.
You can add a CPU-specific flag such as -m486 so that
GCC will produce code better adapted to your precise computer.
Note that EGCS (and perhaps GCC 2.8) have -mpentium and such flags,
whereas GCC 2.7.x and older versions do not.
A good choice of CPU-specific flags should be in the Linux kernel.
Check the texinfo documentation of your current GCC installation for more.
-m386 will help optimize for size,
hence also for speed on computers whose memory is tight and/or loaded,
since big programs cause swapping, which more than counters
any "optimization" intended by the larger code.
In such settings, it might be useful to stop using C,
and use instead a language that favors code factorization,
such as a functional language and/or FORTH,
and use a bytecode- or wordcode- based implementation.
Note that you can vary code generation flags from file to file,
so that performance-critical files use maximal optimization,
whereas other files are optimized for size.
To optimize even more, option -mregparm=2
and/or corresponding function attribute might help,
but might pose lots of problems when linking to foreign code,
including the libc.
There are ways to correctly declare foreign functions
so that the right call sequences are generated,
or you might want to recompile the foreign libraries
to use the same register-based calling convention...
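Here is a minimal sketch of the function-attribute route, assuming GCC on i386; the attribute only exists for 32-bit x86, hence the guard. FAST_CALL and scaled_sum() are hypothetical names. The warning above still applies: every caller must see the same declaration, or caller and callee will disagree about where the arguments live.

```c
/* On i386, regparm(2) passes the first two integer arguments in
 * %eax and %edx instead of on the stack. */
#if defined(__i386__)
#define FAST_CALL __attribute__((regparm(2)))
#else
#define FAST_CALL /* no-op elsewhere; x86-64 already passes args in registers */
#endif

FAST_CALL int scaled_sum(int a, int b)
{
    return 4 * a + b;
}
```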
Note that you can make these flags the default by editing the file
/usr/lib/gcc-lib/i486-linux/2.7.2.3/specs
or wherever that is on your system (better not add -Wall there, though).
The exact location of the GCC specs files on your system
can be found by asking gcc -v.
GAS

GAS is the GNU Assembler, which GCC relies upon.
Where to find it

Find it at the same place where you found GCC,
in a package named binutils.
What is this AT&T syntax

Because GAS was invented to support a 32-bit Unix compiler,
it uses standard ``AT&T'' syntax,
which closely resembles the syntax of standard m68k assemblers,
and is standard in the UNIX world.
This syntax is no worse, no better than the ``Intel'' syntax.
It's just different.
When you get used to it,
you find it much more regular than the Intel syntax,
though a bit boring.
Here are the major caveats about GAS syntax:
Register names are prefixed with %, so that
registers are %eax, %dl and so on,
instead of just eax, dl, etc.
This makes it possible to include external C symbols directly
in assembly source, without any risk of confusion, or any need
for ugly underscore prefixes.
The order of operands is source(s) first, and destination last,
as opposed to the Intel convention of destination first and sources last.
Hence, what in Intel syntax is mov ax,dx (move contents of
register dx into register ax) will be in AT&T syntax
mov %dx, %ax.
The operand length is specified as a suffix to the instruction name.
The suffix is b for (8-bit) byte,
w for (16-bit) word,
and l for (32-bit) long.
For instance, the correct syntax for the above instruction
would have been movw %dx,%ax.
However, gas does not require strict AT&T syntax,
so the suffix is optional when the length can be guessed from register operands,
and otherwise defaults to 32-bit (with a warning).
Immediate operands are marked with a $ prefix,
as in addl $5,%eax
(add immediate long value 5 to register %eax).
No prefix on an operand indicates that it is a memory address;
hence movl $foo,%eax
puts the address of variable foo
in register %eax,
but movl foo,%eax
puts the contents of variable foo
in register %eax.
Indexing or indirection is done by enclosing the index register
or indirection memory cell address in parentheses,
as in testb $0x80,17(%ebp)
(test the high bit of the byte value at offset 17
from the cell pointed to by %ebp).
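To see those rules together, here is a minimal sketch in GCC inline asm, assuming an x86 CPU (32- or 64-bit) and a GCC-compatible compiler; second_plus_five() is a made-up name. The template uses the l size suffix, source-before-destination order, a $-prefixed immediate, and an offset(base) memory operand. Inside an extended asm template, %0 and %1 stand for registers the compiler picks. The pointer register is therefore printed at its natural width (%eax on i386, %rax on x86-64).

```c
/* Returns p[1] + 5 (int is 4 bytes on x86, so p[1] is at offset 4). */
int second_plus_five(const int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    int r;
    __asm__("movl 4(%1), %0\n\t"   /* r = 32-bit word at 4 bytes past p */
            "addl $5, %0"          /* r += immediate 5 */
            : "=r"(r)
            : "r"(p)
            : "cc", "memory");     /* the asm reads memory GCC can't see */
    return r;
#else
    return p[1] + 5;
#endif
}
```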
A program exists to help you convert programs
from TASM syntax to AT&T syntax. See
.
(Since the original x2ftp site is closing, use a
).
There also exists a program for the reverse conversion:
.
GAS has comprehensive documentation in TeXinfo format,
which comes at least with the source distribution.
Browse extracted .info pages with Emacs or whatever.
There used to be a file named gas.doc or as.doc
around the GAS source package, but it was merged into the TeXinfo docs.
Of course, in case of doubt, the ultimate documentation
is the sources themselves!
A section that will particularly interest you is
Machine Dependencies::i386-Dependent::
Again, the sources for Linux (the OS kernel) are good examples;
see under linux/arch/i386, the following files:
kernel/*.S, boot/compressed/*.S, mathemu/*.S
If you are writing some kind of language, a thread package, etc.,
you might as well see how other languages (OCaml, gforth, etc),
or thread packages (QuickThreads, MIT pthreads, LinuxThreads, etc),
or whatever, do it.
Finally, just compiling a C program to assembly
might show you the syntax for the kind of instructions you want.
See section above.
Limited 16-bit mode

GAS is a 32-bit assembler, meant to support a 32-bit compiler.
It currently has only limited support for 16-bit mode,
which consists of prepending the 32-bit prefixes to instructions,
so you write 32-bit code that runs in 16-bit mode on a 32-bit CPU.
In both modes, it supports 16-bit register usage,
but what is unsupported is 16-bit addressing.
Use the directives .code16 and .code32
to switch between modes.
Note that an inline assembly statement
asm(".code16\n")
will allow GCC to produce 32-bit code that'll run in real mode!
I've been told that most code needed to fully support
16-bit mode programming was added to GAS by Bryan Ford (please confirm?),
but at least it doesn't show up in any of the distributions I tried,
up to binutils-2.8.1.x; more info on this subject would be welcome.
A cheap solution is to define macros (see below) that somehow produce
the binary encoding (with .byte) for just the 16-bit mode instructions
you need (almost nothing if you use code16 as above,
and can safely assume the code will run on a 32-bit capable x86 CPU).
To find the proper encoding, you can get inspiration from
the sources of 16-bit capable assemblers.
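Here is a minimal sketch of the .byte trick, assuming GCC-style inline asm on an x86 CPU. A genuine 16-bit instruction cannot be exercised from an ordinary Linux process. So the hypothetical zero_eax() below hand-encodes a plain 32-bit instruction instead: xorl %eax,%eax is 0x31 0xC0. The only point is that GAS emits whatever bytes you give it. A real macro would wrap such .byte lines in a GAS .macro definition for the 16-bit instructions you need.

```c
/* Emits "xorl %eax,%eax" as raw bytes instead of a mnemonic.
 * 0x31 is XOR r/m32,r32; ModRM 0xC0 selects %eax for both operands. */
unsigned zero_eax(unsigned x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__(".byte 0x31, 0xc0"   /* hand-encoded xorl %eax, %eax */
            : "+a"(x)            /* x is in %eax before and after */
            :
            : "cc");             /* xor rewrites the flags */
    return x;
#else
    (void)x;
    return 0;
#endif
}
```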
GASP

GASP is the GAS Preprocessor.
It adds macros and some nice syntax to GAS.
Where to find GASP

GASP comes together with GAS in the GNU binutils archive.
How it works

It works as a filter, much like cpp and the like.
I have no idea about the details, but it comes with its own texinfo documentation,
so just browse them (in .info), print them, grok them.
GAS with GASP looks like a regular macro-assembler to me.
NASM

The Netwide Assembler project is producing yet another i386 assembler,
written in C, that should be modular enough
to eventually support all known syntaxes and object formats.
Where to find NASM

Binary releases are available on your usual metalab mirror in
devel/lang/asm/.
NASM should also be available as .rpm or .deb in your usual RedHat/Debian
distributions' contrib.
What it does

At the time of this writing, version 0.98 of NASM has just come out.
The syntax is Intel-style.
Some macroprocessing support is integrated.
Supported object file formats are
bin, aout, coff, elf, as86,
(DOS) obj, win32, (their own format) rdf.
NASM can be used as a backend for the free LCC compiler
(support files included).
Surely NASM evolves too fast for this HOWTO to be kept up to date.
Unless you're using BCC as a 16-bit compiler
(which is out of scope of this 32-bit HOWTO),
you should definitely use NASM instead of say AS86 or MASM,
because it is actively supported online,
and runs on all platforms.
Note: NASM also comes with a disassembler, NDISASM.
Its hand-written parser makes it much faster than GAS,
though of course, it doesn't support three bazillion different architectures.
For the x86 target, it should be the assembler of choice...
AS86

AS86 is an 80x86 assembler, both 16-bit and 32-bit,
part of Bruce Evans' C Compiler (BCC).
It mostly uses Intel syntax,
though it differs slightly in its addressing modes.
Where to get AS86

A completely outdated version of AS86 is distributed by HJLu
just to compile the Linux kernel,
in a package named bin86 (current version 0.4),
available in any Linux GCC repository.
But I advise no one to use it for anything but compiling Linux.
This version supports only a hacked minix object file format,
which is not supported by the GNU binutils or anything,
and it has a few bugs in 32-bit mode,
so you really had better keep it only for compiling Linux.
The most recent versions by Bruce Evans (bde@zeta.org.au)
are published together with the FreeBSD distribution.
Well, they were: I could not find the sources from distribution 2.1 on :(
Hence, I put the sources at my place:
The Linux/8086 (aka ELKS) project is maintaining bcc to some extent
(though I don't think they included the 32-bit patches).
See around
(or )
and .
I haven't followed these developments,
and would appreciate a reader contributing on this topic.
Among other things, these more recent versions, unlike HJLu's,
support Linux GNU a.out format,
so you can link your code to Linux programs, and/or use the usual
tools from the GNU binutils package to manipulate your data.
This version can co-exist without any harm with the previous one
(see according question below).
BCC from 12 March 1995 and earlier versions have a misfeature
that makes all segment pushing/popping 16-bit,
which is quite annoying when programming in 32-bit mode.
I wrote a patch at a time when the TUNES Project used as86:
.
Bruce Evans accepted this patch,
but since as far as I know he hasn't published a new release of bcc,
the ones to ask about integrating it (if not done yet)
are the ELKS developers.
How to invoke the assembler?

Here's the GNU Makefile entry for using bcc
to transform .s asm
into both GNU a.out .o object
and .l listing:
%.o %.l: %.s
	bcc -3 -G -c -A-d -A-l -A$*.l -o $*.o $<
Remove the %.l, -A-l, and -A$*.l,
if you don't want any listing.
If you want something other than GNU a.out,
you can see the docs of bcc about the other supported formats,
and/or use the objcopy utility from the GNU binutils package.
Where to find docs

The docs are what is included in the bcc package.
I salvaged the man pages that used to be available from the FreeBSD site at
.
Maybe ELKS developers know better.
When in doubt, the sources themselves are often good documentation:
they're not very well commented, but the programming style is straightforward.
You might try to see how as86 is used in ELKS or Tunes 0.0.0.25...
What if I can't compile Linux anymore with this new version?

Linus is buried alive in mail,
and since HJLu (official bin86 maintainer)
chose to write hacks around an obsolete version of as86
instead of building clean code around the latest version,
I don't think my patch for compiling Linux with a modern as86
has any chance to be accepted if resubmitted.
Now, this shouldn't matter: just keep your as86 from the bin86 package
in /usr/bin, and let bcc install the good as86 as
/usr/local/libexec/i386/bcc/as
where it should be. You never need to call this ``good'' as86 explicitly,
because bcc does everything right, including conversion to Linux a.out,
when invoked with the right options;
so assemble files exclusively with bcc as a frontend, not directly with as86.
OTHER ASSEMBLERS

These are other, non-regular, options,
in case the previous ones didn't satisfy you (why?),
that I don't recommend in the usual (?) case,
but that could prove quite useful if the assembler must be integrated
in the software you're designing (i.e. an OS or development environment).
Win32Forth assembler

Win32Forth is a free 32-bit ANS FORTH system
that successfully runs under Win32s, Win95, Win/NT.
It includes a free 32-bit assembler (either prefix or postfix syntax)
integrated into the reflective FORTH language.
Macro processing is done with
the full power of the reflective language FORTH;
however, the only supported input and output context is Win32For itself
(no dumping of .obj file, but you could add that feature yourself, of course).
Find it at
.
Terse

Terse is a programming tool that provides
THE most compact assembler syntax for the x86 family!
However, it is evil proprietary software.
It is said that there was a project for a free clone somewhere,
which was abandoned after worthless claims that the syntax
would be owned by the original author.
Thus, if you're looking for
a nifty programming project related to assembly hacking,
I invite you to develop a terse-syntax frontend to NASM,
if you like that syntax.
Non-free and/or Non-32bit x86 assemblers.

You may find more about them,
together with the basics of x86 assembly programming,
in Raymond Moon's FAQ for comp.lang.asm.x86:
.
Note that all DOS-based assemblers should work inside the Linux DOS Emulator,
as well as other similar emulators, so that if you already own one,
you can still use it inside a real OS.
Recent DOS-based assemblers also support COFF and/or other object file formats
that are supported by the GNU BFD library,
so that you can use them together with your free 32-bit tools,
perhaps using GNU objcopy (part of the binutils) as a conversion filter.
METAPROGRAMMING/MACROPROCESSING

Assembly programming is a bore,
except for critical parts of programs.
You should use the appropriate tool for the right task,
so don't choose assembly when it's not fit;
C, OCAML, perl, Scheme might be a better choice for most
of your programming.
However, there are cases when these tools do not give
fine enough control over the machine, and assembly is useful or needed.
In those cases, you'll appreciate a system of macroprocessing and
metaprogramming that lets you factor each recurring pattern
into a single, indefinitely reusable definition,
which allows safer programming, automatic propagation of pattern modifications,
etc.
A ``plain'' assembler is often not enough,
even when one is doing only small routines to link with C.
What's integrated into the above

Yes I know this section does not contain much useful up-to-date information.
Feel free to contribute what you discover the hard way...
GCC

GCC allows (and requires) you to specify register constraints
in your ``inline assembly'' code, so the optimizer always knows about it;
thus, inline assembly code is really made of patterns,
not necessarily exact code.
Consequently, you can put your assembly into CPP macros and inline C functions,
so anyone can use it as any C function/macro.
Inline functions resemble macros very much, but are sometimes cleaner to use.
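Here is a minimal sketch of both styles, assuming GCC (or a compatible compiler) on x86; the names add_u32 and ROTL_U32 are hypothetical. The constraint strings tell the optimizer which operands are read ("r"), read and written ("+r"), pinned to %ecx/%cl ("c"), and that the flags are clobbered ("cc"), so GCC remains free to choose registers around the asm. The macro form relies on GCC's statement-expression extension. A plain C fallback keeps the sketch compilable on other targets.

```c
#include <stdint.h>

/* Inline function wrapping one instruction. GCC picks the registers,
   and the function can be inlined wherever it is called. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+r": a is read and written; "r": b is only read; "cc": flags change. */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;               /* portable fallback */
#endif
}

/* The same idea as a CPP macro, using a GCC statement expression.
   The rotate count must live in %cl, hence the "c" constraint. */
#if defined(__i386__) || defined(__x86_64__)
#define ROTL_U32(x, n) __extension__ ({                               \
        uint32_t rotl_tmp_ = (x);                                     \
        __asm__ ("roll %%cl, %0"                                      \
                 : "+r"(rotl_tmp_) : "c"((uint8_t)(n)) : "cc");       \
        rotl_tmp_; })
#else
#define ROTL_U32(x, n)                                                \
        ((uint32_t)((uint32_t)(x) << ((n) & 31) |                     \
                    (uint32_t)(x) >> ((32 - ((n) & 31)) & 31)))
#endif
```

Callers just write add_u32(a, b) or ROTL_U32(x, 5) as they would any C function or macro.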
Beware that in all those cases, code will be duplicated,
so only local labels (of 1: style)
should be defined in that asm code.
However, a macro would allow the name for a non-local label
to be passed as a parameter
(or else, you should use additional meta-programming methods).
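A sketch of the local-label rule, assuming GCC on x86 (sum_to_n and two_sums are hypothetical names). GAS numeric labels can be defined any number of times: "1b" jumps back to the nearest preceding "1:", and "2f" jumps forward to the nearest following "2:". Therefore the asm body stays valid even if inlining pastes it several times into the same assembly output, whereas a named label such as "loop:" would then be defined twice.

```c
#include <stdint.h>

/* Sum 1 + 2 + ... + n with a loop written in inline asm,
   using only numeric local labels. */
static inline uint32_t sum_to_n(uint32_t n)
{
    uint32_t sum = 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("testl %1, %1\n\t"
             "jz 2f\n"            /* n == 0: skip the loop entirely */
             "1:\n\t"
             "addl %1, %0\n\t"    /* sum += n */
             "decl %1\n\t"        /* n--, sets ZF when n reaches 0 */
             "jnz 1b\n"           /* back to the nearest preceding 1: */
             "2:"
             : "+r"(sum), "+r"(n)
             :
             : "cc");
#else
    while (n)
        sum += n--;               /* portable fallback */
#endif
    return sum;
}

/* Two uses in one function: if both get inlined, the labels 1 and 2
   appear twice in the output, which numeric labels allow. */
static uint32_t two_sums(uint32_t a, uint32_t b)
{
    return sum_to_n(a) + sum_to_n(b);
}
```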
Also, note that propagating inline asm code will spread any bugs it contains,
so watch out doubly for register constraints in such inline asm code.
Lastly, the C language itself may be considered as a good abstraction
to assembly programming,
which relieves you from most of the trouble of assembling.
GAS

GAS has some macro capability included, as detailed in the texinfo docs.
Moreover, while GCC recognizes .s files as raw assembly to send to GAS,
it also recognizes .S files as files to pipe through CPP before
feeding them to GAS.
Again and again, see Linux sources for examples.
GASP

It adds all the usual macroassembly tricks to GAS.
See its texinfo docs.
NASM

NASM has some macro support, too.
See its documentation.
If you have some bright idea,
you might want to contact the authors,
as they are actively developing it.
Meanwhile, see about external filters below.
AS86

It has some simple macro support, but I couldn't find docs.
Now the sources are very straightforward,
so if you're interested, you should understand them easily.
If you need more than the basics, you should use an external filter
(see below).
OTHER ASSEMBLERS

Win32FORTH:
CODE and END-CODE are normal FORTH words that do not switch from interpretation mode
to compilation mode, so you have access to the full power of FORTH
while assembling.
TUNES:
it doesn't work yet, but the Scheme language is a real high-level language
that allows arbitrary meta-programming.
External Filters

Whatever the macro support of your assembler,
or whatever language you use (even C!),
if the language is not expressive enough for you,
you can have files passed through an external filter
with a Makefile rule like this:
%.s: %.S other_dependencies
	$(FILTER) $(FILTER_OPTIONS) < $< > $@
CPP

CPP is truly not very expressive, but it's enough for easy things,
it's standard, and called transparently by GCC.
As an example of its limitations, you can't declare objects so that
destructors are automatically called at the end of the declaring block;
you don't have diversions or scoping, etc.
CPP comes with any C compiler.
However, considering how mediocre it is,
stay away from it if you can do without C.
M4

M4 gives you the full power of macroprocessing,
with a Turing equivalent language, recursion, regular expressions, etc.
You can do with it everything that CPP cannot.
See
or
as examples of advanced macroprogramming using m4.
However, its dysfunctional quoting and unquoting semantics force you to use
explicit continuation-passing tail-recursive macro style if
you want to do advanced macro programming
(which is reminiscent of TeX -- BTW, has anyone tried to use TeX as
a macroprocessor for anything other than typesetting?).
This is NOT worse than CPP, which allows neither quoting nor recursion anyway.
The right version of m4 to get is GNU m4 1.4 (or later, if it exists),
which has the most features and the fewest bugs or limitations of all.
m4 is slow for anything but the simplest uses,
which might still be ok for most assembly programming
(you're not writing million-line assembly programs, are you?).
Macroprocessing with yer own filter

You can write your own simple macro-expansion filter
with the usual tools: perl, awk, sed, etc.
That's quick to do, and you control everything.
But of course, any power in macroprocessing must be earned the hard way.
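As an illustration, here is a toy filter written in C rather than perl/awk/sed; it is a sketch, not a recommended tool. The @NAME@ marker syntax, the SAVE/RESTORE macros, and the function names expand and filter_stream are all hypothetical choices made for this example. A main() that calls filter_stream(stdin, stdout) could serve as the $(FILTER) program in the Makefile rule from the External Filters section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical macro table: name -> GAS text to substitute. */
struct macro { const char *name; const char *expansion; };

static const struct macro macros[] = {
    { "SAVE",    "pushl %ebp\n\tmovl %esp, %ebp" },
    { "RESTORE", "movl %ebp, %esp\n\tpopl %ebp" },
};

/* Copy `in` to `out`, replacing each @NAME@ with its expansion.
   Unknown @NAME@ markers are copied unchanged.
   Returns 0 on success, -1 if `out` (outsz bytes) is too small. */
static int expand(const char *in, char *out, size_t outsz)
{
    size_t o = 0;
    if (outsz == 0)
        return -1;
    while (*in) {
        const char *src = in;       /* text to emit */
        size_t srclen = 1;          /* its length */
        size_t consumed = 1;        /* input characters used up */
        const char *end = (*in == '@') ? strchr(in + 1, '@') : NULL;
        if (end) {
            size_t nlen = (size_t)(end - (in + 1));
            consumed = srclen = nlen + 2;   /* default: "@NAME@" verbatim */
            for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++) {
                if (strlen(macros[i].name) == nlen &&
                    strncmp(macros[i].name, in + 1, nlen) == 0) {
                    src = macros[i].expansion;
                    srclen = strlen(src);
                    break;
                }
            }
        }
        if (o + srclen >= outsz)
            return -1;
        memcpy(out + o, src, srclen);
        o += srclen;
        in += consumed;
    }
    out[o] = '\0';
    return 0;
}

/* Line-by-line driver: read source from `in`, write expanded text to `out`. */
int filter_stream(FILE *in, FILE *out)
{
    char line[1024], expanded[4096];
    while (fgets(line, sizeof line, in)) {
        if (expand(line, expanded, sizeof expanded) != 0)
            return -1;
        fputs(expanded, out);
    }
    return 0;
}
```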
Metaprogramming

Instead of using an external filter that expands macros,
one way to do things is to write programs that write part
or all of other programs.
For instance, you could use a program outputting source code
to generate sine/cosine/whatever lookup tables,
to extract a source-form representation of a binary file,
to compile your bitmaps into fast display routines,
to extract documentation, initialization/finalization code,
description tables, as well as normal code from the same source files,
to have customized assembly code, generated from a perl/shell/scheme script
that does arbitrary processing,
to propagate data defined at one point only
into several cross-referencing tables and code chunks,
etc.
Think about it!
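For example, here is a small C generator that writes a lookup table as GAS source; it is a sketch, and the choice of table is ours. It uses the CRC-32 table (reflected polynomial 0xEDB88320) instead of sine/cosine so that it needs no floating point or libm. The names crc32_entry and emit_crc32_table and the crc_table label are hypothetical. Its output could be redirected into a .s file and assembled together with hand-written code that indexes the table.

```c
#include <stdint.h>
#include <stdio.h>

/* Entry n of the standard reflected CRC-32 lookup table. */
static uint32_t crc32_entry(uint32_t n)
{
    uint32_t c = n;
    for (int k = 0; k < 8; k++)
        c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    return c;
}

/* Write the 256-entry table as GAS source, four .long values per line.
   Returns 0 on success, -1 if the stream reports an error. */
static int emit_crc32_table(FILE *out, const char *label)
{
    fprintf(out, "\t.section .rodata\n\t.globl %s\n\t.align 4\n%s:\n",
            label, label);
    for (uint32_t n = 0; n < 256; n++) {
        fprintf(out, "%s0x%08lx", (n % 4 == 0) ? "\t.long " : ", ",
                (unsigned long)crc32_entry(n));
        if (n % 4 == 3)
            fputc('\n', out);
    }
    return ferror(out) ? -1 : 0;
}
```

The first data line produced is "\t.long 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba". The same approach works for any table that is tedious or error-prone to type by hand.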
Backends from compilers

Compilers like GCC, SML/NJ, Objective CAML, MIT-Scheme, CMUCL, etc,
do have their own generic assembler backend,
which you might choose to use,
if you intend to generate code semi-automatically
from the corresponding languages,
or from a language you hack:
rather than write great assembly code,
you may instead modify a compiler so that it dumps great assembly code!
The New-Jersey Machine-Code Toolkit

There is a project, using the programming language Icon
(with an experimental ML version),
to build a basis for producing assembly-manipulating code.
See around
TUNES

The TUNES Project
for a Free Reflective Computing System
is developing its own assembler
as an extension to the Scheme language,
as part of its development process.
It doesn't run at all yet, though help is welcome.
The assembler manipulates abstract syntax trees,
so it could equally serve as the basis for an assembly syntax translator,
a disassembler, a common assembler/compiler back-end, etc.
Also, the full power of a real language, Scheme,
makes it unchallenged for macroprocessing/metaprogramming.
CALLING CONVENTIONS

Linux

Linking to GCC

That's the preferred way.
Check GCC docs and examples from Linux kernel .S files
that go through gas (not those that go through as86).
32-bit arguments are pushed onto the stack in reverse syntactic order
(hence accessed/popped in the right order),
above the 32-bit near return address.
%ebp, %esi,
%edi, %ebx are callee-saved,
other registers are caller-saved;
%eax is to hold the result,
or %edx:%eax for 64-bit results.
FP stack: I'm not sure,
but I think the result is in st(0), with the whole stack caller-saved.
Note that GCC has options to modify the calling conventions
by reserving registers, having arguments in registers,
not assuming the FPU, etc. Check the i386 .info pages.
Beware that you must then declare the cdecl or regparm(0)
attribute for a function that will follow standard GCC calling conventions.
See in the GCC info pages the section:
C Extensions::Extended Asm::.
See also how Linux defines its asmlinkage macro...
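A minimal sketch of hand-written routines following this convention, assuming 32-bit x86 ELF (asm_add and asm_mul64 are hypothetical names). The first argument sits at 4(%esp), just above the return address, and the second at 8(%esp). asm_mul64 returns a 64-bit result in %edx:%eax. Since only the caller-saved %eax and %edx are touched, nothing needs saving. The cdecl attribute guards against -mregparm, as mentioned above. On other targets (x86-64 passes arguments in registers), plain C fallbacks are used instead.

```c
#include <stdint.h>

#if defined(__i386__) && defined(__ELF__)
/* Top-level (basic) asm, so registers are written with a single %. */
__asm__ (".pushsection .text\n"
         ".globl asm_add\n"
         ".type asm_add, @function\n"
         "asm_add:\n\t"
         "movl 4(%esp), %eax\n\t"     /* first argument */
         "addl 8(%esp), %eax\n\t"     /* plus second argument */
         "ret\n"
         ".globl asm_mul64\n"
         ".type asm_mul64, @function\n"
         "asm_mul64:\n\t"
         "movl 4(%esp), %eax\n\t"
         "mull 8(%esp)\n\t"           /* 64-bit product in %edx:%eax */
         "ret\n"
         ".popsection\n");

int asm_add(int a, int b) __attribute__((cdecl));
uint64_t asm_mul64(uint32_t a, uint32_t b) __attribute__((cdecl));
#else
/* Fallbacks so the sketch also builds on non-i386 targets. */
static int asm_add(int a, int b) { return a + b; }
static uint64_t asm_mul64(uint32_t a, uint32_t b) { return (uint64_t)a * b; }
#endif
```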
ELF vs a.out problems

Some C compilers prepend an underscore before every symbol,
while others do not.
Particularly, Linux a.out GCC does such prepending,
while Linux ELF GCC does not.
If you need to cope with both behaviors at once,
see how existing packages do.
For instance, get an old Linux source tree,
the Elk, qthreads, or OCAML...
You can also override the implicit C->asm renaming
by inserting statements like
void foo (void) asm("bar");
to be sure that the C function foo will really be called bar in assembly.
Note that the utility objcopy, from the binutils package,
should allow you to transform your a.out objects into ELF objects,
and perhaps the contrary too, in some cases.
More generally, it will do lots of file format conversions.
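Here is a small sketch of the asm-label renaming described above; the names twice, call_by_symbol, and my_twice_impl are hypothetical. GCC only accepts the label on a declaration, so it is declared before the definition. The second declaration binds another C name to the same assembly symbol, the way hand-written assembly or another object file would refer to it. Either way, the symbol is exactly my_twice_impl, with no underscore added or omitted, whatever the a.out/ELF convention.

```c
/* The asm label goes after the declarator, on a declaration
   that precedes the definition. */
int twice(int x) __asm__("my_twice_impl");

int twice(int x)
{
    return 2 * x;
}

/* Another C name for the same assembly-level symbol. */
extern int call_by_symbol(int x) __asm__("my_twice_impl");
```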
Direct Linux syscalls

This is specifically NOT recommended,
because the conventions change from time to time
or from kernel flavor to kernel flavor (cf L4Linux),
plus it's not portable,
it's a burden to write, it's redundant with the libc effort,
AND it precludes fixes and extensions that are made to the libc,
like, for instance the zlibc package,
that does on-the-fly transparent decompression of gzip-compressed files.
The standard, recommended way to call Linux system services is,
and will stay, to go through the libc.
Shared objects should keep your stuff small.
And if you really want smaller binaries, do use #! stuff,
with the interpreter having all the overhead you want to keep out
of your binaries.
Now, if for some reason,
you don't want to link to the libc,
go get the libc and understand how it works!
After all, you're aiming to replace it, aren't you?
You might also take a look at how my
does it.
The sources for Linux come in handy, too,
particularly the asm/unistd.h header file,
that describes how to do system calls...
Basically, you issue an int $0x80,
with the __NR_syscallname number (from asm/unistd.h)
in %eax,
and parameters (up to five) in
%ebx, %ecx, %edx,
%esi, %edi respectively.
The result is returned in %eax,
with a negative result being an error
whose negation is what libc would put in errno.
The user-stack is not touched,
so you needn't have a valid one when doing a syscall.
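A minimal sketch of this convention for a three-argument call, assuming GCC on i386 Linux with the kernel headers installed (raw_syscall3 and raw_write are hypothetical names). "=a" and the matching "0" put the syscall number in %eax and retrieve the result from it. "b", "c", and "d" load %ebx, %ecx, and %edx. Very old GCC versions refused the "b" constraint when compiling PIC code. On any other target, the sketch simply goes through the libc, as recommended above, converting its -1/errno result to the kernel's -errno convention.

```c
#include <errno.h>
#include <unistd.h>

#if defined(__linux__) && defined(__i386__)
#include <asm/unistd.h>   /* __NR_write and the other syscall numbers */

/* Number in %eax, arguments in %ebx, %ecx, %edx; result in %eax. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"(nr), "b"(a1), "c"(a2), "d"(a3)
                      : "memory");
    return ret;           /* negative values are -errno */
}
#endif

/* write(2) returning a byte count, or -errno on failure. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__linux__) && defined(__i386__)
    return raw_syscall3(__NR_write, fd, (long)buf, (long)len);
#else
    long r = (long)write(fd, buf, len);
    return r < 0 ? -errno : r;
#endif
}
```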
As for the invocation arguments passed to a process upon startup,
the general principle is that the stack
originally contains the number of arguments argc,
then the list of pointers that constitute *argv, ending with a NULL pointer,
then a NULL-terminated list of pointers to the null-terminated
variable=value environment strings.
For more details,
read the sources of C startup code from your libc (crt0.S or crt1.S),
the sources of eforth 1.0e,
or those of the linux kernel (exec.c and binfmt_*.c in linux/fs/).
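The layout just described can be decoded as in the following sketch. The struct startup_info type and the decode_initial_stack and count_env functions are hypothetical. A real _start would first have to obtain the entry stack pointer in assembly before calling such C code. The sketch treats the stack as an array of pointer-sized words, with argc stored in the first word.

```c
#include <stddef.h>
#include <stdint.h>

struct startup_info {
    long argc;
    char **argv;   /* argv[argc] is NULL */
    char **envp;   /* also NULL-terminated */
};

/* sp points at the word holding argc, i.e. the stack pointer on entry. */
static struct startup_info decode_initial_stack(char **sp)
{
    struct startup_info info;
    info.argc = (long)(intptr_t)sp[0];
    info.argv = sp + 1;
    info.envp = info.argv + info.argc + 1;   /* skip the NULL ending argv */
    return info;
}

/* Count the variable=value strings before the terminating NULL. */
static size_t count_env(char **envp)
{
    size_t n = 0;
    while (envp[n] != NULL)
        n++;
    return n;
}
```

For instance, a fake stack { 2, "prog", "arg1", NULL, "HOME=/root", NULL } decodes to argc 2, argv[1] "arg1", and a single environment string.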
Hardware I/O under Linux

If you want to do direct I/O under Linux,
either it's something very simple that needn't OS arbitration,
and you should see the IO-Port-Programming mini-HOWTO;
or it needs a kernel device driver, and you should try to learn more about
kernel hacking, device driver development, kernel modules, etc,
for which there are other excellent HOWTOs and documents from the LDP.
Particularly, if what you want is Graphics programming,
then do join one of the
or
projects.
Some people have even done better,
writing small and robust XFree86 drivers
in an interpreted domain-specific language,
,
and achieving the efficiency of hand-written C drivers
through partial evaluation (drivers not only not in asm, but not even in C!).
The problem is that the partial evaluator they used
to achieve efficiency is not itself free software.
Any takers for a replacement?
Anyway, in all these cases, you'll be better off using GCC inline assembly
with the macros from linux/asm/*.h than writing full assembly source files.
Accessing 16-bit drivers from Linux/i386

Such thing is theoretically possible
(proof: see how
can selectively grant hardware port access to programs),
and I've heard rumors that someone somewhere did actually do it
(in the PCI driver? Some VESA access stuff? ISA PnP? dunno).
If you have some more precise information on that,
you'll be most welcome.
Anyway, good places to look for more information are the Linux kernel sources,
DOSEMU sources (and other programs in the
),
and sources for various low-level programs under Linux...
(perhaps GGI if it supports VESA).
Basically, you must either use 16-bit protected mode or vm86 mode.
The first is simpler to set up, but only works with well-behaved code
that won't do any kind of segment arithmetic
or absolute segment addressing (particularly addressing segment 0),
unless by chance it happens that all segments used can be set up in advance
in the LDT.
The latter allows for more "compatibility" with vanilla 16-bit environments,
but requires more complicated handling.
In both cases, before you can jump to 16-bit code,
you must
mmap any absolute address used in the 16-bit code
(such as ROM, video buffers, DMA targets, and memory-mapped I/O)
from /dev/mem to your process' address space;
set up the LDT and/or vm86 mode monitor;
grab proper I/O permissions from the kernel (see the above section).
Again, carefully read the source for the stuff contributed
to the DOSEMU project,
particularly these mini-emulators
for running ELKS and/or simple .COM programs under Linux/i386.
DOS

Most DOS extenders come with some interface to DOS services.
Read their docs about that,
but often, they just simulate int $0x21 and such,
so you do ``as if'' you were in real mode
(I doubt they have more than stubs
and extend things to work with 32-bit operands;
they most likely will just reflect the interrupt
into the real-mode or vm86 handler).
Docs about DPMI and such (and much more) can be found on
(again, the original x2ftp site is closing, so use a
).
DJGPP comes with its own (limited) libc derivative/subset/replacement, too.
It is possible to cross-compile from Linux to DOS,
see the devel/msdos/ directory of your local FTP mirror for metalab.unc.edu.
Also see the MOSS dos-extender from the
from the University of Utah.
Other documents and FAQs are more DOS-centered.
We do not recommend DOS development.
Winblows and suches

Hey, this document covers only free software.
Ring me when Winblows becomes free,
or when there are free dev tools for it!
Well, after all there are:
Cygnus Solutions has developed the cygwin32.dll library,
for GNU programs to run on MacroShit platforms.
Thus, you can use GCC, GAS, all the GNU tools,
and many other Unix applications.
Have a look around their homepage.
I (Faré) don't intend to expand on Losedoze programming,
but I'm sure you can find lots of documents about it everywhere...
Yer very own OS

Control being what attracts many programmers to assembly,
the desire to develop an OS is often what leads to or stems from assembly hacking.
Note that any system that allows self-development could be qualified as an "OS"
even though it might run "on top" of an underlying system that provides
multitasking or I/O (much like Linux over Mach or OpenGenera over Unix), etc.
Hence, for easier debugging,
you might like to develop your ``OS'' first as a process running
on top of Linux (despite the slowness), then use the
(which grants use of Linux and BSD drivers in yer own OS)
to make it standalone.
When your OS is stable, there is still time to write your own
hardware drivers if you really love that.
This HOWTO will not itself cover topics such as
Boot loader code & getting into 32-bit mode,
Handling Interrupts,
The basics about intel ``protected mode'' or ``V86/R86'' braindeadness,
defining your object format and calling conventions.
The main place where to find reliable information about that all
is source code of existing OSes and bootloaders.
Lots of pointers lie in the following WWW page:
TODO & POINTERS

find someone who has got some time to take over the maintenance
fill incomplete sections
add more pointers to software and docs
add simple examples from real life to illustrate the syntax, power,
and limitations of each proposed solution.
ask people to help with this HOWTO
perhaps give a few words for assembly on other architectures than i386?
A few pointers (in addition to those already in the rest of the HOWTO)
80x86 CPU family references:
;
.
mirrors the hornet and x2ftp
former archives of msdos assembly coding stuff.
A few starting points on the web about assembly programming:
;
;
;
Fun stuff: , a fun way to learn assembly in general.
USENET:
;
.
And of course, do use your usual Internet Search Tools
to look for more information,
and tell me anything interesting you find!
Author's .sig:
## Faré | VN: Ð£ng-Vû Bân | Join the TUNES project! http://www.tunes.org/ ##
## FR: François-René Rideau | TUNES is a Useful, Not Expedient System ##
## Reflection&Cybernethics | Project for a Free Reflective Computing System ##