Search

Porting DOS Applications to Linux

With a little care, the average DOS
application can be easily ported to the Linux system. This article
looks at some of the techniques involved, and tries to provide a
small “builder's kit” of handy little DOS routines people always
want under Linux.

Memory Models

DOS programs written in the C and C++ languages generally run
in a variety of different memory models with their own segmentation
semantics. The simplest is the “tiny” model, where all of the
program and data are referenced off one segment. All three segment
registers (CS, DS, and SS) point to the same place to suit the way
the processor wishes to work. The Linux kernel executes programs in
the 32-bit equivalent of tiny mode. Because offsets are 32-bit, not
16-bit, a program can utilise 4GB of address space before
segmentation becomes an issue. Thus you get the simplicity of tiny
model without the limitations.

As a result of this the DOS keywords near,
far, and huge, have no
meaning to Linux. These can be removed, or if you are trying to
maintain a common source tree, you can add these lines
instead:

gcc, the normal Linux C compiler, understands the
register keyword, but the code optimiser is
sufficiently good that using register is
normally a bad idea.

Many DOS C compilers support an inline
keyword. gcc also supports this.

C Types Supported

gcc supports all the ANSI C types you would expect and some
extensions. The size of the normal types is, however, different
from that of DOS compilers, and frequently causes problems when
porting. Here is a summary for sizes on Linux/i386 (Linux on other
architectures, such as the 64-bit Alpha, will differ in some
respects):

* Actually, because some of the address space is reserved and
used for other things, you can't get above about 2GB at the moment.

DOS programmers generally make good use of prototypes to
avoid mysterious crashes caused by passing the wrong type. Mixing
short and long under Linux
normally just results in mysterious value changes in passed
parameters, so the habit of prototyping is a good one to get into.
Furthermore, you can tell gcc to warn you about any routine which
has no prototype by adding the compiler flag
-Wstrict-prototypes. All of the C library and
system calls have prototypes, provided the correct header files are
included.

gcc: the GNU C Compiler

The GNU C compiler is an extremely flexible tool. Although it
compiles much slower than most of the DOS compilers, and is
(intentionally) without an Integrated Development Environment, it
has a wide range of abilities and flexibility that few DOS
compilers can touch. People who have used DJGPP to write 32-bit DOS
extender programs will be familiar with gcc, although in its Linux
and Unix form it is somewhat easier to work with.

It is worth knowing how to tell gcc how to cope with
different “flavours” of code. It can become a traditional K&R
C compiler, by using the -traditional option, a
strict ANSI compiler, by using the -ansi option,
or a GNU C compiler—ANSI + GNU extensions. In addition, you can
ask it to perform a wide range of sanity checks with the
-pedantic and -Wall options.
For a typical program, the compiler will generate a lot of
warnings, many of which will give insights into potential problems.
For example, the compiler will check to see that the conversion
options in the format strings of
printf()/scanf() and their
family of functions match the types of the variables they will
interpret.

The optimiser is controllable both by a general level of
optimisations, using the -O1 or
-O2 options, and on a per-optimisation basis for
those speed-critical special cases. The optimiser performs a wide
range of peephole and global optimisations, including intelligent
allocation of registers, loop unrolling, and even instruction
scheduling on RISC CPUs.

The GNU C compiler, linker, and debugger are all described in
complete documents available from the Free Software Foundation,
which you can either buy as books (the money goes to fund more free
software work) or print yourself.

To cover the compiler, debugging tools, make, and other
programs in full would require several more articles. If the
documentation and documentation viewer are all installed, typing
info gdb, info gcc, and
info ld should give you a good start. (If the
info program is not installed, the
Emacs editor can also be used to read documentation in the info
format.) Fans of graphical user interfaces may also like to pick up
tgdb as a graphical front end for
the gdb debugger, and xwpe, a
look-alike of a well-known DOS C development environment, built on
top of gcc, make, and gdb.

Where is the foo() Function?

This is commonly asked of a specific set of DOS C library and
system functions, most notably the various text mode window
packages, kbhit(), getch(),
getche(), and the string functions
stricmp() and
strnicmp().

Not surprisingly, equivalent functionality exists under
Linux. The text mode window case warrants a section of its own, so
you will have to wait a minute or skip on ahead. The string
functions are nice and easy. stricmp() is also
known as strcasecmp() and
strnicmp() as
strncasecmp()---there is just a naming
difference.

The keyboard I/O routines cause problems because Unix
terminal I/O is a lot more flexible than DOS terminal I/O, and in
the case of kbhit() it does not suit the
multitasking nature of the system to have the CPU spend all its
time in loops polling the keyboard. Furthermore, unlike in DOS, the
terminal mode is set explicitly, rather implicitly by each call. A
set of routines (the standard POSIX termios
functions) exist to manipulate the control structures for each
device.

You cannot use the various privileged instructions that might
otherwise harm the machine integrity. Controlling I/O devices is
done by devices in the kernel, with access through file abstraction
(the “special” files in /dev/), rather than directly. It is
possible (but dangerous) to use ioperm() to
allow access to devices in a process running as root if absolutely
necessary. Any code that does this will be non-portable, and thus
this should be used for special purposes only.
mmap() can be used for equivalent access to the
device memory window on a PC (640KB-1MB), but this is equally as
bad an idea for normal use. In particular, you should never, ever,
attempt to do screen output this way.

Executing Other Programs

The Linux environment is built on the basis of small,
effective, programs working together. Therefore, there are a wide
variety of program execution facilities. They differ from the DOS
environment in very specific ways. There are no equivalents to the
various “swap out existing program and spawn another task”
facilities found under DOS. The Linux virtual memory system will
automatically decide by itself what to swap out and when. Secondly,
the basic constructs of Linux process execution have no equivalents
in DOS, even though there are library routines which do.

The simplest way to run another process is via the
system() function, which calls a shell and feeds
it a command string for interpretation and execution. All of the
normal shell parsing and redirection is performed. This means that
you should be careful of the arguments you pass if you don't want
the shell to misinterpret any special characters.

This is a simple example of a program that shows who is
logged on, and then who is logged on to the remote machine called
“thrudd”.

The Linux system makes heavy use of pipes—a way of feeding
the output of one command into another. This is not just a shell
facility. Any program can read or write from another using the
popen() and pclose() calls.
These work the same way as fopen() and
fclose() do, save that
popen() is passed a program as its argument.
Because pclose() handles the termination of the
process created by popen(), it is important to
use the right close routine.

This brings us conveniently back to printing for our next
example. Here is a set of subroutines for printing a file:

Unlike DOS, Linux has no support for overlays in the linker,
nor for loading overlays in the C libraries. The memory management
and virtual memory system will swap unused program segments out of
memory without assistance from the program being swapped and will
automatically bring them back in when they are needed. Using
overlays would not speed up program startup time, either. Program
code is read from disk whenever the page of code in question is
needed and not found resident in memory. The new ELF library format
does support dynamic linking, and you could conceivably implement
overlays using it. This is, however, pointless.

The core routines Linux provides for process execution are
execve(), fork(), and
wait(). The execve() call
replaces the running program image with another one. The original
is destroyed totally. The fork() call creates a
new copy of the existing process. The only difference between the
copies is the value returned by fork() (the
process id of the new process in the parent, 0 in the child).
Finally, the wait() call lets you wait for a
process to complete. This level of control is unlikely to be needed
when porting DOS programs, so they are not covered here.

Multitasking Politely

In general, the kernel automatically blocks programs, thus
avoids having them use CPU time when they are waiting for I/O. A
device can be opened with the extra option
O_NDELAY to indicate that an error
EWOULDBLOCK should be returned to indicate the
lack of ready data (or for writing lack of buffer space). When a
program is using I/O in this manner, it must take great care not to
sit in a tight loop.

Linux instead provides the very useful
select() system call, which allows you to wait
for multiple I/O events, a timeout, or both, in a manner that
avoids polling and enables the kernel to avoid allocating processor
resources to the task in question.

select() allows you to wait for a given
time, or wait until one of a set of files has data ready to read or
space to write, or wait until an exceptional condition occurs on
that file. As the Linux system sees everything within reason as a
file, this is extremely flexible.

Listing 1 shows is a
“trivial” example. This is an implementation of
kbhit(). For exact DOS behaviour, it assumes the
terminal is already in raw mode,
which we will discuss later. Otherwise, it will return 1 after
ENTER is pressed, which is when data becomes
available in the line-by-line
cooked mode.

Sharp-eyed DOS programmers might wonder, “What happens with
this kbhit() if we redirect the input of the
program from a file? It's not a keyboard, but input is available.”
The answer is simple enough—a disk file is always ready for
reading, and at this level, there is no difference between reading
a keyboard and reading a file. The program carries on and works
fine. Indeed, you could redirect a program to run reading input
from the mouse and select() would still behave
consistently.

Files and Devices

File I/O under Linux is somewhat simpler than DOS. DOS
emulates the Unix low level (open(),
close(), read(), and
write()) and high level “stdio” facilities,
but DOS C libraries have their own ascii/binary awareness to handle
the carriage return/line feed differences. Under Linux these are
gone and there is no need to worry about specifying these (although
the ascii/binary specification will be accepted). All of the DOS
device names are different under Linux. Linux systems keep their
devices in /dev. Here is a rough conversion chart:

Note that it is normal to print by queueing jobs via the
printing service (lpr) rather than
writing directly to ports. On typical Linux systems the /dev/lp*
files are protected so that a normal user cannot access them
directly.

Terminal Input

Terminal I/O is distinctly different in Linux than it is
under DOS. First, the POSIX terminal system is more modal than DOS.
To switch from one-character-at-a-time mode
(getch() in DOS) to a line-based editing mode
requires an actual termios request, which gives the new terminal
parameters to use. In addition, a program is responsible for
restoring the terminal state before it runs other programs and when
it exits. If you forget to do this, you may well need to switch
screens and kill the process, or you may find that the shell gets
confused by your terminal state and logs you out (which also fixes
the problem).

On output, Unix programs traditionally avoid using direct
cursor control codes and cannot write directly to video memory. The
reasons for this are obvious. The terminal in question may be a
different type of machine, in a different part of the world.
Handling all the different terminal types by hand is unpleasant, so
a library called curses is
available. A more modern library called
ncurses, which has such things as
colour support, is also available for Linux. Older versions of this
have had many bugs, but the latest appears very good indeed. See
article “ncurses: Portable Screen Handling for Linux”,
Linux Journal issue 17, September, 1995, for
an introduction.

ncurses provides you with
simple output control, colour (if the terminal supports it),
function keys, and other manipulations in a terminal-independent
manner. In addition, it optimises the updates it does to minimize
traffic over slow networks or serial links. It is free and comes
with a nice set of examples and good documentation. As it is an
implementation of System V curses, you can pick up a book on curses
from a library and use that as a reference or tutorial (as
appropriate).

Should you decide to use ncurses to do your output, it will
also provide all the routines necessary to do DOS style
character-by-character input via the functions
cbreak(), nocbreak(),
echo(), and noecho(). The
ncurses documentation explains all four.

Standard APIs for Mice

Until recently there was no standard control API for mice in
text mode (in graphics mode X-Windows runs the mouse and provides
all the facilities you would imagine a GUI to have). The normal
mouse behaviour is to provide text mode cut and paste. This is
managed by a program known as
selection and more recently by
gpm.

The gpm library allows your text mode application to handle
mouse events both on the console and under X-Windows in xterm shell
windows. Writing programs that use gpm to control the mouse is
explained in article “Writing a Mouse-Sensitive Application”
issue 17, September, 1995.

Terminate and Stay Resident Programs

Occasionally you will have to port a DOS
terminate-and-stay-resident (TSR) program. If you have been
accustomed to using undocumented DOS calls and switching stacks and
other horrible assembly language goings-on you will be glad to know
you can forget the experience.

First, the whole concept of terminate and stay resident is
gone. When a program exits all its resources are freed, and the
process ceases to exist. That does not mean the same facilities are
not present; they are present in different ways that are more
appropriate to a system which is already multitasking.

There are three main reasons for TSR programs under
DOS.

To provide a library of subroutines for supporting
some extended facility. Several loadable graphics libraries have
used this facility. Under Linux you can create a new shared library
and it will be available for linking with applications and sharing
between multiple users.

To add a device driver. Device drivers are kernel
code. Porting a DOS device driver will almost certainly be a major
rewrite. Linux also has loadable device drivers via the
modules support. Porting a DOS
device driver is definitely beyond the scope of this article. In
some cases the driver may be adding a high level facility that can
be provided as a library or as an actual program left running all
the time.

To create pop-up “hot key” based
mini-applications like phone books. There is no reason for these
under Linux. You have multiple console screens, the ability to have
multiple screens on even a fairly dumb terminal with the
iscreen program, and can run any application at
any time. Thus, there is no need for mini-applications carefully
patched into the kernel. You can just port it as a normal
program.

For the second example, some TSR programs can be ported as if
they were applications that provided services. The
gpm mouse management is a fine
example of this. It provides the core equivalents of the DOS mouse
services interrupt facilities as an application program that runs
in the background and a library of support routines which interface
with the server.

Porting Graphical Applications

Graphical programs are much more complex to port, because the
graphics hardware interface is not available. You can approach this
two ways. First, svgalib provides
the basic functions you need to port your application. Note that
you cannot use the BIOS functions, because they are only available
in 16bit mode. An svgalib application can be very fast (see
linuxsdoom for a superb example),
but cannot run remotely and is not easily ported beyond PC-based
systems.

The second approach is to use X-Windows. This makes for a
much harder port, as you will need to move to an event-based
paradigm (akin to programming for MS Windows) and rewrite your
interface dialog and menus into X widgets. Furthermore X-Windows
programming is—initially at least—quite hard to get the hang
of. The result, however, is a graphical program that is portable
and can run remotely. X-Windows is generally slower than raw SVGA,
as one might expect. There is, however, an extension (called
Xshm) that Linux and most Unix
systems include, which supports fast bitmap updating as occurs
commonly in games.

In the “cheat box” there is also Tcl/Tk, a front end
language for writing easy X-Windows interfaces. Its applicability
to a given program is very hard to summarise. However, applications
that are basically modal can generally make best use of Tcl/Tk.
Using Tcl/Tk to write front ends for programs is covered in “Using
Tcl and Tk from Your C Programs”, Linux Journal issue 10, February, 1995.

Alan Cox has been working on Linux since version
0.95, when he installed it to do further work on the AberMUD game.
He now manages the Linux Networking, SMP, and Linux/8086 projects
and hasn't done any work on AberMUD since November 1993. In real
life he hacks ISDN routers for I2IT.

Resources

A commercial clone of Borland's BGI library is also available
for Linux. If your program uses BGI graphics, this may be an
attractive option. A shareware version (US$15 registration) should
be available at sunsite.unc.edu in the file
/pub/Linux/apps/graphics/bgi_library.tar.gz by the time you read
this.

The sunsite.unc.edu ftp site also carries a wide variety of
database tools, from simple libraries for handling reading/writing
PC-style XBase files to SQL systems.

A Commercial package called FlagShip for porting clipper
database programs directly to Linux is available. A demo version is
available from ftp://ftp.wgs.com/pub2/wgs/Filelist