The Portable Data Base

The implementation of the portable data base in AutoCAD Release 9
finally completed the unification of the product across all
machine architectures. The development notes describing this
project are an example of the developer documentation that
accompanied code submissions in the period.

The Portable Data Base

AutoCAD databases are now portable between operating systems
and machine architectures. This allows efficient use of
networks containing both personal computers and 32 bit
workstations.

by John Walker
February 3rd, 1987

It was a dark and stormy night. The trees swayed in the wind,
and the rain beat upon and streamed in rivulets down the dark window
pane illuminated only by the cold light of a Luxo lamp, the
flickering of a Sun 3 monitor, and the feeble green glow of a
programmer debugging too long.

When the doorbell rang, I almost welcomed the interruption from the
task in which I was engaged: fourteen subroutines deep in DBXTOOL,
on the trail of a stack smasher which not only obliterated
AutoCAD, but wiped the information the debugger needed to find where
the error occurred. I glanced at the clock and noticed that it was
3:30. Since it was dark outside, it must be 3:30 in the morning.
Only a very few people show up at the door at 3:30 on a Sunday
morning.

Let's see: the stereo isn't on and no recent revelations have called
for celebratory reports from the carbide cannon or the .45, so it's
probably not the neighbors or the cops. That narrows the field
considerably. I fully expected to open the door to see Kelvin
Throop, as always slightly distracted, somewhat overweight, his face
looking like it had been slept in, but sparkling with anarchic and
subversive ideas.

With the usual irritation mingled with expectation, I opened the door
and discovered I was looking at the neck of my early-morning caller.
I looked up, and saw a face I had not seen for almost twenty years.
It was a face free of pain and fear and guilt. John Galt had come to
call in the middle of the night.

“Galt”, I said, “I haven't seen you since, when was it, 1967?
That's right, December 1967 it was. We were walking down the railroad
tracks in Cleveland; the snow was a foot deep on the ground, the sky
was grey and the only warmth was the switchbox heaters at every set of
points. Yes, it all comes back now. I remember you saying it was all
over and you were going to drop out, and me saying things were just
about to turn around. And I remember turning around and walking back
to study for the physics exam and seeing you disappear into the snowy
distance. Hey, come on in, have a Pepsi, tell me what you've been up to.”

Galt walked in the door, put down his paper bag and, as always, strode
to the refrigerator and opened the door. He poured a tall Pepsi and
made a peanut butter, turkey, swiss cheese, and onion sandwich,
polished both off, and then turned to me and spoke.

“As usual, you've got it all wrong. It wasn't December 1967, it was
November—November 8. The first Saturn V launch was
scheduled for the next morning, and you were bubbling over about how
the final triumph of technology would turn around a disintegrating
society. I said I'd had it with this decadent, exploitive culture,
and I was no longer going to allow my mind to be enslaved by the
looters. I tried to convince you to join me. But your time had not
yet come. So I moved on to convince others, and to work on my
speech.”

“Hey, I remember that speech. How's it coming since that draft I read
back in '67?”

“Pretty well. I'm up to 560 pages now, and there's no filler in
there. I'm adding a refutation of the epistemology of Kant cast in
terms of Maxwell's equations, and that will probably stretch it a
tad.”

“Don't you think that's a bit long?”

“Well, with the attention span of this society down to less than 30
seconds, some of the induction steps may get lost in the shuffle,
but it's full of great sound bites and should play on the news for
days.”

“When 'ya gonna cut loose with it?”

“When the collapse of this decadent society due to its disdain for
the products of the mind, and the consequent disappearance and exodus
of the creators becomes self-evident.”

“Hey, Galt, lighten up! When I last saw you the cities were in
flames, the US was losing a hopeless war, the stock market had just
crashed, the gold standard was being abandoned, three astronauts had
died in a fire, the SST was facing cancellation, and the ABM was being
negotiated away. Look at what you've walked out on! We have peace
and prosperity, business is booming, and basic science and technology
have flowered in directions unimaginable by the world in which we last
spoke.”

Galt walked into the computer room. He looked at the PC/AT linking
AutoCAD. He looked at the Sun monitor, which was showing a full
compilation of AutoCAD in one window, a completed execution of the
regression test in another, and the debugger in a third. He walked
over to my bookcase and pulled out my copy of the Dow Jones Averages
chartbook from 1885 to the present. Moving in that eerie way he
always did, in one motion he pulled the book from the shelf, opened
it, and spread it in exactly the open space between the keyboards of
the Sun and the IBM. For a full ten minutes Galt was silent as he
turned the pages from 1968 through 1986. It appeared to me that the
man had been out of circulation for a long time. I watched his face
carefully to see if it registered surprise as he hit 1985 and 1986,
but as ever those stony features remained unmoved. Galt closed the
book, replaced it on the shelf, sat down on the chair in front of the
AT, and turned to me. “Just wait,” he said.

“So, enough about me”, Galt continued, “what are you doing?”

“Well”, I said, “where to begin? In '68 I…”

“Oh come off it,” Galt interrupted, “I have my sources,
after all. I mean what are you working on now.”

Sheepishly, I continued.

Background

When we ported AutoCAD to non-MS-DOS systems, we were faced with numerous
compatibility issues. Although all systems use the ASCII code,
compatibility stops about there. Various systems have adopted
different conventions for end of line and end of file detection; they
store multiple byte binary values in different orders in memory,
require different physical alignment of values on byte boundaries, and
even use different floating point formats.

These issues make it very difficult for systems to interchange binary
files. The only reasonable approach is to define a portable
format, hopefully close to the middle point between the systems, then
require every system to convert that format to and from its own
computational requirements.

Our existing (2.5 and 2.6) AutoCAD releases do not allow interchanging
binary files among major machine types (current major machine
types are MS-DOS, Apollo, IBM RT PC, Sun, and Vax). To move data
between systems, one must convert it to ASCII form, possibly translate
the ASCII file due to end of line conventions, then load the file onto
the other system and convert it back to binary. For drawing databases,
this means one must DXFOUT on the sending system and DXFIN
on the receiving system.

Given the difficulties in physically moving files between systems, the
small market initially anticipated for non-MS-DOS AutoCADs, and the
major work needed to make binary files portable, we chose not to
address this problem previously. Sales to date of non-MS-DOS
machines indicate that this decision was correct.

The advent of high speed networks and file sharing protocols such as
Apollo's Domain, DEC's Decnet/Vaxmate, and Sun's NFS have begun to
erode the justification for this decision. Many AutoCAD users,
particularly in larger companies, have inquired about configurations
involving a file server, one or more 32 bit workstations, and a number
of MS-DOS machines, all on a common network. Such a configuration
economically provides large central storage, high performance when
needed, and very low cost individual workstations for routine work.
The usefulness of such an installation is drastically reduced if every
transfer of a drawing from a PC to a 32 bit workstation requires a
DXFOUT and DXFIN, as these are lengthy operations which
consume a large amount of disc space and network bandwidth. As we
increase our sales efforts in large accounts, a competent solution to
the issues raised by heterogeneous networks will be a major point of
distinction which can distance us from the competition.

The first step toward a compatible database was taken when Bob Elman
redesigned the entity database code in release 2.5. Galt broke in,
“The Bob Elman?” “Yes”, I responded,
and showed him the listing of EREAD.C. He shook his head and
said, “That's Bob”. Bob's code resolved all issues of byte
ordering and alignment in the entity data portion of the database, and
did it in a particularly efficient way that takes advantage of the
properties of the host machine's architecture. Entities are written
with no pad bytes and Intel byte ordering. Thus MS-DOS machines, the
overwhelming segment of our market, pay no speed or space penalty.
Bob's code did not address machines with non-IEEE floating point (the
VAX is the only exemplar of this class).

Providing drawing database compatibility between machines, then, is
primarily an issue of fixing the drawing header record (MASTREC), the symbol tables (SMIO), and the headers on the
entities themselves, plus resolving the issue of differing floating
point formats. In addition, the other binary files that AutoCAD uses,
such as DXB files and compiled font and shape definitions
should be made compatible. The work described herein defines
canonical forms for these files, implements a general package for
system-independent binary I/O, and uses it to make AutoCAD drawing
databases and the other aforementioned binary files interchangeable.
The code has currently been installed and tested on MS-DOS and Sun
systems, which may now share files in an NFS environment. The work
needed to port it to the Apollo and RT PC should be minor. A VAX
version will require certification of the code to interconvert VAX and
IEEE floating point formats.

Galt interrupted, “So what you're saying is that before, if you
hooked big ones and little ones together on a wire, it was a pain in
the neck, and now you've fixed it so it isn't”.

For a longwinded pedant, the man does have a talent for coming to the
point.

The Binary I/O Package

To read and write portable binary files, include the file
BINIO.H in your compilation. You must include
SYSTEM.H before BINIO.H. BINIO.H declares
numerous functions, which are used to read and write binary data items
on various systems. Each of these functions is of the form:

b_{r|w}type(fp, pointer[, args…]);

where type is the mnemonic for the internal type being written,
fp is the file pointer, pointer is the pointer to the datum
being read or written (must be an lvalue), and args are optional
arguments required by some types. For example, when writing a
character array an argument supplies its length.

Thus, to write a real (double precision floating point) number
val to a file pointer named ofile, use:

stat = b_wreal(ofile, &val);

Each of these routines returns the same status FREAD or
FWRITE would: 1 for single item reads and writes, and the
number of items transferred for array types. Currently defined type
codes are as follows:

char

Characters. Signed convention is undefined.
Canonical form in the file is a single 8 bit byte.

uchar

Unsigned characters. Used for utility 8 bit data.
Canonical form in the file is a single 8 bit byte.

short

Signed 16 bit integers. Canonical form in the file
is two's complement, least significant byte first, most significant
byte last, two total bytes.

long

Signed 32 bit integers. Canonical form in the file
is 4 bytes, starting with the least significant byte and ending with
the most significant byte. Two's complement.

real

Double precision floating point numbers. 8 bytes in
a file. Canonical form in the file is an 8 byte IEEE double precision
number, stored with the least significant byte first and the most
significant byte last.

string

An array of char items. The third argument
specifies the number of characters to be read or written. Canonical
form in the file is one byte per item, written in ascending order as
they would be addressed by a subscript.
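
The canonical forms for short and long can be illustrated with a
minimal sketch. This is not the code in the binary I/O package: the
names put_short, get_short, put_long, and get_long are hypothetical,
and the only things taken from the text are the byte order (least
significant byte first), the two's complement form, and the
convention of returning 1 for a successful single item transfer.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the short and long routines. Each
   returns 1 on success and 0 on failure, like a one-item fwrite
   or fread. They make no assumption about host byte order. */

int put_short(FILE *fp, const short *p)
{
    unsigned int v = (unsigned int) *p & 0xFFFFu; /* two's complement bits */
    unsigned char b[2];

    b[0] = (unsigned char) (v & 0xFF);            /* least significant first */
    b[1] = (unsigned char) ((v >> 8) & 0xFF);
    return fwrite(b, 2, 1, fp) == 1;
}

int get_short(FILE *fp, short *p)
{
    unsigned char b[2];
    long v;

    if (fread(b, 2, 1, fp) != 1)
        return 0;
    v = (long) b[0] | ((long) b[1] << 8);
    if (v & 0x8000L)                              /* sign-extend from bit 15 */
        v -= 0x10000L;
    *p = (short) v;
    return 1;
}

int put_long(FILE *fp, const long *p)
{
    unsigned long v = (unsigned long) *p & 0xFFFFFFFFuL;
    unsigned char b[4];
    int i;

    for (i = 0; i < 4; i++)                       /* byte 0 is the lsb */
        b[i] = (unsigned char) ((v >> (8 * i)) & 0xFF);
    return fwrite(b, 4, 1, fp) == 1;
}

int get_long(FILE *fp, long *p)
{
    unsigned char b[4];
    unsigned long v = 0;
    int i;

    if (fread(b, 4, 1, fp) != 1)
        return 0;
    for (i = 3; i >= 0; i--)                      /* rebuild msb downward */
        v = (v << 8) | b[i];
    /* Sign-extend from bit 31 without overflowing a 32 bit long. */
    *p = (v & 0x80000000uL) ? -(long) (0xFFFFFFFFuL - v) - 1 : (long) v;
    return 1;
}
```

Because the bytes are assembled arithmetically, the same source
produces the same file on any host, at the cost of a few shifts per
item.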

If the binary I/O package is to do its job, you must be honest with
it: only pass the functions pointers of exactly the type they are
intended to process. If you use b_wstring to write a
structure, you're going to generate files just as incompatible as if
you used fwrite. And you must never, never
use an INT as an argument to one of these routines.

When using the binary I/O package, you must explicitly read and write
every datum: there is no way to read composite data types with one
I/O. Bob Elman's code in EREAD solves this problem by packing
data into a buffer, then writing it with one call. Since this
handles the entity data, which is by far the largest volume of data
that AutoCAD reads and writes, I felt that taking a simpler approach in the
binary I/O package would have no measurable impact on performance. I
felt that the complexity of the mechanism in EREAD was not
required for handling the other files.

On a system such as MS-DOS, whose native internal data representation
agrees with the canonical format of the database file, the various
read and write functions are simply #defines to the
equivalent calls on FREAD or FWRITE. The variable
TRANSFIO in SYSTEM.H controls this. If it is not
defined, all of the binary I/O routines generate in-line calls on
FREAD and FWRITE. If TRANSFIO is defined,
machine specific definitions in BINIO.H are used to define
the I/O routines. Compatible types such as char may still
generate direct I/O calls, but incompatible types should be defined as
external int-returning functions.
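
The selection just described might look roughly like the following.
This is a sketch under stated assumptions, not an excerpt from
BINIO.H: TRANSFIO is the only name taken from the text, and the
x_rshort and x_wshort names are hypothetical.

```c
#include <stdio.h>

/* Sketch only: x_rshort and x_wshort are hypothetical names
   patterned on the description in the text. */

#ifndef TRANSFIO
/* Host format matches the file: expand directly to fread/fwrite,
   so no function call overhead or extra code is generated. */
#define x_rshort(fp, p)  fread((char *) (p), sizeof(short), 1, (fp))
#define x_wshort(fp, p)  fwrite((char *) (p), sizeof(short), 1, (fp))
#else
/* Host format differs: the machine's binary I/O driver must
   supply real functions that do the conversion. */
extern int x_rshort(FILE *fp, short *p);
extern int x_wshort(FILE *fp, short *p);
#endif
```

Callers are written once against the x_ names and do not need to
know which of the two forms is in effect.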

If a machine uses a non-IEEE floating point format, the b_rreal
and b_wreal functions must convert the IEEE format in the file
to and from the machine's internal representation. In addition,
because the entity data I/O code in EREAD.C does not use the
Binary I/O package, you must tell it to perform the conversion.
You do this by adding the statement:

#define REALTRAN

in the SYSTEM.H entry for the machine. This will generate code
within EREAD.C which calls two new functions your binary I/O
driver must supply. Whenever a real number is being written to a
file, EREAD will call:

realenc(bufptr, rvalue);

where bufptr is a “char *” pointing to
an 8 byte buffer in which the canonical IEEE value should be stored
(remember, lsb first), and rvalue is the real number value to
be stored, passed in the machine's internal type for double.
When a number is being read, a call is made to:

rvalue = realdec(bufptr);

in which bufptr points to an 8 byte area containing the IEEE
number. Realdec must return the corresponding internal value as
a double.

Each machine architecture must define a binary I/O driver providing
the non-defaulted I/O routines, and if real number conversion is
required, realenc and realdec. Examine the driver for
the Motorola 68000 family (BIO68K.C) for an example of such a
driver.
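
As a rough idea of what such a pair of functions involves, here is a
sketch for a host that already uses 8 byte IEEE doubles but may
store them most significant byte first. It is not the code in
BIO68K.C, which is not reproduced here. The calling sequences follow
the text; the helper host_is_lsb_first and the function bodies are
assumptions. A machine with a non-IEEE format, such as the VAX,
would need genuine format conversion, which this sketch does not
attempt.

```c
#include <string.h>

/* Hypothetical helper: nonzero if the host stores the least
   significant byte of a multi-byte value at the lowest address.
   Assumes integers and doubles share the same byte order. */
static int host_is_lsb_first(void)
{
    unsigned int one = 1;
    return *(unsigned char *) &one == 1;
}

/* Store rvalue in the 8 byte buffer in canonical form:
   IEEE double precision, least significant byte first. */
void realenc(char *bufptr, double rvalue)
{
    unsigned char raw[8];
    unsigned char *out = (unsigned char *) bufptr;
    int i, lsb = host_is_lsb_first();

    memcpy(raw, &rvalue, 8);
    for (i = 0; i < 8; i++)
        out[i] = lsb ? raw[i] : raw[7 - i];   /* reverse if msb first */
}

/* Return the host double for the canonical value in the buffer. */
double realdec(char *bufptr)
{
    const unsigned char *in = (const unsigned char *) bufptr;
    unsigned char raw[8];
    double rvalue;
    int i, lsb = host_is_lsb_first();

    for (i = 0; i < 8; i++)
        raw[lsb ? i : 7 - i] = in[i];
    memcpy(&rvalue, raw, 8);
    return rvalue;
}
```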

Modifying AutoCAD

Utilising the binary I/O package within AutoCAD to implement portable
databases involved modifications in several areas. The changes are
large, numerous, widespread, and significant, despite their limited
impact on what gets written into the file. Installing them and
debugging database compatibility was not a difficult design task; it
was simply a matter of hacking, slashing, slogging, and bashing until
every place where a nonportable assumption was made was found, and
then fixing them all. “That's what you were always best
at,” Galt interjected. I said that I hoped so, for I know of no
single project I've done within AutoCAD which is so likely to
destabilise the product as this one. The following paragraphs cover
the highlights of each section.

The Drawing Database

Making drawing databases compatible consisted of several subprojects. The
result of all of this is that an AutoCAD with the new code installed
can read existing drawing databases written by the machine
on which it is executing, old MS-DOS databases, and new
portable databases. It writes new portable databases, which can
be read by any AutoCAD with this code installed.

The ability to read both formats of databases is implemented via the
flag rstructs. When a drawing database header is read by
MVALID, if it is an old, nonportable database,
rstructs is set to TRUE and the file pointer used to
read the file is saved. Subsequent reads from that file will use the
old code to read aggregate data. At the end of every database reading
operation, such as INSERT or PLOT, rstructs
is cleared.

The drawing header.

The drawing header is managed by code in MASTREC.C. The
header is defined, for I/O purposes, by a table called MTAB.
This table previously contained pointers and lengths for all the items
in the header, and each was written or read with an individual call on
FREAD or FWRITE. Compatibility problems were created by
the fact that the header contained several kinds of composite objects:
symbol table descriptors, transformation matrices, the “header
header”, a view direction array, Julian dates, and calendar dates.
I modified the table to contain an item type and implemented a switch
to read and write each item with the correct calls on the Binary I/O
package. Special code had to be added for each composite type to read
and write it; just adding entries to the table for the components of
the composite types falls afoul of the mechanism that allows addition
of new fields to the header. I tried it; it doesn't work. The
symbol table descriptors have several unique problems: first, their
definition contains a “FILE *” item. The length of this item
varies depending on the system's pointer length, so the structure
changes based on this length. On MS-DOS systems, data in the
structure totals 37 bytes, and different compilers pad this
structure differently. The file pointer field means nothing in a drawing
database, but it is present in all existing databases and it varies in
length. But if you think that it never uses a pointer read from a
file, you haven't looked at the code in WBLOCK.C that saves and
restores the header around its diddling with it. Look and see the
horror I had to install to fix that one.
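
The general shape of a table with an item type and a switch can be
sketched as follows. The real MTAB is not shown in this document, so
everything here (the type codes, the struct hitem layout, and the
write_header and wr_short functions) is hypothetical and covers only
two simple item types, with none of the composite types discussed
above.

```c
#include <stdio.h>

enum htype { H_SHORT, H_STRING };   /* hypothetical type codes */

struct hitem {                      /* one row of the table */
    enum htype type;
    void *ptr;                      /* where the item lives in memory */
    int len;                        /* used by H_STRING only */
};

/* Write a short as two bytes, least significant byte first. */
static int wr_short(FILE *fp, const short *p)
{
    unsigned int v = (unsigned int) *p & 0xFFFFu;
    unsigned char b[2];

    b[0] = (unsigned char) (v & 0xFF);
    b[1] = (unsigned char) (v >> 8);
    return fwrite(b, 2, 1, fp) == 1;
}

/* Walk the table, writing each item with the routine for its type.
   Returns 1 if every item was written, 0 on the first failure. */
int write_header(FILE *fp, const struct hitem *tab, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        switch (tab[i].type) {
        case H_SHORT:
            if (!wr_short(fp, (const short *) tab[i].ptr))
                return 0;
            break;
        case H_STRING:
            if (fwrite(tab[i].ptr, 1, (size_t) tab[i].len, fp)
                != (size_t) tab[i].len)
                return 0;
            break;
        default:                    /* unknown type: refuse to guess */
            return 0;
        }
    }
    return 1;
}
```

A composite type would get its own case that writes each of its
components in turn, which keeps the table to one entry per header
item.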

The symbol tables.

The symbol tables, managed by SMIO.C, were an utter catastrophe
from the standpoint of portability. The problems encountered in MASTREC with their headers were only a faint shadow of the beast
lurking within SMIO. To refresh your memory, each symbol table
has a descriptor which is usually in the drawing header (another
symbol table is used for active font and shape files, but it is not
saved with the drawing and does not enter this discussion). The
descriptor for the symbol table contains its length, the number of
items in the table, the file descriptor used to read and write it, and
the address within the file where it starts. There is no type
field in a symbol table. Symbol tables are read and written by the
routines GETSM and PUTSM, which are passed the descriptor.
Each symbol table entry consists of a structure containing several
fields of various types.

Previously, GETSM and PUTSM did not care about the content
of the symbol table record; they just read and wrote the structure as
one monolithic block. That, of course, won't work if you want the
tables to be portable: each field has to be handled separately with
the Binary I/O package. So in order to do this, GETSM and PUTSM must know the type of table they are processing.

“So,” said Galt, “add a type field to the table.”

“Heh, heh, heh,” I said, walking over to the Sun and bringing up all
the references to the block symbol table descriptor in CSCOPE.
There are few data structures within any program that are chopped,
diced, sliced, shuffled, and maimed as much as an AutoCAD symbol table
descriptor. Most (but not all) live within the drawing header. They
can point to their own file or be part of a monolithic database.
They contain that ghastly variable length file pointer which gets
written in the drawing header. They get copied, created dynamically
in allocated buffers, and in WBLOCK, saved to a file, modified
to refer to another file, then read back in. And that “length”
field I mentioned, sm_eln. Well, it may include a trailing
pad byte on the disc depending on which compiler and options made your
MS-DOS database. And it gets used both to seek into the file and to
dynamically allocate symbol table descriptors except in the places
where it uses sizeof(struct whatever) instead. One
week into this project, I had the feeling that I had not stuck my
head into the lion's mouth—I had climbed into the
lion's stomach.

The most severe fundamental problem was that I had to both decouple
the symbol table descriptor on disc from the one in memory, and also
introduce separate lengths for the symbol table as stored on disc
(used to seek to records) and in memory (used to allocate buffers,
copy tables, and so on). I ended up adding two fields to the symbol
table item in memory, sm_typeid and sm_dlen, which
specify the type of the symbol table (mnemonics are defined in SMIO.H) and its length as stored on the disc. When a symbol table is
in memory, sm_eln specifies the length of the structure
in memory. When a symbol table is written out, the two
new fields are not written: instead the disc length is written into
the sm_eln field and the type is expressed implicitly in the
symbol table's position in the drawing header.
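
The split between the two lengths can be pictured with a small
sketch. Only the field names sm_eln, sm_typeid, and sm_dlen come
from the text; the struct name, the other fields, the field types,
and the two helper functions are assumptions made for illustration.

```c
#include <stdlib.h>

/* Sketch of an in-memory descriptor; layout is hypothetical. */
struct smdesc {
    long sm_start;    /* hypothetical: file address of the table */
    int  sm_count;    /* hypothetical: number of entries */
    int  sm_eln;      /* entry length in memory */
    int  sm_typeid;   /* which symbol table this is */
    int  sm_dlen;     /* entry length as stored on disc */
};

/* File address of entry n: always computed from the disc length. */
long sm_fileaddr(const struct smdesc *d, int n)
{
    return d->sm_start + (long) n * d->sm_dlen;
}

/* Buffer for one entry: always sized from the in-memory length. */
void *sm_allocentry(const struct smdesc *d)
{
    return malloc((size_t) d->sm_eln);
}
```

With a 37 byte disc record and a padded 40 byte structure in
memory, entry 3 of a table starting at address 100 is found at
100 + 3 * 37 = 211, while its buffer is still 40 bytes long.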

By the way, the lack of a type code in symbol tables has been felt
before: there is some marvelous to behold code in WBLOCK.C that
figures out which symbol table it is working on by testing the pointer
against the descriptor address. I did not fix these to use my new type
codes. Somebody should some day. Once the type codes and disc
lengths were present, the changes to process the symbol tables
separately were straightforward to install in SMIO.C.

Because the code to process the symbol tables field by field is
substantially larger and also somewhat slower than reading a
single structure, I set up conditional compilation to use the old code
on MS-DOS. Since MS-DOS already writes the tables in canonical form
and has the most severe memory constraints, there's no reason it
should have to pay the price of compatibility code it doesn't need.
Note that if you remove the #ifdefs on MSDOS from the file, it
will still work fine: it will just be bigger and slower.

The entity headers.

There is a fixed set of fields which precedes every entity in the
drawing database to specify its type, flags, length of the packed
data which follows, and a pointer. When Bob made the entity data
compatible, he could not use his scatter/gather mechanism for these
fields because they control the scatter/gather process. I
modified EREAD.C to use the Binary I/O package for these fields.
In addition, if REALTRAN has been defined on this system, the
gathreal and scatreal functions call realenc and
realdec routines to translate floating point formats.
If REALTRAN is not defined, no additional code is compiled or
executed, so IEEE-compatible systems pay no price for the possibility
of floating point format conversion. The floating point
conversion mechanism has never been tested.

Shape and text font files

Compiled text font and shape files were made compatible by using the
Binary I/O package within SHCOMP.C when compiling a shape file
and in SHLOAD.C when loading it. The shape files written by
the modified code are identical to those generated by an MS-DOS AutoCAD
but are incompatible with other systems. All .SHX files on
non-MS-DOS systems must be recompiled when converting to this release
of AutoCAD. Attempting to load an old format file results in an I/O error
message. It was my judgement that considering the tiny installed base
of non-MS-DOS systems, it just wasn't worth putting in some form of
level indicator and generating a special message. This code has
never been tested with “big fonts” (e.g., Kanji).

DXB files

The existing code for reading DXB files had three problems:

Type codes greater than 127 did not work due to some code
incorrectly copied from SLIDE.C.

An fread was done into an int, resulting in
failure on any machine whose ints are not 16 bits.

The AutoCAD manual documented .DXB files as being
in Intel byte order, but the code did not perform the
required reversals.

I modified all I/O within DXBIN.C to use the Binary I/O package,
and corrected these problems. All systems now read DXB files
which are compatible with existing MS-DOS files. The existing
code in non-MS-DOS systems could never have worked, so no non-MS-DOS
DXB files exist and compatibility with them is not a consideration.
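
Two of these failures are easy to reproduce in miniature. The sketch
below is not taken from DXBIN.C, and the function names are
hypothetical. The text does not say exactly why type codes above 127
failed; sign extension of a plain char is one plausible cause and is
the one shown here.

```c
#include <stdio.h>

/* Read a one byte type code so that values above 127 stay
   positive. Storing the byte in a plain char and widening it to
   int can sign-extend, turning 200 into -56 on many machines.
   Returns the code, or -1 on end of file or error. */
int read_typecode(FILE *fp)
{
    unsigned char c;

    if (fread(&c, 1, 1, fp) != 1)
        return -1;
    return (int) c;
}

/* Read a signed 16 bit integer, least significant byte first,
   into an int of any width. Reading two bytes straight into an
   int leaves the rest of a wider int untouched and depends on
   the host's byte order. Returns 1 on success, 0 on failure. */
int read_int16(FILE *fp, int *out)
{
    unsigned char b[2];
    long v;

    if (fread(b, 1, 2, fp) != 2)
        return 0;
    v = (long) b[0] | ((long) b[1] << 8);
    if (v & 0x8000L)                 /* sign-extend from bit 15 */
        v -= 0x10000L;
    *out = (int) v;
    return 1;
}
```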

Slide files

I corrected a problem in my earlier submission of code to make slide
files portable which was found by the regression test. A null slide
file created by MS-DOS (or the new portable code) would get an I/O
error if you attempted to view it on a Sun. SLIDE.C was reading
the in-memory length of the slide file header when it validated the
header. I changed it to read the portable length in the file.

Compatibility status summary

The following is a summary of AutoCAD file portability as of the
integration of this code.

Drawing files

Fixed to be compatible. All systems read both their
own old-format files and the new portable files. All systems
emit portable files.

ASCII files

Fixed to be compatible. Note that this causes the
following file types to become compatible: HLP,
HDX, SHP, DXF, DXX, MNU,
PAT, LIN, PGP, MSG, LSP.

ACADVS

The virtual string file is compatible by design.

Filmrolls

Compatible by design.

Slides

Fixed to be compatible. Systems can read their own old files
and portable files. All write portable files.

Slide libraries

Compatible by design.

SHX files

Fixed to be compatible. Old MS-DOS files are
portable. Old non-MS-DOS files must be recompiled.

IGES files

Compatible by design.

DXB files

Fixed to be compatible. Previously worked only on
MS-DOS. Old MS-DOS files work without modification.

MNX files

Incompatible. A system must use menus compiled by
its own AutoCAD.

Upper and lower case

I have done nothing in this project to resolve the issue of case
conventions for file names. I consider this issue so controversial
and politically charged that I'm not yet ready to step into it. I
hereby submit my recommendations for comment. Each system will define
a tag in SYSTEM.H called CASECONV. It shall be set to one
of four values:

CCMONOU System is monocase and uses upper.
CCMONOL System is monocase and uses lower.
CCULU System uses both cases and prefers upper.
CCULL System uses both cases and prefers lower.

When a system writes a drawing database, it stores its CASECONV
setting in the drawing header. This is referred to as the “case
convention of the sending system”. When a system reads a drawing, if
it was created on a system with a different case convention, it
processes file names in symbol table entries based upon a matrix of the
sending system's case convention and its own. If the receiving system
is monocase, file names in symbol tables are not translated, but FFOPEN and its clones translate all file names to the receiving system's
case convention before submitting them to the system. If the
receiving system uses both cases and the sending system was monocase,
names in symbol tables are translated at read-in time to the
preferred case of the receiving system. The names are then used as
modified, without further modification by FFOPEN. This is
asymmetrical and impossible to justify except by convincing yourself
that this is the best approximation to what's best for the user.
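
To make the proposed rules concrete, here is one way they could be
coded. It is a sketch of the recommendation only, since none of it
has been implemented: the four tag names come from the list above,
but their numeric values and the functions cc_readin and cc_open are
hypothetical. The text does not say what happens when both systems
use both cases; this sketch assumes names are then left untouched.

```c
#include <ctype.h>

/* Tag names from the proposal; the values are arbitrary. */
#define CCMONOU 0   /* monocase, upper */
#define CCMONOL 1   /* monocase, lower */
#define CCULU   2   /* both cases, prefers upper */
#define CCULL   3   /* both cases, prefers lower */

static int is_monocase(int cc)   { return cc == CCMONOU || cc == CCMONOL; }
static int prefers_upper(int cc) { return cc == CCMONOU || cc == CCULU; }

/* Fold a name in place to upper or lower case. */
static void fold(char *s, int upper)
{
    for (; *s != '\0'; s++)
        *s = (char) (upper ? toupper((unsigned char) *s)
                           : tolower((unsigned char) *s));
}

/* Applied to symbol table file names as a drawing is read in:
   translate only when a two-case receiver reads a monocase file. */
void cc_readin(char *name, int sender, int receiver)
{
    if (!is_monocase(receiver) && is_monocase(sender))
        fold(name, prefers_upper(receiver));
}

/* Applied just before a file is opened, as FFOPEN would do:
   a monocase receiver folds every name to its own convention. */
void cc_open(char *name, int receiver)
{
    if (is_monocase(receiver))
        fold(name, prefers_upper(receiver));
}
```

The asymmetry noted above shows up directly: a monocase receiver
translates late, at open time, while a two-case receiver translates
early, at read-in, and then trusts the stored name.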

My throat was feeling a little dry after such a lengthy dissertation.
I got up to refill my glass. When I walked back to my chair, Galt was
flipping through the listing of SMIO.C next to the Sun. He
turned to me and said, “Why do you do this? Here you are in the middle
of the night struggling to trick this megalith of
software into threading its way around incompatibilities between
computers that aren't even of your making.”

I replied, “Differences in products are a consequence of their
rapid evolution in a free market. Incompatibility is the price of
progress”. John Galt was speechless for at least 12 seconds.

He rose and said, “Join us. You weren't ready in 1967. Now, in 1987
you should see that you're struggling to make money in a world where
the money you make is taxed away and handed to defence contractors
like Lockheed and McDonnell-Douglas, who turn around and compete
against you with products your taxes paid to develop. While so many
others are sleeping, you labour to produce intellectual property, then
you listen to others lecture you on their “right” to steal it.
Can't you feel the circle closing? Can't you see that this can't go
on? Why not hasten the inevitable and pave the way for a brighter
day? You should drop out, or work to hasten the collapse.”

I looked at the DIFFs of my portable database code. I said,
“After this project, I can't help but feel that hastening the
collapse would be an exercise in supererogation.”

Galt shrugged. He sat back down and said, “Your time hasn't yet
come. I try to talk to people when they'll see the issues most
clearly. I try to find the times when they see what they're doing and
begin to wonder why. I'll be back. It may be in two days, two years,
or maybe twenty years.”

We talked for an hour or so about old times, common friends, and
shared interests. He left as the sun was rising.