Christoph Reichenbach and Lars Skovlund on FreeSCI

O'Reilly Network: What are the unique programming
challenges in reverse engineering a game engine that you've
experienced?

Skovlund: The fact that we don't have any exact
specifications makes it difficult to determine how closely you need to
follow the original code or even the purpose of a particular piece of
code. Often you have to see an engine feature being used by a game before
you can start making assumptions about it. You often have to make strict
assumptions about a particular feature, which can be relaxed later, but
only after your game engine is finished enough to run a game that uses
it.

The little things that change between versions are another problem. I
have a collection of 15 game interpreters just for the supported range of
games. Often the changes between versions are just bug fixes, but
there are sometimes small differences in the way things behave which don't
become clear until someone notices a graphical glitch (or whatever) while
playing a game.

Reichenbach: While one of the core motivations of such
a project is, of course, to solve a problem created by the original
system's source code not being freely available, it is easy to
forget about portability and just focus on getting the reimplementation up
and running on your personal platform of choice. The first versions of
FreeSCI were specifically tailored toward Unix-like platforms running on
32-bit machines; only later were the library and run-time environment
required by the interpreter changed to allow reasonably easy
implementations on other platforms. However, as recent porting efforts
have shown, there is still much for us to do in this area.

Another problem is estimating the number of features needed. Unlike
reverse-engineered programs based on decompiled versions of the original
code, you do not, in general, know exactly where you're going; designing
ahead is a mixture of guessing the most likely features required in a
certain piece of code and cleaning up parts whose functionality is
believed to be well-understood.

As an example, consider upcalls in SCI. Some SCI kernel functions,
which serve as the SCI function library, providing file I/O, graphics
primitives, access to the sound server, etc., invoke bytecode functions
for certain functionality, similar to what kernels like Mach do in some
situations (with "bytecode" corresponding to "user space"
here). Initially, we didn't know this, so our execution stack did not
support plugging in calls to C code in between calls to bytecode;
fortunately, this turned out to be relatively straightforward to
implement.
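
The idea of an upcall can be sketched with a toy stack machine. This is only an illustration under stated assumptions: the opcodes, `vm_run`, and `kernel_apply_twice` are hypothetical names invented here and do not reflect SCI's real instruction set or FreeSCI's actual execution stack. The point is that a kernel function written in C re-enters the interpreter to run a bytecode routine and then carries on in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; not SCI's real instruction set. */
enum { OP_PUSH, OP_ADD, OP_CALLK, OP_RET };

typedef struct {
    const uint8_t *code;  /* bytecode image */
    int stack[32];        /* operand stack shared by C and bytecode */
    size_t sp;            /* number of values currently on the stack */
} vm_t;

static void vm_push(vm_t *vm, int v) { assert(vm->sp < 32); vm->stack[vm->sp++] = v; }
static int vm_pop(vm_t *vm) { assert(vm->sp > 0); return vm->stack[--vm->sp]; }

int vm_run(vm_t *vm, size_t pc);

/* A "kernel function" in C. It pops a callback address and an argument,
 * then makes two upcalls into bytecode, feeding the first result into
 * the second call. The final result is left on the stack. */
static void kernel_apply_twice(vm_t *vm)
{
    size_t callback = (size_t)vm_pop(vm);
    int arg = vm_pop(vm);

    vm_push(vm, arg);
    vm_push(vm, vm_run(vm, callback)); /* first upcall; result becomes next argument */
    vm_push(vm, vm_run(vm, callback)); /* second upcall; result stays on the stack */
}

/* Run bytecode starting at pc until OP_RET; return the popped top of stack.
 * The function is re-entrant, which is what makes the upcall possible. */
int vm_run(vm_t *vm, size_t pc)
{
    for (;;) {
        switch (vm->code[pc++]) {
        case OP_PUSH:
            vm_push(vm, vm->code[pc++]);
            break;
        case OP_ADD: {
            int b = vm_pop(vm);
            int a = vm_pop(vm);
            vm_push(vm, a + b);
            break;
        }
        case OP_CALLK:
            kernel_apply_twice(vm); /* bytecode calls into C */
            break;
        case OP_RET:
            return vm_pop(vm);
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Main routine pushes 40 and the callback address (6), calls the kernel,
 * and returns. The callback at offset 6 adds 1 to its argument. */
int demo_upcall(void)
{
    static const uint8_t program[] = {
        OP_PUSH, 40, OP_PUSH, 6, OP_CALLK, OP_RET, /* offsets 0..5 */
        OP_PUSH, 1, OP_ADD, OP_RET                 /* offsets 6..9 */
    };
    vm_t vm = { .code = program };
    return vm_run(&vm, 0);
}
```

Here the C call stack itself records the alternation between bytecode and C frames. An interpreter that keeps an explicit execution stack would need frame types for both kinds of call instead.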

O'Reilly Network: Any advice--legal or technical--for
those who are looking into reverse engineering a game engine?

Skovlund: Two words: stay
legal! It's very important when you reverse engineer a game that
the original author has no valid reason to complain. This usually means
that reverse engineering and implementation should be done by different
groups of people. I hardly wrote a line of code in the first years for
this exact reason.

Another thing to watch out for is patents. Sierra was granted a few
patents on key technologies used in later SCI games. So far we haven't
needed to deal with them, but we are going to have to do that for SCI01
and later games. One of those can be worked around, while the others might
pose a problem because they describe very general concepts.

Reichenbach: We have one part of the team
investigating Sierra's interpreter and documenting the findings, and
another part reimplementing it. This way we avoid legal issues.

While it could be argued that it's a lot more work this way, it's also
much more entertaining. We have done a lot of things very differently from
the way Sierra implemented them (such as the graphics subsystem), which
not only serves to considerably weaken a potential case Sierra's lawyers
might try to make, but it also allowed us to add new features and checks
(which may be of interest for future SCI game developers).

On the technical front, first, my recommendation would be not to use
weakly typed and notoriously unportable languages like C and C++ for
reimplementing the engine. It is far too easy to write dangerous, slow,
and unportable code in these languages. Other languages that might come to
mind would be popular scripting languages like Perl or Python. However,
the amount of type-checking offered by these is far too small to make them
useful for any large programs. More expressive, well-defined languages
like Standard ML or Eiffel would, in general, be a much better choice.

If you still decide to use C or C++, keep the following in
mind:

You cannot portably serialize/de-serialize information such as saved
games straightforwardly.

Some sort of workaround (such as our code generator) will be needed
for this. That's a highly relevant issue for games supporting
saved games.
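
As a rough illustration of the problem (not of FreeSCI's code generator, which is not shown here), the sketch below writes each field of a hypothetical `saved_state_t` explicitly in a fixed width and byte order, instead of dumping the struct with `fwrite`. The struct layout, field names, and the choice of little-endian storage are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical game state; the in-memory layout (padding, byte order)
 * differs between compilers and platforms, so it is never written raw. */
typedef struct {
    uint16_t room;
    int32_t  score;
    uint8_t  flags;
} saved_state_t;

enum { SAVED_STATE_SIZE = 2 + 4 + 1 }; /* bytes in the file format */

static void put_u16_le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_u32_le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static uint16_t get_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize field by field. Returns bytes written, or 0 if out is too small. */
size_t state_save(const saved_state_t *s, uint8_t *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    if (cap < SAVED_STATE_SIZE)
        return 0;
    put_u16_le(out, s->room);
    put_u32_le(out + 2, (uint32_t)s->score);
    out[6] = s->flags;
    return SAVED_STATE_SIZE;
}

/* Deserialize. Returns 0 on success, -1 if the input is too short.
 * The cast back to int32_t assumes a two's complement platform. */
int state_load(saved_state_t *s, const uint8_t *in, size_t len)
{
    assert(s != NULL && in != NULL);
    if (len < SAVED_STATE_SIZE)
        return -1;
    s->room  = get_u16_le(in);
    s->score = (int32_t)get_u32_le(in + 2);
    s->flags = in[6];
    return 0;
}
```

Writing this by hand for every structure is tedious and error-prone, which is presumably why generating such code is attractive.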

Avoid code duplication.

It seems much easier to do a port of a program by copying the source
code and adapting to a specific hardware platform. However, this soon
becomes a maintenance nightmare.

sizeof(int) != sizeof(void*).

Assuming otherwise will break support for 64-bit architectures such as
the Alpha, where pointers are wider than int.
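
A minimal sketch of the safer alternatives, assuming a C99 compiler that provides `uintptr_t` (the type is optional in the standard but available on mainstream platforms). The function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a pointer in an integer type that is wide enough to hold it.
 * Casting to int instead would truncate on platforms with 64-bit
 * pointers and 32-bit int. */
uintptr_t handle_from_ptr(void *p)
{
    return (uintptr_t)p;
}

void *ptr_from_handle(uintptr_t h)
{
    return (void *)h;
}

/* Distances between pointers belong in size_t or ptrdiff_t, not int. */
size_t offset_in_heap(const unsigned char *base, const unsigned char *p)
{
    assert(p >= base);
    return (size_t)(p - base);
}
```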

Alignment

Some architectures cannot do immediate 16- or 32-bit reads from "odd"
addresses. This is a particular problem when dealing with old
bytecode. Try to read it in single-byte fashion by default. If it helps,
you can make platform-specific optimizations later.
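
Reading in single-byte fashion might look like the sketch below. It assumes the bytecode stores multi-byte operands in little-endian order; the function names are hypothetical. Because the value is assembled from individual bytes, the same code works at any offset and on hosts of either byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value at any offset, aligned or not.
 * The caller must ensure offset + 1 is within the buffer. */
uint16_t read_u16_le(const uint8_t *code, size_t offset)
{
    assert(code != NULL);
    return (uint16_t)(code[offset] | (code[offset + 1] << 8));
}

/* Read a 32-bit little-endian value at any offset.
 * The caller must ensure offset + 3 is within the buffer. */
uint32_t read_u32_le(const uint8_t *code, size_t offset)
{
    assert(code != NULL);
    return (uint32_t)code[offset] |
           ((uint32_t)code[offset + 1] << 8) |
           ((uint32_t)code[offset + 2] << 16) |
           ((uint32_t)code[offset + 3] << 24);
}
```

By contrast, casting `code + offset` to `uint16_t *` and dereferencing it can fault on strict-alignment hardware and yields the wrong value on big-endian hosts.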

Byte order

Remember that not every machine is little-endian.

Abstract

Try not to buy into one particular graphics or sound library. FreeSCI
started out designed for the libggi graphics library, which, at that time,
appeared to be one of the libraries most likely to become generally
accepted and ported to a vast number of architectures. Today it's pretty
much dead.

By abstracting our graphics API, we have significantly simplified
porting to new architectures (and different visuals on the same
architectures). Thus, we don't depend on, say, SDL supporting a certain
platform in order to run on it. Of course, this creates some problems
with API-specific optimizations, but, at least for graphics drivers, these
tend to be sufficiently similar to allow them to be taken advantage of
generically.
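
One common way to build such an abstraction in C is a table of function pointers per backend. The sketch below is an assumption-laden illustration, not FreeSCI's actual graphics API: `gfx_driver_t`, its members, and the in-memory "mem" backend are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface: the engine only ever calls through this. */
typedef struct gfx_driver {
    const char *name;
    int  (*init)(struct gfx_driver *drv, int width, int height);
    void (*draw_pixel)(struct gfx_driver *drv, int x, int y, unsigned char color);
    void (*shutdown)(struct gfx_driver *drv);
    void *state; /* backend-private data */
} gfx_driver_t;

/* A trivial backend that renders into a memory buffer. A real port would
 * supply the same three functions on top of its own graphics library. */
enum { MEM_W = 320, MEM_H = 200 };
static unsigned char mem_pixels[MEM_W * MEM_H];

static int mem_init(gfx_driver_t *drv, int width, int height)
{
    if (width > MEM_W || height > MEM_H)
        return -1;
    memset(mem_pixels, 0, sizeof mem_pixels);
    drv->state = mem_pixels;
    return 0;
}

static void mem_draw_pixel(gfx_driver_t *drv, int x, int y, unsigned char color)
{
    unsigned char *px = drv->state;
    if (x >= 0 && x < MEM_W && y >= 0 && y < MEM_H)
        px[y * MEM_W + x] = color;
}

static void mem_shutdown(gfx_driver_t *drv)
{
    drv->state = NULL;
}

gfx_driver_t gfx_driver_mem = { "mem", mem_init, mem_draw_pixel, mem_shutdown, NULL };

/* Engine-side code: written once against the interface, never against
 * a particular library. */
void gfx_fill_rect(gfx_driver_t *drv, int x, int y, int w, int h, unsigned char color)
{
    assert(drv != NULL && drv->draw_pixel != NULL);
    for (int j = 0; j < h; j++)
        for (int i = 0; i < w; i++)
            drv->draw_pixel(drv, x + i, y + j, color);
}
```

A per-pixel callback is deliberately naive; an interface meant for real use would expose coarser operations (rectangles, blits) so that each backend can optimize them.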

Perform checks

Depending on what you're trying to reimplement, it's possible that the
computers you're targeting are more powerful than the platforms the code
was originally targeted at. Thus, you can usually do more checking on
whether what the game is trying to do is consistent with your perception
of what it should be allowed to do (this, of course, is particularly
relevant to flexible interpreters). Unless you know exactly the semantics
of the issues you're dealing with, you can't hope to provide an accurate
or, even, a better rendition of the original engine. Building fences and
watching the script code run against them (by triggering run-time
warnings, errors, or even failing some static analysis, if you're bold
enough to implement that) is usually the only way to figure these out.
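
A small example of such a fence, sketched under assumptions: `script_heap_t` and `heap_read_u16` are hypothetical names, and the heap is assumed to hold little-endian 16-bit values. Every script read is bounds-checked, and a violation produces a run-time warning rather than silently reading stray memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical script heap with a counter for suspicious accesses. */
typedef struct {
    uint8_t *data;
    size_t   size;
    unsigned warnings;
} script_heap_t;

/* Checked 16-bit read on behalf of script code. Returns 0 on success.
 * On an out-of-bounds access it logs a warning, stores 0 in *out, and
 * returns -1 so the interpreter can decide whether to continue. */
int heap_read_u16(script_heap_t *heap, size_t addr, uint16_t *out)
{
    assert(heap != NULL && out != NULL);
    if (addr >= heap->size || heap->size - addr < 2) {
        heap->warnings++;
        fprintf(stderr,
                "warning: script read of 2 bytes at 0x%04lx is outside heap (size 0x%04lx)\n",
                (unsigned long)addr, (unsigned long)heap->size);
        *out = 0;
        return -1;
    }
    *out = (uint16_t)(heap->data[addr] | (heap->data[addr + 1] << 8));
    return 0;
}
```

Each warning that fires while a game runs is either a bug in the reimplementation or a piece of original behavior that has not been understood yet; both are worth knowing about.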

Document

Unless the game you're trying to reimplement is fully documented
already, the single biggest mistake you can make is not to record what you
find out when examining the original code. Unless you happen to get your
reimplementation right the very first time (which, of course, doesn't
happen in practice), you'll have to examine the original code again when
you try to fix your bugs.

For interpreters, in particular, documenting also helps people develop
other, orthogonal tools. Also note that, when doing a clean-room
reimplementation, documentation arises almost naturally as an artifact of
the communication between the decoding and the reimplementation teams.