Asked how she had come to choose GHC as the topic for her
award-nominated PhD dissertation, freshly graduated doctor of
software archeology Simone Tolduso revealed:
"At first, there were a few small curiosities that triggered my
interest: why were darcs patches sent to the cvs-ghc mailing
list, or why did GHC releases traditionally bundle the
predecessor of the current Cabal version when the missing
libraries depended on its successor?
But then I looked into the repository, with its layers on layers of
build systems, source formats, deprecation warnings, directory
structure fragments, todo logs, broken builds resulting either from
OS-tools advancing and playing havoc with the built-in assumptions
of fragile build configurations or from multiple, partially
completed, mutually incompatible heart-liver-and-lung transplants
supporting the newest language extensions (which of course were all
needed to build the compiler branch supporting said features, and
whose documentation tended to be spread over user manual, API
comments, mailing list threads, research papers, plus half a dozen
different Wikis and ticket trackers), supported by often outdated
documentation in a never-ending variety of formats, and I knew I had
stumbled onto a goldmine.
Not to mention remains of earlier projects (what were fptools, or
libraries?), a variety of test and compilation languages (including
Haskell, C, Perl, Python, alongside the usual scripting suspects),
or the proliferation of sediment layers into user space by the
simple but ingenious means of binary incompatibility. In spite of
its comparatively small size, the project was beginning to rival the
complexities of other Microsoft products of the same period.
In what seems to have been an attempt to push open source ideas to
their logical conclusion, you actually had to guess at the right
combination of versions for a number of independently evolving
toolchains, libraries, OSes, and use those to bootstrap from a
consistent snapshot of the compiler, library, and sometimes even
tool sources, or nothing works - a situation which was later
increasingly exacerbated by the dispersion of the Haskell Cabal
replacing coordinated releases. Preliminary mining of the relevant
mailing list and bug tracker archives suggests that binary releases
were mainly public data points used to indicate intermediate states
of GHC _not_ suitable for specific applications (apart from the
obligatory Cabal pre-version lacking the new features needed for
installing the extra libraries, other examples include versions of
Data.ByteString _not_ based on the famous paper, _not_ supporting
essential optimisations, or _not_ supporting API safety fixes). So
there seemed to be no way to avoid direct access to the source
repositories with their associated build processes and toolchains.
And let us not forget that, unlike the programmers at the time, we
are in the fortunate situation of already having complete
repositories for the pieces and dependencies involved. Finding
matching versions is a non-trivial, but essentially combinatorial
exercise, while for them, the process of building GHC would often
have involved developing and submitting the patches that make up our
repositories of all the pieces of software GHC builds depended on.
We still haven't found the key that enabled the ancients to navigate
this labyrinth and to keep their toolchains up to date while still
making any progress in their daily work, not to mention recording
such progress via darcs (itself written in Haskell, and not free
of troubles). Agent-based simulations of developer communities at
arbitrary slices through the repositories show the majority of
agents getting stuck in a recursive cycle of installing, debugging,
and updating dependency chains without ever reaching a productive
state, so we do know that we are missing some crucial information.
Several of my correspondents have come to favour the somewhat
controversial theory that the general programmer in those days
must have been substantially more intelligent than people are
today. And it does make sense, in a way - I mean, if anyone had
been the slightest bit bothered by all this complexity, surely
someone would have tried to simplify things?
Of course, my work has not all been happy progress: for instance,
while there really was an 'evil mangler', the equally persistent
rumour that GHC was named after some Scottish town has turned out
to be a wild goose chase (cf. Appendix GC); my colleagues in dirt
archeology assure me there was no town called 'glorious'. The
'real' archeologists, as they call themselves, had a field day
laughing about my gullibility there. Still, there are so many
buried treasures in this area, just waiting to be investigated."
Dr Tolduso is currently working on a follow-on project, "Haskell
by committee - design and syntax through the ages".
Dept. of Software Archeology, University of New Atlantis
(for immediate release)