The slow birth of distributed software collaboration

Nowadays we take for granted a public infrastructure of distributed version control and a lot of practices for distributed teamwork that go with it – including development teams that never physically have to meet. But these tools, and awareness of how to use them, were a long time developing. They replace whole layers of earlier practices that were once general but are now half- or entirely forgotten.

The earliest practice I can identify that was directly ancestral was the DECUS tapes. DECUS was the Digital Equipment Corporation User Group, chartered in 1961. One of its principal activities was circulating magnetic tapes of public-domain software shared by DEC users. The early history of these tapes is not well-documented, but the habit was well in place by 1976.

One trace of the DECUS tapes seems to be the README convention. While it entered the Unix world through USENET in the early 1980s, it seems to have spread there from DECUS tapes. The DECUS tapes begat the USENET source-code groups, which were the incubator of the practices that later became “open source”. Unix hackers used to watch for interesting new stuff on comp.sources.unix as automatically as they drank their morning coffee.

The DECUS tapes and the USENET sources groups were more of a publishing channel than a collaboration medium, though. Three pieces were missing to fully support that: version control, patching, and forges.

Version control was born in 1972, though SCCS (Source Code Control System) didn’t escape Bell Labs until 1977. The proprietary licensing of SCCS slowed its uptake; one response was the freely reusable RCS (Revision Control System) in 1982.

The first real step towards across-network collaboration was the patch(1) utility in 1984. The concept seems so obvious now that even hackers who predate patch(1) have trouble remembering what it was like when we only knew how to pass around source-code changes as entire altered files. But that’s how it was.
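To make the shift concrete: what patch(1) consumes is a context or unified diff, in which only the changed lines travel, not the whole altered file. A minimal illustration (not Larry Wall's implementation; Python's stdlib `difflib` emits the same unified format, and the tiny C program here is invented for the example):

```python
# Only the changed hunks travel over the wire, instead of the whole
# altered file -- this is the unified-diff format patch(1) applies.
import difflib

old = ['int main() {\n', '    puts("hello");\n', '}\n']
new = ['int main() {\n', '    puts("hello, world");\n', '    return 0;\n', '}\n']

diff = difflib.unified_diff(old, new, fromfile="a/hello.c", tofile="b/hello.c")
print("".join(diff))
```

The output begins with the two file headers, then a hunk marker `@@ -1,3 +1,4 @@`, then context lines prefixed with a space and changed lines prefixed with `-` or `+`; patch(1) reads exactly this to reconstruct the new file from the old one.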

Even with SCCS/RCS/patch the friction costs of distributed development over the Internet were still so high that some years passed before anyone thought to try it seriously. I have looked for, but not found, examples earlier than nethack. This was a roguelike game launched in 1987. Nethack developers passed around whole files – and later patches – by email, sometimes using SCCS or RCS to manage local copies. (I was an early nethack devteam member. I did not at the time understand how groundbreaking what we were doing actually was.)

Distributed development could not really get going until the third major step in version control. That was CVS (Concurrent Versions System) in 1990, the oldest VCS still in wide use at time of writing in 2017. Though obsolete and now half-forgotten, CVS was the first version-control system to become so ubiquitous that every hacker once knew it. CVS, however, had significant design flaws; it fell out of use rapidly when better alternatives became available.

Between around 1989 and the breakout of mass-market Internet in 1993-1994, fast Internet became available enough to hackers that distributed development in the modern style began to become thinkable. The next major steps were not technical changes but cultural ones.

In 1991 Linus Torvalds announced Linux as a distributed collaborative effort. It is now easy to forget that early Linux development used the same patch-by-email method as nethack – there were no public Linux repositories yet. The idea that there ought to be public repositories as a normal practice for major projects wouldn’t really take hold until after I published “The Cathedral and the Bazaar” in 1997. While CatB was influential in promoting distributed development via shared public repositories, the technical weaknesses of CVS were in hindsight probably an important reason this practice did not become established sooner and faster.

The first dedicated software forge was not spun up until 1999. That was SourceForge, still extant today. At first it supported only CVS, but it sped up the adoption of the (greatly superior) Subversion, launched in 2000 by a group of former CVS developers.

Between 2000 and 2005 Subversion became ubiquitous common knowledge. But in 2005 Linus Torvalds invented git, which would fairly rapidly obsolesce all previous version-control systems and is a thing every hacker now knows.

Questions for reviewers:

(1) Can anyone identify a conscious attempt to organize a distributed development team before nethack (1987)?

(2) Can anyone tell me more about the early history of the DECUS tapes?

SHARE was doing public domain software distribution almost from its 1955 beginning. They even had a complete OS distributed as freely-copiable tapes, the SHARE Operating System (SOS) for the 709 and 7090, in 1959.

Why do you think DECUS developed in a vacuum? SHARE was very well known in the computing world. I would not be the least bit surprised to hear the DECUS founders say “we need a group like SHARE!”. That would go for program exchanges as well as things like user advocacy and shared experience documentation and training.

>I would not be the least bit surprised to hear the DECUS founders say “we need a group like SHARE!”.

I wouldn’t be surprised, either. On the other hand there’s no actual evidence for it.

And no genetic traces. We can look at READMEs in today’s software distributions, READMEs in comp.sources.unix postings from the early ’80s, and READMEs on DECUS tapes, and go “Ohhh, right. I recognize that practice.” Is there anything parallel we can say about SHARE distributions, any feature or trait we can recognize as continuous?

Of course, you have to be careful with old recollections. I’m pretty sure Bill Joy was influenced, at least in style, by the MTS $EDIT visual mode (he went to UM in the relevant period). But he, as far as I can search, never mentions it.

I’m on the hunt for tools that will let me examine the SHARE tapes in Paul Pierce’s collection. There are nine of them, with some 3600 contributions between them. Hopefully, I won’t have to install a 7090 emulator and bring up a copy of IBSYS to look at them…

CVS was originally implemented as a crude automation layer on top of RCS. RCS doesn’t have a concept of multiple files, so it obviously doesn’t know about file names, changes in file names, or even deleted files. CVS treated renames as add/delete pairs, and added its own hack to do deletions, but it didn’t work in any but the most trivial cases.

Deletes failed both ways. Deleted files would randomly appear and disappear in other branches or tags as the file was deleted (or even just tagged) on other branches. This would happen differently if the file was modified on a branch vs HEAD.

It caused quite a bit of chaos when new files appeared in checkouts of old tags, and even ended a few projects.

Git can catch you off guard, too. My co-workers occasionally manage to confuse gitlab, and I’m not sure how they’re doing it yet. They never seem to be able to reproduce it. Normally it seems to fabricate new history, as branches, though.

Some years back I was thinking about how DVCses work and realized that – had I known it was important to do it – I could have easily written a Subversion-equivalent VCS between 1982 and 1990, before CVS was released. See, Walter Tichy already got the hard part – the delta journaling – right when he shipped RCS. I wouldn’t have had to mess with that; getting the overlayer right is simple by comparison.

It’s tragic that CVS got that overlayer wrong, because getting it right would have been easy. Just follow two rules: (1) Never, ever delete an RCS master file, and (2) have one master per repo devoted to nothing but journaling changesets, renames, and tags. (A changeset would have been a list of RCS path/revision pairs recorded as a revision in the journal. Native RCS tagging would never have been used at all except in the journal. Simpler than CVS, more powerful, without its horrible failure modes.)

These design rules aren’t magic. You can get to them by knowing why databases journal transactions, and I already did. Execution was well within the capabilities of 1982 tools. First, build an implementation that manages a local repository as described above, using RCS to manage versioned storage. Then write a socket server to accept updates over TCP/IP – initially as whole files, later as diffs.

But dammit, I didn’t yet know version control was important. If I could whisper one technical hint to my younger self, that would be it. I could have had something Subversion-like in production as early as ’85, and if I hadn’t gone on to invent DVCS (and pretty likely I would have) someone else would have by the early ’90s. We lost fifteen years we didn’t need to lose because I wasn’t fscking paying attention in the right place.

I was a fairly newbie programmer then – it never occurred to me that better version control could be my problem to solve, and I assumed other people were smarter about it than me. Alas, they weren’t.

How should renames have ideally been handled? Rename the master file and record enough information (how?) to allow checking out older revisions with their contemporaneous names? Or never rename anything and each changeset contains a potentially arbitrary mapping between in-tree filenames and master file name/revisions?

By the way, I’ve noticed that the tzdb repository on github (I don’t know how it was converted exactly, but the early bits are largely a conversion from SCCS) has all files under their “modern” names, with the consequence that older revisions don’t build correctly. Does reposurgeon have a way to detect this case?

>How should renames have ideally been handled? Rename the master file and record enough information (how?) to allow checking out older revisions with their contemporaneous names? Or never rename anything and each changeset contains a potentially arbitrary mapping between in-tree filenames and master file name/revisions?

You got it on the second try.

The mental shift to make here is to think of VCS revisions as database transactions and ask what update rules need to be followed in order to guarantee that, once a revision has been pushed, it can never be lost (that is, assuming no bad thing happens at the file-store level). CVS got this horribly wrong; the way to get it right is to manage your repository via what one recent VCS calls “monotonic updates”.

In a monotonic store, master files are never deleted once created, are never renamed, and are only modified by appending revisions. The crucial property this guarantees is that if you journal a repository-level revision (e.g. a set of master names and per-file revisions) that journal entry remains valid forever. You never even have to recompute it, an operation which is a dangerous defect attractor in systems that (for example) try to express file renames as master renames.

So, your record of commits is a monotonic store, too. Each commit is a change comment and “a potentially arbitrary mapping between in-tree filenames and master file name/revisions”. This design could have been done in a relatively efficient way using RCS for the delta journaling of masters at any time after RCS shipped in 1982. Probably in no more than 2-2.5KLOC for a proof-of-concept single-user version. I kick my own ass for not having done this every time I think about it.
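The rename handling falls out naturally in a small sketch (hypothetical names, Python standing in for the real storage layer): masters get immutable serial ids, and a rename is nothing more than a later manifest binding the same id to a new in-tree path.

```python
# Rename handling under monotonic updates: master ids are immutable,
# and each commit's manifest maps the current in-tree path to a
# (master id, revision) pair. Old manifests never need recomputation.

class MonotonicStore:
    def __init__(self):
        self.masters = []      # master i = its revision list, append-only
        self.journal = []      # list of {path: (master_id, rev)} manifests

    def new_master(self):
        self.masters.append([])
        return len(self.masters) - 1

    def checkin(self, master_id, text):
        self.masters[master_id].append(text)
        return len(self.masters[master_id]) - 1

    def commit(self, manifest):
        self.journal.append(dict(manifest))
        return len(self.journal) - 1

    def checkout(self, n):
        # Every (id, rev) an old manifest names still exists, under the
        # name the tree had at the time of that commit.
        return {path: self.masters[mid][rev]
                for path, (mid, rev) in self.journal[n].items()}

store = MonotonicStore()
m = store.new_master()
r0 = store.checkin(m, "version 1")
c0 = store.commit({"oldname.c": (m, r0)})
r1 = store.checkin(m, "version 2")
c1 = store.commit({"newname.c": (m, r1)})   # rename + edit; no master touched
```

Checking out the earlier changeset yields the file under its contemporaneous name, with no rename metadata ever recorded against the master itself.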

A lot of big projects were using CVS, and there was always hope that someone would eventually make CVS better. The fact that there was only one VCS in wide use, and that it was so broken for so long, led to the perception that the problem was much harder to solve than it was.

There were some contenders between CVS and SVN though. It seems a lot of people wrote toy VCS systems in the 90’s, all of which did something better than CVS, but couldn’t import a CVS history without significant losses.

SVN’s DB-based on-disk format was met with a lot of skepticism when it was new (“how will we hand-edit our repos when the inevitable bugs ruin our day?”), but seasoned CVS admins had already become well-practiced in restoring repos from backups three times a year. We figured even if SVN blew up every other month, it was worth it to get deletes that worked.

SVN might have been one of the first to implement the idea of importing history from a foreign repo. Today that’s table stakes.

>all of which did something better than CVS, but couldn’t import a CVS history without significant losses.

Yeah, well, about that…in the worst cases, nothing can import a CVS history without significant loss. I maintain tools that do a significantly better job than anything else and it is still a world of pain and grief. Unless you get very lucky and/or the operators have been exceptionally careful.

The technical limitations of CVS and Subversion as a barrier to adoption are not to be underestimated.

For some years before git, Linux was in BitKeeper, and before BitKeeper, there was a choice of Subversion, CVS, or nothing (i.e. tarballs and emailed patches). Linus chose the third of those three alternatives and defended that decision right up to the first BK import. A few people were running tarball-to-CVS gateway repos, but there was nothing official before BK.

In my experience, SVN uptake was only strong toward the very end of your suggested 2000-2005 range, but it’s probably accurate enough. And of course there was a whole lot of competition around the SVN successor, from which git would eventually emerge, though perhaps that’s not important here. I’m not old enough to help with anything pre-CVS, but I can do some copy checking.

“CVS, however, had to significant design flaws”

If it had two, I’d like to know which. But I suspect that’s just an extra word.

Your history of source control seems to be very Unix centered. There were other version control and patch utilities being used on other systems going back further. CDC had the modify/update programs well back into the 70s and possibly the 60s, I knew about them in college in the mid-late 70s and they were well-established then. Paper from 1981: http://richardhartersworld.com/cri/2001/update81.html
IBM had the IEBUPDT utility in the 60s. Obviously the capabilities of source control systems continued to expand and improve over time but the foundations were forming prior to Unix.

I wouldn’t call IEBUPDAT/IEBUPDTE a source control system. Still, real source control systems did exist on MVS, albeit a bit later; PANVALET and LIBRARIAN were the best-known. Both were reported to have more than 3000 installations in 1978, according to the Wikipedia article on the former.

A more interesting case is the MVS system maintenance program SMP. It was built on top of the MVS utilities such as IEBUPDTE and AMASPZAP, and could maintain a history of modifications for every file it tracked. Performance got really unwieldy if you kept too much history around, though, especially if you were trying to back out a specific modification (you had to roll all of the mods out back to the last canonical version, then reapply the ones you wanted to keep), so it had an operation (ACCEPT) that permanently added the mods you named to the system and threw away the history information. Once you ACCEPTed, you could no longer roll back past that point.

SMP and its successor SMP/E were big, complex programs that took a lot of training to get the most out of, but they gave unparalleled control over just what the hell was on the system.

Only because Unix was where distributed development actually got figured out. What it’s really centered on is the chain of developments that led to that – thus I ignore Perforce, for example, as it had no influence on today’s practices. Neither did CDC Update nor IEBUPDTE. Researching, I find that IEBUPDTE was limited to fixed-length records and could therefore not realistically have been used with any language newer than FORTRAN II or III.

Your reference is somewhat interesting in that it shows that CDC Update must have predated patch(1) by at least three years. I’d ask if you think CDC Update influenced the design of patch(1), in which case it would belong in this history…but that’s a silly idea. The patch(1) utility is a straight inverse of diff(1), it doesn’t need any other causal explanation.

While there may not be a direct link between something like CDC’s update/modify and diff/patch, ideas tend to move around to different companies and organizations as people change jobs or communicate with each other so indirect connections are very likely. The idea of patch files certainly dates back to the 60s to early 70s timeframe from multiple sources. Very few inventions come out of a vacuum, we stand on the shoulders of those who came before us.

>ideas tend to move around to different companies and organizations as people change jobs or communicate with each other so indirect connections are very likely. […] [W]e stand on the shoulders of those who came before us.

This is true, in general.

In this specific case, though, I know context that you don’t – patch(1) was invented by Larry Wall, and I know him, and I was part of the same culture he was at the same time he was inventing patch(1). The shoulders he was standing on were Unix diff(1), the history of which is well documented.

It is just barely possible that CDC Update introduced one of diff’s two very primitive precursors at Bell Labs. But I think it’s quite unlikely, both chronologically and from what’s recorded about their design.

>It probably wasn’t, but cross-pollination happens all the time in this industry.

I emailed Larry Wall to check. He confirms that he knew nothing of CDC Update. His more detailed explanation is interesting; I’ve asked for permission to quote it in a future blog post, and expect I’ll get that.

* 1986 – open-source CVS as the first lockless version-management system
* 1991 – proprietary TeamWare by Sun as the first distributed version control system

There were other distributed version control systems before Git and Mercurial (both of which were created independently at the same time), like

* proprietary BitKeeper, which was used for a time to develop Linux, and the withdrawal of whose free (with restrictions) version was the main reason for developing Git and Mercurial
* Monotone, which hit an unfortunate performance regression just when Linus was searching for an open-source DVCS as a BitKeeper replacement; the idea of SHA-1-based object identifiers (content-based addressing) was, I think, borrowed from it
* Bazaar (baz), then Bazaar-NG (bzr), later renamed to simply Bazaar – the third DVCS after Git and Mercurial, I guess brought down by its over-engineering (and the network effects of DVCS adoption)
* Arch, tla, ArX – difficult to learn, and I think difficult to use; difficult to port off POSIX
* Darcs – with interesting ideas, but supposedly abysmal performance on larger repositories (at least in its 2.0 version); there is a modern DVCS based on similar ideas that avoids the performance hit, whose name I have forgotten

Does Perforce fit into this story somewhere? It’s proprietary, but used at large organizations that hire a lot of hackers (e.g. Google) and had distributed-use features well before git. It goes back to 1995, though I don’t know which features were in the initial release.

You may have a terminological problem. Perforce supports use across the net, but that’s not enough to make it “distributed”; for that you need a sync operation that doesn’t bless a fixed central repository.

The fact that Google solved the scaling issues of Perforce being single instance by building a clone of it on top of their Spanner distributed DB is probably a source control cul-de-sac that’s of little interest or consequence to mainstream OSS development beyond the confines of this comment thread.

I don’t know if this is germane, but back when computer programs were written in BASIC and distributed in magazines for typing in, what in retrospect could be considered patches were distributed through the same magazines by providing only the changed lines, identified by the line numbers.
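It really was a workable patch format, because a classic BASIC interpreter keys every line on its number, so retyping a line replaces the old one. A toy model of applying such a magazine correction (the listing is invented for illustration):

```python
# BASIC line numbers made a primitive patch format: the magazine printed
# only the changed lines, and typing them in replaced the old versions,
# because the interpreter stores the program keyed by line number.

program = {
    10: 'PRINT "GUESS A NUMBER"',
    20: 'INPUT G',
    30: 'IF G = 7 THEN PRINT "RIGHT"',
    40: 'GOTO 10',
}

# The "patch", as printed in a later issue: re-enter these lines.
correction = {
    30: 'IF G = 7 THEN PRINT "RIGHT!": END',   # replaces old line 30
    35: 'PRINT "WRONG, TRY AGAIN"',            # inserts a new line
}

program.update(correction)   # exactly what retyping the lines did

for number in sorted(program):
    print(number, program[number])
```

The sorted-by-line-number listing that results is the corrected program; unchanged lines (10, 20, 40) never had to appear in the magazine at all.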

I had no idea CVS was so young. When I came in contact with it in college in 1995, everyone on my team at my first internship treated it as if it had always existed. Frankly, learning to use a VCS this early helped me immeasurably in my career.

As a fairly senior dev now in corporate America, it seems that in the last 1-2 years we’ve started seeing the masses coming out of college knowing how to use git. Prior to this, I’ve had to train every single college grad I hired on how to use any kind of version control, as they’d never encountered it. Now, any young programmer has used Github, at least.

And now the younglings are actually better than the grizzled old developers who never participated in the open source community, the ones I have to yell at for not writing commit messages, or for going weeks without committing their code, etc.

I think it’s hard to overstate what difference Github has made in this. Though Sourceforge was first, and I prefer both Bitbucket and Gitlab, Github is practically a household name, and that’s really important.

>I had no idea CVS was so young. When I came in contact with it in college in 1995, everyone on my team at my first internship treated it as if it had always existed

This sort of thing happens a lot. People get familiar with a technology, they rely on it, and they start to behave like it has always existed and mentally start to backdate it.

Before I remind people that bitmapped color displays only became generally available in 1992 they have a tendency to backdate that, too. Oh, and ask anybody old enough to have been there when video arcade games busted out; the answer you get is likely to be hilariously wrong.

>Oh, and ask anybody old enough to have been there when video arcade games busted out; the answer you get is likely to be hilariously wrong.

I’ll test this. Depending on what you mean by “busted out”, I think it’s around 1979; I’m fairly sure I was nine years old when I saw a Pac-Man machine at my local grocery in a small town, and so arcade games would probably have had to be big on the scene perhaps a couple of years prior.

…looking up Pac-Man on Wikipedia, the timeline seems about right. I will say that I also had a braincell claiming I was only seven, so I might be slightly prone to backdate-itis.

You’re at least a year early; the North American release was October 1980. Small towns weren’t likely to have seen a Pac-Man machine for a couple of years after that, because it took a while for arcade operators used to electromechanical pinball machines to warm up to video games. I’d guess your first exposure was actually 1982 or so.

At that, you’re doing better at accurate recall than a lot of people. Perhaps confused by dim memories of early home consoles like the Atari 2600, they often back-date titles from the “golden age” of video games in the early ’80s back to the mid-70s or even earlier.

My error is actually more embarrassing than that. I was nine years old in 1981, and somehow typoed that as 1979… no clue how that happened.

Atari 2600: I can’t confidently date it faster than I could just look it up from somewhere reliable. …I’m pretty sure I can date Colecovision to circa 1984 or maybe 1983, because I distinctly remember a commercial for it where they named the year. And I’m sure Colecovision postdated the 2600, so the 2600 is older than that. (Spoilers: it came out in 1977, which surprises me; I tended to think of the 2600’s version of Pac-Man as its killer app. I didn’t think it lived that long on just Pong and the like.)

The friction costs of CVS cannot be underestimated. I toyed with using it on personal projects over the years, knowing that version control was a Good Thing.

Each time this lasted for a few commits before abandoning the experiment. Until a few years ago when I started trying git, still only haphazardly at best, but the system was easy enough to understand that it wasn’t like pulling teeth from the rear.

>The concept seems so obvious now that even hackers who predate patch(1) have trouble remembering what it was like when we only knew how to pass around source-code changes as entire altered files. But that’s how it was.

Might it also be difficult to remember what it was like because passing around entire altered files is less onerous now than it was then (given multi-megabit-per-second connections and single-user machines with terabytes of disk)?

I do think (as several others have noted) the role BitKeeper played in the transition from NetHack-style development to DVCSes should be mentioned.

This is especially true because of the ideological implications (which might not justify explicit mention in this history, but are important anyway). RMS/FSF objected to BitKeeper being used because it was “unfree” (under a purity rule that you could only use “unfree” software in order to replace it), and encouraged people to stick with free CVS, which had all the flaws that kept Linux using patch files as its approach. Torvalds used BitKeeper because it was (to him, of course) the best available tool for the job. With the experience thus gained by using BitKeeper, Torvalds then had the knowledge and experience to know how to replace it when it became necessary, which he wouldn’t have had if he had kept just using patch files.

Would there eventually have been a “good” open-source DVCS if developers had followed the FSF recommendation to shun BitKeeper? Possibly. But it seems unquestionable that it would have arrived later, plus Linux wouldn’t have benefited from BitKeeper in the meantime. Breaking the FSF purity rules to use “unfree” software both benefited Linux and, rather than causing “lock-in”, accelerated the arrival of the open source equivalent.

>I do think (as several others have noted) the role BitKeeper played in the transition from NetHack-style development to DVCSes should be mentioned.

I’m not trying to write a general history here. BitKeeper was never a thing every hacker once knew.

Anyway, RMS’s opinions are irrelevant in this context. Linus used the GPL but has never liked RMS and has never bought his ideology. I was actually present when they first met in ’96 (a dinner expedition at the first and last Free Software Conference) and the instant mutual hostility was…startling.

You are probably right about BitKeeper helping to equip Linus to write git, but overestimating its path-criticality. There were other distributed VCses independent of Bitkeeper competing for Linus’s attention, and I know he studied some of them. Monotone and hg, in particular. Bk experience was a sufficient condition to start Linus thinking, not a necessary one.

(He’d probably answer me if I asked about this. I try not to bother him for small reasons, though, and this one is small.)

The fact is DVCS concepts had been in the air for some years quite independently of anything McVoy did to promote them. I myself circulated a white paper titled “Distributed RCS — A Proposal” in 1995 that anticipated DVCS repository sync – five years before BitKeeper, six years before Arch, and a decade before git. That is my direct reason for not considering BitKeeper essential; somebody else was going to get there, and in fact it was very nearly me.

As with CVS a decade earlier, I had the right insights but didn’t know the problem was important enough to pursue. Yes, that’s right – I missed being the reigning deity of version control twice. I don’t think I have any bigger regret about my career.

>As with CVS a decade earlier, I had the right insights but didn’t know the problem was important enough to pursue. Yes, that’s right – I missed being the reigning deity of version control twice. I don’t think I have any bigger regret about my career.

>There were other distributed VCses independent of Bitkeeper competing for Linus’s attention, and I know he studied some of them. Monotone and hg, in particular.

Speaking of backdate-itis, Mercurial’s first appearance didn’t come until a couple of weeks after Git’s introduction. Both systems were directly inspired by the downfall of BitKeeper and competed for Linus’s choice in moving Linux to a new VCS. Git had the significant advantage of being written by Linus, however ;)

>I’m not trying to write a general history here. BitKeeper was never a thing every hacker once knew.

Fair

>There were other distributed VCses independent of Bitkeeper competing for Linus’s attention, and I know he studied some of them. Monotone and hg, in particular

Mercurial didn’t even get announced until Bitmover declared it was withdrawing the free version of BitKeeper. Monotone was earlier, but initially released in 2003, after Torvalds had already started using BitKeeper for Linux (a cursory check shows it was established practice at least by October 2002, and I haven’t actually tried to backdate it further).

It’s true arch dates to 2001, but it died in 2006, which suggests it was severely deficient.

>As with CVS a decade earlier, I had the right insights but didn’t know the problem was important enough to pursue.

Right, because it wasn’t an itch of yours that needed to be scratched.

That’s the reason I attribute a catalytic role to BitKeeper being used by Linux; it wasn’t until Linux devs knew the itch could be scratched (no later than 2002) that any open-source scratchers came along (Monotone 2003; Bazaar, Mercurial and Git 2005). If everybody had avoided BitKeeper, how much longer would it have been before people perfectly capable of solving the problem would have actually solved it?

>If everybody had avoided BitKeeper, how much longer would it have been before people perfectly capable of solving the problem would have actually solved it?

So many random variables. I mean, I could have done it. That makes it more difficult for me to form a judgment, not less.

Everybody avoided arch, and the problem got solved anyway. I’d put more weight on the experience of using BitKeeper, except…

Have you actually looked at BitKeeper’s data model? I have – I had to lift the NTPsec code out of BitKeeper, so I got pretty intimate with it. It’s really rather weird. There are no branches – you branch by cloning a repo, then there’s a merge operation you can only do if two repos have a common tail sequence of revisions (I think – that was murky). After sequences of branch-merge operations you end up with a whole bunch of peer repos each with one head, one tail, and weirdly incommensurate commit graphs that are (I can’t put it better than this) all bulgy in the middle.

On top of that, the UI is horrible because there are knobs relating both to the SCCS underlayer and the bk overlayer and heaven help you if you misunderstood the unnecessarily complicated relationship between them.

Based on my own experience lifting ntpsec I have to wonder whether, when it comes to thinking through a sane DVCS design, bk experience wasn’t more of a hindrance than a help.

Not buying it, not that early. There was no WAN for them to distribute over, and I am not going to accept phone calls, snail mail and magtapes as equivalent to email and ftp. Too much latency, and too little predictability in the latency. Under those constraints you can’t form the kinds of expectations about practices and workflow that nethack pioneered; the infrastructure for it is just not there.

/me reads more….

On the other hand, the Wikipedia description does sound like MTS adopted lower-latency channels as they could, so maybe later in the game? For comparison with nethack, the question is when and how thoroughly the practice of the development groups caught up with what had become technically possible. Alas, the summary gives me little to go on.

On the gripping hand, even if practice looked reasonably modern around single hosts, you don’t get to nethack-equivalent expectations and behavior if dataflow between different installations was drastically slower and more expensive. And that has to have been the case before MAILNET in 1982.

The interesting question is whether MTS got enough ahead of evolving Internet/UUCP practice between ’82 and ’87 to really approximate modern distributed development before nethack did. Possible, certainly, but I’d want to have a conversation with someone who was there.

A strike against this possibility is that none of their artifacts seem to have survived. I know there’s a semi-functional MAD compiler, but only because I wrote it.

>Some of the MTS sites were involved in the Merit network, which connected the mainframes at multiple sites starting around 72/73, so they weren’t limited to phone calls, snail mail and magtapes.

Reading…MERIT supported cross-network file copies and remote job execution. Good, but not sufficient. To get a nethack-equivalent workflow you need person-to-person email, and they didn’t get that until 1982.

Inboard modems (expansion cards for your computer) were also known, but never sold as well because being located inside the case made them vulnerable to RF noise, and the blinkenlights on an outboard were useful for diagnosing problems.

This is not true; inboard modems, and even modem circuitry built into the motherboard of some laptops and desktops, became the dominant variety during the nineties (close to the end of life for PSTN modems as consumer products). Internal modems had the disadvantages you speak of (though most came with a speaker so you could distinguish “happy song” from “sad song”), but also the advantage of coming with their own UART, at least at first, so they were not capped in speed by limitations of the host PC’s RS232 hardware. Accordingly, higher-speed modems tended to come in an internal variety first.

Of particular note is the accursed “Winmodem”, not much more than a DAC and ADC with audio out to the phone jack; all of the modulating and demodulating was done by the host CPU. These not only tied up some of your precious CPU time with doing the audio processing, they also required Windows to run. Their reduced cost made them enormously popular among OEMs in the late nineties and they were bundled with many a cheapass PC. I remember being particularly gleeful at getting my hands on a ZOOM 56k external modem, for fear that any internal variety for sale would not work with my Linux-based kit.

>This is not true; inboard modems, and even modem circuitry built into the motherboard of some laptops and desktops, became the dominant variety during the nineties

If that happened, it was not where I was seeing it. Yes, I tossed out a few winmodems from hardware that fell into my hands, but I don’t think I saw any kind of inboard modem in live use more than twice – which is against dozens of outboards. And though I heard rumors of mobo modems I never saw one of those ever.

My experience too is that internal modems were in the majority. Serious users would have used external ones, of course, for exactly the advantages that you give – but, for casual and home users, internal was almost universal. In my “store of things that may come in useful one day” I have only two external modems, both of those being from our household, but dozens of modem cards taken out of old computers for spares. There would be even more had I not thrown some of them away to save space.

Whether these were winmodems or at least proper ones with a UART I can’t say. But internal ones definitely ruled.

Built-in laptop modems were pretty common at the end of the dialup Internet era. I can’t speculate on how many were on daughterboards vs. the motherboard, but I’m sure that more than a few were on the motherboard.

I don’t recall ever seeing a modem on a desktop motherboard, though.

Inboard modems were common even before the Winmodem era simply because not having to deal with a case and a power supply made them cheaper.

As another anecdote on the pile, I have several medium-sized boxes full of ISA and PCI internal modems pulled from machines between about 1996 and 2004 or so. Some of these are winmodems, some aren’t. In the late 1990s pretty much everyone I knew with a computer had a modem in it, and I do mean *in* it – externals were available at every computer shop, but I didn’t see them in use very often. I did use one myself for a while, but that was mostly because winmodems were so common that an external was the only way to be sure you weren’t getting one! (And then came the truly abominable USB winmodems, but the less said about those, the better, and thankfully they were never very common AFAICT.)

However, as relates to a skewed sample, the people I knew back then were mostly not hackers, more somewhere between random lusers and the gamer/power-user set. Come to think of it, the one fellow I knew with an external modem was also the only hacker I knew at the time.

I wonder if hackers were the only ones who cared. Since the job of a modem was to make the magical screechy sounds that connected one to the local BBS, AOL, or the Internet, it seems Winmodems did the job well enough that the implementation details didn’t matter to most.

That’s going back as far as my memory goes – I was born in ’91 – but in ’96, the four computers I was familiar with all had internal modems, and in ~’98 the one my parents had got a U.S. Robotics 56k unit to replace whatever was there (a casualty of lightning).

The laptops I work on from the Pentium 2 to Pentium 3 days pretty much universally have winmodems, too. I always assumed their low part count made them fit in the laptop better.

My experience is much more aligned with Jeff Read’s. I think our samples are of different populations. Yours with hacker types, and ours with consumers. I was in junior high and high school in the 90s, and active on BBSs until I got dial-up internet access in 1997. BBS owners almost all used external modems, but users were more like 50/50 or so. Mainstream consumers with dialup internet were almost all internal modems.

My experience was that whenever people bought modems as separate items, they favored externals, yes. So as individual items, inboards never sold as well.

However, starting circa 1995, prebuilt home computers from major consumer brands always came with inboard modems, because “Internet capability” was now expected as standard in the consumer market. Aunt Tillie didn’t want more complicated hookup than putting the phone line into the phone jack on the computer (and her tech support, whether corporate or nephew, wouldn’t want to deal with issues like whether she remembered to turn the modem on or not). With the result that the vast majority of people using modems in AOL’s glory years were using the internal modem that came in their box.

I suspect very few people in your social circles were among those buying Packard Bells or eMachines to get on the Internet. As a history of what hackers did (at least when they didn’t have to rely on parent-purchased hardware), I think your description of modems holds, since people choosing their components kept the old tendency to buy external modems, and hackers would be among them. In a history of the general computer business, though, internals probably had a net majority thanks to the Internet dial-up era.

I got my first internet-capable computer in 1997, and it had a built-in 33.6 kbps modem (as a separate card, not part of the mobo – IIRC, it was PCI, though it could even have been ISA). When we later upgraded to a 56 kbps modem in ca. 1999, it was also inside the case, ditto the 56 kbps modem built into the machine we got in 2000. Outboard modems existed, but they were pretty rare in the wild by the late 90s, at least from my highschool-aged perspective at the time.

I know that by the time I did a stint at a small computer shop in 2000, outboards were no longer for sale in our shop.

There was another project to improve on CVS: CVSNT, which started around 1999. I think the last public source code is available at https://github.com/yath/cvsnt — March Hare took over maintenance around 2004, and have since made it commercial. It fixed several of the issues with CVS while being able to take over older repositories, and work with older clients. It also added new facilities, and I had success using it for a few projects. It was out-competed by Subversion though.

Have you deliberately left out mention of the other DVCS systems: monotone, bazaar, darcs, mercurial, etc.? Some of them predate git, and mercurial at least is still in large-scale use today (e.g. by Mozilla).

@esr> You are probably right about BitKeeper helping to equip Linus to write git, but overestimating its path-criticality. There were other distributed VCSes independent of BitKeeper competing for Linus’s attention, and I know he studied some of them. Monotone and hg, in particular. BK experience was a sufficient condition to start Linus thinking, not a necessary one.

Mercurial (hg) could not have been one of the DVCSes Linus studied when he was searching for a post-BitKeeper DVCS, because (as was said already) it was created in parallel, at the same time as Git. I remember from reading the (now defunct) KernelTrap and KernelTraffic that the announcements of their first releases came nearly at the same time, I think within a week or a month at most; I don’t remember which one was first to be announced on LKML.

@esr> Based on my own experience lifting ntpsec I have to wonder whether, when it comes to thinking through a sane DVCS design, bk experience wasn’t more of a hindrance than a help.

Linus Torvalds wrote that he followed (somewhat) the UX of BitKeeper, but deliberately wrote the actual engine differently (borrowing some ideas from Monotone); more as a filesystem writer than a version control writer.

I think it might be interesting (but perhaps outside the scope) to note that larger projects that used CVS, like OpenOffice (pre-split) and IIRC Mozilla heavily augmented it with custom scripts and tools to make it sane (e.g. atomic tagging, better branch support).

Another interesting historical note is that it was Mozilla wondering about a switch from CVS or (more probably) Subversion that prompted the fast-export format to be born. (FYI, Mozilla chose Mercurial then.) Nowadays I think all modern VCSes provide fast-export / fast-import support (I think I was the one who suggested adding it to Veracity). And because of it, it became possible to create reposurgeon…

The first modems I saw were external ones, and that was in the late 80s, or perhaps the early 90s, when computers were a thing that my father used for business work. When I think back to the mid 90s, when computers became a thing shared by the family upon which I could play videogames, I recall fewer external modems. I vaguely recall an incident sometime before the family computer in which I observed a computer making modem sounds, but with no modem anywhere in sight. When I queried this impossibility, I was told the modem was internal to the computer.

In the mid-late 90s, I was running my own computers composed from systems and parts salvaged from garage sales and others’ trash. The modems I got my hands on were all internal ones; I don’t recall seeing any external modems. When a sysop of a local BBS offered to send me a more modern modem so I could connect faster than at 2400 baud, the modem I received in the mail was also an internal one. And when I bought my own brand-new computer in the late 90s, the modem was again an internal one. It wasn’t until the mid 00s that I recall seeing an external modem again, and that’s because I made a point of finding one to buy, so as to ensure I got a not-winmodem I could use with my Linux box.

The only mobo modems that I personally recall seeing have all been on laptops. This happened all the way into the mid-late 00s. The Dell laptop I bought through college in the mid-00s had a modem port. There’s some older laptops from 2008 here at work which also have them. My personal laptop is from 2010, and it has an option for a modem. These latter two examples are ‘mobile workstation’-class laptops, which tend to have more hardware configuration options, even obscure ones which have since been dropped from less expensive models. As for mobo modems on desktop hardware, I don’t doubt they were used. If I had to guess, I’d say they were most likely on inexpensive mid-range and low-end models targeted towards J Random Consumer. The kind of computer I’d expect hackers to be buying during that era (if they weren’t assembling their own) would’ve been connecting to the Internet through an Ethernet jack.

> Before I remind people that bitmapped color displays only became generally available in 1992 they have a tendency to backdate that, too.

The Amiga was first made available in 1985, and could display a perfectly cromulent 640×200 color bitmap screen (640×400 if you were willing to deal with interlacing) – sufficient for 80-column text – on a common TV or RGB monitor. IBM EGA (released in 1984) actually had roughly comparable specs, albeit obviously relying on a dedicated monitor. These systems were obviously not super common before the 1990s, but they were far from “unavailable”!

>These systems were obviously not super common before the 1990s, but they were far from “unavailable”!

Yeah, every time this comes up I have to remind people how badly those ’80s displays sucked.

They were only acceptable if you’d never seen workstation-class 1Kx1K screens – nasty halo effects around the edges of type, and you couldn’t really do even 80×24 without eyestrain. Bleagh. And I’m not even especially picky about this sort of thing (I cheerfully tolerate refresh rates low enough to drive most people batty); it’s more that the rest of you have a tendency to filter your memories through a nostalgic haze.

The very first color PC displays fit for use by a civilized being didn’t appear until around 1990, just before the 1Kx1K barrier got busted, but at that time they were not really mass-market gear yet. B&W displays with decent ergonomics had been available much sooner because the masks were cheaper and easier to fabricate – even the original IBM monochrome display wasn’t bad.

Not counting the uber-expensive workstation-class hardware that had shown me color displays did not have to be crap, I first saw a decent one in 1992 on some piece of new Apple kit I’ve forgotten the marque of. Took almost another two years for PCs to catch up.

Why did I care about this while being tolerant of low refresh rates? Well, you see, I like to typeset my own books…

> They were only acceptable if you’d never seen workstation-class 1Kx1K screens – nasty halo effects around the edges of type, you couldn’t really do even 80×24 without eyestrain.

Now, I wasn’t old enough to be paying close attention until good color displays were common (though we had a monochrome display, and my five-year-old brain was absolutely in *awe* of the 16-color VGA display my dad’s work machine had, and thought we were *so* behind the times). But something I know from having read in adulthood about the architecture of a lot of ’80s micros makes me wonder if you might be remembering things a bit worse than they were: a lot of those micros used composite output so that they could use a TV for a monitor, and I wonder if the haloing effects you’re recalling are actually the crosstalk that NTSC had between luma and chroma. Machines like the PC (post-CGA) or the Mac that used RGB monitors purpose-built for computers wouldn’t have had this issue (but at the same time tended to have smaller color spaces than other, more gaming-focused micros).

> you might be remembering things a bit worse than they were: A lot of those micros used composite output so that they could use a TV for a monitor, and I’m wondering if the haloing effects you’re recalling are actually the crosstalk that NTSC had between luma and chroma.

Early on, yes, that was the dominant problem. But there were two other issues. One was simple dearth of pixels. If your screen resolution and consequent dot pitch is too low, small fonts are going to look like crap due to edge stairstepping whether or not you still have the NTSC/PAL crosstalk problem. The region of non-suck begins at about .24 dot pitch and you’re not in really good shape until .22 or better.

The other problem is that it took a while for the manufacturers to get three-color masks right. This is another forgotten gotcha, because LCD/LED displays don’t have it. Poorly manufactured masks gave you fringe and aliasing effects similar to, though admittedly not as severe as, the chroma/luma crossover in analog displays. This was actually, as I recall, a particular problem in the five years around 1990 – worst case you could get a display significantly fuzzier than the nominal resolution would lead you to expect.

> Yeah, very time this comes up I have to remind people how badly those ’80s displays sucked.

Yes, the displays were sucky compared to a more modern (SVGA-resolution) monitor. And yes, many people stuck with pure monochrome for a long time in order to take advantage of its improved sharpness. But even though the solutions available in those days were indeed sucky in modern terms, this hardly implies that people wouldn’t make use of them when appropriate!

Heck, if you _really_ care about avoiding eyestrain, the _proper_ solution even today is to retrofit an e-paper display for pure terminal-like usage (including “notebook”-like graphical terminals such as Jupyter; either way, no quick refresh or scrolling needed, thus accommodating the well-known limitations of e-paper!). After all, we now know that reliance on bright, actively-emitted light (as opposed to passively reflecting ambient light, as paper does) is a major source of eyestrain. But somehow, people don’t do that – even though anyone who has _seen_ an e-paper reader ought to be well aware of how nasty most displays actually are, and how unfit for use by civilized beings!

No scrolling needed is easier said than done. You’d have to rewrite major applications that scroll (IRC clients, text editors*, and of course the standard teletype emulation command line use case both with and without readline) to have some other method than line-at-a-time scrolling, perhaps moving half the screen to the top (upon reaching the bottom) and leaving the bottom half blank. Or just moving the cursor to the top and clearing lines as needed.

*Vim allows you to, out of the box, configure it to scroll a user-specified number of lines up to half of the current window size. It behaves a bit oddly if you cheat by setting it to a larger number and then resizing the window to be smaller: scrolling up still uses half the window, scrolling down will scroll up to the whole window and also skip a line.
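The half-screen page-flip strategy suggested above can be sketched as a toy repaint policy. This is purely illustrative – the `PageFlipScreen` class, its row count, and the flip rule are invented for the sketch, and no real e-paper API is involved:

```python
# Toy model of a page-flip display policy for slow-refresh (e-paper) screens:
# instead of scrolling one line at a time, when output reaches the bottom we
# keep only the lower half of the screen at the top and blank the rest, so a
# full repaint happens only once per half-screen of output.

class PageFlipScreen:
    def __init__(self, rows):
        self.rows = rows
        self.lines = []      # lines currently shown, top to bottom
        self.repaints = 0    # count of full-screen refreshes

    def write_line(self, text):
        if len(self.lines) == self.rows:
            # Bottom reached: move the lower half up, blanking the remainder.
            self.lines = self.lines[self.rows // 2:]
            self.repaints += 1
        self.lines.append(text)

screen = PageFlipScreen(rows=6)
for i in range(10):
    screen.write_line(f"line {i}")
# Two page flips have occurred; the screen now shows lines 6 through 9.
```

The point of the exercise: writing ten lines to a six-row screen costs two full repaints instead of four scroll repaints, at the price of sometimes showing a half-empty screen.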

Uhm…what’s “back then”? Before IBM unbundled their software in 1969, introducing their first three for-charge program products, nobody thought you could sell programs. Sharing code was what people did.

esr wrote: No. How could [reposurgeon detect file renames in CVS histories]? The rename tracking is not present in the metadata for it to mine.

You might not be trying hard enough. Or, conversely, too many of the renames that could be done are too hard to detect via methods other than metadata mining, or would create too many problems to justify the effort.

There are easy(ish) cases one could imagine. For instance, the difference between two revisions is one file name missing, another file name added, the files they refer to have the same checksum, and every occurrence of the old file name in other files is replaced with the name of the new one. Voila! You’ve detected a file rename.

A slightly harder example would be different checksums, but a diff reveals so few changes that you can conclude that the new is obviously a rename of the old, and everything else is otherwise the same. Even harder is multiple missing and new files, and diffing other files reveals which pairs of names denote the same file. I can imagine code that could mechanically figure all of this out and infer a rename without metadata, although I don’t know reposurgeon code well enough to write it myself.

Another thing I don’t know is how often this actually happens. (Versus, say, the pathological case where someone renamed a dozen files and also did heavy refactoring on those and others.) So it might turn out to not be worth the effort. Then again, if there are some venerable projects worth rescuing…
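The “easy case” described above might look something like this in Python. It is a toy sketch under heavy assumptions – the snapshots are given as simple name-to-content dicts standing in for checkouts, and it ignores all the metadata corruption issues raised elsewhere in this thread:

```python
import hashlib

def checksums(snapshot):
    """Map each file name to a content checksum. `snapshot` is a
    hypothetical {name: bytes-content} dict standing in for a checkout."""
    return {name: hashlib.sha1(data).hexdigest()
            for name, data in snapshot.items()}

def detect_renames(old, new):
    """Return {old_name: new_name} pairs for the 'easy case': a name that
    vanished and a name that appeared whose contents hash identically."""
    old_sums, new_sums = checksums(old), checksums(new)
    vanished = {h: n for n, h in old_sums.items() if n not in new_sums}
    appeared = {h: n for n, h in new_sums.items() if n not in old_sums}
    # A rename is a vanished name and an appeared name sharing one checksum.
    return {vanished[h]: appeared[h] for h in vanished if h in appeared}

old = {"foo.c": b"int main(){}", "util.c": b"void f(){}"}
new = {"main.c": b"int main(){}", "util.c": b"void f(){}"}
# detect_renames(old, new) reports foo.c -> main.c
```

Note the sketch silently assumes each checksum appears at most once per side; two vanished files with identical content would collide, which is one of the “harder cases” the comment goes on to mention.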

Without a demonstration of really compelling need and/or someone waving bundles of cash at me I’m not going to try doing any of this.

One major problem is that your proposals assume that the concept of a (repository-wide) revision (a changeset) is well defined, when in fact most of the nastiest issues in CVS lifting stem from the fact that it is not. There are only per-file revisions; they can only be associated into changesets by heuristics that can easily fail due to common metadata malformations and other problems like client-side clock skew (yes, that’s right, the timestamps on CVS revisions are unreliable and not necessarily monotonically related to commit order).
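The changeset-assembly heuristic being described can be sketched as follows. The revision tuples, the 300-second window, and the grouping rule are all assumptions for illustration; real lifters must additionally survive clock skew and malformed metadata, which is exactly where this heuristic fails:

```python
# Illustrative sketch of how CVS lifters assemble changesets: per-file
# revisions sharing an author and log message, and falling within a small
# time window, are fused into one candidate commit.

WINDOW = 300  # seconds; an illustrative threshold, not any tool's actual value

def assemble_changesets(revisions):
    """revisions: iterable of (timestamp, author, message, path) tuples."""
    changesets = []
    for ts, author, message, path in sorted(revisions):
        last = changesets[-1] if changesets else None
        if (last and last["author"] == author
                and last["message"] == message
                and ts - last["end"] <= WINDOW
                and path not in last["paths"]):  # one revision per file per commit
            last["paths"].append(path)
            last["end"] = ts
        else:
            changesets.append({"author": author, "message": message,
                               "paths": [path], "end": ts})
    return changesets

revs = [(100, "al", "fix bug", "a.c"),
        (102, "al", "fix bug", "b.c"),
        (950, "al", "fix bug", "a.c")]
# The first two revisions fuse into one changeset; the third, outside the
# window, starts another -- and note the whole scheme collapses if the
# timestamps are skewed, as the comment above warns.
```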

>> esr wrote: No. How could [reposurgeon detect file renames in CVS histories]? The rename tracking is not present in the metadata for it to mine.

> Paul Brinkley: You might not be trying hard enough. Or, conversely, too many of the renames that could be done are too hard to detect via methods other than metadata mining, or would create too many problems to justify the effort.

Why not borrow the idea from Git (which is also used by Mercurial), namely do a heuristic similarity-based rename detection?

>Why not borrow the idea from Git (which is also used by Mercurial), namely do a heuristic similarity-based rename detection?

In addition to the reasons I gave Paul Brinkley, here are two other issues:

(1) RCS/CVS cookie expansion – things like $Id$ – would screw with your similarity detection. It means you can’t normally rely on O(n log n) comparison of sorted lists of checksum/path pairs, which would otherwise be the smart way. Instead you have to do something like pairwise diffs – O(n**2) and damned expensive for individual checks.

(2) Delete marking is not reliable in CVS. A previous commenter reported that projects have been sunk by deleted revisions spontaneously reappearing in checkouts; I believe him.

Really, the deeper you look into trying to be clever about CVS reconstruction, the more evil it gets. There’s no firm ground – there isn’t even a monotonic clock tick.

Well, I thought “heuristic” implied inexact matching [which, yes, results in the O(N^2) issue], but you could, well, unexpand them to generate the checksum. You could also exclude any file that already exists un-renamed if you only have to detect renames and not copies or scenarios like renaming one file and renaming a different file to the first file’s old name, which means N is usually not that large.

But as I mentioned in my other comment, that’s solving a completely different problem than the one I was asking about – the old data exists under the old name and the new data exists under the new name in Git and CVS and the problem is associating them, but in SCCS and possibly RCS the whole thing is renamed to the new name (so you have to detect “was this once called something different” without much clue). Tags might actually help with this for RCS, but SCCS stores them unexpanded.

The problem I had described was a file rename in a SCCS repository where the master file was renamed, leaving no trace of the old name in metadata (and a single unbroken history under the new name), not a CVS-style delete/add pair.

The same issue could also hypothetically exist for an RCS repository managed this way, but at least the actual content of $Id$ and $Source$ tags as checked in seems to be preserved in the master file, whereas the equivalent for %W% tags does not seem to be true for SCCS.

>> Why not borrow the idea from Git (which is also used by Mercurial), namely do a heuristic similarity-based rename detection?

> In addition to the reasons I gave Paul Brinkley, here are two other issues:
>
> (1) RCS/CVS cookie expansion – things like $Id$ – would screw with your similarity detection. It means you can’t normally rely on O(n log n) comparison of sorted lists of checksum/path pairs, which would otherwise be the smart way. Instead you have to do something like pairwise diffs – O(n**2) and damned expensive for individual checks.

First, I don’t remember if CVS/RCS stores “keywords” like $Id$ expanded or unexpanded, but even if it is the former, you can do rename detection based on canonical unexpanded representation.

Second, Git also has to deal with the same problem. By default it checks for similarity only among files that were modified, though there are options to make it use a more expensive version.

Still, Git’s technique for dealing with renames (and copies) is not perfect. Heuristic similarity-based rename detection works well for merges, but `git log --follow`, for example, is just a bit of a hack and does not always work correctly.

>First, I don’t remember if CVS/RCS stores “keywords” like $Id$ expanded or unexpanded, but even if it is the former, you can do rename detection based on canonical unexpanded representation.

Ha ha ha ha ha *ouch*

You mean you didn’t know?

It’s not at all unheard of for expanded $-cookies to slip into CVS masters. I don’t know how this happens – possibly some sort of obscure operator error – but it means you cannot in fact do “rename detection based on canonical unexpanded representation” reliably.

As I keep trying to explain, the amount of cruft and corruption in large old CVS repositories is far worse than you imagine. Pretty much every data regularity that you think you can count on to make clever deductions from is, like this one, violated often enough to be a problem. The deeper you dig, the worse it gets.

Er, the point he was making is that you could unexpand them before running the hash algorithm: co -p rev file | sed 's/\$\(\w*\): [^$]*\$/$\1$/g' | sha1sum

Anyway, I happened to do a test of this while making one of my other comments: RCS does store them expanded (which would be a positive for detecting my scenario of renamed master files); SCCS unexpanded.
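The unexpand-then-hash idea discussed above can also be sketched in Python. The keyword list is the common RCS set; as noted earlier in the thread, a real lifter would still have to cope with expanded cookies that have leaked into CVS masters, so treat this as a sketch of the idea rather than a reliable tool:

```python
import hashlib
import re

# Collapse expanded RCS/CVS cookies like "$Id: foo.c,v 1.4 ... $" back to
# their canonical unexpanded form "$Id$" before hashing, so the checksum is
# stable across differently-expanded checkouts of the same content.
COOKIE = re.compile(
    r"\$(Id|Header|Source|Revision|Date|Author|Log"
    r"|RCSfile|State|Locker|Name)(?::[^$\n]*)?\$")

def canonical_checksum(text):
    unexpanded = COOKIE.sub(lambda m: "$" + m.group(1) + "$", text)
    return hashlib.sha1(unexpanded.encode()).hexdigest()

a = "/* $Id: foo.c,v 1.4 2001/02/03 04:05:06 esr Exp $ */\nint x;\n"
b = "/* $Id$ */\nint x;\n"
# canonical_checksum(a) and canonical_checksum(b) agree, since the expanded
# cookie in `a` collapses to the unexpanded form before hashing.
```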