Sex, software, politics, and firearms. Life's simple pleasures…

Solving the CVS-lifting problem

Last month I added CVS-reading support to reposurgeon. The reason I haven’t blogged in ten days is that this pulled me down a rathole out of which I am just now beginning to emerge. And now I have a request for help – I need to collect some perverse CVS repositories, preferably relatively small ones.

Y’all might recall that the program I adopted as a CVS-repository-reading front end was cvsps, after I had hacked it to emit a git fast-import stream. Sadly, cvsps (which had been basically untouched by its former maintainer since 2008) turned out not only to have a whole bunch of unintegrated fix patches pending, but to be seriously buggy even after those were applied. There was a showstopper in the branch-analysis code that would often put gitspace branch points at the wrong place to represent the CVS history, and attribute files added just before the join to the wrong branch. Ugly stuff.

The worst bugs are fixed now, and I can prove it, because I built a regression-test suite and have been adding bug cases to it. But the process basically forced me to rewrite the cvsimport tool in the git suite. It uses cvsps, and it relies on an ancestry-tracking option that I had to remove because it was broken. On top of that, testing revealed that git-cvsimport was itself a source of several kinds of conversion bugs which my new export code entirely eliminated.

So, then I had to do a round of politics to sell that fix to Junio Hamano and the git list. That negotiation seems to be done now; I expect to be able to ship a patch tomorrow that will be merged with a minimum of fuss. Alas, though, CVS is not yet done with me. Because through a peculiar accident I’m now the maintainer of yet another CVS lifter, parsecvs.

parsecvs is the code my occasional friend and ally Keith Packard – one of the co-designers of X – wrote to lift the X repositories from CVS to git. I found out when I was looking for a CVS-reading front end that Keith had abandoned it after it got its job done, after which it got picked up by somebody named Bart Massey who lost interest in it in turn. I had written them explaining that I wanted to dust it off and turn it into something that could ship a fast-import stream to standard output.

Bart and Keith were radio silent, so I found cvsps, did a bunch of fixups, and its maintainer (Dave Mansfield) dropped it in my lap. Then, a week later, Bart gets back to me to tell me that he’s lost interest and I should probably take over parsecvs.

OK, now I have a duty. Both of these dusty hunks of code have fallen into my hands; I should figure out which one can do a better job, polish it up, and publicly end-of-life the other one so nobody puts future effort into a dead end. Which is when I started thinking about writing a CVS torture test.

Now my goal is to assemble a rogues’ gallery of CVS perversities, then test cvsps, parsecvs, *and* cvs2git (the spinoff of cvs2svn) against them. Use the test to pick a winner by objective success, then end-of-life anything I’m maintaining that lost and pour my effort into improving the winner.

Though, actually, there’s another possible outcome for parsecvs; even if it doesn’t do CVS as well as one of the other two, it does collections of RCS files without CVS metadata. In one possible future, I test parsecvs against the Ruby rcs-fast-export maintained by Giuseppe Bilotta, which can only do RCS collections that are either multi-branch or multi-file but not both. If parsecvs turns out to be better, I’ll make the case to Giuseppe that the Ruby rcs-fast-export should be EOLed and replaced with a renamed parsecvs.

Anyway, this is a general request for the location of perverse and nasty CVS repositories that I can snarf and add to my torture test. Bonus points if they’re relatively small.

12 thoughts on “Solving the CVS-lifting problem”

Mozilla’s primary CVS repository from before we switched to Mercurial (in particular, mozilla/ in :pserver:anonymous@cvs-mirror.mozilla.org:/cvsroot) is pretty nasty in a bunch of ways, though absolutely not eligible for the bonus points for being small.

Off the top of my head, some of the things in it are: frequent copying of ,v files within the repository in order to preserve history for file moves (initially just as copies, later with a script that replayed all the old commits as the current date and without the tags; in at least one case there was a copy into a directory where there was already a file of the same name in Attic/), frequent minibranches (branching of a small number of files rather than the whole repository) in order to generate a release tag that’s almost but not quite identical to the current state of a branch, and in some cases rather nested branches (revision 1.205.2.2.6.2.4.1.2.1.2.2.2.1, or worse, 1.314.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.2.1.0.2). There are probably other things I’ve forgotten.

The history of copying the ,v files is the worst part, and something I wouldn’t expect a generic tool to deal with correctly; it might however be good to test that the tool succeeds in importing a useful history from it.

There was also a bit of history of moving existing tags, but I think that’s just unrecoverable.
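For anyone trying to read the nested revision numbers quoted above, here is a minimal sketch of how CVS revision strings decompose. It relies only on the usual numbering convention: trunk revisions have two components, each level of branching adds two more, and a "magic" branch number carries a 0 in the second-to-last position. The function names are made up for illustration and are not taken from cvsps, parsecvs, or cvs2git.

```python
from typing import List


def parse_revision(rev: str) -> List[int]:
    """Split a dotted CVS revision string into integer components."""
    parts = [int(p) for p in rev.split(".")]
    if len(parts) < 2:
        raise ValueError(f"not a CVS revision or branch number: {rev!r}")
    return parts


def branch_depth(rev: str) -> int:
    """Return how many branch levels below the trunk a number sits.

    Trunk revisions (e.g. 1.5) are depth 0. A revision such as 1.2.2.1,
    a branch number such as 1.2.2, and a magic branch number such as
    1.2.0.2 are all depth 1.
    """
    parts = parse_revision(rev)
    return (len(parts) - 1) // 2


def is_magic_branch(rev: str) -> bool:
    """True for magic branch numbers, which have 0 as the second-to-last
    component (1.2.0.2 names the branch whose revisions are 1.2.2.x)."""
    parts = parse_revision(rev)
    return len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == 0


if __name__ == "__main__":
    for rev in ("1.5", "1.2.2.1", "1.2.0.2"):
        print(rev, branch_depth(rev), is_magic_branch(rev))
```

Something like this is handy for scanning a repository's ,v files and picking out the deepest branch nests as candidate torture-test cases.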

I can’t help with small and nasty, but I can help with large and nasty. The NetBSD CVS repository is huge and has had its share of “incidents”. I think if a tool can manage it, it can manage any repository. At the very least, the attempt would produce loads of tests to minimize.

If I remember correctly I had to edit the *,v file I had copied from RCS to CVSROOT and change the initial version number, or something like that. I don’t remember if I had problems with the cvs client, or with `git-cvsimport`, or with `parsecvs`.

I’m sorry, I don’t remember the details; that was a loooooooooooong time ago.

Back in the old high school computer club (Club name: INFLUENCE = Interest Negotiators For the Liquidation of User Enterprises and Nullification of Congregating Enemies), we felt that knowing pi to twenty decimal places was adequate.