Interview Transcript: Matt Welsh

This is Dale Dougherty and I'm here with Matt Welsh, one of the founders of the Linux Documentation Project, also an O'Reilly author of Running Linux. Matt, tell me, in Linux years you're about eight years old. What were you doing in year one there?

Matt Welsh:
I started with Linux back in about 1992 when, much to the amusement of my friends, I went out and bought a new hard drive and tried to install what was really the only Linux distribution that you could call serious at the time, something called MCC Interim. It came on three or four floppy disks.

Dale Dougherty:
Where did you get that from?

Welsh:
I actually downloaded that from the Internet, but it was kind of problematic to make the floppies on a Sun workstation and get them to work under Linux. It was pretty low level and hard to install -- very difficult to even know what you had to do to get it installed. That was the early days before X Windows and before networking.

Dougherty:
You were a student?

Welsh:
Yes, I just started my undergrad degree at Cornell and when I bought this new hard drive I was plugging it into the box that went into my dorm room there at Cornell University. It was a 200 megabyte hard drive that cost me $550, so that kind of tells you in computer years how long ago that was, right?

Dougherty:
So, how did the Linux Documentation Project evolve?

Welsh:
This was basically the result of myself and Lars Wirzenius and Michael K. Johnson -- I guess Michael's now at Red Hat and I don't know exactly what Lars is doing these days -- getting together and saying we wanted to produce printed documentation for Linux, so we wanted to write manuals. Michael was going to work on a kernel hacker's guide and Lars was going to do a systems administration guide and I was going to do the installation guide. We got together and hashed out our ideas and started writing some documentation and putting this stuff up online. It was just written in LaTeX so the only way you could view it was to actually print it out or to use Ghostview or something like that. It wasn't extremely practical, but this did pre-date the Web. After maybe about a year or so, this evolved into a larger project where we were actually taking a lot of the documentation that people were writing anyway -- READMEs and FAQs and things like that -- and helping to organize and distribute that. Again, this was all before the Web, so mailing lists and Usenet newsgroups and things like that were the main ways that we got the information out there.

Dougherty:
Did Linus produce any documentation himself?

Welsh:
He had a pretty weak README file that went along with the kernel source tree, but that was really about it.

Dougherty:
So, in terms of raw materials, you didn't have much to work with?

Welsh:
No, no. I mean, when I started writing documentation, there was some pretty inconsistent documentation out there already, some READMEs and FAQs. One of the most famous early documents was one where somebody produced a whole directory listing of every file on a working Linux system. He said, "Well, I finally got Linux to work, so here's the listing of where every file is on the system," and it printed out to 40 pages of listings of just where the files were. So, that was kind of interesting, but it's not extremely helpful to somebody. We got together and said, "Well, we're going to write some new documentation and talk about how to actually get a system working from scratch."

Dougherty:
So, what time period was that?

Welsh:
This must have been summer of '92 to summer of '93. That academic year for me was when all this really started happening.

Dougherty:
And, eventually you built a tool setup for producing the documentation?

Welsh:
Right. So much later -- well, not much, you know, in Linux years it was a year or so later, but it seemed like a long time. It was becoming frustrating to develop Linux documentation in LaTeX, and HTML was just kind of coming out. Nobody wanted to write in HTML. There were a bunch of people who wanted to write groff or something like that. It was becoming problematic. I found a suite of tools that let you write documents in a simplified SGML format and convert that to groff. It gave you plain ASCII and HTML as well as LaTeX. The set of tools was already out there and I just took those packages and repackaged them and documented them and added some features and nice front-ends so that it was very easy to write your document in this format and then just run one script. It would give you plain ASCII and HTML and PostScript -- now people who wanted to print it out could get PostScript and people who wanted to view it on the Web could use HTML. But by far the most important format back then was plain ASCII. I ended up calling these tools Linuxdoc-SGML because it was SGML and it was for the Linux Documentation Project. And I guess about a year later some people took that project and renamed it and repackaged it again and called it SGML Tools, and that's still a working project.

Dougherty:
Right. Although on the site, it says it's not current.

Welsh:
Is that right?

Dougherty:
Yeah.

Welsh:
So, maybe it's fallen by the wayside a little bit.

Dougherty:
So, LinuxDoc.Org, when did that get launched?

Welsh:
I don't remember exactly when that domain name started getting used. I think it was after I had stopped working on it. The original Linux Documentation Web site was written by myself and I think it was one of the first Linux Web sites; it may have been the first Linux Web site and that was hanging off of sunsite.unc.edu which is now called MetaLab.

I just said "Hey, let's put all this HTML documentation we have as well as links to everything Linux related on this one big Web page." It was just one giant Web page with links to everything on it. That was the first Linux Web site and it was very popular. The number of hits that we were getting was pretty significant. There are still many, many, many links out there on the Web pointing to that original page. After I stopped working on these things so much, the LinuxDoc.Org domain name was pointed to that Web site and over the course of the years the maintainership of that has changed hands a couple of times.

Welsh:
No, after a few years at Cornell I just became too busy. I got involved doing research work there and also the book project with O'Reilly, that kind of took a lot of my effort. It seemed that a number of people, such as Greg Hankins, who was at Georgia Tech at the time, had taken on a lot of the roles that I was originally playing in that field and were doing an amazing job at it; people were filling the gap that I'd left behind.

Dougherty:
So, is it on its third generation?

Welsh:
Maybe third or fourth, I don't know. It has a life of its own and I have to say I don't know most of the people involved right now, but the Linux Documentation Web site has a whole new look. The content is fundamentally the same as what we originally had.

Dougherty:
When did the Linux How-to's emerge? Is that something that you were involved with?

Welsh:
Yeah, I started the How-to project along with Ian Jackson. Ian Jackson was one of the original Linux documentation guys and this emerged out of our frustration with the original Linux FAQ. The FAQ was this enormous document that was posted in seven separate sections to Usenet once a month. It was just unbelievably baroque and complicated. We decided to just throw it in the trash can and start from scratch with a new FAQ that was just really frequently asked questions, not everything about Linux, right? Frequently asked questions only. We'd start writing a new series of tutorial-oriented separate documents about Linux called How-to's; Ian was the one who actually recommended the name How-to for that. So I wrote the installation How-to and the X Window How-to and the Network How-to. Along with the Linuxdoc-SGML tool suite, I put those out there to people and said "Hey, here's this new thing, there's How-to's and here's the tools that you use to write them. If you want to write a How-to, here's how to do it and e-mail it to me and I'll format it and post it once a month to the Web and to the Newsgroups." And, that's how it got started. It was kind of a fun thing because now there's hundreds of How-to's.

Dougherty:
Right. On one hand it's a very obvious idea, but it still looks to me fairly -- others haven't necessarily picked it up, other projects. It is a format that almost anyone can contribute something to. When you say, "I'm going to write an administrator's guide," it kind of implies a book with chapters and all that.

Welsh:
Right.

Dougherty:
And this is, "I solved this problem, let me document it for others and submit it to the How-to's."

Welsh:
That's totally the idea, that it's completely collaborative in the sense that you don't need to collaborate with another person to write a How-to. That's the reason that it seems to work. People can just go off on their own and write a document about getting the Croatian keyboard mapping to work under Linux. We have How-to's like that. If you write that and you send it out there, it kind of takes on a life of its own. Multiple people might contribute and collaborate on one How-to, but more often than not, it's somebody doing their own -- scratching their personal itch to put it in Raymond-esque terms, right?

Dougherty:
So, in many respects, people point to the Linux Documentation Project as at least one of the better models out there for trying to organize information that's needed for new users. But, I know from talking to you it's not necessarily a solved problem and there are a lot of interesting things that need to happen, not just for LDP but really any of the open-source documentation projects. Can you talk a little bit about some of those?

Welsh:
What's happening now is that there are just so many sites that you can go to to find Linux information that the real problem is just navigating all that information. Originally, if you wanted to learn something about Linux, it was pretty straightforward. If there wasn't a How-to about it, then nobody knew how to do it. I mean, it was easy to go to this one site and to find the How-to's and to do that. Now there's this proliferation of Web sites. The real challenge is to create the Linux portal. Several people are interested in getting involved in that space. How do I distill out this information and provide a way of searching for what I want and navigating to the information that I'm looking for? That's the biggest remaining problem with documentation. We've got lots of content out there, lots of very good content, some bad content. How do we make people aware that that content even exists?

Dougherty:
OK. What I see is that these documents become living documents. They might pass through the hands of several different authors over the life span, just as here at O'Reilly we have books that go out of date. There's documentation that goes out of date and it's not always clear from an open-source perspective, how can I go about changing it or improving it or updating it.

Welsh:
It's pretty informal. The documents are effectively in electronic form and the How-to's especially are short enough that it's pretty easy for them to change hands. An entire book is a little bit harder to do that with. I had problems getting my original Linux book, the "Linux Installation and Getting Started Guide," into the hands of some people that could actually take care of it. So, getting access to writers that know what to do with a particular document is a lot easier when it's shorter, right? And so that's one of the ways it works in the How-to project: when somebody says, "I'm not maintaining the How-to anymore," they find somebody to do it, and it's usually somebody who has made some contribution to that on the site.

Dougherty:
OK. Well let's -- you've moved on. You went from Cornell to Berkeley. Let's talk a little bit about some of the work you're doing there. You told me you were working with Java and Linux, which is sometimes a problem putting those two together, isn't it?

Welsh:
Yeah, I've said that this is like chocolate and cheese. I mean, you really like Java and you really like Linux, but the two don't always go together. One of the problems is that there are a number of competing Java implementations for Linux, and it's not helped out by the various kind of commercial interests that the parties involved have. It's clear that there is enough interest in having Java on Linux that numerous people are going to be providing that solution. So there's Sun and IBM and, separately, the Blackdown porting team, as well as the Cygnus/Red Hat GCJ project and Kaffe -- I could go on and on with the number of people out there trying to do Java on Linux.

We're interested in building really high performance workstation cluster applications in Java. We're taking what is called Beowulf, which is just a bunch of PCs in a cluster all running Linux. All of the applications are written in Java and there's a lot of reasons we're doing that. I could go into detail but it's not really all that interesting. The main thing is that we think that Java is a great programming environment for doing certain things that are very difficult to do in C and C++. So, if you're going to write your applications in Java, and plenty of people are doing that anyway, and you want to get high performance, what are the problems that you have to solve with that? That's where my own personal research is going.

Dougherty:
OK. Well, thank you, Matt, for coming in here and I appreciate your spending time with us today.