Producing Good Software From Academia

Writing and maintaining good software from academia isn’t easy. I’ve been thinking about this because last week my student Yang Chen defended his thesis. While I’m of course very happy for him, I’m also depressed since Yang’s departure will somewhat decimate the capacity of my group to rapidly produce good code. Yang looked over my group’s repositories the other day and it turns out he has committed about 200,000 lines of code while working with me.

Specifically, I’m bummed about not having a very good story for maintaining tools like Csmith and C-Reduce. Ironically, I was the original C-Reduce developer and maintained it for several years but then Yang stepped in and wrote 30,000 lines of C++ that made it get much better results — and this work became part of his thesis. However, I’m not super interested in maintaining his code myself: it’s pretty big and pretty hairy and it interacts closely with Clang — a fast-moving project that requires active effort to keep up with. Csmith was never my code base, though I did contribute a little in its early days. The intent was always to throw it away after we figured out the right way to do random program generation. Well, guess what? That goal remains just a goal, and in the meantime it would be nice if we could keep working on Csmith, which continues to be useful to people. Prior to C-Reduce and Csmith my group produced a number of tools that I’d have liked to maintain, but that got abandoned due to lack of development power.

I suspect my situation is a common one for mid-career CS professors who work in systems, software engineering, security, PL, and other engineering-oriented parts of the field. So how should we go about producing good code? Here are some possibilities.

Forget About Coding

I’ve heard people say that it’s not our job in academia to produce good software. I find that to be a silly position to take. Our job is to do research that has impact. If we need to produce good software to do that job, then producing good software is part of the job. There are plenty of other reasons to write code:

Ideas without implementations often have gaps and flaws that would have become immediately apparent if an implementation had been attempted.

When you write code, you explore and reject a large number of program designs; with each rejected choice you learn something. I’m convinced that the cumulative effect of these small pieces of design feedback is very important in developing and maintaining a solid intuition for research.

Feedback from users is also very valuable; you can’t get users without writing code that is at least moderately usable.

Some researchers are capable of doing top-quality work without these types of feedback. Many more are not.

Embrace Crappiness

Even when we researchers do produce software, the code is often released as a versionless tarball with no particular license. The makefile tends to refer to specific paths on the system where the software was developed. There might be a few test cases and, if we're lucky, a README. The tarball was uploaded around the time the final copy for some paper was submitted and has not been updated since. In all likelihood, the only way to compile it is on some old version of Linux. This version, needless to say, must be determined by trial and error since it's not mentioned in the README. Is this sounding a little bit familiar? Also see the CRAPL. Of course, a crappy software release is better than none at all, and at least it serves the goal of reproducibility, assuming that the compilation challenges can be overcome. Thanks to sites like GitHub, research software has improved at least a bit, but the bit-rotted tarball is far from dead.

I’m not going to pretend that I haven’t repeatedly embraced crappiness. In fact the ability to do so is one of the main perks of being a researcher in the first place. On the other hand, when an idea ends up having legs, the crapware option stops making sense.

Professor as Hacker

It’s no secret that maybe a third of CS professors are completely inept at writing code. At the other extreme we have people like Xavier Leroy and Matthew Flatt who are principal developers and maintainers of large pieces of high-quality software. When I go to lunch with Matthew he’s always like “Yeah… I really didn’t like the ARM code coming out of Racket so I rewrote the jitter this morning.” Most of us are in the middle somewhere and as far as I can tell, many of us are bummed out that we have very little time to write code anymore.

So in case that wasn’t clear, the problem with maintaining the code ourselves is that we don’t scale very well. For example, I can and do maintain some small things such as the 2,000-line main body of C-Reduce that is now dwarfed by Yang’s C++, but I can’t keep taking on new tasks.

Hire Long-Term Research Staff

A time-honored way to produce high-quality software from academia is to run a research empire large enough that it can sustain full-time research staff over a period of at least a decade. These staffers are people who like academia and are proficient at code craftsmanship, but who lack a desire to run the show.

For many of us, the problem with this plan is that empire builders become managers who spend most of their time acquiring and keeping grant money. The real work gets done via delegation. I recall a time when my advisor in grad school was going to about one PI meeting per month — that cannot have been fun. Empirically, there’s very little overlap between the set of professors who are PIs on major grants capable of supporting long term staff and the set of professors who write code and are otherwise in touch with the low-level details of their operations.

Leverage Open Source

In this scenario, a research group develops a prototype, open sources it, and then a community of volunteers takes over subsequent development. This is the dream outcome. It could happen, but I don’t think it is very common. We’ve gotten a number of patches for Csmith but most of them have been either relatively simple cleanups or pretty substantial sets of changes that were done to support someone’s custom compiler and that we haven’t been too eager to integrate into our version of Csmith since they would make our own work harder. Csmith isn’t very modular. With C-Reduce, we’ve had better luck — we’ve gotten plenty of good bug reports, a number of patches, and even some new features. But overall, it is not easy to build a new open source community. George Necula, who developed the wonderful CIL tool, once complained that CIL had a lot of users but very few were contributing back. Maintenance of CIL was eventually taken over by volunteers, though it may have now stagnated a bit — LLVM has moved into this research niche in a big way.

A Chain of Students

A final alternative is to try to recruit students who have, or can acquire, strong code craftsmanship skills and then put some of the code development and maintenance burden on them. In this model a new student spends some time developing and maintaining existing codes before, and in conjunction with, working on her own projects.

As I said at the top of this piece, this is a strategy that I have used to some extent. It sort of works but has various problems. First, a very light touch is required: it isn’t fair to burden students with too much work that does not contribute to their own progress towards a degree. Second, even PhD students aren’t around for all that long, in the larger scheme of things. Third, not all students are interested in or capable of maintaining large codes, or of writing code that is worth maintaining.

Conclusion

In general, academia is set up to reward quick projects that result in a few papers and maybe a few tarballs linked to grad students' web pages. Long-term development and maintenance of code requires either heroic effort by the PI or else big funding.

A possibility that I deliberately left out of this piece since it isn’t “producing good software from academia” is spinning off a company. I have a huge amount of respect for academics who form startups, but this option seems so invasive in terms of overall lifestyle that I haven’t seriously considered it yet.

23 thoughts on “Producing Good Software From Academia”

How about, instead of a tarball of source code, releasing a snapshot of a fully configured system in a virtual machine? (I guess that requires a bit more work than just documenting the dependencies though.) I think it would be sufficient to alleviate the feeling of guilt of releasing code that you have no intention of maintaining.

Ryan, sure, I think VMs will play (and are already playing) an important role in solving the software archaeology problems we are creating for ourselves. Of course they can also create problems, such as when I need packages A and B to work together but they require different VMs.

Ryan, Philip Guo has done some interesting work on giving research software a longer life (http://www.pgbovine.net/academic.htm). AFAIK, his work doesn’t attempt to address ongoing development, just the ability to continue running software in the face of library/OS/compiler/… version changes.

If the project produces enough interest from (and value to) industry, then you might be able to get some of them to support maintaining it.

One totally off-the-wall idea would be to set up a source-repository + build-cluster + continuous-integration instance using one or more public clouds, paid for via donations (PayPal, Bitcoin, etc.) from whoever benefits from it. Ideally, a single service could be set up to support many different projects. As a fringe benefit, this would allow cross-project indexing (find me all usages of my project in other people's projects) and even cross-project semi-automated refactorings (hey, I'm changing the API to my library, here's a pull request that updates your project to use the new API!).

Nice article! What are the obvious examples of successful long-running code bases still in the control of one research group? (In the theorem-proving world, it seems as if the long-running systems are all basically still owned by just one group.)

Hi Michael, I was thinking about doing a post about examples of good academic software. A few that come to mind are: various SAT/SMT solvers like CVC4, some programming language implementations like Racket and OCaml, some distributed systems like Condor, numerical stuff like ATLAS and FFTW.

Hi bcs, I wonder if something like what you suggest is the next evolutionary step for sites like Github? Github (and before it, Sourceforge) seems to have revolutionized academic software development (not that its influence is limited to academia, obviously).

One way out of the conundrum is to do the research, design, and initial implementation in academia, where there is time to do things properly. The fleshing out, turning into a whole product, polishing and maintenance can be done by a company employing the appropriate subset of the programmers. Of course this assumes that what is being built has some value to users; not all programs written in academia need to fulfill that.

In our case [1], the MetaEdit+ language workbench was made by the MetaPHOR research project at the University of Jyväskylä, and commercialized by MetaCase. Initially no programmers worked at the company, then some worked evenings to do the minimal non-research work necessary to make a plausible product, and eventually those researchers who were interested moved full-time to the company. The research project produced ten PhDs over six years, the company is now 22 years old and going strong, and in a recent experiment by researchers building a competing product, our program was shown to still be 10x faster to use than the best of the commercial and open source competition. (I’m really sorry about the extremeness of that figure – see Fig. 2 in [1] and http://tinyurl.com/gerard12 for the data. I only quote the figure to show that if the foundations are laid in academia, with good people and without unnecessary time pressure, the results can be far beyond what is common in commercial or open source development.)

I certainly don’t think this model will work for everyone – we’ve obviously been fortunate in many ways – but I do think there are elements that would benefit many projects.

Great! These limited resources for maintenance and perfection are no different from what other people are experiencing in industry. More of us in SE research should immerse ourselves in the practice and arrive in these frustrating situations. It gives a sense of reality and purpose. We decided to redesign and reimplement from scratch to arrive at a much smaller and much more flexible system. The result has produced research in refactoring and comparing design patterns as side effects, among other things. More of these frustrations and perhaps we will arrive at more solutions!

PLT has a history of delivering good, usable software from academia. So here are some thoughts, first some general thoughts, then a PLT-specific one:

1. The reward model in academia (CS) is wrong.

CS departments are set up to reward individuals who can solve well-defined, small problems. An author can explain such problems on a page or so, can describe a solution in five or six pages, wrap a motivation and some comparative prose around it and voilà, you have a conference paper. If you write plenty of them and you don't lose sight of the current research fashion, you can quickly build a citation record, especially if you are 'nice' and cite everyone else and some of those cited cite you, and so on.

Universities and CS departments count beans: the number of papers published, the number of papers cited, the H I J K L and M indexes, the number of dollars brought in — as if the latter had anything to do with the quality of your research or the degree of your impact.

Final thought here is that it is mind-boggling that we have colleagues who don't write 1,000 lines of code per year in __any__ language, not counting LaTeX and relatives. I consider your 1/3 estimate low. It's like running a symphony with virtual (non-practicing) violinists.

2. The reward horizon of CS departments (and grants) is all wrong.

CS departments want more beans every year: more money, more papers, more PhD students, more stuff. It is absurd that we even criticize the quarter-oriented nature of publicly listed companies. We are worse.

3. Individuals — especially those with tenure — buy into the reward model w/o much thought.

When you have tenure, you have a responsibility to explain to your colleagues, over and over again, that this model is a mistake. But yes, I know the problem. I do complain but I am not a politician, so I don't get anywhere. Perhaps deep down I think "you deserve what you get because you're asking for it."

And so we are sending out a signal to junior colleagues that more papers|grants|PhD students is all that matters and they become tenured and they continue to play this game.

I wish I had allies.

4. PLT has followed a course of ‘distributed realm’ (I don’t want to call it empire).

Each of us contributes what we can do best. Matthew hacks Racket, Robby hacks DrRacket and Redex, I employed a research staffer for 20 years and trained PhD students whose code migrates somewhere. I am the one who is most removed from code, but I still get to maintain a frequently used library and write a small project every year. We try to raise money with and for each other.

The other part is that we also agree that we need to act a little bit like a hidden company. A start-up is the wrong model, because it is all about survival (7/8 fail) and when survival is at stake, ideas and code go out the window. So you really need to think steady-state company with a flow of people and capital. My students' code on contracts flows into Robby's world. My students' code that modifies Racket (rare but exists) flows into Matthew's. All of our students are encouraged to participate in this model and to fix bugs in any component that they think they understand well enough.

20 years and still growing. But perhaps there are no general lessons, just luck/hard work and wonderful coincidences.

I consider myself blessed with such real colleagues. Thanks Matthew and Robby and Shriram and Jay and Sam and John and Kathi

I've thought about this a lot, as a systems builder who recently finished a PhD, and realized that I threw out roughly 2-4 years' worth of software development time. In many ways, this is a waste. On the other hand, the small focused experiments I did would have required a ton of additional effort to become broadly useful.

I think commercialization, either via venture-backed startups or adoption by large tech companies, is the “right” way to maintain research projects. There are many examples of people doing this (Spark, Hadapt, Vertica, VectorWise, LLVM to a certain extent). I think the computer science academic community would benefit if they began to recognize this as “successful research”, rather than only counting papers. Today, researchers are “penalized” if they pursue this path, as it distracts from publishing papers. (Although there are lots of people willing to help, if you don’t want to commercialize it yourself).

The problem is that this leaves no solution for projects that are valuable to *researchers*, but not to the broader public. Csmith might fall in this bucket, as do things like NS2, or benchmarks, etc. These tools are invaluable, as they make it significantly easier for researchers to do their work and avoid half-assed reinvention of these tools. However, today these projects are doomed. The only solution is again to change the incentive systems to consider this work to be "real" research, or to convince the funding agencies that supporting these projects financially could be worthwhile. If the "hard sciences" get grants to buy equipment, why can't CS get grants to develop and support custom-designed tools?

Very good questions, and a lot of interesting comments already. Here are a few more.

I’ve learned through painful experience that, realistically, you can’t maintain good software with only a half-hearted effort. You either need a nearly full-time maintenance effort on it, or you have to write throw-away crap, or you have to write something good but then just mothball it and barely keep it going. I don’t really know what other options there are.

To echo others, if you want the software to stay good, you probably have to associate a commercial effort with it. Typesafe does this for Scala. Semmle does this for Oxford-style Datalog. In both cases, the people involved are pausing their academic careers while it happens, so to an extent this answer amounts to “well don’t be an academic”.

Note that while this article is about academia, the same thing happens in industry. I have worked on several giant code bases at this point, and I don’t think I am disclosing anything to say that every one of them has a large amount of legacy code that is essential to the running of the company. The same thing happens as with Ph.D. students: someone or some team wrote it 5-10 years ago, it became a hit within the company, and now those people are all gone. Everyone who is left has their own quarter-driven 🙂 goals that they are trying to achieve, so there’s a little bit of hot potato in who will have to fix the next bug in the legacy code.

On a positive note, it makes a big difference to write test cases and a repeatable build, and run them on some builder. If you do that much extra, you can at least keep the legacy code working even though nobody is really developing it.

Additionally, don’t blow off code review just because you are in academia. Github makes it mechanically easy to do, and it really does make sense even with one-person projects. Find someone like-minded and become “review buddies” with each other. It could be a prof and a student, or two students, or two profs. It is really eye-opening how much crap you will see once someone forces you to look at your own code.

Ben and John — thanks for the compliment about CDE! Ironically, I no longer have time to maintain it; it saddens me to have to respond that way to users, but there's no way I can start my academic career and have enough time to maintain CDE, especially because it's no longer a component of my research narrative going forward.

On a side note, Docker http://www.docker.io/ seems to be a production-quality tool (maintained by a software company, I think) that tries to solve some of the problems that CDE addressed back in the day.

Sure, you'll get nothing out of open source if you fail to understand how open source works. You don't get a successful open source project by just dumping the code out there. You have to build a community around the code. You have to develop things openly *from the beginning*. This is especially important if you hope to pass maintainership off to someone else, as you'll need a strong community by that point for it to work.

Very nice thoughts, and much to agree with here. My own views / experiences are summarized in the following short piece – it focuses on natural language processing as a domain but programming is programming 🙂

I personally like to see Git repos (preferably on GitHub). I maintain my SAT solver there, but that's still somewhat rare, unfortunately. Many people (including academics) are ashamed to put their changes online because they think they are bad coders. I think that's not a good idea, since everybody is fallible and being open about it means you can get feedback and improve. But openness and academia… that is worth a book.

1. I've long mulled the 'leverage open source' route. Jikes RVM is a great decade-long example of the problem. As a 'product' it can never compete with something from Oracle funded to the tune of hundreds of millions over a decade or so. However, it is powerful as a research platform in a way a product never can be (v. different design criteria). So what you have is an open source project whose end users are exclusively researchers (1/10000 the size of the open market??). Both the very fact that it won't ever be a 'product' and the narrowness of the market seem to make it very hard to tap into the general pool of open source contributors. So what happens is you end up with a large number of academic users and just a small handful of contributors. The brokenness of the reward model then amplifies this. I agree that we need to push back.

2. I've started an experiment here of identifying and then employing our best systems undergrads to work over the summer and part time during the year on our large systems projects (all open source). This morphed out of my experience with the Google Summer of Code (which I don't think works well for research systems that often require deep expertise to get going). Relatively modest grants seem to go a long way when spent on this.