The hardware store

Recently Zach Tellman and Factual open sourced several libraries that they wrote to handle specific needs where nothing else existed. In the comments on Reddit some people were griping about the possibility that this software might be abandoned in a year or two, and that if they depended on it they would be stuck. I think this mindset comes from a misguided and selfish perception of open source.

As a new software engineer, it can be attractive to treat open source libraries, applications, and frameworks like a hardware store. If you have a problem and the standard library doesn’t address it, then pull in a dependency. Need a util function? Do a search on GitHub, and add in a dependency. Want to take advantage of this newfangled single page app craze? Pull in another one. Need to process XML with Ruby? Just install Nokogiri in a minute and you’re away laughing.

This may work for a while, but it misses something vital about software engineering: software decays over time, a process often called entropy. Software doesn’t exist in an isolated system; it interacts with other systems that change over time. These are things like your operating system, memory, other external services, databases, CPUs, network, I/O devices (printers, displays), and most importantly users. These systems are replaced or updated with newer systems. Sometimes the changes are backwards compatible, sometimes they aren’t. As a consequence, code is written once but maintained forever. Using someone else’s open source code can be a great help for you, because you don’t have to write it. However, as things change over time, entropy will kick in and the code will need to be maintained and updated.

It takes a village to raise a library

Instead of treating open source software as a hardware store, I think a better metaphor is that of joining a village raising a child. Every dependency that you pull in needs to be maintained over time, along with its dependencies too, all the way down. The question isn’t whether the maintenance will need to be done; it’s who is going to do it. Larger communities have more resources and time to do this. Older projects have been refined and battle-tested and have more stable APIs. If you’re working in a young, obscure, or fast-moving language then more of the maintenance may fall on you.

I think of someone releasing open source software as a gift to the world, not as claiming a responsibility to maintain it for you. Some projects do claim that responsibility, but it’s not automatically conferred just because someone released a project on GitHub. I think much more of the responsibility falls on the person using it. It’s your code that will be using it, your code that will need to be upgraded, and your code that will break. These are a few things that you should ask before you start using a library or framework.

What does this software depend on? What are its dependencies’ dependencies? Are they reasonably up to date? Are there any libraries here doing weird things with classloaders, bytecode, or messing with the runtime? These are much more likely to break in new versions of your language or runtime.

How much benefit do I get from using this instead of another library or framework, or just writing it myself?

Is this library well written? Are there comprehensive tests for the code? Do they pass?

Does the author recommend this for use in production, or is it just a proof of concept or exploratory idea?

Does the author have a history of maintaining open source software? Are they using it themselves? If I want to add a feature or fix a bug, is the author likely to accept it, or is it ‘open source with no pull requests’? This is fine, by the way; it just means that you may need to maintain your own fork if your needs diverge.

If this is a database driver, is it promptly updated for new versions of the database? For example, Netflix’s Cassandra driver Astyanax is lagging behind the latest version of Cassandra.

What is my and my employer’s risk tolerance?

Do I have the time, permission from my employer, and capability to maintain or improve this library myself?

If relevant, has the library been through a security audit?

What does the author say about API stability?

What does the issue tracker for a project look like? Is the author responsive or are they not involved anymore?

Is the license compatible with the rest of your software?

If this is supported or released by a commercial company, are they likely to change the license or save important features for enterprise customers in the future?

Is there a common API that multiple libraries implement so I can swap between them? In Java these are things like JPA and XQJ that can reduce lock-in to one library.

When was the last nontrivial commit? How old is the whole project?

Is there a community of users around this? Is there a mailing list?

What is the lifespan and criticality of the code I’m writing?
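The first question in this list, what a library depends on all the way down, can be partly answered mechanically. Here is a rough sketch in Python, assuming the package is already installed in the current environment. The function names are made up for illustration, the requirement parsing is deliberately naive (it keeps only the leading package name and skips anything marked as an optional extra), and it does no name normalisation, so treat it as a starting point, not a replacement for your package manager’s own tooling.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the bare, lower-cased package name from a requirement string."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0).lower() if match else None


def installed_requires(package):
    """Return the declared requirements of an installed package, or [] if unknown."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_dependencies(package, requires=installed_requires):
    """Walk the dependency graph and list every package reachable from `package`."""
    root = package.lower()
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for requirement in requires(current):
            # Naive check: skip optional extras, which are only installed on request.
            if "extra ==" in requirement:
                continue
            name = requirement_name(requirement)
            if name and name != root and name not in seen:
                seen.add(name)
                stack.append(name)
    return sorted(seen)
```

Calling `transitive_dependencies("some-package")` for something you have installed gives you the list of names you are really signing up to maintain. The `requires` parameter exists only so the walk can be tried against a hand-written dictionary without installing anything.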

Once you’ve answered these questions, you’ll have a much better understanding of the risks you inherit from using the library and the likely future direction of the project. If you do decide to adopt it then I recommend joining the mailing list, watching it on GitHub, or otherwise staying up to date with changes.
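If you do adopt a library, one way to limit the lock-in raised in the question about common APIs is to have your own code depend on a small interface, not on the library directly. The sketch below is in Python and is purely illustrative: `KeyValueStore`, `DictStore`, `ListStore`, and `remember_visit` are hypothetical names, and the two store classes stand in for two interchangeable third-party libraries.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """The small interface the application owns and codes against."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class DictStore:
    """Stand-in for one library: keeps values in a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class ListStore:
    """Stand-in for a second library with a different internal layout."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[str, str]] = []

    def get(self, key: str) -> Optional[str]:
        # The most recent write wins, so search from the end.
        for stored_key, value in reversed(self._pairs):
            if stored_key == key:
                return value
        return None

    def put(self, key: str, value: str) -> None:
        self._pairs.append((key, value))


def remember_visit(store: KeyValueStore, user: str) -> int:
    """Application code: it only knows about the interface, not the library."""
    count = int(store.get(user) or "0") + 1
    store.put(user, str(count))
    return count
```

Because `remember_visit` only uses `get` and `put`, swapping `DictStore()` for `ListStore()` requires no change to it. The cost is that you now own the interface, so this is only worth doing where the risk of replacement is real.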

alt.deps.options

Pulling in a dependency should be a considered decision, and there are several other options to look at first:

If you only need a small amount of relatively simple code and the license allows it, then just copy that code into your project.

Make sure that the standard library doesn’t offer something comparable. If it’s just a wrapper library around another dependency, can you use that one directly?

If you’re pulling in another dependency for a data structure, is there an alternative algorithm you could use which doesn’t require this data structure?

Is this something that you should write yourself? While this isn’t always the best option, sometimes there’s nothing around that meets your quality standards and you will need to build it yourself.

Is there a commercial option? While open source is free as in beer, it’s also free as in baby. Paying someone else to maintain software adds incentive for them to continue maintaining it for you and may be a preferable option for many businesses.
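As a small illustration of checking the standard library first, here is a sketch in Python, where the bundled `xml.etree.ElementTree` module handles simple XML processing without any extra dependency. The document and the `book_titles` helper are invented for the example. One caveat worth knowing: the Python documentation warns that this module is not hardened against maliciously constructed input, which is exactly the kind of trade-off that might still justify a dependency.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for the example.
DOCUMENT = """
<catalog>
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</catalog>
"""


def book_titles(xml_text):
    """Map each book's id attribute to the text of its <title> child."""
    root = ET.fromstring(xml_text)
    return {book.get("id"): book.findtext("title") for book in root.iter("book")}


print(book_titles(DOCUMENT))
```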

fin.

Open source software is a massive boon to programmer productivity, saving man-centuries of effort. But remember, just as you own your availability, you also own your software and everything that goes into it.

I didn’t get to this point on my own. I owe a big thanks to Colin Taylor, Derek Troy-West, and Mark Derricutt for their advice on this, and for rejecting my code reviews when I pulled in unnecessary dependencies!

Some wise words from Rich Hickey, the creator of the programming language Clojure. The context around this was people complaining that Clojure requires patches to be created, rather than allowing GitHub pull requests.

I’m responding to Jay here (because we’re friends and I know he can take it:), but this is for everyone who feels similarly:

I prefer patches. I understand some people don’t. Can’t we just agree to disagree? Why must this be repeatedly gone over?

I’m not sure what value you think a message like this is going to provide to the thousands of participants in this list. Does it make you feel better? It will not convince me otherwise.

Here’s how I see it. I’ve spent at least 100,000x as much time on Clojure as you will in the difference between producing a patch vs a pull request. The command is:

git format-patch master --stdout > your-patch-file.diff

There are two sides to change management - production/submission, and management/assessment/application/other-stewardship. People who argue that the process should be optimized for ease of submission are simply wrong. What if I said patches were twice as efficient for me to assess and manage as pull requests? (it’s more than that) Do the math and figure out how the effort should best be distributed.

I don’t think asking for patches is asking too much, and I truly appreciate the people who are going to the extra effort. And, I respect the fact that people disagree, and they might decide not to participate as a result. However, I don’t need to justify my decision over and over. How about a little consideration for me, and the other list participants? There is a real diluting effect of get-it-off-my-chest messages such as these on the value of the list and the utility of reading it for myself and others.