Sharing Technology for the Public Good

We’ve been talking to government IT departments at many levels about how to use open technologies, and how to release their in-house code for other jurisdictions to use. In that process, we’ve made a couple of profound discoveries:

Everyone would love to open up their code, as long as it doesn’t cost anything to do so.

That is usually followed by an even more profound thought:

But we all know that there is a cost to going open. So the real question is: what are the benefits? What’s our return on investment going to be?

The answer is “Possibly significant — if you do it right.”

To explain why, I’ll start with a quantitative example, then look at the harder-to-quantify picture behind it.

One of the collaborative projects I’ve worked the most on is Subversion (a system for tracking changes, or “versions”, made to files and folders; hence the name). Subversion was started by my then-employer, CollabNet. They needed a better version control system for their customers, as part of a larger hosted online collaboration service, and realized that ubiquity and clear lack of lock-in would be strong assets. So CollabNet decided to release Subversion as open source software from the very beginning, and they knew, from past experience with open source projects, that they’d need to put some effort into drawing a community around the code and making it easy to collaborate on the project.

At different points in the project — the first time starting three or four years in, if I recall correctly — various people at CollabNet have tried to measure the “amount” of contribution coming from outside. I put “amount” in quotes because this is a very tricky thing to quantify, and before I give the numbers, I want to put a huge warning label on them: evaluating software productivity is hard, your mileage may vary, past performance is no guarantee of future gains, etc.

But I do think what they found is meaningful (I’ll explain why in a moment):

As near as we could tell, 75% of the code came from contributors who were not paid by CollabNet; the remaining 25% came from CollabNet’s own developers — including the ones who spent part of their time on engagement with the development community, e.g., evaluating contributed changes, prioritizing bug reports, etc.

If you take that at face value, it’s a 4-to-1 return on investment for open-sourcing their code and stewarding the community with care. Not bad — cash-strapped civic IT departments take note! But should we take the figure at face value? Are lines of code a reasonable thing to measure here?

In this circumstance, I think they are. While lines of code are not recommended for measuring individual programmer productivity, they are a decent way to measure relative amounts of code, in aggregate, in a mature project. The code in Subversion is pretty well vetted, precisely because it’s an active open source project. There aren’t a lot of unnecessary lines in it — anything that makes it into the codebase and stays there has passed the sniff test of a developer collective.

And actually, that’s the real story here. The quantifiable contribution ratio — 3-to-1, 2-to-1, 4-to-1, whatever — might vary based on a lot of factors. The true “RoI of open” usually shows itself before a given piece of code makes it into the project. Many times one of us, the CollabNet-salaried developers, would post a proposal for a feature design, or even post a concrete implementation, and the non-CollabNet community would find bugs and potential improvements in it. They would also contribute new features themselves, in some cases quite major ones. They contributed documentation, and frequently handled user feedback, often integrating it into the project in the form of actionable bug reports. All of this directly benefitted the software and should be counted as part of the return on investment too. (In fact, a great measurement would be to look at which developers respond to bug reports — that is, who has the necessary conversation with the original reporter to turn the report into something useful — and then correlate that with which bugs actually get fixed. I’ve not yet had a chance to do that kind of study, but my educated guess is that the wider development community’s constructive role would loom pretty large.)

These sorts of effects are ones you can see in almost every open source project, as long as it’s active and has users. You can’t always easily figure out exactly what percentage of the code was contributed by whom, but you can see, just from watching the project logs and forums and bug database, that having more people join the project usually benefits all the participants. And because development communities tend to evolve mechanisms for absorbing their own communications overhead, it is normal in a healthy project that the costs of collaboration grow more slowly than the benefits do. This is true for every participant, not just the originator of the technology.

What does this mean for a tax-funded, budget-conscious civic IT department trying to figure out whether it’s worthwhile to open up some in-house code?

It means that if your jurisdiction

plans to use the software for a long time, and

therefore plans to maintain the software anyway, and

it’s something other jurisdictions might need too

…then there’s a good chance that opening it up could be a responsible decision. We can’t say it always will: every project can have idiosyncrasies and should be looked at on a case-by-case basis. But if you’re considering opening something up, and are examining the costs of doing so, maybe the above will help clarify the potential benefits. When someone from another jurisdiction finds and fixes a bug, your users benefit as well as theirs. There are well-established techniques for forming these development communities and making sure that the bugfixes and feature enhancements flow into the core code and eventually back out to all the users. Part of Civic Commons’ mission is to make sure those techniques are widely known and properly adapted to civic technology projects.

In the best case, the more jurisdictions use the technology, the more each of them wins, as maintenance and development are amortized over all of them. The benefits of that can be great enough to outweigh the costs incurred by setting up the collaboration in the first place.

So there can be real return-on-investment in releasing your code. With projects like the Enterprise Addressing System and the IT Dashboard, we’re trying to show how in detail, and to adapt open collaboration to the particular circumstances of civic IT projects.

When I was with the Library of Congress in 2001-2003, we were working on a digital repository system to allow libraries all over the world to catalog and eventually combine their contents. While the overall effort was supported by all the right people from all the right groups, the problem we kept facing as a government agency was Licensing.

Under the terms of the contract, we gave it to the Library under one license, but they were required to release it into the public domain, and it incorporated the works of other projects (Tomcat, Lucene at the time), which introduced different licenses. While it wasn’t impossible to figure out the proper process, it slowed things down and killed off many efforts prematurely. If a government agency – at any level – is going to get involved in Open Source, they need to make sure that the ways and rules under which they choose to participate are clearly laid out and unambiguous. It saves lots of headaches for everyone involved.

Karl,
I can see how svn would work well, it is a well defined idea, and there have been, and are, many products from which a basic set of requirements could be drawn.

However, in other projects this basic set of requirements might have a core of commonality, but with various features that are unique to each group.

How did you decide which features went in and which didn’t?
I am thinking if I coded up some abomination to svn and checked it into the code base that only benefited 10% of the community, but impacted 100% of the community .. who made the decision to tell me “no, that feature is not going in”?

Summary, even open source needs an “owner” who can care for and direct the product. Someone who is open minded enough to understand other people’s needs, but selfish enough to ensure the product doesn’t drift too far from its fundamental focus.

JonHB

Where did you learn math and your understanding of what ROI means? CollabNet only contributed 25% of the code while 75% was from other contributors – that would be 3 to 1. However, ROI isn’t about the sharing of costs, but the benefit from the costs you contributed. Read the words – RETURN ON INVESTMENT – you’re not answering what the return is. How can you measure the return by only looking at the contribution costs?

You also state that “CollabNet decided to release Subversion as open source software from the very beginning, and they knew, from past experience with open source projects, that they’d need to put some effort into drawing a community around the code and making it easy to collaborate on the project.” So, what was the cost of that effort?

Seriously, some of you guys on the coding side need to get in touch with the bean counters so you can really understand investment and returns.

JohnC

JonHB: Right on! You hit the nail on the head.

In truth I bet the return on investment was deeply negative. Factoring in the extra publicity for CollabNet which might or might not have sold extra licenses / billed out extra time (Not sure how they make their money) how could there be any “return” on an open source project?

All they saved was the cost of buying a version control software package but at the cost of paying those 25% people to mentor and code for it for as long as they’re involved.

All those 25% of developers were doing is undermining jobs for their own brethren and themselves.

Why programmers are in such a rush to end the viability of programming as a career is beyond me.

http://civiccommons.com/ Karl Fogel

JonHB,

I thought pretty carefully about the math, sorry if I messed something up. My reasoning was: if you put in 1 unit of investment, and get exactly one unit in return, that’s a 1:1 ROI (the units are abstract here, but bear with me for a moment). If you put in 1 but get 2, therefore, that must be a 2:1 ROI. Since CollabNet put in 1 but got 4, that’s a 4:1 ROI. I.e., the 3:1 you are claiming would be if CollabNet had written 33% of the code, not 25%.
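The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the reasoning: the function name `roi_ratio` is made up for this example, and it takes "share of the code written in-house" as a stand-in for investment, which is the same simplification the post makes.

```python
from fractions import Fraction


def roi_ratio(own_share: Fraction) -> Fraction:
    """Units of code obtained per unit of code written in-house.

    own_share is the fraction of the total codebase the sponsor wrote
    itself (an assumption of this sketch: lines of code stand in for
    both investment and return).
    """
    if not 0 < own_share <= 1:
        raise ValueError("own_share must be in (0, 1]")
    return 1 / own_share


# Writing 25% of the code: put in 1 unit, get 4 units back (4:1).
assert roi_ratio(Fraction(1, 4)) == 4
# A 3:1 ratio would correspond to writing one third of the code.
assert roi_ratio(Fraction(1, 3)) == 3
# Writing everything yourself is the 1:1 baseline.
assert roi_ratio(Fraction(1, 1)) == 1
```

Exact fractions are used so that "one third" does not turn into a rounded 33%.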

Since the point of this post is to contrast the costs of going open with the benefits, it’s reasonable to normalize the units by saying that had CollabNet developed its new version control system in a closed fashion, that would have been a 1:1 ROI for our purposes. That’s not 1:1 in the “we put X dollars in and got X dollars out” sense. Rather, we’re looking at the derivative here: it’s 1:1 in the sense that it’s the baseline against which we’re comparing another method. In other words, if the closed method were to achieve A:B ROI, and the open method achieves C:D, and the ratios between D and C are the same as between B and A, then the “ROI” of doing it open was just 1:1. On the other hand, if the ratio of C/D is twice as big as A/B, that’s 2:1. Etc, etc.
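The normalization described above can be written out as a small Python sketch. The function name `relative_roi` and its parameter names are hypothetical; A, B, C and D follow the letters used in the paragraph, with the first number of each pair read as the return and the second as the investment.

```python
from fractions import Fraction


def relative_roi(closed_return: int, closed_invest: int,
                 open_return: int, open_invest: int) -> Fraction:
    """Ratio of the open method's return rate to the closed method's.

    Closed achieves A:B (closed_return:closed_invest) and open achieves
    C:D (open_return:open_invest). The result is (C/D) / (A/B), so 1
    means opening the code changed nothing relative to the baseline.
    """
    closed_rate = Fraction(closed_return, closed_invest)
    open_rate = Fraction(open_return, open_invest)
    return open_rate / closed_rate


# Same ratio either way: the "ROI of open" is just 1:1.
assert relative_roi(3, 2, 3, 2) == 1
# Open rate twice the closed rate: 2:1.
assert relative_roi(1, 1, 2, 1) == 2
# Closed normalized to 1:1, open yields 4 units per unit: 4:1.
assert relative_roi(1, 1, 4, 1) == 4
```

The units cancel out in the division, which is why the comparison works even when the two methods are not measured in dollars.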

I think you’re right to ask “what was the cost of that effort?”, regarding CollabNet’s overhead just for being an open source project (i.e., engaging with the other developers, etc). As I wrote, that was included in the CollabNet developers’ salaries; it’s just one of many things we spent time on. On the other hand, it would be fair to ask how much code we could have written if we didn’t have to do that (in other words, accurately calculate the B side). I am positive we didn’t spend more than half our time on that; probably a good deal less, as a team. So if you take our number of lines of code and double it, that brings us from 25% (of the current total) to 50%, and what we would lose would be 75% (again, of the current total, not of the lesser program we would have written if it were closed). Thus it might be somewhere between a 4:1 and a 3:2 ROI. That’s a big span, and I’m sure you can appreciate how hard it is to calculate the true figure, if there even is one. I know, for example, that some of the toughest code in the program happens to have been written by non-CollabNet contributors. And a lot of the communications overhead was for discussions we’d have to have had among ourselves anyway; we just had more participants, and sometimes they shortened the discussion rather than lengthening it. I’m not sure how to account for things like that.

Not sure what to say about JohnC’s followup comment. I and many people I know have been making their living from this for a couple of decades now. Obviously, someone must be willing to pay for open…

http://civiccommons.com/ Karl Fogel

Credzba,

The point you raised has been raised by others in private correspondence. I think these reservations are well-founded. A jurisdiction has to make the decision for each project, taking all factors into account. But note that:

* There are many customizations that are truly of general interest. (For some of the projects I’m working on opening up right now, it is clear that the majority of customizations being discussed are of general interest).

* Customizations that are specific to a local data format are still useful to hear about — to “socialize”. They are where the opportunities for format unification bubble up from. I wouldn’t expect every one of those opportunities to be taken, but some of them will be.

* When you generalize core code to allow foreign input or output formats, you also make the code more useful for yourself. Because the next month, you may need to add a new input, and the generalization will pay off then.

Again, I’m not saying the decision is always “open it”. It won’t always be. But there are common situations in which even apparently local customizations actually benefit the original creators of the code.

Figuring out who the “owner” or project steering committee is is not unique to civic IT projects. It’s the deep question of open source governance, and that’s a whole separate post. Suffice it to say that the threat of forks usually causes projects to make the right decisions about what to absorb and what not to.

http://civiccommons.com/ Karl Fogel

Keith,

Coming up with — and testing — policies for sustainable government participation in open projects is exactly what Civic Commons is about, and we’re working on it right now. And we’ll always be able to give licensing advice; that’s something where there is no substitute for experience, and no way for everyone in a project to acquire the experience needed (nor should they have to).

By the way, thanks for your kind words about the book. I’d seen your review before, and thought I’d added it to the book’s reviews page, but it wasn’t there — fixed now.

Daniel D

Subversion I can understand as it is a tool of benefit to a large community, who all share common requirements and all happen to be developers. Kind of handy for contributions, don’t you think? But how do you extrapolate that into civic type projects? Are you assuming that I am just busting at the chance to contribute code in my spare time for a council rates application or for my local water utility?

I don’t think so.

Dan

http://civiccommons.com/ Karl Fogel

Dan,

I’m assuming that current civic IT developers will get involved in projects when it is advantageous for them to do so — and that they will have management support, because management will see concrete benefits from it. (E.g., Google is not involved in Linux kernel development out of charity. They do it because they depend on the software.)

http://3dblogger.typepad.com/wired_state Catherine Fitzpatrick

I found this article incredibly fake and unconvincing.

You don’t have to be a coder (I’m certainly not) to see the fatal flaw at the center of it (or maybe it helps NOT to be a coder).

You select as your example of a piece of software that had a lot of use and ROI a very basic and useful utility involving the SVN and the version controls, i.e. that *was about the software and its making itself*.

That sort of utility is NOT what social media is — say a Twitter or your average web site on Drupal or some other widget or app or whatever. Not at all.

Then that software isn’t just ‘about itself’ (tracking itself) but has to perform a function in society.

And there, no, you cannot get that sort of return.

In fact, the time expended (that SOMEBODY has to pay for, somewhere!) on open source making is a big subtraction — so large on many projects, and so *ongoing* (you don’t tell the truth about that, either) that it offsets any “ROI”.

The worst thing about your hijacking of various cities’ software under the guise of collaboration is that the people of these cities did not have a chance to become aware of this much less vote on it.

It’s just a bunch of geeks getting together on their networks and scratching each others’ backs. There is little incentive to demonstrate ROI outside this magic circle then.

JohnC is absolutely correct. The dirty little secret of all the open source illusion is that big IT pays for it and benefits from it — as does government, for that matter. And all those people working for free are paid either by big IT and do this even on company time, or their “spare” time, or Mom is paying for the basement or it’s the university. But it’s not really free.

People should be paid for their work. There’s nothing wrong with cities having proprietary code made by local firms with relationships with the officials. You are globalizing code in ways that accentuate all the negatives of globalization, without any of the universality of the positive.

http://3dblogger.typepad.com/wired_state Catherine Fitzpatrick

>Summary, even open source needs an “owner” who can care for and direct the product. Someone who is open minded enough to understand other people’s needs, but selfish enough to ensure the product doesn’t drift too far from its fundamental focus.

And who gets to decide what the product is and what its focus should be?!

See, this is why the ethics and culture of open source cannot be allowed to bleed into our real-life democratic government.

http://red-bean.com/kfogel kfogel

Catherine,

Regarding your objection to what is being measured (i.e., the “R” in “RoI”): I think I already answered that in my response to JonHB of October 18th, which you can see above.

Regarding “hijacking of various cities’ software under the guise of collaboration”: this kind of misleading language helps no one.

There’s no hijacking going on here; quite the opposite. In fact, when taxpayers pay for software and yet the code is not available under open source terms, that seems much more problematic to me. This isn’t private sector funding, after all: citizens paid for this code, and they should have full access to the fruits of their investment. The only question is, is it a good use of taxpayers’ money for the government entity in question to actively engage with the open source development community that springs up when code is made available? (And my answer to that is yes, it often is worthwhile.)

No one ever said that working on open source software was “free” in the sense of zero-cost. Perhaps you’re being confused by the term “free software”, which is synonymous with “open source”. However, the “free” in “free software” refers to freedom (i.e., the liberty to modify and redistribute the code), not to “free” in the “zero-cost” sense. If you were under the impression that I, or anyone else, was claiming that developer time comes at no cost, please shed it right now. No one is saying that.

From the point of view of an entity considering opening up some code, however, there is no reason to count costs paid by others. If I spend X hours a day maintaining closed code and get Y amount of progress per day, and then I open it up and spend the same X hours engaging with an open-source community on the code and get Y+Z progress per day, then the net benefit to me is positive. There is absolutely no fiscal reason for me to take on *their* costs as part of my calculations — they can do that for themselves, and presumably they wouldn’t be participating unless there were benefit to them as well.

The corollary of “developers should be paid” is “people should be trusted to make rational decisions about how they spend their time”. If some people seem to make irrational decisions, it’s neither your problem nor mine to solve; and it’s quite possible that their decisions are actually rational if looked at with an open mind.

http://3dblogger.typepad.com/wired_state Catherine Fitzpatrick

kfogel,

Oh, I don’t think it’s misleading *at all*. I didn’t consent to a band of Silicon Valley irregulars roaming around the country in search of data to come scrape the data of New York City. I live here, I’m a taxpayer, I vote here, and we did not have any say in this. You think that’s irrelevant? You think your buddies on the geek circuit get to do what they want? I don’t. I’m supposed to be a beneficiary of your data scrapes and your data dumps, and I have questions. And that’s ok. That you try to transpose that into “non-helpful language” like a school-marm can’t conceal the issues here: you are not democratic. You are not deliberative. You just do what you want *because you feel entitled*. I don’t feel that way about you. Most people *don’t even know* what you are doing.

As a taxpayer, I’m more than fine with proprietary software. It’s proven, it has companies with boards, it has help desks with people who show up for work at 9:00 am every day, it has a known price to its license. All to the good.

Open source is a bunch of hackers who work odd hours, who aren’t available, who run up huge consulting bills and sell you a bill of goods about how wonderful and moduled things like Drupal are, when in fact they turn out to be a pain in the ass for many users and managers. No thank you.

Open is a terrible culture of tyranny and thin-skinned unwillingness to hear criticism and a continual bath of self-pity and belief in supremacy. No thank you.

No, dear, I’m not confusing any terminology. I’m versed in this topic, and I’m not stupid. But do have your bit of geeky “gotcha” enjoyment for the day if you need it. I don’t. I know that open source software always has balloon payments. The pitch is often that it is free, that it has no cost, that the license to obtain it is cost-free and that “the commmuuuuunity” will help, that being the legions of hackers and slackers who might or might not be available to help *their fellow geeks* speaking in *their own language* but not the user. Users who show up with complaints are batted away and told they are PICNIC.

You’ve got so many religious views about software, and you have absolutely no willingness to understand that the rest of the world — the ordinary users and the thinking citizens — don’t *have* to abide by your religion. They don’t believe in it.

I don’t have any religious belief — as you do — that it is “good” for the taxpayer to have opensource code and this “open source community that springs up”. This “commuuuunity” that you are all so enthralled with is just the closed circuit of geeks and api engineers. It’s not the larger public and it’s not people with the larger public interest in mind — they are *self* interested to perpetuate themselves.

My point is that when someone goes to get a proprietary system, either in a Microsoft box with a price tag and a license and a help desk fee known in advance, or as a contracted job that one can bid out and get various pricings for, that can be a much better buy in the long run than the open-ended OS dev balloon payments that inevitably come on these projects because they inevitably don’t work, and yet there is always unwillingness to admit this. I know some very big companies that have gotten some very big cats in the bag with open source and its huge dev payments.

So I refuse to let you talk through your hat. You can’t cost this out reasonably and rationally at all, because you’re willing to concoct this outrageous “ROI” concept based on…the figures from making an SVN. As I explained in my earlier criticism here, making some utility which is an artifact of software production itself *can’t count*. That’s like saying you made an ROI on paper and paper clips.

A company with bottom-line costs has a way of managing time and resources and demanding outcome that the lovely open crew don’t have. The market success of Windows over Linux systems should make the case, but of course, it never does for people like you.

The amount of time wasted fooling around with open source on jobs is never honestly calculated. The amount of time you spend hacking around with other hackers isn’t really “on the clock”. If it were, it would be a shocking cost. You won’t admit that.

It would be one thing if you did your OS shtick as merely your hobby, like some people collect stamps, or merely as a cynical big IT gambit, the way IBM slurps up OS code out of various commuuuuunities and then ingests it as part of its profit-making enterprise — and you conceded that. It would be one thing if you always conceded CHOICE.

But you never do. You pretend to only when challenged. But you’re not really about that. Anyone with half a brain can see what you’re up to here: wiring up all the cities to your own elite set so you can run the politics. It’s the politics you are after, and the power. Again, no thank you.