Interview for French Translation of “Producing Open Source Software”.

The folks at Framabook graciously sent me some copies of the print version of their French translation of my book (in French, “Produire du logiciel libre”; in English, “Producing Open Source Software”). They also sent some questions for an online interview to accompany the release, and Olivier Rosseler translated my responses.

From: Karl Fogel
To: Christophe Masutti, Alexis Kauffmann
Subject: Re: Interview french version POSS
Date: Fri, 11 Mar 2011 19:05:00 -0500
Christophe Masutti writes:
> Hi Karl, could you say a few words about yourself to our French-speaking
> readers?
>
> The French version of POSS has just been published, and your book has
> been translated or is being translated into other languages. What are
> your feelings about all these remixes of your work, all made possible
> because you chose to put your book under a free licence?
My feelings are 100% positive. There is simply no downside for me. The
translations make the book accessible to more readers, and that's
exactly what I want. I'm very grateful to all the translators.
> If you were to write a second version of POSS today, what would you
> change in it or add to it? By the way, do you plan on doing such a
> rewriting?
Well, in fact I am always adjusting it as open source practices change.
The online version evolves steadily; maybe eventually we'll announce
that some kind of official "version 2.0" has been reached, but really
it's a continuous process.
For example, five or six years ago, it was more common for projects to
run their own development infrastructure. People would set up a server,
install a version control system, a bug tracker, a mailing list manager,
maybe a wiki, and that would be where project development happened.
But there's been a lot of consolidation since then. Nowadays, only the
very largest and very smallest projects run their own infrastructure.
The vast majority use one of the prebuilt hosting sites, like GitHub,
Google Code Hosting, SourceForge, Launchpad, etc. Most open source
developers have interacted with most of these sites by now.
So I've been updating the part of the book that talks about hosting
infrastructure to talk more about using "canned hosting" sites like the
above, instead of rolling your own. People now recognize that running a
hosting platform, with all its collaboration services, is a big
operational challenge, and that outsourcing that job is pretty much
required if you want to have time to get any work done on your project.
I've also updated the book to talk about new versions of open source
licenses (like the GNU General Public License version 3, that came out
after the book was first published), and I've adjusted some of the
recommendations of particular software, since times have changed. For
example, Git is much more mature now than it was when I first wrote the
book.
> FLOSS is being produced pretty much the same way now as five years
> ago. But forges have appeared that differ from the SourceForge model.
> I'm thinking of Google Code, and especially GitHub. GitHub can be
> considered the "Facebook" of open source forges, in that it offers
> social-network features and makes it possible to commit directly from
> one's browser. The notion of "fork" here is different from what we are
> used to. What do you think about all that?
Actually, I think the notion of forking has not changed -- there has
been some terminological shift, perhaps, but no conceptual shift.
When I look at the dynamics of how open source projects work, I don't
see huge differences based on what forge the project is using. GitHub
has a terrific product, but they also have terrific marketing, and
they've promoted this idea of projects inviting users to "fork me on
GitHub", meaning essentially "make a copy of me that you can work with".
But even though there is a limited technical sense in which a copy of a
git-based project is in theory a "fork", in practice it is not a fork --
because the concept of a fork is fundamentally political, not technical.
To fork a project, in the old sense, meant to raise up a flag saying "We
think this project has been going in the wrong direction, and we are
going to take a copy of it and develop it in the right direction --
everyone who agrees, come over and join us!" And then the two projects
might compete for developer attention, and for users, and perhaps for
money, and maybe eventually one would win out. Or sometimes they'd
merge back together. Either way, the process was a political one: it
was about gaining adherents.
That dynamic still exists, and it still happens all the time. So if we
start to use the word "fork" to mean something else, that's fine, but it
doesn't change anything about reality, it just changes the words we use
to describe reality.
GitHub started using "fork" to mean "create a workable copy". Now, it's
true that the copy has a nice ability to diverge and remerge with the
original on which it was based -- this is a feature of git and of all
decentralized version control systems. And it's true that divergence
and "remergence" is harder with centralized version control systems,
like Subversion and CVS. But all these Git forks are not "forks" in the
real sense. Most of the time, when a developer makes a git copy and
does some work in it, she is hoping that her work will eventually be
merged back into the master copy. When I say "master" copy, I don't
mean "master" in some technical sense, I mean it exactly in the political
sense: the master copy is the copy that has the most users following it.
So I think these features of Git and of GitHub are great, and I enjoy
using them, but there is nothing revolutionary going on here. There may
be a terminology shift, but the actual dynamics of open source projects
are the same: most developers make a big effort to get their changes
into the core distribution, because they do not want the effort of
maintaining their changes independently. Even though Git somewhat
reduces the overhead of maintaining an independent set of changes, it
certainly does not reduce it so much that it is no longer a factor.
Smart developers form communities and try to keep the codebase unified,
because that's the best way to work. That is not going to change.
> In June 2010, Benjamin Mako Hill remarked in his "Free Software Needs
> Free Tools" article that hosting open source projects on proprietary
> platforms was something of a problem. In your view, is this a major
> problem, a minor one, or no problem at all?
> http://mako.cc/writing/hill-free_tools.html
Well, I know Mako Hill, and like and respect him a great deal! I think
I disagree with him on this question, though, for a couple of reasons.
First, we have to face reality. It is not possible to be a software
developer today without using proprietary tools. Only by narrowing the
definition of "platform" in an arbitrary way is it possible to fool
ourselves into thinking that we are using exclusively free tools. For
example, I could host my project at Launchpad, which is free software,
but can I realistically write code without looking things up in Google's
search engine, which is not free software? Of course not. Every good
programmer uses Google, or some other proprietary search engine, daily.
Google Search is part of the platform -- we cannot pretend otherwise.
But let's take the question further:
When it comes to project hosting, what are the important freedoms? You
are using a platform, and asking others to use it to collaborate with
you, so ideally that platform would be free. That way, if you want to
modify its behavior, you can do so: if someone wants to fork your
project (in the old, grand sense), they can replicate the hosting
infrastructure somewhere under their control if absolutely necessary.
Well, that's nice in theory, but frankly, if you had all the source code
to (say) Google Code Hosting, under an open source license, you still
would not be able to replicate Google Code Hosting. You'd need Google's
operations team, their server farms... an entire infrastructure that has
nothing to do with source code. Realistically, you cannot do it. You
can fork the project, but generally you are not going to fork its
hosting platform, because you don't have the resources. And since you
can't run the service yourself, you also can't tweak the service to
behave in the ways you want -- because the people who run the physical
servers have to decide which tweaks are acceptable and which aren't. So
in practice, you can't have either of these freedoms.
(Some hosting services do attempt to give their users as much freedom as
possible. For example, Launchpad's code is open source, and they do
accept patches from community members. But the company that hosts
Launchpad still approves every patch that they incorporate, since they
have to run the servers. I think SourceForge is about to try a similar
arrangement, given their announcement of Allura yesterday.)
So, given this situation, what freedom is possible?
What remains is the freedom to get your data in and out. In other
words, the issue is really about APIs -- that is, "application
programming interfaces", ways to move data to and from a service in a
reliable, automatable way. If I can write a program to pull all of my
project data out of one forge and move it to a different forge, that is
a useful freedom. It means I am not locked in. It's not the only
freedom we can think of; it's not even the ideal freedom. But it's the
practical freedom we can have in a world in which running one's own
servers has become prohibitively difficult.
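To make the idea concrete, here is a minimal Python sketch of the kind of escape hatch a complete API gives you: page through every record the service holds and serialize the lot to neutral JSON that another forge could import. (The forge's list-issues endpoint is entirely hypothetical and stubbed out with a fake here, since the real call varies from service to service.)

```python
import json

def export_issues(fetch_page):
    """Collect every issue from a forge by requesting successive
    pages of a hypothetical list-issues API until it returns an
    empty page. `fetch_page(n)` stands in for whatever HTTP call
    the real service's API would require."""
    issues = []
    page = 1
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        issues.extend(batch)
        page += 1
    return issues

# Simulated forge: two pages of issues, then nothing more.
def fake_fetch(page):
    data = {
        1: [{"id": 1, "title": "crash on startup"}],
        2: [{"id": 2, "title": "docs typo"}],
    }
    return data.get(page, [])

# Portable JSON you could feed to a different forge's import API.
print(json.dumps(export_issues(fake_fetch), indent=2))
```

The point is not this particular script, but that such a script is *possible* at all: when a service exposes its data this way, leaving is always an option.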
I'm not saying I like this conclusion. I just think it is reality. The
"hunter-gatherer" phase of open source is over; we have moved into the
era of dependency on agricultural and urban infrastructure. You can't
dig your own irrigation ditches; you can't build your own sewer system.
It's too hard. But data portability means that if someone else is doing
a bad job of those things, you can at least move to someplace that is
doing a better job.
So I don't care very much that GitHub's platform is proprietary, for
example. Of course I would prefer it to be entirely open source, but
the fact that it is not does not seem like a huge problem. The thing I
look at first, when I'm evaluating any forge-like service, is: how
complete are their APIs? Can I get all my data off, if I need to? If
they provide complete APIs, it means they are serious about maintaining
the quality of the service, because they are not trying to lock in their
users through anything other than quality of service.
> In France, high school and junior high students don't have computing
> classes. Do you think computing as a subject -- and not only as a tool
> for other subjects -- should be taught in schools?
Absolutely. The ability to understand data and symbolic processing is
now very important. It's a form of literacy. You don't have to be a
programmer, but you need to understand roughly how data works. I had a
conversation the other day that showed this gap in a very clear way.
I was at the doctor, having some tests done. The test involved a video
image of my heart beating (using an ultrasound device), and the entire
sequence was recorded. It was amazing to see! So afterwards, I asked
at the front desk if I could get the data. Those were my exact words:
"Can I please get all the data from that echocardiogram?" The clerk's
reply was that they could give me a sheet with low-resolution pictures.
"Thanks, but I actually want the data," I replied. Yes, she said,
that's what she was offering. To her, the phrase "the data" did not
have the very specific meaning it does to the data-literate. What I
meant, of course, was that I wanted every single bit that they had
recorded. That's what "all the data" means, right? It means you don't
lose any information: it's a bit-for-bit copy. But she didn't have a
definite concept of data. To her, data means "something that I can
recognize as being related to the thing requested". For me, it was
informational and computational; for her, it was perceptual.
I realize this sounds harsh, but I really believe that is a form of
illiteracy today. You have to recognize when you are getting real
information versus fake information, and you have to understand the vast
difference in potential between the two. If I go to another doctor,
imagine the difference between me handing her a USB thumb drive with the
complete video recording of my echocardiogram, and handing her some
printouts with a few low-resolution still images of my heart. One of
these is useful, while the other is utterly pointless.
Increasingly, companies that have a deep understanding of data -- of
data about you -- have ways to use that data that are very profitable
for them, but are not necessarily to your advantage. So computing
classes, of some kind, are a form of defense against this, an immune
response to a world in which possession of and manipulation of data is
increasingly a form of power. You can only understand how data can be
used if you have tried to use it yourself.
So yes, computing classes... but not only as a defense :-). It's also a
great opportunity for schools to do something collaborative. Too much
of schooling is about individual learning. In fact, schools outlaw
many forms of collaboration and call it cheating. But in computing
classes, the most natural thing to do is have the students form open
source projects, or participate in existing open source projects. Of
course, the majority of students will not be good at it and should not
be forced to do it. This is true of any subject. But for those who
find it a natural fit, optional computer classes are a great opportunity
that they might not have had otherwise. So as a chance to expose people
early to the pleasures of collaborative development, I think computing
classes are important. They will have an amazing effect on a subset of
students, just as (say) music classes do.
> Now one last question: what would be your advice to young programmers
> wishing to enter the FLOSS community? Please answer with just one
> sentence and not a whole book :-)
Find an open source project you like (preferably one you use already)
and start participating; you'll never regret it.
Best,
-Karl

14 Comments on "Interview for French Translation of “Producing Open Source Software”."

Karl, i disagree with you about the need for free infrastructure. Mako had it right in the first place, and i don’t find your argument about the practicality of running your own servers terribly convincing. I agree with you that services need to offer solid, stable, complete APIs for data transfer (so you can “escape”) as a baseline. But I also think that you should use services that are implemented as free software. I agree that 100% purity may not be possible, but chasing 100% purity is probably a mistake in any domain. And the impossibility of purity doesn’t mean you shouldn’t take a strong stand.

Consider a complicated piece of free software (e.g. the Linux kernel). For most people, the idea that they might get the smallest patch into the Linux kernel is a preposterous idea, easily as far into the realm of impracticality as them running their own server farm. Should those people sigh, shrug, and accept a non-free kernel as long as it presents a reasonable and stable API?

No, they should prefer a free kernel because other people can actually practically make use of those freedoms, and by doing so keep the existing development community around the kernel on-track. The freedom of the tool matters because it puts a boundary on what kinds of anti-user nonsense will be able to make it into the code.

What’s more, the use of any tool (without needing to contribute development expertise) is a contribution to that tool. For example, if i know my way around Adobe Illustrator, my use contributes to that tool’s commercial and social standing. I can help friends who use that tool, and can participate in user forums, mailing lists, etc. I can even write tutorials that other people use to improve their own knowledge and skill with Illustrator. All of this activity contributes to the tool in question. If i do the same thing with Inkscape instead of Illustrator (despite not being able to contribute technically to either project), my contributions improve a free tool, and help free software become more effective and therefore more appealing to everyone else.

These kinds of user-driven “network effects” are at least as important (probably more so) on network services as they are on free software. We should not encourage people to contribute their energy and their social capital to services that have tightly held proprietors.

So: while i’m not personally about to set up and operate a full clone of github *or* gitorious, I strongly prefer (and encourage people to use) gitorious, because of software freedom. If either network service changes their terms of use to something noxious, i’m pretty confident that a replacement for gitorious will spring up quickly, with all the features i’ve come to expect, simply because some group that found the service useful will commit to maintaining the infrastructure (and doesn’t have to rewrite the code). I have no such confidence in github.

Yes, i do regularly use the google search engine. It sucks, and it frustrates me; and i’d love to have an alternative running free tools. For searches that i think are sufficiently well-structured or narrowly-defined, i search wikipedia or free-software-driven, topic-specific forums first.

Sure, i’m not 100% pure on the network services front. Most BIOSes i deal with are also non-free, and i reluctantly use proprietary firmware on some of my network cards. That doesn’t stop me from saying that free software is to be strongly preferred, and eagerly awaiting the day that i can use something like coreboot.

We have good free software-driven network services available, and we should strongly prefer their use to the use of network services driven by proprietary tools.

There’s nothing I disagree with here; I’m just saying it’s a matter of degree.

It may be that gitorious is sufficiently close to github, in terms of functionality, that using it works out okay. Or one may feel github is simply so much better that one can’t not use it (and of course, network effects are part of the consideration there too — the “everyone is on github, so my life will be easier if I am too” argument, just as with Facebook or, indeed, the Internet itself).

But my point is that for any set of proprietary services, there is always going to be some subset for which there is no convincing free replacement. If Mako’s point is “use a freedom-positive service whenever you can, as long as it doesn’t become a huge handicap to whatever you’re trying to get done”, then I’m all for that (and indeed, that’s how I behave, and how you behave too).

I wasn’t arguing that free software shouldn’t be strongly preferred. I’m just saying it’s silly when free software advocates ignore other people’s perfectly legitimate ideas of what constitutes functionality for them and pretend that a given freedom-friendly service is “as good” as its nearest proprietary competitor when it’s not. We do this by ignoring the things that are actually important for that user, and then we lose any ability to persuade them because we show that we don’t really understand their concerns.

So, for example, Gitorious doesn’t have an issue tracker. For some people, that’s the end of the conversation right there, and I don’t blame them. It’s great that it’s free, but it’s not a replacement for Github.

Talking about freedom: good. Reality denial: bad. That’s all I was trying to say.

I’m not denying reality. I’m saying that it’s critical to realize that your actions have consequences.

One really bad consequence is vendor lock-in, whether that’s gained through proprietary data formats, proprietary software, proprietary network protocols, or proprietary network services. Vendor lock-in is bad because it means the vendor can get away with nasty, anti-user things (e.g. spying, anti-features, etc) that no one would ever consider desirable or even acceptable.

Unfortunately, people don’t always see how their use of a tool or service contributes to that service’s ability to lock them (and everyone else) into a really unhealthy contract.

We need to emphasize that free data formats, free software, free network protocols, and free network services are critical to avoiding situations where you have a proprietary gatekeeper for (what has become) basic social interaction. There are free issue trackers, even if they are not hosted on gitorious. To say “i’d rather use github because github has an issue tracker” is not sufficient. To convince me that the tradeoff is being made consciously, i’d need to hear something like “I need github’s issue tracker so much that i am willing to help github become (or maintain their position as) a proprietary gatekeeper in my community.”

If the free issue tracker of choice doesn’t integrate well with a repository hosted on free services like gitorious, then using that free issue tracker and actively encouraging them to integrate it better with other free services helps everyone.

Put another way: freedom is a powerful feature. People’s choices affect not only their own freedom, but the freedom of others (this is particularly true in situations where there is a network effect). People need to be made aware of the consequences of their actions. We shouldn’t cavalierly excuse a choice that has negative consequences.

The existence of a network effect *strengthens* the ethical argument to choose free tools, rather than weakening it.

“One really bad consequence is vendor lock-in, whether that’s gained through proprietary data formats, proprietary software, proprietary network protocols, or proprietary network services. Vendor lock-in is bad because it means the vendor can get away with nasty, anti-user things (e.g. spying, anti-features, etc) that no one would ever consider desirable or even acceptable.”

That’s exactly what I advocated in the interview: the lock-in is the problem, so if you can address that problem, the fact that the service is not running entirely free software becomes an annoyance rather than a showstopper. APIs, plus a reasonable confidence that you can move to another hosting service or to your own service, mean there is no lock-in, or at any rate the lock-in is minimized.

The area where I think we fool ourselves is in assuming that people make the choices they do simply because they’re unaware of the consequences. You said it yourself: “People need to be made aware of the consequences of their actions.”

What makes you think they’re not aware?

My experience working with people who procure software and services is that they are often perfectly aware of the dynamics of lock-in, and would like to avoid it, but that it is only one of many factors they are taking into consideration. Unfortunately, too often, the free software advocate comes to them and tells them that they are not considering lock-in seriously enough — yet the advocate has made no inquiry as to what other factors the person is considering nor why those factors might be important, nor about how strong the lock-in would actually be (e.g., in the case of GitHub, nearly non-existent).

So how can the advocate possibly know what they are talking about? That’s what I’m objecting to: write-only advocacy that pays insufficient attention to its targets’ actual situations.

At this point, you’ll probably want a concrete example.

Today, it happened to me in a conversation. I mentioned that I’d had to give up running my own spam filters, even though I would strongly prefer to do so, and instead now rely on a proprietary, outside service. The response was: “kfogel: i’m not sure you tried the right thing.” There’s no inquiry there about how bad the spam load was, or what we tried, or how much effort I — with plenty of highly-qualified help — put into solving the problem on our own first, or what the opportunity cost of all that effort was (the answer, by the way, is that we spent nearly a year trying to solve it with all sorts of combinations of free software, at a cost of tens of hours minimum, and other infrastructure improvement work lagged because of it). I don’t mean to pick on that one response, which was casual, but it’s typical of the pattern of response I see us free software advocates giving all too often. We essentially say “Oh, there’s an app for that”, without spending any time looking carefully at what “that” is or what efforts the other party has already made to solve the problem using free software. It’s advocacy without understanding, and is therefore poor advocacy.

How exactly is github a gatekeeper, by the way? I mean, is there any practical sense in which they’re blocking something? Savannah.gnu.org runs all free software, but provides much worse service (because staffed by overworked volunteers). That’s not hearsay; I’m speaking from extensive personal experience. If I had to say that one or the other of those services is “gatekeeping” me, it would absolutely be savannah. Sorry — I love the GNU Project and contribute, but there’s no doubt about which site puts up fewer obstacles. If somebody advocated to me that I should use Savannah because that way I’d be supporting freedom, I’d have a hard time avoiding laughing. I can’t afford to pay that much for freedom; few projects can.

(Gitorious I don’t have very much experience using, so can’t say there. I’m just using Savannah as an example because the contrast there is high enough to make the “gatekeeping” question meaningful.)

Congratulations on the recent translation of your book into French! I’m excited to read it finally in my native language.

Now, about the interesting discussion that is unfolding in the comments… I’m not sure that it is fair to take a broad position on an argument without laying out specifics, and then, when it is disputed on the same broad level, cry foul that the disputer has not thought about a very specific case that you had not presented. I might be using the term triangulation incorrectly, but engaging in a slow revelation of the specifics of the criteria as the argument unfolds, in such a way that enables you to invent the criteria at any time in order to justify your position, would be triangulation.

Your criticism of Daniel G’s reply seems to be that this sort of general advocacy is unhelpful in certain specific scenarios, and that his failure to ask what the criteria are in a specific scenario you have not presented is demonstrable proof of how unhelpful it is. However, you shifted the argument from a generalized claim to a more specific one in order to point out that the generalized claim is not valid because you can think of an example where it doesn’t work. I don’t think that is fair argumentation.

It may be that this specific case you are thinking of is in fact one in which the criteria of github make complete sense in a final sum evaluation. If that is so, then reveal what that specific case is ahead of time, and argue that this specific case justifies using non-free proprietary services. That is a much more straightforward and honest argument. Otherwise, engage with the original broad argument that you led with and respond to the arguments that were made on the same merits.

Is your argument that there are specific cases where proprietary services make sense, or is it that in most cases proprietary services make sense? You should stick to one and not jump between the two, or it is frustrating to engage in a reasonable discussion because the respondent doesn’t know which you are arguing. If it’s the former, then you are making a narrow case for that particular situation. If it’s the latter, then you are making a broad claim that you should back up with broad arguments, not one specific example that you then claim is the example that proves the rule (overwhelming empirical data would also suffice).

Forgive me if I am misrepresenting, but it seems you are making the argument that in most cases proprietary services make sense, and anyone advocating otherwise is not helping and actually is hurting their cause? I think you would be right to say that Daniel G’s response would be unhelpful if you were both sitting in a room and your big client had just presented you with their hard-thought specifications and wanted your advice on what to do, and Daniel G’s response to the client was to ignore all the criteria presented and go off on a rant about freedom. However, I don’t think that is what is going on here; don’t try to pretend it is in order to make your claim.

So, for example, Gitorious doesn’t have an issue tracker. For some people, that’s the end of the conversation right there, and I don’t blame them. It’s great that it’s free, but it’s not a replacement for Github.

APIs, plus a reasonable confidence that you can move to another hosting service or to your own service, mean there is no lock-in, or at any rate the lock-in is minimized.

I am not aware of a way to take Github’s issue-tracker data with you once you decide to move away from their service. If it is in fact true that a main advantage of Github over Gitorious is an issue tracker, not being able to move the issues out would seem to bolster the lock-in claim against Github.

Minimizing lock-in seems like too risky a gamble in most cases. If I am starting a new Free Software project, I don’t know ahead of time which bits of my infrastructure I will want to move around. I may not have enough information to know what my lock-in risks are ahead of time. Later, I may or may not want to take my bug tracker’s data as well as my project’s code with me away from Github.

Github is a good example of minimizing vendor lock-in, but minimized freedom is just asking for bigger cages and longer chains so that you feel less trapped.

How exactly is github a gatekeeper, by the way? I mean, is there any practical sense in which they’re blocking something?

An entity does not have to actively be blocking something to be a gatekeeper. They only need the ability to block it. It isn’t that Github, or Google Code for that matter, have a record of arbitrarily removing projects. The issue is more that they are capable of almost anything, because they ultimately control access to the network. So while you may have a local copy of a git repo, they control the entirety of the public-facing infrastructure of your project. This requires you to trust them to remain responsible with a) ensuring your and your users’ access to code/issue tracking, and b) not using data gathered from people interacting with “your” infrastructure for something nefarious. You must then extend this trust to them in perpetuity, as well as to any tech giant that might buy them a year from now. All it takes to be a gatekeeper is a gate; just because it is wide open doesn’t mean that it couldn’t be slammed shut or closed partially.

The question that has to be asked about all proprietary network services is: what happens when they get bought out? If Facebook were to buy Github, would we feel as comfortable about it being free-ish, or would we wonder what the hell Facebook was about to do with years’ worth of Github data? Part of the network effect problem here is that by using these proprietary data solutions, you are putting the privacy of your users into the hands of a company that may not have an interest in protecting that privacy; and if they do, whoever buys them almost surely won’t.

Mmm, yes, good point — my argumentation technique was unfair, and you’re right to point it out. Thank you for expressing it so rigorously and objectively. Part of the explanation is that I was in a bit of a bad mood yesterday (got better!), but also, there’s some context that I should have mentioned: I was having an IRC conversation with Daniel Kahn Gillmor and others on this topic at the same time as we were writing these comments. So the comments became an extension of that real-time conversation… but I didn’t indicate this anywhere in the comments, and thus put other readers at a disadvantage. Some of the specific examples I drew in came from that IRC conversation. So they were perhaps not quite as unfair as they seemed, but I didn’t include enough context for you (or anyone else who wasn’t in IRC) to know that.

In general, I think your understanding of the point I am trying to make is accurate. The place where it is best expressed is when Daniel wrote “People need to be made aware of the consequences of their actions.” When we walk in with the assumption that people are not aware of those consequences, we are often wrong! In many instances, I have found that they are aware, and that rather it is me who is unaware of some of the other concerns governing their choices. There is a recurring pattern in free software advocacy of the advocates being either unaware of those concerns, or not doing enough to understand & answer them. The only way I knew to demonstrate this was to bring in some specific examples — not because the examples themselves would prove the pattern, but rather because I hoped they would cause a flash of recognition, that is, that they would enable readers who are familiar with this field to call to mind more examples of the same phenomenon. They were meant only as an illustrative technique, not a rhetorical one, in other words.

I hope this explains my comments better. I’m sorry my responses yesterday were a bit irate and poorly organized; I’d go back and edit them, but since they have their own responses now, it’s probably better to keep the record pristine, despite my regrets at how I presented my case :-).

@nat:

I didn’t know that about the GitHub issue tracker. I completely agree: if it doesn’t have an API, then the danger of lock-in is much greater. It simply never occurred to me that they would launch a non-free-software issue tracker without APIs, considering that one of their major competitors (Google Code Hosting) does have a pretty complete issue tracker API.

But as regards your other point, about how gatekeeping just means they have the ability to slam the gate shut: I confess I don’t really understand how it relates to the freedom of the underlying software (as opposed to the availability of APIs). Anyone who runs a hosted network service can do random things with the data gathered by that service, and can make arbitrary decisions about what to do underneath their domain name. This is true whether or not they are running purely free software on the servers. The statements you make about GitHub, for example, would be equally applicable to (say) Identi.ca.

It sounds to me like what you’re really talking about is not proprietary code vs free code, but rather the corporate structure of the entity that owns the domain name and the physical servers. That’s what affects lock-in and privacy concerns. The licenses of the software running on those servers have zero effect on privacy issues.

To lay it out:

a) A service can run all free software, but collect data and keep that data proprietary.
b) A service can run proprietary software, but make its data completely open and free (via APIs or bulk downloads).
c) Like a) but with the data side flipped: all free software, and completely open data.
d) Like b) but with the data side flipped: proprietary software, and proprietary data.

Hmm, I guess a 4-way square would have been a better way to express that! But you see what I mean, anyway.

The thing that prevents lock-in is, primarily, the availability of APIs. Thus I have in some cases recommended Google Code Hosting: they have not only created APIs (which I’ve tested and have used specifically to alleviate lock-in concerns — see http://code.google.com/p/projport/) but they have publicly committed to avoiding lock-in as a strategy (see http://dataliberation.org/). As far as the lock-in problem goes, this stuff is much more important than running an all-free-software infrastructure (which of course they do not do). Of course I absolutely agree that there would be other advantages, for everyone, if they ran a free software infrastructure. But for the specific things that we free software advocates often claim are at issue — lock-in and privacy issues — the controlling factors are corporate structure, privacy laws, ToS agreements, etc. The software licenses are irrelevant to that.
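The practical test of whether an API actually closes the lock-in gap is whether every record can be drained out programmatically into a format you control. A minimal sketch of such a bulk export, assuming a paginated issue-listing call — the fetch interface and field names here are hypothetical for illustration, not Google Code’s (or projport’s) actual API:

```python
import json

def export_issues(fetch_page, out_path):
    """Drain a paginated issue API into a local JSON file.

    fetch_page(start) is assumed to return a list of issue dicts
    beginning at offset `start`, and an empty list when exhausted.
    Returns the number of issues exported.
    """
    issues = []
    start = 0
    while True:
        page = fetch_page(start)
        if not page:
            break  # no more pages: the export is complete
        issues.extend(page)
        start += len(page)
    # Write everything to a plain local file, so the data survives
    # even if the hosting service (or its terms) changes.
    with open(out_path, "w") as f:
        json.dump(issues, f, indent=2)
    return len(issues)
```

The point of the sketch is only that an export tool is small once the API exists; the hard dependency is on the service committing to keep such an API available and complete.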

Karl, I’ve already agreed with you that commitments to open APIs and the ability to move data in and out are important. I’m not arguing against that.

But i’m arguing that software freedom *still matters* in this context. It matters because if you get all your data out of googlecode due to some obnoxious terms of service, but no one else can actually run the services you need to act on or manipulate that data, then you are left with a bunch of data sitting around. So much for avoiding lock-in.

If the software was free, then anyone with the time and energy to run and maintain the software could pick it up and offer the same service, but without whatever onerous terms googlecode has decided to impose. I grant that this is a non-trivial task. However, it’s significantly easier than having to write the software from scratch in the first place.

G5. You agree not to reproduce, duplicate, copy […] any portion of the Service […] without the express written permission by GitHub.

This makes it sound like anyone who even has a github account has agreed to not try to make a github clone; good luck getting a developer to set up a functionally-identical bug tracker (if github will even let you export the data) if the developer can’t even log into the site to see what it is that you’ve come to expect.

PS about the slippery argumentation — yes, there was a blending of IRC: in an extremely geek-heavy channel, i did say to Karl “it sounds like you were using the wrong tools”. I apologize if that came out the wrong way and pissed you off; i want to be clear that my remarks there were made very much in a specific context, specifically about spam management among a geeked-out crowd, many of whom (afaik) do spam management using free tools and free services. I would not (and do not) casually dismiss user concerns with rants about freedom. However, my experience is that many people (of all levels of tech sophistication) have not thought about the long-term consequences of what we’re doing socially on the network today. The creation (and casual daily propping-up) of proprietary social gatekeepers has us heading toward ubiquitous unaccountable centralized private surveillance, and the direct intrusion of commercial interests on even the most fundamental forms of human social interaction. I think it’s a bad tradeoff, and i actually do think that most people don’t understand the role they’re playing in contributing to it.

Yes, we geeks certainly need to listen to what features people want to understand how we can make tools that suit their needs. But we also have a responsibility to help people understand the tradeoffs that they might not see at first glance, but which we see because we think about this stuff more than most people.

Those GitHub terms are awful, and I will take them into account before recommending GitHub as a service (if I ever do again); thanks for pointing them out. I wonder how enforceable they actually are. It might also be that one is no longer bound by the terms once one stops using GitHub — i.e., if you migrate away, then you regain your freedom to set up a clone, though of course you’d have to rely on memory and notes in order to make that clone accurate :-).

Of course, those Terms of Use could be made by any site, including one running all free software. That would be odd, and I’m not sure it’s ever happened, but Terms of Use are pretty much independent of the software licensing, except for a license that itself prohibits such terms of (ab)use… but then that probably wouldn’t be a free software license, paradoxically enough.

I realize that’s unrelated to your larger point that the freedom of the underlying software is important because it makes it easier to duplicate a given service’s functionality, and that you were just pointing out the ToU thing as an example of how a service can impose obnoxious terms (thus making it necessary to migrate away, thus making the freedom to set up a clone important).

But just looking at it as a technical matter, I don’t think the code is terribly important to cloning GitHub or most other online services. It’s not the code, it’s the servers and — most importantly — the management and ops teams. If you and I and a few talented friends wanted to whip up a clone of GitHub, we could do the main functionality in a week or two. Seriously, look at what’s there: git (already free); some user authentication and identity management (lots of packages for that); an issue tracker. I haven’t explored the issue tracker much, so my estimate might be a bit off there, but it’s not far off. The thing is, we couldn’t do it scalably. If you look at the back-end ops for these online services, it quickly becomes apparent that the code is not the valuable part. It’s the processes and server management expertise, and handling all the unpredictable things that come up when you have a user base in the hundreds of thousands.

The issues are organizational; they have very little to do with code.

So I don’t think software freedom has the influence here that we want it to. I’m not saying it’s not important, but I think it’s less important than (probably) you think it is for the specific purpose of being able to clone popular online services. And this isn’t something I want to be true, just to be clear. But that has no bearing on whether it is true.

By the way, although projport just does import right now, I used it to test the googlecode APIs in both directions — that’s why I mentioned it, because it had enabled me to verify that the APIs were for real. The basic infrastructure is there in projport to do outbound conversion, I just didn’t have time to implement it all the way (the people I was writing it for only needed to do inbound conversion). We’d absolutely incorporate any patches to do outbound conversion; it’s part of the stated purpose of the tool.

Good ops teams that are capable of creating and maintaining robust and scalable network services actually do put their knowledge and skills into code. That’s the way to scale.

If they put that code back into free tools, then the whole community benefits, and it becomes possible to take advantage of the pool of ops expertise, thanks to the freedom of the software.

You seem to be claiming that reproducing the software behind any given network service is trivial compared to operating that service. Your examples to demonstrate that claim seem weak to me: The main reason you expect cloning github to take a week or two seems to be because git itself (the most sophisticated part of the network service github.com provides) is already free. This is an argument for the importance of software freedom for network services, not against it.

And your example with projport shows that actually writing the code to make effective use of the data from open APIs is itself a non-trivial task. You implemented part of it, but didn’t manage to complete outbound conversion yet, despite it being part of the stated purpose of the tool. I’m left with the same conclusion: that if googlecode itself was free software, you would be more able to effectively combat vendor lock-in than you are with it as a non-free network service.

So software freedom really does matter even if someone else is running the code on your behalf (i.e. in a network service).

You are forcing me to hone my point, which I have perhaps been expressing too broadly.

Most of these services (GitHub, even Google Code — heck, even Google Search to some degree) are a thin layer of non-free code over a huge foundation of free code. The relative sizes of those layers differ, but generally it’s a thin-over-thick ratio like that.

I think what I’m saying is that software freedom becomes a less compelling utilitarian argument the lower that ratio is (that is, the thinner the proprietary layer).

So for example, you’re right, the reason it wouldn’t take long to clone GitHub is precisely that major parts of it — such as git — are already free software. That is definitely an argument for the importance of free software! (I don’t think anything I said contradicts this.)

But strictly from the utilitarian perspective of avoiding immediate lock-in, having that remaining, non-free, not-yet-commoditized layer of GitHub special sauce be free is not nearly as compelling as it would be if the rest of GitHub weren’t already effectively free-as-in-freedom. It’s still somewhat compelling; it’s just a matter of degree. When I try to make that “freedom is important” argument, I can feel how much less persuasive it is when it’s about an almost-already-free service. My wishing it to be more persuasive doesn’t change the dynamics of the situation. The fact is, GitHub (modulo the new bug tracker, about which I don’t know much) is not a whole lot of lock-in. The major API is git itself, and it pushes/pulls all the data in exactly the format you want if you want to preserve the ability to walk away at any moment.
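To make that last point concrete: because git itself is the export API, keeping a complete self-hosted escape hatch from a GitHub-hosted repository is only a few commands. The URLs and hostnames below are illustrative placeholders, not real addresses:

```shell
# Take a full mirror of the hosted repository: all branches,
# tags, and history, in exactly the format git itself uses.
git clone --mirror https://github.com/example/project.git
cd project.git

# Point a second remote at infrastructure you control.
git remote add backup git@git.example.org:project.git

# Re-run these two periodically to keep the backup current.
git fetch --prune origin
git push --mirror backup
```

If the hosting relationship ever sours, the backup remote already contains everything needed to keep developing elsewhere; only the issue tracker data falls outside this escape hatch.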

Another way to say it is, it’s a lot easier to explain to someone why they shouldn’t use MS Exchange than why they shouldn’t use GitHub.

So regarding this:

“You seem to be claiming that reproducing the software behind any given network service is trivial compared to operating that service.”

Not at all. Not “any given” network service, just a network service that is already mostly free. Even when it’s entirely free, there is still a lock-in effect just from the hosting and ops requirements. The Launchpad.net code is completely free, for example, and yet standing up a production instance of Launchpad somewhere else would be non-trivial (and has proven so in the real world).

The projport counterexample doesn’t hold up. I had no motivation to implement outbound conversion before I needed it. When I need it, then I’ll implement it; if someone else needs it first, they can implement it.

I completely agree that “if googlecode itself was free software, you would be more able to effectively combat vendor lock-in than you are with it as a non-free network service”. It’s obviously true. But one would only be a little bit more able, and that’s my only point. It’s not that free isn’t better; it’s just that it’s not as overwhelmingly, compellingly better as many on our side often claim it is, for the particular audience we’re trying to persuade.

Hey Karl, a long-time fan of your blog here; most of your articles serve as motivation for my own site, thanks for everything 🙂 And don’t mind the comments from other people, you have fans all over the world 🙂