Let it roam free ! Releasing your code into the wild…

Today, I thought I’d do something a little different and talk about what one might expect from publicly releasing some code.
I figured it might be nice to interview someone from our group who has lots of experience doing so, Tariq Daouda, to gain some of his insights.

So without further ado, here we go !

JP:
Hi Tariq, glad to have you with us. I thought I might ask you a few questions regarding what happens when one decides to make some of his or her code (libraries, plugins or whatever) available to the public. Hopefully, readers might get a few pointers on what to expect when doing so.

T:
Glad to be here and share my experience !

JP:
All right, now for the sake of the interview, we’ll concentrate on 3 of your projects which are all publicly available through your GitHub lair. Namely: pyGeno, Mariana and pyArango. Since all 3 libraries target distinct audiences, we should be able to eliminate audience-specific concerns.
So could you describe your main motivation behind releasing your first library into the wild ?

T:
Hmmm.. that’s a good question…
I guess, and this pertains to pyGeno specifically, I knew I wanted to write a scientific publication around that library. And since one of the prerequisites for such a publication is the release of the code to the public, I figured I might as well do it right away. I’ll admit that I also wanted to make sure that the library was living up to my expectations in terms of user friendliness, so getting some form of user feedback was definitely part of my motivations. I also wanted other people to use my library in different contexts than the one I was using it in, to maybe find some kinks I might’ve missed !

JP:
Right ! Because we all know what happens when you give your code to someone else… If there’s a flaw, they find it right away !
So a mix of end user feedback and debugging then.. Interesting !
Let’s talk about popularity. GitHub provides 2 metrics by which one could judge the popularity of projects: stars and forks. Which do you favor, and why ?

T:
I really like stars ! But don’t get me wrong, I also like forks… When they convert to pull requests, that is.

JP:
Right, because on GitHub, the only way for someone without access to the source repo to submit some code modifications is to fork the project, modify the fork and then use it to generate a pull request.

T:
Exactly. I really enjoy receiving pull requests ! Surprisingly, Mariana did not generate a lot of pull requests.. I got more for pyGeno and pyArango. I guess the nature of the users really impacts their relationship with the libraries. For instance: I got many PRs for pyArango coming from members of the ArangoDB Dev team.

JP:
Ha ! I guess they see your library in a very positive way 🙂

T:
Right ?! 🙂
But for those who only fork my projects, I think a lot of them just forget the project after a while… So that’s why I prefer stars.

JP:
And would you say that you’ve felt some form of appropriation of the code by your user base ?

T:
Definitely ! I’ve received a number of feature requests, some users have fixed typos in the documentation of pyGeno, others have updated the pyArango library to support the new version of ArangoDB’s API. But the best example of code appropriation I’ve seen has been the setup of Continuous Integration procedures for both Mariana and pyArango. People took the time to set it up, and then submitted a PR.. That amount of involvement really surprised me !

JP:
Nice, so the code really has begun a life of its own.. Under your supervision of course ! I guess the GitHub experience really has delivered some nice results.
OK, let’s delve into another aspect of releasing code: the choice of a license. It’s certainly an aspect with which most of us are less comfortable and which can have clear impacts on the possible uses of the code in the future. Which license did you choose and why ?

T:
Good point. I chose to go with the Apache version 2.0 license instead of the more common GPL or MIT options. I specifically chose Apache version 2.0 because it provides an express grant of patent rights from contributors to users. It’s also the license under which a lot of startups and companies such as ArangoDB operate, so it just felt right.

JP:
Interesting !

T:
I would suggest visiting the Choose a license site to anyone looking to release some code. It’s simple, straightforward and very useful.

JP:
That’s a great pointer. I’m sure it’ll come in handy to some of our readers.
We often hear comments about how the anonymity conferred by the web sometimes turns nice people into rude ones. Were you ever the target of some form of hostility in relation to the libraries you have released ?

T:
Unfortunately, yes.
I’ve received harsh comments on a few occasions. Some of the comments were warranted (although the harshness was not) but one time, while I was answering a question on StackOverflow suggesting the OP look at one of my smaller libraries as a potential solution to his problem, some anonymous user suddenly appeared in the conversation and started really bashing my library, for no apparent reason ! It seems the comments really were in poor taste and unwarranted because they have since been removed from the conversation by StackOverflow moderators..
But to be frank, the positive comments I’ve received far outweigh the negative ones. I’ve received lots of very positive and rewarding comments. And unsurprisingly, these positive people are rarely anonymous.

JP:
And what about the maintenance aspect ?
A common issue with libraries released by individuals is that maintenance is sometimes lacking. Some libraries will fall behind with regard to their dependencies and basically become unusable after a while. Do you consider yourself an active maintainer and how much of your time do you put into such tasks ?

T:
I’m definitely active on that front although feature addition sometimes takes a back seat. I tend to resolve bugs and other serious issues in a timely manner but adding features is more time consuming and sometimes, especially if it’s not a feature I need myself, I just don’t have the time to do it.
I think it’s also important to put measures in place that alleviate some of the work involved with releasing new versions of a library. This is going to sound old but a good set of tests is mandatory. Having a good set of tests also enables the use of Continuous Integration (CI) services. CI is one of the measures that have helped me to identify bugs in the past and since it runs the tests automatically, it just saves a lot of time while providing some peace of mind. And peace of mind is a good thing !

JP:
Releasing your code in a GitHub repo is certainly not enough for people to start using it en masse; they need to know that the libraries exist first. What did you do to promote your libraries ?

T:
Promotion is tough !
I think it depends on your target audience. I tried Twitter at first but that did not work very well. I think you need to have a strong base of followers for Tweets to really be impactful. The best results I got were from posting on Biostars for pyGeno and HackerNews for Mariana but I think even those strategies yield stochastic results. I also gave a number of talks at various venues but while you get to talk to your target audience, the number of attendees is sometimes not very high. One important thing to note though is that you need to be ready for people to contact you directly. Expect a lot of people to start contacting you by email.

JP:
And how did you measure the impact of your promotional activities ?

T:
Well, GitHub provides stats on visitors and the activity of your repos. You can see the number of views, unique views, clones and clones per unique machine. So that’s what I look at to measure the impact.

JP:
Great ! Thank you for all that information Tariq. Now before I leave you, do you have some last words of wisdom you would like to impart to developers looking to release some of their code ?

T:
Yes: Tests are a coder’s best friend !!

JP:
So you would place having good tests above, say, good documentation ?

T:
Definitely !
Documentation is fairly easy to do, but if you release broken code.. People will let you know, believe me !

Originally trained in molecular biology, I quickly realized my heart lay with bioinformatics ! (How can anyone be presented with an HMM and not fall in love ?). While I spend most of my days writing Python code, I must admit I am starting to enjoy my occasional dip into R.