Decisions decisions decisions: Things to be aware of when choosing your research software stack

This post is part of the Collaborations Workshop 2017 speed blogging series.

Choosing a software stack for your new project can be a daunting task filled with many what-ifs. Our best advice? Don’t panic and don’t overthink it. Trying too hard to future-proof your project is a mistake! Choosing a software stack involves decisions at every level of the software’s architecture, including the programming language(s), the build system, runtimes, libraries, frameworks, the version control system, and even the infrastructure or architecture the software will run on.

When starting a new project it is important to get an idea of who your main audience and users will be. Do you expect many application users? Will these users also be developers? What sort of background do they have? Someone with a background in software is more likely to have complex tooling (such as compilers, script interpreters, virtualisation software, etc.) available, whereas a user who is not a developer might be better served by a static binary. Answering these questions will help you choose a suitable programming language, version control system, and build system, a step as important as the application itself.

A simplified heuristic you could try is:

1. Pick the language, L1, that you are most comfortable coding with.

2. Search for libraries that can solve your problem, C, and that work with programming language L1.

3. If you find a library, P1, that satisfies your constraints (more about this further down), you have your answer. Try using language L1 with library P1 to solve C.

4. If you didn't find a suitable library, search for libraries that can solve your problem, C, and that work with other programming languages.

5. Sort the results in descending order of how much you prefer to work with each language.

6. Loop over your list until you find a language, L2, and a library, P2, that satisfy your constraints when solving problem C.

7. If you didn't find a suitable language, L2, and library, P2, you may need to code your own solution from scratch.
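The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Library record, the choose_stack function, the catalogue entries and the acceptable check are all hypothetical names invented here, and in practice the "catalogue" is whatever you turn up by searching package indexes and the web.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    """Hypothetical record describing one candidate library."""
    name: str
    language: str
    solves: frozenset  # problems the library can address


def choose_stack(problem, preferred_languages, catalogue, satisfies_constraints):
    """Return (language, library), or None if nothing suitable exists.

    preferred_languages is ordered from most to least preferred, so the
    first entry plays the role of L1 and the later entries are the sorted
    fallback languages from the heuristic.
    """
    for language in preferred_languages:
        for lib in catalogue:
            if (lib.language == language
                    and problem in lib.solves
                    and satisfies_constraints(lib)):
                return language, lib
    # Nothing matched: you may need to code your own solution.
    return None


# Made-up catalogue; none of these library names refer to real projects.
catalogue = [
    Library("fastmatrix", "Fortran", frozenset({"matrix calculation"})),
    Library("tidyplot", "R", frozenset({"plotting"})),
    Library("pymatrix", "Python", frozenset({"matrix calculation"})),
]


def acceptable(lib):
    # Stand-in for the licensing, support and background-knowledge checks;
    # here we pretend that pymatrix fails one of them.
    return lib.name != "pymatrix"


choice = choose_stack("matrix calculation", ["Python", "R", "Fortran"],
                      catalogue, acceptable)
print(choice)
```

With these made-up inputs the preferred language, Python, has a matching library that fails the constraints, so the search falls through to the next languages in order of preference.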

While this is a valuable heuristic, you should be aware of and consider the following constraints: licensing, support, and background knowledge of development teams.

Constraints

When writing software it’s best to build on the existing work of others through libraries, but because you may need to distribute your new work or reuse it later, the licenses of those libraries matter. You may think licensing is not important, but if you plan on letting anyone else use your code then you’ve got to address it. First, what is the license of the new software? Are you open sourcing your project? What about commercial projects or projects with industrial partners? What if you can’t open source it, e.g. because you don’t wholly own the Intellectual Property (IP)? In an ideal world we wouldn’t need to worry about these issues, but they are important aspects of software engineering. “Just open source everything” isn’t the answer either: choosing an open source license to distribute your code is not necessarily straightforward (e.g. GPL or LGPL). The answer to this first question, the license of your own software, determines which of the libraries you find you can legally use.

In terms of support, you want to avoid spending hours or days testing a library only to hit a bug and find that there is no one to contact about it, because the original developer has moved to another project and no one else has picked up its maintenance (commonly known as an orphaned project). To avoid orphaned projects, check how old the last release or commit is, how old and how good the documentation is, how many issues and contributions the project has received so far, and how long it takes for someone to reply to an issue report.
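As a rough sketch, the measurable signals above could be turned into a small checklist. Everything here is an assumption made for illustration: the ProjectActivity record, the orphan_warnings function, the sample figures, and especially the thresholds of 365 and 30 days, which you should adjust to your own tolerance. The figures themselves would be collected by hand from the project's repository and issue tracker. Documentation quality is left out because it needs a human judgement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProjectActivity:
    """Hypothetical summary of a library's public activity."""
    last_release: date
    last_commit: date
    docs_last_updated: date
    issues_opened: int
    contributors: int
    median_days_to_first_reply: float


def orphan_warnings(project, today, max_idle_days=365, max_reply_days=30):
    """Return a list of warning strings; an empty list means no red flags.

    The default thresholds are arbitrary assumptions, not established rules.
    """
    warnings = []
    last_activity = max(project.last_release, project.last_commit)
    if (today - last_activity).days > max_idle_days:
        warnings.append(f"no release or commit in over {max_idle_days} days")
    if (today - project.docs_last_updated).days > max_idle_days:
        warnings.append(f"documentation untouched for over {max_idle_days} days")
    if project.issues_opened == 0:
        warnings.append("no issues ever reported; there may be few users")
    if project.contributors <= 1:
        warnings.append("only one contributor")
    if project.median_days_to_first_reply > max_reply_days:
        warnings.append(f"issue replies typically take over {max_reply_days} days")
    return warnings


# Made-up figures for two imaginary libraries.
today = date(2017, 3, 27)
stale = ProjectActivity(date(2014, 3, 1), date(2014, 6, 15), date(2013, 11, 2),
                        issues_opened=40, contributors=1,
                        median_days_to_first_reply=120.0)
active = ProjectActivity(date(2017, 2, 1), date(2017, 3, 20), date(2017, 1, 10),
                         issues_opened=250, contributors=12,
                         median_days_to_first_reply=2.5)

for warning in orphan_warnings(stale, today):
    print("stale library:", warning)
print("active library warnings:", orphan_warnings(active, today))
```

A list of warnings is a prompt to look more closely rather than a verdict: a small, finished library may legitimately go a long time without a commit.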

Background Knowledge

Perhaps you are looking to open source your project (highly recommended!) in the hope of gaining many contributors. It is important to think about the likely levels of expertise of your collaborators, especially when deciding on a programming language, since lowering the barrier to contributing increases your chances of receiving contributions.

One key area where this manifests itself is when choosing between a single programming language and multiple (possibly domain-specific) languages. Would it be easier for a contributor to modify a single Python application that, for example, reads in the data, reformats it, runs some matrix calculation and then plots a graphic? Or should you do the reformatting in Perl, the analysis in MATLAB and the plotting in R? The latter may require contributors to have more technical knowledge, but can help keep the code modular.
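As a rough illustration of the single-language option, a minimal Python sketch of such a pipeline might look like the following. The function names, the CSV input and the particular matrix calculation are assumptions made for illustration, and the plotting stage is a plain-text stand-in so that no particular plotting library is assumed. Keeping each stage in its own function is one way to retain some modularity without adding languages.

```python
import csv
import io


def read_data(text):
    """Read raw rows of strings from CSV text."""
    return list(csv.reader(io.StringIO(text)))


def reformat(rows):
    """Convert the raw strings into a matrix of floats, skipping blank rows."""
    return [[float(cell) for cell in row] for row in rows if row]


def analyse(matrix):
    """Example matrix calculation: the matrix multiplied by its transpose."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
            for row_i in matrix]


def plot(matrix):
    """Text stand-in for a plot: one bar of '#' per row, as long as the row sum."""
    return "\n".join("#" * int(sum(row)) for row in matrix)


def run_pipeline(text):
    """Chain the four stages; each one can be modified or replaced on its own."""
    return plot(analyse(reformat(read_data(text))))


sample = "1,2\n3,4\n"
print(run_pipeline(sample))
```

A contributor who only wants to change the plotting, say, needs to touch just one function and know just one language.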

Conclusion

There are hundreds of languages, tools, frameworks and libraries, with new ones being created all the time. It’s not surprising that a “best” option doesn’t exist. Don’t panic: if you overthink it, you will never get started. Choose what you’re comfortable with and always be prepared to change.