IT Infrastructure for Research Projects

This blog post describes the IT infrastructure used for running the EU FP7 project Envisage, an international research project involving 8 partner sites across Europe. We describe the chosen project setup (mostly software) used for Envisage, report on our experiences with this setup, and discuss what is working and what is not working so well.

The requirements for our research project in Computer Science are not that different from what you’d expect for software development:

user database (~50)

mailing lists (~10)

blog, wiki and other (web-)storage

scm/version control (GIT for code, SVN for LaTeX)

bug tracker (Redmine)

continuous integration (Jenkins)

For all the services, we naturally want a single user database, so LDAP immediately comes to mind. On the OS level, we have a single user for the project that we share for administration, and we log in as that user via SSH. We experimented with an OS-level group, but we had trouble keeping group/other permissions correct, so we went for the simpler solution.

For dissemination, we run our blog here at http://envisage-project.eu as a WordPress instance. Unlike the other modules, this is not directly operated by us, but hosted through the web-server of the department with a suitable CNAME record in DNS. All files including WordPress are hosted on NFS, which automatically delegates backups to the local IT group, just like the various (Postgres) databases that we need for WordPress and the bug tracker.

All services apart from the blog are running on a virtual machine, also operated by local IT. It is a pretty generic RedHat which NFS-mounts the project directory (backups!), where we have installed some of the software ourselves. We rely on the IT group to keep binaries such as SSH, OpenLDAP, Apache and e.g. scripting languages or JDKs up to date on the VM.

In terms of computrons required to host our services, you can probably run them on a modern phone (or a MacMini in the corner) without a visitor noticing anything. Of course continuous integration (see below) will benefit from any CPU cycles, memory, or directly attached storage that you can spare.

Generic project members do not have shell accounts on the VM. They only exist as entries in our own OpenLDAP database. Anecdote: even the LDAP could have been hosted at the university level (or rather, even one step up, on the service providing almost all educational logins throughout Norway!), but since this database will contain information deemed sensitive (full names and email addresses), after an initial successful trial run we got kicked out because of too much red tape…

LDAP accounts are created with a little shell script which also knows how to create and set up those users in the project management system, while other maintenance such as groups is done using Apache Directory Studio. For some reason, the WordPress SimpleLDAP plugin required a bit of patching to authenticate against the proper LDAP group.
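To give an idea of what such an account-creation step involves, here is a minimal sketch in Python (not our actual shell script) that renders an LDIF entry for a new user. The base DN, the choice of the inetOrgPerson object class, and the function names are assumptions made for illustration only.

```python
import base64
import re

# Hypothetical base DN; the real directory layout is not shown in this post.
BASE_DN = "ou=people,dc=example,dc=org"


def ldif_line(attr, value):
    """Format one LDIF line, base64-encoding values that are not plain ASCII."""
    safe = (
        value.isascii()
        and value.isprintable()
        and not value.startswith((" ", ":", "<"))
        and not value.endswith(" ")
    )
    if safe:
        return f"{attr}: {value}"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{attr}:: {encoded}"


def user_ldif(uid, given_name, surname, mail):
    """Render an LDIF entry for one project member (inetOrgPerson assumed)."""
    # Restrict uids to simple names so the DN needs no escaping.
    if not re.fullmatch(r"[a-z][a-z0-9_-]*", uid):
        raise ValueError(f"unsupported uid: {uid!r}")
    lines = [
        ldif_line("dn", f"uid={uid},{BASE_DN}"),
        "objectClass: inetOrgPerson",
        ldif_line("uid", uid),
        ldif_line("cn", f"{given_name} {surname}"),
        ldif_line("givenName", given_name),
        ldif_line("sn", surname),
        ldif_line("mail", mail),
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could then be fed to a tool such as OpenLDAP's ldapadd, which reads LDIF from standard input; group membership and the project management system would still need their own steps.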

For security reasons, as we do not have root on the VM, we run our own Apache instance on a high port as the project user. On this port, we only speak SSL and have a certificate signed by the local authority. (The Apache binary is the one from the VM.) This makes the URL look mildly ugly, but could easily be fixed by a few lines of iptables or a reverse proxy on the VM to transparently redirect the standard HTTPS port to our port. We will actually need to do this eventually, since many hotel and airport Wi-Fi networks block HTTP(S) to non-standard ports.

Not using the OS startup facilities means that if the VM reboots, at the moment, someone has to log in and manually invoke our home-grown start script that brings up an OpenLDAP server, Apache, and the other odds and ends. Again, this could easily be fixed by getting an @reboot cron job installed, some day.
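The idea behind such a start script can be sketched as follows. This is a Python illustration rather than our actual script; the service names and commands are placeholders, and starting LDAP before Apache is an assumption based on Apache authenticating against it.

```python
import subprocess

# Placeholder commands; the real start script and its paths are not shown here.
SERVICES = [
    ("openldap", ["/path/to/start-slapd.sh"]),
    ("apache", ["/path/to/start-apache.sh"]),
]


def start_all(services, run=subprocess.run):
    """Start services in order and stop at the first failure.

    Returns the names of the services that started successfully.
    """
    started = []
    for name, command in services:
        try:
            result = run(command)
        except OSError as exc:  # e.g. the command does not exist
            print(f"could not run {name}: {exc}")
            break
        if result.returncode != 0:
            print(f"failed to start {name} (exit code {result.returncode})")
            break
        started.append(name)
    return started
```

Calling start_all(SERVICES) from a small script, and registering that script with an @reboot line in the project user's crontab, would remove the manual step, provided the local cron implementation supports @reboot.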

The mailing lists are not directly managed by us, but through the Sympa mailing list software operated at the university level(!), which pulls in mailing-list members from the LDAP groups.

The Redmine project management system and bug tracker uses a particular Ruby version which requires a suitable scl enable ruby193 incantation around the Apache startup, so that the Apache module picks up the correct version.

We then use the standard Apache techniques to serve GIT and SVN, where authentication is provided through LDAP; you really want to avoid having different passwords for different services! The version-control distinction between code and scientific articles was mostly made because articles need less versioning and branching, and some old-school researchers prefer rcs/cvs/svn. GIT also has a post-commit hook that posts commit emails to a particular mailing list that includes everyone with write access to the repo, whereas SVN commits are silent.

For convenience, we also offer WebDAV, which can easily be mounted as a network drive from all operating systems, and use it primarily to collect slides at meetings. These slides are then later linked to from e.g. the meeting agenda in WordPress. This allows us to keep work in progress or administrativa out of the eyes of the prying public.

As further services, we are running Jenkins and Sonar to support the usual software development tasks through continuous integration and code quality metrics. Both are pretty self-contained drop-ins into the Apache config that barely need any special configuration except, again, for the LDAP database. Naturally, Jenkins in particular requires its own kind of hand-holding, e.g. when automating the building of Eclipse plugins through Buckminster. Our Eclipse update-site is populated through rsync via ssh.

On a different machine, we use Nagios to monitor that both the blog and the services are available. Our biggest aesthetic concern is that we use two different URLs, envisage-project.eu and envisage.ifi.uio.no:8080…
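For readers without a Nagios setup, the kind of availability check involved can be sketched in a few lines of Python. This is not our Nagios configuration; the function name and the injectable opener parameter are made up for this example. Only the exit codes (0 for OK, 2 for CRITICAL) follow the usual Nagios plugin convention.

```python
import urllib.request

OK, CRITICAL = 0, 2  # exit codes as used by Nagios plugins


def check_url(url, timeout=10, opener=urllib.request.urlopen):
    """Return (exit_code, message) describing whether url answers over HTTP(S)."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except OSError as exc:
        # urllib's URLError and HTTPError are subclasses of OSError, so this
        # covers DNS failures, refused connections, timeouts and 4xx/5xx replies.
        return CRITICAL, f"CRITICAL - {url}: {exc}"
    if 200 <= status < 400:
        return OK, f"OK - {url} returned HTTP {status}"
    return CRITICAL, f"CRITICAL - {url} returned HTTP {status}"
```

Running such a check once per URL, one for the blog and one for the services on the high port, and alerting on a non-zero code gives roughly the coverage described above.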