Brief Description and Continuing Discussion:

Topic 1: Measuring the output of a core (Led by Simon Andrews and Fran Lewitter)

This is a follow-on to a topic which Matt Eldridge brought up at the ISMB workshop. It generated an interesting discussion afterwards, which we had to curtail due to time constraints, so we thought this would be a good topic to expand upon in a public call.

The original question was how you can measure how well your core is working. Such a measurement could be used to assess the performance of the core over time, or to allow comparisons between cores. It could also be useful in helping to support applications for additional funding or personnel.

Preliminary Information

Questions which would be relevant to this topic might include:

Is it useful or desirable to objectively measure the performance of a core facility?

Topic 2: Managing and Tracking Software Updates

If analysis results are to be reproducible then, in addition to recording the settings and options used for analysis, we would also have to record the exact versions of all software components used and all datasets searched. Providing such traceability could be an arduous task, but may be required in cores which handle clinical or translational workflows, if not samples, and may be desirable in other situations.

This also raises a wider issue of how we manage the transition between releases of software packages on which we rely. Scientific software tends to have a rapid release cycle with improvements and changes coming at regular and (relatively) rapid intervals. This raises a number of questions about how frequently we update software packages on which we depend, and how we manage these transitions.

Preliminary Information

Questions which may be relevant to this discussion:

Do people try to keep track of the software versions used for analysis? If so, is this because of regulatory or other "business" requirements?

Just how often do you update software packages?

Is there a difference between public and vendor-distributed packages in frequency of updates that you install?

How do people manage software updates?

Do you have test environments to try out updates or do they just get used immediately?

Do you update packages during the course of a study?

Do you have multiple versions of key packages?

How do you monitor for available updates? Could this be managed better?

One specific example which was raised in preliminary discussions was the recent Illumina update to CASAVA 1.8. This update offered many advantages but also necessitated major changes to the analysis pipelines people were using. It may serve as a useful example of some of the problems faced during software transitions, as well as being of specific interest to those people running Illumina pipelines.

Topic 1 Measuring the output of a core

The topic was introduced by summarising the discussions started at ISMB2011. The aims of the session were laid out as being whether it was possible to quantitatively assess the output of a core facility, and if so which metrics are useful. It would also be interesting to know if people were doing this already, and if so whether they were doing it for their own information or as part of a more formal assessment of their facility.

Matt Eldridge started off by quickly going over the measures he'd been making of his core. He said that his reports are generated monthly as part of an initiative to get all core facilities to be measured on standard metrics. Measures he makes aim to assess how busy his core is, what their turnaround time is and how satisfied his users are with the service they get. These have proved to be difficult things to measure. At the moment he is measuring factors such as:

The number of hours of work logged against each project

The number of open issues

The number of inactive issues (those waiting to be started)
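
As an illustration only, the three measures listed above could be derived from an export of an issue tracker along the following lines. This is a minimal sketch and not Matt's actual reporting system: the Issue record, its field names and the status labels ("open", "inactive", "closed") are assumptions made for the example, and the data are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Issue:
    # Hypothetical record layout; a real tracker export will differ.
    project: str
    status: str          # assumed labels: "inactive", "open" or "closed"
    hours_logged: float


def summarise(issues):
    """Return hours logged per project and counts of open and inactive issues."""
    hours = defaultdict(float)
    counts = {"open": 0, "inactive": 0}
    for issue in issues:
        hours[issue.project] += issue.hours_logged
        if issue.status in counts:
            counts[issue.status] += 1
    return dict(hours), counts["open"], counts["inactive"]


# Invented example data
issues = [
    Issue("chipseq", "open", 6.5),
    Issue("chipseq", "closed", 3.0),
    Issue("rnaseq", "inactive", 0.0),
]
hours, n_open, n_inactive = summarise(issues)
print(hours, n_open, n_inactive)
```

Running the same summary each month would give a simple time series of how busy the core is, although it says nothing about user satisfaction.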

The discussion was opened to the rest of the group to see if anyone else was doing anything similar.

Fran said that her main assessment was based on the number of papers on which any of her group were listed as authors or acknowledged. This was not formally assessed and she has no target to meet.

Monica from Promega said that she wasn't required to track usage, but tried to measure the ratio of investment vs maintenance. This shows what proportion of your time you are spending keeping existing things running as opposed to doing new and innovative work. When the balance tips too far towards maintenance then the core is overworked. This measurement was started to help with justifying bringing new people into the group. Monica is tracking this somewhat informally using spreadsheets but has found it to be useful.
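
A spreadsheet is enough for this kind of tracking, but the calculation itself can be sketched in a few lines. The example below is only an assumption about how such a ratio might be computed from hours logged per category; the category names follow the discussion above and the numbers are invented.

```python
def maintenance_fraction(hours_by_category):
    """Fraction of all logged hours that were spent on maintenance."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0  # nothing logged yet, so report zero rather than divide by zero
    return hours_by_category.get("maintenance", 0.0) / total


# Invented figures for one month
month = {"maintenance": 30.0, "investment": 90.0}
fraction = maintenance_fraction(month)
print(f"{fraction:.0%} of logged time went on maintenance")
```

Where to set the threshold at which a core counts as overworked is a local decision that the calculation does not settle.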

Brent said that his group have just started a chargeback scheme and would therefore be tracking usage as part of this. He tracks budgets and time spent on projects. Jobs are broken down into categories and he eventually hopes to break down all charges into categories as well.

Charlie from MIT said that he does hourly tracking of core activities for billing and usage statistics but that he has concerns about how to best relate those numbers to a measurement of core output. For example, he has noticed that sometimes the least productive work (productive meaning the timely generation of publishable results) can consume large amounts of the core's hours.

Fran said that she has started to make up a monthly report where all work is broken down into:

The split between these categories is estimated informally. Her aim is to keep the amount of non-science and maintenance work below 10% of all work.

Matt said that one thing he had had to do was a full survey of his users as part of a larger institute assessment exercise. He doesn't think such surveys are practical to do too often, but they are probably useful once every 3 years or so. Any more frequently than that might upset the users, who don't really want the hassle of answering surveys. For other services (e.g. sequencing) there are simple measures which can be made to assess how well they are doing, but these tend not to work so well for bioinformatics, where there is a larger diversity of work occurring.

Hemant pointed out that in some cores there are standard analysis pipelines running and these would be amenable to simple metrics to measure how efficiently they are running.

Simon then asked the group if it would be worth trying to collate some of the metrics people are using to measure the output of their cores and put these up on the wiki somewhere. He also asked whether, once these were available, people would be willing to try them out on their own facilities and then share the results with the rest of the group. There was widespread agreement, so Simon agreed to set up a page and start collating this information to see if we can get anywhere with it.

UPDATE: The page listing the set of metrics to measure the output of a core is now available and open for comments and improvements.

Topic 2 Managing and Tracking Software Updates

Brent introduced the topic by saying that he has been asked recently about creating archives of both software and results to be able to exactly reproduce analysis results. This is something which may become needed for compliance reasons, especially with the expansion of personalised medicine. He asked if anyone else was doing this already, or looking to do this in the future.

Hemant said that he already has systems in place to manage software versions on his cluster. These are based on the "Modules" [1] software management system available on SourceForge. The Modules user environment management system has several benefits. It handles all software dependencies, making it very easy to reproduce analyses with older versions of software, and it means that he always has all historical software available to him. Modules also allows one to control access and implement account-level security: access to individual modules can be restricted by setting appropriate permissions on the top-level folders for the appropriate user groups.

The Modules system works for Illumina's pipeline software. We have been using this for the last three years. Modules automatically keeps track of specific software dependencies (e.g. a specific version of Perl or Python). This allows multiple versions of sometimes incompatible software to co-exist on the server. All software dependencies are automatically loaded when you "source" a specific module, making this system extremely user friendly.

He uses the SeqWare workflow package to manage high throughput sequencing data analysis workflows. This keeps track of all of the versions of programs used in every analysis.

Changes to the Illumina pipeline are currently just tracked in a spreadsheet with start and end dates for the specific versions used for analysis. Having this information (together with the "modules" system) allows re-processing of old data, if needed, with a specific version of the Illumina pipeline. He is hopeful things will calm down now that Illumina have moved to standard Sanger encoding for their sequences.
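
To make the spreadsheet idea concrete, the lookup it supports can be sketched as below. This is an illustration under stated assumptions rather than Hemant's actual setup: the version labels and dates are invented placeholders, and the table layout (version, start date, end date) is simply the one described above.

```python
from datetime import date

# Invented rows: (version label, first day in use, last day in use).
# An end date of None means the version is still in production.
PIPELINE_VERSIONS = [
    ("pipeline-A", date(2010, 1, 1), date(2010, 12, 31)),
    ("pipeline-B", date(2011, 1, 1), None),
]


def version_on(run_date, table=PIPELINE_VERSIONS):
    """Return the pipeline version in use on run_date, or None if unknown."""
    for version, start, end in table:
        if start <= run_date and (end is None or run_date <= end):
            return version
    return None


print(version_on(date(2010, 6, 15)))
```

Given the date on which a dataset was originally processed, the returned label identifies which installed version to load when re-running the analysis.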

Matt said that he has a similar system, but that he doesn't have control over the software modules which are installed since these are handled by a central IT group. The modules system also doesn't work for software his group have installed themselves, in which case they have to update manually.

Simon said that his systems are far less formally managed. They install software themselves individually on each analysis machine. Some packages (samtools and R for example) are taken from the main Fedora repositories and are updated automatically, but most packages are manually installed. Software updates happen irregularly and are normally triggered by the need for a bugfix, or an additional feature not available in the currently installed version. Sometimes different versions are installed in parallel, but mostly the new version replaces the old. No specific record is kept of the versions of software used for any particular analysis.

Fran said that her software is managed by a central unix team. They are good at keeping software up to date and there is a central catalogue of packages and versions available online. She said that her bigger concern is with database versions rather than software. She updates the databases they mirror every week, but doesn't keep historical versions or track exactly which version is used in their analyses.
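
One lightweight way to record which copy of a mirrored database an analysis used, without keeping every historical version, is to store a checksum and date alongside the results. The sketch below is an assumption about how this could be done and not something described on the call; the file it fingerprints is a small temporary stand-in for a real database file.

```python
import hashlib
import os
import tempfile
from datetime import date


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a mirrored database file (contents invented)
with tempfile.NamedTemporaryFile("wb", suffix=".fasta", delete=False) as tmp:
    tmp.write(b">seq1\nACGT\n")
    demo_path = tmp.name

record = {
    "database": demo_path,
    "sha256": fingerprint(demo_path),
    "recorded_on": date.today().isoformat(),
}
print(record)
os.remove(demo_path)  # tidy up the stand-in file
```

A checksum only shows whether the database has changed since an analysis was run; it does not make the old version recoverable.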

Hemant said that he has a central copy of many of the main genome databases which are then shared around his site. This has helped to enforce consistency and reduce duplication of data.

Simon said that he hadn't found changing genome assemblies to be a problem since the assemblies themselves were updated very infrequently and were tagged with clear version numbers. A bigger problem for his group was changing annotation versions. The same genome assembly could go through several different gene builds and these were often not tracked through updates. Changing annotations could change analysis results if, for example, you wanted to analyse all promoters in the genome and your promoter list changed.

Alistair pointed out that recent versions of Galaxy now have the option of including version tags into analysis results, and that this works for any command line program which has a -v or --version option. This is a fairly simple way to track the provenance of analysis results. He also said that he avoided the annotation version problems by doing most of his analysis through the Ensembl API, which enforces the use of a specific release and means you don't need to mirror annotations locally as you can just connect to the public Ensembl servers.

Simon said that although he did make use of the Ensembl API, he had found the public server too slow to use routinely for his analyses and he would need to mirror it locally for more widespread use. Matt said that his experience had been similar and that it wasn't always an option to use such a remote service.
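
The general idea of capturing a version string from a command line program can be sketched outside Galaxy as follows. This is not Galaxy's implementation, just a minimal illustration that assumes the tool prints its version in response to a --version style flag; the Python interpreter is used as the example tool so that the sketch runs anywhere.

```python
import subprocess
import sys


def tool_version(command, flag="--version"):
    """Run `command flag` and return the first line it prints, or None."""
    try:
        result = subprocess.run(
            [command, flag], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # tool missing, not executable, or it hung
    # Some tools print their version to stderr rather than stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


# Record the version next to the analysis results
provenance = {"python": tool_version(sys.executable)}
print(provenance)
```

Tools that use a different flag, such as -v, can be handled by passing it as the second argument.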

Brent asked if anyone routinely did regression tests when putting in place new versions of software. Although some people would like to do this no one is actually doing it at the moment simply for lack of time to implement such a system. Brent said that this is something which is starting to happen for some of his projects. These now have a requirement to test new software versions and rigorously explore any differences in output to previous versions on a test dataset. He said that the effect of this is that it can take 3 months to get a new software version into production, but that the onus for doing this work has fallen onto the group who need this rather than on his team.
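
At its simplest, a regression test of this kind runs the old and new versions on the same test dataset and compares the outputs. The sketch below shows only the comparison step, using invented output lines; it is an assumption about how such a check might start and not a description of the process used on Brent's projects.

```python
import difflib


def compare_outputs(old_lines, new_lines):
    """Return unified-diff lines between two outputs; an empty list means identical."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="old_version",
            tofile="new_version",
            lineterm="",
        )
    )


# Invented results from running two software versions on one test dataset
old = ["chr1\t100\tgeneA", "chr2\t200\tgeneB"]
new = ["chr1\t100\tgeneA", "chr2\t205\tgeneB"]

diff = compare_outputs(old, new)
print("\n".join(diff) if diff else "outputs identical")
```

An exact line-by-line comparison is the strictest check. Outputs containing timestamps or floating point values would need those fields normalised or compared within a tolerance first.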