Software

One hallmark of our lab is the development of novel, open-source software that pushes the limits of molecular simulation methods and brings these capabilities to the broader field. Some key examples are shown below.

Much of our software is disseminated on SimTK.org, and much of it is developed together with collaborators, including key collaborators from Simbios. A full list of authors, credits, and papers associated with each software package can be found on the web pages linked below.

Folding@home

Folding@home is a distributed computing project: people throughout the world download and run software that bands their computers together into the largest supercomputer in the world. Folding@home couples novel computational methods with distributed computing to simulate problems millions of times more challenging than previously achieved.

MSMBuilder: Simulate and analyze long timescale dynamics

Understanding a molecule’s conformational dynamics requires mapping out the dominant metastable (long-lived) states that it occupies and then determining the rates of transition between these states. Markov state models (MSMs) provide a natural framework for accomplishing this objective. To facilitate more widespread use of MSMs, we have developed the MSMBuilder package. Besides building MSMs, the code includes tools for verifying that the resulting model is Markovian and for analyzing and visualizing the model. For example, it can determine the population of each state with error bars and extract representative conformations for each state so that the system’s dynamics can be visualized. The code is written in object-oriented C++ and Python, so new developments can be incorporated rapidly.
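
The core estimation steps behind an MSM can be illustrated in a few lines. The sketch below is a minimal illustration, not MSMBuilder's API: the function names are hypothetical, it assumes conformations have already been clustered into integer state labels (with every state visited at least once), and it uses a simple row-normalized (non-reversible) estimator. It computes the stationary state populations and the implied timescales t_k = -τ / ln λ_k; checking whether these timescales level off as the lag time τ increases is one common test of Markovian behavior.

```python
import numpy as np


def count_transitions(dtraj, n_states, lag):
    """Count i -> j transitions observed `lag` frames apart (lag >= 1)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts: T[i, j] estimates P(state j at t + lag | state i at t).

    Assumes every state has at least one outgoing count (no empty rows).
    """
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_populations(T):
    """Left eigenvector of T for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return pi / pi.sum()


def implied_timescales(T, lag):
    """t_k = -lag / ln|lambda_k| for every eigenvalue except the stationary one."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])
```

In practice one would repeat this for several lag times and look for the lag beyond which the implied timescales stop changing.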

This software was created to analyze molecular dynamics data generated with the GROMACS simulation package, but it can easily be generalized to other data types and simulation packages by extending the class structure. Alternatively, trajectories in other formats can be converted into the GROMACS XTC format.

Precompiled binaries for Linux and Mac OS X are available under the downloads link; installing them should take about five minutes. The source code is also available. Installing from source may take 30 to 60 minutes, depending on whether you already have all the dependencies.

OpenMM: A library for rapid execution and rapid development of molecular dynamics software

Description: OpenMM is a library that provides tools for modern molecular modeling simulation. Because it is a library, it can be hooked into any code, allowing that code to perform molecular modeling with minimal extra coding.

Moreover, OpenMM places a strong emphasis on hardware acceleration, so it provides not just a consistent API but also much greater performance than nearly any other available code.

Long Term Goals and Related Uses: While molecular dynamics is not new, the key advances here are hardware acceleration (whose adoption has been very limited because of the challenges involved) and extensibility (which allows rapid prototyping and development of new MD methods). The functionality of OpenMM will eventually include everything needed to run modern molecular simulations.

SCISSORS

Algorithms for several emerging large-scale problems in cheminformatics are rate-limited by the evaluation of relatively slow chemical similarity measures, such as structural similarity or three-dimensional (3-D) shape comparison. SCISSORS is a linear-algebraic technique (related to multidimensional scaling and kernel principal components analysis) that rapidly estimates chemical similarities for several popular measures. SCISSORS has been shown to faithfully reflect its source similarity measures for both Tanimoto calculation and rank ordering. After an efficient precalculation step on a database, SCISSORS affords a speedup of several orders of magnitude in database screening. It also provides an asymptotic speedup for constructing large similarity matrices, reducing the number of slow conventional similarity evaluations required from quadratic to linear scaling in the number of molecules.
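
The general idea of estimating similarities from a low-dimensional embedding can be sketched as follows. This is a simplified illustration under stated assumptions, not the published SCISSORS procedure: it assumes each molecule corresponds to a unit-norm vector, so that a Tanimoto value T relates to an inner product x by T = x / (2 - x), and the helper names are hypothetical. Similarities among a fixed basis set are converted to an inner-product matrix and eigendecomposed (as in multidimensional scaling); each database molecule is then embedded using only its slow similarities to the basis molecules, which is what turns quadratic work into linear work.

```python
import numpy as np


def tanimoto_to_inner(t):
    # Assumes unit-norm vectors: T = x / (2 - x)  =>  x = 2T / (1 + T)
    return 2.0 * t / (1.0 + t)


def fit_basis(basis_tanimoto, k):
    """Eigendecompose the basis inner-product matrix and keep the top-k components."""
    G = tanimoto_to_inner(basis_tanimoto)
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1][:k]
    return evals[order], evecs[:, order]


def embed(tanimoto_to_basis, evals, evecs):
    """Project molecules (rows) onto the basis: coords = g V Lambda^(-1/2)."""
    g = tanimoto_to_inner(tanimoto_to_basis)
    return g @ evecs / np.sqrt(evals)


def estimated_tanimoto(a, b):
    """Tanimoto on real vectors: <a,b> / (|a|^2 + |b|^2 - <a,b>)."""
    dot = a @ b
    return dot / (a @ a + b @ b - dot)
```

Once every molecule has coordinates, each estimated similarity is a cheap dot product rather than a slow structural or shape comparison.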

PAPER

PAPER is a program that calculates optimal molecular overlays based on the Gaussian model of molecular shape (as used, for example, in OpenEye ROCS). It accelerates large screening experiments by evaluating multiple overlays in parallel on NVIDIA GPUs. The full source code to PAPER, as described in the publication, is provided. Its intended audience is computational chemists and biologists interested in molecular shape overlay, e.g., for virtual screening or docking.
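
The Gaussian shape model that PAPER optimizes can be sketched for a single fixed pose. This minimal example assumes the common first-order approximation (summing pairwise atom-atom overlaps), an amplitude p = 2√2 per atomic Gaussian, and widths chosen so that each Gaussian has the volume of its atom's van der Waals sphere. Each pair contributes p² (π/(α_i + α_j))^(3/2) exp(-α_i α_j d²/(α_i + α_j)). The sketch evaluates the overlap volume and shape Tanimoto only; it does not perform the rotational and translational optimization that PAPER runs on the GPU, and the function names are hypothetical.

```python
import numpy as np

P = 2.0 * np.sqrt(2.0)  # assumed Gaussian amplitude per atom


def alpha_from_radius(r):
    # Width chosen so one Gaussian of amplitude P has the volume of a sphere of radius r
    return np.pi * (3.0 * P / (4.0 * np.pi * r**3)) ** (2.0 / 3.0)


def overlap_volume(xyz_a, r_a, xyz_b, r_b):
    """First-order Gaussian overlap: sum over all atom pairs (i in A, j in B)."""
    a = alpha_from_radius(np.asarray(r_a, dtype=float))[:, None]
    b = alpha_from_radius(np.asarray(r_b, dtype=float))[None, :]
    d2 = ((xyz_a[:, None, :] - xyz_b[None, :, :]) ** 2).sum(axis=-1)
    s = a + b
    return float((P * P * np.exp(-a * b * d2 / s) * (np.pi / s) ** 1.5).sum())


def shape_tanimoto(xyz_a, r_a, xyz_b, r_b):
    """V_AB / (V_AA + V_BB - V_AB) for two molecules in fixed poses."""
    vab = overlap_volume(xyz_a, r_a, xyz_b, r_b)
    vaa = overlap_volume(xyz_a, r_a, xyz_a, r_a)
    vbb = overlap_volume(xyz_b, r_b, xyz_b, r_b)
    return vab / (vaa + vbb - vab)
```

An overlay search would call a function like `shape_tanimoto` many times while rotating and translating one molecule; those independent evaluations are what parallelize well on a GPU.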

YANK: GPU-accelerated calculation of ligand binding affinities

YANK is a code for estimating free energies of ligand binding using free energy perturbation (FEP) methods; it uses the OpenMM library for GPU-accelerated molecular dynamics. YANK aims both to accelerate free energy calculations and to make them simple enough to run (by encoding current “best practices” of the FEP community) that they might replace post-docking scoring methods currently used in the drug design and computational chemistry communities, which are less rigorous from a statistical mechanics point of view.
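
The simplest free energy perturbation estimator, Zwanzig's exponential averaging, ΔF = -kT ln⟨exp(-ΔU/kT)⟩ averaged over configurations sampled from state 0, can be sketched as follows. This is an illustration only: YANK's actual protocol is not reproduced here, and production calculations typically spread the transformation over many intermediate states and use more robust estimators such as BAR or MBAR. The function name and the choice of kcal/mol units are assumptions.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def fep_zwanzig(delta_u, temperature=298.15):
    """Exponential-averaging (Zwanzig) estimate of F_1 - F_0 in kcal/mol.

    delta_u: samples of U_1(x) - U_0(x), in kcal/mol, evaluated on
    configurations x drawn from state 0.
    """
    kt = KB * temperature
    w = -np.asarray(delta_u, dtype=float) / kt
    # Stable log-mean-exp: factor out the largest exponent before exponentiating
    wmax = w.max()
    log_mean = wmax + np.log(np.mean(np.exp(w - wmax)))
    return -kt * log_mean
```

The exponential average is dominated by rare low-ΔU samples, which is one reason practical workflows insert intermediate states so that neighboring states overlap well.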

YANK is written by John Chodera and Kim Branson (former Pande group members), with help from Imran Haque. They have also worked closely with OpenMM team members to tightly integrate YANK with OpenMM. Please see the YANK page on SimTK.org for details (no downloads are currently available).

Ocker: A new approach to small molecule virtual screening

Ocker is a new small molecule docking program that uses a range of 2D and 3D searching methods, in conjunction with information from protein-protein complex structures, to identify small molecule inhibitors of protein-protein interactions. It can also function as a “standard” virtual screening tool. It offers a variety of scoring and ranking methods and is written in Python and C++.

OpenMM Zephyr: Making MD simulations easy to use

OpenMM Zephyr is a molecular simulation application for studying the molecular dynamics of proteins, RNA, and other molecules. Zephyr guides the user through a workflow for setting up and running a specialized version of the molecular dynamics application GROMACS. This version of GROMACS uses the OpenMM API for GPU-accelerated molecular simulations.

SIML: a fast SIMD implementation of LINGO chemical similarities

SIML (“Single-Instruction, Multiple-LINGO”) is a library containing implementations of a fast SIMD algorithm for calculating the LINGO chemical similarity metric (described in an upcoming publication). This method, currently implemented for x86 CPUs (non-vectorized) and NVIDIA GPUs, is several times faster than existing LINGO implementations for the CPU, and two orders of magnitude faster when run on a GPU.
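
For reference, the LINGO similarity that SIML accelerates can be written as a plain, unoptimized sketch. It assumes LINGOs are the overlapping length-4 substrings of a SMILES string and that the similarity is the Tanimoto coefficient of the two LINGO multisets. It omits any SMILES normalization a LINGO implementation may apply and does not reflect SIML's SIMD algorithm; the function names are hypothetical.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Multiset of overlapping length-q substrings ("LINGOs") of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Multiset Tanimoto: |A intersect B| / |A union B|, counting multiplicities."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    inter = sum((a & b).values())  # multiset intersection keeps the min of each count
    union = sum((a | b).values())  # multiset union keeps the max of each count
    return inter / union if union else 0.0
```

For example, `lingo_tanimoto("c1ccccc1O", "c1ccccc1N")` shares five of seven distinct-count LINGOs and evaluates to 5/7.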

PyOpenMM: A Python interface to OpenMM

PyOpenMM is a Python API that wraps the OpenMM library, which provides tools for performing GPU-accelerated molecular modeling simulations. See the OpenMM project for OpenMM-related details.

Molecular Simulation Trajectories Archive of a Villin Variant

Molecular dynamics (all-atom, explicit solvent) simulations were performed on a set of nine unfolded conformations of the fastest-folding protein yet discovered, a variant of the villin headpiece subdomain (HP-35 NleNle). The simulations were generated using a new distributed computing method that applies the symmetric multiprocessing paradigm to individual nodes of the “Folding@home” distributed computing network. This technology has enabled the generation of hundreds of trajectories, each on a timescale comparable to the experimental folding time, revealing kinetic complexity not resolved in current experimental data.

The trajectory files for thousands of all-atom, explicit solvent molecular dynamics simulations performed on a set of nine unfolded conformations of a variant of the villin headpiece subdomain (HP-35 NleNle) are available in GROMACS and PDB formats, along with a VMD plug-in for visualizing the trajectories. The trajectories are organized into a group, PROJ3036, which contains trajectories starting from the nine unfolded configurations mentioned above. For a detailed description of how the thousands of available trajectories are organized, please see the project’s Wiki.

CAMPAIGN

The CAMPAIGN project aims to modularize and parallelize data clustering algorithms and to explore new clustering approaches, with particular emphasis on running on GPUs. The currently implemented algorithms (K-means, K-centers, hierarchical clustering, and an unreleased version of a self-organizing map) achieve speedups of one to two orders of magnitude on a single NVIDIA Tesla GPU over CPU reference implementations.
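
K-centers, one of the algorithms listed above, can be sketched with the classic greedy farthest-point method (Gonzalez's algorithm). This NumPy sketch illustrates the general technique under that assumption and is not CAMPAIGN's code; the function name is hypothetical. The per-point distance update inside the loop is independent for every point, which is the kind of data parallelism a GPU implementation can exploit.

```python
import numpy as np


def k_centers(points, k, first=0):
    """Greedy farthest-point k-centers clustering.

    Returns (center_indices, labels), where labels[i] indexes into center_indices.
    The resulting cluster radius is within a factor of 2 of the optimum.
    """
    centers = [first]
    dist = np.linalg.norm(points - points[first], axis=1)  # distance to nearest center
    labels = np.zeros(len(points), dtype=int)
    for c in range(1, k):
        nxt = int(np.argmax(dist))  # farthest point becomes the next center
        centers.append(nxt)
        d_new = np.linalg.norm(points - points[nxt], axis=1)
        closer = d_new < dist
        labels[closer] = c  # reassign points now closer to the new center
        dist = np.minimum(dist, d_new)
    return np.array(centers), labels
```

Each iteration costs one pass over the data, so the whole run is O(N k) distance evaluations.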

GPU-accelerated SAXS profile calculation

This program calculates Small Angle X-ray Scattering (SAXS) profiles from atomic coordinates in PDB format. By exploiting the massive parallelism of NVIDIA CUDA-enabled GPUs, it achieves a speedup of two orders of magnitude over comparable CPU code, allowing fast profile computation even for systems of a million atoms.
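
SAXS profiles from atomic coordinates are commonly computed with the Debye formula, I(q) = Σ_i Σ_j f_i f_j sin(q r_ij) / (q r_ij). The sketch below assumes this formula with q-independent form factors and no solvent correction, which is a simplification; it is not this program's code, and the function name is hypothetical. The O(N²) pairwise sum per q value is what makes GPU parallelism attractive; this sketch also stores the full N × N distance matrix, so it is only suitable for small systems.

```python
import numpy as np


def debye_profile(xyz, f, q):
    """I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij), with q-independent f."""
    r = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)  # (N, N) distances
    ff = np.outer(f, f)  # (N, N) products f_i f_j
    # np.sinc(x) = sin(pi x) / (pi x), so sinc(q r / pi) = sin(q r) / (q r), and 1 at r = 0
    return np.array([(ff * np.sinc(qi * r / np.pi)).sum() for qi in q])
```

At q = 0 every term equals f_i f_j, so I(0) reduces to the square of the summed form factors, a quick sanity check for any implementation.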

MemtestG80: GPU Memory tester

MemtestG80 is a software-based tester for “soft errors” in the memory or logic of NVIDIA CUDA-enabled GPUs. It uses a variety of proven test patterns (some custom and some based on Memtest86) to verify the correct operation of GPU memory and logic. It is a useful tool for ensuring that a given GPU does not produce “silent errors,” which can corrupt the results of a computation without triggering an overt error.
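
The logic of a pattern-based memory test can be illustrated with one classic Memtest86 test, moving inversions. This Python sketch runs on an ordinary list standing in for memory and is not MemtestG80's CUDA code; we are not asserting that MemtestG80 implements this exact pattern. The `StuckBitMemory` class is a hypothetical fault model added only so the test has something to detect.

```python
def moving_inversions(mem, pattern):
    """Moving-inversions pass over a word-addressable buffer of 32-bit words.

    Returns the sorted list of addresses whose read-back value did not match.
    """
    n = len(mem)
    inv = ~pattern & 0xFFFFFFFF  # bitwise complement within 32 bits
    bad = set()
    for i in range(n):  # 1) fill every word with the pattern
        mem[i] = pattern
    for i in range(n):  # 2) ascending: verify pattern, then write complement
        if mem[i] != pattern:
            bad.add(i)
        mem[i] = inv
    for i in reversed(range(n)):  # 3) descending: verify complement, restore pattern
        if mem[i] != inv:
            bad.add(i)
        mem[i] = pattern
    return sorted(bad)


class StuckBitMemory(list):
    """Hypothetical fault model: bit `bit` at address `addr` always reads as 0."""

    def __init__(self, n, addr, bit):
        super().__init__([0] * n)
        self.addr = addr
        self.mask = ~(1 << bit) & 0xFFFFFFFF

    def __getitem__(self, i):
        value = super().__getitem__(i)
        return value & self.mask if i == self.addr else value
```

Because each word is checked holding both a pattern and its complement, a stuck-at bit is caught whichever value it is stuck at; on a GPU, the per-address checks run in parallel across threads.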

Long Term Goals and Related Uses: 1) MemtestG80 allows end-users to verify the correct operation of their hardware under their own environmental conditions. 2) GPU software developers can integrate the MemtestG80 code into their own projects prior to distribution as an added self-test mechanism on target machines.

Please see the MemtestG80 page on SimTK.org for downloads and more detail or this reference:
Haque IS and Pande VS. Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU. arXiv:0910.0505v1 [cs.AR]. (2009)

MMTools: A toolkit for aiding in the preparation and setup of molecular mechanics systems

MMTools is our code for automated, batch-style setup of molecular dynamics simulations. It consists of a collection of Python modules that aid in preparing molecular mechanics systems for popular molecular simulation packages such as GROMACS and AMBER. It draws on tools such as MCCE (Gunner lab) for prediction of protonation states, UCSF Modeller (Sali lab) for building in missing loops and residues, ACPYPE for conversion of topology file formats, and AmberTools for setup of proteins and small molecules.