README

Change Log and Usage

BayesWHAM_v2 is a generalization of BayesWHAM_v1 that adds the capacity to (i) incorporate biased simulation data collected at any temperature and pressure in the NPT ensemble, and (ii) reweight these data to estimate the unbiased free energy landscape FES at any T and P.

The code is designed to function in the NPT ensemble.

Reweighting extrapolatively or interpolatively to [T,P] state points far from those at which data was collected can result in convergence failure of the WHAM iteration.

Execution is typcically more CPU and RAM intensive than BayesWHAM_v1 due to necessity to collect histograms in the two additional dimensions U and V to permit reweighting in T and P.

For simulation data collected at a single T and P state point, and construction of the landscape at that T and P, the additional functionality of BayesWHAM_v2 is not required. Accordingly, (i) BayesWHAM_v1 may be used or (ii) BayesWHAM_v2 may be used regardless, and the unnecessary cost associated with E and V binning of the data -- since no biasing is introduced in E or V -- eliminated by binning the data into a single U bin and a single V bin. See Example 1 below. (Note that reweighting to a different T or P state with a single U and V bin will introduce large errors due to poor resolution in U and V reweighting.)

BayesReweight_v2 is a generalization of BayesReweight_v1 designed for compatibility with BayesWHAM_v2.

BayesWHAM_plotter and BayesReweight_plotter remain unchanged from v1.

Theory

The theoretical underpinnings of the v2 generalization are described in BayesWHAM_addendum.pdf that should be read as an addendum to the published work A.L. Ferguson "BayesWHAM: A Bayesian approach for free energy estimation, reweighting, and uncertainty quantification in the weighted histogram analysis method" J. Comput. Chem.38 18 1583-1605 (2017).

Synopsis

BayesWHAM_v2.m - a Bayesian implementation of the weighted histogram analysis method (WHAM) to make statistically optimal estimates of multidimensional molecular free energy surface (FES) from an ensemble of umbrella sampling simulations conducted in NPT ensemble. The FES can be constructed from data collected at different T and P, with biasing in multidimensional collective variables psi, and requires a record of the trajectories in potential energy U, volume V, and biasing variables psi. The code functions in n-dimensions, natively supports uniform, symmetric Dirichlet, and Gaussian priors, and rigorously estimates uncertainties by Metropolis-Hastings sampling of the Bayes posterior. The source code may be straightforwardly modified to support arbitrary user-defined priors by augmenting two switch statements defining the MAP expression and the Metropolis acceptance criterion.

BayesWHAM_plotter.m - a utility code to plot the 1, 2, and 3-dimensional output of BayesWHAM.m

BayesReweight_v2.m - a tool to project the unbiased FES calculated by BayesWHAM into any number of arbitrary variables beyond those in which umbrella sampling was conducted.

BayesReweight_plotter.m - a utility code to plot the 1, 2, and 3-dimensional output of BayesReweight.m

Version

v2 - 06/2017

v1 - 08/2016

Installation

No installation required. These are MATLAB functions that can be called and executed directly from the MATLAB prompt.

I/O

Full details of required inputs, input files and formatting, and output files are provided in the headers of each script. Example calculations detailing the execution steps and providing all input and output files are detailed below.

./diala_phi_psi_EXAMPLE1/hist - [U,V,phi,psi] histograms compiled over [U,V,phi,psi] trajectories in ./diala_phi_psi_EXAMPLE1/traj in row major order

./diala_phi_psi_EXAMPLE1/traj_aux__phi_psi_theta - trajectories of projection variables [phi,psi,theta] into which we wish to reweight the FES recorded over the course of each umbrella sampling calculation at the same frequency as those in ./diala_phi_psi_EXAMPLE1/traj; the projection variables are arbitrary and need not contain the umbrella sampling variables

BayesWHAM_v2.m

[1]

Calculation of 2D FES F(phi,psi) at T=300 K and P=1 bar (i.e., at same T & P at which all biased simulations were conducted) where phi and psi are backbone dihedral angles with range [-180 deg, 180 deg] and 360 deg periodicity.

Histogramming has been conducted binning phi and psi every 10 degrees, but E and V have each been histogrammed into a single bin. Since landscape is being computed at the T=300 K and P=1 bar at wich all simulations were conducted, the E and V biasing factors are precisely zero and we can accelerate calculation by effectively declining to bin in E and V.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

Copies of all files generated by this calculation are provided in ./diala_phi_psi_EXAMPLE1/BayesWHAM_v2_OUTPUT

Since landscape is being computed at the T=300 K and P=1 bar at wich all simulations were conducted and binning in E and V was into a single bin each, we can run the previous version of BayesWHAM on these data to compare results

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

Copies of all files generated by this calculation are provided in ./diala_phi_psi_EXAMPLE1/BayesWHAM_v1_OUTPUT

Output files are identical to those generated by BayesWHAM_v2.m in ./diala_phi_psi_EXAMPLE1/BayesWHAM_v2_OUTPUT

./diala_phi_psi_EXAMPLE2/hist - [U,V,phi,psi] histograms compiled over [U,V,phi,psi] trajectories in ./diala_phi_psi_EXAMPLE2/traj in row major order

./diala_phi_psi_EXAMPLE2/traj_aux__phi_psi_theta - trajectories of projection variables [phi,psi,theta] into which we wish to reweight the FES recorded over the course of each umbrella sampling calculation at the same frequency as those in ./diala_phi_psi_EXAMPLE2/traj; the projection variables are arbitrary and need not contain the umbrella sampling variables

BayesWHAM_v2.m

[1]

Calculation of 2D FES F(phi,psi) at T=300 K and P=1 bar (i.e., at same T & P at which all biased simulations were conducted) where phi and psi are backbone dihedral angles with range [-180 deg, 180 deg] and 360 deg periodicity.

Histogramming has been conducted binning phi and psi every 10 degrees, E every 1000 kJ/mol, and V every 1 nm^3. Since landscape is being constructed at same T and P at which all biased simulations were conducted, the E and V binning is irrelevant since the biasing factors in E and V are identically zero but does affect execution speed.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

Copies of all files generated by this calculation are provided in ./diala_phi_psi_EXAMPLE2/BayesWHAM_v2_OUTPUT_T300P1

Output files [betaF_MAP.txt, f_MAP.txt, hist_binCenters.txt, hist_binWidths.txt, p_MAP.txt, pdf_MAP.txt] are identical to those generated by BayesWHAM_v2.m in ./diala_phi_psi_EXAMPLE1/BayesWHAM_v2_OUTPUT and ./diala_phi_psi_EXAMPLE1/BayesWHAM_v1_OUTPUT.

Output files [logL_MAP.txt, all _MH, all _augUV] differ due to different histogram binning (i.e., the binning of E and V in the present case) that gives rise to differnt value of likelihood and sequence of random numbers in MH sampling.

Extrapolations / interpolations to state points far from those at which data were collected can lead to large errors.

Histogramming has been conducted binning phi and psi every 10 degrees, E every 1000 kJ/mol, and V every 1 nm^3. Since landscape is being constructed at different T and P from that at which all biased simulations were conducted, the E and V binning affects reweighting accuracy and execution speed. In practice, the E and V binning is probably too coarse, and the robustness of the results to the bin width should be assessed.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

Copies of all files generated by this calculation are provided in ./diala_phi_psi_EXAMPLE2/BayesWHAM_v2_OUTPUT_T310P50

Extrapolations / interpolations to state points far from those at which data were collected can lead to large errors.

Histogramming has been conducted binning phi and psi every 10 degrees, E every 1000 kJ/mol, and V every 1 nm^3. Since landscape is being constructed at different T and P from that at which all biased simulations were conducted, the E and V binning affects reweighting accuracy and execution speed. In practice, the E and V binning is probably too coarse, and the robustness of the results to the bin width should be assessed.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

Copies of all files generated by this calculation are provided in ./diala_phi_psi_EXAMPLE2/BayesWHAM_v2_OUTPUT_T290P1

./diala_phi_psi_EXAMPLE3/hist - [U,V,phi,psi] histograms compiled over [U,V,phi,psi] trajectories in ./diala_phi_psi_EXAMPLE3/traj in row major order

./diala_phi_psi_EXAMPLE3/traj_aux__phi_psi_theta - trajectories of projection variables [phi,psi,theta] into which we wish to reweight the FES recorded over the course of each umbrella sampling calculation at the same frequency as those in ./diala_phi_psi_EXAMPLE3/traj; the projection variables are arbitrary and need not contain the umbrella sampling variables

Histogramming has been conducted binning phi and psi every 20 degrees, E every 200 kJ/mol, and V every 0.5 nm^3. E and V binning affects accuracy and speed of reweighting.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.

./diala_phi_psi_EXAMPLE4/hist - [U,V,phi,psi] histograms compiled over [U,V,phi,psi] trajectories in ./diala_phi_psi_EXAMPLE4/traj in row major order

./diala_phi_psi_EXAMPLE4/traj_aux__phi_psi_theta - trajectories of projection variables [phi,psi,theta] into which we wish to reweight the FES recorded over the course of each umbrella sampling calculation at the same frequency as those in ./diala_phi_psi_EXAMPLE4/traj; the projection variables are arbitrary and need not contain the umbrella sampling variables

Histogramming has been conducted binning phi and psi every 20 degrees, E every 200 kJ/mol, and V every 0.5 nm^3. E and V binning affects accuracy and speed of reweighting.

We solve the generalized WHAM equations to a tolerance of 1E-15 under a uniform prior. Uncertainties are estimated by performing 2E5 rounds of Metropolis-Hastings sampling from the Bayes posterior and saving every 1E3 realizations. In practice, many more rounds of sampling should be performed until the log likelihood stabilizes indicating the Markov chain burn-in period has passed and samples are being drawn from the stationary distribution.