Release notes for 4.6.7

Fixed PME bug with high OpenMP thread counts. PME energies and forces could be incorrect with combined MPI+OpenMP
parallelization. This would only happen when
pmegrids->nthread_comm[YY] >= 2, which can only occur with a high OpenMP
thread count that has multiple large prime factors.
It is unlikely that this issue affected production runs.
#1572.
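As a rough illustration of that trigger condition, the helper below (hypothetical, not GROMACS code) lists the prime factors of an OpenMP thread count, making it easy to see which counts contain multiple large prime factors:

```python
# Hypothetical helper (not GROMACS source): factor an OpenMP thread
# count into primes. Counts such as 35 = 5 * 7 have multiple large
# prime factors; powers of two such as 16 do not.
def prime_factors(n):
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

print(prime_factors(35))  # [5, 7]
print(prime_factors(16))  # [2, 2, 2, 2]
```

Whether nthread_comm[YY] actually reaches 2 depends on GROMACS's internal thread-grid assignment, so this is only a heuristic for spotting potentially affected thread counts.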

Backported from 5.0 a fix that avoids a stack overflow on Windows with CMake newer than 2.8.10.2. CMake used to add "/STACK:10000000" to the default linker flags; that
was removed in version 2.8.11-rc1. The default stack size used by MSVC is
apparently too small, because mdrun crashes with a stack overflow when
built on Windows with MSVC or ICC and a CMake newer than 2.8.10.2.

Fixed output of eigenvalues in g_covar. Commit 972032bfb8cd38 introduced a bug whereby eigenvalues
were written to the .xvg file only if "-last" was explicitly given on the
command line; otherwise no eigenvalues would appear in the .xvg file.
The eigenvalues are written in a loop from '0' to 'end', but since
'end' was initialized to '-1', the loop was never executed.
This patch moves the code that computes 'end' one block upwards,
before the output to file. #1575.
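The bug pattern can be sketched in Python (the actual code is C, and the names here are illustrative, not taken from g_covar):

```python
# Sketch of the bug: the output loop runs from 0 to 'end' inclusive,
# but 'end' keeps its sentinel value -1 unless -last was given, so
# nothing is ever written.
def write_eigenvalues(eigvals, last=-1):
    end = last  # buggy: stays -1 when -last is not given
    return [eigvals[i] for i in range(0, end + 1)]

def write_eigenvalues_fixed(eigvals, last=-1):
    # fix: compute 'end' before the output loop
    end = last if last >= 0 else len(eigvals) - 1
    return [eigvals[i] for i in range(0, end + 1)]

print(write_eigenvalues([3.0, 2.0, 1.0]))        # [] -- no eigenvalues written
print(write_eigenvalues_fixed([3.0, 2.0, 1.0]))  # [3.0, 2.0, 1.0]
```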

Fixed two PME issues with MPI+OpenMP. Change 272736bc partially fixed #1388, but broke the more general
case of multiple MPI communication pulses in PME by incorrectly
changing tx1 and ty1; this change has been reverted.
Change 27189bba fixed the incorrect PME grid reduction with
multiple-thread grid overlap in y, but broke the much more common case
where the y-size of the PME grid is not divisible by the number of domains in y,
by incorrectly changing buf_my.
Now buf_my is set to the correct value, which solves both issues.
#1578, #1388 and #1572.

Release notes for 4.6.6

When using tabulated interactions (historically with PME-Switch), the previous free-energy kernels used tabulated interactions and gave correct results. However, since the move to the new interaction modifiers, Ewald short-ranged interactions are computed analytically. To extend the range over which the soft-core interaction is applied, the free-energy kernels evaluate interactions by subtracting the reciprocal-space component and then applying the free-energy treatment to the short-range Coulomb (1/r) interaction. This works fine for vanilla PME, but led to problems when combined with a switch modifier, since a different function was being switched than in the non-free-energy kernels. This could lead to large artefacts, with the free energy off by a factor of 100, when the cutoff was applied to r while the switch was applied to the scaled soft-core radius.

This patch modifies the free-energy kernel so that the vanilla, shift, and exact-cutoff versions still use the compensation trick, while the switch modifier always operates on the traditional short-range Ewald functional form.
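The compensation trick described above can be illustrated numerically. This is a minimal sketch (the function names are made up, not GROMACS source) showing that the short-range Ewald form erfc(beta*r)/r is recovered by subtracting the reciprocal-space contribution erf(beta*r)/r from the plain Coulomb 1/r term:

```python
import math

# Direct evaluation of the short-range Ewald pair potential.
def ewald_sr_direct(r, beta):
    return math.erfc(beta * r) / r

# Same quantity via the compensation trick: plain Coulomb minus the
# reciprocal-space (erf) contribution. Since erfc(x) = 1 - erf(x),
# the two expressions agree to machine precision.
def ewald_sr_by_subtraction(r, beta):
    return 1.0 / r - math.erf(beta * r) / r

r, beta = 0.9, 3.12  # arbitrary example values (nm, 1/nm)
print(abs(ewald_sr_direct(r, beta) - ewald_sr_by_subtraction(r, beta)))
```

The problem described above arises when a switch modifier is applied to one of these forms but not the other, so that a different function ends up being switched.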

The (very small) Ewald shift has also been added when computing free energy in combination with Ewald summation and potential-shift modifiers. As the perturbation goes to zero, the interaction will also approach the non-free-energy interactions. Tested to match the non-free-energy kernel to within 1e-8 in the fully coupled state; it conserves energy and produces reasonable free energies for ethanol in water.

This also modifies the table-generation, table-usage, and dispersion-correction code to use the shift/switch forms correctly when they have been selected in the interaction modifiers. This provides much more accurate results for our new shifted interactions. Correct (unmodified) tables are now generated for 1-4 interactions in a few corner cases in the presence of modifiers for non-bonded interactions. Code paths for using exact cutoffs now work correctly when rcoulomb-switch != rvdw-switch, or when only one kind of switch is active.

Free-energy calculations using a plain Coulomb interaction now incorporate a potential shift if one exists. The GMX_NB_GENERIC environment variable can now be used to specify the use of the generic kernel even with shifts or switches active. #1463

Fixed a bug in parallel constraining of v and f: with 3 or more decomposition domains in one or more dimensions, communicated v and f components could be modified by the box size for inter-charge-group constraints. #1462

Added cut-off checks for triclinic domain decomposition. With two decomposition cells in a triclinic dimension, the cut-off could be longer than the size of the communicated domains, which could lead to some pairs close to the cut-off distance being ignored in the force/energy calculations. #1467

Added BlueGene/Q Verlet cut-off scheme kernels, enhancements to CMake handling, support for bgclang (though the latest compiler does not yet work with OpenMP), support for the A2 core and QPX SIMD in CPU detection, and updates to the install guide.

Automated PP-PME (task) load balancing: balancing the non-bonded force and PME mesh workloads when the two are executed on different compute resources (i.e. CPU and GPU, or different CPUs). This enables GPU-CPU and PP-PME process load balancing by shifting work from the mesh to the non-bonded calculation.

PPPM/P3M with analytical derivative at the same cost and with the same features as PME.

New, advanced free energy sampling techniques.

AdResS adaptive resolution simulation support.

Enforced rotation ("rotational pulling")

Build configuration now uses CMake; configure+autoconf/make is no longer supported. (The CMake build system features a lot of automation and cleverness under the hood, and we know it might not always prove to be as rock-solid as the old one. However, it is far more advanced and complex, so bear with us while we iron out issues that come up along the way.)

Improved regression tests; these can now be run directly from the build tree using make check.

g_hbond now utilizes OpenMP.

Bugfixes

No critical bugfixes. This version is based on 4.5.6 and all important fixes are "inherited" and therefore documented in the 4.5.6 release notes.

Changes that might affect your results

None for simulations set up with the traditional group cut-off scheme.

When switching from the group scheme to the Verlet scheme, integration of the equations of motion can become more accurate due to the exact cut-off treatment and buffering (this will, of course, depend on the original cut-off settings used). See the section Cut-off schemes for details.

Other important changes compared to 4.5

mdrun now sets thread affinities

This means that when running multiple mdrun processes on the same machine, one has to either provide a core "pin offset" using the -pinoffset command-line option, or turn off internal affinities and take the performance hit (or, alternatively, manage affinities externally).

The choice of compiler matters more

With the switch to SIMD intrinsics for up-to-date CPU acceleration and to OpenMP, the compiler used matters more, both for the ability to compile GROMACS correctly and for mdrun performance. The recommended compilers that are known to work (i.e. compile GROMACS correctly) and provide good performance on x86/AMD64 are: gcc 4.5 and later, Intel Compilers 12.0 and later, and clang 3.1 (note its lack of OpenMP support, which can cause a 30%+ performance loss). In all cases you are strongly advised to use the most recent patch level available. GROMACS makes extensive use of compiler intrinsics to get the most out of your hardware, so if you use a compiler that is older than your hardware you are asking for trouble, because all the compilers have had bugs in their intrinsics implementations. For further details see ???.

completed removal of Fortran kernels (we are not aware of any systems where these would run faster than the corresponding non-accelerated C kernels by enough to be worth our effort, and probably the new force-only C kernels will be faster than the old Fortran kernels on any system where the disparity between Fortran and C compiler optimization is noticeable; speak up if any of this is a problem for you!)

completed removal of Power6 accelerated kernels (currently we lack the resources to implement accelerated kernels for Power architectures, and probably the new force-only C kernels will show results comparable with the old accelerated Power kernels; speak up if any of this is a problem for you - particularly if you have resources to offer to fix it!)