Wednesday, June 27, 2012

Reviews of Casper's second PLoS ONE submission

The reviews for Casper's second PLoS ONE submission, submitted around May 25th, are in. Let it not be said that all #openaccess papers are carelessly reviewed!

I have only skimmed the reviews so far, but the one major concern of Reviewer #1 is of course very easily dealt with (good point, by the way!).

The major criticism of Reviewer #2 is similarly easy to deal with. We were of course aware of the FMOUtil program, but I am not sure what a comparison would entail. We should of course point out any differences in the input files produced by FragIt and FMOUtil for polypeptides, and emphasize the advantage of FragIt (general applicability) by doing a few more examples, such as DNA and perhaps a large ligand bound to a protein.

PONE-D-12-14697
FragIt: A Tool to Prepare Input Files for Fragment Based Quantum
Chemical Calculations
PLoS ONE

Dear Mr Steinmann,

Thank you for submitting your manuscript to PLoS ONE. After careful
consideration, we feel that it has merit, but is not suitable for
publication as it currently stands. Therefore, my decision is "Major
Revision."

We invite you to submit a revised version of the manuscript that
addresses all the critical points raised by both reviewers.

We encourage you to submit your revision within sixty days of the date
of this decision.

When your files are ready, please submit your revision by logging on to http://pone.edmgr.com/ and following
the Submissions Needing Revision link. Do not submit a revised
manuscript as a new submission.

If you would like to make changes to your financial disclosure, please
include your updated statement in your cover letter.

Please also include a rebuttal letter that responds to each point
brought up by the academic editor and reviewer(s). This letter should be
uploaded as a Response to Reviewers file.

In addition, please provide a marked-up copy of the changes made from
the previous article file as a Manuscript with Tracked Changes file.
This can be done using 'track changes' in programs such as MS Word
and/or highlighting any changes in the new document.

Reviewer #1: Steinmann et al. present a description of FragIt, a command-line program
to prepare GAMESS input files for Fragment Molecular Orbital
Calculations. Fragments are generated by bond cleavage at bonds that
match particular SMARTS patterns, which simplifies the
procedure in particular for polymers such as proteins and
oligosaccharides. As the authors point out, without a tool of this sort,
it would be next to impossible to prepare input files for any but the
smallest proteins.
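
The fragmentation step summarized above (cut selected bonds, then
collect what is still connected) can be sketched in a few lines. This
is only an illustration under stated assumptions, not FragIt's code:
FragIt finds the bonds to cut by SMARTS matching through Open Babel,
whereas this sketch simply takes the bonds to cut as given atom-index
pairs. The function name and the toy six-atom chain are made up for the
example.

```python
from collections import defaultdict


def fragment(n_atoms, bonds, cut_bonds):
    """Split a molecular graph into fragments after removing cut_bonds.

    bonds and cut_bonds are iterables of (i, j) atom-index pairs.
    Returns a list of sorted atom-index lists, one per fragment.
    """
    cut = {frozenset(pair) for pair in cut_bonds}
    neighbours = defaultdict(set)
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            neighbours[i].add(j)
            neighbours[j].add(i)

    seen = set()
    fragments = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first walk over the bonds that were not cut.
        stack = [start]
        seen.add(start)
        members = []
        while stack:
            atom = stack.pop()
            members.append(atom)
            for other in neighbours[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        fragments.append(sorted(members))
    return fragments


# Toy chain of six atoms, 0-1-2-3-4-5, cut at the 1-2 and 3-4 bonds.
chain_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(fragment(6, chain_bonds, cut_bonds=[(1, 2), (3, 4)]))
# -> [[0, 1], [2, 3], [4, 5]]
```

Each connected component that remains after the cuts becomes one
fragment, which is why a cluster of separate molecules yields one
fragment per molecule when no bonds are cut.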

The manuscript is overall clear and well-written, but could have
benefited from a final careful edit before submission as there are many
trivial issues. I tested out the software, and it worked without
problems (thank you for including a test file).

Major:
(1) This paper does not describe a particular version of the software.
Please make a version 1.0 release available from the GitHub download
site or elsewhere. Given that most computational chemists may not be
familiar with GitHub, please provide a simple download link for the
software.

Minor:
(1) Spellings: Copenahgen, dilengtly, indepth, chargetransfer, "rise to
further" not "rise further", "in Figure" not "on Figure", "nescessary",
"input file for" not "input file to", "because we" not "because of we",
"aswell", "supplimentary"
(2) Page 1, line 52: Please describe what is behind the "etc". (I would
like to know)
(3) Is this tool only specific to proteins and oligosaccharides? Can you
make this clearer? The MMFF94 forcefield only supports a limited subset
of atom types, and already in the introduction at page 1, line 52, you
start talking about proteins. If this is
the case, perhaps you should also mention this in the title.
(4) This tool appears to be specific to GAMESS. While I would encourage
you to develop your software in such a way that it could be used for
other comp chem packages, if that is not your goal, perhaps you should
name GAMESS specifically in the title, i.e. A
Tool to Prepare GAMESS Input Files. Otherwise you may mislead or
disappoint the reader, and in this way you also highlight it to GAMESS
users.
(5) Colloquialisms: "doable", "mean feat", "we've", "don't". While I
have no particular problem with the use of these, such expressions are
not generally found in the literature and may be confusing to non-native
speakers (who form the majority of the readership).
(6) The figure numbers correspond to the legends, but not the actual
figures in the PDF provided. Naturally, this caused me some confusion.
(7) I found it unusual that the Results section preceded the description
of the software and dataset. In fact, I skipped reading the Results
until the end, and I would recommend you consider rearranging the
manuscript accordingly.
(8) Is there software already available which can perform the same
action? This wasn't very clear to me. For example, you reference Open
Babel on Page 2 but Open Babel has no particular abilities to prepare
FMO files.
(9) As someone who is not familiar with FMO calculations, I would have
been interested to read a general description of why they are useful,
how fragments should be decided, what is the associated decrease in
accuracy, the increase in speed, and so forth. Perhaps the authors could
consider a paragraph on this as this feeds into why the software is
important.
(10) EFMO is mentioned on Page 2, line 33, but not again in the paper.
Is this supported or not (and how does it differ)?
(11) Abbreviations such as FMO, EFMO, SMARTS should be listed with
capitals, e.g. "Fragment Molecular Orbital" not "fragment molecular
orbital"
(12) Page 5, line 39: Remove line referencing Pybel if not used.
(13) Page 5, Line 53: Here it says that "explicitly defined valid pairs
of atoms" can be used instead of substructure patterns, but elsewhere in
the manuscript there is no example of this. Could the authors provide
an example of usage?
(14) Page 6, line 10: do you mean "combines two or more adjacent
fragments?" (subsequent means neighbouring in time)
(15) Page 6, line 33: What does FMO/FD stand for?
(16) Page 6, line 37: This sentence is confusing. How about "One defines
a central fragment as above and a distance. Fragments which have atoms
within this distance..."
(17) Page 8, line 26: "--output-active-atoms" not
"--output-active-distance" according to the version of the software I
downloaded.
(18) Please add a section in the manuscript describing the contents of
the Supplementary Section.
(19) References. Reference formatting is not consistent. Sometimes
journal titles are abbreviated, sometimes not. Should be Avogadro in
Ref 9 not Open Babel. All titles are in lowercase even though this may
be incorrect (e.g. ref 8).
(20) Figures. Which are the active, inactive and frozen regions in
Figure 6?
(21) Consider including a LICENSE.txt with the source code. This makes
the license more obvious.
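
The distance-based selection that minor point (16) asks the authors to
reword can be sketched as follows. This is a rough illustration, not
FragIt's implementation: the function name, the data layout (fragments
as lists of atom indices, coordinates as (x, y, z) tuples) and the rule
that the central fragment always counts as selected are assumptions
made for the example.

```python
import math


def fragments_within(fragments, coords, central, cutoff):
    """Return indices of fragments with any atom within cutoff of the central one.

    fragments: list of lists of atom indices
    coords:    list of (x, y, z) tuples indexed by atom
    central:   index of the central fragment in `fragments`
    cutoff:    distance threshold, in the same unit as coords
    """
    selected = []
    for idx, atoms in enumerate(fragments):
        if idx == central:
            # Assumption: the central fragment is always part of the selection.
            selected.append(idx)
            continue
        close = any(
            math.dist(coords[a], coords[b]) <= cutoff
            for a in atoms
            for b in fragments[central]
        )
        if close:
            selected.append(idx)
    return selected


# Three two-atom fragments laid out along the x axis.
coords = [(0, 0, 0), (1, 0, 0), (2.5, 0, 0), (3.5, 0, 0), (9, 0, 0), (10, 0, 0)]
frags = [[0, 1], [2, 3], [4, 5]]
print(fragments_within(frags, coords, central=0, cutoff=2.0))
# -> [0, 1]
```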

Reviewer #2: The manuscript describes FragIt, a novel and flexible
method and corresponding software package to prepare input data for
quantum chemical calculations using fragmentation methods (the FMO
method in particular). The presented results are rather narrow in focus
(FMO method of the GAMESS package for polypeptides and polysaccharides),
but generally applicable and easily extensible in theory.

The described software is open source and builds on other
cheminformatics software (Open Babel and PDB2PQR). It is written in
Python, which makes it easily usable and modifiable by most
computational chemists. Due to the use of SMARTS patterns, which can be
specified at run-time, no software engineering is needed to apply FragIt
to different systems.

Our main criticism is the lack of comparison with the available FMOUtil
package (see general comment 4 below). This point must be addressed
before publication in our opinion.

Less important but still highly recommended for consideration are our
general comments 1, 2, 3 and 5 about the abstract/introduction - we
think some rewriting would add to the clarity of the manuscript.

As such, we recommend accepting this manuscript for publication provided
the above points (general comments 1-5) have been addressed and the
other points considered (unless they indicate clear mistakes).

General comments:

1. The beginning of the introduction mentions fragmentation methods in
general but lacks a clear statement about which method the authors
support in particular. From the rest of the manuscript, it appears that
FragIt is (for now) specific to Kitaura's and Fedorov's FMO method as
implemented in GAMESS. As such, it would appear appropriate to quickly
summarize the method and its main features to the PLOS One readership.

2. The second part of the introduction discusses the "tedious tasks" one
has to undertake to set up program packages for fragmentation methods
and mentions some prior software which "can perform some of these
tasks". However, these tasks are not presented at this point (they
appear to be implicitly presented by explaining the work-flow and
fragmentation algorithm later on). While a thorough discussion of them
should be left for later sections, we think it would add to the
manuscript to present them in the introduction.

3. The discussion of prior software packages with similar scope is
lacking. The authors cite Avogadro, Open Babel and Facio. Avogadro is a
molecular modelling application with support for GAMESS input deck
generation, but (to our knowledge) not including fragmentation. Open
Babel is a general cheminformatics toolkit and has (to our knowledge)
no special features targeted at fragmentation methods besides the SMARTS
handling the authors are using in FragIt. Facio appears to be a
general-purpose molecular-modelling application including GAMESS input
file / visualization support. The screenshot at
http://www1.bbiq.jp/zzzfelis/FMO2.jpg implies it includes support for
the FMO method, so the authors should discuss which tasks (see above
point 2) Facio is unable (or only poorly/with difficulty able) to
perform compared to FragIt.

Further, PEACH (http://www.cbi.or.jp/~nakata/peach/4.8/peachw48.html is
a related link; the main link appears to be gone) appears to be able to
fragment DNA and dump GAMESS output (besides ABINIT output), but its
availability and usage are unclear to us and have not been evaluated.
Nevertheless, it should probably be mentioned alongside the others as
well.

4. In addition to the above general comment 3, the authors completely
omitted FMOUtil (http://staff.aist.go.jp/d.g.fedorov/fmo/fmoutil.html),
written by the authors of the FMO method itself. It appears to be the
most obvious choice for comparison: it is licensed similarly to FragIt
(GPL version 2, according to the source file and program output) and of
similar usage (command-line tool dumping a GAMESS input file), albeit of
more limited scope (restricted to proteins/polypeptides). An additional
feature of FMOUtil appears to be grouping of glycine residues to the
previous fragment; the authors could address why this is not done in
FragIt and/or add this if easily implemented via SMARTS.

As most of the results are about polypeptides, we would even advise
including FMOUtil fragmentation results for comparison in the results
sections; rerunning the fragmentation with FMOUtil should not take long.

5. In the light of general comment 4, the unique advantages of FragIt
are not very prominently mentioned. The authors supply patterns for
polypeptides and polysaccharides. Due to the residue-centric PDB format,
fragmenting proteins by residue appears to be a simple problem, while
polysaccharides are probably a much less common target for quantum
chemistry applications in need of fragmentation methods. The unique
advantages compared to e.g. FMOUtil appear to be (i) the possibility to
fragment peptides at different bonds along the backbone than what is
indicated in the PDB file and (ii) to easily fragment arbitrary other
types of polymers without the need to change the source code or
otherwise difficult setups. Point (ii) is only indirectly addressed on
page 7/line 55 in a comment on testing new patterns.

As such it would make a stronger case for FragIt if further patterns for
other types of polymers (the most notable example would be nucleotides)
were presented or at least their easy development stressed more. The
last section (Availability and Future Directions) mentions DNA (page
9/line 5), but at the very end of the manuscript, and only one more
example (solid state systems) is mentioned, which underplays the
apparent versatility of the SMARTS approach in FragIt in our opinion.

Rewriting parts of the abstract to this end should be considered as
well.

6. The grouping of the sections is unusual: the results are presented
before the Design and Implementation. From reading the manuscript, one
gets the feeling it has originally been written the other way around,
e.g. the PDB codes are only mentioned at the very end when the dataset
is described, not on the first mention of the respective proteins, and
figures 2-4 are mostly generic figures whose referencing in the Results
section looks slightly misplaced.

7. The results are strictly limited to the fragmentation and their
respective merits are discussed without backing from QM calculations.
While full-scale QM calculations would likely be out-of-scope for the
manuscript, a more thorough discussion (possibly with references to
quantitative comparisons) about which fragmentation results are
desirable in general would be useful in order to better assess the
various FragIt options. The authors refer to references 21 and 22 with
respect to the fragmentation of peptide bonds, but fragmentation sizes
etc. are not discussed that much.

8. The terms of the availability (especially the software license) are
not clearly stated in the Availability and Future Directions section.
According to the code on GitHub, FragIt is licensed under the GPL,
version 2 (or later); we believe it should at least be mentioned that it
is distributed under an open source license. We would also suggest
including a (web-)citation for the FragIt code for indexing and linking
purposes instead of the single inline URL on page 8/line 53.

9. Although the FMO-specific parts of the generated GAMESS input are the
crucial part of FragIt, it seems to write a complete GAMESS input
including hardcoded default settings for the ab initio method and basis
sets. Those non-fragmentation specific parts of the input are probably
supposed to be edited by the user to their needs. FMOUtil (see general
comment 4) apparently includes support for various methods and basis
sets implementing the FMO method, so discussing the nature of the
default SYSTEM/SCF/CONTROL/BASIS groups in the Writing the Input Files
section should be considered.

10. The section Availability and Future Directions could discuss
possible interaction or inclusion of FragIt in other (open-source)
software packages, the most likely being Avogadro (reference 9), which
already includes a sophisticated GAMESS input deck generator and a
plugin infrastructure including Python support. Avogadro being a
graphical application might even make it possible to interactively
manipulate fragment boundaries.

11. The paper is sometimes written in a somewhat informal style (see
e.g. page 2/line 15/16, "is [...] no minor task when you have hundreds
of fragments") and we encountered several language issues or errors
that we have not explicitly mentioned.

Specific comments:

Page 1, Line 39: we suggest adding a citation to the recent Chem. Rev.
review (DOI: 10.1021/cr200093j) of fragmentation methods to the end of
the sentence "[...] such as fragmentation methods".

Page 1, Line 48: the pointer to the supporting information regarding the
complexity of input files for fragmentation methods compared to
conventional methods seems unnecessary here. First off, there are
numerous fragmentation methods and some might not require much more
complex setup than conventional ones. Further, the supporting
information does not appear to contain any conventional input files for
comparison anyway.

Page 2, Line 37: PDB2PQR is not only available online, it can be freely
downloaded and installed locally for (offline) use with FragIt.

Page 2, Line 38: the reference to the RECAP algorithm without further
explanation looks misplaced in the introduction and could be moved to
the Design and Implementation section.

Page 3, Lines 14-22: the usefulness of the molecular cluster example is
unclear. If the only reason for its inclusion is to show FragIt will
fragment separate molecules to one fragment each, this could perhaps be
folded into the Design and Implementation section. The specific example
(16 water molecules and one tyrosine) leads to 16 very small and one big
fragment. It is unclear how this is a "good" result considering the
size difference of the fragments; if this result is indeed better than
grouping maybe 2-3 water molecules into one fragment, this should be
discussed. We suggest omitting or moving this subsection.

Page 4, Lines 10-19: the description of figure 5 is hard to follow.
Several fragments in figure 5 have the same color, and the text does not
discuss them, just summarizes the fragment size. It would be clearer if
the text would mention the color and/or fragment number according to the
panels in figure 5. Another possibility would be to enhance the caption
of figure 5 to include that explanation.

Page 4, Lines 21/23: it is unclear whether the lack of fragmentation
along disulfide bonds "unless a specific pattern is supplied" is
intentional (and thus desired), or simply a missing feature. Further,
the discussed results include no example with disulfide bonds, while the
test set does (see page 7/line 14), though also without discussion.

Page 6, Line 30: we assume fragment I is the central fragment mentioned
before; it could be mentioned explicitly to make this clearer.

Page 6, Line 44: (web-)citations for PyMOL and Jmol should be provided
(reference 29 cites Jmol, but only later in the manuscript on page
8/line 46).

Page 6, Line 44: "visually inspect" is vague; we assume the output
scripts include markup so that PyMOL/Jmol will color each fragment
differently (as shown in the figures). This could be mentioned more
specifically (perhaps referencing one of the figures), including
possible other features.

Page 7, Line 26: (web-)citation for NumPy could be provided considering
Python got cited earlier in the manuscript as well.

Idem: The strict dependency on Python 2.6 is not explained; if this was
indeed the case, it would seriously narrow the field of use. Another
possibility is that the "(or greater)" after NumPy applies to Python as
well; in this case this should be made clearer. If Python 2.6 is just
the version used by the authors, we suggest adding language like "has
been tested/validated with Python 2.6". The same applies for Open Babel:
the supported version should be mentioned here; as reference 10
specifies version 2.3.0, this could be mentioned here as well.
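
The "2.6 (or greater)" reading discussed above corresponds to a
minimum-version check rather than an exact-version check. A minimal
sketch of such a check follows; the function name is hypothetical and
the (2, 6) minimum is simply the version named in the manuscript, not
something verified against the FragIt source.

```python
import sys

# Assumed minimum, taken from the version named in the manuscript.
MIN_PYTHON = (2, 6)


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    # Tuples compare element by element, so (3, 11) >= (2, 6) is True.
    return tuple(version[:2]) >= tuple(minimum)


if not python_is_supported():
    sys.exit("Python %d.%d or greater is required" % MIN_PYTHON)
```

Comparing against a minimum accepts any later interpreter, whereas an
equality test against 2.6 would reject them; which behaviour is
intended is exactly what the reviewer asks the authors to state.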

4. Reference 16: the ID for the arXiv preprint is missing a dot, it
should be "1202.4935".

5. Reference 23: The URL is not marked up like the other URLs in e.g.
references 24-26.

Comments on Figures:

1. The figures are referenced out-of-order in the text in the sequence
1-2-5-3-4-7-6-8. Further, the captions and uploaded images do not
match; uploaded figure 2 is labelled "algorithm" (figure 7 caption) and
every later figure is offset by one between the uploaded figure and the
figure caption due to this.

2. Figure 1: similar to the results it depicts (see comments to page
3/lines 14-22 above), the usefulness of figure 1 is dubious. It
displays the water/tyrosine cluster in two panels, with regular
atom-coloring in the first panel and fragment coloring in the second.
However, for the second panel, there are fewer different colors than
there are fragments, so several fragments have the same color. As the
fragments are not otherwise labelled, the actual result is contradicted,
i.e. that every independent molecule is a separate fragment. We suggest
removing the figure or at least adding labels to the repeatedly colored
fragments and/or clarifying the coloring/fragmentation in the caption.

3. Figure 6: the caption should explain the coloring with respect to the
mentioned regions, possibly also denoting the fragment indices so that
the figure can be understood for black-and-white printouts.

4. Figure 7: the caption looks superfluous.

Comments on Tables:

1. Table 2: the comment "d.o." in rows 2-6 is unclear to us - pardon our
ignorance if this is an otherwise common abbreviation. If it signifies
"see above", "v.s." (vide supra) could be used instead, or just the
first comment "capped with methylene" be repeated.

Comments on Software:

1. No release has been made of FragIt so far, only the git trunk is
available. We advise releasing a version and tarball in conjunction
with the paper for future reference. This need not be a 1.0 release.

2. Only version 2.6 of Python is supported (the documentation mentions
ongoing work to support further Python versions). As FragIt is not a
big project (below 2k lines of code, 25% of which belongs to the test
suite), we suggest removing the Python 2.6 dependency for a public
release (see software comment 1), if possible.

3. The config file format conflates generic configuration values
("writer", the default patterns, output options) with specific results
for a particular PDB file ("explicitfragmentpairs" and
"explicitprotectatoms"). This might be due to option parsing, but is
non-intuitive. The same goes for requiring a PDB file for dumping the
configuration values into a file.

4. It is not possible to enable/disable protection patterns via
command-line options (at least according to --help output); the only way
to do this seems to be by changing the config file. Compared to some of
the other command-line options, this one seems rather important to us,
so we suggest adding it.

5. Including a (probably trivial) setup.py script for installation and
deployment, as is customary for Python software, is advised.
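
Software comment 4 above asks for a command-line switch for protection
patterns. One way such a switch could look is sketched below with
Python's standard argparse module. Everything specific here is
hypothetical: the program name, the --no-protect flag, the pattern
names and the SMARTS strings are placeholders for illustration and are
not FragIt's actual options or defaults.

```python
import argparse


def build_parser():
    # Hypothetical options for illustration only; NOT FragIt's real flags.
    parser = argparse.ArgumentParser(prog="fragit-sketch")
    parser.add_argument("pdbfile", help="structure file to fragment")
    parser.add_argument(
        "--no-protect",
        action="append",
        default=[],
        metavar="NAME",
        help="name of a protection pattern to disable (repeatable)",
    )
    return parser


def active_protection(all_patterns, disabled):
    """Return the protection patterns left after dropping the disabled names."""
    return {
        name: smarts
        for name, smarts in all_patterns.items()
        if name not in disabled
    }


# Placeholder pattern table; the SMARTS strings are illustrative only.
patterns = {"nterminal": "[NX4+]", "cterminal": "C(=O)[O-]"}
args = build_parser().parse_args(["protein.pdb", "--no-protect", "nterminal"])
print(sorted(active_protection(patterns, args.no_protect)))
# -> ['cterminal']
```

Keeping the config file as the source of the pattern table and letting
the command line only subtract from it would leave existing config
files valid.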