Comments to author

In their manuscript, "A new multi-genomic approach for the study of biogeochemical cycles at global scale: the molecular reconstruction of the sulfur cycle", De Anda and colleagues describe a computational approach to characterize the relative importance of sulfur cycling in (meta)genomic datasets.

Their approach, which calculates a "sulfur-score" based on the detected versus expected presence of genes involved in sulfur cycling, seems appropriate for this question. However, I have several questions about the methodology (and its description) that I would like to see clarified before recommending this manuscript for publication.

The introduction of the conceptual framework in lines 74-89 was not very clear to me on first reading. To improve readability, I suggest the authors bring some of the information currently in the Methods section into this part of the manuscript, specifically why the minimum ecosystem concept and microbial mats are important.

I am a little confused about why the authors use the mean size length metric. Given a well-curated reference database, even short reads should align to protein sequences of any length. Protein length will instead affect the expected number of matches, since at equal abundance a longer gene is covered by more reads than a shorter one.

After selecting 152 proteins involved in the S cycle, the authors use only 112 domains as annotated by InterProScan. Do these domains represent 112 proteins? If so, why did the authors choose not to generate HMMs for the remaining 40 proteins?

Other than calculating the relative entropy, have the authors used any other check to assess whether the detected Pfam domains are specific to the S cycle? Many Pfam domains include proteins with a range of functions.

The purpose of Figure 3 is not entirely clear to me. In its current form, it contains too much information to be informative.

The detailed description of the metagenomes in lines 433-463 seems unnecessary for the flow of the manuscript.