comet

Usage:

crux comet [options] <input spectra>+ <database_name>

Description:

This command searches a protein database with a set of spectra, assigning peptide sequences to the observed spectra. This search engine was developed by Jimmy Eng at the University of Washington Proteomics Resource.

Although its history goes back two decades, the Comet search engine was first made publicly available in August 2012 on SourceForge. Comet is multithreaded and supports multiple input and output formats.

Input:

input spectra+ – The name of the file from which to parse the spectra. Valid formats include mzXML, mzML, mz5, raw, ms2, and cms2. Files in mzML or mzXML may be compressed with gzip. RAW files can be parsed only under Windows, and only if the appropriate libraries were included at compile time.

database_name – A full or relative path to the sequence database, in FASTA format, to search. Example databases include RefSeq or UniProt. The database can contain amino acid sequences or nucleic acid sequences. If the sequences are amino acid sequences, set the parameter "nucleotide_reading_frame = 0". If the sequences are nucleic acid sequences, you must instruct Comet to translate them to amino acid sequences. Do this by setting "nucleotide_reading_frame" to a value between 1 and 9.
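As an illustration of the usage line above, the following Python sketch assembles a crux comet command line from one or more spectrum files and a database path. The helper name build_comet_command and the file names are hypothetical; the only options used are --output-dir and --peptide_mass_tolerance, both documented on this page.

```python
import shlex


def build_comet_command(spectra, database, output_dir=None, peptide_mass_tolerance=None):
    """Return an argv list of the form: crux comet [options] <input spectra>+ <database_name>."""
    if not spectra:
        raise ValueError("at least one spectrum file is required")
    argv = ["crux", "comet"]
    # Options come first, as in the usage line.
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    if peptide_mass_tolerance is not None:
        argv += ["--peptide_mass_tolerance", str(peptide_mass_tolerance)]
    # One or more spectrum files, then the database as the last argument.
    argv += list(spectra)
    argv.append(database)
    return argv


# Hypothetical file names, for illustration only.
command = build_comet_command(["run1.mzML", "run2.mzML"], "proteins.fasta", output_dir="comet-out")
print(shlex.join(command))
```

On a machine where crux is installed, the resulting list could be passed to subprocess.run.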

Output:

The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:

comet.target.txt – a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.

comet.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.

comet.log.txt – a log file containing a copy of all messages that were printed to standard error.
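Because comet.target.txt is tab-delimited, it can be loaded with standard tools. The sketch below reads it into a list of dictionaries keyed by whatever header row the file contains; it makes no assumption about specific column names, and the helper names are hypothetical.

```python
import csv
from pathlib import Path


def read_target_psms(path):
    """Read a tab-delimited PSM file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_default_output(output_dir="crux-output"):
    """Read comet.target.txt from the output folder (crux-output unless overridden)."""
    return read_target_psms(Path(output_dir) / "comet.target.txt")
```

If --output-dir was used for the search, pass the same directory name to load_default_output.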

CPU threads

Masses

--peptide_mass_tolerance <float> – Controls the mass tolerance value. The mass tolerance is applied as +/- the specified number; for example, an entered value of "1.0" applies a -1.0 to +1.0 tolerance. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
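The symmetric window described above can be sketched in a few lines of Python. This is a minimal illustration, not Comet's implementation: it assumes the tolerance is expressed in the same absolute units as the masses and that the window bounds are inclusive, and the function names are hypothetical.

```python
def tolerance_window(mass, peptide_mass_tolerance=3.0):
    """Return the (low, high) window: mass - tolerance to mass + tolerance."""
    return (mass - peptide_mass_tolerance, mass + peptide_mass_tolerance)


def within_tolerance(mass, candidate_mass, peptide_mass_tolerance=3.0):
    """True if candidate_mass falls inside the window (inclusive bounds are an assumption)."""
    low, high = tolerance_window(mass, peptide_mass_tolerance)
    return low <= candidate_mass <= high
```

With a tolerance of 1.0, a mass of 1000.0 gives a window from 999.0 to 1001.0.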

--auto_peptide_mass_tolerance false|warn|fail – Automatically estimate optimal value for the peptide_mass_tolerance parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.

Fragment ions

--auto_fragment_bin_tol false|warn|fail – Automatically estimate optimal value for the fragment_bin_tol parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
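The false|warn|fail modes shared by --auto_peptide_mass_tolerance and --auto_fragment_bin_tol amount to a small fallback policy, sketched below in Python. The function resolve_parameter and the estimate callable are hypothetical stand-ins; the sketch assumes a failed estimation is signalled by returning None, which is not how Crux necessarily reports it.

```python
import warnings


def resolve_parameter(mode, estimate, default):
    """Pick a parameter value according to the false|warn|fail policy.

    mode     -- "false", "warn", or "fail"
    estimate -- hypothetical callable returning a float, or None on failure
    default  -- value to use when no estimate is made or available
    """
    if mode == "false":
        return default  # no estimation attempted
    if mode not in ("warn", "fail"):
        raise ValueError(f"unknown mode: {mode!r}")
    value = estimate()
    if value is not None:
        return value  # estimation succeeded
    if mode == "warn":
        warnings.warn("estimation failed; using the default value")
        return default
    raise RuntimeError("estimation failed")  # mode == "fail": quit
```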

--override_charge <integer> – Specifies whether to override existing precursor charge state information, when present in the files, with the charge range specified by the "precursor_charge" parameter. Default = 0.

--output_suffix <string> – Specifies the suffix string that is appended to the base output name for the pep.xml, pin.xml, txt and sqt output files. Default = <empty>.

--mass_offsets <string> – Specifies one or more mass offsets to apply. Each value is effectively subtracted from each precursor mass such that peptides that are smaller than the precursor mass by the offset value can still be matched to the respective spectrum. Default = <empty>.
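The effect of a mass offset can be sketched as follows: each offset is subtracted from the precursor mass before comparing against a candidate peptide mass. This Python sketch is illustrative only. The function name is hypothetical, the offsets are passed as a list of floats rather than parsed from the option string, the tolerance is assumed to be an absolute mass difference, and only the offsets given are tried (whether an offset of 0 is applied implicitly is not assumed here).

```python
def matches_with_offsets(precursor_mass, peptide_mass, mass_offsets, tolerance):
    """True if the peptide mass matches the precursor mass after subtracting any one offset."""
    return any(
        abs((precursor_mass - offset) - peptide_mass) <= tolerance
        for offset in mass_offsets
    )
```

For example, a peptide of mass 1000.0 can match a precursor of mass 1100.0 when an offset of 100.0 is supplied, but not when only an offset of 0.0 is supplied.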

Variable modifications

--variable_mod01 <string> – Up to 9 variable modifications are supported. Each modification is specified using seven entries: "<mass> <residues> <type> <max> <terminus> <distance> <force>". Type is 0 for static mods and non-zero for variable mods. Note that if you set the same type value on multiple modification entries, Comet will treat those variable modifications as a binary set. This means that all modifiable residues in the binary set must be unmodified or modified. Multiple binary sets can be specified by setting a different binary modification value. Max is an integer specifying the maximum number of modified residues possible in a peptide for this modification entry. Distance specifies the distance the modification is applied to from the respective terminus: -1 = no distance constraint; 0 = only applies to terminal residue; N = only applies to terminal residue through next N residues. Terminus specifies which terminus the distance constraint is applied to: 0 = protein N-terminus; 1 = protein C-terminus; 2 = peptide N-terminus; 3 = peptide C-terminus. Force specifies whether peptides must contain this modification: 0 = not forced to be present; 1 = modification is required. Default = 0.0 null 0 4 -1 0 0.
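The distance constraint is easier to see with a small example. The Python sketch below lists the positions in a peptide that are eligible for a modification, given the residues it targets and a distance from a peptide terminus. It covers only the peptide N- and C-terminus cases, since the protein-terminus cases need protein context. The function name and signature are hypothetical and are not part of Comet or Crux.

```python
def eligible_positions(peptide, residues, distance, c_terminal=False):
    """Return 0-based positions where the modification may be placed.

    distance == -1 -- no distance constraint
    distance == 0  -- only the terminal residue
    distance == N  -- the terminal residue through the next N residues
    """
    n = len(peptide)
    if distance == -1:
        allowed = range(n)
    elif c_terminal:
        # Count inward from the peptide C-terminus.
        allowed = range(max(0, n - 1 - distance), n)
    else:
        # Count inward from the peptide N-terminus.
        allowed = range(0, min(n, distance + 1))
    return [i for i in allowed if peptide[i] in residues]
```

For the hypothetical peptide "MAMKM" with residues "M", a distance of 2 from the N-terminus allows positions 0 and 2, while a distance of 0 from the C-terminus allows only position 4.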

--pm-charge <integer> – Precursor charge state to consider MS/MS spectra from, in measurement error estimation. Ideally, this should be the most frequently occurring charge state in the given data. Default = 2.

--pm-pair-top-n-frag-peaks <integer> – Number of fragment peaks per spectrum pair to be used in fragment error estimation. Default = 5.

--pm-min-common-frag-peaks <integer> – Number of the most-intense peaks that two spectra must share in order to potentially be generated by the same peptide, for measurement error estimation. Default = 20.

--pm-max-scan-separation <integer> – Maximum number of scans two spectra can be separated by in order to be considered potentially generated by the same peptide, for measurement error estimation. Default = 1000.

--pm-min-peak-pairs <integer> – Minimum number of peak pairs (for precursor or fragment) that must be successfully paired in order to attempt to estimate measurement error distribution. Default = 100.
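The three thresholds above act as simple gates in measurement error estimation. The Python sketch below shows one way to read them, using the documented defaults. It is an illustration rather than the estimation procedure itself: the function names are hypothetical, the count of shared most-intense peaks is taken as an input instead of being computed, and the comparisons are assumed to be inclusive.

```python
def may_share_peptide(scan_a, scan_b, shared_top_peaks,
                      max_scan_separation=1000, min_common_frag_peaks=20):
    """True if two spectra are close enough in scan number and share enough intense peaks."""
    close_enough = abs(scan_a - scan_b) <= max_scan_separation
    similar_enough = shared_top_peaks >= min_common_frag_peaks
    return close_enough and similar_enough


def can_estimate_error(n_peak_pairs, min_peak_pairs=100):
    """True if enough peak pairs were found to attempt estimating the error distribution."""
    return n_peak_pairs >= min_peak_pairs
```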

Input and output

--fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = <empty>.

--output-dir <string> – The name of the directory where output files will be created. Default = crux-output.
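Taken together, --output-dir and --fileroot determine where the output files land. The Python sketch below builds the expected paths for the three files listed in the Output section. It assumes the fileroot is joined to the file name with a period (for example run1.comet.target.txt); this page says only that the fileroot is added as a prefix, so the separator is an assumption, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath

OUTPUT_FILES = ("comet.target.txt", "comet.params.txt", "comet.log.txt")


def expected_outputs(output_dir="crux-output", fileroot=""):
    """Return the expected output paths for the given directory and optional fileroot."""
    # Assumption: the fileroot and the file name are separated by a period.
    prefix = f"{fileroot}." if fileroot else ""
    return [str(PurePosixPath(output_dir) / f"{prefix}{name}") for name in OUTPUT_FILES]
```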