DESCRIPTION

Multiple SDFile names are separated by spaces. The valid file extensions are .sdf and .sd. All other file names are ignored. All the SD files in the current directory can be specified either by *.sdf or by the current directory name.

The current release of MayaChemTools supports generation of atom types fingerprints corresponding to the following -a, --AtomIdentifierType values:

Based on the values specified for -a, --AtomIdentifierType along with other specified parameters such as --AtomicInvariantsToUse and --FunctionalClassesToUse, initial atom types are assigned to all non-hydrogen atoms or to all atoms in a molecule.

Using the assigned atom types and the specified -m, --Mode, one of the following types of fingerprints is generated:

The supported aromaticity model names, along with model-specific control parameters, are defined in AromaticityModelsData.csv, which is distributed with the current release and is available under the lib/data directory. The Molecule.pm module retrieves data from this file during class instantiation and makes it available to the DetectAromaticity method for detecting aromaticity corresponding to a specific model.

FixedSize value is not supported for AtomicInvariantsAtomTypes value of -a, --AtomIdentifierType option.

ArbitrarySize corresponds to only atom types detected in molecule; FixedSize corresponds to fixed number of previously defined atom types for specified -a, --AtomIdentifierType.
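The difference between the two set sizes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MayaChemTools implementation: the function names are hypothetical, and the short list of atom type labels is purely illustrative rather than the set used for any particular -a, --AtomIdentifierType value.

```python
from collections import Counter

def arbitrary_size_counts(atom_types):
    """Count only the atom types actually detected in the molecule."""
    counts = Counter(atom_types)
    # Sort labels so the output order is reproducible.
    return {label: counts[label] for label in sorted(counts)}

def fixed_size_counts(atom_types, predefined_types):
    """Count against a fixed, previously defined list of atom types."""
    counts = Counter(atom_types)
    # Every predefined type gets a slot, even when its count is zero.
    return [counts[label] for label in predefined_types]

# Illustrative labels only (assumption, not taken from the tool's data files).
PREDEFINED = ["C.ar", "C.3", "N.3", "O.2", "O.3"]
molecule = ["C.ar", "C.ar", "C.3", "O.3"]

arbitrary = arbitrary_size_counts(molecule)
fixed = fixed_size_counts(molecule, PREDEFINED)
```

With ArbitrarySize the result has one entry per detected atom type, so its length varies from molecule to molecule; with FixedSize the length always equals the number of predefined atom types.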

--BitsOrder Ascending | Descending

Bits order to use during generation of fingerprints bit-vector string for AtomTypesBits value of -m, --mode option. Possible values: Ascending, Descending. Default: Ascending.

Ascending bit order corresponds to the first bit in each byte being the lowest bit, as opposed to the highest bit.

Internally, bits are stored in Ascending order using Perl vec function. Regardless of machine order, big-endian or little-endian, vec function always considers first string byte as the lowest byte and first bit within each byte as the lowest bit.
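The storage convention described above can be mimicked in Python to show how the two bit orders differ when a bit vector is written out as a string. This is a sketch only: it is not the Perl code used by the tool, and the helper names set_bit and bit_string are hypothetical.

```python
def set_bit(buf, index):
    """Set bit `index`, treating bit 0 of byte 0 as the lowest bit."""
    buf[index // 8] |= 1 << (index % 8)

def bit_string(buf, order="Ascending"):
    """Render each byte as eight characters in the requested bit order."""
    chunks = []
    for byte in buf:
        bits = format(byte, "08b")  # highest bit first (Descending)
        if order == "Ascending":
            bits = bits[::-1]       # lowest bit first
        chunks.append(bits)
    return "".join(chunks)

# Two bytes with bits 0 and 9 set.
buf = bytearray(2)
set_bit(buf, 0)
set_bit(buf, 9)

ascending = bit_string(buf, "Ascending")
descending = bit_string(buf, "Descending")
```

Here ascending is "1000000001000000" and descending is "0000000100000010": the byte order is the same in both strings, and only the order of the bits within each byte is reversed.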

--CompoundID DataFieldName or LabelPrefixString

This value is --CompoundIDMode specific and indicates how compound ID is generated.

For DataField value of --CompoundIDMode option, it corresponds to datafield label name whose value is used as compound ID; otherwise, it's a prefix string used for generating compound IDs like LabelPrefixString<Number>. Default value, Cmpd, generates compound IDs which look like Cmpd<Number>.

Examples for DataField value of --CompoundIDMode:

MolID
ExtReg

Examples for LabelPrefix or MolNameOrLabelPrefix value of --CompoundIDMode:

Specify how to generate compound IDs and write them to FP or CSV/TSV text file(s) along with generated fingerprints for FP | text | all values of --output option: use an SDFile(s) data field value; use the molname line from SDFile(s); generate a sequential ID with a specific prefix; use a combination of both MolName and LabelPrefix, with LabelPrefix values used for empty molname lines.

For MolNameOrLabelPrefix value of --CompoundIDMode, the molname line in SDFile(s) takes precedence over sequential compound IDs generated using LabelPrefix, and only empty molname values are replaced with sequential compound IDs.

This is only used for CompoundID value of --DataFieldsMode option.
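The compound ID rules described above can be summarized in a short Python sketch. It is an illustration under assumptions rather than the tool's own code: the function name make_compound_id is hypothetical, the mode name MolName is assumed for the "use molname line" case, the sequence number is assumed to be the 1-based position of the record, and a missing data field is assumed to yield an empty ID.

```python
def make_compound_id(mode, number, mol_name="", data_fields=None,
                     compound_id="Cmpd"):
    """Return a compound ID for one SD record.

    `compound_id` plays the role of the --CompoundID value: a data field
    label for DataField mode, otherwise a label prefix.
    """
    data_fields = data_fields or {}
    label = f"{compound_id}{number}"      # e.g. Cmpd1, Cmpd2, ...
    if mode == "DataField":
        # Assumption: a missing data field gives an empty ID.
        return data_fields.get(compound_id, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return label
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names fall back to the label.
        return mol_name.strip() or label
    raise ValueError(f"Unknown mode: {mode}")
```

For example, make_compound_id("MolNameOrLabelPrefix", 2, mol_name="") falls back to the sequential label for the second record, while a record with a non-empty molname line keeps that name as its ID.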

--DataFields "FieldLabel1,FieldLabel2,..."

Comma delimited list of SDFile(s) data fields to extract and write to CSV/TSV text file(s) along with generated fingerprints for text | all values of --output option.

This is only used for Specify value of --DataFieldsMode option.

Examples:

Extreg
MolID,CompoundName

-d, --DataFieldsMode All | Common | Specify | CompoundID

Specify how data fields in SDFile(s) are transferred to output CSV/TSV text file(s) along with generated fingerprints for text | all values of --output option: transfer all SD data fields; transfer SD data fields common to all compounds; extract specified data fields; generate a compound ID using molname line, a compound prefix, or a combination of both. Possible values: All | Common | Specify | CompoundID. Default value: CompoundID.
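The four modes can be illustrated with a small Python sketch that decides which data field labels are carried into the text output. This is a hedged illustration, not the tool's implementation: the function name select_data_fields is hypothetical, each compound is modeled as a plain dictionary of data field labels and values, and treating All as the union of labels across compounds is an assumption.

```python
def select_data_fields(records, mode, specified=None):
    """Pick the SD data field labels to carry into the text output.

    `records` is a list of dicts, one per compound, mapping data field
    labels to values.
    """
    if mode == "All":
        # Union of labels, kept in first-seen order (assumption).
        labels = []
        for record in records:
            for label in record:
                if label not in labels:
                    labels.append(label)
        return labels
    if mode == "Common":
        # Labels present in every compound, in the first record's order.
        if not records:
            return []
        return [label for label in records[0]
                if all(label in record for record in records)]
    if mode == "Specify":
        return list(specified or [])
    if mode == "CompoundID":
        # No data fields are copied; a generated compound ID is used instead.
        return []
    raise ValueError(f"Unknown mode: {mode}")

records = [
    {"MolID": "1", "Extreg": "E1", "LogP": "2.1"},
    {"MolID": "2", "Extreg": "E2"},
]
all_labels = select_data_fields(records, "All")
common_labels = select_data_fields(records, "Common")
```

In this example the LogP field appears only in the first compound, so it is included for All but dropped for Common.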

-h, --help

Print this help message.

-i, --IgnoreHydrogens Yes | No

For No value of -i, --IgnoreHydrogens, any explicit hydrogens are also used for generation of atom types fingerprints; implicit hydrogens are still ignored.

-k, --KeepLargestComponent Yes | No

Generate fingerprints for only the largest component in molecule. Possible values: Yes or No. Default value: Yes.

For molecules containing multiple connected components, fingerprints can be generated in two different ways: use all connected components or just the largest connected component. By default, all atoms except for the largest connected component are deleted before generation of fingerprints.
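The effect of keeping only the largest connected component can be sketched in Python on a simple atom and bond list. This is an illustrative sketch, not the MayaChemTools code: the function name largest_component is hypothetical, component size is assumed to be measured by atom count, and on a tie the first component encountered is kept.

```python
def largest_component(atoms, bonds):
    """Return the set of atoms in the largest connected component."""
    neighbors = {atom: set() for atom in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    best = set()
    for start in atoms:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component = set()
        stack = [start]
        while stack:
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(neighbors[atom] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Atoms 1-3 form one fragment; atoms 4-5 form a smaller one.
atoms = [1, 2, 3, 4, 5]
bonds = [(1, 2), (2, 3), (4, 5)]
kept = largest_component(atoms, bonds)
```

With the default Yes value, only the atoms in kept would contribute to the fingerprints; with No, all five atoms would be used.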

For AtomTypesCount atom types fingerprints, two types of atom types set size can be specified using --AtomTypesSetToUse option: ArbitrarySize or FixedSize. ArbitrarySize corresponds to only atom types detected in molecule; FixedSize corresponds to fixed number of atom types previously defined.

For AtomTypesBits atom types fingerprints, only FixedSize is allowed.

Combination of -m, --Mode and --AtomTypesSetToUse along with -a, --AtomIdentifierType allows generation of the following different atom types fingerprints:

-o, --overwrite

Overwrite existing files.

-q, --quote Yes | No

-r, --root RootName

New file name is generated using the root: <Root>.<Ext>. Default for new file names: <SDFileName><AtomTypesFP>.<Ext>. The file type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values are used for SD, FP, comma/semicolon, and tab delimited text files, respectively. This option is ignored for multiple input files.
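The naming rule can be sketched in Python as follows. This is a minimal illustration under assumptions, not the tool's own logic: the function name output_file_name is hypothetical, <SDFileName> is assumed to mean the input file name without its directory and extension, and <AtomTypesFP> is assumed to be the literal suffix AtomTypesFP.

```python
import os

def output_file_name(sd_file, ext, root=None, num_input_files=1):
    """Build an output file name following the <Root>.<Ext> pattern."""
    # The root is honored only when a single input file is given.
    if root and num_input_files == 1:
        return f"{root}.{ext}"
    # Default: input name without its extension plus the AtomTypesFP suffix.
    base = os.path.splitext(os.path.basename(sd_file))[0]
    return f"{base}AtomTypesFP.{ext}"
```

Under these assumptions, output_file_name("Sample.sdf", "csv", root="SampleATFP") gives SampleATFP.csv, while omitting the root gives SampleAtomTypesFP.csv.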

To generate MMFF94 atom types count fingerprints of arbitrary size in vector string format and create a SampleATFP.csv file containing compound ID using combination of molecule name line and an explicit compound prefix along with fingerprints vector strings data, type:

COPYRIGHT

Copyright (C) 2019 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.