DESCRIPTION

Multiple SDFile names are separated by spaces. The valid file extensions are .sdf and .sd. All other file names are ignored. All the SD files in the current directory can be specified either by *.sdf or by the current directory name.
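
The extension rule above can be illustrated with a minimal Python sketch. It is not part of MayaChemTools; the helper name filter_sd_files is hypothetical, and the case-insensitive comparison is an assumption that the text does not state.

```python
import os

# Extensions the text lists as valid for SD files.
SD_EXTENSIONS = {".sdf", ".sd"}


def filter_sd_files(names):
    """Return only the names ending in .sdf or .sd (hypothetical helper).

    Assumption: the comparison ignores case; the documentation does not say.
    """
    kept = []
    for name in names:
        extension = os.path.splitext(name)[1].lower()
        if extension in SD_EXTENSIONS:
            kept.append(name)
    return kept


# Names with other extensions are dropped, as the text describes.
print(filter_sd_files(["Sample.sdf", "Notes.txt", "Other.sd"]))
```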

The current release of MayaChemTools supports generation of topological atom triplets fingerprints corresponding to the following -a, --AtomIdentifierType values:

Based on the values specified for -a, --AtomIdentifierType and --AtomicInvariantsToUse, initial atom types are assigned to all non-hydrogen atoms in a molecule. Using the distance matrix for the molecule and the initial atom types assigned to non-hydrogen atoms, all unique atom triplets within --MinDistance and --MaxDistance are identified and counted. An atom triplet identifier is generated for each unique atom triplet; the format of the atom triplet identifier is:

The supported aromaticity model names, along with model-specific control parameters, are defined in AromaticityModelsData.csv, which is distributed with the current release and is available under the lib/data directory. The Molecule.pm module retrieves data from this file during class instantiation and makes it available to the DetectAromaticity method for detecting aromaticity corresponding to a specific model.

--CompoundID DataFieldName or LabelPrefixString

This value is --CompoundIDMode specific and indicates how the compound ID is generated.

For the DataField value of --CompoundIDMode, it is the data field label whose value is used as the compound ID; otherwise, it is a prefix string used to generate compound IDs of the form LabelPrefixString<Number>. The default value, Cmpd, generates compound IDs of the form Cmpd<Number>.

Examples for DataField value of --CompoundIDMode:

MolID
ExtReg

Examples for LabelPrefix or MolNameOrLabelPrefix value of --CompoundIDMode:

Specify how to generate compound IDs and write them to FP or CSV/TSV text file(s) along with generated fingerprints for FP | text | all values of --output option: use an SDFile(s) data field value; use the molname line from SDFile(s); generate a sequential ID with a specific prefix; use a combination of both MolName and LabelPrefix, with LabelPrefix values used for empty molname lines.

For the MolNameOrLabelPrefix value of --CompoundIDMode, the molname line in SDFile(s) takes precedence over sequential compound IDs generated using LabelPrefix, and only empty molname values are replaced with sequential compound IDs.

This is only used for CompoundID value of --DataFieldsMode option.
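
The compound ID modes described above can be sketched in Python as follows. This is an illustration only, not the MayaChemTools implementation: the function compound_id and its parameters are hypothetical, numbering from 1 is an assumption, and returning an empty string for a missing data field is also an assumption.

```python
def compound_id(mode, number, mol_name="", data_fields=None, spec="Cmpd"):
    """Build a compound ID for one record (hypothetical helper).

    mode        -- value of --CompoundIDMode
    number      -- sequential record number (assumed to start at 1)
    mol_name    -- molname line of the SD record
    data_fields -- dict mapping data field labels to values
    spec        -- value of --CompoundID: a field label or a label prefix
    """
    data_fields = data_fields or {}
    if mode == "DataField":
        # Assumption: a missing field yields an empty ID.
        return data_fields.get(spec, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return f"{spec}{number}"
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names get a sequential ID.
        return mol_name if mol_name.strip() else f"{spec}{number}"
    raise ValueError(f"unknown compound ID mode: {mode}")


print(compound_id("LabelPrefix", 3))
print(compound_id("MolNameOrLabelPrefix", 4, mol_name=""))
print(compound_id("DataField", 5, data_fields={"MolID": "M-17"}, spec="MolID"))
```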

--DataFields "FieldLabel1,FieldLabel2,..."

Comma delimited list of SDFile(s) data fields to extract and write to CSV/TSV text file(s) along with generated fingerprints for text | all values of --output option.

This is only used for Specify value of --DataFieldsMode option.

Examples:

Extreg
MolID,CompoundName

-d, --DataFieldsMode All | Common | Specify | CompoundID

Specify how data fields in SDFile(s) are transferred to output CSV/TSV text file(s) along with generated fingerprints for text | all values of --output option: transfer all SD data fields; transfer SD data fields common to all compounds; extract specified data fields; generate a compound ID using the molname line, a compound prefix, or a combination of both. Possible values: All | Common | Specify | CompoundID. Default value: CompoundID.
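
As a rough illustration of how the first three modes differ, the following Python sketch picks data field labels from a list of records. It is not MayaChemTools code: select_field_labels is a hypothetical name, each record is assumed to be a dict of data field labels to values, and the ordering of the returned labels is an assumption.

```python
def select_field_labels(mode, records, specified=()):
    """Return the data field labels to write for a --DataFieldsMode value.

    records   -- list of dicts, one per compound (assumed representation)
    specified -- labels given via --DataFields, used only for Specify
    """
    if mode == "All":
        # Union of labels, in order of first appearance.
        labels = []
        for record in records:
            for label in record:
                if label not in labels:
                    labels.append(label)
        return labels
    if mode == "Common":
        # Labels present in every record.
        if not records:
            return []
        common = set(records[0])
        for record in records[1:]:
            common &= set(record)
        return [label for label in records[0] if label in common]
    if mode == "Specify":
        return list(specified)
    if mode == "CompoundID":
        # No data fields are transferred; only a compound ID is written.
        return []
    raise ValueError(f"unknown data fields mode: {mode}")


records = [{"MolID": "1", "ExtReg": "A"}, {"MolID": "2", "CompoundName": "X"}]
print(select_field_labels("All", records))
print(select_field_labels("Common", records))
```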

--FingerprintsLabel text

-h, --help

Print this help message.

-k, --KeepLargestComponent Yes | No

Generate fingerprints for only the largest component in a molecule. Possible values: Yes or No. Default value: Yes.

For molecules containing multiple connected components, fingerprints can be generated in two different ways: use all connected components or just the largest connected component. By default, all atoms except for the largest connected component are deleted before generation of fingerprints.
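
Selecting the largest connected component can be sketched with a breadth-first search over the bond graph. The Python below is an illustration, not the MayaChemTools implementation: largest_component is a hypothetical name, the molecule is assumed to be an adjacency dict of atom IDs, and "largest" is assumed to mean the component with the most atoms.

```python
from collections import deque


def largest_component(adjacency):
    """Return the set of atom IDs in the largest connected component.

    adjacency -- dict mapping each atom ID to a list of bonded atom IDs
    Assumption: size is measured by atom count; ties keep the first found.
    """
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Atoms 1-2-3 form one fragment, atoms 4-5 another (for example a counterion).
bonds = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(sorted(largest_component(bonds)))
```

Atoms outside the returned set would be the ones deleted before fingerprint generation when the option is set to Yes.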

-o, --overwrite

Overwrite existing files.

-q, --quote Yes | No

-r, --root RootName

New file name is generated using the root: <Root>.<Ext>. Default for new file names: <SDFileName><TopologicalAtomTripletsFP>.<Ext>. The file type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values are used for SD, FP, comma/semicolon, and tab delimited text files, respectively. This option is ignored for multiple input files.
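
The naming rule above can be sketched in Python as follows. This is a hedged illustration rather than the script's actual logic: output_name and its parameters are hypothetical, and reading <TopologicalAtomTripletsFP> as a literal suffix appended to the input file's base name is an assumption.

```python
import os

# Extension per output file type, as listed in the text.
EXTENSIONS = {"SD": "sdf", "FP": "fpf", "CSV": "csv", "TSV": "tsv"}


def output_name(sd_file, file_type, root=None, input_count=1):
    """Return an output file name for one input SD file (hypothetical helper).

    root        -- value of -r, --root, if given
    input_count -- number of input files; the root is ignored when > 1
    """
    if root and input_count == 1:
        base = root
    else:
        # Assumption: the default appends a literal suffix to the base name.
        stem = os.path.splitext(os.path.basename(sd_file))[0]
        base = stem + "TopologicalAtomTripletsFP"
    return f"{base}.{EXTENSIONS[file_type]}"


print(output_name("Sample.sdf", "CSV"))
print(output_name("Sample.sdf", "CSV", root="SampleTATFP"))
```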

The triangle distance inequality test requires that, for each pair of atoms in an atom triplet, the distance or binned distance between them is less than the sum of the distances or binned distances of the other two atom pairs and greater than the difference of those two distances.
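
The test can be written out as a short Python check. This is a sketch under the assumption that both comparisons are strict, as the wording above suggests; the function name passes_triangle_inequality is hypothetical and the code is not taken from MayaChemTools.

```python
def passes_triangle_inequality(dxy, dxz, dyz):
    """Check the triangle inequality for the three distances of a triplet.

    Each distance must be strictly less than the sum of the other two and
    strictly greater than their absolute difference (assumed strictness).
    """
    sides = (dxy, dxz, dyz)
    for i in range(3):
        a = sides[i]
        b = sides[(i + 1) % 3]
        c = sides[(i + 2) % 3]
        if not (abs(b - c) < a < b + c):
            return False
    return True


print(passes_triangle_inequality(2, 2, 2))
print(passes_triangle_inequality(1, 1, 2))
```

Under this strict reading, three atoms along a linear path, with distances 1, 1, and 2, do not pass the test, because 2 is not less than 1 + 1.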

To generate topological atom triplets fingerprints corresponding to bond distances from 1 through 10, using atomic invariants atom types in IDsAndValuesString format, and create a SampleTATFP.csv file containing compound IDs built from a combination of the molecule name line and an explicit compound prefix, along with fingerprints vector strings data, type:

COPYRIGHT

Copyright (C) 2019 Manish Sud. All rights reserved.

This file is part of MayaChemTools.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.