DESCRIPTION

This module converts single nucleotide polymorphism (SNP) genotype data to parental allele designations. The output is useful for creating files suitable for mapping, for identifying and characterizing crossovers, and for quality control.

SUBROUTINES/METHODS

BUILD

Since the integrity of the data in the manifest file is absolutely vital,
building an object fails if there are duplicate sample ids in the
manifest file.
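
The duplicate check described above can be sketched as follows. This is only an illustration in Python, not the module's own Perl code; the function name duplicate_sample_ids is hypothetical, and it assumes the sample id is the first tab-delimited field of every line after the header.

```python
from collections import Counter


def duplicate_sample_ids(manifest_lines):
    """Return a sorted list of sample ids that occur more than once.

    Assumption: the first line is a header and the sample id is the
    first tab-delimited field of each remaining line.
    """
    counts = Counter()
    for line in manifest_lines[1:]:  # skip the header line
        sample_id = line.rstrip("\r\n").split("\t")[0].strip()
        if sample_id:  # ignore blank lines
            counts[sample_id] += 1
    return sorted(sid for sid, n in counts.items() if n > 1)
```

A constructor following this approach would refuse to build the object whenever the returned list is non-empty.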

ATTRIBUTES

manifest_filename

Name of the file containing information for each sample id.
Required in the constructor.
The first line contains headers and the remaining lines contain
tab-delimited fields in the following order:
  sample id or "Institute Sample Label" (e.g. "WG0096796-DNAA05")
  sample name or "Sample name" (e.g. "B73xB97")
  group name or "Group" (e.g. "NAM F1")
  parentA or "Mother" (e.g. "WG0096795-DNAA01")
  parentB or "Father" (e.g. "WG0096796-DNAF01")
  replicate of or "Replicate(s)" (id of the sample that this one
    replicates, e.g. "WG0096796-DNAA05")
  AxB F1 or "F1 of parentA and parentB" (e.g. "WG0096795-DNAA02")
The last four fields can be blank if they are not applicable. However,
leaving them blank when they are applicable will prevent the program
from analyzing the data properly.
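
As a rough illustration of the manifest layout, the sketch below reads manifest lines into one record per sample. It is written in Python for brevity and is not the module's implementation; the function name parse_manifest and the record keys (sample_id, parent_a, and so on) are hypothetical names chosen here. Blank optional fields are stored as None.

```python
# Hypothetical key names for the seven manifest columns, in file order.
MANIFEST_FIELDS = (
    "sample_id", "sample_name", "group",
    "parent_a", "parent_b", "replicate_of", "f1",
)


def parse_manifest(lines):
    """Return {sample_id: record} from manifest lines (header first)."""
    records = {}
    for line in lines[1:]:  # skip the header line
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        values = [v.strip() for v in line.split("\t")]
        # Pad short lines so that omitted trailing fields count as blank.
        values += [""] * (len(MANIFEST_FIELDS) - len(values))
        record = {k: (v or None) for k, v in zip(MANIFEST_FIELDS, values)}
        records[record["sample_id"]] = record
    return records
```

Note that this sketch silently keeps the last record when a sample id is repeated, so a duplicate check would have to run separately.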

data_filename

Name of the tab-delimited file containing the data to be processed.
Required in the constructor.
A line containing the text '[Data]' indicates that all remaining lines
are data.
The line after it contains the column headers, which are the sample ids.
Samples whose ids are missing from the manifest file will not be processed.
Each subsequent line contains the name of a SNP in the first field and
genotype data in the remaining fields, in the format
SNP_name{tab}AA{tab}GG{tab}.
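
The data file layout can be illustrated with the following sketch, written in Python for brevity rather than taken from the module. The function name parse_data is hypothetical. It assumes that the first field of the header line labels the SNP-name column (so sample ids start in the second field) and that trailing tabs can be ignored.

```python
def parse_data(lines, known_sample_ids):
    """Return {snp_name: {sample_id: genotype}} for known samples only."""
    it = iter(lines)
    for line in it:
        if "[Data]" in line:  # everything after this line is data
            break
    else:
        raise ValueError("no '[Data]' line found")

    header = next(it, None)
    if header is None:
        raise ValueError("no header line after '[Data]'")
    # Assumption: field 0 of the header labels the SNP-name column.
    sample_ids = [h.strip() for h in header.rstrip("\r\n").split("\t")[1:]]

    genotypes = {}
    for line in it:
        fields = line.rstrip("\r\n").split("\t")
        snp = fields[0].strip()
        if not snp:
            continue
        genotypes[snp] = {
            sid: call.strip()
            for sid, call in zip(sample_ids, fields[1:])
            if sid in known_sample_ids  # skip samples not in the manifest
        }
    return genotypes
```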

OUTPUT FILES

Upon object construction, two files are produced: one that summarizes the
input and another that describes the genotypes of samples in terms of
their "parents". For example, a sample with a genotype of "CG" whose
'parentA' has a genotype of "CC" and whose 'parentB' has a genotype of
"GG" would have a heterozygous genotype, labeled as 'H'.
Here are the possible allele designations that result:
Allele designations for informative genotypes:
  A  = parentA genotype
  B  = parentB genotype
  H  = heterozygous genotype
Allele designations for noninformative genotypes:
  ~  = nonpolymorphic parents (i.e. both parents have the same genotype)
  -  = missing data
  -- = missing data for at least one parent
  %  = polymorphic parent
Error codes:
  #  = conflict of nonpolymorphic expectation, meaning both parents
       have the same genotype, but the sample has a different
       genotype. For example, parentA and parentB both have the
       genotype 'CC', but the sample has a genotype of 'TT'.
  !  = nonparental genotype, meaning each parent has a different
       genotype, but the sample has at least one allele not seen
       in either parent. For example, getting 'AG' for the
       offspring when the parents have 'GG' and 'TT'.
       (This should never be seen when the data were obtained
       from a biallelic assay.)
  !! = genotype of the F1 for parentA x parentB is incongruent with
       the genotype for parentA
See the bundled tests for examples.
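
The designations above can be approximated with the sketch below. It is an illustration in Python, not the module's actual logic, and it rests on several assumptions that the text does not confirm: missing calls are written as '--' in the input, a missing parent takes precedence over a missing sample, and '%' is read as "at least one parent is heterozygous". The function name designate is hypothetical, and the '!!' code is not covered because it requires the F1 genotype.

```python
MISSING = "--"  # assumption: how a missing call appears in the input


def designate(sample, parent_a, parent_b, missing=MISSING):
    """Return an allele designation for one sample at one SNP."""
    if parent_a == missing or parent_b == missing:
        return "--"  # missing data for at least one parent
    if sample == missing:
        return "-"   # missing data for the sample itself

    # Sort the characters so that 'GC' and 'CG' compare as equal.
    s, a, b = ("".join(sorted(g)) for g in (sample, parent_a, parent_b))

    # Assumption: "polymorphic parent" means a heterozygous parent.
    if len(set(a)) > 1 or len(set(b)) > 1:
        return "%"
    if a == b:
        return "~" if s == a else "#"
    if s == a:
        return "A"
    if s == b:
        return "B"
    if set(s) == set(a) | set(b):
        return "H"
    return "!"  # the sample carries an allele seen in neither parent
```

With these assumptions, the examples in the text come out as described: designate("CG", "CC", "GG") gives 'H', designate("TT", "CC", "CC") gives '#', and designate("AG", "GG", "TT") gives '!'.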

TODO

Output report detailing which samples have been processed and in what way.
Also give descendant and ancestor relationships.
Document ability to process files using F1 and parentA info (i.e. in the
absence of parentB info).
Add simple means of adding map info so that distances and chromosomes are
output along with the marker names.
Give crossover info?
Give introgressions/regions attributable to specific ancestor(s).
Use benchmarking to find out which (if any) to memoize:
_nonredundant_chars
_trim
_is_comprised_from
_sorted_characters
_sort_and_join
_chars_from
_sorted_first_two_char
Test bad file names

DIAGNOSTICS

TODO

CONFIGURATION AND ENVIRONMENT

TODO

DEPENDENCIES

TODO

INCOMPATIBILITIES

TODO

BUGS

Please report any you find. None have been reported as of the current release.

LIMITATIONS

This is ALPHA code. Use at your own risk. I plan to make some major changes to it.
