Glossary

Ancestry Informative Marker (AIM): AIMs are the subset of genetic markers that differ in allele frequencies across different populations of the world. Most variations are shared among populations, so for most loci the most common allele is the same in each population. AIMs may be used to categorize individuals into populations sharing similar markers and perhaps phenotypes (such as self-declared ethnicity).

Allele: Traditional definition: alternate forms of a gene, composed of one or more SNPs. More loosely: a SNP. For example, at a given position along a chromosome, most people might have the DNA base "A". A few might have an alternative sequence. Each defined type is an allele.

Autosomal Dominant Disease: An inherited disease based on mutations in the autosomal chromosomes (those other than the X and Y), where only one non-working allele leads to the disease. Each child of a parent carrying such an allele has a 50% chance of getting that allele (and the disease). Such disease may only show symptoms in later life; an example of an autosomal dominant condition is Huntington disease.

Autosomal Recessive Disease: An inherited disease based on mutations in the autosomal chromosomes (those other than the X and Y), where two non-working alleles lead to the disease. Each child of two carrier parents has a 25% chance of getting both non-working alleles (and the disease). An example of an autosomal recessive disease is cystic fibrosis.
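
The 50% and 25% figures in the dominant and recessive entries follow from listing which allele each parent can pass on. The Python sketch below is a minimal illustration, assuming a single locus, equal transmission of either allele, and the hypothetical labels "A" (working allele) and "a" (non-working allele); the function name is invented for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotype_probabilities(parent1, parent2):
    """Enumerate the Punnett square for one locus.

    Each parent is a two-character string of alleles, e.g. "Aa".
    Returns a dict mapping each offspring genotype to its probability.
    """
    # Sort the two alleles so that "aA" and "Aa" count as the same genotype.
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Recessive case: both parents are carriers.
recessive = offspring_genotype_probabilities("Aa", "Aa")
# Dominant case: one parent carries a single non-working allele.
dominant = offspring_genotype_probabilities("Aa", "AA")

print(recessive["aa"])  # 1/4: chance of inheriting two non-working alleles
print(dominant["Aa"])   # 1/2: chance of inheriting the one non-working allele
```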

Biostatistics: The statistics involved in medical and epidemiological studies, which now includes many genomic studies. Medpage has an excellent Guide to Biostatistics.

Carrier: An individual who has a single copy of a disease-causing variant but who typically doesn't show any signs of that disease. Generally, for classic Mendelian diseases with recessive inheritance, if a mother and father are both carriers for a given disease, each child has a 1 in 4 chance of inheriting both disease-associated alleles (one from each parent) and having the disease.

CEU: A grouping of DNA samples intended to generally represent Europeans for the purposes of establishing how common a given SNP is. The CEU samples are from CEPH Utah residents with ancestry from northern and western Europe. See also the population diversity help page.

Chromosome: The physical units of heredity; long linear strands of DNA. Humans normally have 46 chromosomes (23 inherited from Dad, 23 from Mom).

CI: Confidence Interval. The range of values within which the true value is expected to lie at a particular level of confidence. If not stated, the level of confidence is usually 95%.
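
As a rough illustration of a 95% CI, the Python sketch below computes an interval for an observed proportion, such as an allele frequency in a sample. It assumes the simple normal-approximation (Wald) interval, which is only one of several ways to build such an interval; the function name and the counts are hypothetical.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to a 95% confidence level.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical sample: the allele was seen on 30 of 100 chromosomes.
low, high = wald_interval(30, 100)
print(f"observed 0.30, 95% CI roughly {low:.3f} to {high:.3f}")
```

Larger samples shrink the interval: the same 30% frequency observed in 10,000 chromosomes gives a much narrower range.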

Clinical Significance: This attribute of a SNP is generally imported from ClinVar, either assigned by the submitter or by dbSNP or OMIM. The most common value is probable-pathogenic; the other possibilities are pathogenic, non-pathogenic, probable-non-pathogenic, drug-response, histocompatibility, untested and other. Further definition is provided in PMC3075918, especially Table 3.

CNV: Copy Number Variant. Relatively large parts of a chromosome present in more copies (if duplicated), or fewer copies (if deleted), than normal. See Wikipedia for more details.

Compound heterozygote: When two different SNPs are carried by the paternal and maternal chromosomes for one gene; in other words, the maternally-inherited gene carries a variant SNP at one location, and the paternally-inherited gene carries a variant SNP at a different location.

De novo mutation: A DNA variation that is either germline or somatic but is not readily found in either parent. Germline de novo mutations can be mutations arising in a parental gamete (sperm or egg), which lead post-fertilization to being present in all cells of the offspring. Somatic de novo mutations arise in a somatic cell, and often are therefore found in only a percentage of the cells of that type; they are not passed on to offspring.

Diplotype: What genotype is to allele, diplotype is to haplotype; a specific combination of two haplotypes.

Familial risk: The risk for individuals whose relatives have a disease compared to the risk for individuals whose relatives don't have the disease. It can vary depending on which relative(s) are specified. Occasionally also called 'recurrence risk'. Familial risk is typically higher for more common diseases (like breast cancer) compared to rare diseases.

Gene: An area along a chromosome typically encompassing the information necessary to encode one protein.

Genomics: The study of all the genetic material in a species taken as a whole (well, to the extent possible anyway).

Genome: All of the genetic material in a species. The human genome is approximately 3,300,000,000 base pairs in length and is distributed amongst 23 types of chromosomes (chromosomes 1 through 22, in order of size, plus the X and Y chromosomes that determine sex).

Genoset: A defined combination of alleles from 2 or more distinct SNPs.

Genotype: The two alleles inherited at a given SNP position, one inherited from Dad, one inherited from Mom. Example: rs16260(A;C) is how we indicate someone with an (A;C) genotype at SNP rs16260.

Genotype relative risk: The odds ratio of one specified genotype compared to another.

GWAS: Genome-Wide Association Study. A study of the markers (usually SNPs) across the entire genome to see which ones are statistically more or less common in one group of people (often patients with a specific disease) compared to another group of people (typically people unaffected by that disease). The SNPs found in GWAS are usually risk factors with relatively low penetrance and are usually not causative mutations. See Wikipedia for more information.

Haploinsufficiency: The term for when having only one functioning copy of a gene isn't enough to avoid a disease or otherwise abnormal phenotype, presumably due to not having enough of the protein being made by the one working copy of that gene. Haploinsufficiency can be caused by mutations or deletions, and can be autosomal or X-linked. The opposite of haploinsufficiency is haplosufficiency, which is when having only one normal copy of a gene is enough to lead to a normal phenotype.

Haplotype: A specific combination of SNPs all occurring on the same chromosome (i.e. all occurring on the chromosome inherited from Dad, or, inherited from Mom). Human autosomal chromosomes normally come in pairs, and the combination in one individual of these two haplotypes is referred to as a diplotype.

Haplotype block: SNPs that are close enough to one another to be inherited together, ultimately indicating degrees of common ancestry. Haplotype blocks effectively define islands of lower meiotic recombination.

Hemizygous: When used in a phrase like, "the patient was hemizygous for the mutation", this indicates that the person was carrying only one allele (in total) for this SNP, usually because the corresponding area of the other (autosomal) chromosome has been deleted. Hemizygosity also can refer to X chromosome alleles in males.

Heritability: The percentage of a trait estimated to be due to the variations you've inherited, as opposed to the contribution of non-genetic influences such as your environment or diet. For example, the heritability of bipolar disorder is estimated to be 70%, which means that genetic variations account for 70% of the overall odds of developing the disorder and non-genetic factors account for 30%.

Heterozygote: As opposed to a homozygote for a given SNP or allele, a heterozygote is a person who inherited different forms from Mom and Dad. Example: at the SNP position known as rs16260, a person who inherits a (C) and an (A) is a rs16260(A;C) heterozygote.

Homozygote: As opposed to a heterozygote for a given SNP or allele, a homozygote is a person who inherits the same form from Mom and Dad. Example: at the SNP position known as rs16260, a person who inherits a (C) from both parents is a rs16260(C;C) homozygote.
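
The rs16260(A;C) notation used in the Genotype, Heterozygote and Homozygote entries can be split into its parts mechanically. The Python sketch below is an illustration only: the regular expression and both function names are assumptions made for this example and do not describe any official parser.

```python
import re

# Assumed pattern: an rs identifier followed by two alleles in "(X;Y)" form.
GENOTYPE_PATTERN = re.compile(r"^(rs\d+)\(([^;()]+);([^;()]+)\)$")


def parse_genotype(text):
    """Split a string such as "rs16260(A;C)" into (rsid, (allele1, allele2))."""
    match = GENOTYPE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised genotype string: {text!r}")
    rsid, allele1, allele2 = match.groups()
    return rsid, (allele1, allele2)


def zygosity(alleles):
    """Return "homozygote" if both alleles match, otherwise "heterozygote"."""
    return "homozygote" if alleles[0] == alleles[1] else "heterozygote"


for text in ("rs16260(A;C)", "rs16260(C;C)"):
    rsid, alleles = parse_genotype(text)
    print(rsid, alleles, zygosity(alleles))
```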

Imputation: A statistical process used to predict a person's genotype at an untested chromosomal location, based on the genotypes that were tested in that person plus a knowledge of population haplotypes. In general, imputation yields more accurate predictions for common variants in well-studied populations (e.g. Caucasians), and becomes less accurate in less characterized ethnicities and for increasingly rare variants. By definition, that also means imputation cannot predict a de novo mutation.

In/del: While many SNPs are swaps between the four DNA bases (A,C,G and T as abbreviated), extra DNA bases may be present (called "insertions"), or a few DNA bases may be missing ("deletions"). The term in/del refers to an insertion or deletion of this type.

Incidence: The number of newly diagnosed cases of a disease in a given population over a specified time period; for cancer, the annual incidence is usually specified as the number of (new) cases per 100,000 people.

Individual risk: The risk that you will contract a disease by a certain age, and one of the hardest numbers to estimate. A key factor in determining individual risk is the estimated penetrance of the disease susceptibility, which is just a way to say how much influence all the other factors (including all your other genes, and your lifestyle choices, etc) have compared to, say, one specific SNP, allele or genotype. Examples of genes with highly penetrant variations include the BRCA1 and CF genes.

Lifetime Risk: The risk of developing a given condition sometime during one's life. When calculated for a given population as a whole, it is known as average lifetime risk; in contrast to "your lifetime risk", which is calculated by applying relative risk or odds ratios based on your own genotype(s) to the average risk for your population. See also residual lifetime risk.
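
One simple way to apply an odds ratio to an average lifetime risk is to convert the risk to odds, multiply, and convert back. The Python sketch below shows only that arithmetic. It assumes the population average can stand in for the baseline risk of the comparison genotype, which is a simplification; the function name and numbers are hypothetical.

```python
def adjust_risk_by_odds_ratio(average_risk, odds_ratio):
    """Apply an odds ratio to a baseline probability.

    Steps: probability -> odds, multiply by the odds ratio, odds -> probability.
    """
    if not 0 <= average_risk < 1:
        raise ValueError("average_risk must be in the range [0, 1)")
    baseline_odds = average_risk / (1 - average_risk)
    adjusted_odds = baseline_odds * odds_ratio
    return adjusted_odds / (1 + adjusted_odds)


# Hypothetical inputs: 10% average lifetime risk, genotype with an OR of 2.0.
your_risk = adjust_risk_by_odds_ratio(0.10, 2.0)
print(f"{your_risk:.1%}")  # about 18%, not 20%
```

The result is lower than simply doubling the risk because odds and probabilities diverge as the baseline risk grows.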

Locus (pl. loci): The name for a physical position on the genome. Can either refer to a large region such as a complete gene or a very specific region, like a particular base pair position.

Odds ratio (OR): The ratio of the odds within one group compared to the odds within another group, for the association between an allele or genotype with a phenotype (usually a trait or a disease). Typically, carriers of a less common allele or genotype are being compared to people with two copies of the most common allele. An OR of 1.0 means that the DNA variant has no effect on the odds of having the disease, while values above 1.0 indicate a statistical association between that variant and having the disease. OR values below 1 indicate a lower association (risk). Variants with a modest Mendelian effect will have an OR of 3 or greater, and highly penetrant variants can have higher ORs.
Although commonly reported for SNP associations with diseases, be wary of the tendency of the OR to overemphasize the effect of a SNP. This is particularly true in cases where the SNP in question is quite rare in both groups, and/or the group sizes themselves are rather small. See also Odds ratio in Wikipedia.
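
To make the arithmetic concrete, the Python sketch below computes an OR from a 2x2 table of counts, together with an approximate 95% confidence interval on the log scale (often called the Woolf method). The counts and function names are hypothetical, chosen to show how small groups give a wide interval.

```python
import math


def odds_ratio(carrier_cases, carrier_controls,
               noncarrier_cases, noncarrier_controls):
    """Odds of disease among carriers divided by odds among non-carriers."""
    return (carrier_cases * noncarrier_controls) / (
        carrier_controls * noncarrier_cases
    )


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate CI for the OR, computed on the log-odds scale.

    a, b, c, d are the four cell counts in the same order as odds_ratio().
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# Hypothetical study: 100 cases and 100 controls.
# Carriers: 30 cases, 20 controls. Non-carriers: 70 cases, 80 controls.
estimate = odds_ratio(30, 20, 70, 80)
low, high = odds_ratio_ci(30, 20, 70, 80)
print(f"OR = {estimate:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

With these counts the interval spans 1.0, so the apparent association would not be considered statistically significant at the 95% level despite an OR of about 1.7.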

OMIM: The Online Mendelian Inheritance in Man database, an online catalog of human genes and genetic disorders.

P-value: According to EMBL-EBI, a p-value is the probability of an event or outcome in a statistical experiment. The p-value measures the probability that a difference between two experimental conditions happened by chance. The lower the p-value, the more likely it is that the difference between the two conditions is a true reflection of the biological process being studied rather than a random phenomenon.

Pathogenic: Capable of causing or leading to disease.

Penetrance: The degree to which you're likely to actually have a given trait or phenotype when you carry a given variation (e.g. SNP). Basically, a variation that causes an outcome with 99% certainty is said to be very highly penetrant (or, causative). Penetrance is considered high if over 80%, and moderate if between 20% and 80%. Many SNPs reported in SNPedia, such as those from GWAS, are (only) risk factors, and they have low penetrance (below 20%). See Wikipedia for an expanded definition.
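
The thresholds in this entry can be written as a small lookup. The Python sketch below simply restates them; the function name is hypothetical, and treating exactly 20% and exactly 80% as moderate is an assumption, since the entry does not say which side the boundaries fall on.

```python
def classify_penetrance(penetrance):
    """Map a penetrance between 0 and 1 to the labels used in this entry."""
    if not 0 <= penetrance <= 1:
        raise ValueError("penetrance must be between 0 and 1")
    if penetrance > 0.80:
        return "high"
    if penetrance >= 0.20:
        return "moderate"
    return "low"


for value in (0.99, 0.50, 0.05):
    print(f"{value:.0%}: {classify_penetrance(value)}")
```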

PGx: Pharmacogenomics. The study of the effect of a person's genome on their response to a drug.

PMID: PubMed Identifier. A standard identifier for scientific articles, as indexed by the US National Library of Medicine.

Polymorphism: A sequence difference at a particular position. The alternative forms are called alleles.

Population attributable fraction (or risk): The percentage of cases of a disease in a given population that are theoretically explained by a certain genotype (or cause, such as exposure to a mutagen). To put it another way, this is how many occurrences of a disease wouldn't occur if this genotype (or exposure) didn't exist or were removed from the population. It does not describe the actual proportion of cases observed with the risk factor, nor does it account for the effect of other risks or any interactions between risks. Example: by itself, even the most significant (SNP) genotype found so far for schizophrenia probably accounts for only 1-2% of schizophrenic diagnoses. Also known as the population attributable risk.
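
A common textbook way to estimate this quantity is Levin's formula, which needs only the frequency of the genotype (or exposure) in the population and its relative risk. The Python sketch below applies that formula; the entry itself does not say which formula was used for the schizophrenia example, and the input numbers here are hypothetical.

```python
def population_attributable_fraction(exposure_frequency, relative_risk):
    """Levin's formula: PAF = p * (RR - 1) / (1 + p * (RR - 1)).

    exposure_frequency (p) is the fraction of the population carrying the
    genotype; relative_risk (RR) compares carriers with non-carriers.
    """
    excess = exposure_frequency * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical genotype carried by 10% of people, with a relative risk of 1.2.
paf = population_attributable_fraction(0.10, 1.2)
print(f"{paf:.1%}")  # roughly 2% of cases
```

Even a fairly common genotype explains only a small share of cases when its relative risk is close to 1.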

Position: The simple answer is that this is the location of a SNP on the chromosome. DNA doesn't come in standard sizes though, so positions are locations in relation to a reference assembly, a particular genome sequence used as a standard in the research community. The current reference standard is called GRCh37 Genome Build 37.1, but most testing services still use the older Genome Build 36 reference, which has different positions. The exception is with mtDNA, which more commonly uses the rCRS position standard, which is also the one used in Genome Build 37.1. If that isn't confusing enough, some SNPs are repeated in multiple locations, and others occur at an undetermined position. Another source of confusion is that some sources start counting base pairs from zero (called 0 Origin) and some start from one (called 1 Origin). SNPedia is converting to using GRCh37 Genome Build 37.1 with 1 Origin, but many SNPs have not been updated yet. In the case of multiple positions, the first one is typically used.
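
The 0 Origin versus 1 Origin difference is a plain off-by-one shift, sketched below in Python. The function names and the toy sequence are hypothetical; converting between genome builds is a separate problem that this sketch does not address.

```python
def to_one_origin(zero_origin_position):
    """Convert a 0 Origin position (first base is 0) to 1 Origin."""
    if zero_origin_position < 0:
        raise ValueError("0 Origin positions start at 0")
    return zero_origin_position + 1


def to_zero_origin(one_origin_position):
    """Convert a 1 Origin position (first base is 1) to 0 Origin."""
    if one_origin_position < 1:
        raise ValueError("1 Origin positions start at 1")
    return one_origin_position - 1


# Toy sequence; Python string indexing is itself 0 Origin.
sequence = "GATTACA"
zero_based = 3
base = sequence[zero_based]
print(base, "is at 0 Origin position", zero_based,
      "and 1 Origin position", to_one_origin(zero_based))
```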

Prevalence: The proportion of people with a given condition at a given time, usually given as a percentage (e.g. 20%) or a ratio (1 in 10,000). For example, the prevalence of obesity among American adults in 2001 was estimated by the CDC to be 21%. See also the Wikipedia definition, which includes how prevalence is different from incidence.
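
The difference between prevalence and incidence is easiest to see with numbers. In the Python sketch below, incidence counts new cases over a period while prevalence counts everyone who has the condition at one point in time. The population and case counts are hypothetical, as are the function names.

```python
def incidence_per_100k(new_cases, population):
    """New cases over the period, scaled to a population of 100,000."""
    return new_cases * 100_000 / population


def prevalence(existing_cases, population):
    """Proportion of the population with the condition at a given time."""
    return existing_cases / population


def as_one_in_n(existing_cases, population):
    """Express prevalence as "1 in N", rounded to the nearest whole N."""
    return round(population / existing_cases)


# Hypothetical population of 2,000,000 people.
population = 2_000_000
annual_incidence = incidence_per_100k(300, population)  # 300 new cases this year
point_prevalence = prevalence(200, population)          # 200 people affected today
print(f"incidence: {annual_incidence:g} per 100,000 per year")
print(f"prevalence: {point_prevalence:.2%}, "
      f"or 1 in {as_one_in_n(200, population):,}")
```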

Prior Risk: The likely risk for a condition before test results are known. Also known as a priori risk, this usually is calculated based on some initial knowledge (like gender or ethnicity) and their average risks, and it can be modified by additional information (such as family history or age).

Relative Risk (RR): The ratio of the probability in one group compared to the probability in another group. Although reported less often in SNP studies than odds ratios, the RR is more intuitive (and generally lower). Note that at least one editor of a scientific journal [1] has indicated that when judging whether to publish a paper with a new finding they hope to see an RR of three or more. Note that many SNP publications cited here in SNPedia do not meet this criterion. For a more detailed explanation of RR, see Wikipedia's RR entry.
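
The Python sketch below computes an RR from hypothetical cohort counts and, for comparison, the odds ratio from the same counts. It illustrates the usual pattern that, for a variant that raises risk, the RR comes out lower than the odds ratio; the function names and numbers are invented for this example.

```python
def relative_risk(affected_carriers, total_carriers,
                  affected_noncarriers, total_noncarriers):
    """Probability of disease in carriers divided by that in non-carriers."""
    risk_carriers = affected_carriers / total_carriers
    risk_noncarriers = affected_noncarriers / total_noncarriers
    return risk_carriers / risk_noncarriers


def odds_ratio_from_cohort(affected_carriers, total_carriers,
                           affected_noncarriers, total_noncarriers):
    """Odds of disease in carriers divided by odds in non-carriers."""
    odds_carriers = affected_carriers / (total_carriers - affected_carriers)
    odds_noncarriers = affected_noncarriers / (
        total_noncarriers - affected_noncarriers
    )
    return odds_carriers / odds_noncarriers


# Hypothetical cohort: 30 of 100 carriers and 10 of 100 non-carriers affected.
rr = relative_risk(30, 100, 10, 100)
oddsr = odds_ratio_from_cohort(30, 100, 10, 100)
print(f"RR = {rr:.2f}, OR = {oddsr:.2f}")
```

The gap between the two measures narrows as the disease becomes rarer in both groups.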

Residual Lifetime Risk: The risk of developing a given condition taking into account a person's age; in other words, given your age, the odds of developing a condition during the remainder of your lifetime. This is generally calculated as a population average, without taking into account your genotypes. Epidemiological tables often estimate residual lifetime risk in 1 year intervals, and they may also be calculated separately based on year of birth.

Residual Risk: The risk for a given condition that remains even after a negative test result has been received. For example, after being tested and found negative for the most common cystic fibrosis mutations, there's still a chance that a person harbors a rarer (or even unique) mutation that wasn't tested; in Caucasian populations, this residual risk of actually being a carrier yet negative for the common CF mutations is about 1 in 200. Residual risk varies by mutation frequencies as well as by population.
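
Residual risk after a negative result can be estimated with Bayes' rule from the prior carrier frequency and the fraction of carriers the test detects. The Python sketch below assumes a test with no false positives and uses hypothetical inputs (a 1 in 25 prior and a 90% detection rate); these inputs are not taken from this entry and are not meant to reproduce its 1 in 200 figure.

```python
from fractions import Fraction


def residual_carrier_risk(prior, detection_rate):
    """Probability of being a carrier despite a negative test result.

    Assumes every non-carrier tests negative (no false positives).
    """
    missed_carrier = prior * (1 - detection_rate)  # carrier, but tests negative
    true_negative = 1 - prior                      # non-carrier, tests negative
    return missed_carrier / (missed_carrier + true_negative)


# Hypothetical inputs: 1 in 25 prior carrier risk, 90% of carriers detected.
risk = residual_carrier_risk(Fraction(1, 25), Fraction(9, 10))
print(risk)  # 1/241
```

A higher detection rate or a lower prior both reduce the residual risk, which is why it varies by mutation panel and by population.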

rs#: All this reflects is that the SNP in question has been officially registered and given an (rs) identifier by dbSNP, the largest public database of SNPs, maintained by the National Institutes of Health.

Single Nucleotide Polymorphism (SNP; pronounced snip): A precise position along a chromosome where the DNA of different people may vary. Generally two alternate alleles are found at a particular SNP. The 1000 Genomes Project reported 84.7 million SNPs in the human genome (Nature, October 2015). The SNPs that have medical or health consequences for you and your loved ones are the focus of SNPedia.

Uniparental Disomy: When a person receives two copies of a chromosome, or of part of a chromosome, from one parent and no copy from the other parent. This can result in long runs of homozygosity, which may lead to recessively inherited clinical disorders.

Wild-type: For a given SNP, allele, genotype or gene, the form that was either first discovered or is the most common is considered the reference against which all other forms are compared. This reference form is called the wild-type form.

X-Inactivation: The process by which one of the two X chromosomes in mammalian females is inactivated (silenced) in most cells. Which X chromosome is silenced is considered to be random, but once an X chromosome is inactivated, it will remain inactive throughout the lifetime of the cell and that cell's descendants. Most genes on the inactive X are not expressed (in contrast to the active X), but up to 25% of the genes on the inactive X may escape silencing at some time or in some tissues.

X-Linked Disease: An inherited disease based on mutations in the X chromosome, exhibiting itself in males if the (single) X chromosome carries the defective variation and in females if both X chromosomes carry a defect. Generally this means the mother is an asymptomatic carrier, with each son having a 50% chance of being affected. Examples of such X-linked recessive diseases include Duchenne muscular dystrophy and hemophilia.