Figure 1.

Pseudogene annotation flowchart. The flowchart describes the GENCODE pseudogene annotation procedure and the incorporation
of genomic variation data from the 1000 Genomes (1000G) project and functional genomics data from ENCODE. The
procedure integrates manual annotation by the HAVANA team with
two automated prediction pipelines: PseudoPipe and RetroFinder. The loci that are
annotated by both PseudoPipe and RetroFinder are collected in a subset labeled as
'2-way consensus', which is further intersected with the manually annotated HAVANA
pseudogenes. The intersection results in three subsets of pseudogenes. Level 1 pseudogenes
are loci that have been identified by all three methods (PseudoPipe, RetroFinder and
HAVANA). Level 2 pseudogenes are loci that have been discovered through manual curation
and were not found by either automated pipeline. Delta 2-way contains pseudogenes
that have been identified only by computational pipelines and were not validated by
manual annotation. As a quality control exercise to determine completeness of pseudogene
annotation in chromosomes that have been manually annotated, 2-way consensus pseudogenes
are analyzed by the HAVANA team to establish their validity and are included in the
manually annotated pseudogene set if appropriate. The final set of pseudogenes is
compared with functional genomics data from ENCODE and genomic variation data from
the 1000 Genomes project.
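
The subset logic in this caption can be sketched with plain set operations. This is only an illustrative sketch, not the GENCODE implementation: it assumes each locus is reduced to a hashable identifier, whereas a real pipeline would presumably match loci by genomic coordinate overlap. The function name `classify_pseudogenes`, the dictionary keys, and the locus names are hypothetical. Level 2 follows the caption's wording literally (found by HAVANA and by neither pipeline).

```python
def classify_pseudogenes(pseudopipe, retrofinder, havana):
    """Split loci into the subsets named in the caption (illustrative only).

    Each argument is an iterable of hashable locus identifiers. Matching
    loci by identifier is an assumption made for this sketch.
    """
    pseudopipe, retrofinder, havana = set(pseudopipe), set(retrofinder), set(havana)

    # Loci annotated by both automated pipelines.
    two_way = pseudopipe & retrofinder

    return {
        "two_way_consensus": two_way,
        # Identified by all three methods.
        "level_1": two_way & havana,
        # Manually curated and found by neither automated pipeline.
        "level_2": havana - (pseudopipe | retrofinder),
        # Found by both pipelines but not validated by manual annotation.
        "delta_2way": two_way - havana,
    }


# Hypothetical locus identifiers, used only to exercise the function.
result = classify_pseudogenes(
    pseudopipe={"locusA", "locusB", "locusC"},
    retrofinder={"locusA", "locusB", "locusD"},
    havana={"locusA", "locusE"},
)
```

In this toy input, `locusA` falls in Level 1, `locusE` in Level 2, and `locusB` in Delta 2-way. `locusC` and `locusD` were each predicted by a single pipeline, so they are not part of the 2-way consensus.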