The teleost Danio rerio (zebrafish), owing to its genomic, genetic and embryological properties, is an optimal animal model for studying human diseases at the cellular and molecular level. To approach this systematically, we have identified the zebrafish proteins that are the best candidates for generating zebrafish models of human genetic diseases and susceptibilities.

We gathered from the RefSeq protein database the human protein sequences associated with at least one disease or disease susceptibility, using information from the Online Mendelian Inheritance in Man (OMIM) database (morbidmap). These proteins were then used to identify their respective orthologues in zebrafish. The resulting information was compiled, characterised and linked with other zebrafish information, such as in situ and mutant allele data, to generate the ZF Human Disease Database (ZF-HDD). The zebrafish mutant allele data were used to validate this systematic approach by showing that previously characterised candidate zebrafish proteins/genes have already been used to model human genetic conditions.
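The mapping step described above can be illustrated with a minimal sketch. It assumes the disease-to-protein associations and the human-to-zebrafish orthologue assignments have already been computed. All identifiers and both function names are hypothetical placeholders, not real RefSeq or OMIM records, and the sketch does not reproduce the orthologue detection method itself.

```python
from collections import defaultdict

# Hypothetical toy input: (human condition, associated human protein).
disease_to_human = [
    ("condition_A", "HUMAN_P1"),
    ("condition_B", "HUMAN_P1"),
    ("condition_C", "HUMAN_P2"),
    ("condition_D", "HUMAN_P3"),
]

# Hypothetical orthologue table: human protein -> zebrafish orthologues.
# HUMAN_P3 is deliberately absent (no orthologue found).
human_to_zebrafish = {
    "HUMAN_P1": ["ZF_P1a", "ZF_P1b"],
    "HUMAN_P2": ["ZF_P2"],
}


def candidate_zebrafish_proteins(disease_to_human, human_to_zebrafish):
    """Collect, for each condition, the zebrafish orthologues of its human proteins."""
    candidates = defaultdict(set)
    for condition, human_protein in disease_to_human:
        # Conditions without any orthologue keep an empty set.
        candidates[condition].update(human_to_zebrafish.get(human_protein, []))
    return dict(candidates)


def coverage(candidates):
    """Return (conditions with at least one candidate, total conditions)."""
    covered = sum(1 for proteins in candidates.values() if proteins)
    return covered, len(candidates)


candidates = candidate_zebrafish_proteins(disease_to_human, human_to_zebrafish)
covered, total = coverage(candidates)
print(f"{covered} of {total} conditions have a candidate zebrafish protein")
```

In this toy input, one human protein maps to two zebrafish proteins, which is the kind of one-to-many relationship that makes the zebrafish protein count exceed the human one.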

Of the 2465 human conditions with an associated human sequence, 1608 (65%) have at least one candidate zebrafish protein. In total, 3580 zebrafish proteins are orthologous to one of the 1003 human proteins associated with a genetic condition. We demonstrate that the difference between the numbers of human and zebrafish proteins associated with genetic conditions is due to positive selection pressure to retain duplicated copies of these proteins in zebrafish.
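As a quick arithmetic check of the figures above, the following sketch recomputes the reported coverage and the mean number of zebrafish proteins per human protein. The variable names are illustrative, and the per-protein mean is a derived summary rather than a value stated in the text.

```python
# Counts as reported in the text.
conditions_with_sequence = 2465
conditions_with_candidate = 1608
zebrafish_proteins = 3580
human_proteins = 1003

# Share of conditions with at least one candidate zebrafish protein.
coverage_pct = 100 * conditions_with_candidate / conditions_with_sequence

# Mean number of zebrafish proteins per disease-associated human protein.
zf_per_human = zebrafish_proteins / human_proteins

print(f"coverage: {coverage_pct:.1f}%")              # coverage: 65.2%
print(f"zebrafish per human: {zf_per_human:.2f}")    # zebrafish per human: 3.57
```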