New VGNC gene nomenclature for cow, dog and horse!

We are excited to announce that, in addition to chimpanzee gene nomenclature, the VGNC website is now populated with gene symbols for cow, dog and horse. As a first pass, the VGNC uses a software pipeline based on the HGNC Comparison of Orthology Predictions (HCOP) tool to transfer human gene symbols and names automatically to genes in other species wherever four different resources (Ensembl, NCBI Gene, OMA and PANTHER) all predict the same ortholog. As you might expect, chimp is still the winner in terms of number of gene symbols (14846), because this total includes manually curated non-consensus orthologs that we have approved over the past year, but the other three species are not too far behind: cow = 11965 symbols; dog = 11474 symbols; horse = 10552 symbols.
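The consensus rule described above can be illustrated with a short Python sketch. This is not the VGNC pipeline itself: the function names, the data layout and the gene identifiers are all invented for illustration, and the sketch assumes that "consensus" means every one of the four resources predicts exactly one ortholog and that they all predict the same one.

```python
from typing import Dict, Optional, Set

# The four resources named in the text.
RESOURCES = ("Ensembl", "NCBI Gene", "OMA", "PANTHER")


def consensus_ortholog(predictions: Dict[str, Set[str]]) -> Optional[str]:
    """Return the gene that all four resources agree on, or None.

    `predictions` maps a resource name to the set of target-species genes
    it predicts as orthologs of one human gene (hypothetical layout).
    """
    calls = []
    for resource in RESOURCES:
        genes = predictions.get(resource, set())
        # Assumption: a missing or one-to-many prediction is not a consensus.
        if len(genes) != 1:
            return None
        calls.append(next(iter(genes)))
    # All four single predictions must point at the same gene.
    return calls[0] if len(set(calls)) == 1 else None


def transfer_symbols(predictions_by_symbol: Dict[str, Dict[str, Set[str]]]) -> Dict[str, str]:
    """Map each consensus ortholog to the human symbol it would inherit."""
    assigned = {}
    for human_symbol, predictions in predictions_by_symbol.items():
        gene = consensus_ortholog(predictions)
        if gene is not None:
            assigned[gene] = human_symbol
    return assigned


# Made-up example: GENE_A has a four-way consensus, GENE_B does not.
example = {
    "GENE_A": {r: {"cow_gene_1"} for r in RESOURCES},
    "GENE_B": {
        "Ensembl": {"cow_gene_2"},
        "NCBI Gene": {"cow_gene_2"},
        "OMA": {"cow_gene_3"},
        "PANTHER": {"cow_gene_2"},
    },
}
print(transfer_symbols(example))
```

In this sketch the gene without a four-way consensus is simply left unnamed, which corresponds to the cases that are left for manual curation.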

Nobody should ever be confused about which species they are looking at when using vertebrate.genenames.org, as handy species graphics accompany our search results and appear at the top right of every Symbol Report. VGNC Symbol Reports display the nomenclature, the unique VGNC ID, details on the species, links to the equivalent gene reports in Ensembl and NCBI Gene, and links to the human ortholog on genenames.org. Where available, there are also links to orthologs of the gene in the other three VGNC species. For an example, please see the chimp CDH7, cow CDH7, dog CDH7 and horse CDH7 Symbol Reports. Cow genes have an additional link to gene reports in the Bovine Genome Database where these are available.

The gene search facility on vertebrate.genenames.org works in the same way as the search on genenames.org, with one important difference: the VGNC site allows filtering by species, using the facets provided on the left-hand side of the search results. A good example search is for ABCA4. You can also browse the entire VGNC dataset using the gene data tab, which provides links to Symbol Reports in the same format as the search results and with the same species filters. We also provide statistics and downloads for the full dataset of each species. Simply select the species of interest from the dropdown box; you also have the option of filtering by chromosome. Data can be downloaded as either a text or a JSON file. There is also a single file that provides the complete VGNC dataset for all available VGNC species.
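For readers who prefer to work with a downloaded JSON file, the species and chromosome filtering described above could be reproduced locally along the following lines. This is only a sketch: the field names ("symbol", "species", "chromosome") and the sample records are assumptions made for illustration, not the documented layout of the VGNC download files, so please check a real file before adapting it.

```python
import json

# Stand-in for the contents of a downloaded JSON file. The structure and
# every value here are invented for illustration.
SAMPLE_JSON = """
[
  {"symbol": "GENE_A", "species": "cow", "chromosome": "1"},
  {"symbol": "GENE_B", "species": "cow", "chromosome": "X"},
  {"symbol": "GENE_A", "species": "dog", "chromosome": "1"},
  {"symbol": "GENE_C", "species": "horse", "chromosome": "2"}
]
"""


def filter_records(records, species=None, chromosome=None):
    """Keep the records matching the given species and/or chromosome.

    A filter left as None is not applied.
    """
    selected = []
    for record in records:
        if species is not None and record["species"] != species:
            continue
        if chromosome is not None and record["chromosome"] != chromosome:
            continue
        selected.append(record)
    return selected


records = json.loads(SAMPLE_JSON)
cow_x = filter_records(records, species="cow", chromosome="X")
print([record["symbol"] for record in cow_x])
```

To use a real download, the inline string would be replaced by reading the file with json.load, and the keys adjusted to match whatever the file actually contains.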

The next step for cow, dog and horse nomenclature will be for curators to review the data manually where the HCOP orthology predictions were not entirely consistent, and where there is no one-to-one orthology relationship between human and the other species, as we have already been doing for chimp. This will take us some time! We also plan to add nomenclature for further vertebrate species using the automated VGNC software pipeline. Our criteria for choosing further species are the quality of the genome assembly and annotation, the perceived value as a research organism, and the level of support from the scientific community. Please contact us at vgnc@genenames.org with feedback on the gene nomenclature or the functionality of the VGNC website.

Renaming of placeholder C#orf symbols

We still have over 350 human protein-coding genes named with a C#orf$ symbol, where # represents the chromosome on which the gene is located and $ is the next number in a numerical series. These symbols are assigned to genes where there is no identified function, family member, clear named ortholog, or predicted structural information for the gene product at the time of naming. Over time such information may become available which allows us to perform a rename. This renaming of placeholder C#orf symbols is important for two of our current major project aims: transfer of data across vertebrate species, as described above, and stabilisation of symbols to support work in the clinical community. Therefore C#orf renaming is one of our core priorities.
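If you want to find placeholder symbols in your own gene lists, the C#orf$ format described above is easy to match with a regular expression. The Python sketch below is our own illustration rather than an HGNC tool; the pattern and function name are assumptions, and it accepts only the human chromosomes 1 to 22, X and Y.

```python
import re

# C, then a chromosome (1-22, X or Y), then "orf", then a positive number.
CORF_PATTERN = re.compile(
    r"^C(?P<chromosome>[1-9]|1[0-9]|2[0-2]|X|Y)orf(?P<number>[1-9][0-9]*)$"
)


def parse_placeholder(symbol):
    """Return (chromosome, number) for a C#orf$ symbol, or None otherwise."""
    match = CORF_PATTERN.match(symbol)
    if match is None:
        return None
    return match.group("chromosome"), int(match.group("number"))


# A mix of placeholder symbols and ordinary symbols.
for symbol in ["C4orf26", "CXorf23", "C14orf159", "ODAPH", "TEDC1"]:
    print(symbol, parse_placeholder(symbol))
```

Symbols that do not fit the format, such as ODAPH or TEDC1, return None, so the function can be used directly as a filter.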

We have changed over 10% of C#orf symbols since May, a total of 37. One example is the gene previously known as C4orf26, which was highlighted to us as being clinically relevant on separate occasions by staff at the TGMI and Genomics England projects due to its association with Amelogenesis imperfecta, type IIA4; following consultation with researchers who have published work on the gene, we have been able to rename it ODAPH for ‘odontogenesis associated phosphoprotein’. Other examples of renames based on publications include C14orf159 to DGLUCY, ‘D-glutamate cyclase’; C19orf43 to TRIR, ‘telomerase RNA component interacting RNase’; C14orf80 to TEDC1, ‘tubulin epsilon and delta complex 1’; and C16orf59 to TEDC2, ‘tubulin epsilon and delta complex 2’. Examples of genes renamed based on newly identified family membership are CXorf23, which is now BCLAF3 for ‘BCLAF1 and THRAP3 family member 3’, and C17orf74, which was renamed to SPEM2 for ‘SPEM family member 2’. If you have, or know of, any data that would help us to rename any C#orf symbols, please email us at hgnc@genenames.org.