XiaoshenGuoBGIguoxs@genomics.cnguoxs@genomics.cnSupporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes" Next generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data, due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low frequency and novel variants. Although whole exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole genome sequencing data is limited for any population, and a large amount of low-frequency, population-specific variants remains uncharacterized.
We have performed whole genome sequencing at high depth (~80X) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 North Han Chinese and 45 South Han Chinese samples. 83 of these 90 have not been sequenced by the 1000 Genomes Project. We have identified 12,568,804 single nucleotide polymorphisms, 2,074,734 short InDels and 26,142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7,007,685 novel variants with low frequency (defined as minor allele frequency < 5%), including 5,816,839 SNPs, 1,172,919 InDels, and 17,927 structural variants.
Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, this Han Chinese deep sequencing data enhances characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement for the 1000 Genomes Project, as well as for other human genome projects.
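The low-frequency criterion used in this description (minor allele frequency < 5%) can be illustrated with a short Python sketch. The function names and the toy allele counts below are assumptions made for illustration only; they are not taken from the dataset or from the authors' variant-calling pipeline.

```python
def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Return the minor allele frequency (MAF) of a biallelic site.

    alt_count: number of observed copies of the alternate allele.
    total_alleles: number of called alleles at the site (2 per diploid sample).
    """
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_count <= total_alleles:
        raise ValueError("alt_count must lie between 0 and total_alleles")
    alt_freq = alt_count / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(alt_count: int, total_alleles: int,
                     threshold: float = 0.05) -> bool:
    """Apply the 'MAF < 5%' rule quoted in the text (threshold is adjustable)."""
    return minor_allele_frequency(alt_count, total_alleles) < threshold


# 90 diploid individuals give 180 alleles per fully called autosomal site.
TOTAL_ALLELES = 2 * 90

if __name__ == "__main__":
    for alt in (1, 8, 9, 90, 172):  # hypothetical alternate-allele counts
        maf = minor_allele_frequency(alt, TOTAL_ALLELES)
        print(alt, round(maf, 4), is_low_frequency(alt, TOTAL_ALLELES))
```

With 180 alleles, a variant falls below the 5% threshold only when its rarer allele is seen at most 8 times, which is one reason deep coverage matters for calling such sites reliably.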
TianmingLanHaoxiangLinWenjuanZhuLaurentChristianAsker Melchior TellierMengchengYangXinLiuJunWangJianWangHuanmingYangXunXuXiaosenGuoGenomic2no_image.pngU.S. Central Intelligence Agencyderivative workPublic domainhttps://commons.wikimedia.org/w/index.php?curid=12218997GigaDB1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100302/https://www.protocols.io/widgets/protocol/snp-indel-calling-grkbv4whttps://www.protocols.io/widgets/protocol/soapdenovo-genome-assembly-gr3bv8nhttps://www.protocols.io/widgets/protocol/structure-variation-detection-gr4bv8whttps://github.com/HaoxiangLin/WGS_of_Han_Chinese_genomes10.1093/gigascience/gix067http://www.ncbi.nlm.nih.gov/bioproject/PRJEB11005Shenzhen Municipal Government of ChinaunknownCXB201108250094AkeywordHigh-coverage Whole-genome SequencingkeywordHan Chinese genomeskeywordDenovo assemblykeywordGenetic variationsZhongKuiXiaBGI Shenzhenxiazhongkui@genomics.cnxiazhongkui@genomics.cnData and analysis of diet-induced and obesity-associated alterations of gut microbiota of 129S6/Sv and C57BL/6J mice It is well known that the microbiota of high fat (HF) diet-induced obese mice differs from that of lean mice, but to what extent this difference reflects the obese state or the diet is unclear. To dissociate changes in the gut microbiota associated with high HF feeding from those associated with obesity, we took advantage of the different susceptibility of C57BL/6JBomTac (BL6) and 129S6/SvEvTac (Sv129) mice to diet-induced obesity and of their different responses to inhibition of cyclooxygenase (COX) activity, where inhibition of COX activity in BL6 mice prevents HF diet-induced obesity, but in Sv129 mice accentuates obesity. <br />Using HiSeq-based whole genome sequencing we identified taxonomic and functional differences in the gut microbiota of the two mouse strains fed regular low fat or HF diets with or without supplementation with the COX-inhibitor, indomethacin. 
<br /> Here we present the sequence assemblies and annotations for those 54 samples, together with the gene catalogue and relative abundance levels of both genes and OTUs. It is hoped these data can be used for comparison in future studies of a similar design.XiaopingLiQinHaoManimozhiyanArumugam0000-0002-0886-9101LiangXiaoDongyaZhangZhongkuiXiaChuanLiuZhiweiFangTineRaskLichtLiseMadsenJianfengZhangYingruiLiQiangFengKarstenKristiansen0000-0002-6024-0917JunWangJunhuaLiSiBraskSonneNingChenEvenFjæreLisaKoldenMidtbøMurielDerrienFloorHugenholtzUllaBirgitteVogelAlicjaMortensenMichielWageningenUR KleerebezemMetagenomic3100114.pngMouse Gut MetagenomePublic DomainBGIBGI734003200ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100271/10.1186/s40168-017-0258-628390422http://www.ebi.ac.uk/ena/data/view/ERP011540Danish National Research Foundationhttp://dx.doi.org/10.13039/501100001732Carlsbergfondethttp://dx.doi.org/10.13039/501100002808National Institute Of Nutrition And Seafood ResearchunknownDanish Strategic Research Councilhttp://dx.doi.org/10.13039/10000739811-116163Shenzhen Municipal Government of ChinaunknownCXZZ20150330171521403Shenzhen Municipal Government of ChinaunknownJCYJ20140418095735538keywordC57BL/6J micekeyword129S6/Sv micekeywordObesitykeywordHigh fat feedingkeywordMicrobiotakeywordIndomethacinHuijueJiaBGI Shenzhenjiahuijue@genomics.cnjiahuijue@genomics.cnSupporting data for "Shotgun Metagenomics of 250 Adult Twins Reveals Genetic and Environmental Impacts on the Gut Microbiome" The first large cohort of shotgun sequenced dataset for the gut microbiome in twins, including 250 adult twins in the TwinsUK registry, leading to an updated reference gene catalog of 11.4 million genes. 
Heritability was identified in microbial taxa and potential functions, and sharing of microbial single-nucleotide polymorphisms (SNPs) between twins were demonstrated, all underscoring the value of twins for the understanding of the genetic and environmental underpinnings of the gut microbiome.HailiangXieRuijinGuoHuanziZhongQiangFengZhouLanBingcaiQinKirstenJWardMatthewAJackson0000-0002-7891-6217YanXiaXuChenBingChenHuihuaXiaChangluXuFeiLiXunXuJumanaYousufAl-AamaHuanmingYang0000-0002-0858-3410JianWangKarstenKristiansen0000-0002-6024-0917JunWangClaireJStevesJordanaTBell0000-0002-3858-5986JunhuaLiTimothyDSpectorHuijueJia0000-0002-3592-126XMetagenomic3no_image.jpgimage made from free clipartpanda.com imagesPublic domainGigaDBGigaDB28991029248ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10025310006410.1016/j.cels.2016.10.004http://www.ncbi.nlm.nih.gov/bioproject/PRJEB9576SuishaLiangBGI Shenzhenliangsuisha@genomics.cnliangsuisha@genomics.cnA catalogue of the pig gut microbiome The pig is a major species for livestock and biomedicine. We established a comprehensive catalogue of gut microbial genes based on faecal samples of 287 pigs from France, Denmark and China. 7.7 million non-redundant genes representing 719 metagenomic species were identified by deep metagenome sequencing. The pig and human catalogues share 12.6 % and 9.3 % of their genes, respectively, a higher proportion than that shared between the mouse and human catalogues. Importantly, 78 % and 96% of the functional pathways are shared, underscoring the potential use of pigs for biomedical research. The pig gut microbiota is influenced by gender, age and breed. Analysis of the prevalence of antibiotic resistance genes confirmed the efficiency of eliminating antibiotics from animal diet to reduce the risk of dissemination associated with farming systems. 
The pig microbiome gene catalogue provides a useful resource for metagenomics-based research in biomedicine and for sustainable knowledge-based pig farming.HuijueJia0000-0002-3592-126XEdiPriftiLiangXiaoKarstenKristiansen0000-0002-6024-0917SuishaLiangJunhuaLiXunXuZhongkuiXiaChuanLiuLiseMadsenQiangFengJunWangJordiEstelléPiaKiilerichYuliaxisRamayo-CaldasAnniOyanPedersenNielsJørgenKjeldsenEmmanuelleMaguinJoëlDoréNicolasPonsEmmanuelleLeChatelierStanislavDEhrlichClaireRogel-GaillardMetagenomic3100187.jpgThe pig-gut microbiomepublic domainBGI ShenzhenBGI Shenzhen4294967296ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100187/10.1038/nmicrobiol.2016.161http://www.ncbi.nlm.nih.gov/bioproject/PRJEB11755YongpingCuiTranslational Medicine Research Center, Shanxi Medical University, Taiyuan, Shanxi 030001,
Chinacuiy0922@yahoo.comcuiy0922@yahoo.comSupporting data for "Genomic analyses reveal FAM84B and the NOTCH pathway are associated with the progression of esophageal squamous cell carcinoma". Esophageal squamous cell carcinoma (ESCC) is the sixth most lethal cancer worldwide and the fourth most lethal cancer in China. Although copy number alterations and somatic point mutations associated with the development of ESCC have been identified by array-based technologies and genome-wide studies respectively, the genomic characterization of ESCC at different stages of the disease has not been explored and is likely to reveal additional oncogenic mechanisms. Here we have performed either whole-genome sequencing or whole-exome sequencing on 51 stage I and 53 stage III ESCC patients to characterize the genomic alterations that occur during the various clinical stages of ESCC, and further validated these changes in 36 atypical hyperplasia samples. <br />
Due to the sensitive nature of this dataset it is being hosted in the secure restricted access database European Genome-Phenome Archive at the EBI. It has been assigned the accession number EGAD00001001487. <br />
To gain access to this dataset you will need to apply for permission from the Shanxi Medical University and BGI Data Access Committee (DAC).
Two forms are available to download from the GigaDB FTP server (below); both should be completed and emailed to <a href="mailto:cuiy0922@yahoo.com?cc=nina_cui@hotmail.com&amp;subject=Request to access EGAD00001001487&amp;body=Dear%20Dr%20Yongping%20Cui,"> Dr Yongping Cui</a>, who is the named representative of the combined Shanxi Medical University and BGI DAC. <br />
After sending the forms to the DAC, you will be contacted either by the DAC, if your application is declined, or by the EGA with login details, if it is approved. This process can take several days.<br />YanyanZhangJuanWangBinYangYingruiLiJunWangCaixiaChengHeyangCuiLingZhangZhiwuJiaBinSongFangWangYongZhouYaopingLiJingLiuJiaqianWangPengzhouKongRuyiShiYanghuiBiZhenxiangZhaoXiaolingHuJieYangChantingHeZhipingZhaoJinfenWangYanfengXiEnweiXuGuodongLiShipingGuoYunqingChenXiaofengYangXingChenJianfangLiangJianshengGuoXiuqingZhangHuanming YangXiaolongChengChuanguiWangQiminZhanYongpingCuiGenomic2Imaging7no_image.jpgTMA dataCC0Cui,Y et al 2016Cui, Y125000ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100181/10.1186/s13742-015-0107-0https://www.ebi.ac.uk/ega/studies/EGAS00001001487keywordesophageal squamous cell carcinomakeywordcancerkeywordexomekeywordsequencingSiyangLiuBGIliusiyang@genomics.cnliusiyang@genomics.cnAsmVar: tools and exemplar data. Comprehensive characterization of genomic variation in a human individual is important for understanding disease and for development of personalized approaches to treatment. Many tools exist for identification of single nucleotide polymorphisms (SNPs), small indels and large deletions based on a DNA re-sequencing strategy. However, those approaches consistently display significant bias for recovery of complex structural variants and novel sequence in the individual genomes and lack sequence interpretation such as ancestral state and mechanism. Here we present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variants and novel sequence in population-scale de novo assemblies at single nucleotide resolution. Our approach scales well, which makes it applicable to large population studies of species with complex genomes, such as Homo sapiens. 
Application of AsmVar to several human de novo assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct solution to investigate the structural variations and novel sequences from <em>de novo</em> assemblies, which is important for construction of population-scale pan genome. Our study also suggests the advantages of the <em>de novo</em> assembly strategy for definition of genome structure. <br /> This software has been released under the
<a href="http://opensource.org/licenses/MIT" target=blank> MIT License </a> Copyright 2014-2015.
<br />SimonRasmussenOleLundJihuaSunTorbenHansenSorenBrunakOlufPedersenShujiaHuangAndersAlbrechtsen0000-0001-7306-031XLarsBolund0000-0003-4165-1531HongzhiCaoXiaosenGuoRamneekGuptaKarstenKristiansen0000-0002-6024-0917AndersKrogh0000-0002-5147-6282NingLiJunWang0000-0002-1422-3331XunXuChenYeYuqiChangSiyangLiuJunhuaRaoWeijianYeThe Genome Denmark ConsortiumMikkelHSchierupPalleVillesenThorkildI.ASoerensenAndersDBorglumHansEibergEsbenNFlindtRuiqiXuHaoLiuSorenBesenbacherJakobGroveThomasDAlsFrancescoLescaiThomasMailundRuneMFriborgChristianN.S.PedersenShengtingLiLasseMarettyJonasASibbesenJetteBork-JensenChristianTHaveJoseMGIzarzugazaKirstineBellingRachitaYadavSoftware6100173.jpgAsmVar_oval.gifCC0AsmVarAsmVar110100480ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100173/https://github.com/bioinformatics-centre/AsmVar10.1186/s13742-015-0103-4Danish National Advanced Technology Foundationhttp://data.fundref.org/vocabulary/Label-663137Danish National Research Foundationhttp://dx.doi.org/10.13039/501100001732Novo Nordisk UK Research Foundationhttp://dx.doi.org/10.13039/501100000329State Key Development Program for Basic Research of China-973 Programhttp://www.973.gov.cn/English/Index.aspxkeywordStructural variation, de novo assemblyXiaominLiuBGIliuxiaomin@genomics.cnliuxiaomin@genomics.cnSupporting data for "Deep sequencing of human major histocompatibility complex region contributes to studies of complex disease". The human major histocompatibility complex (MHC) has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint causal variants of these associations due to the extreme complexity of the region. We thus sequenced the entire 5 Mb MHC region in 20,635 individuals of Han Chinese ethnicity (10,689 controls and 9,946 psoriasis patients) and constructed a Han-MHC database which included both variants and HLA gene typing results with high accuracy. 
We further identified multiple independent novel susceptibility loci for psoriasis in HLA-C, HLA-B, HLA-DPB1 and BTNL2, together with an intergenic variant, rs118178193, and confirmed the well-established susceptibility locus HLA-C*06:02. These psoriasis-associated loci in the MHC region were markedly different from those described in the Caucasian population in a recent analysis, which highlights the importance of generating population-specific MHC databases for studies of complex disease. We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance our understanding of the pathogenesis of these disorders.FushengZhouHongzhiCaoXianboZuoTaoZhangWenjunWangXiaominLiuRicongXuGangChenYuanweiZhangXiaodongZhengXinJinJinpingGaoJunpuMeiYujunShengQibinLiBoLiangJuanShenChangbingShenHuiJiangCaihongZhuXingFanFengpingXuMinYueXianyongYinChenYeCuicuiZhangXiaoLiuLiangYuJinghuaWuMengyunChenXuehanZhuangLiliTangHaojingShaoLongmaoWuJianLiYuXuYijieZhangSuliZhaoYuWangGeLiHanshiXuLeiZengJiananWangMingzhouBaiYanlingChenWeiChenTianKangYanyanWuXunXuZhengweiZhuYongCuiZaixingWangChunjunYangPeiguangWangLeihongXiangXiangChenAnpingZhangXinghuaGaoFurenZhangJinhuaXuMinZhengJieZhengJianzhongZhangXueqingYuYingruiLiSenYangJianjunLiuLennartHammarstromLiangdanSunJunWangXuejunZhangGenomic2no_image.jpgHLA diversity in Han ChineseCC0Liu et al 2015Liu et al 20152147483648ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100156/10.1038/ng.3576JingchuHuBGI-Shenzhenjingchu.hu@genomics.cnjingchu.hu@genomics.cnSupporting data for "Sparse whole-genome sequencing identifies two loci for major depressive disorder". Major depressive disorder (MDD) is one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide. However, due to the high heterogeneity of the disease, no robustly replicated genome loci have been identified. 
We collected more than 12,000 samples from 45 cities in China in collaboration with local hospitals. Most of the data were collected from 2007 to 2010 in different batches. Here, we performed low-coverage whole-genome sequencing of 5,303 Chinese women with recurrent MDD, selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD. The sequencing reads were aligned to the human reference genome GRCh37, and an average sequencing depth of 1.7x was achieved. Based on this dataset, we identified about 32 million SNPs and analyzed their association with MDD. The availability of this large dataset will be very helpful for genetic studies of other complex traits in the Chinese population. The data available here are BAM files storing the mapping results for each sample.TaoJiangGuangbiaoWangQibinLiYuanLiuYingruiLiYaoLuJianWangJunWang0000-0002-1422-3331XunXuHuanmingYang0000-0002-0858-3410YeYinJingSunXiuqingZhangNaCaiTimB.BigdeliWarrenKretzschmarYihanLiJieqinLiangLiSongJingchuHuWeiJinZhenfeiHuLinmaoWangPuyiQianXiangchaoGanMarkReimersToddWebbBrienRileySilviuBacanuRoseannE.PetersonYipingChenHuiZhongZhengrongLiuGangWangHongSangGuoqingJiangXiaoyanZhouYiLiYiLiWeiZhangXueyiWangXiangFangRundePanGuodongMiaoQiwenZhangJianHuFengyuYuBoDuWenhuaSangKeqingLiGuibingChenMinCaiLijunYangDonglinYangBaoweiHaXiaohongHongHongDengGongyingLiKanLiYanSongShuguiGaoJinbeiZhangZhaoyuGanHuaqingMengJiyangPanChenggeGaoKerangZhangNingSunYouhuiLiQihuiNiuYutangZhangTieqiaoLiuChunmeiHuZhenZhangLuxianLvJichengDongXiaopingWangMingTaoXumeiWangJingXiaHanRongQiangHeTiebangLiuGuopingHuangQiyiMeiZhenmingShenYingLiuJianhuaShenTianTianXiaojuanLiuWenyuanWuDanhuaGuGuangyiFuJianguoShiYunchunChenJingfangGaoLanfenLiuLinaWangFuzhongYangEnzhaoCongJonathanMarchiniShenxunShiRichardMottQiXuKennethS.KendlerJonathanFlintGenomic2100155.jpgSorrowing old manCC0WiKiVincent Willem van 
Gogh524288000ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100155/10.1038/nature1465926176920http://www.ncbi.nlm.nih.gov/bioproject/PRJNA289433SuishaLiangBGI Shenzhenliangsuisha@genomics.cnliangsuisha@genomics.cnThe colorectal adenoma-carcinoma microbiome. Colorectal cancer, a commonly diagnosed cancer in the elderly, often develops slowly from benign polyps called adenomas. The gut microbiota is believed to be directly involved in colorectal carcinogenesis. The identity and functional capacity of the adenoma- or carcinoma-related gut microbe(s), however, have not been surveyed in a comprehensive manner. Here we perform a metagenome-wide association study (MGWAS) on stools from advanced adenoma and carcinoma patients and from healthy subjects, revealing microbial genes, strains and functions enriched in each group. An analysis of potential risk factors indicates that high intake of red meat relative to fruits and vegetables appears to associate with outgrowth of bacteria that might contribute to a more hostile gut environment. These findings suggest that faecal microbiome-based strategies may be useful for early diagnosis and treatment of colorectal adenoma or carcinoma.ZhuyeJieXiaopingLiXinLiHuijueJia0000-0002-3592-126XManimozhiyanArumugam0000-0002-0886-9101LiangXiaoJumanaYousufAl-AamaKarstenKristiansen0000-0002-6024-0917SuishaLiangJunhuaLiJianWangJunWang0000-0002-1422-3331XunXuHuanmingYang0000-0002-0858-3410DongyaZhangZhouLanQiangFengAndreasStadlmayrLongqingTangHuihuaXiaXiaoyingXuLiliSuUrsulaHuber-SchonauerDavidNiederseerHerbertTilgChristianDatzMetagenomic3100140.jpgcolon diagramCC0BGIBGI3221225472ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100140/10.1038/ncomms7528http://www.ebi.ac.uk/ena/data/view/ERP008729http://www.ncbi.nlm.nih.gov/bioproject/PRJEB7774XiaYanBGIxiayan@genomics.cnxiayan@genomics.cnSupporting data for the dynamics and stabilization of the Human gut microbiome during the first year of life. 
Here we performed metagenomic shotgun sequencing on fecal samples from 98 full-term Swedish infants (new born, 4-months and 12-months old) and their mothers; assembled gut microbial genomes and constructed reference gene catalogs from the cohort. We generated 1.52 Tb paired-end reads of high-quality sequences (average 3.99 Gb per sample). A gene catalog was constructed for each time point based on de novo assembly and metagenomic gene prediction; and functionally annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We also assembled a total of 4,356 microbial genomes (>0.9 MB) de novo; by binning assembled contigs according to abundance variations across samples. These de novo assembled genomes were complemented by 1,147 genomes from the National Center for Biotechnology Information (NCBI) Bacteria/Archaea genome database. All genomes were subsequently clustered into 690 unique metagenomic operational taxonomic units (MetaOTUs) that were equivalent to species-level classifications. Of these, 373 were annotated to species, the remaining 317 represent novel species related to known species. We constructed the metaOTUs profile by mapping reads to our metaOTUs sequences.YinLiYangqingPengHuijueJia0000-0002-3592-126XQiangFengLiangXiaoJumanaYousufAl-AamaKarstenKristiansen0000-0002-6024-0917JunhuaLiJunWang0000-0002-1422-3331YeYinDongyaZhangJosefineRoswallPetiaKovatcheva-DatcharyYanXiaHailiangXieHuanziZhongMuhammadTanweerKhanYingShiuanLeeStefanBergmanJovannaDahlgrenFredrikBackhedValentinaTremaroliCamillaColdingDorotaKotowskaLiseMadsenJianfengZhangMetagenomic3Images_199.pnginfantCC-BYBGIBGI6496138035ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10014510.1016/j.chom.2015.04.004ERP005989ZijunXiongChina National GeneBankxiongzijun@genomics.cnxiongzijun@genomics.cnGenomic data from the Tibetan Plateau frog (Nanorana parkeri). 
<em>Nanorana parkeri</em> (also known as the high Himalaya frog, Xizang Plateau frog, Parker's slow frog, or mountain slow frog) is a common frog living across the Tibetan Plateau. It occurs at elevations ranging from 2,850 to 5,000m. Because this species lives at such high elevations, it provides an additional excellent biological model to study the frog’s adaptations to extreme conditions. <br />
A female frog was collected from the Qinghai-Tibetan Plateau at an elevation of 4,900m, and genomic DNA was extracted from muscle tissue. Paired end DNA libraries with different insert-size lengths (170 bp to 20 kb) were
sequenced on the Illumina HiSeq 2000 platform. After performing filtering steps to remove artificial duplication, adapter
contamination and low-quality reads, 190 Gbp of high-quality data (83× genome coverage) was obtained. This was assembled using SOAPdenovo and SSPACE, producing a final draft assembly of 2.0Gb with an N50 scaffold size of 1.05Mb. More than 20,000 genes were predicted. The <em>Nanorana parkeri</em> genome should help offer new insights into the amphibian evolution and Tibetan high-altitude adaptation.
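The assembly statistics quoted above (scaffold N50 and fold coverage) follow simple definitions, sketched here in Python. The scaffold lengths are toy values chosen for illustration, and the genome-size figure is only a back-of-the-envelope ratio of the two numbers given in the text (190 Gbp and 83×); neither is taken from the assembly files.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def implied_genome_size(total_bases, fold_coverage):
    """Genome size implied by total sequenced bases and fold coverage."""
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_bases / fold_coverage


if __name__ == "__main__":
    toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print("toy N50:", n50(toy_scaffolds))
    # 190 Gbp of filtered reads at 83x coverage, as stated in the text.
    size = implied_genome_size(190e9, 83)
    print("implied genome size (Gb):", round(size / 1e9, 2))
```

The ratio comes out at roughly 2.3 Gb, the same order as the 2.0 Gb draft assembly; a draft assembly is commonly somewhat smaller than a coverage-based size estimate because repeats and gaps are under-represented.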
XueyanZhangShipingLiuJunWang0000-0002-1422-3331ZijunXiongGuojieZhangYapingZhangYanboSunJingCheGenomic2Nanorana_parkeri.pngNanorana parkeripublic domainHerpetological Diversity and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences.Junxiao Yang1480589312ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100132/10.1073/pnas.1501764112http://www.ncbi.nlm.nih.gov/sra?term=SRA151427http://www.ncbi.nlm.nih.gov/bioproject?term=PRJNA243398XunXuBGIxuxun@genomics.cnxuxun@genomics.cnSingle-cell sequencing data using DOP-PCR, MDA and MALBAC whole genome amplification methods. Single-cell sequencing (SCS) provides many biomedical advances but currently relies on whole-genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate oligonucleotide-primed PCR (DOP-PCR) and multiple annealing and looping-based amplification cycles (MALBAC). Here we systematically compare the advantages and disadvantages and performance of each method. To systematically evaluate the SCS performance of commonly used WGA methods, we performed single cell WGA using six commercial kits based on DOP-PCR and MDA and then performed whole genome sequencing (WGS) of the successfully amplified DNA.
A total of 36 single cells were collected in our study, 19 from a lymphoblastoid cell-line (YH cell-line), the rest from a widely-known gastric cancer cell-line, BGC823. Corresponding pooled DNA was extracted as a non-amplified control. BGC823 cell-line was provided by Professor Youyong Lv at Beijing Cancer Hospital. All samples and experimental protocols were approved by the Institutional Review Board of BGI-Shenzhen. JunWang0000-0002-1422-3331YHouKWuXShiFLiLSongHWuMDeanGLiSTsangRJiangXZhangBLiGLiuNBedekarNLuGXieHLiangTWangJChenYLiXZhangHYangXXuLWangGenomic2http://commons.wikimedia.org/wiki/File%3ABlausen_0624_Lymphocyte_B_cell.png3D image of Lymphocyte B-cellCC-BY 3.0Blausen.comBlausen.com staff. "Blausen gallery 2014". Wikiversity Journal of Medicine. DOI:10.15347/wjm/2014.010. ISSN 200187621249988652362ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10011510.1186/s13742-015-0068-3http://www.ncbi.nlm.nih.gov/sra?term=SRP017032http://www.ncbi.nlm.nih.gov/sra?term=SRP050588SuishaLiangBGI Shenzhenliangsuisha@genomics.cnliangsuisha@genomics.cnA Catalogue of the Mouse Gut Metagenome To increase the value of mice models studies, we have used HiSeq2000-based whole genome sequencing to establish a catalogue of 2.6 million non-redundant microbial genes derived from 1,130 gigabases of microbial sequences from faecal samples of 184 mice of different strains and from different providers and housing laboratories. 
More than 99% of the genes are bacterial, indicating that the mouse gut microbiota comprises at least 800-900 prevalent bacterial species. This reference gene catalog was annotated against the Non-redundant protein sequences (NR), Kyoto Encyclopedia of Genes and Genomes (KEGG) and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases.XiaopingLiQinHaoHuijueJia0000-0002-3592-126XQiangFengManimozhiyanArumugam0000-0002-0886-9101LiangXiaoJoelDoréS.DuskoEhrlichKarstenKristiansen0000-0002-6024-0917SuishaLiangJunhuaLiJunWang0000-0002-1422-3331DongyaZhangSi BraskSonneZhongkuiXiaXinminQiuHuaLongChuanLiuZhiweiFangJoyceChouJacobGlanvilleTineRaskLichtDonghaiWuJunYuJosephJao YiuSungQiaoyiLiangZhouLanEmmanuelleLe ChatelierJohn CLinFredrikBackhedValentinaTremaroliCamillaColdingDorotaKotowskaLiseMadsenJianfengZhangMetagenomic3Images_181.pngMouse Gut MetagenomePublic DomainBGIBGI4294967296ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100114/10.1038/nbt.3353CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnPhylogenomic analyses data of the avian phylogenomics project. Determining the evolutionary relationships of modern birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct various genome-scale avian phylogenetic trees and perform comparative genomics analyses.<br /> Here we release the datasets associated with the phylogenomic analyses, which include genome scale alignments, gene tree files (including binned supergene trees), species tree files (including timetrees), and loci sequence files including nucleotide, amino acid, indels, and transposable elements. 
We hope that this resource will serve future efforts in phylogenomics, evolutionary biology, and trait analyses.DavidWBurtPeterHoudeM.ThomasPGilbert0000-0002-5805-7195JasonTHowardErichDJarvis0000-0001-8931-5049CaiLiJunWang0000-0002-1422-3331GuojieZhangSiavashMirarabAndreJAbererSimonYWHoBenoitNabholzClaudiaCWeberRuteRda FonsecaAlonzoAlfaro-Nunez0000-0002-4050-5041NitishNarulaLiangLiuHansEllegrenAlexandrosStamatakisDavidPMindellJoelCracraftEdwardLBraunTandyWarnowAlexanderSuhScottVEdwardsBrantCFairclothGenomic2phylogeny of birdsCC-BYAvian phylogenomics consortiumAvian phylogenomics consortium30777136ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101041/ 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451XiaoshenGuoBGIguoxs@genomics.cnguoxs@genomics.cnGenome sequence of a Mongolian individual We present the genome sequence of a Mongolian male individual. The genome is assembled using short reads produced from the massive parallel sequencing method, resulting in 130.8-fold genome coverage. We identify high-confidence variation sets validated by chip genotyping and PCR-Sanger sequencing, including 3.7 million single nucleotide polymorphisms and 756,234 short insertions and deletions. We assign the paternal inheritance of the individual to the lineage D3a through Y haplogroup analysis and infer Genghis Khan had a common paternal ancestor with Tibeto-Burman populations. We investigate the gene flow between Mongolians and other ethnic groups and demonstrate that the genetic influences on them most likely resulted from the expansion of the Mongol Empire. The Mongolian genome lays a foundation to further understand human evolution and explore population specific genetic causes of diseases/traits in Mongolians and closely related groups.
QiangFengXiaosenGuoWenqiLiJianWangJunWang0000-0002-1422-3331HuanmingYang0000-0002-0858-3410YeYinXiangZhaoShilinZhuHaihuaBaiZiliYangNarisuNarisuJunjieBuJirimutuJirimutuFanLiangYanpingXingDingzhuWangTongdaLiYanruZhangBaozhuGuanXukuiYangDongZhangShuangshanShuangshanZheSuHuiguangWuWenjingLiMingChenBayinnamulaBayinnamulaYuqiChangYingGaoTianmingLanSuyalatuSuyalatuXuYangYujieChenQizhuWuHuanminZhouGenomic2mongolia.jpgMongolian GenomeCC0UnknownUnknown450971566080ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100104/10.1093/gbe/evu242http://www.ncbi.nlm.nih.gov/sra?term=SRS687913HongzhiCaoBGIcaohongzhi@genomics.cncaohongzhi@genomics.cnSupporting material for: De novo assembly of a haplotype-resolved human genome. Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual’s genetic variation. <br /> We generated ~614,850 fosmid clones ranging from 20 kb-80 kb with a mean of 36kb, approximately 30 fosmid clones were pooled and each pool had one or two DNA libraries sequenced using Hiseq 2000. In total, 1,712 Gb of raw sequence data was generated for all the pooled fosmid libraries. Please see the <a href="http://www.nature.com/naturebiotechnology." target="_blank" >linked paper</a> for assembly pipeline details. 
We then analysed the newly generated haploid-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, of which we identified 3,580,000 SNPs, 762,000 short INDELs (<50bp) and 30,000 long INDELs, 111 inversions and 168 translocations.HailongYangHonglongWuRuibangLuo0000-0001-9711-6533YinlongXieWeihuaHuangGuangzhuHeQiangFengYuhuiSunHaodongHuangShujiaHuangXinTongLarsBolund0000-0003-4165-1531HongzhiCaoLaurieGoodman0000-0001-9724-5976KarstenKristiansen0000-0002-6024-0917AndersKrogh0000-0002-5147-6282SonggangLiBinghangLiuYingruiLiQiongLuoJianWangJunWang0000-0002-1422-3331GaneKa-ShuWong0000-0001-6108-5560XunXuHuanmingYang0000-0002-0858-3410XiuqingZhangHanchengZhengLaurentCAMTellierJianLiBoLiYuWangFangYangPengSunSiyangLiuPengGaoJingSunDanChenZhengHuangYueLixiaoLiuRadojeDrmanacSnezanaADrmanacGenomic2100013_YH.jpgHan Chinese individualPublic domainWikimedia Commons, en:Image:HanGaozu.jpgUnknown8680428757ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100096/10003810009710031810.1038/nbt.3200HemiLuanHKBUhemi.luan@gmail.comhemi.luan@gmail.comNontargeted metabolomics and lipidomics HPLC-MS data from maternal plasma of 180 healthy pregnant women. Metabolic variations occur during normal pregnancy to provide the growing fetus with a supply of nutrients required for its development and to ensure the health of the woman during gestation. Mass spectrometry-based metabolomics was employed to study the metabolic phenotype variations in the maternal plasma that are induced by pregnancy in each of its three trimesters.<br />
Here we provide the LC-MS data from 180 healthy pregnant women. Each individual was followed up to term to confirm a normal term pregnancy and a healthy baby. All volunteers gave written consent and filled out an individual questionnaire at the time of sample collection.<br />
The samples were divided into six sub-groups according to the gestational week of pregnancy at the time of sampling: T1 (<em>n</em>=30, 9-12 weeks), T2 (<em>n</em>=30, 13-16 weeks), T3 (<em>n</em>=30, 17-20 weeks), T4 (<em>n</em>=30, 21-24 weeks), T5 (<em>n</em>=30, 25-28 weeks), and T6 (<em>n</em>=30, 29-40 weeks). Body mass index (BMI), age, and gestational week were recorded for each individual.<br />
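The grouping by gestational week can be written down as a small lookup. The sketch below is only an illustration in Python: the names `GROUPS` and `assign_group` are hypothetical and not part of the dataset, and it assumes gestational age is recorded in whole weeks.

```python
# Gestational-week ranges (inclusive) for the six sampling groups listed above.
GROUPS = {
    "T1": (9, 12),
    "T2": (13, 16),
    "T3": (17, 20),
    "T4": (21, 24),
    "T5": (25, 28),
    "T6": (29, 40),
}


def assign_group(week):
    """Return the group label for a gestational week, or None if out of range."""
    for label, (first, last) in GROUPS.items():
        if first <= week <= last:
            return label
    return None


print(assign_group(18))  # T3
```

Weeks outside the 9-40 range return `None` rather than being forced into a group.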
The repository contains data in 3 modalities: positive and negative ion 'global' non-targeted LC-MS and shotgun lipidomics (including carnitine profiling) LC-MS.
JunWang0000-0002-1422-3331XunXuHuiJiangHemiLuanNanMengPingLiuQiangFengShuhaiLin JinFuXiaominChenRaoWeiqiaoFangChenZongweiCaiMetabolomic8no_image.jpgPCA plot of metabolites, -ve ion modePublic domainGigaDBGigaDB13887504ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10010810014610.1021/pr401068k10.1186/s13742-015-0054-9http://www.ebi.ac.uk/metabolights/MTBLS146XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the soft shell turtle (Pelodiscus sinensis). The soft shell turtle can reach a carapace length of 1 ft (0.30 m). It has webbed feet for swimming. They are called "softshell" because their carapace lacks horny scutes (scales). The carapace is leathery and pliable, particularly at the sides. It is commercially farmed in vast numbers for the food trade. <br /> DNA from the soft shell turtle was collected in Japan. We sequenced the 2.21 Gb genome to a depth of approximately 105.6 X with short reads from a series of libraries with various insert sizes ( 170bp, 500bp, 800bp, 2kb, 5kb, 10kb,20kb and 40kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 221.7 Gb, with the contig and scaffold N50 values of 21.9 kb and 3.33 Mb respectively. We identified 19,327 protein-coding genes with an mean length of ~1500bp. 
Experimental procedures and animal care were conducted in strict accordance with guidelines approved by the RIKEN Animal Experiments Committee (Approval IDs H14-23 and H16-10).BronwenAkenKathrynBealYanChenDongmingFangPaulFlicek0000-0002-3897-7955JavierHerreroZhiyongHuangNaokiIrieShigehiroKurakuShigeruKurataniChunyiLiQiyeLiJiannanLiuWenqiLiYaoMingYoshihitoNiimuraMasafumiNozawaJuanPascual-AnayaMiguelPignatelliSteveSearleShujiShigenobu0000-0003-4640-2323BoWangJuanWangJunWang0000-0002-1422-3331JunyiWangZhuoWangSimonWhiteZhiqiangXiongYeYinLiliYuAmonidaZadissaGuojieZhangHongyanZhangYuanZhengGenomic2100086.jpgSoft shell turtlepublic domainwikimediaN/A1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100086/Genome 10Khttp://www.genome10k.org/10.1038/ng.261523624526DRA000567DRA000639http://www.ncbi.nlm.nih.gov/bioproject/PRJNA68233XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the watermelon (Citrullus lanatus). Watermelon (<em>Citrullus lanatus</em> is an important cucurbit crop grown throughout the world. The annual world production of watermelon is about 90 million tons, making it among the top five most consumed fresh fruits (http://faostat.fao.org/). <br />We sequenced the 0.425 Gb genome to a depth of approximately 108.6 x with short reads from a series of libraries with various insert sizes ( 100-200 bp, 400 bp, 2 kb, 5 kb, 10 kb and 20 kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 46.18 Gb, with the contig and scaffold N50 values of 26.38 kb and 2.38 Mb respectively. 
We identified 23,440 protein-coding genes.ShaoguiGuoHongheSunJeromeSalseHaiyingZhangYiZhengLinyongMaoFlorentMuratZhaoliangZhangMingyunHuangYiminXuSilinZhongLukasAMuellerHongZhaoHongjuHeTaoTanErliPangJinganLiuQingheKouWenjuHouXiaohuaZouJiaoJiangGuoyiGongKathrinKleeHeikoSchoofXuesongHuDequanLiangYangXiaMiaoXingHongpingYiMingzhuWuAmnonLeviXingpingZhangJamesJGiovannoniYunfuLiAurelianoBombarelyShanshanDongZhangjunFeiXiaosenGuoByung-KookHamBangqingHuangSanwenHuangYingHuang0000-0002-4364-9323QunHuHanhuiKuangXinmingLiangKuiLinRuiqiangLiWilliamJLucasTianLvJiumengMinPeixiangNiYiRenBoWangJuanWangJunWang0000-0002-1422-3331JunyiWangZhiwenWangKuiWuYongXuYeYinJianguoZhangYanZhangZhonghuaZhangXiangZhaoZequnZhengShanGaoGenomic2100087.jpgWatermelonpublic domainwikimediaN/A408944640ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10008710.1038/ng.247023179023http://www.ncbi.nlm.nih.gov/bioproject/PRJNA72695XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the green sea turtle (Chelonia mydas). Green turtles are long-lived and may take up to 59 years to reach sexual maturity. Undertaking tremendous feats of navigation, adults return to the same beach to breed each season.<br /> DNA from the green sea turtle was collected in Hong Kong. We sequenced the 2.24 Gb genome to a depth of approximately 82.3 X with short reads from a series of libraries with various insert sizes ( 170bp, 500bp, 800bp, 2kb, 5kb, 10kb,20kb and 40kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 180.94 Gb, with the contig and scaffold N50 values of 20.4 kb and 3.78 Mb respectively. 
We identified 19,633 protein-coding genes with an mean length of 1456 bp.Experimental procedures and animal care were conducted in strict accordance with guidelines approved by the RIKEN Animal Experiments Committee (Approval IDs H14-23 and H16-10).BronwenAkenKathrynBealYanChenDongmingFangPaulFlicek0000-0002-3897-7955JavierHerreroZhiyongHuangNaokiIrieShigehiroKurakuShigeruKurataniChunyiLiQiyeLiJiannanLiuWenqiLiYaoMingYoshihitoNiimuraMasafumiNozawaJuanPascual-AnayaMiguelPignatelliSteveSearleShujiShigenobu0000-0003-4640-2323BoWangJuanWangJunWang0000-0002-1422-3331JunyiWangZhuoWangSimonWhiteZhiqiangXiongYeYinLiliYuAmonidaZadissaGuojieZhangHongyanZhangYuanZhengGenomic2100085.jpgGreen sea turtle underwaterpublic domainwikimediaN/A1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100085/Genome 10Khttp://www.genome10k.org/10.1038/ng.261523624526http://www.ncbi.nlm.nih.gov/sra?term=SRP011574http://www.ncbi.nlm.nih.gov/bioproject/PRJNA104937XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the plum (Prunus mume). The Plum (<em>Prunus mume</em>), was domesticated in China more than 3,000 years ago as ornamental plant and fruit, is one of the first genomes among Prunus subfamilies of Rosaceae to be sequenced.<br />DNA from the plum was collected in Tongmai town, Tibet, China. We sequenced the genome to a depth of approximately 101.4 X with short reads from a series of libraries with various insert sizes ( 180bp, 500bp, 800bp, 2kb, 5kb, 10kb, 20kb and 40kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 28.4 Gb with the contig and scaffold N50 values of 31.8 kb and 577.8 kb respectively. 
JiaWangQixiangZhangLidanSunFangyingZhaoWeiruYangZhiqiongYuanZhenXingHuitangPanXiaoZhongWenfangShiDongliangDuFengmingSunZongdaXuRuijieHaoYingminLvMingSunLeLuoMingCaiYikeGaoTangrenChengWenbinChenGuangyiFanChangleiHanBangqingHuangXinmingLiangTianLvYeTaoJunWang0000-0002-1422-3331JunyiWangXunXuYeYinZequnZhengGenomic2Genome-Mapping13100084.jpg1909 illustrations by Alois Lunzerpublic domainwikimediaN/A304087040ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100084/10.1038/ncomms2290http://www.ncbi.nlm.nih.gov/sra?term=SRP014658XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the pear (Pyrus bretschneideri). Pear, the third most important temperate fruit species after grape and apple, belongs to the subfamily Pomoideae in the family Rosaceae. The majority of cultivated pears are functional diploids (2n = 34). The Pyrus genus is genetically diverse with thousands of cultivars, but it can be divided into two major groups, Occidental pears (European pears) and Oriental pears (Asiatic pears). <br />We sequenced the genome to a depth of approximately 107 X with short reads from a series of libraries with various insert sizes ( 170bp, 500bp, 800bp, 2kb, 5kb, 10kb, 20kb and 40kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 57 Gb. 
We identified 42,812 protein-coding genes.JunWuZebinShiRayMingM.AwaisAKhanShutianTaoSchuylerSKorbanHaoWangNancyJChenTakeshiNishioKaijieQiXiaosanHuangYingtaoWangJuyouWuWeiliZhouHaoYinGaihuaQinYuhuiShaYananYangYueSongLeitingLiMeisongDaiChaoGuYuezhiWangXiaoweiWangHupingZhangLiangZengDanmanZhengChunleiWangMaoshanChenGuangbiaoWangLinXieValpuriSoveroShoufengShaWenjiangHuangShujunZhangMingyueZhangJiangmeiSunLinlinXuYuanLiXingLiuQingsongLiJiahuiShenRobertEPaullJeffreyLBennetzenShaolingZhangHuiChenLinCongCaoDengCaiyunGouDaihuShiYeTaoJuanWangJunWang0000-0002-1422-3331JunyiWangZhiwenWangXunXuDongliangZhanShuZhangXiangZhaoShilinZhuGenomic2100083.jpgPair of pearspublic domainwikimediaN/A555745280ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100083Pear Genome Projecthttp://peargenome.njau.edu.cn:8004/10.1101/gr.144311.11223149293http://www.ncbi.nlm.nih.gov/bioproject/PRJNA157875XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the domestic goat (Capra hircus). The domestic goat is one of the most important livestock species in the world, especially in China, India and other developing countries. Goats not only serve as an important source of meat, milk, fiber and pelts, and have fulfilled agricultural, economic, cultural and even religious roles from very early times in human civilization, but also are now used as animal models for biomedical research and transgene production of protein medicines.We would like to share all the genome data of goat. We hope the genome sequence of goat can provide a new resource for biological research and breeding of goat and other small ruminants.<br />
We sequenced the 2.92 Gb genome to a depth of approximately 65.6 X with short reads from a series of libraries with various insert sizes ( 170 bp, 350 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb) on a HiSeq 2000 sequencer.<br />
The high quality sequence data total 191.5 Gb, and the assembly has contig and scaffold N50 values of 18.7 kb and 2.21 Mb respectively. We identified 22,175 protein-coding genes.
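The figures quoted for the goat genome (a 2.92 Gb genome, a depth of approximately 65.6 X, and a total of 191.5 Gb) are consistent with the usual relation depth = total sequenced bases / genome size. The short Python sketch below checks this arithmetic; it assumes that the 191.5 Gb figure refers to the total amount of high quality sequence rather than to the length of the assembly, and the function name `sequencing_depth` is hypothetical.

```python
def sequencing_depth(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the goat record: 191.5 Gb of sequence, 2.92 Gb genome.
goat_depth = sequencing_depth(191.5e9, 2.92e9)
print(round(goat_depth, 1))  # 65.6
```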
In addition, we provide the restriction-enzyme fragment maps derived from the whole genome mapping (WGM) technology developed by the Argus System (<a href="http://www.nature.com/nbt/journal/v31/n2/full/nbt.2478.html" target="_blank">method described in this paper</a>).<br />
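As a rough illustration of how a sequence scaffold can be turned into a restriction map for comparison with such fragment maps, the Python sketch below cuts a sequence at every occurrence of a recognition site and reports the fragment lengths. The enzyme used for this dataset is not stated here, so the BamHI site (GGATCC, cut after the first G) is an assumption chosen purely for illustration, as are the function name `in_silico_digest` and the toy sequence.

```python
def in_silico_digest(sequence, site="GGATCC", cut_offset=1):
    """Return the ordered fragment lengths from cutting `sequence` at each `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 means the cut falls after the first base of the site).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


# Toy scaffold with two recognition sites.
toy_scaffold = "AAAAGGATCCTTTTTTGGATCCAA"
fragments = in_silico_digest(toy_scaffold)
print(fragments)  # [5, 12, 7]
```

The resulting list of fragment lengths is the kind of ordered profile that can be compared against fragment sizes measured on single molecules.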
Scaffolds derived from de novo assembly of next-generation sequencing data are converted into restriction maps by in silico restriction enzyme digestion. Then, the distance between restriction enzyme sites in the sequencing-derived scaffolds are matched to the lengths of the optical fragments in the single-molecule WGM restriction maps. Matches allow the scaffolds to be extended and linked into super-scaffolds.YangDongYuJiangNianqingXiaoXiaoyongDuWenguangZhangGwenolaTosser-KloppChaoBian0000-0001-9904-721XYuxiangLiBertrandServinBrianSayreBinZhuDeaconSweeneyRichMooreYongyiShenRuopingZhaoJinquanLiJamesWomackJamesKijasNoelleCockettShuhongZhaoJingChenWenbinChenThomasFarautYongHouJieLiangXinLiu0000-0003-3256-2940WenhuiNieShengkaiPanJinhuanWangJunWang0000-0002-1422-3331WenWangWenliangWang0000-0002-3463-4189MinXieXunXuShuangYangPengZengGuojieZhangYapingZhangGenomic2Genome-Mapping13100082.jpgDomestic goatpublic domainwikimediaN/A5368709120ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100082/Genome 10Khttp://www.genome10k.org/10.1038/nbt.247823263233http://www.ncbi.nlm.nih.gov/sra?term=SRP012150http://www.ncbi.nlm.nih.gov/bioproject/PRJNA158393XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of Flax (Linum usitatissimum). Flax (<em>Linum usitatissimum</em>) is also known as linseed. It is an ancient crop that is widely cultivated as a source of ﬁber, oil and medicinally relevant compounds. <br />We sequenced the genome to a depth of approximately 69 X with short reads from a series of libraries with various insert sizes ( 300bp, 500bp, 2kb, 5kb and 10kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 25.9 Gb, with the contig and scaffold N50 values of 20.1 kb and 0.7 Mb respectively. 
We identified 43,484 protein-coding genes.NeilHobsonLeonardoGalindoJoshuaMcDillSimonHawkinsGodfreyNeutelingsRajuDatlaGeorginaLambertDavidWGalbraithChristopherJGrassaArmandoGeraldesQuentinCCronkChristopherCullisPrasantaKDashPolumetlaAKumarSylvieCloutierMichaelKDeyholosAndrewGSharpeDaihuShiJunWang0000-0002-1422-3331ZhiwenWangGaneKa-ShuWong0000-0001-6108-5560LinfengYangShilinZhuGenomic2100081.jpgLinum usitatissimumpublic domainOriginal book source: Prof. Dr. Otto Wilhelm Thomé Flora von Deutschland, Österreich und der Schweiz 1885, Gera, GermanyN/A419430400ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10008110.1111/j.1365-313X.2012.05093.x22757964http://www.ncbi.nlm.nih.gov/bioproject/PRJNA68161XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the domestic donkey (Equus asinus). The domestic donkey is derived from the African Wild Ass. As with most domesticated species, there is a wide variety of coat colours and sizes. They can live for up to 40 years. <br />We sequenced the genome to a depth of approximately 19.4 X with short reads from a series of libraries with various insert sizes ( 800bp and 2kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 48.6 
Gb.AurélienGinolhacDuaneFroeseMathiasStillerMikkelSchubertEnricoCappelliniBentPetersenPhilipLFJohnsonMatteoFumagalliJuliaTVilstrupThorfinnKorneliussenAnna-SapfoSMalaspinasJosefVogtDamianSzklarczykChristianDKelstrupJakobVintherAndreiDolocanJesperStenderupAmhedMVVelazquezJamesCahillXiaoliWangGrantDZazulaAndaineSeguin-OrlandoCecilieMortensenKimMagnussenJohnFThompsonJacoboWeinstockKristianGregersenKnutHRøedVéraEisenmannCarlJRubinDonaldCMillerDouglasFAntczak0000-0003-3336-5818MadsFBertelsenKhaledASAl-Rasheid0000-0002-3404-3397OliverARyderLeifAndersson0000-0001-6173-994XJohnMundyKurtKjærLarsJuhlJensenJesperVOlsenMichaelHofreiterBethShapiroAndersAlbrechtsen0000-0001-7306-031XSørenBrunakM.ThomasPGilbert0000-0002-5805-7195AndersKrogh0000-0002-5147-6282JiumengMinIdaMoltkeRasmusNielsen0000-0003-0513-6591LudovicOrlandoMaanasaRaghavanMortenRasmussenThomasSicheritz-Ponten0000-0001-6615-1141JunWang0000-0002-1422-3331EskeWillerslevGuojieZhangGenomic2100080.jpgEquus asinus at Hannover Zoopublic domainwikimediaN/A728760320ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100080/Genome 10Khttp://www.genome10k.org/10.1038/nature12323http://www.ncbi.nlm.nih.gov/sra?term=SRS431817XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the diploid cotton (Gossypium raimondii). Cotton is one of the most economically important crop plants worldwide. Its fiber, commonly known as cotton lint, is the principal natural source for the textile industry.<br />We have sequenced and assembled a draft genome of <em>G. raimondii</em>, whose progenitor is the putative contributor of the D subgenome to the economically important fiber-producing cotton species Gossypium hirsutum and Gossypium barbadense. 
<br />We sequenced the 0.78 Gb genome to a depth of approximately 103 X with short reads from a series of libraries with various insert sizes ( 170 bp, 250 bp, 500 bp, 800 bp, 2 kb, 5 kb, 10 kb, 20 kb and 40 kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 78.7 Gb, with the contig and scaffold N50 values of 44.9 kb and 2.3 Mb respectively. We identified 40,976 protein-coding genes with an mean length of 1104 bb.KunboWangFuguangLiWuweiYeGuoliSongHaihongShangChangsongZouQinLiYouluYuanCairuiLuHenglingWeiXueyanZhangKunLiuNanShiRussellJKohelRichardGPercyJohnZYuYu-XianZhuLinCongCaiyunGouChiSongBoWangJunWang0000-0002-1422-3331JunyiWangZhiwenWangYeYinZhenYueShuxunYuZequnZhengShilinZhuGenomic2100079.jpgcotton plantpublic domainwikimediaN/A817889280ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10007910.1038/ng.237122922876http://www.ncbi.nlm.nih.gov/bioproject/PRJNA82769XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the Diamondback moth (Plutella xylostella). How an insect evolves to become a successful herbivore is of profound biological and practical importance. Herbivores are often adapted to feed on a specific group of evolutionarily and biochemically related host plants, but the genetic and molecular bases for adaptation to plant defense compounds remain poorly understood. <br /><em>P. xylostella</em> has become the most destructive pest of economically important food crops, including rapeseed, cauliflower and cabbage. This insect is the first species to have evolved resistance to dichlorodiphenyltrichloroethane (DDT) in the 1950s and to Bacillus thuringiensis (Bt) toxins in the 1990s and has developed resistance to all classes of insecticide, making it increasingly difficult to control. <br />A strain of the diamondback moth (DBM) (Fuzhou-S), <em>P. xylostella</em>, was reared on radish seedlings without exposure to insecticides for 5 years, spanning at least 100 generations. 
An inbred line was developed by successive single-pair sibling matings. Male pupae were used for genome sequencing.<br />DNA from the diamondback moth was collected in Fuzhou, China. We sequenced the 0.34 Gb genome to a depth of approximately 131.2 X with short reads from a series of libraries with various insert sizes ( 250bp and 500bp libraries per fosmid clone) on a HiSeq 2000 sequencer.<br />The assembled scaffolds have an N50 of 0.7 Mb. We identified 18,071 protein-coding genes.MinshengYouWeiyiHeXinhuaYangGuangYangMiaoXieSimonWBaxterLietteVasseurGeoffMGurrCarlJDouglasJianlinBaiPingWangKaiCuiShiguoHuangXianchunLiQingZhouZhangyanWuQilinChenChunhuiLiuXiaojingLiXiufengXuChangxinLuMinHuJohnWDaveySandyMSmithMingshunChenXiaofengXiaWeiqiTangFushiKeDandanZhengYulanHuFengqinSongYanchunYouXiaoliMaLuPengYunkaiZhengYongLiangYaqiongChenLiyingYuYounanZhangYuanyuanLiuLinFangCaiyunGouGuoqingLiJingxiangLiYadanLuoBoWangJianWangJunWang0000-0002-1422-3331JunyiWangHuanmingYang0000-0002-0858-3410ZhenYueDongliangZhanXinZhouGenomic2100078.jpgPlutella xylostella, Rodà de Bara, Spain, May'13CC-BY-2.0flickrDonald Hobern419430400ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100078Genome 10Khttp://www.genome10k.org/10.1038/ng.2524http://www.ncbi.nlm.nih.gov/bioproject/PRJNA78271http://www.ncbi.nlm.nih.gov/sra?term=SRP006371XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data of the chickpea (Cicer arietinum). Chickpea (<em>Cicer arietinum</em>) is one of the world's most consumed pulses. The Chickpea plant has finely divided leaves, giving it a feathery appearance. The pods are oblong (2 to 3 by 1 to 2 cm) and contain one or two beaked seeds which may be white, yellow, red, brown, or nearly black. They do well in a cool, dry climate and are grown in India as a winter crop. 
<br /> The draft of the chickpea genome is based on genotype CDC Frontier, a Canadian kabuli chickpea variety.<br /><br /> DNA from the chickpea was collected and e sequenced the 0.74 Gb genome to a depth of approximately 207 x with short reads from a series of libraries with various insert sizes ( 170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb) on a HiSeq 2000 sequencer.<br />The assembled scaffolds of high quality sequences total 153 Gb, with the contig and scaffold N50 values of 645 kb and 39.9 Mb respectively. We identified 28,256 protein-coding genes with an mean coding length of 1,166 bp.ShengYuStevenCannonJongminBaekBenjaminDRosenBunyaminTaranTeresaMillanXudongZhangLarissaDRamsayYingWangWilliamNelsonPooranMGaurCarolSoderlundChunyanXuPeterWinterJamesKHaneNoeliaCarrasquilla-GarciaJanetACondieMing-ChengLuoMahendarThudiCLLGowdaNarendraPSinghJudithLichtenzveigKrishnaKGaliJosefaRubioNNadarajanJaroslavDolezelKailashCBansalGuenterKahlJuanGilKaramBSinghSwapanKDattaSarwarAzamArvindKBhartiDouglasRCookDavidEdwardsAndrewDFarmerWeimingHeAikoIwataScottAJacksonR.VarmaPenmetsaRachitKSaxenaAndrewGSharpeChiSongHariDUpadhyayaRajeevKVarshneyJunWangXunXuGengyunZhangShancenZhaoGenomic2100076.jpgpods of Cicer arietinumpublic domainwikimediaEitan f587202560ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10007610.1038/nbt.2491http://www.ncbi.nlm.nih.gov/bioproject/PRJNA175619XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data for the Tibetan ground tit (Pseudopoces humilis). Available here is the first draft genome sequence of the Tibetan ground tit (<em>Pseudopodoces humilis</em>) also known as Hume’s Groundpecker. This bird is native to the Qinghai-Tibet Plateau (QTP) or ‘the roof of the world’, which has become a focus for many biological studies due to the extreme environmental conditions, and the genetic mechanisms of high-altitude adaptation has never been studied. 
Controversy also exists regarding the ground tit’s phylogeny – formerly thought to be part of the Corvidae family, recent phylogenetic analysis have determined this to not be the case. <br />A 1.04 Gb assembled draft genome sequence was generated that covered 95.4% of the whole genome. The scaffold N50 and contig N50 values were 16.3 Mb and 164.7 Kb respectively. High accuracy at the sequencing level was ensured, where the average sequencing depth of the ground tit assembly was 96×, and 99% of the assembly had at least 20× coverage. This data contributes to the study of avian evolutionary history and provides new insights into the tit’s adaptation to extreme environmental conditions. YuanyuanHuiYueCaiMeirongHaoJinyangZhaoSongboWangXinmingZhangRongjunHeJinchaoLiuLonghaiLuoQingleCaiCaiyunGouYongshanLangYingruiLiYadanLuoShengkaiPanXiaojuQianJunWang0000-0002-1422-3331ZhaobaoWangJiaohuiXuGenomic2100088.jpgA foraging Hume Ground Titpublic domainwikimediaUnknown1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100088/10.1186/gb-2013-14-3-r292353709710.1186/gb-2014-15-2-r33http://www.ncbi.nlm.nih.gov/bioproject/PRJNA175930http://www.ncbi.nlm.nih.gov/bioproject/PRJNA179234CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Barn owl (Tyto alba). The Barn Owl (<em>Tyto alba guttata</em> (Brehm, 1831)) is the most widespread bird of its kind found almost anywhere in the world except in polar and desert regions, and is also referred to as the Common Barn Owl. Contrary to other owls, the Barn Owl does not hoot but produces an ear-shattering scream-like sound. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a female bred in the laboratory of Eric Knudsen at Stanford, USA (tissue sample provided by Alex Goddard). We sequenced the 1.6Gb genome to a depth of approximately 27X. <br />
The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 13kb and 51kb respectively. We identified 13,613 protein-coding genes with a mean length of 13.8kb. <br />
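The contig and scaffold N50 values quoted in these records summarise assembly contiguity: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative and not taken from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total is 290, so half is 145; 80 + 70 = 150 reaches it.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 70
```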
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2tyto_alba.pngBarn owl (Tyto alba).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)369098752ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101039Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212909http://www.ncbi.nlm.nih.gov/sra?term=SRP029349CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Angola turaco (Tauraco erythrolophus). The Red-crested Turaco (<em>Tauraco erythrolophus </em>(Vieillot, 1819)) is an African near-passerine endemic to western Angola. Its call has been described as somewhat like that of a jungle monkey. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male (leg band LDF577) that lived at the Copenhagen Zoo in Denmark and was tended to by Mads Bertelsen. We sequenced the genome to a depth of approximately 30X. <br />
The assembled scaffolds of high quality sequences total 1.17Gb, with the contig and scaffold N50 values of 18kb and 55kb respectively. We identified 15,435 protein-coding genes with a mean length of 13.2kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2tauraco_erythrolophus.pngAngola turaco (Tauraco erythrolophus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)384827392ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101038Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212908http://www.ncbi.nlm.nih.gov/sra?term=SRP029348CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Yellow-throated sandgrouse (Pterocles gutturalis). The Yellow-throated Sandgrouse (<em>Pterocles gutturalis saturatior </em>(Hartert, 1900)) is a member of the Pteroclididae family (ground dwelling birds restricted to treeless, open country) and can be found in many countries in southern and eastern Africa. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male, wild-caught in Tanzania, held at the Sharjah Breeding Center in the United Arab Emirates and tended by An Pas. We sequenced the genome to a depth of approximately 25X. <br />
The assembled scaffolds of high quality sequences total 1.07Gb, with the contig and scaffold N50 values of 17kb and 49kb respectively. We identified 13,867 protein-coding genes with a mean length of 12.8kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101037.jpgPterocles gutturalis = Pterocles gutturalispublic domainhttp://commons.wikimedia.org/wiki/File:Pterocles_gutturalis.jpgN/A352321536ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101037Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212906http://www.ncbi.nlm.nih.gov/sra?term=SRP029347CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Great-crested grebe (Podiceps cristatus). The Great Crested Grebe (<em>Podiceps cristatus cristatus </em>(Linnaeus, 1758)) is known for its elaborate mating display and is the largest member of the Grebe family in the Old World. In the 19th century, the species was almost hunted to extinction in the UK for its head plumes, which were used to decorate hats and other female attire. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (137837) at the Natural History Museum of Denmark, taken from a male caught in Denmark. We sequenced the genome to a depth of approximately 30X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 13kb and 30kb respectively. We identified 13,913 protein-coding genes with a mean length of 10.4kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2podiceps_cristatus.pngGreat-crested grebe (Podiceps cristatus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)376438784ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101036Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212905http://www.ncbi.nlm.nih.gov/sra?term=SRP029346CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the American Flamingo (Phoenicopterus ruber ruber). The Caribbean Flamingo (<em>Phoenicopterus ruber ruber </em>(Linnaeus, 1758)) is a sub-species of, and commonly referred to as the American Flamingo (Phoenicopterus ruber). The Caribbean Flamingo is found generally throughout the Caribbean and is a non-migratory bird (those that do migrate only move between summer and winter breeding grounds). It is a largely social bird that breeds in huge colonies, and is a homeothermic endotherm - an animal that keeps a consistent temperature that is regulated within its body. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male (leg band GWX) who had lived at the Copenhagen Zoo in Denmark, and was tended to by Mads Bertelsen. We sequenced the 1.24Gb genome to a depth of approximately 33X. <br />
The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 16kb and 37kb respectively. We identified 14,024 protein-coding genes with a mean length of 11.7kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2phoenicopterus_ruber.pngAmerican Flamingo (Phoenicopterus ruber ruber).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)374341632ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101035Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212904http://www.ncbi.nlm.nih.gov/sra?term=SRP029345CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Great black cormorant (Phalacrocorax carbo). The Great Cormorant (<em>Phalacrocorax carbo sinensis </em>(Blumenbach, 1798)) is a widespread member of the cormorant family and is known by several different names around the world: the Great Black Cormorant across the Northern Hemisphere, the Black Shag in New Zealand, the Black Cormorant in Australia and the Large Cormorant in India. It is a large black bird that can be distinguished from the Common Shag by its larger size, heavier build and thicker bill. Great Cormorants are mostly silent birds but can produce various guttural noises in their breeding colonies. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (137943) at the Natural History Museum of Denmark, taken from a male caught in Gedser, Denmark. We sequenced the genome to a depth of approximately 24X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 15kb and 48kb respectively. We identified 13,479 protein-coding genes with a mean length of 13.5kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2phalacrocorax_carbo.pngGreat black cormorant (Phalacrocorax carbo).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)375390208ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101034Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212903http://www.ncbi.nlm.nih.gov/sra?term=SRP029344CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the White-tailed tropicbird (Phaethon lepturus). The White-tailed Tropicbird (<em>Phaethon lepturus ascensionis</em> (Mathews, 1915)) is the smallest of three closely related seabirds in the order Phaethontiformes and can be found in the tropical Atlantic, western Pacific and Indian Oceans. For a small bird, this species can travel far across oceans when not breeding. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (135885) from the Natural History Museum of Denmark, of a female caught on Ascension Island, South Atlantic. We sequenced the genome to a depth of approximately 39X. <br />
The assembled scaffolds of high quality sequences total 1.16Gb, with the contig and scaffold N50 values of 18kb and 47kb respectively. We identified 14,970 protein-coding genes with a mean length of 12.7kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2phaethon_lepturus-balearica_regulorum.pngCrowned crane and White-tailed tropicbirdpublic domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)382730240ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101033Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212902http://www.ncbi.nlm.nih.gov/sra?term=SRP029342CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Dalmatian pelican (Pelecanus crispus). The Dalmatian Pelican (<em>Pelecanus crispus</em> (Bruch, 1832)) is a large bird, measuring 160–183 cm in length, 9–15 kg in weight and 290–351 cm in wingspan. This species breeds from southern Europe to India and China and is found in lakes, rivers, deltas and estuaries. Unlike the Great White Pelican, the Dalmatian Pelican is not tied to lowland areas and will move and nest in suitable wetlands at a variety of elevations. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male at the Copenhagen Zoo in Denmark, of an animal tended by Mads Bertelsen, now vouchered (105271) at the Natural History Museum of Denmark. We sequenced the genome to a depth of approximately 34X. <br />
The assembled scaffolds of high quality sequences total 1.17Gb, with the contig and scaffold N50 values of 18kb and 43kb respectively. We identified 14,813 protein-coding genes with a mean length of 11.9kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2pelecanus_crispus.pngDalmatian pelican (Pelecanus crispus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)381681664ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101032Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212901http://www.ncbi.nlm.nih.gov/sra?term=SRP029331CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Kea (Nestor notabilis). The Kea (<em>Nestor notabilis</em> (Gould, 1856)) is known as the cheeky parrot of New Zealand and is a large bird belonging to the superfamily Strigopoidea. It inhabits the alpine regions of the South Island of New Zealand and is known for its intelligence and curiosity - they can solve logical puzzles and can push and pull things in order to get food. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male at the Copenhagen Zoo in Denmark, an animal tended by Mads Bertelsen. The assembly has contig and scaffold N50 values of 16kb and 37kb respectively. We identified 14,074 protein-coding genes with a mean length of 14.4kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2nestor_notabilis.pngKea (Nestor notabilis).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)348127232ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101031Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212900http://www.ncbi.nlm.nih.gov/sra?term=SRP029311CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Brown mesite (Mesitornis unicolor). The Brown Mesite (<em>Mesitornis unicolor</em> (Des Murs, 1845)) is a ground-dwelling bird native to Madagascar, and is one of three species of the Mesitornithidae family. It is a medium-sized terrestrial bird that inhabits humid forests and prefers lower elevations - making it a vulnerable population.<br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered sample (345610) of the Chicago Field Museum (via David Williard) in Chicago from a female caught in Fivondronana de Tolagnaro, Toliara, Madagascar. We sequenced the genome to a depth of approximately 29X. <br />
The assembled scaffolds of high quality sequences total 1.1Gb, with the contig and scaffold N50 values of 18kb and 46kb respectively. We identified 15,371 protein-coding genes with a mean length of 11.4kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101030.jpgBrown Mesitepublic domainhttp://commons.wikimedia.org/wiki/File:Mesitornis_unicolor_1849.jpgN/A361758720ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101030Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212899http://www.ncbi.nlm.nih.gov/sra?term=SRP029309CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Northern Carmine bee-eater (Merops nubicus). The Northern Carmine Bee-eater (<em>Merops nubicus nubicus </em>(Gmelin, 1788)) is part of the bee-eater family, Meropidae, and is also known as the Nubian bee-eater. It is a brightly-coloured bird and feeds primarily on bees and other flying insects, such as grasshoppers and locusts. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a female from the Copenhagen Zoo in Denmark, of an animal tended by Mads Bertelsen; now vouchered (137942) in the Natural History Museum of Denmark. We sequenced the genome to a depth of approximately 37X. <br />
The assembled scaffolds of high quality sequences total 1.06Gb, with the contig and scaffold N50 values of 20kb and 47kb respectively. We identified 13,467 protein-coding genes with a mean length of 13kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2merops_nubicus.pngNorthern Carmine bee-eater (Merops nubicus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)348127232ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101029Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212898http://www.ncbi.nlm.nih.gov/sra?term=SRP029278CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Cuckoo roller (Leptosomus discolor). The Cuckoo Roller (<em>Leptosomus discolor discolor </em>(Hermann, 1783)) is the sole bird of the order Leptosomiformes within the superorder Coraciimorphae - that includes kingfishers, rollers and bee-eaters. It is a medium-large bird that inhabits forests and woodlands in Madagascar and the Comoro Islands. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a female at Weltvogelpark, in Walsrode, Germany, of a male tended to by Andreas Frei of the Zoo. We sequenced the genome to a depth of approximately 32X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 19kb and 61kb respectively. We identified 14,831 protein-coding genes with a mean length of 13.9kb.
<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2leptosomus_discolor.pngCuckoo roller (Leptosomus discolor).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)376438784ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101028Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212897http://www.ncbi.nlm.nih.gov/sra?term=SRP029206CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the White tailed eagle (Haliaeetus albicilla). The White tailed eagle (<em>Haliaeetus albicilla groenlandicus</em> (Brehm, 1831)) is one of the largest birds of prey belonging to the Accipitridae family and is considered a close cousin of the Bald Eagle. The White-tailed eagle is a large bird with a wingspan that measures 1.78–2.45 m, and is an apex predator, meaning it is a predator with no predators of its own. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered sample (137926) of the Natural History Museum of Denmark from a male caught in Greenland. We sequenced the genome to a depth of approximately 26X. <br />
The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 20kb and 56kb respectively. We identified 13,831 protein-coding genes with a mean length of 14.2kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2haliaeetus_albicilla.pngWhite tailed eagle (Haliaeetus albicilla).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)372244480ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101027Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212896http://www.ncbi.nlm.nih.gov/sra?term=SRP029203CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Red throated loon (Gavia stellata). The Red-throated Loon (<em>Gavia stellata </em> (Pontoppidan, 1763)), also known as the Red-throated Diver, is a migratory aquatic bird that breeds mainly in the Arctic regions and spends winters in the northern coastal waters. The Red-throated loon has a large global population, making it the most widely distributed member of the Gaviidae (Loon or Diver) family. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered sample (137940) of the Natural History Museum of Denmark from a male caught in Denmark. We sequenced the genome to a depth of approximately 33X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 16kb and 45kb respectively. We identified 13,454 protein-coding genes with a mean length of 13.2kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2gavia_stellata.pngRed throated loon (Gavia stellata).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)373293056ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101026Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212895http://www.ncbi.nlm.nih.gov/sra?term=SRP029187CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Northern Fulmar (Fulmarus glacialis). The Northern Fulmar (<em>Fulmarus glacialis glacialis</em> (Linnaeus, 1761)), also known as Fulmar or Arctic Fulmar, is a highly abundant sea bird commonly found in north Pacific and north Atlantic subarctic regions. It is a member of the Procellariidae family, which includes petrels and shearwaters, and like other petrels, the Northern Fulmar has limited walking ability but is a very strong flier. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered sample (137838) of the Natural History Museum of Denmark from a male caught in Denmark. We sequenced the genome to a depth of approximately 33X. <br />
The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 17kb and 46kb respectively. We identified 14,306 protein-coding genes with a mean length of 12.8kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2fulmarus_glacialis.pngNorthern Fulmar (Fulmarus glacialis).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)375390208ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101025Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212894http://www.ncbi.nlm.nih.gov/sra?term=SRP029180CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Sunbittern (Eurypyga helias). The Sunbittern (<em>Eurypyga helias helias</em> (Pallas, 1781)) is the sole member of the family Eurypygidae and is found in Central to South America from Mexico, Peru to Brazil. It has similar morphological features to the Kagu of New Caledonia and is a non-migrant bird that is normally found on the ground scratching for insects. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a female from the Odense Zoo in Denmark, of an animal obtained by Mads Bertelsen. We sequenced the genome to a depth of approximately 33X. <br />
The assembled scaffolds of high quality sequences total 1.1Gb, with the contig and scaffold N50 values of 16kb and 46kb respectively. We identified 13,974 protein-coding genes with a mean length of 12.3kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2eurypyga_helias.pngSunbittern (Eurypyga helias).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)359661568ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101024Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212893http://www.ncbi.nlm.nih.gov/sra?term=SRP029147CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Speckled mousebird (Colius striatus). The Speckled Mousebird (<em>Colius striatus</em> (Gmelin, 1789)) is the largest species of mousebird and named because of its overall mousey-brown colour. The bird can be found from Cameroon to Eritrea and Ethiopia, and southern South Africa. Described as a noisy creature, it is not however, known for its voice, and is a rather social bird often observed feeding in groups. <br />These data have been produced as part of the Avian Phylogenomic Project. DNA was collected from a male specimen at the Copenhagen Zoo in Denmark, tended to by Mads Bertelsen, now vouchered (138165) at the Natural History Museum of Denmark. We sequenced the genome to a depth of approximately 27X. The assembled scaffolds of high quality sequences total 1.08Gb, with the contig and scaffold N50 values of 18kb and 45kb respectively. 
We identified 13,538 protein-coding genes with a mean length of 12.4kb.<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2colius_striatus.pngSpeckled mousebird (Colius striatus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)357564416ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101023Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212892http://www.ncbi.nlm.nih.gov/sra?term=SRP028965CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Macqueen's Bustard (Chlamydotis macqueenii). The Macqueen's Bustard (<em>Chlamydotis macqueenii </em>(Gray, 1832)) is a large bird that breeds in southwest Asia in deserts and similarly arid sandy areas. It has recently been split as a separate species from the Houbara Bustard of the Canary Islands and Africa, and one difference is that the MacQueen's Bustard has a greater tendency to wander than the Houbara. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male in the Dubai Falcon Hospital (originally wild caught), with blood samples provided by Tom Bailey of the Hospital. We sequenced the genome to a depth of approximately 27X. <br />
The assembled scaffolds of high quality sequences total 1.09Gb, with the contig and scaffold N50 values of 18kb and 45kb respectively. We identified 13,582 protein-coding genes with a mean length of 12.9kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101022.jpgMacQueen's Bustard (MacQueen's Bustard)public domainhttp://commons.wikimedia.org/wiki/File:Chlamydotis_macqueenii_1921.jpgN/A359661568ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101022Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212891http://www.ncbi.nlm.nih.gov/sra?term=SRP028950CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Turkey Vulture (Cathartes aura). The Turkey Vulture (<em>Cathartes aura aura</em> (Linnaeus, 1758)), also known as the Turkey Buzzard in North America, is the most widespread vulture species. It lives in a range of open and semi-open areas, such as shrublands, sub-tropical forests and pastures. The Turkey Vulture is a scavenger, feeding exclusively on carrion - it flies low using its keen eye and sense of smell to detect the gases produced during the decay of dead animals. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a female in the Carolina Raptor Center, in Huntersville, North Carolina, USA (originally wild caught in North Carolina), with blood samples provided by Dave Scott of the Center. We sequenced the genome to a depth of approximately 25X. <br />
The assembled scaffolds of high quality sequences total 1.17Gb, with the contig and scaffold N50 values of 12kb and 35kb respectively. We identified 13,534 protein-coding genes with a mean length of 10.8kb. <br />
M.ThomasPGilbert0000-0002-5805-7195JasonTHowardErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2cathartes_aura.pngTurkey vulture (Cathartes aura).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)380633088ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101021Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212890http://www.ncbi.nlm.nih.gov/sra?term=SRP028913CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Red-legged seriema (Cariama cristata). The Red-legged Seriema (<em>Cariama cristata</em> (Linnaeus, 1766)) is a predatory terrestrial bird in the Cariamidae family. It is also known as the Crested Cariama. It is found in the grasslands of Brazil, south of the Amazon to Uruguay and northern Argentina. The Red-legged Seriema has a song described as sounding like a cross between the bark of a young dog and the clucking of turkeys. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male (left leg band # 295951) who had lived at the Copenhagen Zoo in Denmark (originally born in Noorder Dierenpark [Emmen Zoo] Holland), of an animal tended to by Mads Bertelsen. We sequenced the 1.5Gb genome to a depth of approximately 24X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 17kb and 54kb respectively. We identified 14,216 protein-coding genes with a mean length of 13.7kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2cariama_cristata.pngRed-legged seriema (Cariama cristata).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)375390208ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101020Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212889http://www.ncbi.nlm.nih.gov/sra?term=SRP028884CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Chuk-will's-widow (Antrostomus carolinensis). The Chuck-will's-widow (<em>Antrostomus carolinensis </em>(Gmelin, 1789)) is a nocturnal bird and part of the Caprimulgidae family. Its name is derived from a continuous repetitive song often heard at night, and the bird is often confused with Whippoorwills due to their similar calls and unusual names. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (B-3403) from the Louisiana State University Museum, of a male caught in East Jetty Woods, Cameron Parish, Louisiana, USA. We sequenced the genome to a depth of approximately 30X. <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 17kb and 45kb respectively. We identified 14,676 protein-coding genes with a mean length of 12kb. <br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2antrostomus_carolinensis.pngChuk-wills-widow (Antrostomus carolinensis).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)376438784ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101019Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212888http://www.ncbi.nlm.nih.gov/sra?term=SRP028883CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Javan rhinoceros hornbill (Buceros rhinoceros silvestris). The Javan Rhinoceros Hornbill (<em>Buceros rhinoceros silvestris</em> (Vieillot, 1816)) is a subspecies of the Rhinoceros Hornbill (<em>Buceros rhinoceros</em>), one of the world's largest hornbills that has lived in captivity for over 90 years. It is found in lowland and tropical and sub-tropical mountain rainforests in Borneo, Sumatra, Java, the Malay Peninsula, Singapore and southern Thailand. It is also the state bird of the Malaysian state Sarawak. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male who had lived at the Copenhagen Zoo (originally born wild) in Denmark, of an animal tended to by Mads Bertelsen. We sequenced the genome to a depth of approximately 35X. <br />
The assembled scaffolds of high quality sequences total 1.08Gb, with the contig and scaffold N50 values of 14kb and 51kb respectively. We identified 13,873 protein-coding genes with a mean length of 13.5kb.
<br />
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101018.jpgBuceros rhinoceros silvestris - adultpublic domainhttp://commons.wikimedia.org/wiki/File:Buceros_rhinoceros_silvestris_1838.jpgN/A362807296ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101018Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212887http://www.ncbi.nlm.nih.gov/sra?term=SRP028845CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Crowned crane (Balearica regulorum gibbericeps). The Grey Crowned Crane (<em>Balearica regulorum gibbericeps</em> (Reichenow, 1892)) belongs to the Gruidae family of birds and can be found in the dry African savannah south of the Sahara, as well as in the wetter areas and grassy flatlands near lakes and rivers in Uganda and Kenya. The Crowned crane performs an interesting display when breeding that involves dancing, bowing, and jumping, as well as a booming call that involves inflation of a red gular sac on its throat. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a male at the Copenhagen Zoo (originally born at Givskud Zoo) in Denmark, of an animal tended to by Mads Bertelsen. We sequenced the 1.45Gb genome to a depth of approximately 33X. The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 18kb and 51kb respectively. We identified 14,173 protein-coding genes with a mean length of 13.8kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2phaethon_lepturus-balearica_regulorum.pngCrowned crane and White-tailed tropicbirdpublic domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)373293056ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101017Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212879http://www.ncbi.nlm.nih.gov/sra?term=SRP028839CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Bar-tailed trogon (Apaloderma vittatum). The Bar-tailed trogon (<em>Apaloderma vittatum </em>(Shelley, 1882)) belongs to the Trogonidae family of birds and lives in forests at a preferred altitude of around 1600 metres. Its vocalisations are described as "a yelping crescendo". It is found in several African countries - Angola, Burundi, Cameroon, Democratic Republic of the Congo, Equatorial Guinea, and Kenya, to name a few. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (Cat# 140150) from the Natural History Museum of Denmark, of a male collected in the Udzungwa mountains of Tanzania. We sequenced the genome to a depth of approximately 28X. <br />
The assembled scaffolds of high quality sequences total 1.08Gb, with the contig and scaffold N50 values of 19kb and 56kb respectively. We identified 13,615 protein-coding genes with a mean length of 13.5kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2apaloderma_vittatum.pngBar-tailed trogon (Apaloderma vittatum).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)357564416ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101016Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212878http://www.ncbi.nlm.nih.gov/sra?term=SRP028834CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Rifleman (Acanthisitta chloris). The Rifleman (<em>Acanthisitta chloris</em> (Sparrman, 1787)) is an endemic bird to New Zealand and is also known as New Zealand wrens, or Tītipounamu in Maori. It belongs to the Acanthisittidae family, and is one of two surviving species. The Rifleman is named after a colonial New Zealand regiment because its plumage is similar to the military uniform of a rifleman. <br />
These data have been produced as part of the Avian Phylogenomics Project. DNA was collected from an animal found dead after an earthquake in Kaikoura, South Island, New Zealand, with tissue provided by Mike Bunce and Paul Scofield. We sequenced the genome to a depth of approximately 29X.
<br /> The assembled scaffolds of high quality sequences total 1.05 Gb, with the contig and scaffold N50 values of 18kb and 64kb respectively. We identified 14,596 protein-coding genes with a mean length of 13.5kb.
<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101015.jpgRifleman, male. Endemic to New Zealand.public domainhttp://commons.wikimedia.org/wiki/File:RiflemanMaleFemaleBuller.jpgN/A343932928ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101015Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212877http://www.ncbi.nlm.nih.gov/sra?term=SRP028832CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the White throated tinamou (Tinamus guttatus). The White-throated Tinamou (<em>Tinamus guttatus</em> (Pelzeln, 1863)) is a native bird to the Amazon rain forest in Brazil, southern Venezuela, eastern Peru, northern Bolivia, southeastern Colombia and northeastern Ecuador. It belongs to the Tinamidae family, and eats fruit off the ground or low-lying bushes, as well as flower buds, invertebrates, seeds and leaves. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (B-42614) from the Louisiana State University Museum, of a female caught in the Loreto Department, Peru. We sequenced the genome to a depth of approximately 100X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.05Gb, with the contig and scaffold N50 values of 24kb and 242kb respectively. We identified 15,773 protein-coding genes with a mean length of 14.7kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2tinamus_major.pngWhite throated tinamou (Tinamus guttatus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)343932928ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101014Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212876http://www.ncbi.nlm.nih.gov/sra?term=SRP028753CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Ostrich (Struthio camelus australis). The Southern Ostrich (<em>Struthio camelus australis</em> (Gurney, 1868)) is a sub-species of The Common Ostrich (Struthio camelus) and found in southern Africa, south of the Zambezi and Cunene rivers. It is a flightless bird native to Africa and the only living member of the genus, Struthio. The Ostrich is farmed throughout the world for its meat, feathers and leather. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected by Oliver Ryder from a female at the San Diego Zoo in California; the animal was originally from Botswana, Africa (ISIS ID: 202443). We sequenced the 2.16Gb genome to a depth of approximately 85X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.23Gb, with the contig and scaffold N50 values of 29kb and 3.5Mb respectively. We identified 16,178 protein-coding genes with a mean length of 19.5kb.
<br /><b>Update - 13-AUG-2014 -</b> A new assembly making use of the optical mapping data has been released; this improves the assembled scaffold N50 to 17.7 Mb, and the protein count is unchanged.ORyderM.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2Genome-Mapping13struthio_camelus.pngOstrich (Struthio camelus australis).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)392167424ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101013Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.125345110.1186/s13742-015-0062-9http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212875http://www.ncbi.nlm.nih.gov/sra?term=SRP028745CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Downy Woodpecker (Picoides pubescens). The Downy Woodpecker (<em>Picoides pubescens </em>(Linnaeus, 1766)) is the smallest woodpecker in North America. It uses a number of vocalizations, including a short 'pik' call, and, compared with other North American woodpeckers, produces a slow drumming sound while it pecks into trees. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (B-21955) from the Louisiana State University Museum, of a female caught in Marcum Mountain, Cowell County, Montana, USA. We sequenced the genome to a depth of approximately 105X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.17Gb, with the contig and scaffold N50 values of 20kb and 2Mb respectively. We identified 15,576 protein-coding genes with a mean length of 20kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2picoides_pubescens.pngDowny Woodpecker (Picoides pubescens).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)385875968ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101012Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212874http://www.ncbi.nlm.nih.gov/sra?term=SRP028625CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Hoatzin (Opisthocomus hoazin). The Hoatzin (<em>Opisthocomus hoazin</em> (Müller, 1776)) is a pheasant-sized tropical bird found in swamps, forest and mangroves of the Amazon and Orinoco Delta in South America. It is known by other names, such as the Hoactzin, Stinkbird or Canje Pheasant and is notable for having chicks that possess claws on two of their wings. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a female obtained in Lagunas, Venezuela, with the sample obtained by Maria Gloria Dominguez-Bello of the University of Puerto Rico. We sequenced the genome to a depth of approximately 100X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.14Gb, with the contig and scaffold N50 values of 24kb and 2.9Mb respectively. We identified 15,702 protein-coding genes with a mean length of 20kb.
PeterHoudeM.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2opisthocomus_hoazin.pngThe Hoatzin (Opisthocomus hoazin).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)382730240ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101011Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212873http://www.ncbi.nlm.nih.gov/sra?term=SRP028409CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Golden-collared Manakin (Manacus vitellinus). The Golden-collared Manakin (<em>Manacus vitellinus</em> (Gould, 1843)) belongs to the Pipridae family of suboscine passeriform birds. It is commonly found in Colombia, Costa Rica and Panama. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a male obtained in Gamboa, Panama, with the sample obtained by Barney Schlinger of UCLA. We sequenced the genome to a depth of approximately 110X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.12Gb, with the contig and scaffold N50 values of 34kb and 2.5Mb respectively. We identified 15,285 protein-coding genes with a mean length of 18.8kb.
BSchlingerM.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2manacus_vitellinus.pngGolden-collared Manakin (Manacus vitellinus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)374341632ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101010Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212872http://www.ncbi.nlm.nih.gov/sra?term=SRP028393CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Common Cuckoo (Cuculus canorus). The Common Cuckoo (<em>Cuculus canorus canorus</em> (Linnaeus, 1758)), also known as the European Cuckoo, is a widespread summer migrant to Europe and Asia. It is a brood parasite, meaning it lays eggs in other birds' nests; particularly in Dunnock's, Meadow Pipit's, and Eurasian Reed Warbler's nests. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Projects. DNA was collected from a male specimen in Denmark; voucher 137750 in the Natural History Museum of Denmark. We sequenced the genome to a depth of approximately 100X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.15Gb, with the contig and scaffold N50 values of 31kb and 3Mb respectively. We identified 15,889 protein-coding genes with a mean length of 20kb.
<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2cuculus_canorus.pngCommon Cuckoo (Cuculus canorus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)375390208ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101009Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212869http://www.ncbi.nlm.nih.gov/sra?term=SRP028317CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the American Crow (Corvus brachyrhynchos). The American Crow (<em>Corvus brachyrhynchos brachyrhynchos</em> (Brehm, 1822)) is a large passerine bird and part of the Corvidae family. It is commonly found throughout North America, and despite being common and widespread, they are highly susceptible to West Nile Virus. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a female in the Asheboro, North Carolina Zoo, USA, with blood samples provided by Halley Buckanoff of the zoo. We sequenced the 1.26Gb genome to a depth of approximately 80X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.1Gb, with the contig and scaffold N50 values of 24kb and 6.9Mb respectively. We identified 16,562 protein-coding genes with a mean length of 17.9kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2corvus_brachyrhynchos.pngAmerican Crow (Corvus brachyrhynchos).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)351272960ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101008Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212867http://www.ncbi.nlm.nih.gov/sra?term=SRP028286CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Killdeer (Charadrius vociferus). The Killdeer (<em>Charadrius vociferus vociferus </em>(Linnaeus, 1758)) is a medium-sized plover (a wading bird belonging to the subfamily Charadriinae), and often uses a "broken wing act" to distract predators from their nests. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a vouchered tissue sample (B-66055) from the Louisiana State University Museum, of a female, subspecies peruvianus, caught in Loreto, Peru. We sequenced the genome to a depth of approximately 100X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.2 Gb, with the contig and scaffold N50 values of 32 kb and 3.6Mb respectively. We identified 16,856 protein-coding genes with a mean length of 19.1kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2charadrius_vociferus.pngKilldeer (Charadrius vociferus).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)394264576ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101007Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212867http://www.ncbi.nlm.nih.gov/sra?term=SRP028286CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Chimney Swift (Chaetura pelagica). The Chimney Swift (<em>Chaetura pelagica</em> (Linnaeus, 1758)) is a bird belonging to the swift family Apodidae. Like their namesake, swifts are among the fastest flying birds. The Chimney swift is often described as a sociable species, seldom seen alone. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Projects. DNA was collected from a vouchered tissue sample (B-21727) from the Louisiana State University Museum, of a female caught in Cameron Parish, Louisiana. We sequenced the genome to a depth of approximately 103X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.1Gb, with the contig and scaffold N50 values of 27 kb and 3.8Mb respectively. We identified 15,373 protein-coding genes with a mean length of 19.8kb.
<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2chaetura_pelagica.pngChimney swift (Chaetura pelagica).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)356515840ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101005/Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA202089http://www.ncbi.nlm.nih.gov/sra?term=SRP026688CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Anna's Hummingbird (Calypte anna). The Anna's Hummingbird (<em>Calypte anna</em> (Lesson, 1829)) is a medium-sized hummingbird native to the west coast of North America and is the only North American hummingbird species with a red crown. It belongs to one of the rare groups of vocal learning birds. The bird was named after Anna Massena, Duchess of Rivoli. <br />
These data have been produced as part of the G10K and Avian Phylogenomics Project. DNA was collected from a female, which was caught in Portland, Oregon, USA by Claudio Mello and Pete Lovell. We sequenced the 1 Gb genome to a depth of approximately 110X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.1Gb, with the contig and scaffold N50 values of 23 kb and 4Mb respectively. We identified 16,000 protein-coding genes with a mean length of 18.5kb.
CVMelloM.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2calypte_anna.pngAnnas hummingbird (Calypte anna).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)356515840ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101004Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA212866http://www.ncbi.nlm.nih.gov/sra?term=SRP028275CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Crested Ibis (Nipponia nippon). The Crested Ibis (<em>Nipponia nippon </em> (Temminck, 1855)) is the only member of the genus Nipponia and also known as the Japanese Crested Ibis (Toki). Originally widespread across China, Japan, Korea, Taiwan and Russia, by 1988 the species had disappeared from nearly all these places, except two surviving breeding pairs in Southern Qinling mountains of Yangxian County, China.<br />
DNA was collected from a female that descended from one of these pairs, belonging to a population now rescued from near extinction. We sequenced the 1.6 Gb genome to a depth of approximately 105X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences totaled 1.17 Gb, with the contig and scaffold N50 values of 22kb and 5.4Mb respectively. We identified 16,756 protein-coding genes with a mean length of 19.4 kb.
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101003.jpgtaxidermy of crested ibispublic domainhttp://commons.wikimedia.org/wiki/File:Nipponia_nippon.pngN/A399507456ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101003Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/s13059-014-0557-110.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA232572http://www.ncbi.nlm.nih.gov/sra?term=SRP035852http://www.ncbi.nlm.nih.gov/bioproject/PRJNA308878http://www.ncbi.nlm.nih.gov/sra?term=SRP068541CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnGenomic data of the Little Egret (Egretta garzetta). The Little Egret (<em>Egretta garzetta</em> (Linnaeus, 1766)) is a small white heron. An adult's average size is 55–65 cm long with an 88–106 cm wingspan, and weighs 350–550 grams. Little egrets are mostly silent birds, but make various croaking and bubbling calls at their breeding colonies and produce a harsh alarm call when disturbed. <br />
These data have been produced as part of a project on deciphering the genomics of near extinction events, with the egret as a control species for the Ibis. DNA was collected from a male in the Southern Qinling Mountains, Yangxian County Reserve, China. We sequenced the genome to a depth of approximately 74X with short reads from a series of libraries with various insert sizes (170bp, 500bp, 800bp, 2kb, 5kb, 10kb and 20kb). <br />
The assembled scaffolds of high quality sequences total 1.2 Gb, with the contig and scaffold N50 values of 24 kb and 3.1Mb respectively. We identified 16,585 protein-coding genes with a mean length of 18.6 kb.
<br />M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2egretta_garzetta.pngLittle egret (Egretta garzetta).public domainJon Fjeldsa (jfjeldsaa@zmuc.ku.dk)Jon Fjeldsa (jfjeldsaa@zmuc.ku.dk)384827392ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101002Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA232959http://www.ncbi.nlm.nih.gov/sra?term=SRP035853http://www.ncbi.nlm.nih.gov/bioproject/PRJNA309350http://www.ncbi.nlm.nih.gov/sra?term=SRP068698CaiLiChina National GeneBanklicai@genomics.org.cnlicai@genomics.org.cnThe avian phylogenomic project data. The evolutionary relationship of modern birds is one of the most challenging questions in systematic biology and has been debated for centuries. We proposed to rebuild the avian phylogenetic tree using whole genome data, and have therefore collected genomes of 48 bird species, representing 36 orders of birds. The chicken, zebra finch, and turkey genomes, which were sequenced using the Sanger method, were collected from the public domain. Another three genomes, pigeon, peregrine falcon, and Beijing duck, have been published during the development of this project.
<br />
The data posted here include the full genome assemblies of 45 bird species, the repeat and gene annotation produced by our new pipeline, 8,295 1:1 syntenic orthologous genes, and the whole genome alignment data for all bird species. Detailed information on the published genomes can be found in their own publications. The genomes first released here were sequenced and assembled using NGS technology and a whole genome shotgun strategy. Using a homology-based method, we annotated 13,000 to 17,000 protein-coding genes in each avian genome.
<br />
As far as we know, the avian phylogenomic project is the biggest comparative genomics project to date. The unprecedented genomic data presented here will contribute to downstream analyses in many fields, including phylogenetics, comparative genomics, neurology, developmental biology, etc.
<br />
Please see the <a href="ftp://climb.genomics.cn/pub/10.5524/100001_101000/101000/README.txt" target="_blank">README</a> file for a more complete description of the files that can be downloaded here.<br />
Listed below are the links to all the individual species data used in this study. In addition, the entire dataset has been compressed into a single archive file (bird_phylogenomics_data.tar.gz) for those who wish to retrieve the complete set.<br />
<br />Adelie Penguin - <em>Pygoscelis adeliae</em> - PYGAD - <a href="http://dx.doi.org/10.5524/100006" target="_blank">doi:10.5524/100006</a>
<br />American Crow - <em>Corvus brachyrhynchos</em> - CORBR - <a href="http://dx.doi.org/10.5524/101008" target="_blank">doi:10.5524/101008</a>
<br />American Flamingo - <em>Phoenicopterus ruber ruber</em> - PHORU - <a href="http://dx.doi.org/10.5524/101035" target="_blank">doi:10.5524/101035</a>
<br />Anna's Hummingbird - <em>Calypte anna</em> - CALAN - <a href="http://dx.doi.org/10.5524/101004" target="_blank">doi:10.5524/101004</a>
<br />Beijing Duck (Mallard) - <em>Anas platyrhynchos</em> - ANAPL - <a href="http://dx.doi.org/10.5524/101001" target="_blank">doi:10.5524/101001</a>
<br />Bald Eagle - <em>Haliaeetus leucocephalus</em> - HALLE - <a href="http://dx.doi.org/10.5524/101040" target="_blank">doi:10.5524/101040</a>
<br />Barn Owl - <em>Tyto alba</em> - TYTAL - <a href="http://dx.doi.org/10.5524/101039" target="_blank">doi:10.5524/101039</a>
<br />Bar-tailed Trogon - <em>Apaloderma vittatum</em> - APAVI - <a href="http://dx.doi.org/10.5524/101016" target="_blank">doi:10.5524/101016</a>
<br />Brown Mesite - <em>Mesitornis unicolor</em> - MESUN - <a href="http://dx.doi.org/10.5524/101030" target="_blank">doi:10.5524/101030</a>
<br />Budgerigar - <em>Melopsittacus undulatus</em> - MELUN - <a href="http://dx.doi.org/10.5524/100059" target="_blank">doi:10.5524/100059</a>
<br />Carmine Bee-eater - <em>Merops nubicus</em> - MERNU - <a href="http://dx.doi.org/10.5524/101029" target="_blank">doi:10.5524/101029</a>
<br />Chicken - <em>Gallus gallus</em> - GALGA - <a href="http://www.nature.com/nature/journal/v432/n7018/full/nature03154.html" target="_blank">Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution</a>
<br />Chimney Swift - <em>Chaetura pelagica</em> - CHAPE - <a href="http://dx.doi.org/10.5524/101005" target="_blank">doi:10.5524/101005</a>
<br />Chuck-will's-widow - <em>Antrostomus carolinensis</em> - ANTCA - <a href="http://dx.doi.org/10.5524/101019" target="_blank">doi:10.5524/101019</a>
<br />Common Cuckoo - <em>Cuculus canorus</em> - CUCCA - <a href="http://dx.doi.org/10.5524/101009" target="_blank">doi:10.5524/101009</a>
<br />Common Ostrich - <em>Struthio camelus australis</em> - STRCA - <a href="http://dx.doi.org/10.5524/101013" target="_blank">doi:10.5524/101013</a>
<br />Crested Ibis - <em>Nipponia nippon</em> - NIPNI - <a href="http://dx.doi.org/10.5524/101003" target="_blank">doi:10.5524/101003</a>
<br />Cuckoo-roller - <em>Leptosomus discolor</em> - LEPDI - <a href="http://dx.doi.org/10.5524/101028" target="_blank">doi:10.5524/101028</a>
<br />Dalmatian Pelican - <em>Pelecanus crispus </em> - PELCR - <a href="http://dx.doi.org/10.5524/101032" target="_blank">doi:10.5524/101032</a>
<br />Downy Woodpecker - <em>Picoides pubescens</em> - PICPU - <a href="http://dx.doi.org/10.5524/101012" target="_blank">doi:10.5524/101012</a>
<br />Emperor Penguin - <em>Aptenodytes forsteri</em> - APTFO - <a href="http://dx.doi.org/10.5524/100005" target="_blank">doi:10.5524/100005</a>
<br />Golden-collared Manakin - <em>Manacus vitellinus</em> - MANVI - <a href="http://dx.doi.org/10.5524/101010" target="_blank">doi:10.5524/101010</a>
<br />Great Cormorant - <em>Phalacrocorax carbo</em> - PHACA - <a href="http://dx.doi.org/10.5524/101034" target="_blank">doi:10.5524/101034</a>
<br />Great-crested Grebe - <em>Podiceps cristatus</em> - PODCR - <a href="http://dx.doi.org/10.5524/101036" target="_blank">doi:10.5524/101036</a>
<br />Grey-crowned Crane - <em>Balearica regulorum gibbericeps</em> - BALRE - <a href="http://dx.doi.org/10.5524/101017">doi:10.5524/101017</a>
<br />Hoatzin - <em>Opisthocomus hoazin</em> - OPHHO - <a href="http://dx.doi.org/10.5524/101011" target="_blank">doi:10.5524/101011</a>
<br />Kea - <em>Nestor notabilis </em> - NESNO - <a href="http://dx.doi.org/10.5524/101031" target="_blank">doi:10.5524/101031</a>
<br />Killdeer - <em>Charadrius vociferus </em> - CHAVO - <a href="http://dx.doi.org/10.5524/101007" target="_blank">doi:10.5524/101007</a>
<br />Little Egret - <em>Egretta garzetta</em> - EGRGA - <a href="http://dx.doi.org/10.5524/101002" target="_blank">doi:10.5524/101002</a>
<br />MacQueen's Bustard - <em>Chlamydotis macqueenii </em> - CHLMA - <a href="http://dx.doi.org/10.5524/101022" target="_blank">doi:10.5524/101022</a>
<br />Medium Ground-finch - <em>Geospiza fortis</em> - GEOFO - <a href="http://dx.doi.org/10.5524/100040" target="_blank">doi:10.5524/100040</a>
<br />Northern Fulmar - <em>Fulmarus glacialis</em> - FULGL - <a href="http://dx.doi.org/10.5524/101025" target="_blank">doi:10.5524/101025</a>
<br />Peregrine Falcon - <em>Falco peregrinus</em> - FALPE - <a href="http://dx.doi.org/10.5524/101006" target="_blank">doi:10.5524/101006</a>
<br />Pigeon - <em>Columba livia</em> - COLLI - <a href="http://dx.doi.org/10.5524/100007" target="_blank">doi:10.5524/100007</a>
<br />Red-crested Turaco - <em>Tauraco erythrolophus</em> - TAUER - <a href="http://dx.doi.org/10.5524/101038" target="_blank">doi:10.5524/101038</a>
<br />Red-legged Seriema - <em>Cariama cristata</em> - CARCR - <a href="http://dx.doi.org/10.5524/101020" target="_blank">doi:10.5524/101020</a>
<br />Red-throated Loon - <em>Gavia stellata</em> - GAVST - <a href="http://dx.doi.org/10.5524/101026" target="_blank">doi:10.5524/101026</a>
<br />Rhinoceros Hornbill - <em>Buceros rhinoceros silvestris</em> - BUCRH - <a href="http://dx.doi.org/10.5524/101018" target="_blank">doi:10.5524/101018</a>
<br />Rifleman - <em>Acanthisitta chloris</em> - ACACH - <a href="http://dx.doi.org/10.5524/101015" target="_blank">doi:10.5524/101015</a>
<br />Speckled Mousebird - <em>Colius striatus</em> - COLST - <a href="http://dx.doi.org/10.5524/101023" target="_blank">doi:10.5524/101023</a>
<br />Sunbittern - <em>Eurypyga helias</em> - EURHE - <a href="http://dx.doi.org/10.5524/101024" target="_blank">doi:10.5524/101024</a>
<br />Turkey - <em>Meleagris gallopavo</em> - MELGA - <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000475" target="_blank">Multi-Platform Next-Generation Sequencing of the Domestic Turkey (<em>Meleagris gallopavo</em>): Genome Assembly and Analysis</a>
<br />Turkey Vulture - <em>Cathartes aura</em> - CATAU - <a href="http://dx.doi.org/10.5524/101021" target="_blank">doi:10.5524/101021</a>
<br />White-tailed Eagle - <em>Haliaeetus albicilla</em> - HALAL - <a href="http://dx.doi.org/10.5524/101027" target="_blank">doi:10.5524/101027</a>
<br />White-tailed Tropicbird - <em>Phaethon lepturus</em> - PHALE - <a href="http://dx.doi.org/10.5524/101033" target="_blank">doi:10.5524/101033</a>
<br />White-throated Tinamou - <em>Tinamus guttatus</em> - TINGU - <a href="http://dx.doi.org/10.5524/101014" target="_blank">doi:10.5524/101014</a>
<br />Yellow-throated Sandgrouse - <em>Pterocles gutturalis</em> - PTEGU - <a href="http://dx.doi.org/10.5524/101037" target="_blank">doi:10.5524/101037</a>
<br />Zebra Finch - <em>Taeniopygia guttata</em> - TAEGU - <a href="http://www.nature.com/nature/journal/v464/n7289/full/nature08819.html" target="_blank">The genome of a songbird</a>
M.ThomasPGilbert0000-0002-5805-7195ErichDJarvis0000-0001-8931-5049BoLiCaiLiThe Avian Genome ConsortiumJunWangGuojieZhangGenomic2101000.jpgBird moziac of darwinCC0BGIindividual bird images by Jon Fjeldså.19327352832ftp://penguin.genomics.cn/pub/10.5524/100001_101000/101000/ Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenome sequence of the duck (Anas platyrhynchos). Available here is the first draft genomic sequence of the duck (<em>Anas platyrhynchos</em>). The duck is a member of Anatidae, a family of birds that includes geese and swans. It is an economically important waterfowl, serving as a source of meat, eggs and feathers. Of special interest to agriculture and medicine is the fact that the duck is a principal natural host of influenza A viruses and harbours all of the 16 haemagglutinin and 9 neuraminidase subtypes currently known, except for the H13 and H16 subtypes. <br />
Using Illumina Genome Analyser sequencing technology, the genome of a 10-week-old female Beijing duck was sequenced, generating a total of 77 Gb of paired-end reads (approximately 64-fold coverage of the whole genome) with an average length of 50 bp. Using SOAPdenovo to combine short reads, a draft genome assembly was constructed, consisting of 78,487 scaffolds and covering 1.1 Gb. The contig N50 and scaffold N50 values were 26 kb and 1.2 Mb respectively. Super scaffolds were constructed and chromosomal sequences created according to the duck genetic map; this resulted in 47 superscaffolds, which contained 225 scaffolds and spanned 289 Mb. Transcriptomes were also generated from several different tissues, comprising 1.87 million ESTs and approximately 121 million 75-bp and 917 million 90-bp paired-end reads, which were generated using either the 454/Roche Life Sciences Analyzer or Illumina Genome sequencing technology.
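As a rough illustration of the two summary statistics quoted in this description, the sketch below computes fold coverage and N50 in Python. The function names and the toy scaffold lengths are hypothetical. The 1.2 Gb genome size is an assumption implied by 77 Gb of reads at roughly 64-fold coverage, not a figure stated in this record, and the N50 definition used here (the length of the sequence at which the running total of lengths, sorted longest first, reaches half of the assembly) is the common convention, not necessarily the exact procedure the authors followed.

```python
def fold_coverage(total_read_bases, genome_size):
    """Average sequencing depth: total bases sequenced divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length of the sequence at which the running total first reaches half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # 77 Gb of reads over an ASSUMED 1.2 Gb genome gives roughly 64-fold depth.
    print(f"coverage: {fold_coverage(77e9, 1.2e9):.1f}X")

    # Hypothetical toy assembly (lengths in kb), not data from this record.
    toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
    print(f"N50: {n50(toy_scaffolds)} kb")
```

The same `n50` function applies to contigs and scaffolds alike; only the input list of lengths differs.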
YinhuaHuangDavidWBurtHualanChenHeebalKimShangquanGanYiqiangZhaoKangYiHuapengFengPengyangZhuQiuyueLiuSuanFairleyKatharineEMagorZhenlinDuXiaoxiangHuHakimTaferAlainVignalTaeheonLeeKyu-WonKimZheyaShengYangAnMartienAMGroenenRichardPMACrooijmansRobertGWebsterJerryRAldridge0000-0002-9653-1775WesleyCWarrenSebastianBartschatStephanieKehrManjaMarzPeterFStadlerJacquelineSmithRobertHSKrausYaofengZhaoLimingRenJingFeiMireilleMorissonPeteKaiserDarrenKGriffinManRaoFrederiquePitelQingleCaiThomasFarautLaurieGoodman0000-0001-9724-5976JavierHerreroBoLiJianwenLiNingLiYingruiLiWubinQianSteveSearleJunWangYongZhangGenomic2101001_Duck.pngUnknownPublic domainUnknownMallard duck1073741824ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101001/http://asia.ensembl.org/Anas_platyrhynchos/Info/IndexGenome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10100010.1038/ng.26572374919110.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA46621http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194464http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22967XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data from the peregrine falcon (Falco peregrinus). The peregrine falcon (<em>Falco peregrinus</em>) is a top predator that has unique morphological, physiological and behavioural adaptations, allowing them to be successful hunters – it is also known as the world’s fastest animal. <br />The peregrine falcon genome was sequenced to further understand the evolutionary adaptation of this successful hunter. An Illumina HiSeq 2000 platform was used to generate a 128.07 Gb sequence and the initial genome size was estimated at 1.2 Gb, suggesting a genome coverage of 106.72X.<br /> SOAPdenovo was used for assembly and resulted in scaffold and contig N50 values of 3.89 Mb and 28.6 kb respectively. Fosmid-based Sanger sequencing confirmed >99% coverage of euchromatic DNA. 
These data provide insights into the evolution of the peregrine’s predatory lifestyle.<br />MichaelWBrufordChangchangCaoYingChenYuanpingChenAndrewDixonNickCFoxShukunGaoJingHeHaolongHouLiHuShengguangLiaoGuoqingLiYuanLiuQiongLuoMargitGMullerPeixiangNiShengkaiPanJunWangJunyiWangZhaobaoWangJinquanXiaPengweiXuYeYinZhenYueXiangjiangZhanGenomic2101006_Peregrine.pngUnknownCC0UnknownUnknown2147483648ftp://penguin.genomics.cn/pub/10.5524/101001_102000/101006/Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10007510.1038/ng.25882352507610.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/bioproject/PRJNA159791XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data from the saker falcon (Falco cherrug). The saker falcon (<em>Falco cherrug</em>) is the national bird of Hungary, and has unique morphological, physiological and behavioural adaptations allowing it to be a successful hunter. It often hunts by horizontal pursuit of rodents and birds, rather than the high-speed dives (stoop from a height) used by some other birds of prey such as the peregrine. <br />Here we present the data for the saker falcon genome, obtained using an Illumina HiSeq 2000 platform that generated 136.21 Gb of sequence. Paired-end libraries with insert sizes of 170, 500 and 800 bp (short inserts) and 5, 10 and 20 kb (long inserts) were constructed. The initial genome size was estimated at 1.2 Gb, suggesting a genome coverage of 113.51x. SOAPdenovo was used for assembly and resulted in scaffold and contig N50 values of 4.15 Mb and 31.2 kb, respectively. Fosmid-based Sanger sequencing confirmed >97% coverage of euchromatic DNA. 
This data will help provide insights into the evolution of the saker’s predatory lifestyle.MichaelWBrufordChangchangCaoYingChenYuanpingChenAndrewDixonNickCFoxShukunGaoJingHeHaolongHouLiHuShengguangLiaoGuoqingLiYuanLiuQiongLuoMargitGMullerPeixiangNiShengkaiPanJunWang0000-0002-1422-3331JunyiWangZhaobaoWangJinquanXiaPengweiXuYeYinZhenYueXiangjiangZhanGenomic2100075_Saker.pngSaker falcon closeupCC0UnknownUnknown2147483648ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100075/2014-01-27Genome 10Khttp://www.genome10k.org/10100610.1038/ng.258823525076http://www.ncbi.nlm.nih.gov/bioproject/PRJNA168071XinLiuBGI-shenzhenliuxin@genomics.org.cnliuxin@genomics.org.cnGenomic data from the domestic yak (Bos grunniens). Domestic yaks (<em>Bos grunniens</em>) provide meat and other necessities for Tibetans living at high altitude on the Qinghai-Tibetan Plateau and in adjacent regions. Here, we present the draft genome sequence of a female domestic yak generated using Illumina HiSeq 2000 technology at 65-fold coverage. <br />De novo assembly of 4.4 billion reads from paired-end libraries yielded a draft assembly with a total length of 2,657 Mb, and contig and scaffold N50 sizes of 20.4 kb and 1.4 Mb, respectively. 
Approximately 90% of the total sequence was covered by 2,083 scaffolds of >307 kb, with the largest scaffold spanning 8.8 Mb.QiangQiuTaoMaZhiqiangYeQuanjunHuJaebumKimDenisMLarkinLorettaAuvilBorisCapitanuJianMaHarrisALewinRanZhouLizhongWangKunWangXuLuXuetaoZangHuiMaJianZhangZhaofengWangYingmeiZhangDaweiZhangTakahiroYonezawaMasamiHasegawaYangZhongWenbinLiuShengxiangZhangRuijunLongJohannesALenstraYiWuPengShiJianquanLiuChangchangCaoDavidNCooperHaolongHouZhiyongHuangYongshanLangShengguangLiaoShengkaiPanWubinQianXiaojuQianJianWangJunWang0000-0002-1422-3331JunyiWangYanWangJinquanXiaHuanmingYang0000-0002-0858-3410YeYinGuojieZhangYanZhangGenomic2100071_Yak.pngyak_nepal.jpgCC-BYPixabaySimon@pixabay (http://pixabay.com/en/users/Simon/)2821225472ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100071/10.1038/ng.234322751099http://www.ncbi.nlm.nih.gov/sra?term=SRP009062http://www.ncbi.nlm.nih.gov/sra?term=SRP009200http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE33300UndefinedundefinedBGI-Shenzhenundefined@mail.comundefined@mail.comGenomic and transcriptomic data from the Brandt's bat (Myotis brandtii). Brandt's bat (<em>Myotis brandtii</em>) is a species of vesper bat in the family Vespertilionidae. It is found throughout most of Europe and parts of Asia. It is known for its extreme longevity quotient, approximately twice that of humans.<br /> A whole-genome shotgun strategy was applied to sequence the genome of an adult male Brandt’s bat (<em>M. brandtii</em>) from the Obvalnaya cave in Russia. We also sequenced liver, kidney and brain transcriptomes of hibernating and summer-active M. brandtii. 
Approximately 156 Gb (or 78 × ) genome data including 101 Gb (or 51 × ) short insert size reads were generated, resulting in a high-quality assembly with scaffold N50 of 3.1 Mb and contig N50 of 21 kb IngeSeimAlexeyVLobanovSimingMaYueYue FengTobiasLLenzMaximVGerashchenkoDingdingFanXiaomingYaoDanielJordanYingqiXiongAndreyNLyapunovGuanxingChenOksanaIKulakovaYudongSunSang-GooGLeeAlexeyAMoskalevVadimNGladyshevbRoderickTBronsonXiaodongFangZhiyongHuangAndersKrogh0000-0002-5147-6282ShamilRSunyaevAntonATuranovJunWang0000-0002-1422-3331ZhiqiangXiongSunHeeYimGuojieZhangYabingZhuYongMaGenomic2100065_Myotis brandti.jpgMyotis brandti.jpgGNU Free Documentation License, CC SAhttp://zmmu.msu.ru/bats/rusbats/gallery/pmbra.html#aБорисенко1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100065/10.1038/ncomms321223962925http://www.ncbi.nlm.nih.gov/sra?term=SRS374114UndefinedundefinedBGI-Shenzhenundefined@mail.comundefined@mail.comGenomic data from the Insectivorous bat (Myotis davidii). The microbats constitute the suborder <em>Microchiroptera</em> within the order <em>Chiroptera</em> (bats). They are most often referred to by their scientific name. Other English names are "insectivorous bats", "echolocating bats", "small bats" or "true bats". All these names are somewhat inaccurate, because not all microbats feed on insects, and some of them are larger than small megabats. <br /> We applied a whole genome shotgun strategy and next-generation sequencing technologies using an Illumina HiSeq 2000 platform to sequence the genome of an individual adult male <em>M. davidii</em>. In order to lower the risk of non-randomness, we constructed 26 paired-end libraries sequenced on 26 lanes, with insert sizes of about 250 base pairs (bp), 500 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb. In total, we generated about 237.2 Gb (118.6x coverage) of high quality sequence for the assembly. The genome was de novo assembled by <a href="http://soap.genomics.org.cn">SOAPdenovo</a>. 
The final total contig size and N50 were 1.91 Gb and 13.2 kb,
respectively. The total scaffold size and N50 were 2.09 Gb and 3.39 Mb, respectively. MichelleLBakerKimberlyABishop-LillyChristopherCBroderYuanxinChenChristopherCowledGaryCrameriXiaodongFangYueFengKennethGFreyZhiyongHuangXuantingJiangGlennAMarshJustinNgZhengliShiXiaoqingSunMaryTachedjianJunWang0000-0002-1422-3331LijunWuJamesWWynneJinXiaoZhiqiangXiongLanYangGuojieZhangYongZhangWeiZhaoPengZhouYabingZhuLin-FaWangGenomic2100067_Myotis davidii.jpgMyotis davidiiCC-BYIUCNEOL China Regional Center1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100067/10.1126/science.123083523258410http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1048081http://www.ncbi.nlm.nih.gov/sra?term=SRS378033UndefinedundefinedBGI-Shenzhenundefined@mail.comundefined@mail.comGenomic data from the Black Flying Fox (Pteropus alecto). The black flying fox, <em>Pteropus alecto</em>, is a megabat in the family <em>Pteropodidae</em>. Members of the genus <em>Pteropus</em> include the largest bats in the world. The <em>Pteropus</em> genus currently has about 57 recognised species. The genus is primarily an island taxon, with 55 species having some or all of their distribution on islands. <br /> We applied a whole genome shotgun strategy and next-generation sequencing
technologies using an Illumina HiSeq 2000 platform to sequence the genome of an individual
adult male <em>P. alecto</em>. In order to lower the risk of
non-randomness, we constructed 26 paired-end libraries sequenced on 26 lanes, with insert sizes of about 250 base pairs (bp), 500 bp, 800 bp, 2 kb, 5 kb, 10 kb and 20 kb. In total, we generated about 219.7 Gb (109.8x coverage) of high quality sequence for the assembly. The genome was de novo assembled by <a href=http://soap.genomics.org.cn>SOAPdenovo</a>. The final total contig size and N50 were 1.99 Gb and 25.7 kb, respectively. The total scaffold size and N50 were 2.03 Gb and 15.8 Mb, respectively.<br />MichelleLBakerKimberlyABishop-LillyChristopherCBroderYuanxinChenChristopherCowledGaryCrameriXiaodongFangYueFengKennethGFreyZhiyongHuangXuantingJiangGlennAMarshJustinNgZhengliShiXiaoqingSunMaryTachedjianJunWang0000-0002-1422-3331LijunWuJamesWWynneJinXiaoZhiqiangXiongLanYangGuojieZhangYongZhangWeiZhaoPengZhouYabingZhuLin-FaWangGenomic2100066_Pteropus_alecto.jpgA small group of Black flying-foxesCC-BYOwn workWelbergen2147483648ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100066/10.1126/science.123083523258410http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1048080http://www.ncbi.nlm.nih.gov/sra?term=SRS378032HuijueJiaBGI Shenzhenjiahuijue@genomics.cnjiahuijue@genomics.cnSupporting data for the paper: "An integrated catalog of reference genes in the human gut microbiome". Here we sequenced 249 fecal samples from European adults, leading to a total of 760 samples in the Metagenome of the Human Intestinal Tract (<a href="http://dx.doi.org/10.1038/nature08821">MetaHIT</a>) project. All 6.4TB whole-genome shotgun sequencing data from 1267 fecal samples in <a href="http://dx.doi.org/10.1038/nature08821">MetaHIT</a>, the Human Microbiome Project (<a href="http://www.hmpdacc.org/HMASM/">HMP</a>) and our <a href="http://dx.doi.org/10.5524/100036">diabetes study on Chinese adults</a> were processed with the <a href="http://dx.doi.org/10.1371/journal.pone.0047656">MOCAT pipeline</a>. 
The resulting gene catalogs were merged using <a href="http://dx.doi.org/10.1093/bioinformatics/btl158">CD-HIT</a> and complemented with genes from 511 sequenced human gut-related prokaryotic genomes that were present in our gut metagenomes. The final high-quality integrated reference catalog of the human gut microbiome contains 9,879,896 non-redundant genes. The genes were phylogenetically annotated according to 3449 bacterial and archaeal genomes and draft genomes from NCBI, and functionally annotated using orthologous groups from the Kyoto Encyclopedia of Genes and Genomes (<a href="http://www.genome.jp/kegg/pathway.html">KEGG</a>) and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (<a href="http://eggnog.embl.de/version_3.0/index.html">eggNOG</a>) databases. In addition, 11 samples from the Chinese cohort were re-extracted using the MetaHIT DNA extraction protocol and shotgun-sequenced to compare with the original data generated by a slightly different DNA extraction protocol.HuijueJia0000-0002-3592-126XXianghangCaiHuanziZhongQiangFengShinichiSunagawaManimozhiyanArumugam0000-0002-0886-9101JensRoatKultimaEdiPriftiTrineNielsenAgnieszkaSierakowskaJunckerChaysavanhManichanhBingChenFlorenceLevenezLiangXiaoHailongZhaoJumanaYousufAl-AamaSherifEdrisTorbenHansenHenrikBjornNielsenSorenBrunakFranciscoGuarnerOlufPedersenJoelDoréS.DuskoEhrlichMetaHIT consortiumPeerBorkWeinengChenKarstenKristiansen0000-0002-6024-0917SuishaLiangJunhuaLiJianWangJuanWangJunWang0000-0002-1422-3331XunXuHuanmingYang0000-0002-0858-3410DongyaZhangWenweiZhangZhaoxiZhangMetagenomic3100064_GutMicrobiota.jpgThe human gut is home for trillions of microbes.CC-BY-SA 3.0BGIJunda Peng, Jianqing Zhao and Lihua Xie133143986176ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100064/Integrated reference catalog of the human gut 
microbiomehttp://meta.genomics.cn/metagene/meta/home20001210031710.1038/nbt.2942http://www.ebi.ac.uk/ena/data/view/ERP000108http://www.ebi.ac.uk/ena/data/view/ERP002061http://www.ebi.ac.uk/ena/data/view/ERP003612http://www.ebi.ac.uk/ena/data/view/ERP004605http://www.ebi.ac.uk/ena/data/view/SRP002163http://www.ebi.ac.uk/ena/data/view/SRP011011http://www.ebi.ac.uk/ena/data/view/SRP008047YizhuangZhouBGI-Shenzhenzhouyizhuang@genomics.cnzhouyizhuang@genomics.cnMetacellulosomics data demonstrating synergism in a soil-derived cellulose-degrading microbial community. Currently available enzyme technology is insufficient to economically degrade plant biomass, and presumably will remain so until we reach a comprehensive understanding of how nature solves this problem. Here we show that a microbial consortium enriched from soil establishes collaborative relationships to enable efficient hydrolysis of plant polysaccharides. Analyses of reconstructed bacterial draft genomes from all seven uncultured phylotypes in the consortium show that these microbes cooperate in both cellulose-degrading and other important metabolic processes. Experimental evidence for cellulolytic inter-species synergies came from the discovery of cellulosome structures composed of subunits from different phylotypes. Oxygen consumption by specific phylotypes enables anaerobic saccharification, whereas inferred utilization of the resulting metabolites by non-cellulolytic phylotypes negates their accumulation and associated negative effects towards cellulose degradation. These collaborative microbial actions illustrate that efficient biomass conversion in nature relies on a high level of microbial community organization. 
YizhuangZhouPhillipBPopeShaochunLiFengjiTanShuChengJinlongYangFengLiuXuejingLeiQingqingSuChengranZhouJiaoZhaoXiuzhuDongTaoJinRuifuYangVincentHGEijsinkJingChenJianWangJunWang0000-0002-1422-3331BoWenHuanmingYang0000-0002-0858-3410ShuangYangGengyunZhangXinZhouMetagenomic3Proteomic10100049_Metacellosomics.jpgSoilCC0http://g.hiphotos.bdimg.com/album/w%3D2048%3Bq%3D90/sign=fc34c48210dfa9ecfd2e511756e8cc72/ca1349540923dd5458a901f6d009b3de9d824851.jpgYizhuang Zhou96636764160ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100049/10.1038/srep05288http://www.ncbi.nlm.nih.gov/sra?term=SRA065216KeithBradnamUC Daviskrbradnam@ucdavis.edukrbradnam@ucdavis.eduAssemblathon 2 assemblies. Assemblathon 2 is a genome assembly contest where participating teams attempted to
assemble genomes for three vertebrate species using a mixture of next-generation sequencing
data. In total, 43 assemblies were submitted for three species (15 for bird, 16 for fish, and 12 for
snake). These assemblies were assessed using a wide variety of statistical approaches as well
as using experimental data from Fosmid sequences and optical maps.AntonAlexandrovPaulBaranayMichaelBechnerInançBirolSébastienBoisvertJarrodAChapmanGuillaumeChapuisRayanChikhiHamidrezaChitsazJacquesCorbeilCristianDel FabbroT.RoderickRDockingDentEarlScottEmrichPavelFedotovNunoAFonsecaRichardAGibbsSanteGnerreÉlénieGodzaridisSteveGoldsteinMatthiasHaimelGilesHallDavidHausslerJosephBHiattIsaacHoMartinHuntShaunDJackmanDavidBJaffeHuaiyangJiangSergeyKazakovPaulJKerseyDominiqueLavenierFrançoisLavioletteYueLiuIainMacCallumMatthewDMacManesNicolasMailletSergeyMelnikovDelphineNaquinZeminNingThomasDOttoBenedictPatenOctávioSPauloAdamMPhillippyFranciscoPina-MartinsMichaelPlaceDariuszPrzybylskiXiangQinCarsonQuFilipeJRibeiroStephenRichardsDanielSRokhsarJ.GrahamRubySimoneScalabrinMichaelCSchatzDavidCSchwartzAlexeySergushichevTedSharpeTimothyIShawJaredTSimpsonHenrySongFedorTsarevFrancescoVezziRiccardoVicedominiBrunoMVieiraKimCWorleyShuangyeYinShiguoZhouIanFKorfKeithRBradnamWen-ChiChouRichardDurbinJosephNFassGaneshkumarGanapathyJasonTHowardErichDJarvis0000-0001-8931-5049JacobOKitzmanJamesRKnightSergeyKorenTak-WahLamBinghangLiuYingruiLiZhenyuLiRuibangLuoJayShendureYujianShiJunWang0000-0002-1422-3331Siu-MingYiuJianyingYuanGuojieZhangHaoZhangGenomic2Assemblathon 2 bannerCC0http://Keith Bradnam28991029248ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100060/10006110006210.1186/2047-217X-2-1023870653http://www.ncbi.nlm.nih.gov/sra?term=ERP002324http://www.ncbi.nlm.nih.gov/sra?term=SRA026860http://www.ncbi.nlm.nih.gov/sra?term=ERP002294ShancenZhaoBGI Shenzhenzhaoshancen@genomics.org.cnzhaoshancen@genomics.org.cnGenomic data from Aegilops tauschii - The Progenitor of Wheat D Genome A spontaneous hybridization of the wild diploid grass Aegilops tauschii (2n=14, DD) with cultivated tetraploid wheat Triticum turgidum (2n=4x=28, AABB) 8,000~10,000 years ago in the Fertile Crescent resulted in the bread wheat (Triticum aestivum; 2n=6x=42, AABBDD), one of the earliest cultivated crops in modern 
agriculture. We sequenced the 4.36-gigabase (Gb) genome of Ae. tauschii by generating ~90x genome coverage of short reads from a series of libraries with various insert sizes. The assembled scaffolds of high quality sequences represent 83.4% of the genome, in which 65.9% comprised of repetitive elements. Assisted with comprehensive RNA-Seq data, we identified 43,150 protein-coding genes, with 30,697 (71.1%) of them uniquely anchored to chromosomes based on an integrated density genetic map. A number of agriculturally relevant gene families, such as disease resistance, abiotic stress tolerance, and grain quality genes, were found to expand in Ae. tauschii. The draft genome of Ae. tauschii hence provides novel insights into its role in enabling environmental adaptation of common wheat and in defining the large and complicated genomes of wheat species.JizengJiaLongMaoChuanGaoWeimingHeDongLiYongTaoJunWang0000-0002-1422-3331ChiZhangShancenZhaoGenomic2100054_Aegilops_tauschii_ARS-1.jpgAegilops tauschii ARS-1CC0WikipediaMark Nesbitt547608330240ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100054/10.1073/pnas.121908211023610408http://www.ncbi.nlm.nih.gov/sra?term=SRP017494ShancenZhaoBGI Shenzhenzhaoshancen@genomics.org.cnzhaoshancen@genomics.org.cnGenomic data from Triticum urartu - the progenitor of wheat A genome The wheat A genome, as a basic genome of bread wheat and other polyploid wheats, is centrally important to the evolution, domestication, and genetic improvement of wheat. The progenitor of the A genome is the diploid wild einkorn wheat Triticum urartu. Here, we sequenced T. urartu (accession G1812) using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform, and assembled the genome using SOAPdenovo2 with 448.49 Gb of filtered high-quality sequence data. The genome assembly reached 3.92 Gb (without Ns) with a contig N50 length of 3.42 kb and 4.66 Gb (with Ns) with a scaffold N50 length of 63.69 kb . 
To facilitate gene prediction, we generated a 116.65 Mb transcriptome of T. urartu with 67.14 Gb RNA-Seq data from eight different tissues and treatments, and 49,935 assembled transcripts from six tissues using the Roche 454 sequencing platform. In total, we predicted 34,879 protein-coding gene models. The average gene size was 3,207 bp, with a mean of 4.7 exons per gene.Hong-QingLingQinsiLiangDaowenWangAiminZhangChuanGaoYongTaoJunWang0000-0002-1422-3331ChiZhangShancenZhaoGenomic2100050_Wildeinkorn.jpgWildeinkornCC0WikipediaMark Nesbitt689342251008ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100050/10.1038/nature1199723535596http://www.ncbi.nlm.nih.gov/sra?term=SRP005973RuibangLuoBGIaquaskyline@gmail.comaquaskyline@gmail.comSoftware and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly” SOAPdenovo2 is the latest <em>de novo</em> genome assembly package from BGI’s SOAP (short oligonucleotide analysis package) suite of tools (homepage here: <a href="http://soap.genomics.org.cn/">http://soap.genomics.org.cn/</a>). Compared to SOAPdenovo1, this new version has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closure, and is optimized for large genomes.<br /> Using new sequencing data from the YH (<em>Homo sapiens</em>) diploid genome – the first sequenced Han Chinese individual, an updated assembly was produced (see dataset here: <a href="http://dx.doi.org/10.5524/100038">doi:10.5524/100038</a>), with the N50 scores for the contig and scaffold being 3-fold and 50-fold longer, respectively, than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 times lower during the point of largest memory consumption. 
<br /> Benchmarking with Assemblathon1 and GAGE datasets shows that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo1 and is competitive with other assemblers on both assembly length and accuracy.
<br />
To help readers reproduce these findings, configured packages with the compressed pipelines, containing all of the necessary shell scripts and tools, are available from the BGI FTP server (<a href="ftp://public.genomics.org.cn/BGI/SOAPdenovo2">ftp://public.genomics.org.cn/BGI/SOAPdenovo2</a>).
<br />
The latest version of SOAPdenovo2 is available from Sourceforge: <a href="http://soapdenovo2.sourceforge.net/">http://soapdenovo2.sourceforge.net/</a>
<br />
These pipelines are available from our data platform as Galaxy workflows: <a href="http://galaxy.cbiit.cuhk.edu.hk/">http://galaxy.cbiit.cuhk.edu.hk/</a>YanxiangChenYinlongXieWeihuaHuangGuangzhuHeQiPanYunjieLiuJingboTangGengxiongWuYongLiuDavidCheungGuangmingLiuShaoliangPengTak-WahWLamChangleiHanXiangkeLiaoBinghangLiuYingruiLiZhenyuLiRuibangLuoYaoLuYujianShiBoWangJianWangJunWang0000-0002-1422-3331ZhuXiaoqianHuanmingYang0000-0002-0858-3410Siu-MingYiuJianyingYuanChangYuHaoZhangWorkflow5Software6100044_SOAPdenovo2.jpgSOAPdenovo2 workflowCC0Peter Li, GigaSciencePeter Li268435456000ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100044/http://soap.genomics.org.cn/http://soapdenovo2.sourceforge.net/http://galaxy.cbiit.cuhk.edu.hk/u/gigascience/g/rluo201210003810014810.1186/2047-217X-1-1823587118Shenzhen Key Laboratory of Trans-omics BiotechnologiesunknownCXB201108250096ANational Natural Science Foundation of Chinahttp://dx.doi.org/10.13039/50110000180990612019National High Technology Research and Development Program of China-863 programhttp://www.most.gov.cn/eng/programmes1/200610/t20061009_36225.htm2012AA02A201State Key Development Program for Basic Research of China-973 Programhttp://www.973.gov.cn/English/Index.aspx2011CB809203Shenzhen Municipal Government of ChinaunknownJC201005260191ARuibangLuoBGIaquaskyline@gmail.comaquaskyline@gmail.comUpdated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012) Updated genomic data from the YH <em>(Homo sapiens)</em> diploid genome – the first sequenced Han Chinese individual, a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases.
The original version of the YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer (see dataset <a href="http://dx.doi.org/10.5524/100015" target="_blank">doi:10.5524/100015</a>). This latest (as of 07/2012) and improved version of the YH genome was assembled based on 2.1 billion reads using the Illumina HiSeq 2000. A total of 202 Gb of sequence data was generated using 100 bp paired-end reads with insert sizes ranging from 180 bp to 40 kbp, and the genome was sequenced to 67.5-fold average coverage. The latest version of SOAPdenovo2 was used to reassemble, improve and update the previously assembled genome (tools and pipelines available here: <a href="http://dx.doi.org/10.5524/100044" target="_blank">doi:10.5524/100044</a>). By aligning the short reads with SOAP, 177 Gb of sequence was mapped onto the NCBI reference genome and 99.99% of the genome was covered. The raw sequences, assemblies and relevant tools are released for public use under a <a href="http://creativecommons.org/about/cc0" target="_blank">CC0 license</a>.
More information about the YH genome can be viewed at: <a href="http://yh.genomics.org.cn/" target="_blank">http://yh.genomics.org.cn/</a>RLuoYXieCYuBLiuAsanLarsBolund0000-0003-4165-1531RichardDurbinLinFangXiaodongFangGuangwuGuoKarstenKristiansen0000-0002-6024-0917NingLiSonggangLiYingruiLiZhuoLiRasmusNielsen0000-0003-0513-6591PeixiangNiJunjieQinYeyangSuJianWangJunWang0000-0002-1422-3331GaneKa-ShuWong0000-0001-6108-5560BinYangHuanmingYang0000-0002-0858-3410JiaYeXiuqingZhangHanchengZhengHongkunZhengGenomic2100013_YH.jpgHan Chinese individualpublic domain because its copyright has expiredWikimedia Commons, en:Image:HanGaozu.jpgUnknown83751862272ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100038/http://yh.genomics.org.cn/http://yh.genomics.org.cn/mapview.jsp?1000 Genomeshttp://www.1000genomes.org/10001510004410009710009610.1038/nature0748418987735http://www.ebi.ac.uk/ena/data/view/ERP001652YingruiLiBGIliyr@genomics.cnliyr@genomics.cnDNA methylome of human peripheral blood mononuclear cells from the YH Han Chinese individual. The methylome reported and analyzed here was generated from the same sample of peripheral blood mononuclear cells (PBMCs) from a consented donor (Homo sapiens) whose genome was deciphered in the <a href="http://yh.genomics.org.cn/">YH project</a>. YH is an anonymous male Han Chinese individual who has no known genetic diseases, and whose genome also serves as an Asian reference genome.
Nuclear DNA was extracted and subjected to unbiased, whole-genome bisulfite sequencing (BS-seq) using the Illumina Genome Analyzer. In total, 103.5 Gbp of paired-end sequence data were generated. Of these, 70.4 Gbp (68%) were successfully aligned to either strand of the YH genome with an average mismatch rate of 1.3%, resulting in an average sequencing depth of 12.3-fold per DNA strand or a 24.7-fold overall depth. Of the 18,962,679 CpGs present in the unique haploid part (2.21 Gb) of the YH reference genome sequence, approximately 99.86% were covered by at least one unambiguously mapped read of quality score >14 on either strand, and 92.62% were unambiguously covered on both strands.JingdeZhuMingzhiYeJianYuHonglongWuJihuaSunHongyuZhangRuibangLuo0000-0001-9711-6533YinghuaHeXinJinGuangyuZhouJinfengSunYeboHuangXiaoyuZhouShichengGuoXinLiJiujinXuStephanBeckLarsBolund0000-0003-4165-1531HongzhiCaoMinfengChenQuanChenXuedaHuKarstenKristiansen0000-0002-6024-0917NingLiQibinLiRuiqiangLiYingruiLiGengTianJianWangJunWang0000-0002-1422-3331WenWangHuanmingYang0000-0002-0858-3410ChangYuQinghuiZhangXiuqingZhangHanchengZhengHuisongZhengEpigenomic1100013_YH.jpgHan Chinese individualpublic domain because its copyright has expiredWikimedia Commons, en:Image:HanGaozu.jpgUnknown114890375168ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100014/2012-03-281000 Genomeshttp://www.1000genomes.org/10001510.1371/journal.pbio.100053321085693http://www.ncbi.nlm.nih.gov/sra?term=SRP002339http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17972YingruiLiBGIliyr@genomics.cnliyr@genomics.cnGenome sequence of YH: the first diploid genome sequence of a Han Chinese individual. Genomic data from the YH (<em>Homo sapiens</em>) genome – first diploid genome sequence of a Han Chinese, a representative of the Asian population. The genomic DNA used in this study came from an anonymous male Han Chinese individual who has no known genetic diseases.
The YH genome was assembled based on 3.3 billion reads using the Illumina Genome Analyzer. We achieved 117.7G nucleotides data and the genome was sequenced to 36-fold average coverage. By aligning the short reads with SOAP, 102.9G nucleotides were mapped onto the NCBI reference genome and 99.97% of the genome was covered. The raw sequences, alignments, consensus genome, variants and relevant tools are released for public use under a <a href="http://creativecommons.org/about/cc0">CC0 license</a>.WeiWangJunqingZhangBinxiaoFengZhenglinDuYiqingZhaoZhenzhenYangMichaelInouyeJohnPoolJinjieDuanFangLiangWenjieLiZhikeLuQinHaoYuLiangCuoPingFangChenKeZhouLingYangYangGaoXiaoliFengAsanLarsBolund0000-0003-4165-1531QuanChenRichardDurbinLinFangXiaodongFangWeiFanLaurieGoodman0000-0001-9724-5976GuangwuGuoYiranGuoInesHellmannYujieHuKarstenKristiansen0000-0002-6024-0917HuiqingLiangDaweiLiDongLiGuoqingLiHengLiJunLiLiLiNingLiQibinLiRuiqiangLiShaochuanLiSonggangLiDongyuanLiuYingruiLiZhuoLiYaoLuLijiaMaRasmusNielsen0000-0003-0513-6591PeixiangNiJunjieQinYuanyuanRenJueRuanYeyangSuGengTianJianWangJunWang0000-0002-1422-3331GaneKa-ShuWong0000-0001-6108-5560BinYangGuohuaYangHuanmingYang0000-0002-0858-3410ShuangYangZhentaoYangJiaYeXinYiChangYuGuojieZhangJianguoZhangJuanbinZhangXiuqingZhangJingZhaoHanchengZhengHongkunZhengYanZhouHongmeiZhuGenomic2100013_YH.jpgHan Chinese individualpublic domain because its copyright has expiredWikimedia Commons, en:Image:HanGaozu.jpgUnknown478888853504ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100015/http://yh.genomics.org.cn/http://yh.genomics.org.cn/mapview.jsp?1000 Genomeshttp://www.1000genomes.org/10001310001410003810.1038/nature0748418987735http://www.ebi.ac.uk/ena/data/view/ERP000053http://www.ncbi.nlm.nih.gov/bioproject/PRJEA39173YongHouBGIhouyong@genomics.cnhouyong@genomics.cnSingle cell whole-exome sequences of bladder cancer from an individual. 
This dataset contains single-cell and whole-tissue sequencing and annotation data from a muscle-invasive bladder transitional cell carcinoma from one individual. The available data include single-cell whole-exome sequences from 55 individual cells (44 from the tumor and 11 from normal adjacent tissue) and whole-tissue DNA sequence data from this cancer and the matched normal tissue. Additional data include alignments, SNP calls, and high-confidence somatic mutation calls with their allelic frequencies.LutingSongHanjieWuYongHouMinJianJieLiangJingxiangLiYingruiLiZesongLiJianWangJunWang0000-0002-1422-3331KuiWuXunXuHuanmingYang0000-0002-0858-3410XiuqingZhangGenomic2100037_Single_cell_bladder_cancer.jpgPrimary muscle-invasive bladder tumorCC0Shenzhen Key Laboratory of Genitourinary Tumor, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen 518035, ChinaZesong Li164282499072ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100037/10.1186/2047-217X-1-1223587365http://www.ncbi.nlm.nih.gov/sra?term=SRA051489XiaoshenGuoBGIguoxs@genomics.cnguoxs@genomics.cnResequencing data from 40 varieties of wild and domesticated silkworms. Here we present whole-genome resequencing data of 40 domesticated and wild silkworms (<em>Bombyx</em>). The domesticated silkworm (<em>Bombyx mori</em>) is of great economic interest and has been domesticated for more than 5,000 years. An organism with a mid-range genome size (~432 Mb), it often serves as a model insect for the order Lepidoptera. A number of wild varieties of silkworms exist as well, including the Chinese wild silkworm (<em>Bombyx mandarina</em>) from which the domesticated silkworm originated.
Each of the silkworm varieties was sequenced to ~3X coverage, representing 99.88% of the genome. These sequences were then used to create a single-base pair resolution genetic variation map of the silkworm. SNP sets were obtained separately for the pool of 29 domesticated strains and the pool of 11 wild varieties. The number of SNPs in the domestic versus wild varieties was approximately 14 million and 13 million, respectively. In addition to SNPs, approximately 0.33 million small insertion-deletions (indels) and 35 thousand structural variants (SVs) were identified among the 40 varieties. Over three-fourths of the SVs overlapped with transposable elements.
A total of 1,041 candidate Genomic Regions of Selective Signals (GROSS) were identified. These regions cover 12.5 Mb (2.9%) of the genome and may reflect genomic footprints left by artificial selection during domestication, as they include 354 protein-coding genes that were identified as good candidates for domestication genes.
We observed that 159 genes from GROSS were expressed in different B. mori tissues on day 3 of the fifth larval instar in a reference strain, and were enriched in the silk gland, midgut, and testis. The genes expressed in the silk gland are involved in the synthesis of silk proteins, including fibroin and sericin. Midgut-enriched genes are related to the metabolism of carbohydrates, amino acids and lipids. Genes enriched in the testis are annotated as having binding, catalytic, and motor activity related to reproduction.
The reference genome for this project was the Japanese wild silkworm (NCBI Accession Number <a href="http://www.ncbi.nlm.nih.gov/nuccore/NC_003395">NC_003395</a>).ZeZhangQingyouXiaFangyinDaiDaojunChengTingcaiChengTaoJiangCelineBecquetChunLiuXingfuZhaYingLinYihongShenLanJiangJeffreyJensenSiTangPingZhaoHanfuXuNingjiaHeZhouheDuGuoqingPanAichunZhaoHaojingShaoWeiZengPingWuChunfengLiMinhuiPanJingjingLiXuyangYinChengLuZeyangZhouZhonghuaiXiangJianjunCaoWeiFanYiranGuoInesHellmannDaweiLiDongLiJunLiRuiqiangLiSonggangLiHuiLiuShipingLiuYingruiLiZhuoLiRasmusNielsen0000-0003-0513-6591JianWangJuanWangJunWang0000-0002-1422-3331WenWangZhaolingXuanXunXuHuanmingYang0000-0002-0858-3410ChenYeChangYuGuojieZhangXiuqingZhangJingZhaoHuisongZhengYanZhouGenomic2Epigenomic1100024_Bombyx_mori.jpgDomestic and wild silkwormsGNU Free Documentation License, CC SAWikimedia CommonsGerd A.T. Müller25769803776ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100024/http://silkworm.genomics.org.cn/silkdb/doc/download.html10.1126/science.117662019713493http://www.ncbi.nlm.nih.gov/sra?term=SRP001012ZengliYanBGIyanzengli@genomics.cnyanzengli@genomics.cnThe genome of Schistosoma haematobium <em>Schistosoma haematobium</em> is an important digenetic trematode, and is found in the Middle East, India, Portugal and Africa. It is a major agent of schistosomiasis. More specifically, it is associated with urinary schistosomiasis. Adults are found in the Venous plexuses around the urinary bladder and the released eggs traverse the wall of the bladder causing haematuria and fibrosis of the bladder. The bladder becomes calcified, and there is increased pressure on ureters and kidneys otherwise known as hydronephrosis. Inflammation of the genitals due to <em>S.haematobium</em> may contribute to the propagation of HIV.
In this study, we sequenced the <em>S. haematobium</em> genome from 200 ng of genomic DNA template isolated from a single, mated pair of adult worms. Using Illumina-based technology, we produced 33.5 Gb of usable sequence data at 74-fold coverage and compared it to sequences from related parasites. The data consistently showed low sequence heterozygosity, and we estimated the genome size to be 431-452 Mb. We then assembled the data and used local assemblies to close most (96.1%) of the remaining gaps, achieving a final assembly of 385 Mb (365 contigs; N50 scaffold size 307 Kb). We also include genome annotation based on function, gene ontology, networking and pathway mapping. This genome provides an unprecedented resource for many fundamental research areas and shows great promise for the design of new disease interventions.
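The relationship between the sequence yield, fold coverage and estimated genome size quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the function name `fold_coverage` is hypothetical, and it assumes coverage is defined as total usable bases divided by genome size, which may differ from how the depth was computed for this dataset.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures taken from the text: 33.5 Gb of usable sequence,
# estimated genome size of 431-452 Mb.
usable_bases = 33.5e9
for size_mb in (431, 452):
    depth = fold_coverage(usable_bases, size_mb * 1e6)
    print(f"{size_mb} Mb genome -> {depth:.1f}-fold coverage")
```

Under this assumption, the reported 74-fold coverage corresponds to the upper end of the estimated genome size range (452 Mb).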
ZengliYanFangyuanChenXiaodongFangYiKangBoLiShipingLiuYingruiLiJianWangJunWang0000-0002-1422-3331XuanWuZijunXiongXunXuHuanmingYang0000-0002-0858-3410LinfengYangGuojieZhangGenomic2100032_Schistosoma_haematobium.jpgSchistosoma haematobiumPublic Domain, US Government (CDC) - they do say they want to be notified as well as creditedCenters for Disease Control and Prevention's Public Health Image Library (PHIL), with identification number #35; Wikimedia CommonsCDC/ Dr. Edwin P. Ewing, Jr.36507222016ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100032/http://schistodb.net/schisto/10.1038/ng.106522246508http://www.ncbi.nlm.nih.gov/sra?term=SRP011039http://www.ncbi.nlm.nih.gov/bioproject/PRJNA78265XiaoshenGuoBGIguoxs@genomics.cnguoxs@genomics.cnGenomic data from an extinct Palaeo-Eskimo. Available here is the genome of a male individual from an extinct Palaeo-Eskimo culture, the first known group of <em>Homo sapiens</em> to settle in Greenland.
The DNA sample was obtained from ~4,000-year-old permafrost-preserved hair, and was shown to have very low modern DNA contamination. The diploid genome was sequenced to an average depth of 20x using Illumina GAII sequencing platforms, with 79% recovery. Correct indexed reads were mapped to the human genome (hg18) with a suffix array-based method that allows for residual primer trimming. Sequencing yielded a total of 3.5 billion reads.JakobSkouPedersenMarceloBertalanPaulaFCamposHanneMunkholmKampAndrewSWilsonAndrewGledhillElineDLorenzenJonasBinladenMadsBakNielsTommerupChristianBendixenTraceyLPierreBjarneGronnowMortenMeldgaardClausAndreasenSardanaAFedorovaLudmilaPOsipovaThomasFGHighamChristopherBronkRamseyThomasVOHansenFinnCNielsenMichaelHCrawfordAndersAlbrechtsen0000-0001-7306-031XSørenBrunakMichaelBunceMinfengChenM.ThomasPGilbert0000-0002-5805-7195XiaosenGuoRamneekGuptaToomasKivisildKarstenKristiansen0000-0002-6024-0917AndersKrogh0000-0002-5147-6282StinusLindgreenYingruiLiZhuoLiEneMetspaluMaitMetspaluIdaMoltkeKasperNielsen0000-0002-5510-7767RasmusNielsen0000-0003-0513-6591LudovicOrlandoMaanasaRaghavanMortenRasmussenThomasSicheritz-Ponten0000-0001-6615-1141SilvanaTridicoRichardVillemsJunWang0000-0002-1422-3331YongWangEskeWillerslevHaoZhangXiuqingZhangJingZhaoGenomic2100026_Saqqaq.jpgSaqqaq eskimoGNU Free Documentation License, CC SAself madeMichael Haferkamp251255586816ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100026/10.1038/nature0883520148029http://www.ncbi.nlm.nih.gov/sra?term=SRP001453http://www.ncbi.nlm.nih.gov/bioproject/PRJNA46213BoLiBGIlibo@genomics.cnlibo@genomics.cnGenomic data from the roundworm Ascaris suum. Available here is the draft genome for <em>Ascaris suum</em>, a roundworm species that infects pigs. It is a valuable resource for therapy and diagnostic test development for both <em>Ascaris suum</em> infection of pigs, and also human infection by the closely related species <em>A. lumbricoides</em>.
The <em>A. suum</em> genome was sequenced at ~80-fold coverage to generate an approximately 273 million base genome sequence encoding 18,542 protein-coding genes. Compared to other metazoan genomes, it has low repeat content (4.4%). Among the parasitic and free-living roundworms compared, including <em>C. elegans</em>, <em>A. suum</em> shares the highest homology with <em>Brugia malayi</em>, another animal parasite. The <em>A. suum</em> secretome (consisting of around 750 molecules) is rich in peptidases linked to the penetration and degradation of host tissue, and an assemblage of molecules likely to modulate or evade host immune responses. This genome provides a comprehensive resource to the scientific community and a foundation for developing new and urgently needed interventions (drugs, vaccines and diagnostic tests) against ascariasis and other nematodiases.AaronRJexNeilDYoungRossSHallNaZengGarryAAndersonToddWHarrisBronwynECampbellJohnnyVlaminckCinziaCantacessiErichMSchwarzShobaRanganathanPeterGeldhofPeterNejsumPaulWSternbergRobinBGasserFangyuanChenXiaodongFangYiKangBoLiShipingLiuYingruiLiJianWangJunWang0000-0002-1422-3331TaoWangXuanWuZijunXiongXunXuHuanmingYang0000-0002-0858-3410LinfengYangGuojieZhangGenomic2100017_Ascaris.jpgRoundwormGNU Free Documentation License, CC SAhttp://www.3dham.com/microgallery/John Alan Elson1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100017/10.1038/nature1055322031327http://www.ncbi.nlm.nih.gov/sra?term=SRP010159GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenomic data from the Chinese Rhesus macaque (Macaca mulatta lasiota). The Chinese rhesus macaque (<em>Macaca mulatta lasiota</em>) is a subspecies of rhesus macaques that mainly resides in western and central China. Due to their anatomical and physiological similarity with human beings, macaques are a common laboratory model. 
Also, as several macaque species have been sequenced, such as the Indian rhesus macaque and the crab-eating macaque, examination of the Chinese rhesus macaque (CR) genome offers interesting insights into the entire <em>Macaca</em> genus.
The DNA sample for sequencing and analysis was obtained from a five-year-old female CR from southwestern China. The genome was sequenced on the Illumina GAIIx platform, yielding 142 Gb of high-quality sequence, representing 47-fold genome coverage for CR. The total size of the assembled CR genome was about 2.84 Gb. Scaffolds were assigned to the chromosomes according to the synteny displayed with the Indian rhesus macaque and human genome sequences. About 97% of the CR scaffolds could be placed onto chromosomes.NaAn0000-0001-6463-7105YinqiBaiEdwardVBallJiesiChenRonghuaChenDavidNCooperHongliDuXiaodongFangWeiFanQuanfeiHuangYingHuang0000-0002-4364-9323ZhiyongHuangHaofuHuMichaelGKatzeLiangLeBoLiCaiLiJianwenLiFeiLingQiyeLiXiaomingLiuYanLiYingruiLiYuhuanMengRasmusNielsen0000-0003-0513-6591PeterDStensonBingSuJohnRThompsonAlainJvan GoolJianWangJufangWangJunWang0000-0002-1422-3331TaoWangWenWangXiaoningWangZhiwenWangLiqiongWeiHuanmingYang0000-0002-0858-3410GuangmeiYanGuojieZhangPeiZhangXiuqingZhangYanfengZhangYongZhangMinZhuoGenomic2100002_Macaca_mulatta.jpgChinese Rhesus macaqueCC-BYFlickr: EOL ImagesGeoff Gallice5368709120ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100002/2012-04-27http://macaque.genomics.org.cn/Genome 10Khttp://www.genome10k.org/10.1038/nbt.199222002653http://www.ncbi.nlm.nih.gov/nuccore/?term=AEHK00000000http://www.ncbi.nlm.nih.gov/sra?term=SRP003590http://www.ncbi.nlm.nih.gov/bioproject/PRJNA51409WenbinChenBGIchenwenbin@genomics.cnchenwenbin@genomics.cnGenomic data from the pigeonpea (Cajanus cajan). Here we present the genome of the pigeonpea (<em>Cajanus cajan</em>), a widely farmed diploid legume species. It is an important reference genome for food crop development as many crop species, such as soybean (<em>Glycine max</em>), chickpea (<em>Cicer arietinum</em>), lentil (<em>Lens culinaris</em>), and alfalfa (<em>Medicago sativa</em>), are legumes. 
The genetic improvement of pigeonpea also has ramifications for food protection, as it is cultivated primarily in small-scale holdings in semi-arid tropical regions of the developing world.
A total of 237.2 Gb of sequence was generated using the Illumina next-generation sequencing platform. Scaffolds representing 72.7% (605.78 Mb) of the 833.07-Mb pigeonpea genome were assembled using the Illumina sequence, Sanger-based bacterial artificial chromosome end sequences and a genetic map. A few segmental duplication events, but no recent genome-wide duplication events, are observable.JessicaASchlueterMarkTADonoghueAdamMWhaleyJaimeSheridanReetuTutejaWeiWuShiaw-PyngYangTrusharShahKBSaxenaToddMichaelW.RichardMcCombieBichengYangCharlesSpillaneGregoryDMaySarwarAzamArvindKBhartiWenbinChenDouglasRCookGuangyiFanAndrewDFarmerAikoIwataScottAJacksonR.VarmaPenmetsaRachitKSaxenaHariDUpadhyayaRajeevKVarshneyJunWang0000-0002-1422-3331XunXuHuanmingYang0000-0002-0858-3410GengyunZhangYupengLiGenomic2100028_Cajanus_cajan.jpgPigeonpeaCC BYBioLib.czForest & Kim Starr20401094656ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100028/10.1038/nbt.202222057054GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenomic data from the polar bear (Ursus maritimus). The polar bear (<em>Ursus maritimus</em>) is one of the largest land carnivores, second only to the Alaskan brown bear. In an effort to adapt to the extremely cold Arctic environment, it has evolved many unique characteristics. However, ecological pressures pose a grave threat to the survival of polar bears. The polar bear genome provides significant contributions to research concerning evolution, biodiversity and climate change.
In 2010, BGI completed the first draft of the genome sequence of a 25-year-old male polar bear. Using next-generation sequencing technology (Illumina GA) to obtain about 101-fold genome coverage, and SOAPdenovo, BGI's self-developed short-read assembly method, a high-quality draft genome sequence was assembled with an N50 scaffold size of 15.9 megabases (Mb), and functional element annotation was completed. A reference gene set containing around 21,000 genes was predicted for the polar bear. Transposable elements comprise approximately 37% of the polar bear genome.<br /> The data also include genome and SNP annotations, with SNP information from 18 polar bear and 10 brown bear individuals sampled from Greenland and Alaska.EskeWillersleveJWangBoLiJunWang0000-0002-1422-3331GuojieZhangGenomic2100008_Ursus_maritimus.jpgPolar bearCC BY-SAWikimedia CommonsAlan D. Wilson33285996544ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100008/2014-04-22Genome 10Khttp://www.genome10k.org/10.1016/j.cell.2014.03.054http://www.ebi.ac.uk/ena/data/view/PRJNA210951http://www.ebi.ac.uk/ena/data/view/SRA092289http://www.ncbi.nlm.nih.gov/sra?term=SRA092289GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenomic data from the domestic pigeon (Columba livia). The domestic pigeon (<em>Columba livia domestica</em>) is one of the most common birds on planet Earth, found on every continent except Antarctica. The sub-species sequenced was a breed known as the Danish Tumbler, a show pigeon with distinct color markings.
The domestic pigeon genome sequence provides a better understanding of such a widespread creature, including certain mechanisms that scientists do not yet fully understand, such as magnetosensitivity. The sequencing data also offer insight into the species’ similarities to and differences from other birds, and into how breeding might have shaped its genome, as this sub-species was taken from Asian colonies to Denmark 400 years ago and selectively bred.
In 2010, BGI used whole genome shotgun sequencing and the Illumina HiSeq 2000 system to generate 98X coverage in short reads for a Danish Tumbler. The raw data were then used by the assembler SOAPdenovo to produce a draft assembly of 1.1 Gb with an N50 scaffold length of 3.1 Mb and an N50 contig length of 22.4 Kb. Based on the k-mer distribution of the sequencing data, the genome size of <em>Columba livia</em> is estimated to be 1.3 Gb, suggesting the current assembly is about 84% complete. The GC content (41.5%) and the repetitive content (8.7%) of the pigeon genome are similar to those of three other avian genomes (chicken, zebra finch, turkey); the uncovered regions of the genome appear to be enriched in repeats. A total of 17,300 protein-coding genes are predicted in the assembly.M.ThomasPGilbert0000-0002-5805-7195CaiLiJunWang0000-0002-1422-3331GuojieZhangGenomic2100007_Columba_livia.jpgDomestic pigeonpublic domainThe actual Danish tumbler sequencedTom Gilbert33285996544ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100007/Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10.1126/science.12304222337155410.1186/2047-217X-3-2610.1126/science.125138510.1126/science.1253451http://www.ncbi.nlm.nih.gov/nuccore/?term=AKCR00000000http://www.ncbi.nlm.nih.gov/sra?term=SRA052637http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39333http://www.ncbi.nlm.nih.gov/sra?term=SRA054391http://www.ncbi.nlm.nih.gov/bioproject/PRJNA167554GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenomic data from the giant panda (Ailuropoda melanoleuca). The giant panda (<em>Ailuropoda melanoleuca</em>) is considered a symbol of China and is a much loved animal all around the world. It is also one of the world’s most endangered species, making it a flagship species for conservation efforts. 
As the first fully sequenced Ursidae and the second fully sequenced carnivore after the dog, the whole genome sequence and annotation data provide an unparalleled amount of information to aid in understanding the genetic and biological underpinnings of this unique species, and will help contribute to disease control and conservation efforts.
In 2008, BGI completed a first draft of the genome sequence of a three-year old female giant panda named Jingjing, who was used as a model for the 2008 Olympics in Beijing, China (<a href="http://dx.doi.org/10.1038/nature08696">doi: 10.1038/nature08696</a>). Using second-generation Illumina GA sequencing data, the first de novo genome assembly was created using short-read sequencing technology. Here you will find the giant panda genome sequence assembly as well as annotation information, such as gene structure and function, non-coding RNAs, and repeat elements. Also presented are polymorphism information detected in the diploid genome, including SNPs, indels, and structural variations (SVs). The assembly was done using SOAPdenovo software and the panda genome data is visualized via MapView, which is powered by the Google Web Toolkit.LinHeJingCaiZhiheZhangFuwenWeiZhaoleiZhangWanjunGuOliverARyderFrederickChi-ChingLeungXiaoSunYongguiFuRongHouFujunShenBoMuRunmaoLinGuodongWangQiWuDongDongKathleenCookGaoShanCarolinKosiolXueyingXieZuhongLuCynthiaCSteinerTommyTsan-YukLamSiyuanLinJingTianTimingGongHongdeLiuDejinZhangWenboHuAnlongXuYangZhengYongyongShiZhiqiangLiQingLiuYanlingChenNingQuFengTianXiaolingWangHaiyinWangLizhiXuTomasVinarYajunWangHeminZhangDeshengLiYanHuangXiaWangZhiJiangMaynardOlsonNaAn0000-0001-6463-7105YinqiBaiLarsBolund0000-0003-4165-1531MichaelWBrufordQingleCaiJianjunCaoShifengChengLinFangXiaodongFangWeiFanXiaosenGuoYiranGuoQuanfeiHuangYujieHuMinJianKarstenKristiansen0000-0002-6024-0917Tak-WahLamHuiqingLiangBoLiDaweiLiGuoqingLiHengLiJianwenLiJingxiangLiJunLiLiLiQibinLiRuiqiangLiSonggangLiBinghangLiuShipingLiuXiaoLiuYingruiLiLijiaMaJiumengMinRasmusNielsen0000-0003-0513-6591WenhuiNiePeixiangNiWubinQianNanQinXiaoliRenYuanyuanRenJueRuanZhongbinShiGengTianBoWangJianWangJinhuanWangJunWang0000-0002-1422-3331JunyiWangMingweiWangWenWangMingWenGaneKa-ShuWong0000-0001-6108-5560ZhigangWuZhaolingXuanGuohuaYangHuanmingYang0000-0002-0858-3410ZhentaoYangChenYeSiu-MingYiuChangY
uGuojieZhangHaoZhangJuanbinZhangQinghuiZhangXiuqingZhangYapingZhangJingZhaoShancenZhaoHanchengZhengHuisongZhengYanZhouHongmeiZhuGenomic2100004_Ailuropoda_melanoleuca.jpgGiant pandaCC BY-SAWikimedia CommonsShizhao165356240896ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100004/http://panda.genomics.org.cn/Genome 10Khttp://www.genome10k.org/10.1038/nature0869620010809http://www.ncbi.nlm.nih.gov/sra?term=SRP000962http://www.ncbi.nlm.nih.gov/nuccore/?term=ACTA00000000http://www.ncbi.nlm.nih.gov/bioproject/PRJNA38683GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenome sequence of the naked mole rat (Heterocephalus glaber). Here is the genome of the naked mole rat (<em>Heterocephalus glaber</em>), one of the only two known eusocial mammals. It is a fascinating species, due not only to its unique behavior but also to its unique physiology. Among its unusual characteristics, it has the longest lifespan of all rodent species, is a poikilotherm, is highly resistant to cancer, and does not sense certain types of pain.
The genome of an individual male naked mole rat was sequenced on the Illumina HiSeq 2000 platform and assembled using SOAPdenovo. The assembly comprises 2.5 Gb (gigabase pairs) of contig sequences with N50 19.3 kb (kilobase pairs) and N90 4.7 kb, and 2.7 Gb of scaffold sequences with N50 1.6 Mb (megabase pairs) and N90 0.3 Mb.EunBaeKimAlexeyAFushanAlexeiVLobanovLijuanHanStefanoMMarinoMarinaVKasaikinaNinaStoletzkiPazPolakAdamKiezunGregoryVKryukovQiangZhangLeonidPeshkinRochelleBuffensteinLiChenThomasJParkVadimNGladyshevRoderickTBronsonYuanxinChenXiaodongFangChangleiHanZhiyongHuangQiyeLiChunfangPengXiaoqingSunShamilRSunyaevAntonATuranovBoWangJunWang0000-0002-1422-3331ZhiqiangXiongLanYangPengchengYangSunHeeYimGuojieZhangWeiZhaoXiangZhaoYabingZhuGenomic2100022_Heterocephalus_glaber.jpgNaked mole ratI, the copyright holder of this work, release this work into the public domain. I grant anyone the right to use this work for any purpose, without any conditions, unless such conditions are required by law.Own workLtshears - Trisha M Shears81604378624ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100022/2012-04-27http://mr.genomics.org.cn10.1038/nature1053321993625http://www.ncbi.nlm.nih.gov/nuccore/?term=AFSB00000000http://www.ncbi.nlm.nih.gov/sra?term=SRP005951http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30337http://www.ncbi.nlm.nih.gov/bioproject/PRJNA68323GuojieZhangBGIzhanggj@genomics.org.cnzhanggj@genomics.org.cnGenome data from the leaf-cutting ant (Acromyrmex echinatior). Presented here is the high-quality (>100x depth) Illumina genome sequence of the leaf-cutting ant <em>Acromyrmex echinatior</em>, a model species for symbiosis and reproductive conflict studies. Leaf-cutting ants make a particularly good model because, after humans, they form the largest and most complex animal societies on Earth. 
Part of the subfamily Myrmicinae, this particular species is found in the wild from Mexico to Panama and subsists mostly on a particular fungus of the genus <em>Leucocoprinus</em>, which it cultivates on a medium of masticated leaf tissue. With an estimated total size of 313 Mb, the genome is 28% made up of repeat sequences and has a GC content of 34%.SanneNygaardMortenSchiøtt, YannickWurmJiajianZhouLuJiFengQiuHailinPanFrankHauserCornelisJPGrimmelikhuijzenJacobusJBoomsmaHaofuHuAndersKrogh0000-0002-5147-6282CaiLiMortenRasmussenJunWang0000-0002-1422-3331GuojieZhangGenomic2100011_Acromyrmex_echinatior.jpgLeaf-cutting antthe content of all PLoS journals is published under the Creative Commons Attribution 2.5 licenseRestoring Nature's Backbone. Nicholls H, PLoS Biology Vol. 4/6/2006, e202 http://dx.doi.org/10.1371/journal.pbio.0040202Scott Bauer, US Department of Agriculture36507222016ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100011/10021110.1101/gr.121392.11121719571http://www.ebi.ac.uk/ena/data/view/ERP000666GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenome data from Jerdon’s jumping ant (Harpegnathos saltator). Presented here is the sequenced genome of Jerdon’s jumping ant (<em>Harpegnathos saltator</em>). The jumping ant has a distinct caste and social behavior system, and its genome offers interesting insights into epigenetics in aging and behavior.
The Illumina Genome Analyzer platform was used to sequence a genomic library, obtaining more than 100-fold coverage of the estimated 330 Mb genome. The draft genomic assembly reached a scaffold N50 size of ~600 kb and covers more than 90% of the ant’s genome.ShelleyLBergerRobertoBonasioGregDonahueXiaodongFangZhiyongHuangCaiLiJürgenLiebigQiyeLiNavdeepSMuttiNanQinDannyReinbergJunWang0000-0002-1422-3331PengchengYangChaoyangYeGuojieZhangPeiZhangGenomic2100019_Harpegnathos_saltator.jpgJerdon's jumping antThis work has been released into the public domain by its author: Example at the Kalyanvarma project grants anyone the right to use this work for any purpose, without any conditions, unless such conditions are required by law.http://kalyanvarma.net/photography. Originally uploaded to en.wikipedia. Transferred to Wikimedia Commons by User:Sarefo.Kalyan Varma, Example at the Kalyanvarma project100931731456ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100019/10.1126/science.119242820798317http://www.ncbi.nlm.nih.gov/nuccore/?term=AEAC00000000http://www.ncbi.nlm.nih.gov/sra?term=SRP002786http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22680http://www.ncbi.nlm.nih.gov/bioproject/PRJNA50203QibinLiBGIliqb@genomics.org.cnliqb@genomics.org.cnGenomic data from chronic hepatitis B infected humans and healthy controls. Chronic hepatitis B (CHB) infection remains endemic in large parts of the world and, as such, is a major global health issue. However, a thorough understanding of the genetic variants involved in CHB infection susceptibility remains lacking.
This dataset comprises the raw exome sequencing data, SNP sets and InDel sets for 50 CHB patients and 40 healthy controls. The exome sequences were captured with a NimbleGen 2.1M array targeting 34 Mb of the human genome, containing 180,000 coding exons and 551 miRNA genes. The enriched library was then sequenced on an Illumina HiSeq 2000, and sequencing reads were aligned to the human reference genome (NCBI build 36.3). The average sequencing depth per sample is 43X after the removal of read duplicates. These data provide a resource for identifying genetic variants predisposing humans to CHB infections.WeijunHuangLiangPengQiangZhaoQLiYuanyuanPeiQijunLiaoZhi-LiangGaoYimingWangJunWang0000-0002-1422-3331Genomic2100029_Hepatitis-B.jpgHuman exome – chronic hepatitis B infection predisposing variantsPublic Domain, US Government (CDC)Centers for Disease Control and Prevention's Public Health Image Library(PHIL), with identification number #5631.CDC281320357888ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100029/2012-07-1210.1002/hep.2585022610944http://www.ncbi.nlm.nih.gov/sra?term=SRA048741GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenome data from the Florida carpenter ant (Camponotus floridanus). Here we present the sequenced genome of the Florida carpenter ant (<em>Camponotus floridanus</em>), a species with an organized caste society. As a eusocial species, its genome offers interesting insights into aging, epigenetics and animal behavior.
The Illumina Genome Analyzer platform was used to sequence a genomic library of the Florida carpenter ant, obtaining more than 100-fold coverage. The draft genomic assembly reached a scaffold N50 size of ~600 Kb and covers more than 90% of the approximately 240 Mb genome.ShelleyLBergerRobertoBonasioGregDonahueXiaodongFangZhiyongHuangCaiLiJürgenLiebigQiyeLiNavdeepSMuttiNanQinDannyReinbergJunWang0000-0002-1422-3331PengchengYangChaoyangYeGuojieZhangPeiZhangGenomic2100018_Camponotus.jpgFlorida carpenter antGNU Free Documentation License, CC SAWikimedia Commons, originally uploaded to de.wikipediaRobert Friebe7516192768ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100018/10.1126/science.119242820798317http://www.ncbi.nlm.nih.gov/nuccore/?term=AEAB00000000http://www.ncbi.nlm.nih.gov/sra?term=SRP002786http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22680http://www.ncbi.nlm.nih.gov/bioproject/PRJNA50201GuojieZhangBGIzhanggj@genomics.cnzhanggj@genomics.cnGenomic data from the Emperor penguin (Aptenodytes forsteri). The Emperor penguin (<em>Aptenodytes forsteri</em>) is a large penguin, standing over 1 meter tall, with distinctive black, yellow and white markings. Like most penguins, the emperor penguins are indigenous to Antarctica and exist between the 66th and 78th parallels.
Famous for its unique social and reproductive behavior, the emperor penguin also possesses a number of other notable evolutionary qualities: its stature, its feathers, its incubation process, and its swimming capabilities. The <em>Aptenodytes forsteri</em> genome offers new insights into this remarkable bird.JunWangGuojieZhangDavidMLambertGenomic2100005_Aptenodytes_forsteri.jpgEmperor penguinsPublic Domain, US GovernmentNOAA Photo LibraryMichael Van Woert, 199939728447488ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100005/Genome 10Khttp://www.genome10k.org/The Avian Phylogenomic Projecthttp://avian.genomics.cn/en/index.html 10.1186/2047-217X-3-2610.1186/2047-217X-3-2710.1126/science.125138510.1126/science.1253451SanwenHuangCAAShuangsanwen@caas.net.cnhuangsanwen@caas.net.cnGenomic data for the domestic cucumber (Cucumis sativus var. sativus L.). Here we present genomic data for the domestic cucumber (<em>Cucumis sativus var. sativus L.</em>). The cucumber is a member of the Cucurbitaceae or cucurbit family, a family of great agricultural and horticultural importance that also includes species such as melons, gourds and squashes. A biologically interesting as well as an economically relevant species, it is used as a model system for plant sex determination and vascular biology studies.
The domestic cucumber has seven pairs of chromosomes and a haploid genome of 367 Mb, a smaller genome for the Cucurbitaceae family. The genome was sequenced and assembled with N50 contig and scaffold sizes of 19.8 Kb and 1.14 Mb, respectively. Using the genetic map, 72.8% of the assembled sequences were anchored onto the 7 chromosomes. A total of 26,682 genes were predicted in the current cucumber genome.XingfangGuBingyanXieWeiweiJinJackStaubAndrzejKilianEdwinAGvan der VossenYangWuJieGuoZhiqiJiaBowenZhaoYonghuaHanXuefengLiShenhaoWangQiuxiangShiShiqiangLiuWonKyongChoJae-YeanKimKatarzynaHeller-UszynskaHanMiaoZhouchaoChengShengpingZhangYuhongYangHouxiangKangManLiHailongYangRuiChenShifangLiuBaoxiZhangShuzhiJiangAsanYinqiBaiLarsBolund0000-0003-4165-1531QingleCaiJianjunCaoYongchenDuLinFangXiaodongFangWeiFanZhangjunFeiJunHeQuanfeiHuangSanwenHuangMinJianKarstenKristiansen0000-0002-6024-0917HuiqingLiangBoLiGuangcunLiGuoqingLiJianwenLiJunLiLiLiKuiLinRuiqiangLiShaochuanLiSonggangLiDongyuanLiuHuiLiuYingLiYingruiLiZhuoLiWilliamJLucasYaoLuLijiaMaPeixiangNiWubinQianNanQinXiaoliRenYiRenYuanyuanRenJueRuanZhongbinShiRifeiSunGengTianJianWangJunWang0000-0002-1422-3331MingweiWangXiaowuWangMingWenJianWuZhigangWuZhaolingXuanYongXuGuohuaYangHuanmingYang0000-0002-0858-3410ShuangYangZhentaoYangGuojieZhangJuanbinZhangXiuqingZhangYongZhangZhonghuaZhangJingZhaoHanchengZhengHongkunZhengYanZhouHongmeiZhuGenomic2Transcriptomic4100025_Cucumis_sativus.jpgDomestic cucumberCC BYBioLib.czMiroslav Deml80530636800ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100025/http://cucumber.genomics.org.cn10.1159/0001513201893149010.1038/ng.4751988152710.1371/journal.pone.000579519495411http://www.ncbi.nlm.nih.gov/nuccore/?term=ACHR00000000http://www.ncbi.nlm.nih.gov/bioproject/PRJNA33619