BrianDalrympleCSIRO Agriculture, Australiabrian.dalrymple@uwa.edu.aubrian.dalrymple@uwa.edu.auSupporting data for "Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data" Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits, and identifying potential genome editing targets. QuanH.NguyenRossLTellamMarinaNaval-SanchezLaercioRPorto-NetoWilliamBarendseAntonioReverterBenjaminHayesJamesKijasBrianPDalrympleGenomic2Software6100390.pngBovine HPRS pipelinehttps://creativecommons.org/publicdomain/zero/1.0/UnknownQuan Nguyen & Brian P. 
Dalrymple1048576ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100390/https://bitbucket.csiro.au/users/ngu121/repos/hprs/browseDepartment of Primary Industrieshttp://dx.doi.org/10.13039/100008431Commonwealth Scientific and Industrial Research Organisationhttp://dx.doi.org/10.13039/501100000943keywordRegulatory genomicskeywordmammalian genomekeywordenhancerskeywordpromoterskeywordtranscription factorsTizianoFlatiCINECAt.flati@cineca.itt.flati@cineca.itLiGeA: a comprehensive database of human gene-fusion events LiGeA (cancer cell Lines Gene fusion portAl) provides an easy access to a comprehensive database designed for storing, displaying and annotating gene fusion events detected from NGS data.SilviaGioiosaMarcoBolisTizianoFlatiAnnalisaMassiniEnricoGarattiniGiovanniChillemiMaddalenaFratelliTizianaCastrignanòligea_logo1.pngLiGeA logohttps://creativecommons.org/licenses/by-sa/4.0/Created ad-hoc from scratchMaddalena Fratelli2147483648''XiuQiuGuangzhou Women and Children's Medical Center, Guangzhou Medical University, Chinaqxiu0161@163.comqxiu0161@163.comSupporting data for "Connections between human gut microbiome and gestational diabetes mellitus" The human gut microbiome can modulate metabolic health and affect insulin resistance, and may play an important role in the etiology of gestational diabetes mellitus (GDM). Here, we compared the gut microbial composition of 43 GDM patients and 81 healthy pregnant women via whole-metagenome shotgun sequencing of their fecal samples collected at 21-29 weeks, to explore associations between GDM and the composition of microbial taxonomic units and functional genes.
A metagenome-wide association study (MGWAS) identified 154,837 genes with significant relative abundance differences between the two cohorts, which clustered into 129 metagenome linkage groups (MLGs) for species description. Parabacteroides distasonis, Klebsiella variicola and other species were enriched in GDM patients, whereas Methanobrevibacter smithii, Alistipes spp., Bifidobacterium spp. and Eubacterium spp. were enriched in controls. The ratios of the gross abundances of GDM-enriched MLGs to control-enriched MLGs were positively correlated with blood glucose levels. A Random Forest model showed that fecal MLGs have excellent discriminatory power to predict GDM status.
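The ratio-versus-glucose relationship described above can be illustrated with a minimal sketch. It assumes a per-sample ratio of summed GDM-enriched to summed control-enriched MLG abundance and a Pearson correlation; the description does not state which correlation measure was used, and the MLG names, abundances and glucose values below are hypothetical toy numbers, not values from this dataset.

```python
import math


def enrichment_ratio(abundances, gdm_mlgs, control_mlgs):
    """Ratio of summed GDM-enriched to summed control-enriched MLG abundance."""
    gdm_total = sum(abundances[name] for name in gdm_mlgs)
    control_total = sum(abundances[name] for name in control_mlgs)
    if control_total == 0:
        raise ValueError("control-enriched abundance is zero; ratio undefined")
    return gdm_total / control_total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy inputs (assumptions for illustration only).
GDM_MLGS = ["mlg_a", "mlg_b"]      # stand-ins for GDM-enriched MLGs
CONTROL_MLGS = ["mlg_c"]           # stand-in for control-enriched MLGs
samples = [
    {"mlg_a": 2.0, "mlg_b": 1.0, "mlg_c": 3.0},
    {"mlg_a": 4.0, "mlg_b": 2.0, "mlg_c": 2.0},
    {"mlg_a": 1.0, "mlg_b": 1.0, "mlg_c": 4.0},
]
glucose = [5.2, 7.0, 4.5]          # made-up blood glucose readings

ratios = [enrichment_ratio(s, GDM_MLGS, CONTROL_MLGS) for s in samples]
print("ratios:", ratios)
print("correlation with glucose:", pearson(ratios, glucose))
```

A rank-based measure such as Spearman correlation would be an equally plausible choice here; Pearson is used only to keep the sketch dependency-free.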
Ya-ShuKuangJin-HuaLuSheng-HuiLiJun-HuaLiMing-YangYuanJian-RongHeNian-NianChenWan-QingXiaoSong-YingShenLanQiuYing-FangWuCui-YueHuYan-YanWuWei-DongLiQiao-ZhuChenHong-WenDengChristopherJPapasianHui-MinXiaXiuQiuMetagenomic3100326.jpgBorn in Guangzhou Cohort Study logoCC0Born in Guangzhou Cohort StudyBorn in Guangzhou Cohort Study1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100326/10.1093/gigascience/gix058http://www.ncbi.nlm.nih.gov/sra?term=ERP020710National Natural Science Foundation of Chinahttp://dx.doi.org/10.13039/50110000180981673181Guangzhou Science and Technology Bureauunknown201508030037Shenzhen Municipal Government of ChinaunknownJSGG20160229172752028Shenzhen Municipal Government of ChinaunknownCXB201108250098AShenzhen Municipal Government of ChinaunknownJSGG20140702161403250keywordGut microbiomekeywordgestational diabetes mellituskeywordmetagenome-wide association.MarekFiglerowiczInstitute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Polandmarekf@ibch.poznan.plmarekf@ibch.poznan.plSupporting data for "Comprehensive analysis of microorganisms accompanying human archaeological remains" Metagenome analysis has become a common source of information about microbial communities that occupy a wide range of niches, including archaeological specimens. It has been shown that the vast majority of DNA extracted from ancient samples comes from bacteria (presumably modern contaminants). However, characterization of microbial DNA accompanying human remains has never been done systematically for a wide range of different samples.
<br />We used metagenomic approaches to perform comparative analyses of microorganism communities present in 161 archaeological human remains. DNA samples were isolated from the teeth of human skeletons dated from 100 AD to 1,200 AD. The skeletons were collected from seven archaeological sites in Central Europe and stored under different conditions. The majority of identified microbes were ubiquitous environmental bacteria that most likely contaminated the host remains not long ago. We observed that the composition of microbial communities was sample-specific and not correlated with its temporal or geographical origin. Additionally, traces of bacteria and archaea typical for human oral/gut flora as well as potential pathogens were identified in two-thirds of the samples. The genetic material of human-related species, in contrast to the environmental species that accounted for the majority of identified bacteria, displayed DNA damage patterns comparable with endogenous human aDNA, which suggested that these microbes might have accompanied the individual before death.
<br />
Our study showed that the microbiome observed in an individual sample is not reliant on the method or duration of sample storage. Moreover, shallow sequencing of DNA extracted from ancient specimens and subsequent bioinformatics analysis allowed both the identification of ancient microbial species, including potential pathogens, and their differentiation from contemporary species that colonized human remains more recently.<br />AnnaPhilipsIreneuszStolarekBognaKuczkowskaAnnaJurasLuizaHandschuhJanuszPiontekPiotrKozlowskiMarekFiglerowiczMetagenomic3100310.jpgComprehensive analysis of microorganisms accompanying human archaeological remainsCC0Philips et al 2017Philips et al 20171073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100310/10.1093/gigascience/gix044http://www.ncbi.nlm.nih.gov/bioproject/PRJNA354503Narodowe Centrum Naukihttp://dx.doi.org/10.13039/5011000042812014/12/W/NZ2/00466keywordMicrobiomekeywordancient DNAkeywordNGSkeywordmetagenomicsShangxianGaoBGI-Shenzhen, Shenzhen 518083, P. R. Chinagaoshangxian@126.comgaoshangxian@126.comAn updated reference human genome dataset of the BGISEQ-500 sequencer The BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoballs (DNB) and combinational probe-anchor synthesis (cPAS) developed from Complete Genomics™ sequencing technology, it generates short reads at a large scale, which can help fulfill the growing demands for sequencing. Here, we present the first human whole genome sequencing dataset from the BGISEQ-500. The dataset was generated by sequencing the widely-used Genome in a Bottle Consortium cell line, HG001 (NA12878). We have previously released the paired end 50bp (PE50) sequences (DOI:10.5524/100252) and here we present the PE100 reads from same sample, together with the assembled genome. We also included examples of the raw images from the sequencer for reference. 
Finally, we carried out variant calling based on the dataset and compared the results to similar amounts of publicly available HiSeq2500 data and to the previously identified high-confidence variants in this previously sequenced genome.JieHuangXinmingLiangYuankaiXuanChunyuGengYuxiangLiHaorongLuShoufangQuXianglinMeiHongboChenTingYuNanSunHuiJiangXinLiuZhaopengYangFengMuShangxianGaoGenomic2no_image.jpgBGISEQ500Public domainBGIBGI97710505984ftp://penguin.genomics.cn/pub/10.5524/100001_101000/10027410025210.1093/gigascience/gix024http://www.ncbi.nlm.nih.gov/bioproject/PRJEB15427HaoLiInstitute of Disease Control and Preventionlihao88663239@126.comlihao88663239@126.comWhole genome sequencing data of human hepatocellular carcinoma cell line HepG2 The HepG2 cell line is a hepatoma cell line collected from a 15-year-old American male who had been diagnosed with liver cancer. In total, 20,865 articles have been published over the years. Its applications as a model system cover the molecular mechanism of liver1, studies on drug targeting2, the impact of exogenous substances3, and research on genetic diversity4,5. However, most genomic studies in the HepG2 cell line have used the Homo sapiens reference genome GRCh37 or GRCh38, which is inaccurate in some specific situations. Here, we present 55× coverage whole genome sequencing data of the HepG2 cell line to help fill in the gaps. The whole genome sequencing data generated in this paper should enhance the validity of cytological experiments with this cell line, such as siRNA and CRISPR-Cas9 target design. These data will also provide scientists with some clues for understanding the relationship between genetic variation and disease.ChunyuShengHaoLino_image.jpgno image iconPublic domainGigaDBGigaDB1342177280''AnthonyPapenfussThe Walter and Eliza Hall Institute of Medical Researchpapenfuss@wehi.edu.aupapenfuss@wehi.edu.auThe data for: Genomic resources and draft reference assemblies of the human and porcine scabies mites, Sarcoptes scabiei var. 
hominis and var. suis The scabies mite, <em>Sarcoptes scabiei</em>, is a parasitic arachnid and cause of the infectious skin disease scabies in humans or mange in many animal species. We sequenced the genome of two samples of <em>S. scabiei</em> var. <em>hominis</em> obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of <em>S. scabiei</em> var. <em>suis</em> from a pig model. Due to the small size of scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present <em>Sarcoptes scabiei</em> var. <em>hominis</em> and var. <em>suis</em> draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins.EhteshamMofizDeborahHoltTorstenSeemannBartJCurrieKatjaFischerAnthonyTPapenfussGenomic2100198.jpgMale and female sarcoptes scabieiPublic DomainBy Unknown - Popular Science Monthly Volume 14wikimedia524288000ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100198/https://www.protocols.io/widgets/protocol/Draft-genome-assembly-using-parasitic-mite-populat-exwbfpe10.1186/s13742-016-0129-2http://www.ncbi.nlm.nih.gov/bioproject/PRJEB12428http://www.ebi.ac.uk/ena/data/view/LN874268http://www.ebi.ac.uk/ena/data/view/LN874269http://www.ebi.ac.uk/ena/data/view/LN874270keywordscabies mite genomeDavideVerzottoComputational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, 138672 Singaporeverzottod@gis.astar.edu.sgverzottod@gis.astar.edu.sgSupporting single-molecule optical genome mapping data from human HapMap and colorectal cancer cell lines. Next generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. 
However, the identification of genome structural variations based on NGS approaches with read lengths of 35 to 300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and are well suited to the identification of large-scale genome structural variations and, when combined with short-read NGS data, to de novo genome assemblies. Here we present the optical mapping data of two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. <br />
High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs followed by DNA extraction under mild conditions. We digested genomic DNA with KpnI and analyzed 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, per cell line using the Argus optical mapping system. We aligned the maps to the human reference by OPTIMA, a new glocal alignment method, and obtained 6.8x and 5.7x genome coverage, 2.9x and 1.7x more than the coverage obtained with previously available software. DavideVerzottoAudreySMTeoAxelMHillmerNiranjanNagarajanFeiYaoGenome-Mapping13no_image.jpgPartial optical map of GM12878 HapMap cell lineCC-BYOpGen Argus system imageAudrey Teo1073741824ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100182/10016510.1186/s13742-015-0106-1keywordoptical mappingkeywordgenomic mappingkeywordglocal alignmentkeywordoverlap
alignmentkeywordmap-to-sequence alignmentYongHouBGIhouyong@genomics.cnhouyong@genomics.cnSupporting data for "Full-length single cell RNA-seq applied to a viral human cancer: Applications to HPV expression and splicing analysis in HeLa S3 cells". Viral infection causes multiple forms of human cancer, and human papillomavirus (HPV) infection is the primary factor in cervical carcinomas. Single-cell RNA-seq studies highlight the tumor heterogeneity of most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line.
We developed a new high-throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells, 40 of which were randomly selected for single-cell RNA sequencing. On the basis of these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in terms of gene expression, alternative splicing, and gene fusions. Furthermore, by co-expression analysis we identified a high diversity of HPV-18 gene expression and splicing at the single-cell level.
In addition to providing a characterization of the transcriptome of HeLa S3 cells at the single-cell level, our study demonstrates the power of single-cell RNA-seq analysis of virally infected cells and cancers.
YongHouKarstenKristiansen0000-0002-6024-0917YingruiLiJianWangKuiWuXunXuHuanmingYang0000-0002-0858-3410BoLiXiuqingZhangMichaelDeanLiangWuXiaolongZhangZhikunZhaoLingWangGuiboLiQichaoYuYanhuiWangXinxinLinWeijianRaoZhanlongMeiYangLiRunzeJiangHuanYangFuqiangLiGuoyunXieLiqinXuJieZhangJianghaoChenTingWangTranscriptomic4Wu et al. 2015, Fig 1Figure 1, schematic diagram of MIRALCSCC0Wu et al. 2015, GigaScienceWu et al. 2015, GigaScience7053332ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100160/10.1186/s13742-015-0091-4http://www.ncbi.nlm.nih.gov/bioproject/PRJNA301652keywordsingle-cell transcriptomekeywordHeLakeywordHPVkeywordviruskeywordtumor heterogeneitykeywordcancerkeywordRNA splicingEdmundCrampinUniversity of Melbourne, Australiaedmund.crampin@unimelb.edu.auedmund.crampin@unimelb.edu.auSupporting data for "Spatially transformed fluorescence image data for ERK-MAPK and selected proteins within human epidermis". Human epidermis from three patients underwent single-target immunofluorescence labelling against: calmodulin, β1 integrin, β4 integrin, keratin-10, keratin-14, 14-3-3σ (stratifin), Raf-1, phospho-Raf-1 (pS338), MEK-1/2, phospho-MEK-1/2 (pS218/pS222), ERK-1/2, phospho-ERK-1/2 (pT183/pY185), JunB, c-Jun, c-Fos and Fra2.<br />
Confocal microscopy was used to collect image data over a tissue volume, at or near the diffraction limited imaging (x-y) resolution, providing a unique data set that examines signalling proteins in the context of a homeostatic human tissue. <br />
We have recently used bulk changes in the abundance of proteins and phospho-proteins (a subset of these data) for a model of in situ ERK-MAPK activity, driven by Ca2+ signalling inputs. There may be scope in extending such analyses to include a greater range of the data within a model of epidermal
keratinocyte differentiation. Furthermore, we believe that researchers who study intercellular heterogeneity of signalling components may find these data useful. <br />
Together with our sampled protein abundance data, we provide some preliminary cellular segmentations. Via our <a href="http://www.github.com/uomsystemsbiology/epidermal_data" target=blank>GitHub page</a>, we also provide a collection of MATLAB scripts and segmented data files that allow quantitative analysis of these image data. The intermediate processed files are provided in formats that should allow advanced users to develop their own analyses using open source tools such as Python.<br />
JosephCursonsCatherineEAngelDanielGHurleyCristinGPrintP.RodDunbarMarcDJacobsEdmundJCrampinImaging7100168_multiscale_imaging_data.jpgFig2. Selected fluorescence image dataCC0Cursons et al.Cursons et al.6442450944ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100168/http://www.github.com/uomsystemsbiology/epidermal_data10.1186/s13742-015-0102-5Treasury, New Zealandhttp://dx.doi.org/10.13039/501100001544C08X0801Marc D JacobsAustralian Research Councilhttp://dx.doi.org/10.13039/501100000923CE140100036Edmund J CrampinTertiary Education Commissionhttp://dx.doi.org/10.13039/100007879Top Achiever Doctoral ScholarshipJoseph CursonskeywordMEK-1/2keywordcalmodulinkeywordskinkeywordinterfollicular keratinocyteskeywordimmunofluorescencekeywordconfocal microscopykeywordhomeostatic tissuekeywordcellular heterogeneityXiaominLiuBGIliuxiaomin@genomics.cnliuxiaomin@genomics.cnSupporting data for "Deep sequencing of human major histocompatibility complex region contributes to studies of complex disease". The human major histocompatibility complex (MHC) has been shown to be associated with numerous diseases. However, it remains a challenge to pinpoint causal variants of these associations due to the extreme complexity of the region. We thus sequenced the entire 5 Mb MHC region in 20,635 individuals of Han Chinese ethnicity (10,689 controls and 9,946 psoriasis patients) and constructed a Han-MHC database which included both variants and HLA gene typing results with high accuracy. We further identified multiple independent novel susceptibility loci in HLA-C, HLA-B, HLA-DPB1, BTNL2 and an intergenic variant, rs118178193 for psoriasis and confirmed the well-established susceptibility locus HLA-C*06:02. These discovered psoriasis-associated loci in the MHC region were markedly different from those described in Caucasian population in a recent analysis and highlight the importance of generating population-specific MHC databases for studies of complex disease. 
We anticipate that our Han-MHC reference panel built by deep sequencing of a large number of samples will serve as a useful tool for investigating the role of the MHC region in a variety of diseases and thus advance our understanding of the pathogenesis of these disorders.FushengZhouHongzhiCaoXianboZuoTaoZhangWenjunWangXiaominLiuRicongXuGangChenYuanweiZhangXiaodongZhengXinJinJinpingGaoJunpuMeiYujunShengQibinLiBoLiangJuanShenChangbingShenHuiJiangCaihongZhuXingFanFengpingXuMinYueXianyongYinChenYeCuicuiZhangXiaoLiuLiangYuJinghuaWuMengyunChenXuehanZhuangLiliTangHaojingShaoLongmaoWuJianLiYuXuYijieZhangSuliZhaoYuWangGeLiHanshiXuLeiZengJiananWangMingzhouBaiYanlingChenWeiChenTianKangYanyanWuXunXuZhengweiZhuYongCuiZaixingWangChunjunYangPeiguangWangLeihongXiangXiangChenAnpingZhangXinghuaGaoFurenZhangJinhuaXuMinZhengJieZhengJianzhongZhangXueqingYuYingruiLiSenYangJianjunLiuLennartHammarstromLiangdanSunJunWangXuejunZhangGenomic2no_image.jpgHLA diversity in Han ChineseCC0Liu et al 2015Liu et al 20152147483648ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100156/10.1038/ng.3576HisayoshiYoshizakiDepartment of Pathology, Kanazawa Medical University, Japanyossy@kanazawa-med.ac.jpyossy@kanazawa-med.ac.jpSupporting data and materials for "Large-scale analysis of evolutionary histories of phosphorylation motifs in the human genome". Protein phosphorylation is a post-translational modification that is essential for a wide range of eukaryotic physiological processes, such as transcription, cytoskeletal regulation, cell metabolism, and signal transduction. In this study, we provide data for the assessments of functional phosphorylation signaling using comparative proteome analysis of phosphorylation motifs.
Data for 93,101 phosphosites and 1,003,756 potential phosphosites are described; they provide an overview of evolutionary patterns of phosphomotif acquisition and indicate a dependence on motif structures. By using these data, we described interaction networks of phosphoproteins, identified kinase substrates associated with phosphoproteins, and performed gene ontology enrichment analyses. HisayoshiYoshizakiShujiroOkudaProteomic10100136_yoshizaki.jpgGO enrichment analysis summaryPublic domainH YoshizakiH Yoshizaki417333248ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100136/http://moana.dnsalias.org/~sgeib/Bcur_RNAseq_Trinity/10.1186/s13742-015-0057-6HongzhiCaoBGIcaohongzhi@genomics.cncaohongzhi@genomics.cnSupporting material for: De novo assembly of a haplotype-resolved human genome. Here we provide the first de novo haplotype-resolved diploid genome sequence of an Asian individual using a unique de novo assembly pipeline. Our pipeline uses fosmid pooling and whole genome shotgun strategies, based on next generation sequencing (NGS) technology. The assembled genome contains 5.15 Gb, with a haplotype N50 of 484 kb. This haplotype-resolved genome represents the most complete genome assembly so far. Our analysis further identified previously undetected indels and novel coding sequences, and thus provides the most complete representation of an individual’s genetic variation. <br /> We generated ~614,850 fosmid clones ranging from 20 kb to 80 kb, with a mean of 36 kb. Approximately 30 fosmid clones were pooled, and each pool had one or two DNA libraries sequenced using the HiSeq 2000. In total, 1,712 Gb of raw sequence data were generated for all the pooled fosmid libraries. Please see the <a href="http://www.nature.com/naturebiotechnology." target="_blank" >linked paper</a> for assembly pipeline details. 
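For readers unfamiliar with the haplotype N50 quoted in this description, the following is a minimal sketch of the standard N50 definition: the length of the block at which the running total, taken from longest to shortest, first reaches half of the summed length. The function name and the toy lengths are assumptions for illustration; they are not taken from the dataset or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of block lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)   # longest blocks first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This block is the one that crosses the 50% mark.
            return length


# Hypothetical block lengths in kb, chosen only to show the calculation.
toy_lengths = [80, 70, 50, 40, 30, 20]
print("N50 (kb):", n50(toy_lengths))
```

With these toy values the total is 290 kb, so the 50% mark (145 kb) is crossed by the second-longest block.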
We then analysed the newly generated haploid-resolved diploid genome (HDG) for SNPs, INDELs, inversions and translocations, of which we identified 3,580,000 SNPs, 762,000 short INDELs (<50bp) and 30,000 long INDELs, 111 inversions and 168 translocations.HailongYangHonglongWuRuibangLuo0000-0001-9711-6533YinlongXieWeihuaHuangGuangzhuHeQiangFengYuhuiSunHaodongHuangShujiaHuangXinTongLarsBolund0000-0003-4165-1531HongzhiCaoLaurieGoodman0000-0001-9724-5976KarstenKristiansen0000-0002-6024-0917AndersKrogh0000-0002-5147-6282SonggangLiBinghangLiuYingruiLiQiongLuoJianWangJunWang0000-0002-1422-3331GaneKa-ShuWong0000-0001-6108-5560XunXuHuanmingYang0000-0002-0858-3410XiuqingZhangHanchengZhengLaurentCAMTellierJianLiBoLiYuWangFangYangPengSunSiyangLiuPengGaoJingSunDanChenZhengHuangYueLixiaoLiuRadojeDrmanacSnezanaADrmanacGenomic2100013_YH.jpgHan Chinese individualPublic domainWikimedia Commons, en:Image:HanGaozu.jpgUnknown8680428757ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100096/10003810009710031810.1038/nbt.3200HuijueJiaBGI Shenzhenjiahuijue@genomics.cnjiahuijue@genomics.cnSupporting data for the paper: "An integrated catalog of reference genes in the human gut microbiome". Here we sequenced 249 fecal samples from European adults, leading to a total of 760 samples in the Metagenome of the Human Intestinal Tract (<a href="http://dx.doi.org/10.1038/nature08821">MetaHIT</a>) project. All 6.4TB whole-genome shotgun sequencing data from 1267 fecal samples in <a href="http://dx.doi.org/10.1038/nature08821">MetaHIT</a>, the Human Microbiome Project (<a href="http://www.hmpdacc.org/HMASM/">HMP</a>) and our <a href="http://dx.doi.org/10.5524/100036">diabetes study on Chinese adults</a> were processed with the <a href="http://dx.doi.org/10.1371/journal.pone.0047656">MOCAT pipeline</a>. 
The resulting gene catalogs were merged using <a href="http://dx.doi.org/10.1093/bioinformatics/btl158">CD-HIT</a> and complemented with genes from 511 sequenced human gut-related prokaryotic genomes that were present in our gut metagenomes. The final high-quality integrated reference catalog of the human gut microbiome contains 9,879,896 non-redundant genes. The genes were phylogenetically annotated according to 3449 bacterial and archaeal genomes and draft genomes from NCBI, and functionally annotated using orthologous groups from the Kyoto Encyclopedia of Genes and Genomes (<a href="http://www.genome.jp/kegg/pathway.html">KEGG</a>) and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (<a href="http://eggnog.embl.de/version_3.0/index.html">eggNOG</a>) databases. In addition, 11 samples from the Chinese cohort were re-extracted using the MetaHIT DNA extraction protocol and shotgun-sequenced to compare with the original data generated by a slightly different DNA extraction protocol.HuijueJia0000-0002-3592-126XXianghangCaiHuanziZhongQiangFengShinichiSunagawaManimozhiyanArumugam0000-0002-0886-9101JensRoatKultimaEdiPriftiTrineNielsenAgnieszkaSierakowskaJunckerChaysavanhManichanhBingChenFlorenceLevenezLiangXiaoHailongZhaoJumanaYousufAl-AamaSherifEdrisTorbenHansenHenrikBjornNielsenSorenBrunakFranciscoGuarnerOlufPedersenJoelDoréS.DuskoEhrlichMetaHIT consortiumPeerBorkWeinengChenKarstenKristiansen0000-0002-6024-0917SuishaLiangJunhuaLiJianWangJuanWangJunWang0000-0002-1422-3331XunXuHuanmingYang0000-0002-0858-3410DongyaZhangWenweiZhangZhaoxiZhangMetagenomic3100064_GutMicrobiota.jpgThe human gut is home for trillions of microbes.CC-BY-SA 3.0BGIJunda Peng, Jianqing Zhao and Lihua Xie133143986176ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100064/Integrated reference catalog of the human gut 
microbiomehttp://meta.genomics.cn/metagene/meta/home20001210031710.1038/nbt.2942http://www.ebi.ac.uk/ena/data/view/ERP000108http://www.ebi.ac.uk/ena/data/view/ERP002061http://www.ebi.ac.uk/ena/data/view/ERP003612http://www.ebi.ac.uk/ena/data/view/ERP004605http://www.ebi.ac.uk/ena/data/view/SRP002163http://www.ebi.ac.uk/ena/data/view/SRP011011http://www.ebi.ac.uk/ena/data/view/SRP008047BoWenBGIwenbo@genomics.cnwenbo@genomics.cnQuantitative proteomics data using mTRAQ/MRM looking at human AKR family members in cancer cell lines Members of the human aldo-keto reductase (AKR) superfamily have been reported to be involved in cancer progression, and to investigate their role further a quantitative method to measure human AKR proteins in cells using mTRAQ-based multiple reaction monitoring (MRM) has been developed. AKR peptides with multiple transitions were carefully selected upon tryptic digestion of the recombinant AKR proteins, while AKR proteins were identified by SDS-PAGE fractionation coupled with LC-MS/MS. Utilizing mTRAQ triplex labeling to produce the derivative peptides, calibration curves were generated using the mixed lysate as background, and no significant difference in the quantification of AKRs was found between the two sets of calibration curves generated with the mixed and single lysates as background. This approach was employed to quantitatively determine the 6 AKR proteins, AKR1A1, AKR1B1, AKR1B10, AKR1C1/C2, AKR1C3 and AKR1C4, in 7 different cancer cell lines, and for the first time to obtain the absolute quantities of all the AKR proteins in each cell. The cluster plot revealed that AKR1A and AKR1B were widely distributed in most cancer cells with relatively stable abundances, whereas AKR1Cs were unevenly detected among cells, demonstrating diverse dynamic abundances. The quantitative distribution of AKRs in different cancer cells may enable further insight into how AKR proteins are involved in tumorigenesis.
LeiYangChaoChaShaoxingXuHaidanSunXiaominLouLiangLinSiqiLiuXuemeiQiuQuanhuiWangBoWenShenyanZhangYongZhangBaojinZhouJinZiProteomic10100047_MRM.pngMRMCC0PDBPDB55834574848ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100047/10.1021/pr301153z23544749https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/PASS_View?identifier=PASS00087YingruiLiBGIliyr@genomics.cnliyr@genomics.cnDNA methylome of human peripheral blood mononuclear cells from the YH Han Chinese individual. The methylome reported and analyzed here was generated from the same sample of peripheral blood mononuclear cells (PBMCs) from a consented donor (Homo sapiens) whose genome was deciphered in the <a href="http://yh.genomics.org.cn/">YH project</a>. YH is an anonymous male Han Chinese individual who has no known genetic diseases, and whose genome also serves as an Asian reference genome.
Nuclear DNA was extracted and subjected to unbiased, whole-genome bisulfite sequencing (BS-seq) using the Illumina Genome Analyzer. In total, 103.5 Gbp of paired-end sequence data were generated. Of these, 70.4 Gbp (68%) were successfully aligned to either strand of the YH genome with an average mismatch rate of 1.3%, resulting in an average sequencing depth of 12.3-fold per DNA strand or a 24.7-fold overall depth. Of the 18,962,679 CpGs present in the unique haploid part (2.21 Gb) of the YH reference genome sequence, approximately 99.86% were covered by at least one unambiguously mapped read of quality score >14 on either strand, and 92.62% were unambiguously covered on both strands.JingdeZhuMingzhiYeJianYuHonglongWuJihuaSunHongyuZhangRuibangLuo0000-0001-9711-6533YinghuaHeXinJinGuangyuZhouJinfengSunYeboHuangXiaoyuZhouShichengGuoXinLiJiujinXuStephanBeckLarsBolund0000-0003-4165-1531HongzhiCaoMinfengChenQuanChenXuedaHuKarstenKristiansen0000-0002-6024-0917NingLiQibinLiRuiqiangLiYingruiLiGengTianJianWangJunWang0000-0002-1422-3331WenWangHuanmingYang0000-0002-0858-3410ChangYuQinghuiZhangXiuqingZhangHanchengZhengHuisongZhengEpigenomic1100013_YH.jpgHan Chinese individualpublic domain because its copyright has expiredWikimedia Commons, en:Image:HanGaozu.jpgUnknown114890375168ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100014/2012-03-281000 Genomeshttp://www.1000genomes.org/10001510.1371/journal.pbio.100053321085693http://www.ncbi.nlm.nih.gov/sra?term=SRP002339http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17972QibinLiBGIliqb@genomics.org.cnliqb@genomics.org.cnGenomic data from chronic hepatitis B infected humans and healthy controls. Chronic hepatitis B (CHB) infection remains endemic in large parts of the world and, as such, is a major global health issue. However, a thorough understanding of the genetic variants involved in CHB infection susceptibility remains lacking.
This dataset comprises the raw exome sequencing data, SNP sets and InDel sets for 50 CHB patients and 40 healthy controls. The exome sequences were captured with a NimbleGen 2.1M array targeting 34 Mb of the human genome, containing 180,000 coding exons and 551 miRNA genes. The enriched library was then sequenced on an Illumina HiSeq 2000, and sequencing reads were aligned to the human reference genome (NCBI build 36.3). The average sequencing depth per sample is 43X after the removal of read duplicates. These data provide a resource for identifying genetic variants predisposing humans to CHB infections.WeijunHuangLiangPengQiangZhaoQLiYuanyuanPeiQijunLiaoZhi-LiangGaoYimingWangJunWang0000-0002-1422-3331Genomic2100029_Hepatitis-B.jpgHuman exome – chronic hepatitis B infection predisposing variantsPublic Domain, US Government (CDC)Centers for Disease Control and Prevention's Public Health Image Library(PHIL), with identification number #5631.CDC281320357888ftp://penguin.genomics.cn/pub/10.5524/100001_101000/100029/2012-07-1210.1002/hep.2585022610944http://www.ncbi.nlm.nih.gov/sra?term=SRA048741