BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology

BioDataome is a database of uniformly preprocessed and disease-annotated genomic and epigenomic data that aims to promote and accelerate the reuse of public data. We applied the same preprocessing pipeline to every dataset within each biological mart (microarray gene expression, RNA-Seq gene expression, DNA methylation) to produce datasets that are ready for downstream analysis, and automatically annotated them with Disease Ontology terms. We also flag datasets that share common samples and automatically identify control samples in case-control studies. Currently, BioDataome includes ~5,600 datasets and ~260,000 samples spanning ~500 diseases, and can be readily used in large-scale experiments and meta-analyses. All datasets are publicly available for querying and downloading.
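
As an illustration of what flagging datasets that share common samples can involve, the following is a minimal sketch in Python. It is not BioDataome's implementation: the function name `shared_sample_pairs`, the dataset identifiers, and the sample identifiers are all hypothetical, and it assumes that each dataset is available as a set of sample IDs that are comparable across datasets.

```python
from itertools import combinations


def shared_sample_pairs(datasets):
    """Return {(id_a, id_b): sorted shared sample IDs} for each overlapping pair.

    `datasets` maps a dataset ID to an iterable of sample IDs.
    Hypothetical helper for illustration; not part of BioDataome.
    """
    overlaps = {}
    # Sort by dataset ID so that pair keys are reported in a stable order.
    for (id_a, samples_a), (id_b, samples_b) in combinations(sorted(datasets.items()), 2):
        common = set(samples_a) & set(samples_b)
        if common:
            overlaps[(id_a, id_b)] = sorted(common)
    return overlaps


# Hypothetical identifiers, for illustration only.
example = {
    "DS_A": {"S1", "S2", "S3"},
    "DS_B": {"S3", "S4"},
    "DS_C": {"S5"},
}
print(shared_sample_pairs(example))  # {('DS_A', 'DS_B'): ['S3']}
```

Knowing which datasets overlap matters for meta-analysis, because a sample counted in two datasets violates the independence that pooled statistics usually assume. The pairwise comparison above is quadratic in the number of datasets; at the scale of thousands of datasets, an inverted index from sample ID to dataset IDs would likely be the more practical design.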