Category: Gene Symbols

Converting mouse gene names to their human equivalents, and vice versa, is not always as straightforward as it seems, so I wrote a function to simplify the task. The function uses the getLDS() function from the biomaRt package to retrieve the HGNC symbol that corresponds to each MGI symbol. For example, let’s convert the mouse gene symbols Hmmr, Tlx3, and Cpeb4 to their human equivalents.
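As a minimal sketch of the idea (not necessarily the function from the original post), the getLDS() call could be wrapped as shown below. The helper name convertMouseToHuman is hypothetical, and the code needs an internet connection to query Ensembl. getLDS() queries have at times failed against the current Ensembl release; pointing useMart() at an archive host is a commonly used workaround in that case.

```r
library(biomaRt)

# Hypothetical helper: map MGI (mouse) symbols to HGNC (human) symbols
convertMouseToHuman <- function(mouseSymbols) {
  # Connect to the human and mouse Ensembl gene datasets
  human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
  # getLDS() links the two datasets through their homology information
  getLDS(attributes  = "mgi_symbol",
         filters     = "mgi_symbol",
         values      = mouseSymbols,
         mart        = mouse,
         attributesL = "hgnc_symbol",
         martL       = human,
         uniqueRows  = TRUE)
}

# Returns a data frame pairing each mouse symbol with its human symbol(s)
convertMouseToHuman(c("Hmmr", "Tlx3", "Cpeb4"))
```

Going from human to mouse should only require swapping the two marts and the two symbol attributes.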

Mining gene expression data from publicly available databases is a great way to find evidence to support your working hypothesis that gene X is relevant in condition Y. You may also want to mine publicly available data to build on an existing hypothesis or simply to find additional support for your favorite gene in a different animal model or experimental condition. In this post, we will go over how to use the GEOquery package to download a data matrix (or eset object) directly into R and append specific probe annotation information to this matrix so that it can be exported as a csv file for easy manipulation in Excel or other spreadsheet tools. This is especially useful for sharing data with collaborators who are not familiar with R and would rather look up their favorite genes in a spreadsheet.

First, let’s start by opening an R session and creating a function to return the eset (ExpressionSet) object or the original list object downloaded by the getGEO() function in R.

getGEOdataObjects <- function(x, getGSEobject=FALSE){
  # Load the GEOquery package (it must already be installed from Bioconductor)
  library("GEOquery")
  # Use the getGEO() function to download the GEO data for the id stored in x
  GSEDATA <- getGEO(x, GSEMatrix=TRUE, AnnotGPL=FALSE)
  # Inspect the object by printing a summary of the expression values for the first 2 columns
  print(summary(exprs(GSEDATA[[1]])[, 1:2]))
  # Get the eset object (the first element of the list returned by getGEO())
  eset <- GSEDATA[[1]]
  # Save the objects generated for future use in the current working directory
  save(GSEDATA, eset, file=paste0(x, ".RData"))
  # Check whether we want to return the list object we downloaded from GEO or
  # just the eset object, based on the getGSEobject argument
  if(getGSEobject) return(GSEDATA) else return(eset)
}

We can test this function on a GEO dataset such as GSE73835 as follows:

# Store the dataset ids in a vector GEO_DATASETS just in case you want to loop through several GEO ids
GEO_DATASETS <- c("GSE73835")
# Use the function we created to return the eset object
eset <- getGEOdataObjects(GEO_DATASETS[1])
# Inspect the eset object to get the annotation GPL id
eset

Let’s use the head() function to look at the first 6 rows of the data frame we just created.

As you can see, once we export this data frame as a csv file, it is much easier for others to open it as a spreadsheet and look up useful information such as the gene symbol or Entrez id alongside the expression values across the samples.
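The data frame and export described above can be sketched in base R as follows. This is only an illustration: the toy objects expr and annot stand in for what exprs(eset) and fData(eset) would return, every value is made up, and the annotation column names (Symbol, Entrez_ID) are assumptions because they differ from one GPL platform to another.

```r
# Toy stand-in for exprs(eset): probes in rows, samples in columns (made-up values)
probes <- c("probe_1", "probe_2", "probe_3")
expr <- matrix(c(7.1, 8.3, 5.2, 6.9, 8.0, 5.5),
               nrow = 3,
               dimnames = list(probes, c("GSM_A", "GSM_B")))

# Toy stand-in for fData(eset): one row of annotation per probe
# (column names are hypothetical and vary by platform)
annot <- data.frame(ID = probes,
                    Symbol = c("GeneA", "GeneB", "GeneC"),
                    Entrez_ID = c("101", "102", "103"),
                    row.names = probes,
                    stringsAsFactors = FALSE)

# Place the selected annotation columns next to the expression values,
# matching the rows by probe id
annotated <- data.frame(annot[rownames(expr), c("ID", "Symbol", "Entrez_ID")],
                        expr,
                        check.names = FALSE)

# Look at the first rows of the combined data frame
head(annotated)

# Export to a csv file that opens directly in a spreadsheet program
out_file <- file.path(tempdir(), "annotated_expression.csv")
write.csv(annotated, file = out_file, row.names = FALSE)
```

Indexing annot by rownames(expr) keeps the annotation rows in the same order as the expression matrix, which matters when the two tables list the probes in different orders.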

There are many ways to convert gene accession numbers or ids to gene symbols or other types of ids in R, and several R/Bioconductor packages facilitate this process, including the AnnotationDbi, annotate, and biomaRt packages. In this post, we are going to learn how to convert gene ids with the AnnotationDbi and org.Hs.eg.db packages. You could modify this code to work with other species, such as mouse, by using the org.Mm.eg.db package instead.

For example, say we have a gene expression matrix stored in M1 that was created from an eset object downloaded from GEO. The study I will be using for this example is "A Leukemic Stem Cell Expression Signature is Associated with Clinical Outcomes in Acute Myeloid Leukemia", deposited on GEO under the accession id GSE24006. To view the script that generates the expression set (eset) object, see the post: Retrieving Gene Expression Data Objects & Matrices From GEO.
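As a rough sketch of this kind of conversion, the snippet below swaps Entrez ids for gene symbols with mapIds(). It assumes that the row names of M1 are Entrez gene ids, which may not match the original script. The matrix here is a toy stand-in with made-up expression values rather than the GSE24006 data.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# Toy stand-in for M1: rows are Entrez gene ids, columns are samples (made-up values)
M1 <- matrix(c(5.1, 6.2, 7.3, 5.4, 6.5, 7.6),
             nrow = 3,
             dimnames = list(c("7157", "672", "1956"),
                             c("sample_1", "sample_2")))

# Look up the gene symbol for each Entrez id; ids without a match come back as NA
symbols <- mapIds(org.Hs.eg.db,
                  keys = rownames(M1),
                  column = "SYMBOL",
                  keytype = "ENTREZID",
                  multiVals = "first")

# Keep only the rows that have a symbol and use the symbols as row names
keep <- !is.na(symbols)
M1symb <- M1[keep, , drop = FALSE]
rownames(M1symb) <- unname(symbols[keep])
M1symb
```

Setting multiVals="first" returns a single value per key, so the result lines up one-to-one with the rows of the matrix.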

We can generalize this function to go back and forth between gene symbols and entrez ids (or other ids) as follows:
# This function accepts any of the columns(org.Hs.eg.db) values as type and keys,
# as long as the row names of df are in the format named by the keys argument
getMatrixWithSelectedIds <- function(df, type, keys){
  library("AnnotationDbi")
  library("org.Hs.eg.db")
  # Map each row name (of id type keys) to the requested id type, keeping the first match
  ids <- mapIds(org.Hs.eg.db, keys=rownames(df), column=type, keytype=keys, multiVals="first")
  # Keep only the row names that mapped to an id, i.e. drop the NA's
  found_ids <- ids[!is.na(ids)]
  # Subset the data based on the row names that were found
  # (drop=FALSE keeps the result two-dimensional even if only one row matches)
  df2 <- df[names(found_ids), , drop=FALSE]
  rownames(df2) <- found_ids
  return(df2)
}
# For example, going from SYMBOL to ENTREZID
M1entrez <- getMatrixWithSelectedIds(M1symb, type="ENTREZID", keys="SYMBOL")

Stay tuned for more posts on Converting Gene Names in R with the annotate and biomaRt packages.