```{r setup, echo=FALSE}
library(LearnBioconductor)
stopifnot(BiocInstaller::biocVersion() == "3.0")
```
```{r style, echo = FALSE, results = 'asis'}
BiocStyle::markdown()
```
# Organizing Code in Functions, Files ,and Packages
Martin Morgan
October 29, 2014
## Programming
_R_ is a programming language
- Your own functions
```{r fun}
fun 0) {
"hi!"
} else {
"low!"
}
})
```
## From script to package
Scripts contain a combination of data and transformations. Often the
transformations are idiosyncratic, and rely heavily on functions
provided by various packages. Sometimes the script contains a useful
chunk of code that could be reused in different places. Examples we've
encountered in this course might include GC-content of DNA sequences
(or is there a function for that already? check out `r
Biocpkg("Biostrings")`!) and creating a simple 'map' from one type of
annotation to another.
It is easy and beneficial to create a package.
- Only need to think carefully about how to implement the function once.
- Code in the package is reused -- if you correct an error, then all
your scripts automatically benefit.
- Share with your lab mates / colleagues for consistent results, and
with the wider community for fame and glory.
What is an _R_ package?
- Simple directory structure of required files. Minimal:
MyPackage
|-- DESCRIPTION (title, author, version, etc.)
|-- NAMESPACE (imports / exports)
|-- R (R function definitions)
|-- gc.R
|-- annoHelper.R
|-- man (help pages)
|-- MyPackage.Rd
|-- gc.Rd
|-- annoHelper.Rd
Check out the _Rstudio_ package wizard!
## Lab
### GC-content
Write a function that takes a `DNAStringSet` and returns the GC content.
Modify the function using a conditional statement to work whether
provided a `DNAString` or a `DNAStringSet`. Test the function.
Save the function in a file on your AMI.
### Annotation-helper
Write a function that takes as its argument Ensembl gene identifiers
(like the `rownames()` of the _SummarizedExperiment_ object in the
RNASeq vignette yesterday) and uses the `select()` method and
annotation package `r Biocannopkg("org.Hs.eg.db")` to return a named
character vector, where the names of the vector are the Ensembl
identifiers and the values are the corresponding gene SYMBOLs. Adopt
some simple-to-implement policy for handling Ensembl identifiers that
map to more than one gene symbol. Save this function to another file
### A package
Use the _RStudio_ wizard to create a package from the files containing
your GC-content and annotation-helper functions.