For help with the mechanics of your homework, please see this list of assignment tips.

Practice assignment

This is a warm-up for your future assignments. Once you have completed it, you will be set up for the class and will have a better understanding of how to set up your repository, push files to GitHub, write Markdown, and use R for simple analyses.

Paper review

Each student should review this paper in a summary/report of 450 to 650 words. The summary should include:

A brief review of the goal, findings and conclusion of the paper.

A list (or mention) of the datasets/databases and data types used in the study. For datasets, provide some details of the data matrix and metadata.

A brief review of the analytical steps in the paper, with more detail on selected parts that are relevant to the course materials. You don’t need to understand all of the analysis, but you should be able to identify the key analysis/method used to answer the question the paper set out to answer.

Comments and critiques on the analytical steps, including alternative approaches or possible improvements.

We have provided this guideline for the task. Your review should not merely answer the questions in the guideline; rather, use them to help you cover the required items listed above. Example paper reviews are also provided. Here’s another helpful resource on how to read a research paper.

Due date for the paper review: Feb 22, 2018

Delivery: Put your paper review in your individual course repository as a .md file (you don’t need to write it in .Rmd, since it will not contain any R code). You can edit .md files directly on GitHub and write your full report there, or write it in RStudio and then push it to your repository.

Analysis assignment

This assignment will assess your understanding of the seminar and lecture materials. It is split into two parts. Start early: this assignment will take time to complete and polish. Use the issues in the Discussion repo and the seminar time to ask questions. Most of the analysis workflow for the assignment is covered in the seminar materials.

Final group project

Deliverables

General principles

Identify a biological question of interest and a relevant dataset. Develop and apply a statistical approach that allows you to use the dataset to answer the question.

We assume the biological question and data fall in the general area of high-throughput, large-scale biological investigations targeted by the course. Beyond that, it is wide open: methylation, SNPs, miRNAs, CNVs, RNA-Seq, ChIP-Seq, gene networks, … it’s fair game. Avoid datasets with little or no quantitative data, i.e., those containing only sequence or discrete data.

Note that definitive answers are not necessarily expected. Rather, aim to provide a critical appraisal of the data, the analytical approach, and the results. You will have to balance the competing pressures to “get it right” and “get it done”. Shortcomings of the data, mismatches between the data (or the biological question) and the statistical model, and similar issues are inevitable. Your goal is to identify such issues and discuss them critically without becoming paralyzed. Demonstrate an understanding of the statistical concepts and methods that underpin your analytical approach.

We assume the analytical and computing task will have a substantial statistical component, probably implemented in R. So beware of taking on a major analytical or computational undertaking that is nonetheless not statistical (for example, constructing a database). Creating useful data visualizations can be absolutely vital and is arguably statistical, but your analysis should go beyond merely creating pretty pictures (but please do include some!). Key concepts, at least some of which should come up in your analysis:

Data considerations

Appropriate use of data

If your project involves using unpublished data, ensure your plans are known to the data providers (e.g., your supervisors), and think about the implications for publishing: are you, in effect, bringing the project team in as collaborators? Are you planning to publish the results of your project, and if so, who will be the co-authors? It is best to deal with these questions at the outset of the project.

Privacy of project data

The projects are not made public (other than appearing on a poster in the lobby of EOS for a few hours). The project report materials are loaded into GitHub, the secure site we use to manage the course. Apart from the members of each project group, only the course staff and instructors have access to the projects. The data used can be uploaded to the project, but this can be limited or omitted if there are special concerns about privacy, etc.; it is primarily the code and the write-up of the results that must be provided for evaluation.

Group makeup

Groups should have 4 to 6 members. We strongly encourage groups to be diverse in terms of background. In practice, this probably means the students should be registered in a mix of programs/departments. All groups and group projects must be approved by the instructors.

STAT 540 Homework Submission Instructions

GitHub

You each have a private repository in the STAT540-UBC organization account, i.e., the repo zz_lastname-firstname_STAT540_2018. We assume that:

You’ve already installed Git and (probably) a Git client.

You can use command line Git and/or your Git client and perhaps even RStudio to push, pull, etc. to/from GitHub.

All your work is nicely organized in your repository. Your repository needs to include a clear top-level README.md that contains links to your work. This README serves as the front page of your repository and helps others find your work and contributions!

IMPORTANT NOTE: use the repository assigned to you within the organization to submit all your course work (i.e., the repo zz_lastname-firstname_STAT540_2018). Do not use branches or other repositories.

Set up your private GitHub repo for homework

This refers to the repo zz_lastname-firstname_STAT540_2018.

Make a top-level directory for your assignment, e.g. Homework

We truly mean a directory or “folder”, NOT a Git branch or anything fancy like that! On your local computer, go to the directory where this Git repository lives and create the two directories (one for your assignments and one for your seminars) there.

It is also nice to include a README.md inside each of the assignment and seminar directories.
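The directory setup above can be sketched from the command line as follows. This is a minimal sketch, not a required procedure: it assumes a scratch directory standing in for your local clone, and the name "Seminars" is a hypothetical choice (only "Homework" comes from the example above).

```shell
#!/usr/bin/env bash
# Minimal sketch: create an assignment directory and a seminar directory, each with
# its own README.md, plus a top-level README.md that links to both.
# Assumptions: "Seminars" is a hypothetical name, and a scratch directory stands in
# for your local clone of the course repo.
set -euo pipefail

repo_dir="$(mktemp -d)"   # in practice: the path to your local clone
cd "$repo_dir"

for dir in Homework Seminars; do
  mkdir -p "$dir"                      # plain folders, NOT Git branches
  if [ ! -f "$dir/README.md" ]; then   # never overwrite an existing README
    printf '# %s\n' "$dir" > "$dir/README.md"
  fi
done

# Top-level README.md with relative links to each directory.
if [ ! -f README.md ]; then
  printf '# STAT 540 coursework\n\n- [Homework](Homework/)\n- [Seminars](Seminars/)\n' > README.md
fi

find . -name README.md | sort
```

A side benefit of the per-directory README.md: Git does not track empty directories, so a folder only shows up on GitHub once it contains at least one committed file.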

GitHub automatically renders all Markdown files into (pseudo-)HTML when you visit them in a browser. Whenever a directory in a repo is visited, if it contains a Markdown file called README.md, it will automatically be rendered, effectively serving as a landing or home page.

R Markdown

Write your homework in R Markdown. The file extension should be .rmd.

Recommendation: Create a skeleton of your report by starting with the Markdown file that creates the assignment itself! You can take some things away (unnecessary detail) and add others (R chunks) to morph this into your homework solution.

You’ll have these files if you are using Git(Hub) to keep a current copy of the whole course repository. Or, from the links above, click on “Raw” to get the raw Markdown and save it to a local file.

HTML

Compile your homework to Markdown (file extension should be .md).

RStudio’s “Knit HTML” button will do this. Your .md file is an intermediate file that renders nicely on GitHub.

Alternatively, use knit2html() from the knitr package in the R Console or in an R script.

To run from the shell or in a Makefile, use something like Rscript -e "knitr::knit2html('hw01_lastname-firstname.rmd')"

Notice that, by default, any figures created are placed in a figures/ subdirectory. The intermediate Markdown file links to these images and therefore needs them to display your full report. In the HTML, by contrast, the figures are base64-encoded and embedded by default, so the HTML file is self-contained.

What to put (or not put) into your Git(Hub) repository

This is rather specific to STAT 540 and may not necessarily reflect your workflow in the future and in other contexts.

Locally, you are of course encouraged to keep the input data file in some logical place within the homework assignment’s directory, but list the names of such data files in your top-level .gitignore file so that Git ignores them. We do this so that TAs don’t end up with 50 copies of the input data when they mark your work.
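Listing data files in .gitignore can be sketched like this. It is a minimal illustration under stated assumptions: the data file names are hypothetical placeholders, and a scratch directory stands in for your local clone. Each path is appended only if it is not already listed, so the script can be re-run safely.

```shell
#!/usr/bin/env bash
# Minimal sketch: keep input data locally but list it in the top-level .gitignore.
# Assumptions: the data file names are hypothetical placeholders, and a scratch
# directory stands in for your local clone.
set -euo pipefail

repo_dir="$(mktemp -d)"   # in practice: the path to your local clone
cd "$repo_dir"

mkdir -p Homework/data
: > Homework/data/hypothetical_expression.tsv   # placeholder for the real input data

touch .gitignore
# Add each data path once; re-running the script does not create duplicate lines.
for path in Homework/data/hypothetical_expression.tsv Homework/data/hypothetical_metadata.tsv; do
  grep -qxF "$path" .gitignore || printf '%s\n' "$path" >> .gitignore
done

cat .gitignore
```

Inside a real Git repository, `git check-ignore -v Homework/data/hypothetical_expression.tsv` reports which .gitignore rule matches a path. A pattern with a trailing slash, such as `Homework/data/`, would instead ignore the whole folder.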

Commit the intermediate Markdown (.md) file and the figures stored in the figures/ subdirectory.

Some purists would say intermediate and downstream products do NOT belong in the repo. After all, you can always recreate them from source, right? But here in reality, it turns out to be incredibly handy to have them in the repo.

Commit the end product HTML (.html) file.

See above comment re: version control purists vs. pragmatists.

Push closer to the submission date.

Never ever edit the Markdown or HTML “by hand”. Only edit the R Markdown source and then regenerate the downstream products from that.

How to “turn in” your homework

Make sure you have

Saved all the files associated with your solution locally.

Committed those files to your local Git repository.

Pushed the current state of your local repo to GitHub.
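The save, commit, and push steps above correspond roughly to the Git commands below. This is a sketch in a throwaway sandbox, not the course's official procedure: it assumes a local bare repository as a stand-in for your GitHub repo, and the homework file names are hypothetical placeholders. In your own clone you would only need the `git add`, `git commit`, and `git push` lines.

```shell
#!/usr/bin/env bash
# Sketch of the save -> commit -> push sequence in a throwaway sandbox.
# Assumptions (for illustration only): a local bare repository stands in for your
# GitHub repo, and the homework file names are hypothetical placeholders.
set -euo pipefail

sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/github-standin.git"   # plays the role of GitHub
git clone --quiet "$sandbox/github-standin.git" "$sandbox/clone"
cd "$sandbox/clone"
git config user.name "Demo Student"                     # local identity for the demo
git config user.email "student@example.com"

# 1. Save the files associated with your solution (placeholders here).
mkdir -p Homework/figures
echo "Homework 1 (R Markdown source)" > Homework/hw01_lastname-firstname.rmd
echo "Homework 1 (intermediate Markdown)" > Homework/hw01_lastname-firstname.md
echo "Homework 1 (rendered HTML)" > Homework/hw01_lastname-firstname.html
: > Homework/figures/plot-1.png                         # placeholder figure

# 2. Commit them to your local Git repository.
git add Homework
git commit --quiet -m "Add homework 1 solution"

# 3. Push the current branch to the remote named "origin".
git push --quiet origin HEAD

git log --oneline -1
```

`git push origin HEAD` pushes whatever branch you currently have checked out to the branch of the same name on the remote, so the sketch works regardless of whether your default branch is called master or main.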

Open an issue, link to the latest commit, tag us

Visit your private GitHub repository in a web browser

Just above the file list, look for the text “latest commit” followed by a short string of numbers and letters (the abbreviated revision SHA) and a clipboard icon.

Click on the clipboard icon to copy the revision SHA to your clipboard.

Click on “Issues”, then on “New Issue”. Name the issue “Mark homework of [your repository name]”, e.g., “Mark homework of zz_lastname-firstname_STAT540_2018”.

In the description of the issue, tag all TAs by including the text @STAT540-UBC/ta_2018, and paste the revision SHA. You can also include a link to the Markdown file.
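If you prefer the command line to the web page, the revision SHA can also be read with `git rev-parse`. The sketch below assumes a throwaway repository with one commit standing in for your local clone, and it prints an issue description in the format described above. In your own clone, run only the `git rev-parse` lines, and only after pushing: the local HEAD matches GitHub’s latest commit only once your push has gone through.

```shell
#!/usr/bin/env bash
# Sketch: get the revision SHA from the command line instead of the GitHub web page,
# and print an issue description ready to paste.
# Assumption: a throwaway repository with one commit stands in for your local clone.
set -euo pipefail

demo="$(mktemp -d)"
git init --quiet "$demo"
cd "$demo"
git config user.name "Demo Student"
git config user.email "student@example.com"
echo "# Homework" > README.md
git add README.md
git commit --quiet -m "Initial commit"

full_sha="$(git rev-parse HEAD)"            # full 40-character revision SHA
short_sha="$(git rev-parse --short HEAD)"   # abbreviated form, like the one GitHub displays

printf '@STAT540-UBC/ta_2018 please mark my homework.\nRevision SHA: %s (short: %s)\n' \
  "$full_sha" "$short_sha"
```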