General Information

R is available on all of the Oceanographic Campus Library's computer stations.

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed.

Online Tutorials and Training

Join author Barton Poulson as he introduces the R statistical processing language, including how to install R on your computer, read data from SPSS and spreadsheets, and use packages for advanced R functions.

The course continues with examples on how to create charts and plots, check statistical assumptions and the reliability of your data, look for data outliers, and use other data analysis tools. Finally, learn how to get charts and tables out of R and share your results with presentations and web pages.

R is the language of big data—a statistical programming language that helps describe, mine, and test relationships between large amounts of data. Author Barton Poulson shows how to use R to model statistical relationships using graphs, calculations, tests, and other analysis tools. Learn how to enter and modify data; create charts, scatter plots, and histograms; examine outliers; calculate correlations; and compute regressions, bivariate associations, and statistics for three or more variables. Challenge exercises with step-by-step solutions allow you to test your skills as you progress.

Successful programmers know more than a computer language. They also know how to think about solving problems. They use "computational thinking": breaking a problem down into segments that lend themselves to technical solutions. Code Clinic is a series of ten courses where authors solve the same problems using different programming languages. Here, Mark Niemann-Ross works with R.

Mark introduces challenges and then provides an overview of his solutions in R. Challenges include topics such as statistical analysis, searching directories for images, and accessing peripheral devices.

Visit other courses in the series to see how to solve the exact same challenges in languages like C++, C#, JavaScript, PHP, Python, Ruby, and Swift.

Tidy data is a data format that provides a standardized way of organizing data values within a dataset. By leveraging tidy data principles, statisticians, analysts, and data scientists can spend less time cleaning data and more time tackling the more compelling aspects of data analysis. In this course, learn about the principles of tidy data, and discover how to create and manipulate data tibbles—transforming them from source data into tidy formats. Instructor Mike Chapple uses the R programming language and the tidyverse packages to teach the concept of data wrangling—the data cleaning and data transformation tasks that consume a substantial portion of analysts' time. He wraps up with three hands-on case studies that help to reinforce the data wrangling principles and tactics covered in this course.
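As a rough illustration of the tidy-data idea described above (one row per observation, one column per variable), the sketch below reshapes a small "wide" table into a "long" one. It uses plain Python rather than R and the tidyverse so that it runs without extra packages. The station names, years, values, and the helper name to_tidy are invented for illustration and do not come from the course.

```python
# Hypothetical "wide" table: one row per station, one column per year.
# All names and numbers here are made up for illustration.
wide = [
    {"station": "A", "2019": 14.2, "2020": 14.8},
    {"station": "B", "2019": 11.5, "2020": 11.9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append(
                {id_key: row[id_key], var_name: key, value_name: value}
            )
    return tidy_rows


tidy = to_tidy(wide, id_key="station", var_name="year", value_name="temp_c")
for record in tidy:
    print(record)
```

In the tidy result each row holds a single measurement, so filtering, grouping, and plotting by year or by station no longer depend on how the columns happened to be laid out in the source file.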

Business decisions are often binary: take on this project or put it off for a year; extend credit to this customer or insist on cash; open a new retail outlet in a particular location or find another spot. When an outcome is a continuous variable such as revenue, ordinary regression is often a good technique, but when there are only two outcomes, logistic regression usually offers better tools.

Learn how to use R and Excel to analyze data in this course with Conrad Carlberg. He takes you through advanced logistic regression, starting with odds and logarithms and then moving on to the binomial distribution and converting predicted odds back to probabilities. After this foundation is established, he shifts the focus to inferential statistics, likelihood ratios, and multinomial regression. Conrad's comprehensive coverage of how to perform logistic regression includes tackling common problems, explaining relationships, reviewing outcomes, and interpreting results.
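The relationship between probabilities, odds, and log odds that underlies logistic regression can be sketched in a few lines. This is a minimal illustration in plain Python, not the course's own R or Excel material; the function names and the example probability of 0.75 are assumptions chosen for the example.

```python
import math


def probability_to_odds(p):
    """Odds in favor of an outcome that has probability p."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def log_odds_to_probability(log_odds):
    """Logistic function: maps any real-valued log odds into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


p = 0.75                               # hypothetical predicted probability
odds = probability_to_odds(p)          # 3 to 1 in favor
log_odds = math.log(odds)              # the scale a logistic model is linear on
recovered = log_odds_to_probability(log_odds)

print(odds, log_odds, recovered)
```

A logistic regression model produces predictions on the log-odds scale, which is why the last conversion is needed before the results can be read as probabilities.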

Data scientists who use Excel realize that R is emerging as the new standard for statistical wrangling (especially for larger data sets). This course serves as the perfect bridge for the many Excel-reliant data analysts and business users who need to update their data science skills by learning R.

Much of the course focuses on how crucial statistical tasks and operations are done in R—often with the DescTools package—as contrasted with Excel's functions and Data Analysis add-in, and then scales up from there, showing R's more powerful features. Conrad Carlberg will help you effectively toggle between both programs, moving data back and forth so you can get the best of both worlds. Start by learning how to install R and the DescTools package, and the data files used in all the hands-on exercises. Then learn about calculating descriptive statistics on numeric and nominal variables, and running bivariate analyses in both Excel and R. In the "Next steps" video, Conrad breaks down the pros and cons of Excel vs. R and provides tips for learning more about statistics in each application.

We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals, all while analyzing data with R. We provide R programming examples in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. By using R scripts to analyze data, you will learn the basics of conducting reproducible research.
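To make the two quantities named above concrete, here is a small sketch that computes a confidence interval for a mean and a two-sided p-value. It is written in plain Python rather than R, and it uses a normal approximation (a t distribution would be the more usual choice for a sample this small). The sample values, the null mean of 5.0, and the function names are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    center = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return center - z * standard_error, center + z * standard_error


def two_sided_p_value(sample, null_mean):
    """Approximate two-sided p-value for H0: population mean == null_mean."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = (mean(sample) - null_mean) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical measurements, made up for this example.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 4.7, 5.9, 5.3, 5.4, 5.0, 5.7]

low, high = mean_confidence_interval(sample)
p_value = two_sided_p_value(sample, null_mean=5.0)
print(low, high, p_value)
```

The interval gives a range of plausible values for the population mean, while the p-value measures how surprising the observed sample mean would be if the null value were true.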

Given the diversity in educational background of our students, we have divided the series into seven parts. You can take the entire series or individual courses that interest you. If you are a statistician, you should consider skipping the first two or three courses; similarly, if you are a biologist, you should consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

This course is part of the Microsoft Professional Program Certificate in Data Science.

R is rapidly becoming the leading language in data science and statistics. Today, R is the tool of choice for data science professionals in every industry and field. Whether you are a full-time number cruncher or just an occasional data analyst, R will suit your needs.

This introduction to R programming course will help you master the basics of R. In seven sections, you will cover its basic syntax, making you ready to undertake your own first data analysis using R. Starting from variables and basic operations, you will eventually learn how to handle data structures such as vectors, matrices, data frames and lists. In the final section, you will dive deeper into the graphical capabilities of R, and create your own stunning data visualizations. No prior knowledge in programming or data science is required.

What makes this course unique is that you will continuously practice your newly acquired skills through interactive in-browser coding challenges using the DataCamp platform. Instead of passively watching videos, you will solve real data problems while receiving instant and personalized feedback that guides you to the correct solution.

This course is part of the Microsoft Professional Program Certificate in Data Science.

In this computer science course from Microsoft, developed in collaboration with the Technical University of Denmark (DTU), get the knowledge and skills you need to use R, the statistical programming language for data scientists, in the field of your choice.

In this course you will learn all you need to get up to speed with programming in R. Explore R data structures and syntax, see how to read and write data from a local file to a cloud-hosted database, work with data, get summaries, and transform the data to fit your needs. Plus, find out how to perform predictive analytics using R and how to create visualizations using the popular ggplot2 package.

The demand for skilled data science practitioners in industry, academia, and government is rapidly growing. The HarvardX Data Science Series prepares you with the necessary knowledge base and skills to tackle real-world data analysis challenges. The series covers concepts such as probability, inference, regression and machine learning and helps you develop a skill set that includes R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux, version control with git and GitHub, and reproducible document preparation with RStudio. In the R Basics course, we learn the basic building blocks of R. As done in all our courses, we use motivating case studies, we ask specific questions, and learn by answering these through data analysis. Our assessments use code checking technology that will permit you to get hands-on practice during the courses.

Throughout the series, we will be using the R software environment. You will learn R, statistical concepts, and data analysis techniques simultaneously. In this course, we will introduce the necessary basic R syntax to get you going. However, rather than cover every R skill you need, we introduce just enough so you can continue learning in the next courses, which will provide more in-depth coverage. We believe that you can better retain R knowledge when you learn it to solve a specific problem. The motivating question in this course relates to crime in the United States, and we provide a relevant dataset. You will learn some basic R skills that permit us to answer specific questions about differences across the states.

HarvardX has partnered with DataCamp for all assignments. This allows students to program directly in a browser-based interface. You will not need to download any special software, but an up-to-date browser is recommended.

Improvements in modern biology have led to a rapid increase in sensitivity and measurability in experiments and have reached the point where it is often impossible for a scientist alone to sort through the large volume of data that is collected from just one experiment.

For example, individual data points collected from one gene expression study can easily number in the hundreds of thousands. These types of data sets are often referred to as ‘biological big data’ and require bioinformaticians to use statistical tools to gain meaningful information from them.

In this course, part of the Bioinformatics MicroMasters program, you will learn about the R language and environment and how to use it to perform statistical analyses on biological big datasets.

Do you want to learn how to harvest health science data from the Internet? Or learn to understand the world through data analysis? Start by learning R Statistics!

Skilled professionals who can process and analyze data are in great demand today. In this course you will explore concepts in statistics to make sense out of data. You will learn the practical skills necessary to find, import, analyze and visualize data. We will take a look under the hood of statistics and equip you with broad tools for understanding statistical inference and statistical methods. You will also perform some really complicated calculations and visualizations, following in the footsteps of Karolinska Institute’s researchers.

Statistical programming is an essential skill in our golden age of data abundance. Health science has become a field of big data, just like so many other fields of study. New techniques make it possible and affordable to generate massive data sets in biology. Researchers and clinicians can measure the activity of each of a patient's 30,000 genes. They can read the complete genome sequence of a patient. Thanks to another trend of the decade, open access publishing, the results of such large-scale health science are very often published for you to read free of charge. You can even access the raw data from open databases such as the gene expression database of the NCBI, the National Center for Biotechnology Information.

We will dive into this data together. Learn how to use R, a powerful open source statistical programming language, and see why it has become the tool of choice in many industries in this introductory R statistics course.

A complete set of video tutorials for beginners learning the statistical software R. The videos cover getting started, importing data, descriptive statistics, and bivariate hypothesis tests, both parametric and non-parametric. These videos are intended to pair well with an introductory statistics course.

This R Tutorial Videos playlist helps you understand the fundamentals of R programming through detailed examples. It takes you through R programming, data manipulation, exploratory data analysis, data visualization, data mining, regression, sentiment analysis, and using RStudio.