R: Basic Data Analysis – Part 1

This first R tutorial introduces you to the world of data science and analytics using R. It is built around how-to steps that you can follow and work through yourself.

File for this discussion: WHO. Please open the xls file and save it as a csv file.

How to open a CSV file in R?

Set the working directory to the directory containing the CSV file, say /directory/filename.csv.

setwd("/directory")

A data frame is an in-memory representation of the data, where each column represents a variable and each row represents a single observation in the data file.

We will open the csv file and store it in a data frame.
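As a minimal sketch of this idea, the snippet below builds a small data frame by hand. The column names and values are made up for illustration and are not taken from WHO.csv.

```r
# A hand-made data frame: 3 observations (rows) of 2 variables (columns).
# Names and values are illustrative only.
df <- data.frame(
  Country = c("A", "B", "C"),
  Population = c(1200, 45000, 310)
)

nrow(df)   # number of observations
ncol(df)   # number of variables
str(df)    # type and first few values of each column
```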

DataFrame = read.csv("filename.csv")

In this post we will talk about a sample data set from the World Health Organization (WHO.csv).

WHO = read.csv("WHO.csv")
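If you do not have WHO.csv at hand, the following sketch shows the same steps with a temporary file. The file contents and the name demo_df are assumptions made for illustration only.

```r
# Write a tiny illustrative data set to a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Country,Population", "A,1200", "B,45000", "C,"), tmp)

# read.csv treats the first line as column names by default and
# reads the empty numeric field in the last row as NA.
demo_df <- read.csv(tmp)

head(demo_df)   # first rows of the data frame
dim(demo_df)    # number of rows and columns
```

Calling head and dim right after loading is a quick way to confirm that the file was read the way you expected.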

How to do basic analysis of data

Once WHO.csv is loaded into the WHO data frame, we should inspect the data frame before doing any further analysis.

The summary command is very useful for digging a little deeper into each variable. It gives us the minimum, maximum, and mean, as well as the 1st quartile, the median (2nd quartile), and the 3rd quartile. Comparing the mean with the median gives a rough indication of whether the variable is symmetric or skewed, although it cannot by itself establish that the variable follows a normal distribution. The mean is usually a good representative of central tendency only when the distribution is roughly symmetric, as a normal distribution is. summary also tells us how many missing values (NA's) the variable has.
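A small sketch of what summary reports, using a made-up numeric vector (not WHO data) that is skewed to the right and contains one missing value:

```r
# Illustrative values only: one large value and one missing value.
x <- c(2, 3, 3, 4, 5, 6, 40, NA)

# Reports Min., 1st Qu., Median, Mean, 3rd Qu., Max. and the NA count.
s <- summary(x)
print(s)

# A mean well above the median hints at right skew.
mean(x, na.rm = TRUE) > median(x, na.rm = TRUE)
```

Calling summary(WHO) on the whole data frame should give the same kind of report for every column at once.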

Another very useful command is subset. We often want to divide our data set on one or more criteria, and subset does exactly that for us. Let us say that in our WHO data frame we want to get the data for countries whose population is greater than 30,000.

WHO_higher_population = subset(WHO, Population > 30000)
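The sketch below shows the same pattern on a small made-up data frame, so it runs without WHO.csv. The Country column and all values are assumptions for illustration.

```r
# Illustrative data only; the last row has a missing Population.
who_demo <- data.frame(
  Country = c("A", "B", "C", "D"),
  Population = c(1200, 45000, 310, NA)
)

# Keep rows where the condition is TRUE; rows where it is NA are dropped.
big <- subset(who_demo, Population > 30000)

# Conditions can be combined with &, and select picks the columns to keep.
mid_names <- subset(who_demo, Population > 500 & Population < 30000,
                    select = Country)

print(big)
print(mid_names)
```

Note that subset silently leaves out rows whose condition evaluates to NA, which is usually what you want when filtering on a column with missing values.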

Let's say we want to inspect the variable Population further and find its standard deviation and the row number of the record with the maximum value.

sd(WHO$Population)
which.max(WHO$Population)
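One thing to watch for is missing values: sd returns NA if any value is missing, unless na.rm = TRUE is given. A minimal sketch with made-up numbers (not WHO data):

```r
# Illustrative values only; the fourth value is missing.
pop <- c(1200, 45000, 310, NA, 8800)

sd(pop)                 # NA, because one value is missing
sd(pop, na.rm = TRUE)   # standard deviation of the non-missing values

i <- which.max(pop)     # position of the first maximum; NAs are ignored
pop[i]                  # the maximum value itself
```

With a data frame, the same index can be used to pull out the whole record, for example WHO[which.max(WHO$Population), ].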

We can do basic plotting to inspect the nature of the data and the relationships between variables.
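As a rough sketch of what such plots might look like, the snippet below uses base R graphics on simulated data. The variables x and y are made up for illustration and do not come from WHO.csv.

```r
# Simulated data for illustration only.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 * x + rnorm(50)

# Shape of a single variable.
hist(x, main = "Distribution of x", xlab = "x")

# Spread and possible outliers of a single variable.
boxplot(x, main = "Spread of x")

# Relationship between two variables.
plot(x, y, main = "y against x", xlab = "x", ylab = "y")
```

With the WHO data frame, the same functions would take its columns as arguments, for example hist(WHO$Population).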