The table gives the age (in years) of the Best Actress and Best Actor winners at the time they won the Oscar. Click on the file to download it and move it into RStudio. We will produce the barplot for the actresses.

Age      ActressCount   ActorCount
20-29    27             1
30-39    34             26
40-49    13             35
50-59    2              13
60-69    4              6
70-79    1              1
80-89    1              0

pie(oscar$ActressCount, labels=oscar$Age)
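If the downloaded file is not at hand, the same plots can be sketched by typing the frequency table in directly. The data frame name oscar and its column names follow the command above; the plot title and axis labels are illustrative choices, not part of the original handout.

```r
# Frequency table entered by hand from the table above
oscar = data.frame(
  Age          = c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89"),
  ActressCount = c(27, 34, 13, 2, 4, 1, 1),
  ActorCount   = c(1, 26, 35, 13, 6, 1, 0)
)

# Bar plot of the actress counts, one bar per age group
barplot(oscar$ActressCount, names.arg = oscar$Age,
        main = "Age of Best Actress Winners",   # illustrative title
        xlab = "Age (years)", ylab = "Count")

# Pie chart of the same counts, labelled by age group
pie(oscar$ActressCount, labels = oscar$Age)
```

Because the counts are already tabulated, they are passed to barplot() and pie() directly; table() is only needed when starting from the raw, one-row-per-winner data.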

Note that the video assumes that you have the original data, not the frequency table.

Type of Data: Qualitative (Categorical)

DATA FILE: Each individual has been categorized into one of the levels.

GENERAL FORM OF R COMMAND:

variablename.freq = table(variablename)
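As a minimal sketch of what table() does, the small vector below stands in for raw data with one value per individual. The name blood and its values are hypothetical and serve only to illustrate the counting.

```r
# Hypothetical raw data: one categorical value per individual
blood = c("A", "O", "B", "O", "A", "O", "AB", "O")

# table() counts how many individuals fall into each level
blood.freq = table(blood)
blood.freq
```

The result has one entry per level, and the counts add up to the number of individuals.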

EXAMPLE:

Dataset: painters (Located in library MASS)

The data frame is a compilation of technical information on a few eighteenth-century classical painters. The data set belongs to the MASS package, which must be loaded into the R workspace before use.

library(MASS)

We will construct the frequency distribution of the School variable:

school.freq = table(painters$School)

school.freq

EXERCISE: Produce a barplot and pie chart for the school categorical variable.

Note that the video assumes that you have the original data, not the frequency table.

Type of Data: Quantitative (Numerical)

GENERAL FORM OF R COMMAND:

stem(dataframename$variablename)

EXAMPLE:

Dataset: faithful

There are two observation variables in the data set. The first one, called eruptions, is the duration of the geyser eruptions. The second one, called waiting, is the length of the waiting period until the next eruption. We will create a stem-and-leaf plot of the eruptions variable.
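Applied to this example, the general form would look like the sketch below. The faithful data set ships with base R, so nothing needs to be downloaded.

```r
# Stem-and-leaf plot of the eruption durations, printed in the console
stem(faithful$eruptions)
```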

Using the same faithful data set, we will create a histogram of the eruptions variable.
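A minimal sketch of the histogram with base R's hist() function follows. The title and axis label are illustrative choices; the durations in faithful are recorded in minutes.

```r
# Histogram of the eruption durations; hist() picks the class intervals itself
hist(faithful$eruptions,
     main = "Eruption Durations",              # illustrative title
     xlab = "Eruption duration (minutes)")
```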

Using the same faithful data set, we will create a boxplot of the eruptions variable.
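A minimal sketch of the boxplot with base R's boxplot() function follows. Drawing it horizontally and the labels used are illustrative choices, not requirements.

```r
# Boxplot of the eruption durations, drawn horizontally for easier reading
boxplot(faithful$eruptions, horizontal = TRUE,
        main = "Eruption Durations",           # illustrative title
        xlab = "Eruption duration (minutes)")
```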

plot(ExplanatoryVariable, ResponseVariable, main="The Title of the Plot", xlab="Definition of the Explanatory Variable", ylab="Definition of the Response Variable")

EXAMPLE:

Dataset: faithful

There are two observation variables in the data set. The first one, called eruptions, is the duration of the geyser eruptions. The second one, called waiting, is the length of the waiting period until the next eruption. We will create a scatter plot of the waiting intervals (response) versus the eruption durations (explanatory).
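Filling in the general form for this example gives the sketch below, with the explanatory variable first (horizontal axis) and the response second (vertical axis). The title and axis labels are illustrative wording.

```r
# Scatter plot: eruption duration (explanatory) vs. waiting time (response)
plot(faithful$eruptions, faithful$waiting,
     main = "Waiting Time vs. Eruption Duration",   # illustrative title
     xlab = "Eruption duration (minutes)",
     ylab = "Waiting time until next eruption (minutes)")
```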

The data consist of the year of diagnosis and the incidence rates (per 100,000 people) for the total population, males, and females. Click on the file to download it and move it into RStudio. We will produce the timeplot for the total.
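One way to draw the timeplot is sketched below as a small helper function. The column names Year and Total, the function name timeplot_total, and the data frame name incidence in the usage comment are all assumptions; adjust them to match the names in the imported file.

```r
# Sketch of a timeplot helper; assumes columns named Year and Total
timeplot_total = function(d) {
  plot(d$Year, d$Total, type = "o",     # "o" draws points joined by lines
       main = "Incidence Rate Over Time (Total)",
       xlab = "Year of diagnosis",
       ylab = "Incidence rate (per 100,000)")
}

# Usage, once the file has been imported (incidence is a hypothetical name):
# timeplot_total(incidence)
```

Joining the points with lines makes the trend across years easier to follow than a bare scatter plot.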

EXERCISE: Produce a timeplot for females and males separately and compare.

Type of Data: Quantitative (Numerical)

GENERAL FORM OF R COMMAND:

5-number Summary

summary(dataframename$variablename)

Mean

mean(dataframename$variablename)

Median

median(dataframename$variablename)

Interquartile Range

IQR(dataframename$variablename)

Standard Deviation

sd(dataframename$variablename)

EXAMPLE:

Dataset: faithful

There are two observation variables in the data set. The first one, called eruptions, is the duration of the geyser eruptions. The second one, called waiting, is the length of the waiting period until the next eruption. We will compute the summary statistics of the eruptions variable.
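Applied to the eruptions variable, the general forms above would look like the following sketch. Note that summary() reports the mean in addition to the five-number summary.

```r
# faithful ships with base R, so no download is needed
summary(faithful$eruptions)  # minimum, quartiles, median, mean, and maximum
mean(faithful$eruptions)     # mean
median(faithful$eruptions)   # median
IQR(faithful$eruptions)      # third quartile minus first quartile
sd(faithful$eruptions)       # sample standard deviation
```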

The data gives various population characteristics on Minnesota Counties from 1900 to 2010. Click on the file to download it and move it into RStudio. We will produce a motion chart for this data. To see the detailed instructions click here.

The data gives 2008 alcohol consumption per person for 182 countries. Click on the file to download it and move it into RStudio. We will produce a map of this data. To see the detailed instructions click here.

The data gives 2009 life expectancies for 197 countries. Click on the file to download it and move it into RStudio. We will produce a map/chart of this data. Our location variable is Country and our color variable is Life_Expectancy.