In this R tutorial, we will use the mpg (fuel economy) dataset that ships with ggplot2 and visualize it with a variety of scatterplots and histograms.

Scatterplots will be used to plot cyl vs. hwy and cyl vs. cty. Once these are created, we can see at a glance which vehicles get the best city and highway mpg among 4, 6 and 8 cylinder vehicles.

Histograms will be used to compare the different types of drives (drv). This data will be broken up into subsets, and the classes will be identified for each of the three drive types. In addition, there will be two more histograms broken into subsets: cyl vs. drv and cyl vs. class.

Install and Load Packages

Before we load the data, we will need to load the appropriate libraries for this R tutorial.
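The plots in this tutorial are built with ggplot(), and the mpg dataset is bundled with the same package. A minimal setup sketch, assuming ggplot2 is the only package the ggplot() calls below need (the install line is commented out so it only runs when you want it to):

```r
# install.packages("ggplot2")  # run once if the package is not installed yet
library(ggplot2)

# mpg is bundled with ggplot2; inspect its structure before plotting
str(mpg)
head(mpg)
```

The columns used throughout are cyl, cty, hwy, drv and class.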

Scatterplot

A scatterplot compares two variables: each point is one observation, positioned by its x-axis and y-axis values. Below is a series of scatterplots comparing cyl against cty and hwy. Each plot also includes a regression line that shows the decline in mpg from the 4 cylinder vehicles to the 8 cylinder vehicles.

As the plots below show, the vehicle classes that do well in the city also do well on the highway. The class that does best is the subcompact class, which does exceptionally well in both city and highway mpg.
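To put numbers next to that visual impression, the average mpg per class can be computed directly. This is a rough sketch using base R's aggregate(); the name class_means is only illustrative, and averages are one of several reasonable ways to summarize a class.

```r
library(ggplot2)  # provides the mpg dataset

# Mean city and highway mpg for each vehicle class
class_means <- aggregate(cbind(cty, hwy) ~ class, data = mpg, FUN = mean)

# Sort so the classes with the highest city mpg come first
class_means[order(-class_means$cty), ]
```

Comparing the cty and hwy columns row by row shows whether a class that ranks high in one also ranks high in the other.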

Can you see how both city and highway mpg drop as you move from 4 to 6 to 8 cylinders?

Scatter plot for cyl vs cty with mapping class as a color aesthetic with regression line

    mpg_stat <- ggplot(mpg, aes(x = cyl, y = cty))
    mpg_stat + geom_point(aes(color = class)) +
      stat_smooth(method = "lm")

Scatterplot for cyl vs hwy with mapping class as a color aesthetic with regression line

    mpg_stat <- ggplot(mpg, aes(x = cyl, y = hwy))
    mpg_stat + geom_point(aes(color = class)) +
      stat_smooth(method = "lm")
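The regression lines drawn by stat_smooth(method = "lm") correspond to simple linear models of mpg on cyl. As a sketch, the same models can be fitted with lm() to read off the slope; the names fit_cty and fit_hwy are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Same straight-line models that stat_smooth(method = "lm") draws
fit_cty <- lm(cty ~ cyl, data = mpg)
fit_hwy <- lm(hwy ~ cyl, data = mpg)

# The cyl coefficient is the estimated change in mpg per additional cylinder
coef(fit_cty)
coef(fit_hwy)
```

A negative cyl coefficient in both models matches the downward slope seen in the plots.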

Additional Scatterplots for MPG

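The above scatterplots give a great view of how mpg decreases with larger cylinder counts. However, because cyl takes only a few distinct values, many points are plotted in the same location, and it’s difficult to see the distribution. Jittering adds a small amount of random noise to each point so that overlapping points separate.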

Scatterplot Cyl vs. Hwy

    ggplot(mpg, aes(cyl, hwy)) +
      geom_jitter()

Scatterplot Cyl vs. Cty

    ggplot(mpg, aes(cyl, cty)) +
      geom_jitter()
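By default geom_jitter() moves points in both directions, which slightly changes the mpg values shown. One possible variation, sketched here with arbitrarily chosen width and alpha values, jitters only along the cyl axis and fixes the random seed so the plot is reproducible:

```r
library(ggplot2)

set.seed(1)  # jitter is random; fixing the seed makes the plot reproducible

p_jitter <- ggplot(mpg, aes(cyl, hwy)) +
  # spread points horizontally only, so the hwy positions stay exact
  geom_jitter(width = 0.25, height = 0, alpha = 0.5)
p_jitter
```

The partial transparency (alpha) makes any remaining overlap show up as darker spots.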

The plot below is a simple way to compare highway and city mpg, with the number of cylinders mapped to color.

Notice how the 4 cyl is darker, and 8 cyl is lighter?

    mpg_gg <- ggplot(mpg)
    mpg_gg + geom_point(aes(x = hwy, y = cty, color = cyl))
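The dark-to-light shading appears because cyl is stored as a number, so ggplot2 maps it to a continuous color gradient. As an optional variation (not part of the original plot), wrapping cyl in factor() gives each cylinder count its own discrete color:

```r
library(ggplot2)

p_cyl <- ggplot(mpg) +
  # factor(cyl) switches the legend from a gradient to one color per cylinder count
  geom_point(aes(x = hwy, y = cty, color = factor(cyl))) +
  labs(color = "cyl")
p_cyl
```

This version also makes the few 5 cylinder vehicles in the dataset easier to spot.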

Scatterplot Matrix with Histogram

This one is a bit more complicated: a scatterplot matrix with histograms for cyl, cty and hwy.
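The code for this plot is not shown here, so the following is only one possible way to build such a matrix, not necessarily the approach originally used. It is a sketch in base R graphics: pairs() draws the scatterplots, and a custom diagonal panel (named panel_hist here, adapted from the example in the pairs() help page) draws a histogram of each variable.

```r
library(ggplot2)  # only needed for the mpg dataset

# Diagonal panel: histogram of one variable, scaled to fit inside the panel
panel_hist <- function(x, ...) {
  usr <- par("usr")
  par(usr = c(usr[1:2], 0, 1.5))      # keep the x range, rescale y to 0..1.5
  h <- hist(x, plot = FALSE)          # compute the bins without plotting
  n_breaks <- length(h$breaks)
  heights <- h$counts / max(h$counts) # tallest bar has height 1
  rect(h$breaks[-n_breaks], 0, h$breaks[-1], heights, col = "grey")
}

# Keep only the three numeric columns of interest
mpg_subset <- as.data.frame(mpg[, c("cyl", "cty", "hwy")])

pairs(mpg_subset, diag.panel = panel_hist)
```

The off-diagonal panels show every pairwise scatterplot, while the diagonal shows how each variable is distributed.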

Scatterplot + Facet Grid

The scatterplot below compares cty and hwy mpg, split into a grid of panels by cyl and class.

    scatterplot_class <- ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()
    scatterplot_class + facet_grid(cyl ~ class)

Histogram with Drives

Below is a histogram of drive types (drv). Further down, the data is broken up into subsets by class so that we can identify the classes with the highest number of four-wheel, front-wheel and rear-wheel drive vehicles.

    mpg_gg + geom_bar(aes(x = drv, fill = factor(drv)), position = "dodge") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

MPG Class and Cylinder Table

    drv_cyl <- table(mpg$class, mpg$cyl)
    drv_cyl

                   4  5  6  8
      2seater      0  0  0  5
      compact     32  2 13  0
      midsize     16  0 23  2
      minivan      1  0 10  0
      pickup       3  0 10 20
      subcompact  21  2  7  5
      suv          8  0 16 38
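Raw counts are hard to compare across classes of different sizes. As a small sketch, prop.table() converts the same table into within-class shares; the name class_cyl is illustrative.

```r
library(ggplot2)  # provides the mpg dataset

class_cyl <- table(mpg$class, mpg$cyl)

# Share of each cylinder count within a class (each row sums to 1)
round(prop.table(class_cyl, margin = 1), 2)
```

Using margin = 1 normalizes by row (class); margin = 2 would instead give the share of each class within a cylinder count.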

Subsets

The histograms below compare vehicle counts across cyl, class and drv. Counting the drive histogram above, there are 3 histograms in total for a complete visual of each.

Subset of Class with a factor of drive with binwidth

This histogram identifies the classes with the highest number of four-wheel, front-wheel and rear-wheel drive vehicles. Each panel shows a shaded total for each drive type in the class, together with the counts of 4, 6 and 8 cylinder vehicles within that class.

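    ggplot(mpg, aes(x = cyl, fill = drv)) +
      # bins this wide lump all cylinder values together: shaded total per drive type
      geom_histogram(binwidth = 20, alpha = 0.5, position = "identity") +
      facet_wrap(~class) +
      # default binning: separate bars for each cylinder count within the class
      stat_bin(na.rm = FALSE)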

Subset of class with a factor of drive

The plot below gives the number of vehicles of each drv type (4, f and r) within each class.

    mpg_gg + geom_bar(aes(x = drv, fill = factor(drv)), position = "dodge") +
      facet_wrap(~class, scales = "free_y") +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

    drv_class <- table(mpg$class, mpg$drv)
    drv_class

                   4  f  r
      2seater      0  0  5
      compact     12 35  0
      midsize      3 38  0
      minivan      0 11  0
      pickup      33  0  0
      subcompact   4 22  9
      suv         51  0 11
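Rather than scanning the table by eye, the class with the largest count for each drive type can be looked up programmatically. A minimal sketch, where top_class is an illustrative name:

```r
library(ggplot2)  # provides the mpg dataset

drv_class <- table(mpg$class, mpg$drv)

# For each drive type (column), find the row index of the largest count,
# then translate that index into the class name
top_class <- rownames(drv_class)[apply(drv_class, 2, which.max)]
names(top_class) <- colnames(drv_class)
top_class
```

The result is a named vector with one entry per drive type (4, f and r), which is a handy cross-check when reading the table.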

After reviewing the data and using graphical analysis, the subcompact class would be my top vehicle class choice, because it does well both in the city and on the highway. Also, as you can see from the table above, the midsize class has the most front-wheel drive vehicles (38). The suv class has the most four-wheel drive vehicles (51), ahead of the pickup class (33), in which every vehicle is four-wheel drive. The suv class also has the most rear-wheel drive vehicles (11).