In this R tutorial, we will analyze and visualize the Halloween Candy Power Ranking dataset using ggplot(). The data comes from an online survey with over 260,000 votes and is available on Kaggle.com as The Ultimate Halloween Candy Power Ranking. We will inspect the data with functions such as head(), str(), and summary().

The dataset has many variables for comparing types of candy, such as chocolate, fruity, caramel, and hard. We will also plot the sugar percentage against the price percentage and look at how the candies compare on win percentage.

Install and Load Packages

Below are the packages and libraries that we will need to load to complete this tutorial:
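A minimal sketch of the loading step, assuming the packages implied by the functions used in this tutorial: ggplot2 for ggplot(), dplyr for %>% and select(), and ggalt for encircling points. gridExtra is an assumption based on the Grid Arrange section title.

```r
# Run once if any package is missing (left commented so the script has no side effects):
# install.packages(c("ggplot2", "dplyr", "ggalt", "gridExtra"))

library(ggplot2)    # ggplot(), geom_point(), geom_bar(), geom_text()
library(dplyr)      # %>% and select()
library(ggalt)      # encircling points in a scatterplot
library(gridExtra)  # assumed: arranging several plots in a grid
```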

View the Kaggle Halloween Candy Power Ranking Dataset

Now that our libraries are installed, let’s pull in the data and take a look at a summary of the Halloween Candy Power Ranking dataset. We will view the data with the functions head(), str(), and summary().
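A minimal sketch of the loading step. The file name candy-data.csv is an assumption (use whatever name your Kaggle download has). So that the snippet runs on its own, it first writes a tiny stand-in file with made-up values; with the real download you would skip that part and point candy_path at your file.

```r
# Stand-in data with made-up values so this sketch runs without the download.
# Column names follow those used in this tutorial.
stand_in <- data.frame(
  competitorname = c("Candy A", "Candy B", "Candy C"),
  chocolate      = c(1, 0, 1),
  caramel        = c(1, 0, 0),
  sugarpercent   = c(0.70, 0.20, 0.55),
  pricepercent   = c(0.85, 0.10, 0.60),
  winpercent     = c(65.0, 35.5, 52.3)
)
candy_path <- file.path(tempdir(), "candy-data.csv")  # assumed file name
write.csv(stand_in, candy_path, row.names = FALSE)

# With the real Kaggle file, point candy_path at your download instead.
halloween_candy <- read.csv(candy_path)

head(halloween_candy)     # first rows (up to six)
summary(halloween_candy)  # min, quartiles, mean, max for numeric columns
```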

str() function

Another way to inspect the Halloween Candy Power Ranking data is the str() function, which displays the internal structure of an R object. It is an alternative to the summary() function. For a data frame, str() prints roughly one line per column, showing its name, type, and first few values.
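As a small illustration, here is str() on a stand-in data frame (made-up values, not the real candy data), with summary() on one column for contrast.

```r
# Made-up stand-in rows; only the column names mirror the tutorial's data.
halloween_candy <- data.frame(
  competitorname = c("Candy A", "Candy B", "Candy C"),
  chocolate      = c(1L, 0L, 1L),
  sugarpercent   = c(0.70, 0.20, 0.55)
)

# One header line, then one line per column: name, type, first few values.
str(halloween_candy)

# summary(), by contrast, reports per-column statistics.
summary(halloween_candy$sugarpercent)
```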

Sugar Percentage and Price Percentage Scatterplot

Below we will create a scatterplot of sugar percentage against price percentage to see what relationship the amount of sugar has with the cost of candy. Each point's position is determined by its value on the x-axis (sugar percentage) and on the y-axis (price percentage).

> ggplot(data = halloween_candy, aes(x = sugarpercent, y = pricepercent)) +
    geom_point()

Sugar Percentage and Price Percentage Scatterplot with Encircling

In some cases, I like to encircle groups of points in a scatterplot to draw attention to them. We will still be using ggplot() and adding geom_encircle(). This function is part of the ggalt package, so please make sure it’s installed. geom_encircle() automatically encloses points in a polygon.
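A hedged sketch of encircling, using geom_encircle() from ggalt. The stand-in data is made up, and the choice of which points to highlight (here, sugarpercent and pricepercent both above 0.7) is purely illustrative.

```r
library(ggplot2)
library(ggalt)

# Made-up stand-in values; replace with the real halloween_candy data frame.
halloween_candy <- data.frame(
  sugarpercent = c(0.10, 0.25, 0.40, 0.75, 0.80, 0.90, 0.95),
  pricepercent = c(0.20, 0.15, 0.50, 0.80, 0.95, 0.75, 0.90)
)

# Illustrative subset to draw attention to: sweet and expensive candies.
highlight <- subset(halloween_candy, sugarpercent > 0.7 & pricepercent > 0.7)

p <- ggplot(halloween_candy, aes(x = sugarpercent, y = pricepercent)) +
  geom_point() +
  # Draw a smoothed polygon around the highlighted points only.
  geom_encircle(data = highlight, color = "red", expand = 0.05)

p
```

Passing a subset through the data argument keeps the outline around the chosen group instead of the whole point cloud.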

Note: If you are working with large numbers and want to disable scientific notation, make sure to run: options(scipen = 999)
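To see what the scipen option changes, here is a short base R sketch that formats the same number before and after setting it, then restores the previous setting.

```r
old <- options(scipen = 0)     # R's default penalty
default_fmt <- format(100000)  # "1e+05": the scientific form is shorter

options(scipen = 999)          # strongly prefer fixed notation
fixed_fmt <- format(100000)    # "100000"

options(old)                   # restore the previous setting

default_fmt
fixed_fmt
```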

Sugar Percentage and Price Percentage Scatterplot with Text

From a visual perspective, I really like the scatterplot for this tutorial. However, I believe adding the Halloween candy names to the plot will make the analysis even more useful. We can add geom_text() and set a few of its arguments to create the plot.

> ggplot(data = halloween_candy, aes(x = sugarpercent, y = pricepercent,
                                     label = competitorname)) +
    geom_point(color = "orange") +
    geom_smooth(method = "lm") +
    geom_text(check_overlap = TRUE,
              vjust = "bottom",
              nudge_y = 0.01,
              angle = 35,
              size = 2,
              color = "purple") +
    labs(title = "Halloween Candy Power Ranking Scatterplot with Text",
         x = "Sugar Percentage",
         y = "Price Percentage") +
    theme(plot.title = element_text(hjust = 0.5))

Halloween Candy Features

The variables in the Halloween Candy Power Ranking dataset include various attributes that help create the rankings for each candy. Take chocolate as an example: it will be either 1 (TRUE) or 0 (FALSE). A piece of candy can have more than one attribute. For example, the 100 Grand candy bar is 1 (TRUE) for chocolate and 1 (TRUE) for caramel.
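As a small sketch of how these 0/1 attributes behave in R, the snippet below uses the 100 Grand flags described above plus one made-up row ("Candy B") for contrast.

```r
# 100 Grand's chocolate and caramel flags are as described above;
# "Candy B" is a made-up row for contrast.
features <- data.frame(
  competitorname = c("100 Grand", "Candy B"),
  chocolate      = c(1, 0),
  caramel        = c(1, 0)
)

# as.logical() maps 1 to TRUE and 0 to FALSE.
as.logical(features$chocolate)

# Count how many of the two attributes each candy has.
features$n_attributes <- rowSums(features[, c("chocolate", "caramel")])
features
```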

Halloween Candy Chocolate Bar Chart

Let’s start off by creating a simple bar chart of chocolate candy.

# candy_features is built from halloween_candy in the next section
> ggplot(candy_features, aes(x = chocolate)) +
    geom_bar()

Halloween Candy Chocolate and Caramel Bar Chart

Previously I mentioned that a candy can have more than one feature, using chocolate and caramel as the example. Let’s create a bar chart with chocolate and caramel. The first step is to create a variable holding columns 2 through 10, which leaves out competitorname, sugarpercent, pricepercent, and winpercent. Second, we convert every feature column to logical with lapply(). The lapply() function returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X. Assigning the result into candy_features[] keeps the data frame structure.

> candy_features <- halloween_candy %>% select(2:10)
> candy_features[] <- lapply(candy_features, as.logical)
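To see why the empty brackets in candy_features[] matter, here is a small sketch on made-up data. Without them, the assignment would replace the data frame with a plain list.

```r
# Made-up 0/1 feature columns standing in for columns 2:10 of the candy data.
candy_features <- data.frame(chocolate = c(1, 0, 1), caramel = c(1, 0, 0))

# lapply() alone returns a plain list, one element per column.
as_list <- lapply(candy_features, as.logical)
class(as_list)

# Assigning into candy_features[] replaces the columns but keeps the data frame.
candy_features[] <- lapply(candy_features, as.logical)
class(candy_features)
str(candy_features)
```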

Now let’s run ggplot() with the new variable, candy_features, and a fill of caramel.

> ggplot(candy_features, aes(x = chocolate, fill = caramel)) +
    geom_bar()
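The stacked bar chart is a picture of a two-way count. As a sketch on made-up data, table() gives the same counts as numbers, which is handy for checking the plot.

```r
# Made-up logical features; the real candy_features comes from the steps above.
candy_features <- data.frame(
  chocolate = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  caramel   = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Rows: chocolate, columns: caramel. Each cell is one fill segment of a bar.
counts <- table(chocolate = candy_features$chocolate,
                caramel   = candy_features$caramel)
counts
```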

Halloween Chocolate Candy Features Grid Arrange

First we must create a variable for each combination of chocolate and another feature, as shown below: