Exercise

Delimit dates

In this exercise, you'll split up the dates so that you have month, day, and year variables in the data. This will help you group the data for later analyses. For example, you will compare Greinke's amazing July run to his other months in the 2015 season.

To do this, you'll leverage the separate() function from the tidyr package. This function allows you to easily split an existing column (col) in a data frame (data) into new columns, which are specified with the into argument (e.g. into = c("year", "month")). Lastly, the remove argument tells R whether to remove the original column from the data and the sep argument tells R what to split on.
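As a minimal sketch of how these arguments fit together, the example below splits a date column in a small made-up data frame. The games data frame and its values are hypothetical and only stand in for the exercise data.

```r
library(tidyr)

# Hypothetical toy data, used only for illustration
games <- data.frame(
  game_date = c("2015-07-04", "2015-08-16"),
  stringsAsFactors = FALSE
)

# Split game_date on the dashes into three new columns.
# remove = FALSE keeps the original game_date column.
split_games <- separate(
  data = games,
  col = game_date,
  into = c("year", "month", "day"),
  sep = "-",
  remove = FALSE
)

split_games
```

By default the new columns are character vectors (for example, month holds "07" rather than 7), so a conversion step is needed before treating them as numbers.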

The ifelse() function will also help you to make the new variable july, which you'll use more extensively in later exercises. For now, know that it takes three arguments: a conditional statement, the value to return if the condition is true, and the value to return if the condition is false. Copy and paste the following into the console to see an example:

age <- c(3, 12, 32, 40, 17)
ifelse(age < 19, "child", "adult")

This ifelse() call returns a character vector with one element for each value of age: "child" where the condition age < 19 is TRUE and "adult" where it is FALSE.
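Because ifelse() works element by element, a missing value in the condition carries through to the result. The sketch below is a variation on the example above with one age replaced by NA; the variable name group is made up for illustration.

```r
# Same idea as above, but with a missing age in the third position
age <- c(3, 12, NA, 40, 17)

# The condition is evaluated once per element of age
group <- ifelse(age < 19, "child", "adult")

group
# The result has the same length as the input
length(group) == length(age)
```

Here the third element of group is NA, since NA < 19 is neither TRUE nor FALSE.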

Instructions

The tidyr package is already loaded in your workspace.

Use separate() to split the game_date variable of greinke into three new variables named year, month, and day. Set the sep argument to "-", since you want to split on the dashes. Use the remove argument to keep the original game_date column.

Convert month in the greinke data frame to numeric using as.numeric().

Create a new column july in the greinke data frame. Use the ifelse() function to assign the value "july" if the game took place in July (i.e. month is equal to 7) and "other" otherwise.

View the head() of greinke to make sure that your newly created variables appear.

Print the summary() of greinke$july to see how many observations appear in each category. Be sure to wrap the factor() function around this variable (a character vector) so that it is treated as categorical data.
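To see why the factor() wrapper matters, the sketch below compares summary() on a character vector with summary() on the same values as a factor. The month values are made up for illustration and are not taken from the greinke data.

```r
# Hypothetical month numbers, used only for illustration
month <- c(7, 7, 8, 9, 7)

# Label each observation as "july" or "other"
july <- ifelse(month == 7, "july", "other")

# On a character vector, summary() reports length, class, and mode
summary(july)

# On a factor, summary() reports a count for each level
counts <- summary(factor(july))
counts
```

With these values, counts shows 3 observations for july and 2 for other.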