Exercise

Dealing with entry error

Think for a minute about that Boat histogram. In every other month, average weekday commuter boat ridership hovered around four thousand. Then, in one month, it jumped to 40 thousand without warning.

Unless the Olympics were happening in Boston that month (they weren't), this value is certainly an error. You can assume that whoever was entering the data that month accidentally typed 40 instead of 4.

Because it's an error, you don't want this value influencing your analysis. In this exercise, you'll locate the incorrect value and change it to 4.

After you make the change, you'll run the last two commands in the editor as-is. They use functions you may not know yet to produce some cool ridership plots: one showing the lesser-used modes of transport (take a look at the gorgeous seasonal variation in Boat ridership), and one showing all modes of transport. The plots are based on the long version of the data we produced in Exercise 4 -- a good example of using different data formats for different purposes.

If you'd like to learn how to do this on your own, check out DataCamp's Data Visualization with ggplot2 courses!

Instructions

Create a numeric variable i to store the index of the incorrect Boat value in mbta6. Combine a call to which() with a comparison operator (e.g. >) to determine the row number.

Overwrite the incorrect value of mbta6$Boat with a 4.

Verify that the change was made by looking at another histogram of mbta6$Boat.
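
The three steps above can be sketched on a small stand-in dataset. This is only an illustration: the data frame below is a hypothetical substitute for mbta6 with made-up Boat values (in thousands), and the cutoff of 30 is an arbitrary threshold chosen to separate the mistyped 40 from the values near 4.

```r
# Hypothetical stand-in for mbta6: made-up Boat values (in thousands of riders),
# with one entry mistyped as 40 instead of 4.
mbta6 <- data.frame(Boat = c(4.2, 3.6, 40, 4.3, 3.9, 4.6))

# which() returns the positions where the logical condition is TRUE.
# The cutoff of 30 is an assumption; any value between the normal range and 40 works.
i <- which(mbta6$Boat > 30)

# Overwrite the erroneous entry with the value that was presumably intended.
mbta6$Boat[i] <- 4

# Re-plot to confirm that the outlier is gone.
hist(mbta6$Boat)
```

In this toy data the mistyped value sits in row 3, so i is 3. In the real mbta6 the row number will differ, but the same pattern applies.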