In this R tutorial, we will analyze and visualize Florida shark attack data from 1882 through July 28, 2018, using the ggplot2, maps, and mapdata packages. The shark attacks are analyzed by total occurrences per county in the state of Florida and displayed graphically with a bar chart and a county map.

Install and Load Packages

Below are the packages and libraries that we will need to load to complete this tutorial:
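The package list itself is not shown here. Judging from the functions used later in the tutorial (ggplot(), map_data()), a minimal setup might look like the sketch below. The exact list is an assumption, not the author's original code, and theme_nothing(), used further down, additionally needs the ggmap package.

```r
# Assumed package list, inferred from the functions used later in the tutorial.
pkgs <- c("ggplot2", "maps", "mapdata")

# Check which of the packages are installed without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  # Report what is missing instead of installing automatically.
  message("Run install.packages() for: ",
          paste(pkgs[!installed], collapse = ", "))
}

# Attach every package that is available.
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```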

summary() function

The summary() and head() functions give a quick overview of the shark attack data.
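As a rough illustration of these two calls, the sketch below rebuilds a small stand-in for fl_counties_attacks from the six rows printed later in this tutorial. How the full dataset is loaded is not shown here, so the data.frame() construction is an assumption made only to keep the example runnable.

```r
# Stand-in for fl_counties_attacks: only the six rows printed later on,
# not the full dataset.
fl_counties_attacks <- data.frame(
  subregion = c("Volusia", "Brevard", "Palm beach", "St Johns", "Duval", "Martin"),
  total     = c(299L, 144L, 75L, 41L, 39L, 37L),
  stringsAsFactors = FALSE
)

# Per-column overview: length and class for character columns,
# min, quartiles, mean, and max for numeric ones.
summary(fl_counties_attacks)

# First six rows of the data frame.
head(fl_counties_attacks)
```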

lapply() function

The next step is to use the lapply() function to see the class of each column in the data.

> lapply(fl_counties_attacks, class)
$subregion
[1] "character"

$total
[1] "integer"

Clean the Florida County Shark Attack Dataset

Next, we will plot all of the counties that have had shark attacks. Before doing so, we need to remove the counties with no shark attacks.

> head(fl_counties_attacks, 30)
      subregion total
1       Volusia   299
2       Brevard   144
3    Palm beach    75
4      St Johns    41
5         Duval    39
6        Martin    37
7      St Lucie    33
8  Indian River    22
9    Miami-Dade    16
10       Monroe    16
11      Broward    14
12     Pinellas    12
13          Bay     9
14      Collier     8
15          Lee     8
16     Sarasota     7
17     Escambia     6
18      Flagler     6
19      Manatee     4
20       Nassau     4
21     Okaloosa     4
22     Franklin     2
23         Gulf     2
24    Charlotte     1
25   Santa Rosa     1
26       Walton     1
27      Alachua     0
28        Baker     0
29     Bradford     0
30      Calhoun     0

As the output above shows, only 26 counties have had shark attacks. Because the rows are sorted by total in descending order, the first 26 rows are exactly those counties, so let's store them in a new variable.

> fl_attacks_26 <- head(fl_counties_attacks, 26)
> fl_attacks_26
      subregion total
1       Volusia   299
2       Brevard   144
3    Palm beach    75
4      St Johns    41
5         Duval    39
6        Martin    37
7      St Lucie    33
8  Indian River    22
9    Miami-Dade    16
10       Monroe    16
11      Broward    14
12     Pinellas    12
13          Bay     9
14      Collier     8
15          Lee     8
16     Sarasota     7
17     Escambia     6
18      Flagler     6
19      Manatee     4
20       Nassau     4
21     Okaloosa     4
22     Franklin     2
23         Gulf     2
24    Charlotte     1
25   Santa Rosa     1
26       Walton     1
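Taking the first 26 rows with head() depends on the data being sorted and on the count of 26 staying correct. A possible alternative, shown here as a sketch rather than the tutorial's own method, is to filter on the total column directly. The data frame below is a six-row stand-in using values from the output above, and fl_attacks_nonzero is a hypothetical name.

```r
# Six-row stand-in using values from the printed output (not the full dataset).
fl_counties_attacks <- data.frame(
  subregion = c("Volusia", "Brevard", "Charlotte", "Walton", "Alachua", "Baker"),
  total     = c(299L, 144L, 1L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Keep only counties with at least one attack, regardless of row order.
fl_attacks_nonzero <- subset(fl_counties_attacks, total > 0)

# Number of counties that remain after filtering.
nrow(fl_attacks_nonzero)
```

On the full dataset this should select the same 26 counties as head(fl_counties_attacks, 26), provided the totals are as printed above.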

Florida County Shark Attacks Plot

Below is a ggplot() call that plots all of the Florida counties with shark attacks:

> ggplot(fl_attacks_26, aes(x = subregion, y = total, fill = subregion)) +
    geom_bar(stat = "identity") +
    xlab("Florida Counties") +
    ylab("Total Shark Attacks") +
    ggtitle("Florida Counties Shark Attacks") +
    theme(axis.text.x = element_text(angle = 50, hjust = 1)) +
    theme(plot.title = element_text(hjust = 0.5))

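The plot above looks nice, but ggplot() orders the bars alphabetically by county name. To plot the total shark attacks from greatest to least, convert subregion to a factor with the factor() function, using the current row order as the factor levels. This works only if your data is already sorted from greatest to least by total.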
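The factor() step can be sketched as follows. The six-row data frame is a stand-in built from the values printed earlier, and the plotting part runs only if ggplot2 is installed. This is one way to do it, not necessarily the author's exact code.

```r
# Stand-in for fl_attacks_26, already sorted from greatest to least.
fl_attacks_26 <- data.frame(
  subregion = c("Volusia", "Brevard", "Palm beach", "St Johns", "Duval", "Martin"),
  total     = c(299L, 144L, 75L, 41L, 39L, 37L),
  stringsAsFactors = FALSE
)

# Use the current row order as the level order, so ggplot() keeps it
# instead of sorting the county names alphabetically.
fl_attacks_26$subregion <- factor(fl_attacks_26$subregion,
                                  levels = fl_attacks_26$subregion)
levels(fl_attacks_26$subregion)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fl_attacks_26, aes(x = subregion, y = total, fill = subregion)) +
    geom_bar(stat = "identity") +
    xlab("Florida Counties") +
    ylab("Total Shark Attacks") +
    ggtitle("Florida Counties Shark Attacks") +
    theme(axis.text.x = element_text(angle = 50, hjust = 1),
          plot.title  = element_text(hjust = 0.5))
  print(p)
}
```

If the data is not sorted in advance, mapping x to reorder(subregion, -total) inside aes() gives the same ordering without relying on row order.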

Now it's time to map the counties in the state of Florida:

> states <- map_data("state")
> fl_df <- subset(states, region == "florida")
> counties <- map_data("county")
> fl_county <- subset(counties, region == "florida")
> fl_base <- ggplot(data = fl_df, mapping = aes(x = long, y = lat, group = group)) +
    coord_fixed(1.3) +
    geom_polygon(color = "black", fill = "gray")
> fl_county_map <- fl_base + theme_nothing() +
    geom_polygon(data = fl_county, fill = NA, color = "white") +
    geom_polygon(color = "black", fill = NA)
> fl_county_map

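Note: theme_nothing() is not part of ggplot2; it comes from the ggmap package (cowplot provides a function of the same name). If you receive the error shown below, either load one of those packages, use ggplot2's theme_void() instead, or simply remove + theme_nothing() from the code and re-run it.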

Error in theme_nothing() : could not find function "theme_nothing"
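For readers who prefer not to install an extra package, the sketch below builds the same base map with theme_void(), which ships with ggplot2 and also removes the axes and background. Using theme_void() is a substitution, not the tutorial's original choice, and the result may differ slightly in margins from theme_nothing().

```r
# Runs only if ggplot2 and maps are installed; map_data() needs both.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("maps", quietly = TRUE)) {
  library(ggplot2)

  # State and county outlines, restricted to Florida.
  states    <- map_data("state")
  fl_df     <- subset(states, region == "florida")
  counties  <- map_data("county")
  fl_county <- subset(counties, region == "florida")

  fl_county_map <- ggplot(fl_df, aes(x = long, y = lat, group = group)) +
    coord_fixed(1.3) +
    geom_polygon(color = "black", fill = "gray") +
    theme_void() +                                   # replaces theme_nothing()
    geom_polygon(data = fl_county, fill = NA, color = "white") +
    geom_polygon(color = "black", fill = NA)

  print(fl_county_map)
}
```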

The above will be the foundation for plotting the shark attacks among the counties in Florida.

We currently have two sets of data: fl_counties_attacks and the county map data in fl_county. To combine them, the subregion values in the two datasets must match. Let's take a look at each dataset by running the head() function.

> head(fl_county)
           long      lat group order  region subregion
12216 -82.66062 29.81672   290 12216 florida   alachua
12217 -82.63770 29.81672   290 12217 florida   alachua
12218 -82.62051 29.84537   290 12218 florida   alachua
12219 -82.59760 29.85110   290 12219 florida   alachua
12220 -82.58041 29.87975   290 12220 florida   alachua
12221 -82.58041 29.91413   290 12221 florida   alachua

> head(fl_counties_attacks)
   subregion total
1    Volusia   299
2    Brevard   144
3 Palm beach    75
4   St Johns    41
5      Duval    39
6     Martin    37

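From the above output, we can see that the subregion names in fl_county are lowercase, while those in fl_counties_attacks are capitalized. We will need to fix the case of the subregions so that they match, which we can do by running the sapply() function.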
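A minimal sketch of that case fix is shown below, followed by one possible way to attach the totals to the map rows. Both data frames are small stand-ins built from rows printed earlier in this tutorial, and the merge() step and the name fl_map_attacks are assumptions about what comes next, not the author's confirmed code.

```r
# Stand-in attack data (values taken from the printed output above).
fl_counties_attacks <- data.frame(
  subregion = c("Volusia", "Brevard", "Alachua"),
  total     = c(299L, 144L, 0L),
  stringsAsFactors = FALSE
)

# Stand-in for fl_county: the first two map rows printed above.
fl_county <- data.frame(
  long      = c(-82.66062, -82.63770),
  lat       = c(29.81672, 29.81672),
  group     = c(290, 290),
  order     = c(12216L, 12217L),
  region    = "florida",
  subregion = "alachua",
  stringsAsFactors = FALSE
)

# sapply() applies tolower() to each name; unname() drops the element
# names that sapply() attaches. tolower() is vectorized, so calling it
# directly on the column would give the same result.
fl_counties_attacks$subregion <-
  unname(sapply(fl_counties_attacks$subregion, tolower))

# Hypothetical next step: attach totals to every polygon row, then
# restore the drawing order, which merge() does not preserve.
fl_map_attacks <- merge(fl_county, fl_counties_attacks,
                        by = "subregion", all.x = TRUE)
fl_map_attacks <- fl_map_attacks[order(fl_map_attacks$order), ]
fl_map_attacks
```

Before merging the full datasets, it is worth comparing the two sets of names with setdiff(), since spelling or punctuation differences between the sources would silently produce NA totals.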

The only real issue I can see with the resulting map is the color scale. Because a few counties account for most of the attacks, counties with fewer than 30 or so shark attacks are barely distinguishable from one another, so it is hard to notice on the map that any attacks occurred there.
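One possible remedy, which is a suggestion and not part of the original analysis, is to color the map on a logarithmic scale. The sketch below uses five totals from the table above to compare how much of the scale each county occupies under a linear and a log10 mapping.

```r
# Five totals taken from the county table above.
totals <- c(Volusia = 299, Brevard = 144, Duval = 39, Bay = 9, Walton = 1)

# Position of each county on a linear scale from 0 to the maximum.
linear_share <- totals / max(totals)

# Position on a log10 scale; adding 1 keeps zero-attack counties defined.
log_share <- log10(totals + 1) / max(log10(totals + 1))

round(rbind(linear = linear_share, log10 = log_share), 2)
```

On the log scale the low-count counties take up a visibly larger share of the range, at the cost of a legend that is less intuitive to read.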