Introduction

About two years ago, I was taking my dog for a walk through a park and I began to notice the birds and how fascinating they were! 🐦 I began regularly going out birding (aka “bird-watching”) and reading up on these cool little flying dinosaurs.

It turns out there’s a lot of data in the birding world as well. Birding attracts the sort of detail-oriented person who likes to count and record stuff.

So there are opportunities to get involved in citizen science projects, including a long-running project called the Christmas Bird Count (CBC). It started in 1900, when Frank Chapman, an ornithologist, came up with the idea of counting birds as an alternative to hunting them at Christmas (hunting them being the previous tradition).1

Ever since, birders have been going out every year around Christmas to spend the day walking, biking, or driving through a census area, counting all the birds they see or hear.

For the past two years, I have gone out with Hamilton’s Christmas Bird Count. I learn a lot while I’m out there and it feels like we are contributing to a larger purpose because of the data we are collecting.

So I thought I would look at the data and see what it could tell me!

Specifically, I’ve noticed birders will say things like, “the House Sparrows are getting worse every year” or, “the number of Bald Eagles has increased”, and I was wondering if the Christmas Bird Count data would agree or disagree with those statements.

To access the data, I went on the Bird Studies Canada website, clicked on Citizen Science, then Christmas Bird Count, then CBC Audubon Database, and then Historical Results by Count. I downloaded all years of data that existed for the Hamilton count.

Data cleaning

As shown below, it turns out that the first row just gives information about the count name and latitude/longitude, so I extracted those two pieces of information as current_circle_name and lat_long and then sliced the file so that the first two lines were excluded from the dataset. I then used clean_names from the janitor package.
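Those steps look roughly like this (a minimal sketch: the exact positions of the circle name and coordinates in the first row are assumptions, not necessarily the file's actual layout):

```r
library(dplyr)
library(janitor)

# Assumed layout: row 1, columns 1 and 2 hold the circle name and coordinates
current_circle_name <- hamilton_cbc[[1, 1]]
lat_long <- hamilton_cbc[[1, 2]]

# Drop the first two lines, then standardize the column names
hamilton_cbc <- hamilton_cbc %>%
  slice(3:n()) %>%
  clean_names()
```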

I decided to programmatically separate the first three tables out from the dataset, in order to be able to easily replicate this analysis when more years’ data gets added, or if I want to repeat this analysis for another count area.

# Declare two lists: one for the tables and one for the header of each table
meta_data_table <- list()
meta_data_table_headers <- list()

for (i in 1:3) {
  # For each of the three tables, put the first row of data into this variable (i.e., the header data)
  meta_data_table_headers[[i]] <- hamilton_cbc[1, ] %>%
    replace(is.na(.), "NA") # If a column name is NA, replace it with the letters "NA", in case there is data in the column

  j <- 2 # Start j at 2, because we want to exclude the header row
  meta_data_row <- list()

  # Add each row of data to meta_data_row; "circle_name" is the name of the first
  # column, and we know there is a row of NAs between each table
  while (!is.na(hamilton_cbc[["circle_name"]][j])) {
    meta_data_row[[j]] <- hamilton_cbc[j, ]
    j <- j + 1
  }

  meta_data_table[[i]] <- meta_data_row %>%
    bind_rows()

  meta_data_table[[i]] <- meta_data_table[[i]] %>%
    # funs() is deprecated in dplyr; a lambda returning the stored header values does the same job
    rename_all(~ unlist(meta_data_table_headers[[i]], use.names = FALSE)) %>%
    remove_empty(which = "cols") %>%
    rename(count_year = 1) %>% # rename knows we are referring to the first column!
    clean_names()

  # Remove the meta-data table from the rest of the dataset
  hamilton_cbc <- hamilton_cbc %>%
    slice((j + 1):n())
}

Whew! So now we have a list meta_data_table that contains the three meta-data tables.

In the code chunk below, I converted the three tables in the meta_data_table list into one overall_meta_data table using the reduce function from the purrr package.

Then, using the replace_with_na_all function from the naniar package, I converted every value recorded as “Unknown” into a proper NA.

I then mutated two of the variables: I used the mdy and year functions from the lubridate package to convert the character variables low_temp3 (actually a date variable) and date into a Date and a dbl variable, respectively.

I also used the word function from the stringr package, which extracts only the first word from a character string.
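Put together, those cleaning steps might look something like this (a sketch only: the join direction and the exact columns being mutated are assumptions based on the description above, not the post's actual code):

```r
library(purrr)
library(naniar)
library(lubridate)
library(stringr)

overall_meta_data <- meta_data_table %>%
  # Collapse the list of three tables into one, joining on the shared year column
  reduce(full_join, by = "count_year") %>%
  # Turn every "Unknown" into a proper NA
  replace_with_na_all(condition = ~.x == "Unknown") %>%
  mutate(
    low_temp = mdy(low_temp),       # this character column actually holds a date
    date = year(mdy(date)),         # keep just the year, as a number
    count_year = word(count_year)   # keep only the first word of the string
  )
```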

But we still need to deal with the fact that there is a fifth table below the bird count data table. Here is the end of the fourth table and the start of the fifth table. Notice that there is a line of NAs between the two tables:

I won’t show the data of the fifth table here, but it shows the full names of the counters. I don’t want to add this data to overall_meta_table, so I will just remove the table from the dataset.

In order to remove the fifth table, I noticed that every row of the bird count data included a [ character in the species variable (for example: “Snow Goose [Chen caerulescens]”), so I filtered out every row in the dataset that didn’t have the [ character. Notice that, in the str_detect filter, the pattern is written as "\\[". That is because [ is a special regex character, so it must be escaped as \[ to mean a literal [, and because backslashes themselves must be escaped inside an R string, the pattern ends up with two backslashes.

hamilton_cbc <- hamilton_cbc %>%
  filter(str_detect(species, "\\["))

So, where are we at the moment? We have only one dataset left, and it looks like this:

This is all metadata, and we can take it out of this dataset and put it in our overall_meta_data table. The only variable we will keep in the hamilton_cbc dataset is the calendar year, so that we can plot by year and join to the metadata later if we want.

To join the count_participant_meta_data to the overall_meta_data, first we will parse the different variables out:
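The join itself might look something like this (a hypothetical sketch: count_year is the only column name taken from the text; everything else about the parsed table's layout is assumed):

```r
library(dplyr)

# Once the participant variables are parsed out, attach them to the
# rest of the meta-data by the shared count_year key
overall_meta_data <- overall_meta_data %>%
  left_join(count_participant_meta_data, by = "count_year")
```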

It turns out that how_many_counted also has a cw value, which means the bird was not seen on count day itself, but was seen on a day close to the count. I am going to set these bird counts to be NA, as they don’t have a specified value.
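A minimal sketch of that step (“cw” stands for “count week”; I'm assuming how_many_counted is still stored as a character column at this point):

```r
library(dplyr)

hamilton_cbc <- hamilton_cbc %>%
  mutate(
    how_many_counted = na_if(how_many_counted, "cw"), # cw birds have no count value
    how_many_counted = as.numeric(how_many_counted)   # now safe to make numeric
  )
```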