Final Project

For this assignment, you will create a Jupyter Notebook with your answers to the questions below and submit it to a GitHub repository for the Final Project, following the instructions in Part II: Submit Your Jupyter Notebook to GitHub.

Structure of Final Project

In your Jupyter Notebook, provide your answers for Sections 1 AND 2 and then choose one option for Section 3: either option A or B. Indicate in your Markdown documentation which option you have chosen for Section 3.

Due Date

You need to complete this assignment (Final Project) by Sunday, August 26th at 8:00 AM (U.S. Mountain Daylight Time). See this link to convert the due date/time to your local time.

What You Need

Be sure that you have completed all of the lessons from the Earth Analytics Bootcamp. Completing the challenges at the end of the lessons will also help you with this assignment. Review the lessons as needed to answer the questions.

You will need to fork and clone a GitHub repository for the Final Project from https://github.com/earthlab-education/ea-bootcamp-final-project-yourusername. You will receive an invitation to the GitHub repository for the Final Project via CANVAS.

Note: the repository will be empty, as you will add a new Jupyter Notebook containing your answers to the questions below.

Be Sure to Add Documentation to Your Notebook

Use Markdown Titles to Document Workflow (5 pts)

Start with a Markdown cell containing a Markdown title for this assignment, plus an author name and date in list form. Bold the words for author and date, but do not bold your name and today’s date.

Add a Markdown title (## Title) and some text before each code cell to identify each question.

Describe the purpose of your code (e.g. what are you accomplishing by executing this code?). Think carefully about how many cells you should have to best organize your data (hint: review lessons for examples of how code can be grouped into cells).

Use Python Comments to Document Code and Functions (5 pts)

Within code cells, be sure to also add Python comments to document each code block and use the PEP 8 guidelines to assign appropriate variable names that are short and concise but also clearly indicate the kind of data contained in the variable.

Be sure to add documentation within your functions using Python comments to tell the user what the function is doing and what inputs it can take.
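As an illustration of the documentation style described above, here is a minimal sketch of a documented function. The function name `feet_to_meters` and the feet-to-meters conversion are hypothetical and not part of the assignment. It uses a docstring plus an inline comment, which is one common convention; your lessons may show a different layout.

```python
def feet_to_meters(length_ft):
    """Convert a length from feet to meters.

    Parameters
    ----------
    length_ft : int, float, or numpy array
        Length value(s) in feet.

    Returns
    -------
    int, float, or numpy array
        Length value(s) in meters.
    """
    # One foot is defined as exactly 0.3048 meters
    length_m = length_ft * 0.3048
    return length_m


# Call the function on a single value
print(feet_to_meters(100))
```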

Import Python Packages (1 pt)

In the questions below, you will be working with numpy arrays and pandas dataframes.

You will also be downloading files using urllib.request and accessing directories and files on your computer using os. Last, you will also be creating plots of your data.

Import all of the necessary Python packages to accomplish these tasks.

Section 1: Questions 1-10 Using Pandas Dataframes

Get Data

Use urllib.request to download the following .csv file of fires in California and import the data to a pandas dataframe:

CA_fires_1992_2015_gt_100_acres.csv from https://ndownloader.figshare.com/files/12835340

The data contains one record for every fire greater than 100 acres that occurred between 1992 and 2015. The dataset has columns for the size of the fire (acres) and for the year and month of the fire, along with other details about the cause, reporting agency, county name, etc.
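One possible way to structure the download and import step is sketched below. The helper name `download_and_read_csv` and the local filename are assumptions made for illustration; `urllib.request.urlretrieve` and `pd.read_csv` are the standard calls. The example call is commented out because it needs an internet connection.

```python
import os
import urllib.request

import pandas as pd


def download_and_read_csv(url, local_path):
    """Download a .csv file (if not already on disk) and read it into a dataframe."""
    # Only download when the file is not already present locally
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url=url, filename=local_path)
    return pd.read_csv(local_path)


# Example call (requires an internet connection):
# fires_df = download_and_read_csv(
#     "https://ndownloader.figshare.com/files/12835340",
#     "CA_fires_1992_2015_gt_100_acres.csv",
# )
```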

Question 1: Explore Structure of the Pandas Dataframe (2 pts)

Use the appropriate functions to print the first few rows of the pandas dataframe and the last few rows of the dataframe.

Note: as this dataframe contains many records, it is not helpful to print the whole dataframe.

|   | fd_unq_id | source_reporting_unit_name | fire_name | year | month | month_num | cause | fire_size | fire_size_class | state | county |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1338131 | Mendocino Unit | VANN | 1992 | February | 2 | Equipment Use | 120.0 | D | CA | NaN |
| 1 | 216388 | Yuma Field Office | WALTERS | 1992 | March | 3 | Debris Burning | 1800.0 | F | CA | NaN |
| 2 | 218766 | California Desert District | MESA | 1992 | April | 4 | Equipment Use | 4200.0 | F | CA | NaN |
| 3 | 1373316 | CDF - San Bernardino Unit | COLLINS | 1992 | April | 4 | Arson | 125.0 | D | CA | NaN |
| 4 | 1373321 | CDF - San Bernardino Unit | COVINGTON | 1992 | April | 4 | Arson | 104.0 | D | CA | NaN |

|   | fd_unq_id | source_reporting_unit_name | fire_name | year | month | month_num | cause | fire_size | fire_size_class | state | county |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4096 | 300308392 | Butte Unit | RICHVALE | 2015 | October | 10 | Equipment Use | 250.0 | D | CA | Butte |
| 4097 | 300308084 | CDF - San Benito-Monterey Unit | CIENEGA | 2015 | October | 10 | Miscellaneous | 690.0 | E | CA | San Benito |
| 4098 | 300209443 | Sequoia And Kings Canyon National Parks | BURNT | 2015 | October | 10 | Lightning | 161.0 | D | CA | NaN |
| 4099 | 300293910 | Colorado River Agency | HIGHWAY | 2015 | November | 11 | Missing/Undefined | 323.0 | E | CA | Riverside |
| 4100 | 300293894 | Ventura County Fire Department | SOLIMAR | 2015 | December | 12 | Missing/Undefined | 1288.0 | F | CA | Ventura |
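The idea of previewing only the start and end of a dataframe can be sketched on toy data. The dataframe `toy_df` and its columns are hypothetical and are not the assignment dataset.

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_df = pd.DataFrame({
    "site": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "value": [10, 20, 30, 40, 50, 60, 70, 80],
})

# head() returns the first five rows by default;
# tail(3) returns the last three rows
first_rows = toy_df.head()
last_rows = toy_df.tail(3)

print(first_rows)
print(last_rows)
```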

Question 2: Summarize Fire Size (4 pts)

Use the appropriate function to calculate summary statistics of only the fire size (acres).

In your Markdown documentation for this question, write a sentence or two stating:

the mean, minimum, and maximum fire size (acres) in this dataset.

the total number of fires in this dataset.

Hints:

It can be helpful to determine how to select the data you need first before summarizing it.

You can also review how to run summary statistics on a specific column in a pandas dataframe.

|   | fire_size |
|---|---|
| count | 4101.000000 |
| mean | 2995.314133 |
| std | 13481.045403 |
| min | 100.100000 |
| 25% | 180.000000 |
| 50% | 354.000000 |
| 75% | 1155.000000 |
| max | 315578.800000 |
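As a rough sketch of the "select first, then summarize" hint, the example below runs summary statistics on a single column of a hypothetical toy dataframe (`toy_df` and its column names are made up for illustration).

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_df = pd.DataFrame({
    "site": ["a", "b", "c", "d"],
    "value": [10.0, 20.0, 30.0, 40.0],
})

# Double brackets select the column as a dataframe;
# describe() then summarizes only that column
value_summary = toy_df[["value"]].describe()
print(value_summary)
```

The `count` row of the summary reports the number of non-missing records in the selected column.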

Question 3: Calculate Total Number of Fires For Each Year (4 pts)

Use the appropriate function to calculate the total number of fires per year, and save as a new dataframe.

Note: the displayed data below only shows the first few rows in the dataset.

Hints:

Review the use of groupby to run statistics on pandas dataframes.

Think about what value you want to use to group the data and what value you want to use to determine the total number of fires.

| year | fd_unq_id |
|---|---|
| 1992 | 237 |
| 1993 | 187 |
| 1994 | 201 |
| 1995 | 189 |
| 1996 | 333 |
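The grouping pattern described in the hints can be sketched on toy data as follows. The column names `group` and `record_id` are hypothetical; the point is that one column defines the groups and another is counted within each group.

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "group": ["x", "x", "y", "y", "y", "z"],
})

# Group by one column, then count the records of another column
# within each group; the grouping column becomes the index
counts_by_group = toy_df.groupby("group")[["record_id"]].count()
print(counts_by_group)
```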

Question 4: Reset Index (2 pts)

Use the appropriate function to reset the index of the dataframe created in the previous question, so that the year returns to being a column. Save the reset dataframe as a new dataframe.

Note: the displayed data below only shows the first few rows in the dataset.

|   | year | fd_unq_id |
|---|---|---|
| 0 | 1992 | 237 |
| 1 | 1993 | 187 |
| 2 | 1994 | 201 |
| 3 | 1995 | 189 |
| 4 | 1996 | 333 |
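A minimal sketch of resetting the index after a groupby, again on hypothetical toy data: `reset_index()` returns a new dataframe in which the grouping column is a regular column and the index is a default integer range.

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "group": ["x", "x", "y", "y", "y", "z"],
})

# After a groupby, the grouping column is the index
counts_by_group = toy_df.groupby("group")[["record_id"]].count()

# reset_index() moves the index back into a column and
# returns a new dataframe (the original is unchanged)
counts_reset = counts_by_group.reset_index()
print(counts_reset)
```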

Question 5: Plot Total Number of Fires For Each Year (2 pts)

Create a plot of your choice (i.e. type, color) that displays the total number of fires for each year of data.

Be sure to label your x- and y-axes appropriately and give your plot an appropriate title.

Hint:

Think about which dataframe you want to use for the plot and what data you need to plot.
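For reference, a labeled bar plot built from two dataframe columns might look like the sketch below. The toy values, column names, and color are all hypothetical, and a bar plot is only one of several reasonable plot types.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_counts = pd.DataFrame({
    "year": [2001, 2002, 2003, 2004],
    "n_events": [12, 7, 15, 9],
})

# Create the figure and axis, then draw one bar per year
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(toy_counts["year"], toy_counts["n_events"], color="darkorange")

# Label both axes and add a title
ax.set(xlabel="Year",
       ylabel="Number of events",
       title="Events per Year (toy data)")
plt.show()
```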

Question 6: Convert Units For Fire Size (4 pts)

Write a function to convert the units of fire size from acres to hectares (i.e. a standard unit that represents 10,000 square meters). One hectare is equal to 2.47105 acres.

Question 7: Apply Function to Column (4 pts)

Run the function you created in the previous question to convert the units of the fire size in your pandas dataframe from acres to hectares.

Use the appropriate function to print only the first few rows to display the converted data.

Hint:

Review how to apply a function to a column in a pandas dataframe.

|   | fd_unq_id | source_reporting_unit_name | fire_name | year | month | month_num | cause | fire_size | fire_size_class | state | county |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1338131 | Mendocino Unit | VANN | 1992 | February | 2 | Equipment Use | 48.562352 | D | CA | NaN |
| 1 | 216388 | Yuma Field Office | WALTERS | 1992 | March | 3 | Debris Burning | 728.435281 | F | CA | NaN |
| 2 | 218766 | California Desert District | MESA | 1992 | April | 4 | Equipment Use | 1699.682321 | F | CA | NaN |
| 3 | 1373316 | CDF - San Bernardino Unit | COLLINS | 1992 | April | 4 | Arson | 50.585783 | D | CA | NaN |
| 4 | 1373321 | CDF - San Bernardino Unit | COVINGTON | 1992 | April | 4 | Arson | 42.087372 | D | CA | NaN |
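Applying a function to one column can be sketched as follows. This example uses a hypothetical feet-to-meters conversion on toy data rather than the assignment's conversion, and it overwrites the column with the converted values, which is one option among several (you could also store the result in a new column).

```python
import pandas as pd


def feet_to_meters(length_ft):
    """Convert a length from feet to meters (1 ft = 0.3048 m)."""
    return length_ft * 0.3048


# Hypothetical toy data, not the assignment dataset
toy_df = pd.DataFrame({
    "site": ["a", "b", "c"],
    "height": [100.0, 250.0, 1000.0],
})

# Apply the function to every value in one column and
# overwrite that column with the converted values
toy_df["height"] = toy_df["height"].apply(feet_to_meters)
print(toy_df.head())
```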

Question 8: Calculate Mean Fire Size For Each Year (4 pts)

Use the appropriate function to calculate the mean fire size (in hectares) per year and save as a new dataframe.

Note: the displayed data below only shows the first few rows in the dataset.

Hints:

Review the use of groupby to run statistics on pandas dataframes.

Think about what value you want to use to group the data and what value you want to use to determine the mean size of fires.

| year | fire_size |
|---|---|
| 1992 | 461.297265 |
| 1993 | 642.637897 |
| 1994 | 771.721812 |
| 1995 | 417.441164 |
| 1996 | 825.744342 |

Question 9: Plot Mean Fire Size For Each Year (2 pts)

Create a plot of your choice (i.e. type, color) that displays the mean size of fires for each year of data.

Be sure to label your x- and y-axes appropriately and give your plot an appropriate title.

Hint:

Recall the step you completed in Question 4 to reset the index after the groupby.

Think about which dataframe you want to use for the plot and what data you need to plot.

Question 10: Discuss Results (4 pts)

Write a few sentences (2-3) on each of the following:

Does the number of fires appear to be increasing over time in California? Explain and support your answer using your plot of total number of fires per year.

Does the average size of fires appear to be increasing over time in California? Explain and support your answer using your plot of mean size of fires per year.

Which result (i.e. total number of fires or mean fire size per year) do you think provides a more appropriate measure of fire danger in California?

Section 2: Questions 11-19 Using Numpy Arrays

Get Data

Use urllib.request to download the following .csv file of the number of fires by month and year in California and import the data to numpy arrays:

CA-fires-month-count-1992-to-2015.csv from https://ndownloader.figshare.com/files/12835346

The dataset contains a row for each year specified in the dataset name and contains a column for each month (starting with January through December). The values represent the number of fires that occurred in that month and year, based on fires greater than 100 acres that occurred between 1992 and 2015.
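The layout described above (rows for years, columns for months) can be illustrated with a small in-memory example. The text in `toy_csv` is hypothetical, and the comma delimiter and absence of a header row are assumptions; check the downloaded file before choosing the arguments to `np.loadtxt`.

```python
import io

import numpy as np

# Hypothetical file contents: 3 rows (years) x 4 columns (months),
# comma-delimited, with no header row
toy_csv = io.StringIO("1,0,2,5\n0,3,4,1\n2,2,0,6\n")

# np.loadtxt accepts a file path or a file-like object
toy_counts = np.loadtxt(toy_csv, delimiter=",")

# The shape is (number of rows, number of columns)
print(toy_counts.shape)
```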

Question 11: Write Function to Calculate Sum Across Columns (4 pts)

Write a function that calculates the sum across columns of a numpy array.

Hints:

Recall which existing numpy function you can use to calculate a sum. You will include this function within the function you write to answer this question.

Review the lessons on functions to review the use of axes to calculate a statistic across the rows or columns of a numpy array.
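The effect of the `axis` argument can be sketched on a small toy array. The array values are hypothetical; the example only shows what each axis setting returns, so compare the output shapes with your data before deciding which one answers each question.

```python
import numpy as np

# Hypothetical toy array: 3 rows x 4 columns
toy_counts = np.array([
    [1, 0, 2, 5],
    [0, 3, 4, 1],
    [2, 2, 0, 6],
])

# axis=0 collapses the rows: the result has one sum per column
sum_per_column = np.sum(toy_counts, axis=0)

# axis=1 collapses the columns: the result has one sum per row
sum_per_row = np.sum(toy_counts, axis=1)

print(sum_per_column, sum_per_column.shape)
print(sum_per_row, sum_per_row.shape)
```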

Question 12: Execute Function to Calculate Sum Across Columns (2 pts)

Run the function created in the previous question (i.e. to calculate sum across columns in a numpy array) on the numpy array you created for CA-fires-month-count-1992-to-2015.csv. Save the output to a new numpy array.

Question 13: Write Function to Calculate Sum Across Rows (4 pts)

Write a function that calculates the sum across rows of a numpy array.

Hints:

Recall which existing numpy function you can use to calculate a sum. You will include this function within the function you write to answer this question.

Review the lessons on functions to review the use of axes to calculate a statistic across the rows or columns of a numpy array.

Question 14: Execute Function to Calculate Sum Across Rows (2 pts)

Run the function created in the previous question (i.e. to calculate sum across rows in a numpy array) on the numpy array you created for CA-fires-month-count-1992-to-2015.csv. Save the output to a new numpy array.

Question 16: Check Dimensions of Numpy Arrays (4 pts)

Write one conditional statement that checks that the dimensions (i.e. shape) are the same between:

the numpy array for the sum across columns and the numpy array containing the month names AND

the numpy array for the sum across rows and the numpy array containing the month names

Within your conditional statement, print a message stating whether or not both of these conditions are true.

Hint:

Compare the shape of the arrays, rather than the single value for the dimension.

Recall the operator to check equality between two values.

Review how to write a conditional statement that checks for two conditions.
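A conditional statement that checks two conditions at once can be sketched with the `and` operator. The array names and values below are hypothetical toy data.

```python
import numpy as np

# Hypothetical toy arrays
labels = np.array(["a", "b", "c"])
totals_one = np.array([4, 8, 15])
totals_two = np.array([16, 23, 42])

# .shape returns a tuple, and == compares the tuples;
# `and` requires both comparisons to be True
if totals_one.shape == labels.shape and totals_two.shape == labels.shape:
    message = "Both arrays have the same shape as the labels array."
else:
    message = "At least one array has a different shape than the labels array."

print(message)
```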

Question 17: Plot Numpy Array (6 pts)

Imagine that you have been asked to write a short article for the public on the fire season (i.e. the range of time within a year in which fire is most likely to occur) in California.

Review the data in your summarized numpy arrays (i.e. sum of columns and sum of rows), and choose one of these arrays to represent the fire season in California.

For your chosen array, create a plot of your choice (i.e. type, color). Be sure to label your x- and y-axes appropriately and to give your plot an appropriate title.

In your Markdown documentation, write a few sentences (1-2) to answer each of the following:

What do the values in each of these numpy arrays (i.e. the one for sum of columns and the one for sum of rows) represent?

Why did you choose the array that you plotted to represent the fire season in California?

Question 18: Discuss Results (6 pts)

Write a few sentences (1-2) on each of the following:

Based on the data you have analyzed, how would you define the fire season (i.e. the range of time within a year in which fire is most likely to occur) in California?

How could you modify your workflow to examine whether the fire season was expanding over time? Think about how the data is organized and how you could split it up to look at how the fire season was changing over time.

Question 19: Discuss Pandas Dataframes vs Numpy Arrays (6 pts)

In the numpy array section, you calculated the sum across columns. Write a short paragraph (3-4 sentences and include a list if desired) on the following:

How could you have analyzed the pandas dataframe to get the same values? Outline a pandas dataframe workflow to arrive at the same values.

Hint: think about the data provided in the original numpy array - do you have similar information in the pandas dataframe?

Section 3 - Option A: Questions 20-24 Using Pandas Dataframes

To answer these questions, use the same pandas dataframe that you previously imported from CA_fires_1992_2015_gt_100_acres.csv.

Question 20: Calculate Number of Fires By County (4 pts)

Use the appropriate function to calculate the total number of fires per county and save as a new dataframe.

Note: the displayed data below only shows the first few rows in the dataset.

Hints:

Review the use of groupby to run statistics on pandas dataframes.

Think about what value you want to use to group the data and what value you want to use to determine the total number of fires.

fd_unq_id

county

Alameda

7

Alpine

5

Amador

3

Butte

35

Calaveras

13

Question 21: Determine Top 5 Counties for Number of Fires (4 pts)

Sort your pandas dataframe from the previous question, so that you can determine the top five counties that have experienced the most fires.

Note: the displayed data below only shows the first few rows in the sorted dataset.
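Sorting a dataframe and keeping the top few rows can be sketched like this. The dataframe and column names are hypothetical toy data; `ascending=False` places the largest values first.

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy_counts = pd.DataFrame({
    "group": ["w", "x", "y", "z"],
    "n_records": [3, 12, 7, 9],
})

# Sort from largest to smallest count, then keep the top two rows
sorted_counts = toy_counts.sort_values(by="n_records", ascending=False)
top_two = sorted_counts.head(2)
print(top_two)
```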

Section 3 - Option B: Questions 25-30 Using Numpy Arrays

Get Data

Use urllib.request to download the following .csv file of the mean size of fires in California by month and import the data to numpy arrays:

CA-fires-month-mean-size-1992-to-2015.csv from https://ndownloader.figshare.com/files/12835349

The dataset contains a row for each year specified in the dataset name and contains a column for each month (starting with January through December). The values represent the mean size of fires that occurred in that month and year, based on fires greater than 100 acres that occurred between 1992 and 2015.

Question 25: Write Function to Convert Units (4 pts)

Write a function to convert the units of fire size (acres) to square kilometers. One square kilometer is equal to 247.105 acres.

Question 26: Write Function to Calculate Mean Across Columns (4 pts)

Write a function that calculates the mean across columns of a numpy array.

Hints:

Recall which existing numpy function you can use to calculate a mean. You will include this function within the function you write to answer this question.

Review the lessons on functions to see the use of axes to calculate a statistic across the rows or columns of a numpy array.

Question 27: Write Function to Execute Multiple Tasks (4 pts)

Write a function that executes both of the functions you wrote in Questions 25 and 26, in the appropriate order: the conversion from acres to square kilometers and then, the calculation of the mean of the columns on an input numpy array.

Hint:

Review how to pass an implicit variable from one function to another (i.e. the output of the first function becomes the input of the second function).
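The pattern in the hint, where the output of one function becomes the input of the next, can be sketched as follows. All three function names and the toy array are hypothetical, and the example deliberately uses a different conversion and statistic than the assignment.

```python
import numpy as np


def feet_to_meters(lengths_ft):
    """Convert lengths from feet to meters (1 ft = 0.3048 m)."""
    return lengths_ft * 0.3048


def max_per_row(arr):
    """Return the maximum value of each row of a 2D numpy array."""
    return np.max(arr, axis=1)


def convert_then_max(lengths_ft):
    """Convert feet to meters, then return the maximum of each row."""
    # The output of the first function is the input of the second
    lengths_m = feet_to_meters(lengths_ft)
    return max_per_row(lengths_m)


# Hypothetical toy array: 2 rows x 2 columns, values in feet
toy_lengths = np.array([[10.0, 20.0], [30.0, 5.0]])
result = convert_then_max(toy_lengths)
print(result)
```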

Question 28: Execute Function and Save Output (2 pts)

Execute the function created in the previous question to determine the mean across columns on values that have been converted from acres to square kilometers.

Part II: Submit Your Jupyter Notebook to GitHub

Follow the instructions in the Guided Activity to Submit Pull Request to submit a pull request of your Jupyter Notebook for the Final Project to the Earth Lab repository for the Final Project (https://github.com/earthlab-education/ea-bootcamp-final-project-yourusername).

Include @jlpalomino in your message for the Pull Request to notify the instructor of your submission.