In this lesson series your overall goal is to compare tree height measurements taken by humans in the field to height values extracted from a lidar remote sensing canopy height model. You are working with two different types of data:

Raster - the lidar canopy height model and

Vector point location data

In the previous lesson, you learned how to extract raster values from an area derived by creating a buffer region around each point in a shapefile. In this lesson you will summarize the human-made measurements and then compare them to the lidar values.

Is Lidar Derived Tree Height the Same As Human Measured Tree Height?

You have now done the following:

You’ve opened and cleaned up some lidar canopy height model data

You’ve extracted height values for the field plot locations where humans measured trees.

Next, you need to summarize the in situ tree height data, which were measured within circular plots across the study area. You will then compare the maximum measured tree height value to the maximum lidar derived height value for each circular plot.

For this lesson, you will use a .csv (comma-separated values) file located at SJER/2013/insitu/veg_structure/D17_2013_SJER_vegStr.csv.

First, determine how many plots are in the tree height data. Note that the tree height data is stored in .csv format.

Before you go any further, you may want to select just the columns that you will need in your analysis. This will make your data a bit cleaner. In some cases you will not want to drop columns. However for this lesson, there is no reason to keep the extra data as you won’t use it in this analysis!
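The two steps above (counting plots and keeping only the needed columns) might look like the sketch below. It is a minimal example, not the lesson's original code: the inline CSV text stands in for D17_2013_SJER_vegStr.csv, the taxonid column is a hypothetical extra column, and all values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real .csv file; "taxonid" and all values
# are assumptions used only to make this sketch self-contained.
csv_text = """plotid,stemheight,taxonid
SJER1068,19.3,QUDO
SJER1068,2.1,QUDO
SJER112,23.9,PISA
SJER112,8.1,QUDO
SJER116,16.0,QUWI
"""

# With the real data you would pass the file path to pd.read_csv() instead
insitu_df = pd.read_csv(io.StringIO(csv_text))

# Count the unique plot ids to see how many plots are in the data
n_plots = insitu_df["plotid"].nunique()
print(n_plots)

# Keep only the columns used in this analysis
insitu_df = insitu_df[["plotid", "stemheight"]]
print(insitu_df.columns.tolist())
```

Selecting columns with a list of names returns a new data frame, so the original file on disk is not changed.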

Summarize Tree Height Data Using Pandas

You want to calculate a summary value of max tree height (the tallest tree measured) in each plot. You have a unique id for each plot - plotid that can be used to group the data. The tree height values themselves are located in the stemheight column.

You can calculate this by using the .groupby() method in pandas.

The steps are

.groupby() - group the data by the plotid column - your unique identifier for each plot.

.agg() - provide the summary statistics that you want to return for each plot, in this case max and mean.

Here, ['stemheight'] selects the column that you want to summarize.
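The steps above can be sketched as follows. This is a minimal example under stated assumptions: the small data frame is made up for illustration, and only the plotid and stemheight column names come from the lesson.

```python
import pandas as pd

# Made-up tree measurements; only the column names come from the lesson
insitu_df = pd.DataFrame({
    "plotid": ["SJER1068", "SJER1068", "SJER112", "SJER112"],
    "stemheight": [19.3, 2.1, 23.9, 8.1],
})

# Group by plot, compute mean and max, then select the stemheight results
insitu_stem_ht = insitu_df.groupby("plotid").agg(["mean", "max"])["stemheight"]
print(insitu_stem_ht)
```

The result is indexed by plotid and has one column per statistic (mean and max), which is why the index is reset later before merging.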

You are almost done summarizing your data. To keep the data expressive and the analysis reproducible, add the word insitu to each column header so it’s very clear which data columns are human measured. This is important because you will merge this data frame with the data frame containing the lidar mean, min and max values.

Notice that below you use a list comprehension, a pythonic alternative to a for loop. Rather than looping through each column and adding the prefix “insitu_” one name at a time, the list comprehension builds the full list of new names in a single expression. You then assign that list to the column names of the insitu_stem_ht dataframe.

['insitu_' + col for col in insitu_stem_ht.columns]

['insitu_plotid', 'insitu_insitu_mean', 'insitu_insitu_max']
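To make the comparison concrete, here is a small sketch showing that an explicit for loop and a list comprehension build the same list. The columns list is a stand-in for insitu_stem_ht.columns and is an assumption for illustration.

```python
# Stand-in for insitu_stem_ht.columns (assumed values for illustration)
columns = ["mean", "max"]

# Explicit for loop: build the list one name at a time
renamed_loop = []
for col in columns:
    renamed_loop.append("insitu_" + col)

# List comprehension: the same result in a single expression
renamed_comp = ["insitu_" + col for col in columns]

print(renamed_comp)
```

Note that running the rename on columns that already carry the prefix adds it a second time, so it is safest to run that step only once.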

Rename each column by adding the prefix “insitu_”.

# Add insitu to each column name to make your data more expressive
insitu_stem_ht.columns = ['insitu_' + col for col in insitu_stem_ht.columns]

# Reset the index (plotid)
insitu_stem_ht = insitu_stem_ht.reset_index()
insitu_stem_ht.head()

     plotid  insitu_mean  insitu_max
0  SJER1068     3.866667        19.3
1   SJER112     8.221429        23.9
2   SJER116     8.218750        16.0
3   SJER117     6.512500        11.0
4   SJER120     7.600000         8.8

Merge InSitu Data With Spatial DataFrame

Once you have your summarized insitu data, you can merge it into the centroids data frame. A merge requires two dataframes and the names of the columns containing the unique ID that you will merge the data on. In this case, you will merge the data on the plot id column. Notice that it’s spelled slightly differently in each dataframe, so you’ll need to tell Python what it’s called in each one.

Note that if you want to merge two GeoDataFrames together, you cannot use the standard Pandas merge function, because it will turn the GeoDataFrame into a regular DataFrame. Instead, you need to call the .merge() method on a GeoDataFrame object.
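A merge on differently named id columns can be sketched as below. To keep the example runnable with pandas alone, it uses plain DataFrames. With a GeoDataFrame on the left, the .merge() method takes the same left_on and right_on arguments. The Plot_ID and lidar_max names and all values are assumptions for illustration, not the lesson's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the centroids data with lidar summary values;
# "Plot_ID" and "lidar_max" are assumed names
lidar_df = pd.DataFrame({
    "Plot_ID": ["SJER1068", "SJER112", "SJER116"],
    "lidar_max": [19.0, 24.0, 13.9],
})

# Summarized insitu data, shaped like the table earlier in this lesson
insitu_stem_ht = pd.DataFrame({
    "plotid": ["SJER1068", "SJER112", "SJER116"],
    "insitu_max": [19.3, 23.9, 16.0],
})

# Call .merge() on the left object and name the id column on each side
merged_df = lidar_df.merge(insitu_stem_ht, left_on="Plot_ID", right_on="plotid")
print(merged_df)
```

By default the merge is an inner join, so a plot that is missing from either dataframe is dropped from the result.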

Column Names Matter

Take note that while you don’t have to rename the columns as you did above in order to successfully compute your final merged dataframe, it helps if you do because:

Now anyone looking at your data knows what each column represents.

If you export the data to a text file, your columns are named expressively.

If you return to this analysis in 6 months, you will still be able to quickly understand what data are in each column if they are well named!

Plot Data (CHM vs Measured)

You’ve now merged the two dataframes together. You are ready to create your first scatterplot of the data. You can use the pandas .plot() method to create a scatterplot (or you can use matplotlib directly). The example below uses pandas plotting.
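A scatterplot with a 1:1 line might be built as in the sketch below. It is a minimal example under stated assumptions: the data frame values are made up, and lidar_max is an assumed column name for the lidar derived maximum height.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# you can drop this line in a notebook
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Made-up merged data; "lidar_max" is an assumed column name
height_df = pd.DataFrame({
    "lidar_max": [19.0, 24.0, 13.9],
    "insitu_max": [19.3, 23.9, 16.0],
})

fig, ax = plt.subplots(figsize=(6, 6))

# Scatterplot using pandas plotting
height_df.plot(x="lidar_max", y="insitu_max", kind="scatter",
               color="purple", ax=ax)

# Add a 1:1 line from the origin to just past the largest height value
upper = float(height_df[["lidar_max", "insitu_max"]].max().max()) + 1
ax.plot([0, upper], [0, upper], ls="--", c="k")

ax.set(xlim=(0, upper), ylim=(0, upper),
       xlabel="Lidar derived max tree height (m)",
       ylabel="Measured max tree height (m)",
       title="Lidar vs measured tree height")
```

Points that fall on the dashed line have identical lidar and measured heights, so the distance from the line shows how much the two methods disagree.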

Scatterplot showing the relationship between lidar and measured tree height with a 1:1 line.

OPTIONAL - Export Results as a .csv File

You may want to export your final analysis file as a .csv file. You can use the Pandas .to_csv() method to export a dataframe. .to_csv requires a directory that exists on your computer and a file name like this:

# Export the final data frame as a csv file.
SJER_final_height_df.to_csv("data/spatial-vector-lidar/outputs/sjer-lidar-insitu-merge.csv")
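Because .to_csv() does not create missing directories, one option is to create the output directory first. The sketch below is an illustration under stated assumptions: it writes to a temporary directory rather than the lesson's data folder, and the data frame values are made up.

```python
import os
import tempfile

import pandas as pd

# Made-up final data frame for illustration
final_df = pd.DataFrame({
    "plotid": ["SJER1068", "SJER112"],
    "lidar_max": [19.0, 24.0],
    "insitu_max": [19.3, 23.9],
})

# Hypothetical output location inside a temporary directory
out_dir = os.path.join(tempfile.mkdtemp(), "outputs")

# Create the directory if it does not exist yet
os.makedirs(out_dir, exist_ok=True)

out_path = os.path.join(out_dir, "sjer-lidar-insitu-merge.csv")

# index=False leaves the row numbers out of the file
final_df.to_csv(out_path, index=False)

# Read the file back in to confirm the export worked
round_trip = pd.read_csv(out_path)
print(round_trip)
```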

Create Map of Plot Locations Sized by Tree Height

Finally, you may want to create a map where points are sized according to tree height. To do that, you:

Create or use a point geometry for each site location. In the case below you use the GeoDataFrame that had the buffered points, and plot the centroid of each geometry so that it is a point rather than the buffered polygon.

Then you set the point markersize using an attribute in your geodataframe. In the example below you use insitu_max.

fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(SJER_chm_data, cmap='Greys', extent=sjer_chm_plt)

# Plot centroids of each geometry as points so that you can control their size
SJER_final_height.centroid.plot(ax=ax, marker='o', markersize=SJER_final_height['insitu_max'] * 80, c='purple')
plt.show()

Map showing plot locations with points sized by the height of vegetation in each plot, overlaid on top of a canopy height model.

Optional - Create Difference Bar Plot: Lidar vs Measured

The last comparison that you may wish to explore is the plot by plot difference between lidar and measured tree height data. This is often helpful when you are trying to troubleshoot outlier values in your data. For instance you may notice that a few plots have very large differences between lidar and measured tree height.

You may decide to either:

Visit the sites if you are close to the field site or

Explore imagery for the sites to see if you can figure out a good reason for why the results may be so different.
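The plot by plot difference described above could be computed and plotted as in this sketch. It is a minimal example under stated assumptions: the values are made up, and the lidar_max and lidar_measured column names are assumed for illustration.

```python
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# you can drop this line in a notebook
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Made-up merged data; "lidar_max" is an assumed column name
height_df = pd.DataFrame({
    "plotid": ["SJER1068", "SJER112", "SJER116"],
    "lidar_max": [19.0, 24.0, 13.9],
    "insitu_max": [19.3, 23.9, 16.0],
})

# Difference between lidar and measured max height for each plot
height_df["lidar_measured"] = height_df["lidar_max"] - height_df["insitu_max"]

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(height_df["plotid"], height_df["lidar_measured"], color="purple")

# A horizontal line at zero makes over- and under-estimates easy to see
ax.axhline(0, color="k", lw=1)
ax.set(xlabel="Plot id",
       ylabel="Lidar minus measured max height (m)",
       title="Difference between lidar and measured tree height")
```

Bars far from zero point to the plots worth checking first, either in the field or in imagery.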