Workshop 2 – Calculating and visualizing USEITI data using QGIS

By Waseem Mardini on July 11, 2016

On June 7, Publish What You Pay – United States held the second training workshop on using extractives data and QGIS, an open source GIS mapping application. This time around, we delved deeper into visualizing data and explored how to calculate a new data point from imported datasets in QGIS.

The first training workshop, held on May 19, centered around cleaning and organizing data in preparation for use in data analysis and data visualization software. If you missed that training, we recommend that you watch the first workshop video recording and follow along in the training manual.

The training instructor, Diana Parker, walked participants through the steps of importing data and shapefiles, gave a refresher on the basic visualization options introduced in the first training session, and explained more advanced visualization techniques in QGIS.

Diana also demonstrated how QGIS can be used to compare datasets and calculate a new field from pre-existing data. For example, we determined the Federal revenue received per barrel of oil, by county, for a single year. We then visualized that data on a map of the US.

To determine the estimated revenue per barrel, participants were instructed to import the shapefile depicting US county boundaries as a vector layer in QGIS. Next, we imported the Federal oil and gas production by county and the Federal oil and gas revenue by county datasets into QGIS.

For the purposes of this training, these datasets have already been sorted and organized from the original USEITI datasets for use in QGIS and made available to training participants here. To learn how to prepare the Federal production and revenue USEITI datasets, refer to pages 1-4 of the workshop 2 training manual or watch the full walkthrough covered in the first workshop.

Both of these datasets include FIPS codes, which are unique identifiers for geographic areas throughout the US. For data to be placed on a map using software like QGIS, the data needs some form of geographic identifier. The FIPS codes in the datasets used in this training identify state and county locations. To associate the data with the correct locations in QGIS, the two datasets were joined to the US county boundaries shapefile by matching the FIPS field to the GEOID target field in the US county shapefile.
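The join itself is done through the QGIS interface, but the matching logic is easy to see in a few lines. The following is a minimal sketch, not the workshop's method: it assumes plain Python dictionaries standing in for the shapefile's attribute table and the production CSV, and the GEOIDs and barrel figures are made up. One thing that commonly breaks this kind of join is a FIPS code stored as a number, which drops its leading zero (8001 instead of 08001); the hypothetical `normalize_fips` helper pads codes back to the 5-character county form before matching.

```python
# Hypothetical stand-ins for the county shapefile's attribute table and the
# production CSV; the GEOIDs use the real 5-digit format but the values are made up.
county_features = [{"GEOID": "08001"}, {"GEOID": "56001"}, {"GEOID": "30001"}]
production_rows = [
    {"FIPS": "8001", "ProdOil13": 1200.0},  # leading zero lost, e.g. after a spreadsheet round-trip
    {"FIPS": "56001", "ProdOil13": 5400.0},
]


def normalize_fips(code):
    """Pad a county FIPS code to 5 characters (2-digit state + 3-digit county)."""
    return str(code).strip().zfill(5)


def left_join(targets, rows, target_key, join_key):
    """Copy fields from `rows` onto each target whose key matches.

    Every target is kept, as in a QGIS vector join; a target with no matching
    row gets None (QGIS shows NULL) for each joined field.
    """
    lookup = {normalize_fips(r[join_key]): r for r in rows}
    joined_fields = {k for r in rows for k in r if k != join_key}
    result = []
    for target in targets:
        match = lookup.get(normalize_fips(target[target_key]), {})
        result.append({**target, **{k: match.get(k) for k in joined_fields}})
    return result


for feature in left_join(county_features, production_rows, "GEOID", "FIPS"):
    print(feature)
```

Counties with no Federal production (here the hypothetical 30001) stay on the map with an empty value, which is why a few counties may show no color after the join.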
​
With the production and revenue datasets joined, Diana then instructed participants to duplicate the shapefile layer and rename the new layer “USD per BBL (2013).” With this new layer selected, participants used the QGIS Field Calculator to create a new attribute, “USDBBL13”, using the expression "Oil_bbl13" / "ProdOil13". The expression instructs QGIS to divide the Federal oil revenue by the oil production on Federal land for each county, yielding an estimated revenue per barrel of oil for each county for the year 2013. The “USDBBL13” attribute can then be selected as the value to be visualized in the new layer, as seen below.
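The calculation is a per-county division of revenue by production. The sketch below reproduces that arithmetic outside QGIS under stated assumptions: the county labels and figures are hypothetical, and returning `None` when production is zero or missing is a choice made here to avoid a division-by-zero error, not a description of how the workshop data handled such rows.

```python
from typing import Optional

# Hypothetical per-county figures for one year (made-up numbers):
# county label -> (Federal oil revenue in USD, Federal oil production in barrels)
counties_2013 = {
    "county A": (250_000.0, 10_000.0),
    "county B": (-4_000.0, 2_000.0),  # net negative revenue
    "county C": (50_000.0, 0.0),      # revenue recorded but no production
}


def usd_per_bbl(revenue: Optional[float], barrels: Optional[float]) -> Optional[float]:
    """Revenue per barrel, or None when production is missing or zero."""
    if revenue is None or barrels is None or barrels == 0:
        return None
    return revenue / barrels


for county, (revenue, barrels) in counties_2013.items():
    print(county, usd_per_bbl(revenue, barrels))
```

Negative revenue yields a negative ratio, and a very small barrel count in the denominator can produce extreme ratios, so both are worth checking against the source data before mapping.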

The visualization raises some interesting questions. The legend in the map above shows that the counties highlighted in red have a net negative revenue per barrel. At the other end of the spectrum, the data show revenue per barrel of up to $362,022. The way the data is presented above is a good example of why you should check the data and visualizations as you go, and of the importance of choosing the right data classification. Changing the data classification for the legend reveals that the negative and extremely high revenue-per-barrel values account for only three counties in the dataset, as can be seen in the map below (look for the lowest values in yellow and the highest in light blue). Obviously, serious questions about these revenue-per-barrel values remain. However, adjusting the legend to fit the underlying data more appropriately gives a more useful visualization, as seen in the map below. For more on how to choose the right data classification, refer to p. 12 of the workshop 2 training manual.
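The post does not say which classification methods were compared, so the following is only an illustration of why the choice matters. It contrasts equal-interval and quantile (equal count) breaks, two of the modes offered by QGIS's graduated symbology, on ten made-up revenue-per-barrel values; only the $362,022 maximum comes from the post. These are simplified implementations, and QGIS may place its quantile breaks slightly differently.

```python
# Hypothetical revenue-per-barrel values (USD) for ten counties: mostly modest,
# plus one negative value and one extreme outlier (the 362,022 maximum from the post).
revenue_per_bbl = [-500.0, 12.0, 18.0, 25.0, 31.0, 40.0, 47.0, 55.0, 63.0, 362_022.0]


def equal_interval_breaks(values, classes):
    """Upper bounds of `classes` bins of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    breaks = [lo + width * i for i in range(1, classes + 1)]
    breaks[-1] = hi  # guard against float rounding leaving the maximum unclassified
    return breaks


def quantile_breaks(values, classes):
    """Upper bounds of bins holding (roughly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[max(0, round(n * i / classes) - 1)] for i in range(1, classes + 1)]


def count_per_class(values, breaks):
    """Count values per bin; each value goes into the first bin whose upper bound it does not exceed."""
    counts = [0] * len(breaks)
    for v in values:
        for i, upper in enumerate(breaks):
            if v <= upper:
                counts[i] += 1
                break
    return counts


print("equal interval:", count_per_class(revenue_per_bbl, equal_interval_breaks(revenue_per_bbl, 5)))
print("quantile:      ", count_per_class(revenue_per_bbl, quantile_breaks(revenue_per_bbl, 5)))
```

With equal intervals, the single outlier stretches the range so that nine of the ten counties fall into the lowest class ([9, 0, 0, 0, 1]) and the map looks nearly uniform; quantile breaks spread the counties evenly ([2, 2, 2, 2, 2]) and isolate the extremes in the end classes.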

When working with several datasets and numerous layers of shapefiles, things can get complicated, and it is important to keep your work organized. As Diana guided participants through the training, she cited best practices to follow while working with QGIS.

Before starting a project, set aside a folder to hold all the relevant files, including the CSV files, shapefiles, and the QGIS project file.

Save your work often.

When working with multiple datasets in a single map, you will often duplicate layers, as Diana demonstrates in this training. It is important to rename duplicated layers to keep things organized.

After datasets are joined to shapefiles, the datasets themselves can be safely deleted from the layers panel window.

As additional layers with visualized data are added, they can be dragged up and down in the layers panel to change the stacking order of the visualizations on the map.

The visualization of each layer can be toggled on and off, allowing you to check that the data you want depicted in your final map is not obscured behind other layers.

This training also covered some additional ways to visualize data and produce exportable maps, as seen in the screenshot above. The map above depicts the revenue from oil & gas production on Federal lands for each county in Montana in one layer. In an additional layer, pie charts present the percentage of total revenue from different resource types or exploratory projects.
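Each pie chart slice is a category's share of the county's total. The arithmetic behind those slices can be sketched as follows, assuming a hypothetical county with made-up revenue figures and illustrative category names (not the workshop's actual fields). Pie slices only make sense for non-negative values with a positive total, so the sketch rejects anything else.

```python
# Hypothetical revenue (USD) for one county, split by resource type; made-up numbers.
revenue_by_type = {"Oil": 600_000.0, "Gas": 300_000.0, "Coal": 100_000.0}


def revenue_shares(revenue):
    """Percentage of total revenue contributed by each category."""
    if any(v < 0 for v in revenue.values()):
        raise ValueError("negative values cannot be drawn as pie slices")
    total = sum(revenue.values())
    if total <= 0:
        raise ValueError("pie-chart shares need a positive total")
    return {name: 100.0 * value / total for name, value in revenue.items()}


print(revenue_shares(revenue_by_type))  # {'Oil': 60.0, 'Gas': 30.0, 'Coal': 10.0}
```

This matters for this dataset in particular: counties with net negative revenue in some category, like those seen in the revenue-per-barrel map, would need separate handling before they could be shown as pie charts.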

You can watch the full recording of the second workshop here to learn more about visualizing data in QGIS. Follow this link to download the training manual and datasets. After viewing the course, please share your feedback by filling out our post-course survey. We have additional data skills training workshops planned for the future and your input can help in determining the topics we cover.

————

Waseem Mardini is the Policy Advisor at PWYP-US. Follow him on Twitter @pwypusa.