Mashup of Projects

Exploring Alberta Public Data

There is a really interesting data set available here, which contains all of the oil & gas production in Alberta. There are a few cool ideas for what to do with this data:

Join the volumetric flow to the pipelines, and create a map of pipeline flows

Join the volumetric flow to the wells, and create a map of the well flows

Find the price sensitivity/intercept for each producer and facility

Create network diagrams illustrating how the gas flows

Find the producers who react the most to market changes, and who have the highest producing wells

If you don't have time to read this, but just want to see the network diagrams, click here.
Looking through the page where the data is available, there are a few files to choose from: monthly production data going back to 2015, the well license file, and the well infrastructure file. I found that I had to join the files together to have enough information. Below is what I gathered from each file:

Well License File: Licence Number (used to join), and Licensee Name (who the production belongs to, as far as we know).

Production Data: Lots of information about the facilities that the wells link to. In order to join the last two tables to this, we will need to filter it for the product that we want (for example gas, or oil), and we only care about ActivityIDs of PROD. Once we apply those filters, we can join using the FromToID. If the ActivityID is PROD, the FromToID is a FromID.

If you want to follow along using R, then download all of those CSV files, and type the following:

library(dplyr)
library(ggplot2)
# Read the Well Infrastructure File for the Licensee, and WellID
# (the file and column names below are assumptions; match them to your downloads)
inf <- read.csv("WellInfra.csv")
lic <- read.csv("WellLicence.csv")
prod <- read.csv("Vol_2018-01.csv")
welldataprod <- prod %>% filter(ProductID == "GAS", ActivityID == "PROD") %>%
  inner_join(inf, by = c("FromToID" = "WellID")) %>% inner_join(lic, by = "LicenceNumber")
gasops <- welldataprod %>% group_by(LicenseeName) %>% summarise(volume = sum(Volume), number = n()) %>% ungroup()
gasops$avg <- gasops$volume / gasops$number
The linked facility ID is not always present in the well infrastructure file, so we will lose some volumes because of this.
Once I had joined all of those tables together, I could then join them to the gas price in the province and look at the price sensitivity for each producer and each facility. I would expect the volume produced to be a function of the square root of the price; in other words, volume squared should be proportional to the price. I tried price directly proportional to volume, and price proportional to volume squared, and found the lowest residual error with the latter (volume squared proportional to price).
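As a toy check of this model form (synthetic numbers, not the AER data), regressing price against volume squared recovers both the slope and the intercept:

```python
import numpy as np

# Synthetic illustration of the model form: assume price = b + m * volume^2,
# then recover m and b by regressing price against volume squared.
# The numbers here are made up for the demonstration.
rng = np.random.default_rng(0)
volume = rng.uniform(10, 100, size=50)
price = 1.5 + 0.002 * volume**2 + rng.normal(0, 0.1, size=50)

m, b = np.polyfit(volume**2, price, 1)
print(round(b, 2))  # intercept, roughly the 1.5 "break-even" used above
```

The intercept of this fit is what is later treated as the break-even price.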
To find out what it actually looks like, we will need to read in at least a year's worth of data, so I will just make a for loop to rbind the monthly data into one big dataframe.

# Stack a year of monthly production files into one dataframe
# (file names are assumptions; match them to the monthly CSVs you downloaded)
finalgas <- data.frame()
for (month in sprintf("%02d", 1:12)) {
  finalgas <- rbind(finalgas, read.csv(paste0("Vol_2018-", month, ".csv")))
}
Then loop through all of the companies, and create a linear regression (y = mx + b) for each price/volume relationship.

# Find the price sensitivity for each company (assumes finalgas already has
# the monthly gas price joined in; the column names are assumptions)
sens <- data.frame()
for (company in unique(finalgas$LicenseeName)) {
  model <- lm(Price ~ I(Volume^2), data = subset(finalgas, LicenseeName == company))
  sens <- rbind(sens, data.frame(LicenseeName = company, intercept = coef(model)[1], sensitivity = coef(model)[2]))
}
We can check that, overall, the model makes sense: summarise the finalgas dataframe by month, and plot the price vs. volume.
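If you would rather do that sanity check in Python, a pandas sketch of the monthly summary might look like this (the finalgas data and its column names are assumptions, inlined here for illustration):

```python
import pandas as pd

# Hypothetical finalgas rows; in practice this would be exported from R.
finalgas = pd.DataFrame({
    "Month": ["2018-01", "2018-01", "2018-02", "2018-02"],
    "Price": [1.8, 1.8, 2.1, 2.1],
    "Volume": [120.0, 80.0, 95.0, 60.0],
})

# One row per month: average price, total volume.
monthly = (finalgas.groupby("Month")
           .agg(Price=("Price", "mean"), Volume=("Volume", "sum"))
           .reset_index())
print(monthly)
# monthly.plot.scatter(x="Volume", y="Price")  # needs matplotlib
```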

Overall, that looks similar to what was expected. Now you can sort/filter/view the finalgas data frame to see at what price the facilities operate at their lowest flow. The most interesting of these facilities would be the gas storage facilities, as they would focus on the price, whereas other producers may pay no attention to it. You can see how reactive the facilities are to the price based on the sensitivity variable.
The product will often pass through several facilities, which could affect price sensitivity, so it would be useful to aggregate by the final facility. We can use a python script to chase from one facility to the next and come up with the final destination facility (which I will call lastFacilityID) for each row (well) in our table. To do that, let's first output a list of the facilities we are interested in.

write.csv(welldataprod[,4], "facilitylist.csv", row.names=FALSE)

We should also output all of the production with the ActivityID "REC", along with the FromToID column, which is a FromID when the ActivityID is REC.

# Export the received volumes (prod is the unfiltered monthly production
# dataframe; the column names here are assumptions)
received <- prod %>% filter(ActivityID == "REC")
write.csv(received[, c("ReportingFacilityID", "FromToID")], "received.csv", row.names = FALSE)
Now we can jump over to python, and write a small script to:

Create a dictionary recording where a facility sends its product; we will call this receivedDict

Run through the list of facilities we are interested in (first-level facilities, from the facilitylist.csv file). For each facility, keep going to the next facility as defined by receivedDict until the current facility is not in receivedDict, then add that current facility to the finalFacility dictionary.

Output a CSV of the last facilities, based on the final facility dictionary.
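The steps above can be sketched as follows, with the two CSVs inlined as strings for illustration (the column names, such as ReportingFacilityID, are assumptions; in practice read them from the files exported in R):

```python
import csv
import io

# Hypothetical stand-ins for received.csv and facilitylist.csv.
received_csv = """FromToID,ReportingFacilityID
FAC1,FAC2
FAC2,FAC3
FAC4,FAC3
"""
facility_csv = """FacilityID
FAC1
FAC4
FAC3
"""

# 1. receivedDict records where each facility sends its product.
#    FromToID is a FromID when ActivityID is REC, so the sender is FromToID
#    and the receiver is ReportingFacilityID.
receivedDict = {}
for row in csv.DictReader(io.StringIO(received_csv)):
    receivedDict[row["FromToID"]] = row["ReportingFacilityID"]

# 2. Chase each first-level facility until it no longer appears as a sender.
finalFacility = {}
for row in csv.DictReader(io.StringIO(facility_csv)):
    start = current = row["FacilityID"]
    seen = set()
    while current in receivedDict and current not in seen:
        seen.add(current)              # guard against cycles in the data
        current = receivedDict[current]
    finalFacility[start] = current

# 3. In the real script, write finalFacility out with csv.writer.
print(finalFacility)  # {'FAC1': 'FAC3', 'FAC4': 'FAC3', 'FAC3': 'FAC3'}
```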

This blog post is getting a bit long; now that you can reproduce the source of the data, I will share a few visualisation results.

I created a python script to add all the well flows onto the pipelines along the way between each well and its final facility. This of course took a very long time (around 40 hours). The only common variables between the pipeline shapefile and the well production are the surface locations, so I loaded every pipeline with its start/end surface locations, traced every connection, and only added the volumes to the pipelines which reach the final destination. Even doing this, there are still errors, because there are often multiple pipeline routes through the same surface locations. Until there are more fields to join by, I don't think this is accurate enough. Below is the result.
I also loaded the well production into PostGIS, then exported a shapefile and opened it in QGIS. There are lots of fields to join by, so there should be no errors in this.

The final thing is another python script that traced each facility through to its final facility and made a diagram for it, which you can view here, along with the Y-intercept from the linear regression model, which should be the facility's break-even price in $/GJ, assuming that production is elastic.