Barbara Shaurette

Using Google Cloud Functions to Create a Simple POST Endpoint

I was tempted to title this "how to use GCF to create a simple ETL process", but that's not quite what I'm demonstrating here.

It does loosely fit the description of an ETL process - the script extracts some values from a POSTed payload, rearranges some of the values to fit a specific schema, then loads the transformed payload into a data store.

But what you're going to see here is not the heavy lifting we normally think of when we see the acronym "ETL".

And maybe that's a good thing, as it illustrates the beautiful simplicity of Google's new Cloud Functions service.

Some background:

I work on a data infrastructure team that already has an account and a project set up on Google Cloud Platform. That project is already associated with a data store - a BigQuery project/dataset. I'm not going to cover how to set all that up since it's out of scope here, but you can start with these docs: https://cloud.google.com/docs/

I'm currently working on a project to accept real-time event data from a media platform we work with. We expect the data to come in at medium-to-high volume, but we're still in testing, so I don't have details yet on how well this job will handle the volume or how well it will scale - that will come later.

The project:

What I am going to talk about is this flow, with some general info on how to build the tools I needed to handle each step:

the vendor POSTs the event data payload to my HTTP endpoint

we receive, validate, and transform the payload data

we write that data to a BigQuery table
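
To make that middle step concrete, here's roughly the kind of reshaping involved - the field names below are made up for illustration, not the vendor's actual schema:

```python
# A hypothetical vendor payload, flattened and renamed to match our table schema.
incoming = {
    "id": "evt-123",
    "type": "video_play",
    "occurred_at": "2019-05-01T12:00:00Z",
    "meta": {"platform": "ios"},
}

row = {
    "event_id": incoming["id"],
    "event_type": incoming["type"],
    "event_timestamp": incoming["occurred_at"],
    "platform": incoming["meta"].get("platform"),
}
```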

The pieces I had to build to do this:

a Google Cloud Function

a BigQuery table
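
You can create the table from the BigQuery console or with the bq command line tool, but for reference, a sketch using the google-cloud-bigquery client library might look like this - the project, dataset, table, and field names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder schema matching the reshaped payload shown earlier.
schema = [
    bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_type", "STRING"),
    bigquery.SchemaField("event_timestamp", "TIMESTAMP"),
    bigquery.SchemaField("platform", "STRING"),
]

table = bigquery.Table("my-project.my_dataset.events", schema=schema)
client.create_table(table)
```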

gcloud:

Before we go much further - assuming that you already have a Google Cloud project, with a BigQuery dataset, and all the permissions set up to link the two - you will also need the gcloud command line tool. Go here and follow the steps to install:

gcloud is what you'll use to deploy your function to Google Cloud. Installation will update the PATH in your ~/.bash_profile to include the Google Cloud SDK. You may need to go through some authorization steps using the email address associated with your project. You may not need to add any gcloud components, although if you do, instructions are included in the installation output.

For the example here, you should probably have these components:

BigQuery Command Line Tool

Cloud SDK Core Libraries

Cloud Storage Command Line Tool

Setting up the script:

In a local folder, do some of the basic setup you normally would to start a Python project:

create a main.py - this will be your script

add a requirements.txt for any libraries you might need to install

use virtualenv to keep everything contained, particularly if you're going to test locally
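
For this example the requirements.txt can stay small - assuming the only third-party dependency is the BigQuery client library, it might contain just:

```
google-cloud-bigquery
```

(The Cloud Functions Python runtime provides Flask for HTTP functions, so you shouldn't need to list it yourself.)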

In your main.py, you are free to build your Python script in whatever way works for you. You can import any libraries you might need, and your script structure can be as simple or as complex as you need it to be.

The only key requirement is that you name a function that will be the entry point for your script - that name will be how your function is referenced in the GCP dashboard, and will be used to deploy the code to GCP.
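
Here's a stripped-down sketch of what that might look like for this project. The function name handle_event, the field names, and the table ID are all placeholders, and the streaming insert assumes a reasonably recent version of the google-cloud-bigquery library:

```python
# main.py - minimal sketch of an HTTP-triggered Cloud Function.
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.my_dataset.events"  # placeholder table ID


def handle_event(request):
    """Entry point: receive the vendor's POST, validate and reshape the
    payload, then stream one row into BigQuery."""
    payload = request.get_json(silent=True)
    if not payload or "id" not in payload:
        return ("Bad request", 400)

    # Rearrange the incoming fields to fit the table schema.
    row = {
        "event_id": payload["id"],
        "event_type": payload.get("type"),
        "event_timestamp": payload.get("occurred_at"),
        "platform": (payload.get("meta") or {}).get("platform"),
    }

    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        return ("Insert failed: {}".format(errors), 500)
    return ("OK", 200)
```

In this sketch, handle_event is the entry point name you'd reference when deploying and the name you'd see in the GCP dashboard.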

You should put your code in GitHub or whatever repository host you prefer, but be aware that GCP also stores the most recent version of the source code. In your project, navigate to the functions dashboard, e.g.: