Cost Effective Docker Jobs on Google Cloud

Recently, I wanted to run some jobs using Docker images. I’m a huge advocate of Docker, so naturally I was going to build a Docker image, run my Python scripts in it, and then schedule the job to run on a configurable schedule. Doing so on AWS is pretty easy using Lambda and Step Functions; however, since this wasn’t a paid gig and I couldn’t get someone to foot the bill, enter Google Cloud!

Google Cloud Platform (GCP) is, in a way, the new kid on the block. AWS has a long history with the cloud and excellent customer support, whereas Google’s customer service is a bit like Bigfoot: you’ve heard of it, some people say they’ve seen it, but it doesn’t really exist. However, Google is still an amazing tech company: they release early and they improve their products to make them awesome (e.g. Android). And best of all, they offer $300 of free credits. So I decided to go with Google; how bad could it be?
In this post, I’ll talk about how I set up Google Cloud to work for me. It took blood, sweat, and tears, but I got it working. I schedule a job to run periodically: I spin up a cluster of instances, run the job, and shut the cluster down! Not only is that cool (ya, I’m a geek), it’s also quite cost-effective.

I will outline what I did, and even share my code with you. Here goes:

Step 1 – Build docker image and push to Google Cloud private registry

The first step was the easiest and most trivial. It is pretty much the same as on AWS.

Create a build docker image

Let’s start with creating a build image. GitLab CI allows you to use your own image as your build machine. If you’re using a different CI, I leave it to you to adjust this for your own system.

FROM docker:latest

RUN apk add --no-cache python py2-pip curl bash

RUN curl -sSL https://sdk.cloud.google.com | bash

ENV PATH $PATH:~/google-cloud-sdk/bin

RUN pip install docker-compose

This is a Dockerfile for the build machine. It starts from the docker image, installs pip, and installs gcloud.

Then I push this build image to Docker Hub. If you haven’t done this before, sign up to Docker Hub at https://hub.docker.com and remember your username.

Create a GCP service account

You have to create a service account, give it access to the registry, then export the key file as JSON. This is a very simple step. If you’re unsure how to do it, just click through IAM &amp; Admin in the console: create a service account, grant it a role, and export the key.
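If you prefer the CLI, the console steps above can also be sketched with gcloud. This is a sketch, not the post’s exact setup: my-project and ci-pusher are placeholder names, and the storage role reflects that the private registry stores image layers in a Cloud Storage bucket.

```shell
# Create the service account (names are placeholders)
gcloud iam service-accounts create ci-pusher --display-name "CI registry pusher"

# Let it write to the Cloud Storage bucket that backs the registry
gcloud projects add-iam-policy-binding my-project \
  --member serviceAccount:ci-pusher@my-project.iam.gserviceaccount.com \
  --role roles/storage.admin

# Export the JSON key; its contents go into the CLOUDSDK_JSON CI variable
gcloud iam service-accounts keys create key.json \
  --iam-account ci-pusher@my-project.iam.gserviceaccount.com
```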

Customize CI script to push to the private registry

Once this is all done and you have your build machine, we can work on your CI script. I will show you how to do this on GitLab CI, but you can adapt it for your own environment. First, create a build environment variable called CLOUDSDK_JSON and paste the contents of the JSON key you created in the previous step as its value. Then add the following .gitlab-ci.yaml file to your project.

image: &lt;username&gt;/build-machine

services:
  - docker:dind

stages:
  - build
  - test
  - deploy

before_script:
  - apk add --no-cache python py2-pip
  - pip install --no-cache-dir docker-compose
  - docker version
  - docker-compose version
  - gcloud version

build_image:
  stage: build
  except:
    - develop
    - master
  script:
    - docker build -t &lt;job-image-name&gt;:latest .

deploy:
  stage: deploy
  only:
    - develop
    - master
  script:
    - docker build -t &lt;job-image-name&gt;:latest .
    - echo $CLOUDSDK_JSON &gt; key.json
    - gcloud auth activate-service-account &lt;service_account_name&gt; --key-file=key.json
    - docker tag &lt;job-image-name&gt;:latest $PRIVATE_REGISTERY/&lt;job-image-name&gt;:latest
    - gcloud docker -- push $PRIVATE_REGISTERY/&lt;job-image-name&gt;:latest
    - gcloud auth revoke

Adjust job-image-name to your job’s docker image name, service_account_name to the service account you created, and the build image to the image you pushed to Docker Hub. This YAML file is geared toward a Python job, but you can change it for any other language.
I have 3 stages: build, test, and deploy.
I build and test on all branches, but only deploy on develop and master. GitLab CI has a quirk: each stage can run on a different machine, so the image from my build stage isn’t available in the deploy stage, which forced me to rebuild it there.

Once this is done, your CI system should be pushing your image to your Google private registry. Well done!

Step 2 – Running Jobs in a Temp Cluster

Here comes the tricky part. Since jobs only need to run every so often, and only for a limited period, they would ideally run as a Google Cloud Function. However, those are limited to one hour and can only be written in JavaScript (AWS supports multiple languages with Lambda, and has state machines via Step Functions). Since I didn’t want to pay for a cluster running full-time, I had to develop my own way to run jobs.

Kubernetes Services

Controlling jobs in a cluster, and the cluster itself, can be achieved using Kubernetes. This is one part of GCP that really shines: it lets you define services, jobs, and pods (collections of containers), and then run them.

To do this, I wrote a Kubernetes Service class in Python that will:
– Spin up / create a cluster.
– Launch docker containers on the cluster.
– Once jobs finish, shut down the cluster.

import kubernetes.client
from googleapiclient.discovery import build

class KubernetesService():

    def __init__(self, namespace='default'):
        self.api_instance = kubernetes.client.BatchV1Api()
        service = build('container', 'v1')
        self.nodes = service.projects().zones().clusters().nodePools()
        self.namespace = namespace

This is the class and its constructor. The full code for this class has more configuration and env variables, as it is part of the App Engine cron project. The repo linked at the end has the full details on how to achieve this.

def setClusterSize(self, newSize):
    logging.info("resizing cluster {} to {}".format(CLUSTER_ID, newSize))
    self.nodes.setSize(projectId=PROJECT_ID, zone=ZONE,
                       clusterId=CLUSTER_ID, nodePoolId=NODE_POOL_ID,
                       body={"nodeCount": newSize}).execute()

This function controls the cluster size. It can spin the cluster up before jobs need to run, then shut it down afterwards.

The kubernetes_job function creates containers (via an additional helper that builds container objects with env variables). Containers are then part of a pod, the pod is part of a job template, and that template is part of a job spec. You can read more about this in the Kubernetes docs.
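As a rough illustration of that nesting (a sketch, not the post’s actual code: make_container, make_job, and the image and env values are made up), a Job body can be assembled as plain data:

```python
def make_container(name, image, env):
    """Build a container spec with env variables as name/value pairs."""
    return {
        "name": name,
        "image": image,
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }

def make_job(job_name, container, namespace="default"):
    """Wrap the container in a pod template, then in a Job object."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name, "namespace": namespace},
        "spec": {
            "template": {
                "metadata": {"name": job_name},
                "spec": {
                    "containers": [container],
                    "restartPolicy": "Never",  # batch jobs should not restart forever
                },
            },
        },
    }

container = make_container("worker", "gcr.io/my-project/job-image:latest",
                           {"MODE": "update"})
job = make_job("run-updates", container)
```

The same structure can of course be built with the kubernetes client’s typed objects (V1Container, V1PodTemplateSpec, V1Job) instead of raw dicts.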

If you don’t want the code to block waiting for the jobs, you can poll for completion instead; that is what shutdown_cluster_on_jobs_complete is for. It shuts down the cluster once there are no running jobs.
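The polling idea can be sketched like this. The names shutdown_when_idle, count_running_jobs, and shutdown are illustrative, not from the repo; in the real service the count would come from the Kubernetes batch API and the shutdown would resize the node pool to zero.

```python
import time

def shutdown_when_idle(count_running_jobs, shutdown,
                       poll_interval=0.0, max_polls=100):
    """Poll until no jobs are running, then call shutdown().

    Returns the number of polls performed, or None if max_polls was hit.
    The two callables are injected so the loop stays independent of the
    Kubernetes and GCP clients.
    """
    for polls in range(1, max_polls + 1):
        if count_running_jobs() == 0:
            shutdown()
            return polls
        time.sleep(poll_interval)
    return None
```

In production you would use a real poll_interval (say, 60 seconds) and let the cron endpoint described below trigger the check instead of sleeping in-process.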

This class controls the entire job scheduling flow and ensures the jobs run to completion.
It’s part of an App Engine app (though it can be used independently).
Next, we need this script to be scheduled or triggered, and that is the job of our cron scheduler task.

Cron scheduler appengine service

Sadly, Google doesn’t give you an easy way to run code in the cloud on a schedule; you actually have to write more code to run code (silly, right?).

The concept is that App Engine provides you with a cron web scheduler that calls your own app’s endpoints at given intervals.

First, you add cron.yaml to your project and configure which endpoints to hit and how often:

cron:
- description: task to kick off all updates
  url: /events/run-jobs
  schedule: every 2 hours
- description: task to shutdown jobs when finished
  url: /events/shutdown-jobs
  schedule: every 5 minutes

Then we can add a handler to kick off the jobs, and another to shut them down.

Last, we want to add a Settings class to load env-like variables from the datastore:

import os

from google.appengine.ext import ndb

if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine/'):
    PROD = True
else:
    PROD = False

class Settings(ndb.Model):
    name = ndb.StringProperty()
    value = ndb.StringProperty()

    @staticmethod
    def get(name):
        NOT_SET_VALUE = "NOT SET"
        retval = Settings.query(Settings.name == name).get()
        if not retval:
            retval = Settings()
            retval.name = name
            retval.value = NOT_SET_VALUE
            retval.put()
        if retval.value == NOT_SET_VALUE:
            raise Exception(('Setting %s not found in the database. A placeholder '
                             'record has been created. Go to the Developers Console for your app '
                             'in App Engine, look up the Settings record with name=%s and enter '
                             'its value in that record\'s value field.') % (name, name))
        return retval.value

Note that most of the app depends on the datastore. Sadly, Google doesn’t let you set env variables easily, but you can store them in the datastore, and that is what the Settings class is for.

Then we just bind the route handler:

import webapp2

app = webapp2.WSGIApplication([('/events/run-jobs', RunJobsHandler)],
                              debug=True)

This should allow our app to spin up a cluster, launch containers, and then shut down the cluster. In my code, I also added a handler for the shutdown.

Then make sure you have gcloud installed, deploy the app with the gcloud app deploy command, and you should be good to go.
While my example runs the same docker image and just performs different operations via different env variables, you can easily adjust this code to suit whatever need you might have.
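The “same image, different env variables” pattern can be sketched as a small dispatcher inside the job script. JOB_MODE and the operation names here are assumptions for illustration, not taken from the repo:

```python
import os

# Map an operation name to the function that implements it.
OPERATIONS = {
    "update": lambda: "ran updates",
    "cleanup": lambda: "ran cleanup",
}

def run_from_env(environ=None):
    """Pick the operation from the JOB_MODE env variable and run it."""
    environ = os.environ if environ is None else environ
    mode = environ.get("JOB_MODE", "update")
    if mode not in OPERATIONS:
        raise ValueError("unknown JOB_MODE: %s" % mode)
    return OPERATIONS[mode]()
```

Each Kubernetes job then only differs in the env section of its container spec, while the image stays identical.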
Here is the full git repo: gcp-optimized-jobs
Hope you find it useful!