Deploy InfluxDB and Grafana on Kubernetes to collect Twitter stats

Monitor your Twitter stats with a Python script, InfluxDB, and Grafana running in Kubernetes or OKD.


Kubernetes is the de facto leader in container orchestration, and it is an amazingly configurable and powerful tool. As with many powerful tools, it can be somewhat confusing at first. This walk-through covers the basics of creating multiple pods, configuring them with secret credentials and configuration files, and exposing services to the world. You'll build an InfluxDB and Grafana deployment and a Kubernetes cron job that gathers statistics about your Twitter account from the Twitter developer API, all deployed on Kubernetes or OKD (formerly OpenShift Origin).

What you'll learn

This walkthrough will introduce you to a variety of Kubernetes concepts. You'll learn about Kubernetes cron jobs, ConfigMaps, Secrets, Deployments, Services, and Ingress.

If you choose to dive in further, the included files can serve as an introduction to Tweepy, an "easy-to-use Python module for accessing the Twitter API," InfluxDB configuration, and automated Grafana dashboard providers.

Architecture

This app consists of a Python script that polls the Twitter developer API on a schedule for stats about your Twitter account and stores them in InfluxDB as time-series data. Grafana displays the data in human-friendly formats (counts and graphs) on customizable dashboards.

Get a Twitter developer API account

Clone the TwitterGraph repo

The TwitterGraph GitHub repo contains all the files needed for this project, as well as a few to make life easier if you want to do it all over again.
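A sketch of that step (the repo URL is assumed from the project name; adjust it if yours differs):

```shell
# Clone the project files and change into the directory
# (URL assumed from the repo name)
git clone https://github.com/clcollins/twittergraph.git
cd twittergraph
```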

Set up InfluxDB

InfluxDB is an open source data store designed specifically for time-series data. Since this project will poll Twitter on a schedule using a Kubernetes cron job, InfluxDB is perfect for holding the data.
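The deployment itself can be created with a single command. This is a sketch, not necessarily the article's original command (older kubectl versions used kubectl run to create deployments; current versions use kubectl create deployment):

```shell
# Create a deployment running InfluxDB from the image referenced below
kubectl create deployment influxdb --image=docker.io/influxdb:1.6.4
```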

Configure InfluxDB credentials using secrets

Currently, Kubernetes is running an InfluxDB container with the default configuration from the docker.io/influxdb:1.6.4 image, but that is not necessarily very helpful for a database server. The database needs to be configured to use a specific set of credentials and to store the database data between restarts.

Kubernetes secrets are a way to store sensitive information (such as passwords) and inject it into running containers as either environment variables or mounted volumes. This is perfect for storing database credentials and connection information, both to configure InfluxDB and to tell Grafana and the Python cron job how to connect to it.

You need four bits of information to accomplish both tasks:

INFLUXDB_DATABASE—the name of the database to use

INFLUXDB_HOST—the hostname where the database server is running

INFLUXDB_USERNAME—the username to log in with

INFLUXDB_PASSWORD—the password to log in with

Create a secret using the kubectl create secret command and some basic credentials:
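A sketch of that command, covering the four keys listed above (the values here are placeholders; substitute your own):

```shell
# "generic" secrets store arbitrary key/value pairs
kubectl create secret generic influxdb-creds \
  --from-literal=INFLUXDB_DATABASE=twittergraph \
  --from-literal=INFLUXDB_HOST=influxdb \
  --from-literal=INFLUXDB_USERNAME=influxdb \
  --from-literal=INFLUXDB_PASSWORD=supersecretinfluxdb
```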

This command creates a "generic-type" secret (as opposed to "tls-" or "docker-registry-type" secrets) named influxdb-creds populated with some default credentials. Secrets use key/value pairs to store data, and this is perfect for use as environment variables within a container.

As with the examples above, the secret can be seen with the kubectl get secret command:
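For example:

```shell
# List the secret; the output shows its type (Opaque) and the
# number of keys it holds, but not the values themselves
kubectl get secret influxdb-creds
```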

Now that the secret has been created, it can be shared with the InfluxDB pod running the database as an environment variable.

To share the secret with the InfluxDB pod, it needs to be referenced as an environment variable in the deployment created earlier. The existing deployment can be edited with the kubectl edit deployment command, which will open the deployment object in your system's default editor. When the file is saved, Kubernetes applies the changes to the deployment.

To add environment variables for each of the secrets, the pod spec contained in the deployment needs to be modified. Specifically, the .spec.template.spec.containers array needs to be modified to include an envFrom section.

Using the command kubectl edit deployment influxdb, find that section in the deployment (this example is truncated):

This section describes a very basic InfluxDB container. Secrets can be added to the container with an env array for each key/value to be mapped in. Alternatively, envFrom can be used to map all the key/value pairs into the container, using the key names as the variables.

For the values in the influxdb-creds secret, the container spec would look like this:
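A sketch of the relevant portion of the pod spec, with all four keys mapped in via envFrom:

```yaml
spec:
  template:
    spec:
      containers:
      - name: influxdb
        image: docker.io/influxdb:1.6.4
        # Map every key in the secret into the container
        # as an environment variable
        envFrom:
        - secretRef:
            name: influxdb-creds
```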

After editing the deployment, Kubernetes will destroy the running pod and create a new one with the mapped environment variables. Remember, the deployment describes the desired state, so Kubernetes replaces the old pod with a new one matching that state.

You can validate that the environment variables are included in your deployment with kubectl describe deployment influxdb:

Environment Variables from:
influxdb-creds Secret Optional: false

Configure persistent storage for InfluxDB

A database is not very useful if all of its data is destroyed each time the service is restarted. In the current InfluxDB deployment, all of the data is stored in the container and lost when Kubernetes destroys and recreates pods. A PersistentVolume is needed to store data permanently.

To get persistent storage in a Kubernetes cluster, a PersistentVolumeClaim (PVC) is created that describes the type and details of the volume needed, and Kubernetes will find a previously created volume that fits the request (or create one with a dynamic volume provisioner, if there is one).

Unfortunately, the kubectl CLI tool does not have the ability to create PVCs directly, but a PVC can be specified as a YAML file and created with kubectl create -f <filename>:
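A minimal claim manifest might look like this (the storage size and access mode here are assumptions; adjust them to your needs):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```

Save it as, say, pvc.yml, create it with kubectl create -f pvc.yml, and check its status with kubectl get pvc.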

From the output above, you can see the PVC influxdb was matched to a PV (or Volume) named pvc-27c7b0a7-1828-11e9-831a-0800277ca5a7 (your name will vary) and bound (STATUS: Bound).

If your PVC does not have a volume, or the status is something other than Bound, you may need to talk to your cluster administrator. (This process should work fine with MiniKube, MiniShift, or any cluster with dynamically provisioned volumes.)

Once a PersistentVolume has been assigned to the PVC, the volume can be mounted into the container to provide persistent storage. Once again, this entails editing the deployment, first to add a volume object and second to reference that volume within the container spec as a volumeMount.
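A sketch of both pieces together (the InfluxDB 1.x image stores its data in /var/lib/influxdb):

```yaml
spec:
  template:
    spec:
      # A volume backed by the PVC created above
      volumes:
      - name: var-lib-influxdb
        persistentVolumeClaim:
          claimName: influxdb
      containers:
      - name: influxdb
        # Mount the volume over InfluxDB's data directory
        volumeMounts:
        - name: var-lib-influxdb
          mountPath: /var/lib/influxdb
```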

Expose InfluxDB (to the cluster only) with a Service

By default, pods in this project are unable to talk to one another. A Kubernetes Service is required to "expose" the pod to the cluster or to the public. In the case of InfluxDB, the pod needs to be able to accept traffic on TCP port 8086 from the Grafana and cron job pods (which will be created later). To do this, expose (i.e., create a service for) the pod using a Cluster IP. Cluster IPs are available only to other pods in the cluster. Do this with the kubectl expose command:
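A sketch of the expose command and a follow-up check of what it created:

```shell
# Create a ClusterIP service for the InfluxDB deployment on TCP port 8086
kubectl expose deployment influxdb --port=8086 --target-port=8086 \
  --protocol=TCP --type=ClusterIP

# Inspect the service's cluster IP and endpoints
kubectl describe service influxdb
```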

Some of the details (specifically the IP addresses) will vary from the example. The "IP" is an address internal to your cluster, assigned to the service, through which other pods can communicate with InfluxDB. The "Endpoints" are the container's IP and the port it is listening on for connections. The service routes traffic sent to the internal cluster IP to the container itself.

Set up Grafana credentials and config files with secrets and ConfigMaps

Building on what you've already learned, configuring Grafana should be both similar and easier. Grafana doesn't require persistent storage, since it reads its data out of the InfluxDB database. It does, however, need a few configuration files: one to set up a dashboard provider that loads dashboards dynamically from files, the dashboard file itself, a file connecting the dashboard to InfluxDB as a data source, and finally a secret to store default login credentials.
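If a Grafana deployment isn't running yet, one can be created the same way as InfluxDB's (the image tag here is an assumption; pin a specific version if you prefer):

```shell
kubectl create deployment grafana --image=docker.io/grafana/grafana:latest
```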

The credentials secret works the same as the influxdb-creds secret already created. By default, the Grafana image looks for environment variables named GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD to set the admin username and password on startup. These can be whatever you like, but remember them so you can use them to log into Grafana when you have it configured.

Create a secret named grafana-creds for the Grafana credentials with the kubectl create secret command:
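A sketch of that command (the values are examples only; pick your own admin username and password):

```shell
kubectl create secret generic grafana-creds \
  --from-literal=GF_SECURITY_ADMIN_USER=admin \
  --from-literal=GF_SECURITY_ADMIN_PASSWORD=graphsRcool
```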

Share this secret as an environment variable using envFrom, this time in the Grafana deployment. Edit the deployment with kubectl edit deployment grafana and add the environment variables to the container spec:
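As with InfluxDB, an envFrom section referencing the secret does the mapping:

```yaml
spec:
  template:
    spec:
      containers:
      - name: grafana
        envFrom:
        - secretRef:
            name: grafana-creds
```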

Validate that the environment variables have been added to the deployment with kubectl describe deployment grafana:

Environment Variables from:
grafana-creds Secret Optional: false

That's all that's required to start using Grafana. The rest of the configuration can be done in the web interface if desired, but with just a few config files, Grafana can be fully configured when it starts.

Kubernetes ConfigMaps are similar to secrets and can be consumed the same way by a pod, but they don't store the information obfuscated within Kubernetes. Config maps are useful for adding configuration files or variables into the containers in a pod.

The Grafana instance in this project has three config files that need to be written into the running container:

influxdb-datasource.yml—tells Grafana how to talk to the InfluxDB database

grafana-dashboard-provider.yml—tells Grafana where to look for JSON files describing dashboards

twittergraph-dashboard.json—describes the dashboard for displaying the Twitter data collected

Kubernetes makes it easy to add these files: they can all be added to the same config map at once, and they can be mounted to different locations on the filesystem despite being in the same config map.

If you have not done so already, clone the TwitterGraph GitHub repo. These files are really specific to this project, so the easiest way to consume them is directly from the repo (although they could certainly be written manually).

From the directory with the contents of the repo, create a config map named grafana-config using the kubectl create configmap command:
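A sketch of that command, using each filename as its own key:

```shell
kubectl create configmap grafana-config \
  --from-file=influxdb-datasource.yml=influxdb-datasource.yml \
  --from-file=grafana-dashboard-provider.yml=grafana-dashboard-provider.yml \
  --from-file=twittergraph-dashboard.json=twittergraph-dashboard.json
```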

The kubectl create configmap command creates a config map named grafana-config and stores the contents as the value for the key specified. The --from-file argument follows the form --from-file=<keyname>=<pathToFile>, so in this case, the filename is being used as the key for future clarity.

Like secrets, details of a config map can be seen with kubectl describe configmap. Unlike secrets, the contents of the config map are visible in the output. Use kubectl describe configmap grafana-config to see the three files stored as keys in the config map (results are truncated because they're looooooong):

Each of the filenames should be stored as keys and their contents as the values (such as the grafana-dashboard-provider.yml above).

While config maps can be shared as environment variables (as the credential secrets were above), the contents of this config map need to be mounted into the container as files. To do this, a volume can be created from the config map in the Grafana deployment. Similar to the persistent volume, use kubectl edit deployment grafana to add a volume to .spec.template.spec.volumes:
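A sketch of the volume definition:

```yaml
volumes:
- name: grafana-config
  configMap:
    name: grafana-config
```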

Then edit the container spec to mount each of the keys stored in the config map as files in their respective locations in the Grafana container. Under .spec.template.spec.containers, add a volumeMounts section for the volumes:
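A sketch of those mounts. The two provisioning paths are Grafana's standard datasource and dashboard-provider directories; the dashboard JSON's destination depends on the path set in the dashboard provider file, assumed here to be /var/lib/grafana/dashboards:

```yaml
volumeMounts:
- name: grafana-config
  mountPath: /etc/grafana/provisioning/datasources/influxdb-datasource.yml
  subPath: influxdb-datasource.yml
- name: grafana-config
  mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard-provider.yml
  subPath: grafana-dashboard-provider.yml
- name: grafana-config
  # Assumed to match the path configured in the dashboard provider
  mountPath: /var/lib/grafana/dashboards/twittergraph-dashboard.json
  subPath: twittergraph-dashboard.json
```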

The name field references the name of the config map volume, and adding the subPath items allows Kubernetes to mount each file without overwriting the rest of the contents of that directory. Without it, /etc/grafana/provisioning/datasources/influxdb-datasource.yml, for example, would be the only file in /etc/grafana/provisioning/datasources.

Each of the files can be verified by looking at them within the running container using the kubectl exec command. First, find the Grafana pod's current name. The pod will have a randomized name similar to grafana-586775fcc4-s7r2z and should be visible when running the command kubectl get pods:
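A sketch of the verification (substitute your pod's actual name):

```shell
# Find the Grafana pod's randomized name
kubectl get pods

# Print one of the mounted files from inside the running container
kubectl exec -it grafana-586775fcc4-s7r2z -- \
  cat /etc/grafana/provisioning/datasources/influxdb-datasource.yml
```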

# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: influxdb

Expose the Grafana service

Now that it's configured, expose the Grafana service so it can be viewed in a browser. Because Grafana should be visible from outside the cluster, the LoadBalancer service type will be used rather than the internal-only ClusterIP type.

For production clusters or cloud environments that support LoadBalancer services, an external IP is dynamically provisioned when the service is created. For MiniKube or MiniShift, LoadBalancer services are available via the minikube service command, which opens your default browser to a URL and port where the service is available on your host VM.

The Grafana deployment is listening on port 3000 for HTTP traffic. Expose it using the LoadBalancer-type service using the kubectl expose command:
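A sketch of that command:

```shell
# Create a LoadBalancer service for Grafana on TCP port 3000
kubectl expose deployment grafana --port=3000 --target-port=3000 \
  --protocol=TCP --type=LoadBalancer
```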

As mentioned above, MiniKube and MiniShift deployments will not automatically assign an EXTERNAL-IP and will be listed as <pending>. Running minikube service grafana (or minikube service grafana --namespace <namespace> if you created your deployments in a namespace other than Default) will open your default browser to the IP and port combo where Grafana is exposed on your host VM.

At this point, Grafana is configured to talk to InfluxDB and has automatically provisioned a dashboard to display the Twitter stats. Now it's time to get some actual stats and put them into the database.

Create the cron job

A Kubernetes cron job, like its namesake cron, is a way to run a job on a particular schedule. In the case of Kubernetes, the job is a task running in a container: a Kubernetes job scheduled and tracked by Kubernetes to ensure its completion.

Create a secret for the Twitter API credentials

The cron job uses your Twitter API credentials to connect to the API and pull the stats from environment variables inside the container. Create a secret to store the Twitter API credentials and the name of the account to gather the stats from (substitute your own credentials and account name below):
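A sketch of that command. The variable names here are assumptions; check the repo's app.py for the exact names the script reads:

```shell
# Substitute your own credentials and account name
kubectl create secret generic twitter-creds \
  --from-literal=TWITTER_API_KEY=<your-api-key> \
  --from-literal=TWITTER_API_SECRET=<your-api-secret> \
  --from-literal=TWITTER_ACCESS_TOKEN=<your-access-token> \
  --from-literal=TWITTER_ACCESS_SECRET=<your-access-secret> \
  --from-literal=TWITTER_USER=<your-twitter-username>
```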

Create a cron job

Finally, it's time to create the cron job to gather statistics. Unfortunately, kubectl doesn't have a way to create a cron job directly, so once again the object must be described in a YAML file and loaded with kubectl create -f <filename>.
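A sketch of such a file, matching the pieces described below. The image name is an assumption (prefer the cronjob.yml shipped in the repo), and batch/v1beta1 was the CronJob API version current when this article was written; newer clusters use batch/v1:

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: twittergraph
spec:
  # Run every three minutes
  schedule: "*/3 * * * *"
  # Replace a still-running job rather than running two at once
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: twittergraph
            # Image name assumed; the repo's cronjob.yml has the real one
            image: docker.io/clcollins/twittergraph:latest
            # Share both secrets as environment variables
            envFrom:
            - secretRef:
                name: twitter-creds
            - secretRef:
                name: influxdb-creds
```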

Looking over this file, the key pieces of a Kubernetes cron job are evident. The cron job spec contains a jobTemplate describing the Kubernetes job to run. In this case, the job consists of a single container with the Twitter and InfluxDB credentials' secrets shared as environment variables using the envFrom that was used in the deployments.

Wrapping the job template spec are the cron job spec options. The most important part, outside of the job itself, is arguably the schedule, set here to run every three minutes, forever. The other important bit is the concurrencyPolicy, which is set to Replace, so if the previous job is still running when it's time to start a new one, the pod running the old job is destroyed and replaced with a new pod.

Use the kubectl create -f cronjob.yml command to create the cron job:

kubectl create -f cronjob.yml
cronjob.batch/twittergraph created

The cron job can then be validated with kubectl describe cronjob twittergraph (example truncated for brevity):

Note: With the schedule set to */3 * * * *, Kubernetes won't start the first job immediately; it will wait three minutes for the first period to pass. If you'd like to see immediate results, you can edit the cron job with kubectl edit cronjob twittergraph and (temporarily) change the schedule to * * * * * to run every minute. Just don't forget to change it back when you're done.

Success!

That should be it. If you've followed all the steps correctly, you will have an InfluxDB database, a cron job collecting stats from your Twitter account, and a Grafana deployment to view the data. For production clusters or cloud deployments of Kubernetes or OpenShift, visit the LoadBalancer IP to log into Grafana using the credentials you set earlier with the GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD. After logging in, select the TwitterGraph dashboard from the Home dropdown at the top-left of the screen. You should see something like the image below, with current counts for your followers, folks you are following, status updates, likes, and lists. It's probably a bit boring at first, but if you leave it running, over time and with more data collection, the graphs will start to look more interesting and provide more useful data!

Where to go from here

The data collected by the TwitterGraph script is relatively simplistic. The stats that are collected are described in the data_points dictionary in the app.py script, but there's a ton of data available. Adding a new cron job that runs daily to collect the day's activity (number of posts, follows, etc.) would be a natural extension of the data. More interesting, probably, would be correlating the daily data collection, e.g., how many followers were gained or lost based on the number of posts that day, etc.


About the author

Chris Collins - Chris Collins is an SRE at Red Hat and a Community Moderator for OpenSource.com. He is a container and container orchestration, DevOps, and automation evangelist, and will talk with anyone interested in those topics for far too long and with much enthusiasm.
Prior to working at Red Hat, Chris spent thirteen years with Duke University, variously as a Linux systems administrator, web hosting architecture and team lead, and an automation engineer.
In his free time, Chris enjoys brewing beer,...
