Building Functions with riff

There are many ways to package and run workloads in the cloud. The newest, and arguably the most interesting, is functions, also known as serverless.

Industry analysts, researchers, and the tech press agree that the serverless wave is happening. The hyperscale cloud providers have all started offering “functions as a service,” integrated with the rest of their cloud offerings.

Pivotal’s contribution to the functions movement is riff, an open-source project, recently unveiled at SpringOne Platform.

We’ve seen a surge in interest in riff since its release. riff is easy to learn, and you can get up and running with it quickly. In this post, we will walk through what riff is, its benefits, and a few use cases that may spark some ideas.

I was able to learn and understand all of riff in about two hours. riff is also great because I can use a standard container listening on port 8080 to back my functions. I also like the function sidecar approach; it makes adding support for new language runtimes really easy.
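Here is a minimal sketch of what such a container's entry point might look like. It uses only Python's standard library and assumes the following contract: the function receives the event payload as the body of an HTTP POST and returns its result as the response body. The `handle` function and its upper-casing logic are hypothetical placeholders and are not part of riff.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle(payload: str) -> str:
    """Hypothetical function logic: upper-case the incoming text."""
    return payload.upper()


class FunctionHandler(BaseHTTPRequestHandler):
    """Serves the function over HTTP: request body in, response body out."""

    def do_POST(self) -> None:
        # Read exactly Content-Length bytes of the event payload.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        result = handle(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port: int = 8080) -> None:
    """Listen on all interfaces on port 8080 (the port assumed above)."""
    HTTPServer(("", port), FunctionHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

The function logic in `handle` is kept separate from the HTTP plumbing, so you can unit-test it without starting a server.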

Why Functions? And Why is it Called “Serverless”?

Serverless refers to the fact that developers and operators do not need to provision or maintain the underlying infrastructure needed to run functions. The model is popular for the same reason any new piece of developer tech goes mainstream: it’s a higher level of abstraction that makes life easier. With functions, a developer can execute a small slice of code in response to an event, called a trigger. When the event occurs, the code runs and performs its tightly scoped job.

How does this make life easier for your engineers? A few ways.

The function is narrowly defined. It’s intended to perform a specific, simple job. This eschews a more ambitious (and cumbersome) scope. (Value Stream Mapping and lean principles tell us that smaller iterations lead to big velocity improvements.)

Event integration is built-in. You don’t have to manage this separately.

Operational efficiencies. Applying functions to distributed computing automates event-based scheduling and self-scaling. That means less grunt work for ops teams.

As an added bonus, it can save you a boatload of money on infrastructure, either on-prem or in the public cloud. Compared to code running on virtual machines and containers, functions consume fewer resources because they don't run when idle, and they scale based on actual load.

Meet riff

riff is a service for executing functions triggered by events. Like Cloud Foundry, riff can run on-premises and in the public cloud. Here’s a look at the key features of riff:

Portability & Kubernetes-native support. riff extends Kubernetes (k8s) by adding custom resource definitions (CRDs). These resources are represented in YAML and posted to the k8s API server. You can run riff locally and anywhere k8s runs, including managed Kubernetes services in the public cloud such as GKE.

Support for many languages. Because functions are packaged as containers, they can be written in a variety of languages; all you need is a function invoker for the language you are using. riff already has invokers for Java, Node.js, and Python. There is even a command invoker for running native executables and shell scripts.
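With the invoker model, the function author writes only the business logic and the invoker handles transport. The Python sketch below illustrates that split. The `invoke` helper is a hypothetical stand-in for an invoker: a real riff invoker receives events from the platform rather than from a Python list. The `greet` function is likewise an illustrative example.

```python
from typing import Callable, Iterable, List


def greet(name: str) -> str:
    """The function author's code: plain logic, one input in, one output out."""
    return f"Hello, {name}!"


def invoke(fn: Callable[[str], str], events: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for a language invoker. A real invoker would
    receive events from the platform and publish the results; here we
    simply apply the function to each event in a list."""
    return [fn(event) for event in events]


if __name__ == "__main__":
    print(invoke(greet, ["riff", "Kubernetes"]))
```

`greet` contains no I/O code, so any invoker that follows the same one-in, one-out contract could host it.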

First-class event streaming. riff provides functions with built-in event integration. Developers will love this feature: it frees them from the toil of wiring up connections to message brokers like Kafka and RabbitMQ.

Scalability. riff scales your functions automatically based on event volume. Functions can scale from 0 to 1, from 1 to N, and back down to 0 when there are no events.
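The 0 to 1, 1 to N, and back to 0 behavior can be illustrated with a simple replica-count policy. This is a hypothetical sketch, not riff's actual Function Controller algorithm. It assumes the controller knows two things: the number of pending events (the backlog) and a target number of events each replica should handle.

```python
import math


def desired_replicas(backlog: int, events_per_replica: int, max_replicas: int) -> int:
    """Hypothetical scaling policy (not riff's algorithm): zero replicas when
    idle, otherwise enough replicas to drain the backlog, capped at max_replicas."""
    if events_per_replica < 1 or max_replicas < 1:
        raise ValueError("events_per_replica and max_replicas must be positive")
    if backlog <= 0:
        return 0  # no pending events: scale down to zero
    # Round up so a partial batch still gets a replica (this gives the 0 -> 1 step).
    return min(math.ceil(backlog / events_per_replica), max_replicas)
```

For example, with 100 events per replica and a cap of 10, a backlog of 250 events yields 3 replicas and an empty backlog yields 0. A production controller would typically also wait for a period of inactivity before scaling to zero, so that brief lulls do not cause replicas to flap up and down.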

When to Use riff

Functions, as a programming model, are actually quite old. So even though function services are new, organizations across all industries can use riff today to explore how to use serverless to address real-world use cases.

Real-Time ETL Behind the Firewall

In this data pipeline scenario, an organization receives data from multiple upstream sources via streaming and scheduled feeds. The pipeline performs real-time ETL (extract, transform, load) using functions that run on riff to keep the organization’s system of record up to date.

The platform operations team of this organization has installed and configured riff in their own data center. Figure 1 depicts the high-level architecture of the solution.

Figure 1. Real-time ETL with Project riff

In this example, ETL developers wrote the Extract and Transform functions in Python and the DB Load function in Java. The functions are packaged in Docker containers and deployed using Kubernetes resource definitions. Using similar resource definitions, the developers also declared four (4) event topics: Raw Data, Valid Data, Enriched Data, and Error. (riff’s current underlying event broker is Kafka; there will be other pluggable implementations in the future.)
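The wiring described above can be modeled in a few lines of Python. In this wiring, each function reads from one topic and writes to the next, and failures go to Error. This is an in-memory sketch for illustration only. Topics are plain lists rather than Kafka topics, the stage functions are supplied by the caller, and none of the names come from riff's resource definitions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A stage function returns the transformed event, or None to reject it.
Fn = Callable[[str], Optional[str]]


def run_pipeline(raw: List[str], extract: Fn, transform: Fn,
                 load: Callable[[str], None]) -> Dict[str, List[str]]:
    """Hypothetical in-memory model of Figure 1: lists stand in for topics."""
    topics: Dict[str, List[str]] = {
        "Raw Data": list(raw), "Valid Data": [], "Enriched Data": [], "Error": []}
    # Topic-to-topic stages: (input topic, function, output topic).
    stages: List[Tuple[str, Fn, str]] = [
        ("Raw Data", extract, "Valid Data"),
        ("Valid Data", transform, "Enriched Data"),
    ]
    for source, fn, target in stages:
        for event in topics[source]:
            result = fn(event)
            if result is None:
                topics["Error"].append(event)  # route the original event to Error
            else:
                topics[target].append(result)
    # DB Load consumes Enriched Data and writes to the system of record.
    for event in topics["Enriched Data"]:
        load(event)
    return topics


if __name__ == "__main__":
    system_of_record: List[str] = []
    result_topics = run_pipeline(
        ["alice", "", "bob"],
        extract=lambda e: e or None,     # reject empty records
        transform=lambda e: e.title(),   # trivial enrichment
        load=system_of_record.append,
    )
    print(result_topics["Error"], system_of_record)
```

In riff, each stage would run as its own independently scaled function, connected only through the topics.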

Now, let’s examine the solution in more detail.

There are three (3) upstream sources: (i) text files sent by a partner every two hours; (ii) a legacy database that several users update throughout the day; and (iii) JSON data sent in real time by another partner’s streaming API.

Each upstream source has a corresponding data collector service that posts the raw data to riff’s HTTP gateway. This approach turns incoming data into events on the Raw Data topic. (To make data collection even easier, future versions of riff will provide mechanisms to code data source functions for integration with external services. Such functions would be deployed and scaled like any other. Further, source functions would interact directly with the topics, and wouldn’t need to connect using an HTTP gateway. The input events for such "source" functions would be lifecycle triggers and/or metadata such as query parameters.)
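A data collector can be very small. The sketch below builds and sends the HTTP request that a collector might use to publish one record. Several details are assumptions for illustration: the gateway address, the `/messages/<topic>` path, and the topic name `raw-data`. Check the documentation for your riff version to find the gateway's actual endpoint. The sketch also assumes that collectors convert each source's records to JSON before publishing.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: illustrative gateway address; replace with your cluster's value.
GATEWAY_URL = "http://riff-http-gateway.example:80"


def build_publish_request(gateway_url: str, topic: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that publishes one record as an event.
    The "/messages/<topic>" path is an assumption, not a confirmed riff API."""
    body = json.dumps(record).encode("utf-8")
    url = f"{gateway_url.rstrip('/')}/messages/{quote(topic)}"
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


def publish(record: dict, topic: str = "raw-data") -> int:
    """Send the record to the gateway and return the HTTP status code."""
    req = build_publish_request(GATEWAY_URL, topic, record)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Request construction is kept apart from sending, so you can check the request's shape without a running gateway.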

riff’s Function Controller monitors event activity. As soon as any activity occurs on the Raw Data topic, it scales the Extract function from 0 to 1. Depending on event volume, the Function Controller scales the function up to N replicas, where N is the maxReplicas property in the function’s YAML configuration file. (Alternatively, the default maxReplicas value could be derived from the number of partitions on the input topic.)
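The fallback rule in the parenthetical above could be expressed as follows: if maxReplicas is not set, use the input topic's partition count. This is a hypothetical sketch of the described behavior, not code from riff. The rationale it encodes comes from Kafka: within a consumer group, each partition is consumed by at most one member, so replicas beyond the partition count would receive no events.

```python
from typing import Optional


def resolve_max_replicas(configured: Optional[int], input_partitions: int) -> int:
    """Use the function's configured maxReplicas if present; otherwise
    (hypothetically) default to the input topic's partition count."""
    if configured is not None:
        if configured < 1:
            raise ValueError("maxReplicas must be at least 1")
        return configured
    # Each Kafka partition goes to at most one consumer in a group, so
    # replicas beyond the partition count would sit idle.
    return max(input_partitions, 1)
```

For example, a function with no explicit maxReplicas reading from a six-partition topic would be capped at 6 replicas.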

The Extract function listens to the Raw Data topic, validates the Raw Data event and sends it to the Valid Data topic.
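Here is a minimal sketch of the Extract function's validation logic in Python, the language the ETL developers used. Several things are assumed: events arrive as JSON, and the required field names (`id`, `source`, `payload`) are hypothetical. Returning a value stands for publishing to Valid Data, and raising `ValidationError` stands for routing the event to the Error topic. How an invoker actually maps return values and exceptions to topics is also an assumption here.

```python
import json

REQUIRED_FIELDS = ("id", "source", "payload")  # hypothetical schema


class ValidationError(ValueError):
    """Raised when a Raw Data event cannot be promoted to Valid Data."""


def extract(raw_event: str) -> str:
    """Parse and validate one Raw Data event; return it as normalized JSON."""
    try:
        record = json.loads(raw_event)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValidationError("event must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValidationError(f"missing fields: {', '.join(missing)}")
    # Sorted keys give downstream functions a stable, canonical form.
    return json.dumps(record, sort_keys=True)
```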

Next, as events arrive on the Valid Data topic, riff’s Function Controller scales the Transform function from 0 to 1. This function enriches each Valid Data event using custom logic and sends it to the Enriched Data topic.
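The enrichment step is where the custom logic lives. As an illustration, this hypothetical Transform function adds two fields: a region looked up from reference data, and a processing timestamp. The `REGION_BY_SOURCE` table and all field names are invented for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical reference data used to enrich events.
REGION_BY_SOURCE = {"partner-files": "EMEA", "legacy-db": "NA", "partner-api": "APAC"}


def transform(valid_event: str) -> str:
    """Enrich one Valid Data event and return it for the Enriched Data topic."""
    record = json.loads(valid_event)
    # Unknown sources are tagged rather than rejected; validation already ran upstream.
    record["region"] = REGION_BY_SOURCE.get(record.get("source"), "UNKNOWN")
    record["enriched_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)
```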

Finally, the Function Controller scales the DB Load function from 0 to 1, which stores the Enriched Data event into the system of record.
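In this scenario the DB Load function is written in Java. To keep the examples in one language, here is a Python sketch of the same idea instead, with SQLite standing in for the system of record. The table layout is hypothetical. Writing with `INSERT OR REPLACE` keyed on the record ID makes the load idempotent, so a redelivered event overwrites its row instead of creating a duplicate.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the stand-in system of record and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def db_load(conn: sqlite3.Connection, enriched_event: str) -> None:
    """Store one Enriched Data event; replaying the same event overwrites its row."""
    record = json.loads(enriched_event)
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            "INSERT OR REPLACE INTO records (id, body) VALUES (?, ?)",
            (str(record["id"]), enriched_event))
```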

If errors occur during event processing, the functions send the event, along with some re-processing metadata, to the Error topic.
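The contents of the re-processing metadata are up to the pipeline's designers. One hypothetical shape is an envelope that keeps the original event together with several pieces of context: the failing stage, the error, an attempt counter, and a timestamp. With that information, a retry job can decide whether to replay the event. The field names below are illustrative and are not a riff format.

```python
import json
from datetime import datetime, timezone


def to_error_event(original_event: str, stage: str, error: Exception, attempt: int = 1) -> str:
    """Wrap a failed event with (hypothetical) re-processing metadata for the Error topic."""
    envelope = {
        "original_event": original_event,  # kept verbatim so it can be replayed
        "failed_stage": stage,
        "error_type": type(error).__name__,
        "error_message": str(error),
        "attempt": attempt,
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(envelope)
```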

It is important to point out that riff provides first-class support for event stream processing. For instance, you could use windowing operations to emit aggregated counts collected over fixed time intervals.
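To illustrate the windowing idea, the sketch below counts timestamped events per fixed (tumbling) time window. It is plain Python written for clarity, not riff's stream-processing API. The 60-second window size and the `(timestamp, key)` event shape are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    window_seconds: int = 60) -> Dict[int, Counter]:
    """Group (timestamp, key) events into fixed, non-overlapping windows and
    count each key per window. Windows are keyed by their start time in seconds."""
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, key in events:
        # Align each timestamp to the start of the window that contains it.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)
```

This version reads the whole input before returning. A streaming implementation would instead emit each window's counts as the window closes, and it would need a policy for events that arrive late.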

About the Author

Guillermo is an award-winning Enterprise Architecture practitioner with 20+ years of progressive experience in different industries. Since 2011, he has led the delivery of cloud-native and digital transformation initiatives with Cloud Foundry at numerous Fortune 500 organizations. He focuses on all aspects of distributed systems including defense-in-depth, Internet scale, multi-cloud and fault-tolerance capabilities. He has presented at multiple conferences including SpringOne Platform, Cloud Foundry Summit, Pivotal Internet-of-Things roadshows and VMware Partner Exchange. Guillermo is passionate about his family, business, technology and soccer.