Apache Kafka on Heroku

Streaming data service optimized for developers.

What is Kafka?

Apache Kafka is a distributed commit log for fast, fault-tolerant
communication between producers and consumers using message-based
topics. Kafka provides the messaging backbone for building a new
generation of distributed applications capable of handling billions of
events and millions of transactions.

Take control of your events

Events are everywhere — user activity streams, log events,
telemetry from embedded devices and mobile phones, and more. Kafka
flips the script from push to pull, letting you take control of
high-volume event streams in your applications to transform the customer
experience. With Kafka, you can accept inbound events at any scale
with ease and route them to key-based partitions, providing a clear
path to real-time stream processing for user activity tracking, ad
tracking, IoT, mobile sync and messaging systems.

New ways to process data and time

Kafka lets you rethink the relationship between data, time and
operations in your application. Kafka takes transactional data in
tables and reduces it to a series of events, each representing a
keyed record and operation at a point in time. This lets you create
a record of all change events in your application for data recovery,
replay, simulation and auditing. These same primitives let you build
powerful data processing pipelines for analytics and transformation
use cases, with consumers reading data from a set of topics,
applying functions, and writing the output to a new set of topics.
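As a rough illustration of that read, apply, write pattern, here is a minimal in-memory sketch. It does not talk to a real Kafka cluster: topics are modeled as plain Python lists, and the names `run_stage`, `only_checkout`, `page_views` and `checkout_views` are hypothetical, chosen only for this example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Event = Tuple[str, dict]          # (key, value)
Topics = Dict[str, List[Event]]   # topic name -> ordered events (toy stand-in)


def run_stage(
    topics: Topics,
    source: str,
    sink: str,
    transform: Callable[[str, dict], Iterable[Event]],
) -> int:
    """Read every event from `source`, apply `transform`, append results to `sink`.

    The source topic is never modified; the stage only appends to the sink.
    Returns the number of events written.
    """
    out = topics.setdefault(sink, [])
    written = 0
    for key, value in topics.get(source, []):
        for result in transform(key, value):
            out.append(result)
            written += 1
    return written


def only_checkout(key: str, value: dict) -> Iterable[Event]:
    """Hypothetical transform: keep only views of the checkout page."""
    if value["path"] == "/checkout":
        yield key, value


topics: Topics = {
    "page_views": [
        ("user-1", {"path": "/"}),
        ("user-1", {"path": "/checkout"}),
        ("user-2", {"path": "/checkout"}),
    ]
}
written = run_stage(topics, "page_views", "checkout_views", only_checkout)
```

Because the source events are left untouched, the same stage can be re-run later against the full history, which is the replay property described above.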

Why Apache Kafka on Heroku?

Manage event streams of all sizes

Whether you’re first exploring event-driven architecture or
looking for an enterprise-grade Kafka solution, Heroku has you
covered. Get started with a plan to develop and test; change
to a larger plan with a simple command on the CLI as your
production needs expand.

World-class operations

Now you can consume Kafka as a service with Heroku’s world-class
orchestration and thoughtfully tuned configurations that keep
Kafka fast and robust. We distribute Kafka resources across
network zones for fault-tolerance, and ensure your Kafka cluster
is always available and addressable.

Elegant developer experience

Easy-to-use CLI and web tooling make Kafka simple to provision,
configure and operate. Add topics, create partitions, manage log
compaction, and monitor key metrics from the comfort of the CLI or
Heroku Dashboard.

Seamless integration with apps

Run producers and consumers as Heroku apps for simple vertical and
horizontal scalability. Config vars make it easy to securely
connect to your Kafka cluster, so you can focus on your core
logic.
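A minimal sketch of reading connection details from a config var follows. It assumes the add-on exposes broker addresses in a `KAFKA_URL` config var as a comma-separated list of `kafka+ssl://host:port` URLs; check your own app's config vars before relying on that. The helper name `broker_addresses` and the example hostnames are hypothetical.

```python
from typing import List
from urllib.parse import urlparse


def broker_addresses(kafka_url: str) -> List[str]:
    """Turn a comma-separated list of broker URLs into 'host:port' strings."""
    addresses = []
    for url in kafka_url.split(","):
        url = url.strip()
        if not url:
            continue  # tolerate a trailing comma
        parsed = urlparse(url)
        if parsed.hostname is None or parsed.port is None:
            raise ValueError(f"not a broker URL: {url!r}")
        addresses.append(f"{parsed.hostname}:{parsed.port}")
    return addresses


# Example value only. On a dyno you would pass os.environ["KAFKA_URL"] instead
# (assumed config var name, see the note above).
EXAMPLE_KAFKA_URL = (
    "kafka+ssl://broker-1.example.com:9096,"
    "kafka+ssl://broker-2.example.com:9096"
)
brokers = broker_addresses(EXAMPLE_KAFKA_URL)
```

Most Kafka client libraries accept a list of `host:port` strings like this as their bootstrap servers setting, so the application code itself stays free of hard-coded addresses.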

How it works

Messages

Kafka is a message-passing system. Messages are events and can have keys.

Brokers

A Kafka cluster is made up of brokers that run Kafka processes.

Topics

Topics are streams of messages of a particular category.

Partitions

Partitions are append-only, ordered logs of a topic’s messages. Each message has an offset denoting its position in the partition. Kafka replicates partitions across the cluster for fault tolerance and message durability.
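To make the idea of offsets concrete, here is a toy in-memory model of a single partition. It is only a sketch: the class name `PartitionLog` and its methods are hypothetical, and it ignores replication and persistence entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: records are only ever appended."""

    records: List[Tuple[Optional[str], str]] = field(default_factory=list)

    def append(self, key: Optional[str], value: str) -> int:
        """Append a record and return the offset it was assigned."""
        self.records.append((key, value))
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Optional[str], str]]:
        """Return (offset, key, value) for every record at or after `offset`."""
        return [(i, k, v) for i, (k, v) in enumerate(self.records) if i >= offset]


log = PartitionLog()
first = log.append("user-1", "signed_up")
second = log.append("user-2", "signed_up")
third = log.append("user-1", "upgraded")
tail = log.read_from(1)  # everything from offset 1 onward, in append order
```

Offsets here are just list positions, which is enough to show that they are assigned in order and never reused.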

Producers

Producers are client processes that send messages to a specific topic and partition on a broker. Producers can use a partitioning function on keys to control message distribution.
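A partitioning function can be sketched as a hash of the key modulo the partition count. The version below is illustrative only: `choose_partition` is a hypothetical name, and `zlib.crc32` is a stand-in hash chosen because it ships with Python. Real client libraries use their own hash functions, so this sketch will not place keys on the same partitions as an actual producer.

```python
import zlib
from itertools import count
from typing import Optional

# Counter used to spread keyless messages across partitions (an assumption for
# this sketch; real clients have their own strategies for keyless messages).
_round_robin = count()


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Pick a partition index in [0, num_partitions) for a message key."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    if key is None:
        return next(_round_robin) % num_partitions
    # Same key always hashes to the same partition, preserving per-key ordering.
    return zlib.crc32(key) % num_partitions


p1 = choose_partition(b"user-1", 8)
p2 = choose_partition(b"user-1", 8)  # same key, same partition as p1
```

The property that matters is that all messages for one key land on one partition, so consumers see them in the order they were produced.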

Consumers

Consumers read messages from topics' partitions on brokers, tracking the last offset read to coordinate and recover from failures. Consumers can be deployed in groups for scalability.
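The offset-tracking idea can be sketched without a cluster. In this toy model a partition is a plain list and the committed offset lives in a shared dictionary. The class `OffsetTrackingConsumer`, the `committed` store and the group name `billing` are all hypothetical. The stored offset is the position of the next message to read, which is one past the last message processed.

```python
from typing import Callable, Dict, List


class OffsetTrackingConsumer:
    """Toy consumer that records its progress after each processed message."""

    def __init__(self, partition: List[str], committed: Dict[str, int], group: str):
        self.partition = partition
        self.committed = committed  # group name -> next offset to read
        self.group = group

    def poll(self, handler: Callable[[str], None], max_records: int = 10) -> int:
        """Process up to `max_records` messages, starting at the committed offset."""
        start = self.committed.get(self.group, 0)
        batch = self.partition[start:start + max_records]
        for i, message in enumerate(batch):
            handler(message)
            # Commit only after the handler succeeds, so a crash mid-message
            # means that message is delivered again on restart.
            self.committed[self.group] = start + i + 1
        return len(batch)


partition = ["a", "b", "c", "d"]
committed: Dict[str, int] = {}
seen: List[str] = []

# First instance handles two messages, then "fails" (we simply discard it).
OffsetTrackingConsumer(partition, committed, "billing").poll(seen.append, max_records=2)
# A replacement instance in the same group resumes where the first stopped.
OffsetTrackingConsumer(partition, committed, "billing").poll(seen.append)
```

Because progress is stored outside the consumer instance, any member of the group can pick up the work after a failure.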

Log compaction

Log compaction keeps the most recent value for every key so clients can restore state.
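As a sketch of what compaction preserves, the function below keeps only the latest record for each key and then rebuilds a key-to-value state from the result. It models the outcome, not how brokers actually perform compaction, and the names `compact` and `Record` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(records: List[Record]) -> List[Record]:
    """Keep the most recent record per key, preserving offset order."""
    latest: Dict[str, Record] = {}
    for record in records:  # records are assumed to be in offset order
        latest[record[1]] = record  # later records overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r[0])


log: List[Record] = [
    (0, "user-1", "free"),
    (1, "user-2", "free"),
    (2, "user-1", "pro"),
]
compacted = compact(log)
# A client restoring state only needs the compacted records.
state = {key: value for _, key, value in compacted}
```

Surviving records keep their original offsets, so the compacted log has gaps but is still ordered.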

One of the biggest benefits of Apache Kafka on Heroku is the developer experience. We can use the same familiar tools and unified management experience for Kafka as we do for our Heroku apps and other add-ons, and we now have a system that more closely matches our team structure.

Ryan Daigle, Director of Engineering, Spreedly

We don’t have a DevOps team, so using Apache Kafka on Heroku means we don’t have to worry about setting up infrastructure, or configuring and tuning our Kafka instance to ensure performance. This helps us focus on making our products better.

Jonathan Geggatt, Sr. Platform Engineer, HotelTonight

With Apache Kafka on Heroku, we can automate nearly everything and move more calls to a real-time push/pull data stream. Kafka allows us to run asynchronous batches on larger data calls, which cuts the processing time in half, increases reliability, and reduces time spent on monitoring and management.

Apache Kafka on Heroku offers a single solution that powers both event notification between apps and event data flows for site analytics. We no longer have to manually configure apps or manage additional event streaming mechanisms. It saves us time and reduces complexity.

Michael Wagg, Tech Lead, carwow

Apache Kafka on Heroku

Build data-intensive apps

Elastic queuing

Kafka on Heroku acts as the edge of your system, durably accepting high volumes of inbound events, whether user click interactions, log events, mobile telemetry, ad tracking, or other events. This enables you to create new types of architectures for incremental processing of immutable event streams. You can add and remove downstream services seamlessly without impacting the ability to accept high-throughput inbound events, and Kafka’s durability ensures events are available when services reconnect after failures, so no events are lost.

Data pipelines and analytics

Kafka is an ideal transport for building data pipelines for transforming stream data and computing aggregate metrics. Pipelines can help you build advanced data-centric applications and enable analytics teams to make better decisions. Kafka’s distributed architecture and immutable event streams make it trivial to build pipelines for incremental, parallel processing of fast-moving data. You can integrate all the disparate sources and sinks of data in your organization.

Microservices coordination

Kafka enables you to model your application as a collection of microservices that process events and exchange state over channel-like topics. Kafka becomes the backplane for service communication, allowing microservices to become loosely coupled. Bootstrapping microservices becomes order independent, since all communication happens over topics. Service discovery is simply a matter of connecting to new topics. Consuming and producing services, as well as Kafka brokers, can be scaled independently so your architecture is fully elastic. Kafka distributes topics and replicates messages across multiple servers for event durability, so if a broker fails for any reason, your event data will be safe. If a service fails, it can reconnect and start processing from the last known offset.

Tech session

Apache Kafka can be used to stream billions of events per day — but do you know where to use it in your app architecture? Find out at our technical session. See a live demo and hear answers to questions from Heroku product experts.

Podcast

Listen to our podcast with Software Engineering Daily from October 25th, 2016.

Apache Kafka is a durable, distributed message broker that’s a great choice for managing large volumes of inbound events, building data pipelines, and acting as the communication bus for microservices. In this Software Engineering Daily podcast, Heroku engineer Tom Crayford talks about building the Apache Kafka on Heroku service, the challenges we faced, and why we focused on Kafka in the first place.