Kafka in the Cloud: Who Needs Clusters Anyway?

Confluent is reimagining what it means to run Kafka in the cloud, and is jettisoning the notion that clusters are even necessary to get started with the real-time streaming data technology.

Confluent today announced that its Confluent Cloud customers can deploy a Kafka service that can scale from 0 to 100 MBps in throughput and scale back down in seconds. Customers pay only for the data they stream through the service, as part of Confluent’s new consumption-based pricing model.

Both announcements, made this morning at the company’s Kafka Summit London conference, help eliminate the need to build Kafka clusters that are big enough to handle large surges in traffic but often sit underutilized in between those surges. This practice of overprovisioning Kafka clusters is fairly standard, but is also expensive and wastes resources.

In fact, Confluent is questioning why the concept of a cluster is even necessary when running Kafka in the cloud. Obviously, server clusters are still an important ingredient in running Kafka (although you could run Kafka on a single SMP machine, on premises or in the cloud). Clusters of x86 server nodes still provide the underlying hardware upon which Kafka runs.

But according to Confluent CTO and co-founder Neha Narkhede, deployment requirements and access patterns for Kafka in the cloud are diverging from what clusters traditionally expose to software layers riding further up the stack.

“If you are using Kafka as a fully managed service, you might step back and ask whether the notion of a cluster still means anything,” Narkhede, who co-created Apache Kafka with Jay Kreps and Jun Rao, wrote today in a blog post.

“When you are deploying your own infrastructure, it’s quite necessary to think of discrete clusters, but when you’re using a cloud-native Kafka service, the concept begins to seem like unnecessary baggage held over from the on-prem world,” she wrote. “In the cloud, clusters become a mere detail of the plumbing, an abstraction that we should not have to think about.”

Confluent decided to do something about the impedance mismatch between a streaming Kafka data service and the underlying cluster, at least for customers who have signed up for its fully managed Kafka service, which is called Confluent Cloud.

Confluent is putting its money where its mouth is by promising that customers can scale their Kafka service up and down on demand, and pay only for the data that they actually streamed through the service, without incurring any additional fees or making any additional preparations or plans ahead of time. The caveat is that this promise is good only up to 100 MBps in data throughput.

Customers that need to stream more than 100 MBps will need to sign up for the enterprise version of Confluent Cloud, which ostensibly requires more careful provisioning of underlying clusters to handle the bigger data demands. Confluent says its enterprise environment can scale into the tens of GB per second.

Narkhede says Confluent had to tackle a number of challenges to deliver the Kafka scalability on-demand.

Customers can get up and running in as little as five seconds, Confluent says.

“First, we found that elasticity is not trivial,” she wrote. “When usage spikes, there is no time to boot new containers or add nodes to a cluster. And Kafka was not the only thing we had to scale! We also had to work around various limits on cloud infrastructure, like limits on the number of VPCs and elastic load balancers per account.”

Delivering that level of elasticity also meant that the Confluent team needed to do “continuous, intelligent data balancing across the various nodes of a Kafka cluster,” she wrote. “And finally, doing that efficiently required that we work to minimize the data we had to move around on each rebalance event.”
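Confluent has not published the balancing logic Narkhede describes, so the following is only a rough illustration of the general idea, not the company's method. It is a minimal Python sketch of a greedy heuristic that moves a partition only when doing so strictly narrows the gap between the busiest and the idlest broker. The function name, broker names, and partition sizes are all hypothetical.

```python
def rebalance(assignment, sizes):
    """Greedy sketch: shift partitions from the busiest to the idlest broker.

    assignment: dict mapping broker name -> set of partition ids (hypothetical)
    sizes:      dict mapping partition id -> positive size in bytes
    Returns (moves, load), where moves is a list of (partition, src, dst).
    """
    assignment = {b: set(ps) for b, ps in assignment.items()}  # work on a copy
    load = {b: sum(sizes[p] for p in ps) for b, ps in assignment.items()}
    moves = []
    while True:
        src = max(load, key=load.get)   # busiest broker
        dst = min(load, key=load.get)   # idlest broker
        gap = load[src] - load[dst]
        # A move helps only if the partition is smaller than the gap.
        candidates = [p for p in assignment[src] if 0 < sizes[p] < gap]
        if not candidates:
            break
        # Prefer the partition closest to half the gap; break ties by name.
        p = min(candidates, key=lambda q: (abs(sizes[q] - gap / 2), q))
        assignment[src].remove(p)
        assignment[dst].add(p)
        load[src] -= sizes[p]
        load[dst] += sizes[p]
        moves.append((p, src, dst))
    return moves, load


moves, load = rebalance(
    {"b1": {"p0", "p1", "p2", "p3"}, "b2": set()},
    {"p0": 10, "p1": 10, "p2": 10, "p3": 10},
)
bytes_moved = sum(10 for _ in moves)  # every partition in this toy case is 10 bytes
```

In this toy case, two of the four partitions move and the brokers end up evenly loaded. A production balancer would also have to weigh replica placement, network cost, and live traffic, none of which this sketch attempts.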

Confluent has been working to make management of Kafka clusters easier, for both on-prem and cloud customers. While Kafka provides a powerful new way to utilize and think about data, managing larger Kafka clusters is not easy. In addition to building shrink-wrapped Kafka services on the cloud, the company is also looking to container technology like Docker and Kubernetes to lessen the administrative burden of managing Kafka clusters.

“There’s a whole bunch of gotchas when trying to do this yourself using Kubernetes,” Narkhede told Datanami last year following the release of the Confluent Operator for Kubernetes. “There are many mistakes that people end up making. Because ultimately, they’re not really the experts of Kafka. They may know Kubernetes from deploying other stateless applications as part of their microservices transition, but really something stateful like Kafka needs a lot of care and careful consideration.”

Confluent Cloud is currently available on AWS and Google Cloud. Confluent Cloud customers can get the consumption-based pricing up to 100 MBps on both of these public clouds. The cost of inbound and outbound data is $0.11 per gigabyte on Google Cloud and $0.13 per gigabyte on AWS, which makes Google Cloud the low-cost leader for Confluent Cloud customers. Data storage costs the same on both public clouds: $0.10 per gigabyte per month.
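To see what those rates mean in practice, here is a small Python sketch that estimates a monthly bill from the per-gigabyte prices quoted above. It assumes the transfer rate applies to every gigabyte moving in either direction, which the announcement does not spell out, and the function name and workload figures are hypothetical.

```python
# Per-gigabyte rates quoted in the article, in US dollars.
TRANSFER_RATE_PER_GB = {"gcp": 0.11, "aws": 0.13}
STORAGE_RATE_PER_GB_MONTH = 0.10


def monthly_cost(cloud, gb_in, gb_out, gb_stored):
    """Estimate one month's bill for a hypothetical workload.

    Assumption: inbound and outbound gigabytes are billed at the same rate
    and simply added together.
    """
    transfer = (gb_in + gb_out) * TRANSFER_RATE_PER_GB[cloud]
    storage = gb_stored * STORAGE_RATE_PER_GB_MONTH
    return round(transfer + storage, 2)


# Hypothetical month: 500 GB written, 1,000 GB read, 200 GB retained.
gcp_bill = monthly_cost("gcp", 500, 1000, 200)
aws_bill = monthly_cost("aws", 500, 1000, 200)
print(f"Google Cloud: ${gcp_bill:.2f}  AWS: ${aws_bill:.2f}")
```

For that workload, the two-cent difference in transfer rates works out to a $30 gap between the two clouds, while storage costs the same on either one.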

Confluent also announced that its cloud customers can now use several of its Kafka-related services, including Schema Registry, KSQL and S3 Sink Connector, in preview mode.

The Confluent Schema Registry streamlines how users define and track how data schemas are used. The software maintains a repository of metadata that’s needed to recreate the history of the schemas, which gives customers the assurance that future changes won’t impact the business.

KSQL is a SQL query engine for Kafka streams that was announced two years ago. The software provides an onramp for business analysts to use their existing SQL skills to get answers from data streamed in Kafka.

The S3 connector allows Confluent Cloud customers to use AWS’s S3 as a source or a sink for Kafka data flows.