
Note: These release notes cover only the major changes. To learn about various bug fixes and changes, please refer to the change logs or check out the list of commits in the main Karafka repository on GitHub.

TL;DR

Note: Changes above don’t include Zeitwerk setup for your non-Rails projects. See this commit for details on how to replace Karafka::Loader with Zeitwerk.

Note: If you use the Sidekiq backend, keep in mind that before upgrading, you need to consume all of the messages that are already in Redis.

Note: This release is the last release with ruby-kafka under the hood. We’ve already started the process of moving to rdkafka-ruby.

Changes (features, incompatibilities, etc)

Auto-reload of code changes in development

Up until now, to see your code changes within the Karafka process, you had to restart it. That was really cumbersome: for bigger and more complex Kafka clusters, a restart with reconnections and rebalancing could take a significant amount of time. Fortunately, those times are gone!

All you need to do is enable this code before App.boot in your karafka.rb file:

# For non-Rails app with Zeitwerk loader
if Karafka::App.env.development?
  Karafka.monitor.subscribe(
    Karafka::CodeReloader.new(
      APP_LOADER
    )
  )
end

# Or for Ruby on Rails
if Karafka::App.env.development?
  Karafka.monitor.subscribe(
    Karafka::CodeReloader.new(
      *Rails.application.reloaders
    )
  )
end

and your code changes will be applied after each message/messages batch fetch.

Keep in mind though, that there are a couple of limitations to it:

Changes in the routing are NOT reflected. This would require reconnections and would drastically complicate reloading.

Any background work that you run outside of the Karafka framework, but still within the same process, might not be caught in the reloading.

If you use in-memory consumer data buffering that spans multiple batches (or messages in a single message fetch mode), it WON’T work, as code reload means re-initializing all of the consumer instances. In cases like that, you will be better off not using the reload mode at all.

It is also worth pointing out that if you have code that should be re-initialized in any way during the reload phase, you can pass it to the Karafka::CodeReloader initializer.
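The exact form of that initializer call is not shown here, so the following is only a stand-in sketch of the idea, assuming the extra code is handed over as a block that runs on every reload. SketchReloader, FakeLoader and REGISTRY are hypothetical names for illustration and are not part of Karafka.

```ruby
# Hypothetical stand-in, NOT Karafka::CodeReloader itself: it only illustrates
# passing a block that re-initializes custom state on each reload.
class SketchReloader
  def initialize(*reloaders, &block)
    @reloaders = reloaders
    @block = block
  end

  # Runs the custom block first, then asks every loader to reload.
  def reload
    @block&.call
    @reloaders.each(&:reload)
  end
end

# Hypothetical loader that just counts how many times it was reloaded.
class FakeLoader
  attr_reader :reloads

  def initialize
    @reloads = 0
  end

  def reload
    @reloads += 1
  end
end

# Hypothetical in-memory registry that must be cleared on each reload.
REGISTRY = { 'stale' => true }

loader = FakeLoader.new
reloader = SketchReloader.new(loader) { REGISTRY.clear }
reloader.reload
```

After the single reload above, the registry is empty and the loader has been asked to reload once, which is the behaviour you would want from any state that must not survive a code reload.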

Parsers are now Deserializers in the routing and accept the whole Karafka::Params::Params object

Parsers, as a concept responsible for both serialization and deserialization of data, violated the Single Responsibility Principle (see details here). From now on, they are separate entities that you can use independently. For the upgrade, just rename parser to deserializer for each topic you’re using in the routes.
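To make the split more concrete, here is a rough sketch of what a standalone deserializer could look like: an object that responds to #call, receives the whole params object and returns the decoded data. SketchParams and its raw_payload field are assumptions standing in for Karafka::Params::Params, so check the Karafka documentation for the actual accessor.

```ruby
require 'json'

# Hypothetical stand-in for Karafka::Params::Params; the field names are assumptions.
SketchParams = Struct.new(:raw_payload, :topic)

# A deserializer only turns raw data into Ruby objects; it does no serialization.
class JsonSketchDeserializer
  def call(params)
    return nil if params.raw_payload.nil?

    JSON.parse(params.raw_payload)
  end
end

params = SketchParams.new('{"login":"jane"}', 'users')
decoded = JsonSketchDeserializer.new.call(params)
```

Because the deserializer gets the whole params object rather than just a string, it could also branch on metadata such as the topic if that were ever needed.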

def consume
  params_batch.each do |params|
    puts "Hello #{params['login']}!\n"
  end
end

Karafka used to merge your data directly within the Karafka::Params::Params object root scope. That was convenient, but not flexible enough. Some metadata details in the root params scope could get overwritten. On top of that, if you sent something other than a JSON hash, let’s say an array, you would get an exception and would have to use a custom parser to bypass that (see this FAQ question).

Due to that and in order to better separate your incoming data from the rest of the payload (headers, metadata information, etc), from now on, all of your data will be available under the payload params key:
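As an illustration of that separation, the sketch below models each message as a plain Hash with metadata at the root and the decoded data under the 'payload' key. The Hash layout and the greetings helper are assumptions made for demonstration; the real object is Karafka::Params::Params and its exact accessor may differ.

```ruby
# Hypothetical stand-in for a params batch: metadata lives at the root,
# while the user data sits under the 'payload' key.
params_batch = [
  { 'topic' => 'users', 'offset' => 0, 'payload' => { 'login' => 'jane' } },
  { 'topic' => 'users', 'offset' => 1, 'payload' => %w[not a hash] }
]

# Builds one line per message, reading user data only from 'payload'.
def greetings(params_batch)
  params_batch.map do |params|
    payload = params['payload']
    payload.is_a?(Hash) ? "Hello #{payload['login']}!" : "Got #{payload.size} items"
  end
end

result = greetings(params_batch)
```

Note that the second message carries an array, which no longer collides with the metadata keys because it is never merged into the root scope.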

Usage

Once included in your RSpec setup, this library provides two methods that you can use in your specs:

– #karafka_consumer_for – this method will create a consumer instance for the desired topic. It needs to be set as the spec subject.
– #publish_for_karafka – this method will “send” a message to the consumer instance.

Note: Messages sent using the `#publish_for_karafka` method won’t be sent to Kafka. They will be “virtually” delegated to the created consumer instance so your specs can run without Kafka setup.

New instrumentation called Karafka::Instrumentation::ProctitleListener has been added. Its purpose is to provide you with a nicer proc title with a descriptive value. In order to use it, please put the following line in your karafka.rb boot file:

Single consumer class supports more than one topic

From now on, you can use the same consumer class for multiple topics:

App.consumer_groups.draw do
  consumer_group :default do
    topic :users do
      consumer UsersConsumer
    end

    topic :admins do
      consumer UsersConsumer
    end
  end
end

Note: you will still have a separate consumer instance for each topic partition.

Delayed re-connection upon critical failures

If a critical failure occurs (network disconnection or anything similar) Karafka will back off and wait for reconnect_timeout (defaults to 10s) before attempting to reconnect. This should prevent you from being clogged by errors and logs upon serious problems.
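The back-off behaviour can be pictured with a small retry loop. This is only a sketch of the general "wait, then reconnect" pattern and not Karafka's internals; with_reconnect, max_attempts and the injectable sleeper are hypothetical names introduced for illustration.

```ruby
# Retries the block after a pause whenever it raises, up to max_attempts times.
# NOTE: a generic illustration of "wait, then reconnect", not Karafka's code.
def with_reconnect(reconnect_timeout: 10, max_attempts: 3, sleeper: method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts >= max_attempts

    sleeper.call(reconnect_timeout) # back off before the next attempt
    retry
  end
end

# The sleeper is replaced with a recorder so the example finishes instantly.
pauses = []
result = with_reconnect(reconnect_timeout: 10, sleeper: ->(s) { pauses << s }) do |attempt|
  raise IOError, 'network down' if attempt < 3

  "connected on attempt #{attempt}"
end
```

Pausing between attempts means a broken connection produces one error per timeout window instead of a tight loop of failures flooding the logs.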

Support for Kafka 0.10 dropped in favor of native support for Kafka 0.11

Support for Kafka 0.10 has been dropped. Weird things may happen if you decide to use Kafka 0.10 with Karafka 1.3 so just upgrade.

Reorganized responders – multiple_usage constraint no longer available

multiple_usage has been removed. Responders won’t raise any exception if you decide to send multiple messages to the same topic without declaring that. This feature was a bad idea and caused a lot of trouble when using responders in long-running, batch-like flows.

The following code would raise a Karafka::Errors::InvalidResponderUsageError error in Karafka 1.2 but will continue to run in Karafka 1.3:

While Karafka is processing, ruby-kafka prebuffers more data under the hood in a separate thread. If you have a big consumer lag, this can cause your Karafka process to prebuffer hundreds or more megabytes of data upfront. Lowering the queue size makes Karafka more predictable by default.
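Why a smaller queue helps can be shown with Ruby's own SizedQueue. The sketch below is not ruby-kafka's fetcher, and prefetch together with the batch strings is made up for illustration. It only demonstrates that a bounded queue blocks the producing thread once the limit is reached, so at most that many batches wait in memory.

```ruby
# Simulates a fetcher thread that prebuffers batches for a consumer.
# Returns the largest number of batches that were ever seen waiting in the queue.
def prefetch(queue_size:, batches:)
  queue = SizedQueue.new(queue_size)
  max_buffered = 0

  fetcher = Thread.new do
    batches.times { |i| queue.push("batch-#{i}") } # blocks when the queue is full
    queue.push(:done)
  end

  loop do
    max_buffered = [max_buffered, queue.size].max
    item = queue.pop
    break if item == :done
  end

  fetcher.join
  max_buffered
end

small = prefetch(queue_size: 10, batches: 200)
large = prefetch(queue_size: 100, batches: 200)
```

With a limit of 10, no more than 10 batches ever sit in the buffer regardless of the lag, whereas a limit of 100 allows ten times as much data to pile up.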

Documentation

Our Wiki has been updated to reflect the 1.3 status. Please notify us if you find any incompatibilities.

Getting started with Karafka

If you want to get started with Kafka and Karafka as fast as possible, then the best idea is to just clone our example repository:

Many of you may not be aware, but Ruby 3 is not only a distant, abstract concept. Ruby 3 is an end goal of a process that is already pretty advanced.

This week I had a chance to participate in a non-public Ruby 3 gathering/hack challenge that was organized by Miles Woodroffe and Cookpad in Bristol.

The goal of events like this one is to gather Ruby core team members as well as many of the prominent Ruby community developers and speakers in one place, to present Ruby 3 work progress, other Ruby improvements, gather feedback, exchange ideas and learn.

People see Ruby as a language that is being developed in separation from the community outside of Japan. It’s hard for me to form an opinion on this, as even with how things are now, it’s not that hard to keep track of all of the changes. However, I do believe that gatherings like the one in Bristol bind the community together, especially since they connect core Ruby developers with people who are building many of the Ruby ecosystem components.

In this article, I will try to cover for you some of the things that happened as well as some of my opinions on the state of Ruby 2.7 and Ruby 3.

Organization

The event was divided into three main parts:

Workshops – hacking Ruby 2.7 with all the current development improvements and fixing or reviewing some of them in detail

Presentations – presentations from Ruby core team members on their recent work related to Ruby’s future

However, not many people know that, under the hood, code like the above might create hundreds of thousands of objects per second when the data sets are big enough. I worry that, with new syntactic sugar, people will use this approach more often, thus sometimes slowing their code a lot just by not understanding what is going on under the hood.
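The allocation cost is easy to observe with plain #method calls, which the proposed “dot colon” syntax was sugar for. The sketch below uses only core Ruby; double and the loop size are arbitrary choices made for illustration.

```ruby
def double(value)
  value * 2
end

# Every call to #method builds a brand-new Method object.
first = method(:double)
second = method(:double)
same_object = first.equal?(second)

# Referencing the method inside a hot loop allocates on every iteration...
per_iteration = Array.new(1_000) { method(:double) }
distinct = per_iteration.map(&:object_id).uniq.size

# ...while hoisting the reference out of the loop allocates it only once.
doubler = method(:double)
doubled = [1, 2, 3].map(&doubler)
```

Here same_object is false and distinct equals the number of iterations, which is why a method reference buried inside a per-message or per-row block can quietly generate a lot of garbage.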

I’ve been working with Koichi-san on introducing an internal method cache for the “dot colon” method references. I will write a separate blog post about that soon and explain why in the end it was not merged into Ruby and what we can do to counter that issue.

Apart from that, I’ve been working on dry-monitor with Anton Davydov, as it is really rare to be able to meet him and Solnic at the same time, in the same place. We did a lot of conceptual work that will allow for shipping some exciting features in the future.

“Nothing new” one may say. Many of the things presented were already announced or introduced during various conferences before. What was different is the fact that the core members had way more time to answer questions and to get involved in discussions.

Due to the type of work that I do at Castle and my OSS work, I was particularly interested in new concurrency models for Ruby.

Koichi-san presented concepts like Auto-Fibers, Threadlets, Guilds / Isolates as well as ideas on where and when each of them could be used. If at least part of the solution hits Ruby 3.0, we might see significant performance boosts for many things.

Discussions

Here are the discussed things that I considered significant:

Matz is aware of the “pipeline operator problem.”

Matz calls it a chain operator, and he is considering either changing the syntax or postponing this change altogether.

Matz does not believe that more ways of expressing the same concepts in the language increase its entropy.

Matz is against deprecating by “gemifying” non-syntax potential incompatibilities (see ERB old and new API).

Guilds API is not stable; thus, for now, there is no way to mimic that feature with threads.

Ko1’s Guilds API Ruby branch is not workable, and the progress on Guilds is not too fast.

Global Object Space is a problem for Guilds in the context of memory allocation.

There is no way to assess memory fragmentation without taking dumps. Noah Gibbs suggested a solution that could allow cheap runtime estimation of that value. However, I did not yet verify his idea.

Gradual Write Barrier insertion should allow further memory optimizations while maintaining compatibility with the C API.

Summary

It is worth keeping in mind that one of the things that make Ruby a productive tool is the availability of libraries and pre-brewed solutions.

Gatherings like this one allow library creators and maintainers to get a bit more insight into the current and future development of features and improvements that could be used to build even more amazing libraries.

At the moment, I’m disappointed only about the fact that the Guilds API is not yet ready even as a concept. I do understand the reasons, but having a “more or less” frozen API would allow me to mimic it with native threads and make things like Karafka “Guilds ready”™. Without such a piece of information, none of the lib builders knows what to expect. I do fear that if Guilds are not presented upfront, we might end up having Ruby 3 with a feature that won’t be supported by the majority of the main libs for a long time.

After the gathering, I also don’t share Paweł Świątkowski’s worries that much anymore.

For example, what he calls an NIH syndrome is, in my opinion, more of a cautious approach towards building things that will have to be maintained for a long time (and knowing Matz’s approach – probably forever).

There shouldn’t be a single component of the language that couldn’t be debugged or fixed by at least one of the core members. It applies to things like GC and memory allocators, but also to any new stuff like pattern matching. In the end, who would want things that could break Ruby but that couldn’t be fixed fast enough? The same applies to the chain operator (that, in my opinion, is useless).

Is Ruby on a good road to becoming something more than it is now? Definitely yes. However, it is a bumpy one with many challenges on the horizon.