Why Do We Need a New Format?

Many developers and operators have strong opinions about software configuration formats. Debates about the pros and cons of a particular format center on readability, whether a format
supports comments, and so on.

Those are indeed valid concerns, but configuration files are not always handcrafted by humans. In the age of rising automation expectations, how easy a given format is to generate is rarely discussed.

Historically, RabbitMQ has used Erlang term files for configuration. Besides being the standard way of configuring Erlang-based systems, this format strikes a good balance of power and safety: any Erlang data structure can be used,
including arbitrary nesting, yet arbitrary code cannot be evaluated.

That format, however, also has a few downsides that became obvious once the project had accumulated a certain critical mass of users:

It's not familiar to those getting started with RabbitMQ

It has subtle aspects such as required trailing dots and commas that confuse beginners

Arbitrary nesting can be powerful and sometimes necessary but it also can greatly complicate config file generation

In some cases familiarity with different Erlang data types was necessary (e.g. lists vs. binaries) for no good reason

Team RabbitMQ wanted to address all of those concerns, particularly the last one. Provisioning tools such as Chef and BOSH manage to generate functional config files, but that code is difficult to read and maintain,
which in turn means it is error-prone.

The New Format
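For example, a one-line sketch (the key comes from the shipped schema; the value is illustrative):

```ini
heartbeat = 30
```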

This overrides the default heartbeat value
offered by the server, setting it to 30 seconds.

Most settings use a single line, with the configuration key and value separated by an
equals sign and zero or more spaces. Such formats have been around for
decades and are known to be fairly readable for humans.

Here's a slightly longer example:
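A sketch using keys from the shipped schema (values are illustrative):

```ini
listeners.tcp.default = 5672
heartbeat = 30
```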

In addition to the heartbeat setting, it also configures a TCP listener
to use port 5672 and bind to all available interfaces.

Settings can be structured (logically grouped) using dots. For example,
all plain TCP (as opposed to TCP plus TLS) listener settings are grouped
under listeners.tcp.*.

Here's how TLS certificates and key are configured in the new format:
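A sketch using the ssl_options.* keys from the shipped schema; the file paths are placeholders:

```ini
listeners.ssl.default = 5671

ssl_options.cacertfile = /path/to/ca_certificate.pem
ssl_options.certfile   = /path/to/server_certificate.pem
ssl_options.keyfile    = /path/to/server_key.pem
```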

Compare this to the same settings in the classic (Erlang terms) format:
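A sketch of the equivalent classic config, with the same placeholder paths:

```erlang
[
 {rabbit, [
   {ssl_listeners, [5671]},
   {ssl_options, [{cacertfile, "/path/to/ca_certificate.pem"},
                  {certfile,   "/path/to/server_certificate.pem"},
                  {keyfile,    "/path/to/server_key.pem"}]}
 ]}
].
```

Note the trailing dot and the nesting, two of the aspects that trip up both beginners and config generators.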

Besides being easier to read, the new version is much easier to generate.

It also has one less obvious improvement: values are now
validated against a schema. For path values such as the private key path, this means
that should a file not be found or not be readable, the node will
immediately report it and refuse to start. Previously the node would
start but the files would fail to load at runtime, which is a great
way to confuse deployment and monitoring tools.

Fields that expect numerical values will refuse to accept strings, and so on.
The new format offers some of the benefits of static typing, which is not
the case with many commonly used formats.

Collections

Single value keys are trivial to configure in this format. But what about
collections? For example, it's possible to configure more than
one TCP listener. It is also possible to list cluster nodes for
peer discovery purposes. How does this format account for that?

The new format supports collections that are maps (dictionaries). For values
that are arrays or sets, the keys are ignored. Here's how to specify a list of nodes
for peer discovery:
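A sketch with placeholder node names:

```ini
cluster_formation.classic_config.nodes.1 = rabbit@hostname1
cluster_formation.classic_config.nodes.2 = rabbit@hostname2
cluster_formation.classic_config.nodes.3 = rabbit@hostname3
```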

The keys in this example are 1, 2 and so on. Any keys can be used. Sequentially
growing numbers are easy to generate, so that's what our documentation examples use.

The new and classic config formats can also be used together; the two are then merged. How is this possible? The trick is in translating
the new format into the old one, which we will cover next.

Alternatively, it is possible to use only the legacy config format. This can make sense
during a transition period, for example.

How it Works

As mentioned above, the new format is translated into the classic one
under the hood since that's what a lot of libraries, including in Erlang/OTP, expect.
The translation is done by a tool called Cuttlefish,
originally developed by Basho Technologies. On boot, RabbitMQ nodes
do the following:

Collect config schema files from all plugins

Run Cuttlefish to perform the translation

Combine the result with the advanced.config file

Load the final config

For both RabbitMQ core and plugins the process is entirely transparent. All the
heavy lifting is done by a number of functions that form a translation schema.
Cuttlefish does the parsing and invokes schema functions to perform validation
and translation.

Plugin Configuration

Plugins that have configurable settings now ship their own schemas that are extracted
and incorporated into the main one on node boot.

Here's what management plugin configuration might look like:
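A sketch; the management.listener.* keys shown come from the plugin's 3.7-era schema, and the values are illustrative:

```ini
management.listener.port = 15672
management.listener.ssl  = false
```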

The schema file for management.* keys is provided by the management plugin.

Conclusion

This new format makes RabbitMQ config files more familiar and readable
to humans and easier for tools to generate, and it introduces value validation against an extensible schema.
Plugins can ship their own config schema files and benefit from the new format.

It is still possible to use the previous format or to combine the two. We believe that
the new format can cover the proverbial 80% of use cases, though.

Why Do We Need Peer Discovery?

Users of open source data services such as RabbitMQ have increasing
expectations around operations automation. This includes so-called Day
1 operations: initial cluster provisioning.

When a RabbitMQ cluster is first formed, newly booting nodes need
to have a way to discover each other. In versions up to and including 3.6.x there were
two ways of doing this:

CLI tools

A list of nodes in configuration file

The former option is used by some provisioning tools but is generally
not very automation friendly. The latter is more convenient but
has its own limitations: the set of nodes is fixed and changing it requires
a config file redeployment and node restart.

A Better Way

There is a third option and it has been around in the community for a few years:
rabbitmq-autocluster, a plugin
originally developed by Gavin Roy.
That plugin modifies the RabbitMQ boot process and injects a peer discovery step.
The list of peers in this case doesn't have to come from the config file:
it can be retrieved from an AWS autoscaling group
or an external tool such as etcd.

The rabbitmq-autocluster authors concluded that there is no one true way of
performing peer discovery and that different approaches make sense for different
deployment scenarios. As such, they introduced a pluggable interface.
A specific implementation of this pluggable interface is called a peer
discovery mechanism. Given the explosion of platforms and deployment automation
stacks in the last few years, this turned out to be a wise decision.

For RabbitMQ 3.7.0 we took rabbitmq-autocluster and integrated its
main ideas into the core with some modifications influenced by our
experience supporting production RabbitMQ installations and community
feedback.

How Does it Work?

When a node starts and detects it doesn't have a previously
initialised database, it will check if there's a peer
discovery mechanism configured. If that's the case, it will
then perform the discovery and attempt to contact each
discovered peer in order. Finally, it will attempt to join the
cluster of the first reachable peer.

Some mechanisms assume all cluster members are known ahead of time (for example, listed
in the config file), others are dynamic (nodes can come and go).

RabbitMQ 3.7 ships with a number of mechanisms:

AWS (EC2 instance tags or autoscaling groups)

Kubernetes

etcd

Consul

Pre-configured DNS records

Config file

and it is easy to introduce support for more options in the future.

Since the ability to list cluster nodes in the config file is not new,
let's focus on the new features.

Node Registration and Unregistration

Some mechanisms use a data store to keep track of the node list.
Newly joined cluster members update the data store to indicate their presence.
etcd
and Consul are two options supported via
plugins that ship with RabbitMQ.

With other mechanisms cluster membership is managed out-of-band (by a mechanism that
RabbitMQ nodes cannot control). For example, the AWS mechanism uses EC2 instance
filtering or autoscaling group membership, both of which are managed and updated
by AWS.

Using a Preconfigured Set

First we have to tell RabbitMQ to use the classic config mechanism for peer discovery.
This is done using the cluster_formation.peer_discovery_backend key.
Then list one or more nodes using cluster_formation.classic_config.nodes, which is a collection:
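A sketch with placeholder node names:

```ini
cluster_formation.peer_discovery_backend = classic_config

cluster_formation.classic_config.nodes.1 = rabbit@hostname1
cluster_formation.classic_config.nodes.2 = rabbit@hostname2
```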

For the AWS mechanism, the operator instead needs to provide a set of tags to filter EC2 instances on. The tags are key/value pairs,
which means it is possible to filter on more than one tag: for example, rabbitmq plus a cluster name
or an environment type (e.g. development, test or production).

Here's a complete config example that uses 3 tags, region, service and environment:
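A sketch; the region and tag values are illustrative:

```ini
cluster_formation.peer_discovery_backend = aws

cluster_formation.aws.region = us-east-1

cluster_formation.aws.instance_tags.region = us-east-1
cluster_formation.aws.instance_tags.service = rabbitmq
cluster_formation.aws.instance_tags.environment = staging
```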

We are all set with this example. The only thing left to discuss is how to handle a natural race
condition that occurs when a cluster is first formed and node listing therefore can only
return an empty set. This will be covered in a separate section below.

IAM Roles and Permissions

If an IAM role is assigned
to the EC2 instances running RabbitMQ nodes, a policy has to be used to allow said instances
to query the EC2 API for instance listing. Here's an example of such a policy:
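A sketch of such a policy; it grants the instance and autoscaling group listing permissions the plugin relies on:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "ec2:DescribeInstances"
      ],
      "Resource": ["*"]
    }
  ]
}
```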

Without this policy in place the AWS peer discovery plugin won't be able to list instances and
discovery will fail. When discovery fails, the node will consider it to be a fatal condition
and terminate.

Node Names

By default node names with AWS peer discovery will be computed using private hostnames.
It is possible to switch to private IP addresses as well:

cluster_formation.aws.use_private_ip = true

The Chicken and Egg Problem of Peer Discovery

Consider a deployment where the entire cluster is provisioned at once
and all nodes start in parallel. For example, they may have been just
created by BOSH or an AWS CloudFormation template. In this case
there's a natural race condition during registration: more than
one node can become "first to register" (discover no existing
peers and thus start as standalone).

Different peer discovery backends use different approaches to minimize
the probability of such a scenario. Some acquire a lock with their
data service (etcd, Consul) and release it after registering, retrying
if lock acquisition fails.

Others use a technique known as randomized startup delay. With
randomized startup delay nodes will delay their startup for a randomly
picked value (between 5 and 60 seconds by default).
While this strategy may seem naive at first, it works quite well in practice
with sufficiently high max delay intervals. It is also used for leader election
in some distributed system algorithms, for example, Raft.

Some backends (config file, DNS) rely on a pre-configured set of peers
and do not suffer from this issue since when a node attempts to join
its peer, it will continue retrying for a period of time.

What Peer Discovery Does not Do

Peer discovery was introduced to solve a narrow set of problems. It does not
change how RabbitMQ clusters operate once formed. Even though some mechanisms
introduce additional features,
some problems (shared secret distribution and monitoring, for example)
should be solved by different tools.

Peer discovery is only performed by blank (uninitialised) nodes. If a
node previously was a cluster member, it will try to contact its "last
seen" peer on boot for a period of time, and no peer
discovery will be performed. This is no different from how earlier
RabbitMQ versions worked in this scenario.

Peer Discovery Troubleshooting

Reasoning about an automated cluster formation system that also
uses a peer discovery mechanism that has external dependencies
(e.g. AWS APIs or etcd) can be tricky. For this reason all peer
discovery implementations log key decisions and most log all external
requests at debug log level. When in doubt, enable debug logging
and take a look at node logs!

And keep in mind what's covered in the above section on when
peer discovery is not meant to kick in.

Differences from rabbitmq-autocluster

While the new peer discovery subsystem is similar to rabbitmq-autocluster
in many ways, there are a couple of important differences that matter
to operators.

With rabbitmq-autocluster, nodes reset themselves before joining
their peers. This makes sense in some environments and doesn't in others.
Peer discovery in RabbitMQ core does not do this.

rabbitmq-autocluster allows environment variables to be used
for mechanism-specific configuration in addition to RabbitMQ
config file. While this feature was retained to simplify migration,
it should be considered deprecated by the peer discovery subsystem
in 3.7.0.

Peer discovery in the core uses the new configuration file format
heavily. rabbitmq-autocluster does not support that format, as it is
now effectively a 3.6.x-only plugin.

Future Work

Most major aspects of the peer discovery subsystem described in this
post have a few years of battle testing via rabbitmq-autocluster. However,
as more and more users adopt this feature in more and more environments,
new feedback from a broader set of users and use cases accumulates.

Currently one open-ended question is whether the inability to contact
an external service used by a peer discovery mechanism (e.g. an AWS API endpoint,
etcd or DNS) should immediately be considered a fatal failure that makes
the node stop, or whether peer discovery should be retried for a period of time.
Your feedback is welcome on the RabbitMQ mailing list.

What’s New in RabbitMQ 3.7
http://www.rabbitmq.com/blog/2018/02/05/whats-new-in-rabbitmq-3-7/
Published Mon, 05 Feb 2018 10:05:25 +0000 by Michael Klishin

After over 1 year in the works, RabbitMQ 3.7.0 has quietly shipped
right before the start of the holiday season. The release was heavily
inspired by the community feedback on 3.6.x. In this post we’d like to
cover some of the highlights in this release.

RabbitMQ 3.7.0 focuses on automation friendliness and
operability.

New Configuration Format

Let's start with the new configuration
format. Historically RabbitMQ has used Erlang term files for
configuration. We will cover the pros and cons of this in a separate
blog post. Most importantly the classic format is hard to generate,
which complicates automation.

The new format is heavily inspired by sysctl and ini files. It is
easier to read for humans and much easier to generate for provisioning
tools.

In addition to being friendlier to both humans and machines,
the new config format includes validation for keys and certain value
types such as file paths. Should a certificate or key file
not exist, the node will report it and fail to start. The same goes for
unknown or misspelled keys.

Expect a more detailed post about the new format in the future.

Peer Discovery Subsystem

When a RabbitMQ cluster is first formed, newly booting nodes need
to have a way to discover each other. In versions up to and including 3.6.x there were
two ways of doing this:

CLI tools

A list of nodes in configuration file

The former option is used by some provisioning tools but is generally
not very automation friendly. The latter is more convenient but
has its own limitations: the set of nodes is fixed and changing it requires
a config file redeployment and node restart.

There is a third option and it has existed in the community for a few years:
rabbitmq-autocluster by Gavin Roy.
That plugin modifies the RabbitMQ boot process and makes peer discovery more
dynamic: for example, the list of peers can be retrieved from an AWS autoscaling group
or an external tool such as etcd.

For RabbitMQ 3.7.0 we took rabbitmq-autocluster and integrated its
main ideas into the core with some modifications inspired by our
experience with production RabbitMQ installations and community
feedback.

The result is a new peer discovery subsystem which will be covered
in a separate blog post. It supports a number of mechanisms and platforms:

AWS (EC2 instance tags or autoscaling groups)

Kubernetes

etcd

Consul

Pre-configured DNS records

Config file

and makes it easy to introduce support for more options in the future.

Distributed Management Plugin

Statistics database overload was a major pain point in earlier
releases. It had to do with the original management plugin design
which delegated stats collection and aggregation for the entire cluster
to a single dedicated node. No matter how efficient that node is, this
has scalability limitations.

At some point this problem accounted for a significant portion of
the support tickets and mailing list threads, so it was decided that
a significant and breaking management plugin redesign was warranted.

In the new design, each node hosts and aggregates its own stats, and
requests data from other nodes as needed when an HTTP API request
comes in. We now have close to a year's worth of support data and user
feedback, and we are happy to report that stats DB overload is effectively no
longer an issue.

Redesigned CLI Tools

One long standing limitation of RabbitMQ CLI tools was that
plugins could not extend them. This changes with the 3.7.0 release.
Plugins such as Shovel and Federation now can provide their own commands
that help operators assess the state of the system and manage it.

rabbitmq-diagnostics is a new command for operators that includes
some of the commands previously available in rabbitmqctl as well as
new ones. The list of diagnostics commands will continue to grow
based on user feedback on our mailing list.

Proxy Protocol Support

It's fairly common for clients to connect to RabbitMQ nodes via a proxy
such as HAproxy or AWS ELB. This created a complication for operators:
real client IP addresses were no longer known to the nodes and therefore
could not be logged, displayed in the management UI, and so on.

Fortunately a solution to this problem exists and is supported by
some of the most popular proxy tools: the Proxy protocol.
Starting with 3.7.0, RabbitMQ supports Proxy protocol if the operator
opts in. It requires a compatible proxy but no client library changes.
Per Proxy protocol spec requirements, when the protocol is enabled,
direct client connections are no longer supported.

Cross-protocol Shovel

The Shovel plugin now supports AMQP 1.0 endpoints in both directions (as a source
and as a destination). This means that Shovel can now move messages from an AMQP 1.0-only broker to RabbitMQ, or vice versa.

Per-vhost Message Stores

Starting with 3.7.0, each virtual host gets its own message store
(actually, two stores). This was primarily done to improve resilience
and limit potential message store failures to an individual vhost
but it can also improve disk I/O utilization in environments
where multiple virtual hosts are used.

Other Noteworthy Changes

The minimum required Erlang/OTP version is now 19.3. We highly
recommend at least 19.3.6.5. That release contains fixes to two
bugs that could prevent nodes with active TCP connections from shutting down,
which in turn could greatly complicate automated upgrades. That version
together with 20.1.7 and 20.2.x contain a fix for the recently disclosed
ROBOT TLS attack.

During the 3.7 development cycle we introduced a new versioning scheme for clients.
Client library releases for Java and .NET are no longer tied to those of RabbitMQ
server. This allows clients to evolve more rapidly and follow a versioning
scheme that makes sense for them. Both Java and .NET clients are into
their 5.x versions by now, and include important changes that warrant
a major version number bump, for example, lambda and .NET Core support.

Package Distribution Changes

Starting with 3.7.0, RabbitMQ packages (binary artifacts) are distributed using
three services:

Package Cloud provides Debian (apt) and Yum (RPM) repositories

Bintray provides package downloads as well as Debian and Yum (RPM) repositories

GitHub releases include all release notes and provide a backup package download option

If you currently consume packages from rabbitmq.com, please switch to one of the options above.

Unlike rabbitmq.com's legacy apt repository, Package Cloud and Bintray provide package versions older
than the most recent one. And, of course, now there are official Yum repositories for RabbitMQ itself
as well as our zero dependency Erlang/OTP RPM package.

New Reactive Client for RabbitMQ HTTP API
http://www.rabbitmq.com/blog/2017/10/18/new-reactive-client-for-rabbitmq-http-api/
Published Wed, 18 Oct 2017 11:55:48 +0000 by Arnaud Cogoluègnes

The RabbitMQ team is happy to announce the release of version 2.0 of Hop, the RabbitMQ HTTP API client for Java and other JVM languages. This new release introduces a new reactive client based on Spring Framework 5.0 WebFlux.

Reactive what?

As stated in Spring Framework WebClient documentation:

The WebClient offers a functional and fluent API that takes full advantage of Java 8 lambdas. It supports both sync and async scenarios, including streaming, and brings the efficiency of non-blocking I/O.

This means you can easily chain HTTP requests and transform the result, e.g. to calculate the total rate for all virtual hosts in a RabbitMQ broker:
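A sketch of such a chain; ReactorNettyClient is the Hop 2.0 reactive client, but the exact accessor names on the returned vhost objects are assumptions here, not verified API:

```java
// URL, credentials and accessor names are illustrative assumptions
ReactorNettyClient client =
    new ReactorNettyClient("http://guest:guest@localhost:15672/api");

// Sum the per-vhost message rates into a single Mono<Double>
Mono<Double> totalRate = client.getVhosts()
    .map(vhost -> vhost.getMessagesDetails().getRate())
    .reduce(0.0, (total, rate) -> total + rate);
```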

This also means you can build a fully reactive dashboard application to monitor a farm of RabbitMQ clusters. Thanks to the underlying Reactor Netty library, the dashboard application will use as few resources as possible (HTTP connection pooling, non-blocking I/O).

RabbitMQ Java Client 5.0 is Released
http://www.rabbitmq.com/blog/2017/09/29/rabbitmq-java-client-5-0-is-released/
Published Fri, 29 Sep 2017 09:07:28 +0000 by Arnaud Cogoluègnes

The RabbitMQ team is happy to announce the release of version 5.0 of the RabbitMQ Java Client. This new release is now based on Java 8 and comes with a bunch of interesting new features.

Java 8 is Now a Pre-requisite

RabbitMQ Java Client has supported Java 6 (released in 2006!) for many years. It was time to bump the prerequisites to benefit from modern Java features. No need to worry for those stuck on Java 6 or Java 7: we will support the Java Client 4.x series for the upcoming months (bug fixes and even relevant new features if possible). Note that Java Client 5.0 (as well as 4.x) also supports Java 9.

Spring Cleaning

Some classes and interfaces proved to be less relevant these days and were marked as deprecated in the previous major versions: this is the case for FlowListener and QueueingConsumer (among others). They have been removed in 5.0.

New Lambda-oriented Methods

Lambda-oriented methods have been introduced for common use cases, e.g. to consume a message:
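A sketch, assuming an open Channel and a pre-declared queue named my-queue (both placeholders):

```java
boolean autoAck = true;
channel.basicConsume("my-queue", autoAck,
    // DeliverCallback: invoked for each delivery
    (consumerTag, delivery) ->
        System.out.println("received: " + new String(delivery.getBody(), "UTF-8")),
    // CancelCallback: invoked if the consumer is cancelled
    consumerTag -> System.out.println("consumer cancelled: " + consumerTag));
```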

Other lambda-oriented methods are also available for most of the client listeners. This should make relevant application code more concise and readable.

More Flexibility to Specify Client Certificate

In Java, a client certificate is presented through an SSLContext's KeyManager. If different client connections needed different client certificates in the RabbitMQ Java Client, they needed different instances of ConnectionFactory. In 5.0, we introduced the SslContextFactory:
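A sketch; the sslContexts lookup map is a hypothetical application-side registry of SSLContext instances keyed by connection name:

```java
ConnectionFactory factory = new ConnectionFactory();

// Decide which SSLContext to use based on the connection name
factory.setSslContextFactory(name -> sslContexts.get(name));

// The name given here is passed to the SslContextFactory above
Connection connection = factory.newConnection("order-service-1");
```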

You can now set your own SslContextFactory in the ConnectionFactory to provide the logic based on the connection name to create the appropriate SslContext for this connection. The SslContextFactory implementation can look up certificates from a file system directory or from any other certificate repository (database, LDAP registry, etc). Combined with NIO (added in 4.0), this is a great way to have many client connections in a single JVM process that uses only a few threads.

Breaking Changes

A major release is a good time to do some cleaning, as seen above, and to introduce new features. Unfortunately, those new features sometimes break the existing API. Cheer up: we strove to maintain backward compatibility, and most applications shouldn't be impacted by those changes. If in doubt, check the dedicated section in the release change log.

Wrapping Up

The RabbitMQ team hopes you'll enjoy this new version of the Java Client. Don't hesitate to consult the release change log for all the details. The binaries are available as usual from Maven Central and from our Bintray repository. To use RabbitMQ Java Client 5.0, add the following dependency if you're using Maven:
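The Maven coordinates for the 5.0 release:

```xml
<dependency>
  <groupId>com.rabbitmq</groupId>
  <artifactId>amqp-client</artifactId>
  <version>5.0.0</version>
</dependency>
```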

Brand new rabbitmqctl in 3.7.0
http://www.rabbitmq.com/blog/2016/12/15/brand-new-rabbitmqctl-in-3-7-0/
Published Thu, 15 Dec 2016 14:37:15 +0000 by Daniil Fedotov

As of v3.7.0 Milestone 8, RabbitMQ ships with a brand new set of CLI tools (rabbitmqctl, rabbitmq-plugins, and more), reworked from the ground up. We had a few goals with this project:

We wanted to use a more user-friendly command line parser and produce more useful help and error messages.

CLI tools should be extensible from plugins: plugins such as management, federation, shovel and trust store all have functions that are meant to be invoked by CLI tools, but the only way of doing so was `rabbitmqctl eval`, which is error-prone and can be dangerous.

We wanted to give Elixir a try on a real project and make it easier for developers new to Erlang to extend the CLI functionality.

Our CLI tools historically didn't have good test coverage; the new ones should (and do).

CLI tools should be able to produce machine-friendly formats, be it JSON, CSV or something else; there was no internal infrastructure for doing that in the original implementation.

CLI tools should be a separate repository just like all plugins, client libraries, and so on.

Nine months later the experiment was declared a success and integrated into the RabbitMQ distribution.

There's also a longer document that covers new features and implementation decisions.

Metrics support in RabbitMQ Java Client 4.0
http://www.rabbitmq.com/blog/2016/11/30/metrics-support-in-rabbitmq-java-client-4-0/
Published Wed, 30 Nov 2016 15:11:09 +0000 by Arnaud Cogoluègnes

Version 4.0 of the RabbitMQ Java Client brings support for runtime metrics. This can be especially useful for understanding how a client application is behaving. Let's see how to enable metrics collection and how to monitor those metrics over JMX or even inside a Spring Boot application.

Metrics activation

Metrics are collected at the ConnectionFactory level, through the MetricsCollector interface. The Java Client comes with one implementation, StandardMetricsCollector, which uses the Dropwizard Metrics library.

Dropwizard Metrics is mature and widely used across the Java community, so we thought it'd make sense to make it our default metrics library. Nevertheless, you can come up with your own MetricsCollector implementation if your application has specific needs in terms of metrics.

Note metrics collection is disabled by default, so it doesn't impact users who don't want to have runtime metrics.

Here is how to enable metrics collection with the default implementation and then get the total number of published messages with all the connections created from this ConnectionFactory:

ConnectionFactory connectionFactory = new ConnectionFactory();
StandardMetricsCollector metrics = new StandardMetricsCollector();
connectionFactory.setMetricsCollector(metrics);
// later in the code
long publishedMessagesCount = metrics.getPublishedMessages().getCount();

Other available metrics are open connections and channels, consumed messages, acknowledged messages, and rejected messages.

Using metrics inside the application code usually doesn't make much sense: those metrics are rather meant to be sent to some monitoring backend. Fortunately, Dropwizard Metrics supports many of those: JMX, Graphite, Ganglia, even CSV file export. Let's see how to use it with JMX, a Java standard to manage and monitor applications.

With JMX

Dropwizard Metrics has built-in support for JMX thanks to the JmxReporter class. The MetricRegistry just needs to be shared between the StandardMetricsCollector and the JmxReporter:
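A sketch; the JMX domain name is illustrative:

```java
MetricRegistry registry = new MetricRegistry();
StandardMetricsCollector metrics = new StandardMetricsCollector(registry);

ConnectionFactory connectionFactory = new ConnectionFactory();
connectionFactory.setMetricsCollector(metrics);

// Publish everything in the shared registry as JMX MBeans
JmxReporter reporter = JmxReporter.forRegistry(registry)
    .inDomain("com.rabbitmq.client.jmx")
    .build();
reporter.start();
```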

As soon as Spring Boot detects Dropwizard Metrics on the classpath, it creates a MetricRegistry bean. It doesn't take long to retrieve this bean and use it in our StandardMetricsCollector. Note that Spring Boot automatically creates the necessary resources (RabbitMQ's ConnectionFactory and Spring AMQP's CachingConnectionFactory) if it detects Spring AMQP on the classpath. The glue code to add to a configuration class isn't obvious, but it's at least straightforward:
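One possible shape for that glue code (a sketch; the exact bean wiring may differ between Spring Boot versions):

```java
@Configuration
public class RabbitMetricsConfiguration {

    // The MetricRegistry bean is created by Spring Boot when
    // Dropwizard Metrics is on the classpath
    @Bean
    public StandardMetricsCollector metricsCollector(MetricRegistry registry) {
        return new StandardMetricsCollector(registry);
    }

    // Plug the collector into the RabbitMQ ConnectionFactory that
    // Spring AMQP's CachingConnectionFactory wraps
    @Bean
    public InitializingBean metricsWiring(CachingConnectionFactory cf,
                                          StandardMetricsCollector collector) {
        return () -> cf.getRabbitConnectionFactory().setMetricsCollector(collector);
    }
}
```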

Neat, isn't it? Plugging the RabbitMQ Java Client metrics into the /metrics endpoint is explicit right now, but this should hopefully become automatic (as in Spring Boot auto-configuration) as long as the appropriate conditions are met (Dropwizard Metrics and RabbitMQ Java Client version 4.0 or later on the classpath).

Wrapping up

Operators and developers can now have more insight into applications that use the RabbitMQ Java Client. The available metrics can tell you whether an application is operating normally or not. And thanks to Dropwizard Metrics' large range of supported backends, plugging the Java Client metrics into your favorite monitoring tool should be straightforward.

RabbitMQ Java Client 4.0 is released
http://www.rabbitmq.com/blog/2016/11/24/rabbitmq-java-client-4-0-is-released/
Published Thu, 24 Nov 2016 13:14:23 +0000 by Arnaud Cogoluègnes

The RabbitMQ team is happy to announce the release of version 4.0 of the RabbitMQ Java Client. This new release does not introduce any breaking changes and comes with a bunch of interesting new features.

New independent release process

From now on, the Java Client will be released separately from the broker. It'll make it easier and faster to ship bug fixes as well as new features.

Logging support with SLF4J

SLF4J is now used in several places of the Java Client to report logging messages. It's also used in the default exception handler implementation that ships with the client. This gives the application developer a large choice of logging implementations (e.g. Logback, Log4j) and a large choice of destinations to direct logs to (file, but also logging-specific protocols).

Metrics support

The Java Client can now gather runtime metrics such as the number of sent and received messages. The metrics are collected by default through the Dropwizard Metrics library, but collection is pluggable if you have some fancy requirements. Using Dropwizard Metrics gives the opportunity to use many monitoring backends out of the box: JMX, Spring Boot metrics endpoint, Ganglia, Graphite, etc.

A separate blog post will cover metrics support in depth.

Support for Java NIO

The Java Client has historically used the traditional Java blocking IO library (i.e. Socket and its Input/OutputStreams). This has worked for years but isn't suited to all kinds of workloads. Java NIO allows for a more flexible, yet more complex, model for handling network communication. Long story short, Java NIO makes it possible to handle more connections with fewer threads (the blocking IO mode always needs one thread per connection).

Don't think of Java NIO as some kind of turbo button: your client application won't be faster just because it uses NIO. If you use a lot of connections, it will likely be able to use fewer threads, that's all.

Note that blocking IO is still the default in the Java Client; you need to enable NIO explicitly. The NIO mode uses reasonable defaults, but you may have to tweak them for your workload through the NioParams class.
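Enabling NIO is a one-liner on the connection factory. Here's a minimal sketch (it needs the amqp-client jar and a reachable broker to actually run, and the NioParams tuning shown is an assumption for illustration, not a recommendation):

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.impl.nio.NioParams;

public class NioExample {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.useNio(); // switch from the default blocking IO to NIO
        // Optional tuning: share a fixed number of IO threads among all
        // connections created from this factory (workload-dependent).
        factory.setNioParams(new NioParams().setNbIoThreads(4));
        try (Connection connection = factory.newConnection()) {
            // use the connection...
        }
    }
}
```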

Automatic recovery enabled by default

Automatic recovery has been available for a few years now, and we know that many users always enable it, so we've decided to enable it by default. You can still choose not to use it, but you'll need to disable it explicitly.
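Opting out is a single setter on the connection factory; a minimal sketch (assuming the 4.0 ConnectionFactory API, and requiring the amqp-client jar to compile):

```java
import com.rabbitmq.client.ConnectionFactory;

public class DisableRecoveryExample {
    public static void main(String[] args) {
        ConnectionFactory factory = new ConnectionFactory();
        // Automatic connection recovery is now on by default;
        // opt out explicitly if you handle reconnection yourself.
        factory.setAutomaticRecoveryEnabled(false);
    }
}
```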

Miscellaneous goodies and fixes

This new release also comes with its share of goodies and fixes. Take a look at the AddressResolver interface, for instance: it's an abstraction for resolving the RabbitMQ hosts you want to connect to. Combined with automatic recovery, you end up with a robust client that can reconnect to nodes that weren't even up when it started in the first place.

The RabbitMQ Java Client version 4.0 is available on Maven Central (as well as on our Bintray repository). To use it, add the following dependency if you're using Maven:
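The coordinates below should do it; the client's usual group and artifact ids are com.rabbitmq:amqp-client, and the version shown is assumed to be the 4.0 release:

```
<dependency>
  <groupId>com.rabbitmq</groupId>
  <artifactId>amqp-client</artifactId>
  <version>4.0.0</version>
</dependency>
```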

What’s new in RabbitMQ 3.6.0
http://www.rabbitmq.com/blog/2015/12/28/whats-new-in-rabbitmq-3-6-0/
Mon, 28 Dec 2015, by Alvaro

We are pleased to announce the immediate availability of RabbitMQ
3.6.0, a new version of the broker that comes packed with lots of
new features. Before we go on, you can obtain it here:
http://www.rabbitmq.com/download.html.

This release brings many improvements in broker features, development
environment for our contributors, and security. Let’s take a look at
some of the most significant ones.

Features

There are quite a few new features and improvements in RabbitMQ
3.6.0, but from my point of view the most important one is lazy
queues. Disclaimer: the author of this blog post worked on this
feature.

Lazy Queues

This new type of queue works by sending every message delivered to it
straight to the file system, and only loading messages into RAM when
consumers arrive at the queue. To optimize disk reads, messages are
loaded in batches.

This approach has a few advantages over the old one. RabbitMQ’s
default queues keep a cache of messages in memory for fast delivery
to consumers. The problem with this cache is that if consumers aren’t
fast enough, or go completely offline, more and more messages will be
held in RAM, which at some point triggers the algorithm that makes
the queue page messages to disk. Even though in previous releases we
improved the paging algorithm,
paging can still block the queue process, which could result in
credit flow
kicking in, which ends up blocking publishers.

With lazy queues there’s no paging since, as stated above, all
messages are sent straight to disk. Our tests have shown that this
results in more even throughput for queues, even when consumers are
offline.

Another advantage of lazy queues is the reduced RAM usage due to the
elimination of the message cache mentioned above.

Finally, lazy queues can be enabled and disabled at runtime. You can
use policies to
convert queues from the default mode to lazy queues, and back again
if you feel the need for it.
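As a sketch, converting queues with a policy could look like this with rabbitmqctl (the policy name and queue name pattern are made up for the example; the queue-mode policy key does the work, and a running node is required):

```
# Queues whose names start with "lazy." run in lazy mode;
# deleting the policy converts them back to the default mode.
rabbitmqctl set_policy Lazy "^lazy\." '{"queue-mode":"lazy"}' --apply-to queues

# Back to the default mode:
rabbitmqctl clear_policy Lazy
```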

Faster Mirror Queue Synchronization

Synchronization between queue mirrors has been greatly improved.
Before RabbitMQ 3.6.0, the synchronization algorithm would send one
message at a time to the mirrors that were out of sync. This has been
improved by implementing batch publish operations inside RabbitMQ’s
queues.

During development, our tests showed that for a queue with one million
messages the old algorithm would take approximately 60 seconds for a
full sync, while the new algorithm takes around 10 seconds for the
same number of messages.

Moving to Git

During a big part of this year, our development moved completely from
our self-hosted Mercurial repository to a Git-based workflow hosted
on GitHub. This has greatly improved our productivity as a team,
making it easier to work on new features and to get feedback from
colleagues.

Better still, it is now much easier for RabbitMQ users to send their
contributions back to us.

This release comes with quite a few improvements to the broker
contributed directly by six different external contributors. Of
course, we want that number to grow.

Move to Erlang.mk

RabbitMQ as a project predates popular build tools from the Erlang
ecosystem like Rebar or Erlang.mk, so we had our own way to build
the broker and manage Erlang dependencies. This was unfortunate,
since it made it a little bit harder to integrate external libraries
with RabbitMQ and, at the same time, complicated things for other
Erlang users who wanted to use RabbitMQ libraries. Just take a look
at this GitHub search, where people are trying different ways to
integrate our very own gen_server2 into their projects:
gen_server2 search

To improve the situation in this area, one of our colleagues worked
hard on a complete overhaul of our build system. We stayed with
make, a tried and tested tool, but we migrated to
Erlang.mk, a make based build system for the
Erlang world.

This improved how we handle dependencies, allowed us to remove a lot
of code that duplicated features already provided by Erlang.mk, and
even reduced build times!

Changing our build system means introducing breaking changes in how
RabbitMQ plugins are built. If you are a plugin author, you might want
to read our new
plugin development guide.

Security

Last but not least, let’s talk about improvements in security,
specifically in how passwords are handled in RabbitMQ. Before version
3.6.0, passwords were stored in RabbitMQ as an MD5 hash, which, in
this day and age, is less than ideal. We have now made SHA-256 the
default password hashing function, with SHA-512 available as an
out-of-the-box option.
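Selecting the hashing function is a configuration value; a rabbitmq.config sketch (assuming the module name shipped for SHA-512):

```
[{rabbit, [
  %% use SHA-512 instead of the SHA-256 default
  {password_hashing_module, rabbit_password_hashing_sha512}
]}].
```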

It’s also possible to add other hashing algorithms to RabbitMQ via
plugins. To add a new hashing algorithm you just need to implement the
Erlang behaviour
rabbit_password_hashing.erl,
which exposes only one function: hash/1.

If you create a new password hashing plugin, don’t forget to announce
it on our mailing list:
rabbitmq-users.

Conclusion

As you can imagine, we are really happy with this new RabbitMQ
release, which has laid the foundation on which we can continue to
improve RabbitMQ, now even closer to the community, thanks to standard
tools like Erlang.mk and a collaborative platform like GitHub.

Don’t forget to take a look at our full release notes and learn about
all the new features and bug fixes that ship with RabbitMQ 3.6.0:
release notes

New Credit Flow Settings on RabbitMQ 3.5.5
http://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
Tue, 06 Oct 2015, by Alvaro

In order to prevent fast publishers from overflowing the broker with
more messages than it can handle at any particular moment, RabbitMQ
implements an internal mechanism called credit flow that is used by
the various systems inside RabbitMQ to throttle down publishers while
allowing the message consumers to catch up. In this blog post we are
going to see how credit flow works and what we can do to tune its
configuration for optimal behaviour.

The latest version of RabbitMQ includes a couple of new configuration
values that let users fiddle with the internal credit flow
settings. Understanding how these work according to your particular
workload can help you get the most out of RabbitMQ in terms of
performance, but beware, increasing these values just to see what
happens can have adverse effects on how RabbitMQ is able to respond
to message bursts, affecting the internal strategies that RabbitMQ has
in order to deal with memory pressure. Handle with care.

To understand the new credit flow settings first we need to understand
how the internals of RabbitMQ work with regards to message publishing
and paging messages to disk. Let’s see first how message publishing
works in RabbitMQ.

Message Publishing

To see how credit_flow and its settings affect publishing, let’s see
how internal messages flow in RabbitMQ. Keep in mind that RabbitMQ is
implemented in Erlang, where processes communicate by sending messages
to each other.

Whenever a RabbitMQ instance is running, there are probably hundreds
of Erlang processes exchanging messages to communicate with each
other. We have, for example, a reader process that reads AMQP frames
from the network. Those frames are transformed into AMQP commands that
are forwarded to the AMQP channel process. If this channel is handling
a publish, it needs to ask a particular exchange for the list of
queues where this message should end up going, which means the channel
will deliver the message to each of those queues. Finally, if the AMQP
message needs to be persisted, the msg_store process will receive it
and write it to disk. So whenever we publish an AMQP message to
RabbitMQ, we have the following Erlang message flow[1]:

reader -> channel -> queue process -> message store.

In order to prevent any of those processes from overflowing the next
one down the chain, we have a credit flow mechanism in place. Each
process initially grants a certain amount of credit to the process
that sends it messages. Once a process has handled N of those
messages, it grants more credit to the process that sent them. Under
the default credit flow settings
(credit_flow_default_credit under rabbitmq.config), these values
are 200 messages of initial credit and, after every 50 messages
processed by the receiving process, 50 more credits granted to the
process that sent them.
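The mechanics can be sketched with a small simulation in plain Java (class and method names are made up; the real implementation lives in RabbitMQ’s credit_flow Erlang module):

```java
// Simulates RabbitMQ-style credit flow between two processes.
class Link {
    int credit;                 // credit the sender still has
    final int moreCreditAfter;  // grant size and threshold
    int processedSinceGrant = 0;

    Link(int initialCredit, int moreCreditAfter) {
        this.credit = initialCredit;
        this.moreCreditAfter = moreCreditAfter;
    }

    // Sender side: spend one credit per message, block at zero.
    boolean trySend() {
        if (credit == 0) return false; // sender is blocked
        credit--;
        return true;
    }

    // Receiver side: after moreCreditAfter messages are processed,
    // grant the same amount of credit back to the sender.
    void process() {
        if (++processedSinceGrant == moreCreditAfter) {
            credit += moreCreditAfter;
            processedSinceGrant = 0;
        }
    }
}

public class CreditFlowDemo {
    public static void main(String[] args) {
        // Defaults: {InitialCredit, MoreCreditAfter} = {200, 50}
        Link readerToChannel = new Link(200, 50);

        int sent = 0;
        while (readerToChannel.trySend()) sent++;
        System.out.println("sent before blocking: " + sent);

        // The channel works through 50 messages and grants credit back:
        for (int i = 0; i < 50; i++) readerToChannel.process();
        System.out.println("credit after grant: " + readerToChannel.credit);
    }
}
```

Blocking the sender here simply means trySend() returns false; in RabbitMQ the blocked Erlang process stops reading from its upstream instead.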

Say we are publishing messages to RabbitMQ. This means the reader
will send one Erlang message to the channel process per AMQP
basic.publish received. Each of those messages consumes one of the
credits granted to the reader by the channel. Once the channel has
processed 50 of those messages, it grants more credit to the
reader. So far so good.

In turn, the channel sends the message to the queue process that
matches the message routing rules. This consumes one credit from the
credit granted by the queue process to the channel. After the queue
process manages to handle 50 deliveries, it grants 50 more credits to
the channel.

Finally, if a message is deemed persistent (it’s published as
persistent to a durable queue), it is sent to the message store,
which also consumes credit, in this case from the credit granted by
the message store to the queue process. Here the initial values are
different and handled by the msg_store_credit_disc_bound
setting: 2000 messages of initial credit, and 500 more credits
after 500 messages are processed by the message store.
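Both pairs are {InitialCredit, MoreCreditAfter} tuples in rabbitmq.config; a sketch that simply spells out the defaults quoted above:

```
[{rabbit, [
  %% credit between reader, channel and queue processes
  {credit_flow_default_credit, {200, 50}},
  %% credit granted by the message store to each queue process
  {msg_store_credit_disc_bound, {2000, 500}}
]}].
```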

So now we know how messages flow inside RabbitMQ and when credit is
granted to a process upstream in the message flow. The tricky part
comes when credit has to be granted between processes. Under normal
conditions a channel will process 50 messages from the reader and
then grant the reader 50 more credits, but keep in mind that a
channel is not just handling publishes: it’s also sending messages to
consumers, routing messages to queues, and so on.

What happens if the reader sends messages to the channel faster than
the channel can process them? In that situation the channel blocks
the reader process, which results in producers being throttled down
by RabbitMQ. Under default settings, the reader is blocked once it
has sent 200 messages to the channel without the channel having
processed at least 50 of them, which is what is needed to grant
credit back to the reader.

Again, under normal conditions, once the channel manages to go through
the message backlog, it will grant more credit to the reader, but
there’s a catch. What if the channel process is being blocked by the
queue process, due to similar reasons? Then the new credit that
was supposed to go to the reader process will be deferred. The
reader process will remain blocked.

Once the queue process manages to go through the deliveries backlog
from the channel, it will grant more credit to the channel, unblocking
it, which will result in the channel granting more credit to the
reader, unblocking it. Once again, that’s under normal conditions,
but, you guessed it, what if the message store is blocking the queue
process? Then credit to the channel will be deferred, the channel
will remain blocked, and credit to the reader will be deferred in
turn, leaving the reader blocked. At some point, the message store
will grant credit to the queue process, which will grant credit back
to the channel, and the channel will finally grant credit to the
reader, unblocking it.

Having one channel and one queue process makes things easier to
understand, but it might not reflect reality. It’s common for RabbitMQ
users to have more than one channel publishing messages on the same
connection. Even more common is to have one message routed to more
than one queue. Under the credit flow scheme we’ve just explained, if
one of those queues blocks the channel, then the reader will be
blocked as well.

The problem is that from the reader’s standpoint, when we read a
frame from the network, we don’t even know which channel it belongs
to. Keep in mind that channels are a logical concept on top of AMQP
connections. So even if a new AMQP command will end up in a channel
that is not blocking the reader, the reader has no way of knowing
it. Note that we only block publishing connections; consumer
connections are unaffected, since we want consumers to drain messages
from queues. This is a good reason to have connections dedicated to
publishing messages and connections dedicated to consuming only.

In a similar fashion, whenever a channel is processing message
publishes, it doesn’t know where the messages will end up until it
performs routing. So a channel might be receiving a message that
should end up in a queue that is not blocking the channel. Since at
ingress time we don’t know any of this, the credit flow strategy in
place is to block the reader until the processes down the chain are
able to handle new messages.

One of the new settings introduced in RabbitMQ 3.5.5 is the ability
to modify the values of credit_flow_default_credit. This setting
takes a tuple of the form {InitialCredit, MoreCreditAfter}.
InitialCredit is set to 200 by default, and MoreCreditAfter is set
to 50. Depending on your particular workload, you need to decide
whether it’s worth bumping those values. Let’s see the message flow
scheme again:

reader -> channel -> queue process -> message store.

Bumping the values of {InitialCredit, MoreCreditAfter} means that
at any point in that chain we could end up with more messages than
the broker can handle at that particular point in time. More messages
means more RAM usage. The same can be said of
msg_store_credit_disc_bound, but keep in mind that there’s only one
message store[2] per RabbitMQ instance, while there can be many
channels sending messages to the same queue process. So while a queue
process gets an InitialCredit of 2000 from the message store, that
queue can be ingesting many times that value from different
channel/connection sources. So 200 credits as the initial
credit_flow_default_credit value might seem too conservative, but you
need to work out whether, for your workload, it is still good enough.

Message Paging

Let’s take a look at how RabbitMQ queues store messages. When a
message enters the queue, the queue needs to determine whether the
message should be persisted or not. If it does have to be persisted,
RabbitMQ will do so right away[3]. Even if a message was persisted to
disk, this doesn’t mean it was removed from RAM, since RabbitMQ keeps
a cache of messages in RAM for fast access when delivering messages
to consumers. Whenever we talk about paging messages out to disk, we
are talking about what RabbitMQ does when it has to move messages
from this cache to the file system.

When RabbitMQ decides it needs to page messages to disk, it calls the
function reduce_memory_use on the internal queue implementation in
order to send messages to the file system. Messages are paged out in
batches, and how big those batches are depends on the current memory
pressure. It basically works like this:

The function reduce_memory_use receives a number called the target
RAM count, which tells RabbitMQ that it should try to page out
messages until only that many remain in RAM. Keep in mind that
whether messages are persistent or not, they are still kept in RAM
for fast delivery to consumers. Only when memory pressure kicks in
are messages in memory paged out to disk. Quoting from our code
comments: “The question of whether a message is in RAM and whether it
is persistent are orthogonal”.

The messages accounted for during this calculation are those that are
in RAM (in the aforementioned cache), plus the pending acks kept in
RAM (i.e. messages that were delivered to consumers and are awaiting
acknowledgment). If we have 20000 messages in RAM (cache + pending
acks) and the target RAM count is set to 8000, we will have to page
out 12000 messages. This means paging will receive a quota of 12000
messages. Each message paged out to disk consumes one unit of that
quota, whether it’s a pending ack or an actual message from the
cache.
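The quota arithmetic can be sketched in a couple of lines (a hypothetical helper, not RabbitMQ code; the 20000 messages of the example are split into an assumed 14000 cached messages and 6000 pending acks):

```java
public class PagingQuota {
    // How many message entries must be paged out so that at most
    // targetRamCount of them remain in RAM.
    static int pagingQuota(int cacheSize, int pendingAcks, int targetRamCount) {
        return Math.max(0, cacheSize + pendingAcks - targetRamCount);
    }

    public static void main(String[] args) {
        // 20000 entries in RAM (cache + pending acks), target 8000
        System.out.println(pagingQuota(14000, 6000, 8000)); // 12000
    }
}
```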

Once we know how many messages need to be paged out, we need to
decide where to page them from first: pending acks, or the message
cache. If pending acks is growing faster than the cache, i.e. more
messages are being delivered to consumers than are being ingested,
the algorithm will try to page out pending acks first and then push
messages from the cache to the file system. If the cache is growing
faster than pending acks, then messages from the cache will be pushed
out first.

The catch here is that paging messages from pending acks (or from the
cache, if that comes first) might consume the entire quota of
messages to be pushed to disk. So if pending acks pushes 12000 acks
to disk, as in our example, we won’t page out messages from the
cache, and vice versa.

This first part of the paging process sent a certain number of
messages to disk (counting both acks and messages paged from the
cache). The messages that were paged out only had their contents
paged out; their position in the queue is still in RAM. Now the queue
needs to decide whether this extra information kept in RAM needs to
be paged out as well, to further reduce memory usage. Here is where
msg_store_io_batch_size finally enters into play (coupled with
msg_store_credit_disc_bound). Let’s try to understand how they work.

The settings for msg_store_credit_disc_bound affect how internal
credit flow is handled when sending messages to disk. The
rabbitmq_msg_store module implements a database that takes care of
persisting messages to disk. Some details about the whys of this
implementation can be found here:
RabbitMQ, backing stores, databases and disks.

The message store has a credit system for each of the clients that
send writes to it; every RabbitMQ queue is a read/write client of
this store. The credit mechanism prevents a particular writer from
overflowing the store’s inbox with messages. Assuming current default
values, when a writer starts talking to the message store, it
receives an initial credit of 2000 messages, and it receives more
credit once 500 messages have been processed. When is this credit
consumed, then? Credit is consumed whenever we write to the message
store, but that doesn’t happen for every message. The plot thickens.

Since version 3.5.0 it’s possible to embed small messages in the
queue index, instead of having to reach the message store for
them. Messages smaller than a configurable setting (currently 4096
bytes) go to the queue index when persisted, so they don’t consume
this credit. Now, let’s see what happens with messages that do need
to go to the message store.
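That threshold is itself configurable; a rabbitmq.config sketch keeping the default quoted above:

```
[{rabbit, [
  %% messages up to this size (in bytes) are embedded in the
  %% queue index instead of going to the message store
  {queue_index_embed_msgs_below, 4096}
]}].
```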

Whenever we publish a message that’s determined to be persistent (a
persistent message published to a durable queue), that message
consumes one of these credits. If a message has to be paged out to
disk from the cache mentioned above, it also consumes one credit. So
if during message paging we consume more credits than are currently
available for our queue, the first half of the paging process might
stop, since there’s no point in sending writes to the message store
when it won’t accept them. This means that of the initial quota of
12000 messages that we had to page out, we only managed to process
2000 (assuming all of them need to go to the message store).

So we managed to page out 2000 messages, but we still keep their
position in the queue in RAM. Now the paging process will determine
whether it needs to page out any of these message positions to disk
as well. RabbitMQ will calculate how many of them can stay in RAM and
then try to page out the rest to disk. For this second paging to
happen, the number of message positions to be paged to disk must be
greater than msg_store_io_batch_size. The bigger this number is, the
more message positions RabbitMQ will keep in RAM, so again, depending
on your particular workload, you need to tune this parameter as
well.

Another thing we improved significantly in 3.5.5 is the performance
of paging queue index contents to disk. If your messages are
generally smaller than queue_index_embed_msgs_below, you’ll see the
benefit of these changes. They also affect how message positions are
paged out to disk, so you should see improvements in that area as
well. So while a low msg_store_io_batch_size might mean the queue
index has more paging work to do, keep in mind that this process has
been optimized.

Queue Mirroring

To keep the descriptions above a bit simpler, we avoided bringing
queue mirroring into the picture. Credit flow also affects mirroring
from a channel’s point of view. When a channel delivers AMQP messages
to queues, it sends the message to each mirror, consuming one credit
from each mirror process. If any of the mirrors is slow to process
the message, that particular mirror might be responsible for the
channel being blocked. If the channel is being blocked by a mirror,
and that mirror gets partitioned from the network, the channel will
be unblocked only after RabbitMQ detects the mirror’s death.

Credit flow also plays a part when synchronising mirrored queues, but
this is something you shouldn’t worry about too much, mostly because
there’s nothing you can do about it: mirror synchronisation is
handled entirely by RabbitMQ.

Conclusion

In any case, we hope this blog post has been informative and helps
you with your RabbitMQ tuning. If you have comments or questions
about the new credit flow settings, don’t hesitate to contact us on
the RabbitMQ mailing list:
rabbitmq-users