Logagent Meets Apache Kafka

This is a guest post from Filippo Balicchia. Filippo contributed the Logagent plugins for Apache Kafka, the details of which he is sharing in this post. Filippo is a software engineer and a passionate coder interested in distributed and cloud technologies, working at Openmind, one of the major system integrators in Italy, specializing in the full development life cycle of eCommerce solutions, from inception to maintenance.

Introduction

Logagent is a lightweight, modern log shipper designed to be simple enough for people who have never used a log shipper before. Getting Logagent to ship data to Elasticsearch, Sematext Cloud, or other destinations takes just a couple of minutes. The introduction of Apache Kafka input and output plugins makes Logagent usable in more scalable logging architectures that use Kafka, without sacrificing that ease of use.

We can think of Kafka as a partitioned and replicated distributed log service. It provides the functionality of a messaging system and can be considered a log-based message broker, like Amazon Kinesis Streams and Apache DistributedLog.

Give me the details!

One of the main concepts in Kafka is a Topic. A topic, of which there can be many instances in a single Kafka cluster, is made up of partitions. Each partition is a log stored as a group of files on a Kafka Broker. Each such partition can be replicated to more than one Broker.

Kafka Producers are the ones that publish data to the topics in Kafka Brokers. A Producer can choose which record to assign to which partition within a topic. This can be done in a round-robin fashion simply to balance load, or according to some semantic partition function. Data is first written to the leader Broker. The follower Brokers in the ISR (in-sync replica set) replicate records from the leader Broker. The Producer can be configured to wait for the response from the leader Broker only, or from all Brokers in the ISR. The latter is slower, of course.

At the time of this writing, required.acks in logagent-output-kafka can be configured to one of the following values:

0 – Producer never waits for an acknowledgment from the Broker

1 – Producer gets an acknowledgment after the leader replica has received data

-1 – Producer gets an acknowledgment after all in-sync replicas have received data
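Putting this together, a minimal logagent-output-kafka configuration might look like the sketch below. The option names and defaults here are assumptions based on typical Logagent plugin configuration, so check the plugin README before using them verbatim:

```yaml
# Sketch of a Logagent config that publishes parsed log events to Kafka.
# Option names are illustrative; verify them against the
# logagent-output-kafka plugin documentation.
output:
  kafka:
    module: output-kafka   # load the Kafka output plugin
    host: localhost        # Kafka Broker host
    port: 9092             # plaintext Broker port
    topic: logagent        # topic to publish log events to
    requiredAcks: 1        # 0, 1, or -1, as described above
```

With requiredAcks: 1 you trade a small amount of durability for latency; -1 waits for the full ISR and is the safest, slowest option.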

From a security standpoint Logagent can communicate with Kafka over SSL using a dedicated port, although this is not enabled by default. We’ll show how easy it is to do that via an example.
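Assuming the Broker exposes a separate SSL listener (commonly on port 9093), enabling SSL on the output side might look like the following sketch. The ssl option names and file paths below are placeholders, not confirmed plugin API:

```yaml
# Hypothetical SSL-enabled variant of the Kafka output.
# The ssl block and certificate paths are assumptions; substitute
# the option names from the logagent-output-kafka docs and your own
# key material.
output:
  kafka:
    module: output-kafka
    host: localhost
    port: 9093                            # assumed SSL listener port
    topic: logagent
    ssl:
      enabled: true
      certFile: /etc/logagent/client.cert.pem  # placeholder path
      keyFile: /etc/logagent/client.key.pem    # placeholder path
      caFile: /etc/logagent/ca.cert.pem        # placeholder path
```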

On the other side of the Brokers, opposite the Producers, are Consumers. In the case of Logagent, the logagent-input-kafka plugin acts as a Consumer. There is also a notion of a Consumer Group, and each Consumer Group uses one Broker as a coordinator. The coordinator's main task is assigning partitions when a new member joins the Consumer Group, in our case when a new instance of Logagent with logagent-input-kafka shows up. Kafka Consumers support three messaging semantics:

At least once

At most once

Exactly once, as of the Kafka 0.11 release

At the time of this writing, logagent-input-kafka (version 1.0.4) supports “At least once” with enable.auto.commit=true. This triggers commits periodically with the interval defined in autoCommitIntervalMs, 5000 ms by default. By reducing the commit interval you can limit the amount of re-processing the Consumer must do in the event of a crash.
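For example, lowering the commit interval from the default 5000 ms to 1000 ms shrinks the window of messages that may be re-processed after a crash, at the cost of more frequent offset commits. A hedged input-side sketch (option names assumed from typical Logagent plugin configuration, so verify against the logagent-input-kafka docs):

```yaml
# Sketch of a Logagent config that consumes log events from Kafka.
# Option names are illustrative; check the logagent-input-kafka
# plugin documentation for the exact keys.
input:
  kafka:
    module: input-kafka
    host: localhost
    port: 9092
    topic: logagent
    groupId: logagent-group     # Consumer Group this instance joins
    autoCommitIntervalMs: 1000  # commit offsets more often than the 5000 ms default
```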

A duplicate message may occur when, for example, Logagent consumes and forwards a message but crashes before getting a chance to commit the consumed offset information to Kafka. If that happens, Logagent will, upon restart, re-consume data from the last known/committed offset, which means it may re-consume messages it had already consumed prior to the crash.

Cool but show me how!

Let’s look at the example where we first publish data to Kafka, then consume it from there, and ship it to Logsene.

We'll use both Logagent plugins for Kafka: we will produce and consume data over SSL, and also ship data to Sematext Cloud or Elasticsearch via SSL to make it searchable.
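The whole pipeline can be sketched in a single Logagent config. The Kafka option names and the SSL block are assumptions as noted above; the elasticsearch output with a Logsene receiver URL follows Logagent's documented pattern, with YOUR_LOGSENE_TOKEN as a placeholder for your own token:

```yaml
# End-to-end sketch: consume log events from Kafka over SSL and ship
# them to Sematext Cloud (Logsene) / Elasticsearch over HTTPS.
# Kafka option names are illustrative; YOUR_LOGSENE_TOKEN is a placeholder.
input:
  kafka:
    module: input-kafka
    host: localhost
    port: 9093                  # assumed SSL listener port
    topic: logagent
    groupId: logagent-group
    ssl:
      enabled: true             # assumed option name; see plugin docs
output:
  elasticsearch:
    module: elasticsearch
    url: https://logsene-receiver.sematext.com
    index: YOUR_LOGSENE_TOKEN   # Logsene app token used as the index name
```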

