Tuesday, February 27, 2018

Message Consumption in Apache Kafka

The message offset is a critical concept to understand, because it is how consumers read messages at their own pace and process them independently.

An offset is a placeholder. Like a bookmark, it maintains the last read position.

In the case of a Kafka topic, that position is the last read message.

The offset is established and maintained entirely by the consumer, since the consumer is solely responsible for reading the messages and processing them on its own.

The consumer keeps track of what it has and has not read.

The offset itself refers to a message identifier.

STEPS INVOLVED

1. When a consumer wishes to read from a topic, it must first establish a connection with a broker.

2. After establishing the connection, the consumer decides which messages it wants to consume.

3. If the consumer has not previously read from the topic, or it has to start over, it issues a request to read from the beginning of the topic (the consumer establishes that its message offset for the topic is 0).

0 1 2 3 4

4. As it reads through the sequence of messages, it moves its offset accordingly and will inevitably come to the last message in the topic.

5. Another consumer interested in messages from the same topic may already have read them from the beginning and simply be waiting for more messages to arrive so it can read and process them. Note: each consumer knows where it left off and can choose to advance from that position, stay put, or go back and reread a previously read message, all without the producer, brokers, or other consumers needing to know or care.
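
The steps above can be sketched with a small in-memory model. This is only an illustration, not the Kafka client API: the Consumer class, its read and seek methods, and the topic list are all hypothetical names, and the offset is modelled here as the position of the next message to read.

```python
# Hypothetical in-memory model of a topic and its consumers.
# This is NOT the Kafka client API, only a sketch of the offset idea.
class Consumer:
    def __init__(self, log):
        self.log = log    # shared, append-only list of messages
        self.offset = 0   # start from the beginning of the topic

    def read(self):
        """Return the message at the current offset and advance,
        or return None if there is nothing left to read."""
        if self.offset >= len(self.log):
            return None
        message = self.log[self.offset]
        self.offset += 1
        return message

    def seek(self, offset):
        """Move the bookmark; only this consumer is affected."""
        self.offset = offset


topic = ["m0", "m1", "m2", "m3", "m4"]  # offsets 0 1 2 3 4

a = Consumer(topic)
b = Consumer(topic)

read_by_a = [a.read() for _ in range(5)]  # a reads the whole topic
first_for_b = b.read()                    # b has only read offset 0 so far

a.seek(2)          # a goes back to reread an earlier message
reread = a.read()  # the message stored at offset 2
```

Each consumer carries its own offset, so moving one bookmark leaves the topic and the other consumer untouched.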

NEW MESSAGES

When new messages arrive, the connected consumer picks them up the next time it polls the broker (Kafka consumers pull messages rather than having them pushed). It retrieves the new message and advances its position by one.
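
A rough sketch of how a consumer picks up newly arrived messages is shown below. It assumes a pull model, in which the consumer asks for everything at or after its current offset. The poll function and its max_records parameter are hypothetical stand-ins written for this example, not the real client call.

```python
def poll(log, offset, max_records=10):
    """Return (records, new_offset) for messages at or after `offset`.

    Hypothetical helper: `log` is a plain list standing in for a topic.
    """
    records = log[offset:offset + max_records]
    return records, offset + len(records)


log = ["m0", "m1", "m2"]
offset = 0

records, offset = poll(log, offset)       # reads the three existing messages

log.append("m3")                          # a producer appends a new message
new_records, offset = poll(log, offset)   # only the new message is returned

idle_records, offset = poll(log, offset)  # nothing new: empty result, offset unchanged
```

Because the consumer passes in its own offset on every call, nothing it has already read is delivered again unless it deliberately moves the offset back.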

LAST MESSAGE

When the last message in the topic has been read and processed, the consumer sets its offset and is, at that point, caught up.
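
One way to express "caught up" is as the gap between the end of the topic and the consumer's offset, often called consumer lag. The sketch below is a minimal illustration with hypothetical function names. It assumes the offset marks the next message to read, so a lag of 0 means there is nothing left to consume.

```python
def lag(log_end_offset, consumer_offset):
    """Number of messages the consumer has not read yet."""
    return log_end_offset - consumer_offset


def is_caught_up(log_end_offset, consumer_offset):
    """True when the consumer has read every message in the topic."""
    return lag(log_end_offset, consumer_offset) == 0


log = ["m0", "m1", "m2", "m3", "m4"]
log_end_offset = len(log)                # offset the next new message will receive

behind = lag(log_end_offset, 3)          # offsets 3 and 4 are still unread
done = is_caught_up(log_end_offset, 5)   # every message has been read
```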

MESSAGE RETENTION POLICY

The length of time a Kafka cluster retains messages is configurable and is known as the message retention policy.

A Kafka cluster retains all messages regardless of whether any consumer has consumed them.

The retention period is specified in hours. The default is 168 hours, or 7 days. Beyond that, the oldest messages start to fall off.

Note: The retention period can be set on a per-topic basis, which means that within a single cluster we could have hundreds of different retention policies.
The ability to retain messages is bounded by the available storage.
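
The effect of a time-based retention policy can be sketched as follows. This is a simplified model under stated assumptions: apply_retention and the (timestamp, payload) tuples are hypothetical, and messages are dropped one at a time here, whereas a real broker removes whole log segments. In Kafka itself, as far as these notes go, the corresponding settings are log.retention.hours on the broker and retention.ms as a per-topic override.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_HOURS = 168  # 7 days, the default mentioned above


def apply_retention(messages, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Keep only (timestamp, payload) pairs inside the retention window.

    Whether a consumer has read a message plays no part in the decision.
    """
    cutoff = now - timedelta(hours=retention_hours)
    return [(ts, payload) for ts, payload in messages if ts >= cutoff]


now = datetime(2018, 2, 27, 12, 0)
messages = [
    (now - timedelta(days=8), "old"),      # older than 7 days
    (now - timedelta(days=6), "recent"),
    (now - timedelta(hours=1), "new"),
]

kept_default = apply_retention(messages, now)                   # 168-hour window
kept_one_day = apply_retention(messages, now, retention_hours=24)  # a stricter per-topic policy
```

Passing a different retention_hours per call mirrors the idea that each topic can carry its own policy.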