Also, this happens only for new topics (we have auto.create.topic set to true). If we retry sending a message to an existing topic, it works fine. Is there any tweaking I need to do to the broker or the producer to scale based on the number of partitions?


11 responses

Guozhang Wang

Hello Rajasekar,

In 0.8, producers keep a cache of the partition -> leader_broker_id map, which is used to determine the brokers to which messages should be sent. After new partitions are added, the producer's cache has not been populated yet, hence it will throw this exception. The producer will then try to refresh its cache by asking the brokers "who are the leaders of these new partitions that I did not know of before". The brokers at the beginning also do not know this information, and will only get it from the controller, which will only propagate the leader information after the leader elections have all finished.
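The stale-cache / refresh / retry interplay described above can be sketched as a small loop. This is an illustrative sketch, not the actual 0.8 producer code; the function names are invented for the example:

```python
import time

def send_with_retries(send, refresh_metadata, max_retries=3, backoff_ms=100):
    """Sketch of the 0.8 producer's behavior: on a failed send (e.g. a
    LeaderNotAvailableException while leader election is still in flight),
    back off, refresh the partition -> leader cache, and retry."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return send()
        except Exception as e:
            last_error = e
            time.sleep(backoff_ms / 1000.0)  # "Back off for 100 ms before retrying"
            refresh_metadata()               # re-ask brokers for current leaders
    # mirrors the producer giving up after max_retries tries
    raise RuntimeError("send failed after %d tries" % max_retries) from last_error
```

If leader election finishes before the retries run out, one of the later attempts succeeds; if not, the producer gives up, which is what the console producer is doing here after 3 tries.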

If you set num.retries to 3, then it is possible that the producer gives up too soon, before the leader info has ever propagated to the brokers, and hence to the producers also. Could you try to increase producer.num.retries and see if the producer can eventually succeed in retrying?
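For reference, these retry knobs are plain config properties in the 0.8 producer. A sketch of the relevant settings, assuming the 0.8 property names message.send.max.retries and retry.backoff.ms (Guozhang writes producer.num.retries above, but later in the thread he refers to message.send.max.retries); the broker list and values are purely illustrative:

```properties
# 0.8 producer config: raise retries so the producer outlives leader election
metadata.broker.list=broker1:9092,broker2:9092
# number of times to retry a failed send (default 3)
message.send.max.retries=10
# pause between retries, during which leader metadata can propagate (default 100 ms)
retry.backoff.ms=500
```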

Guozhang

On Tue, Aug 27, 2013 at 8:53 AM, Rajasekar Elango wrote:

Hello everyone,

We recently increased the number of partitions from 4 to 16, and after that the console producer mostly fails with LeaderNotAvailableException and exits after 3 tries:

Also, this happens only for new topics (we have auto.create.topic set to true). If we retry sending a message to an existing topic, it works fine. Is there any tweaking I need to do to the broker or the producer to scale based on the number of partitions?

Neha Narkhede

As Guozhang said, your producer might give up sooner than the leader election completes for the new topic. To confirm whether your producer gave up too soon, you can run the state change log merge tool for this topic and see when the leader election finished for all partitions:

./bin/kafka-run-class.sh kafka.tools.StateChangeLogMerger --logs --topic

Note that this tool requires you to give the state change logs for all brokers in the cluster.

Thanks,
Neha
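Assuming the state change logs have been copied from every broker, and assuming --logs accepts a comma-separated list of files, the invocation might look something like the following (the file paths and broker count are hypothetical placeholders; the topic name is taken from the log excerpt later in this thread):

```shell
# merge state change events for one topic across all three brokers' logs
./bin/kafka-run-class.sh kafka.tools.StateChangeLogMerger \
    --logs /logs/broker1/state-change.log,/logs/broker2/state-change.log,/logs/broker3/state-change.log \
    --topic test-41
```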


Thanks Neha & Guozhang,

When I ran StateChangeLogMerger, I am seeing this message repeated 16 times, once for each partition:

I am also seeing .log and .index files created for this topic in the data dir. Also, the list topic command shows leaders, replicas, and ISRs for all partitions. Do you still think increasing the number of retries would help, or is it some other issue? Also, the console producer doesn't seem to have an option to set the number of retries. Is there a way to configure the number of retries for the console producer?

Thanks,
Raja.


On Tue, Aug 27, 2013 at 8:53 AM, Rajasekar Elango <relango@salesforce.com> wrote:

Hello everyone,

We recently increased the number of partitions from 4 to 16, and after that the console producer mostly fails with LeaderNotAvailableException and exits after 3 tries:

Here are the last few lines of the console producer log:

No partition metadata for topic test-41 due to kafka.common.LeaderNotAvailableException}] for topic [test-41]: class kafka.common.LeaderNotAvailableException (kafka.producer.BrokerPartitionInfo)
[2013-08-27 08:29:30,271] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: test-41 (kafka.producer.async.DefaultEventHandler)
[2013-08-27 08:29:30,271] INFO Back off for 100 ms before retrying

Guozhang Wang

The remove fetcher log entries are normal under addition of partitions, since they indicate that some leader changes have happened, so brokers are closing the fetchers to the old leaders.

I just realized that the console producer does not have the message.send.max.retries option yet. Could you file a JIRA for this? I will follow up to add this option. For now, you can modify the default value in the code from 3 to a larger number.

Guozhang

On Tue, Aug 27, 2013 at 12:37 PM, Rajasekar Elango wrote:

Thanks Neha & Guozhang,

When I ran StateChangeLogMerger, I am seeing this message repeated 16 times, once for each partition:

I am also seeing .log and .index files created for this topic in the data dir. Also, the list topic command shows leaders, replicas, and ISRs for all partitions. Do you still think increasing the number of retries would help, or is it some other issue? Also, the console producer doesn't seem to have an option to set the number of retries. Is there a way to configure the number of retries for the console producer?



Guozhang Wang

Cool! You can follow the process of creating a JIRA here: http://kafka.apache.org/contributing.html

And submit a patch here: https://cwiki.apache.org/confluence/display/KAFKA/Git+Workflow

It will be great if you can also add an entry for this issue to the FAQ, since I think this is a common question: https://cwiki.apache.org/confluence/display/KAFKA/FAQ

Guozhang


Neha Narkhede

Rajasekar,

We are trying to minimize the number of patches in 0.8 to critical bug fixes or broken tooling. If the patch involves significant code changes, we would encourage taking it on trunk. If you want to just fix the console producer to take the retry argument, I would think it is small enough to consider taking it on the 0.8 branch, since it affects the usability of the console producer.

Thanks,
Neha

On Wed, Aug 28, 2013 at 8:36 AM, Rajasekar Elango wrote:

Guozhang,

The documentation says I need to work off of trunk. Can you confirm if I should be working on trunk or a different branch?

Thanks,
Raja.

Rajasekar Elango

Thanks. This is a small fix to ConsoleProducer.scala only, so I will use the 0.8 branch.

Thanks,
Raja.
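For reference, checking out the 0.8 branch rather than trunk might look like the following, using the GitHub mirror of the Kafka repository (the branch name 0.8 is assumed; check the Git Workflow page above for the canonical repository URL):

```shell
git clone https://github.com/apache/kafka.git
cd kafka
git checkout 0.8   # work on the 0.8 branch instead of trunk
```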

On Wed, Aug 28, 2013 at 12:49 PM, Neha Narkhede wrote:

Rajasekar,

We are trying to minimize the number of patches in 0.8 to critical bug fixes or broken tooling. If the patch involves significant code changes, we would encourage taking it on trunk. If you want to just fix the console producer to take the retry argument, I would think it is small enough to consider taking it on the 0.8 branch, since it affects the usability of the console producer.