ConsoleConsumer throws InvalidConfigException for "." in client id

Details

Type: Bug

Status:Closed

Priority: Major

Resolution:
Fixed

Affects Version/s:
None

Fix Version/s:
None

Component/s:
None

Labels:

None

Description

kafka.common.InvalidConfigException: client.id console-consumer-16946-ConsumerFetcherThread-console-consumer-16946_XXXX-host18.corp-1358006528116-991e3f2d-0-1 is illegal, contains a character other than ASCII alphanumerics, _ and -
at kafka.common.Config$class.validateChars(Config.scala:32)
at kafka.consumer.ConsumerConfig$.validateChars(ConsumerConfig.scala:25)
at kafka.consumer.ConsumerConfig$.validateClientId(ConsumerConfig.scala:55)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:88)
at kafka.server.AbstractFetcherThread.<init>(AbstractFetcherThread.scala:44)
at kafka.consumer.ConsumerFetcherThread.<init>(ConsumerFetcherThread.scala:27)
at kafka.consumer.ConsumerFetcherManager.createFetcherThread(ConsumerFetcherManager.scala:93)
at kafka.server.AbstractFetcherManager.addFetcher(AbstractFetcherManager.scala:44)
at kafka.consumer.ConsumerFetcherManager$$anon$1$$anonfun$doWork$3.apply(ConsumerFetcherManager.scala:75)
at kafka.consumer.ConsumerFetcherManager$$anon$1$$anonfun$doWork$3.apply(ConsumerFetcherManager.scala:72)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:80)
at scala.collection.Iterator$class.foreach(Iterator.scala:631)
at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:161)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:194)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:80)
at kafka.consumer.ConsumerFetcherManager$$anon$1.doWork(ConsumerFetcherManager.scala:72)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:50)

Activity

Currently we allow only alphanumerics, - and _ in the clientId, topic name and groupId. The topic name validation criteria were decided in KAFKA-495 and later while deciding to validate clientId and groupId in KAFKA-646, we tried to maintain consistency with the topic name validation.

As discussed in KAFKA-495, we don't have any strong reason to disallow dot in the validation, but we did not see any strong reason to make special case exceptions at the time as well and decided to make the validation criteria strict.

For now, can we change the system test to not use dot? We can add it to the set of legal characters later though if more use cases arise.

Swapnil Ghike
added a comment - 12/Jan/13 22:09 Currently we allow only alphanumerics, - and _ in the clientId, topic name and groupId. The topic name validation criteria were decided in KAFKA-495 and later while deciding to validate clientId and groupId in KAFKA-646 , we tried to maintain consistency with the topic name validation.
As discussed in KAFKA-495 , we don't have any strong reason to disallow dot in the validation, but we did not see any strong reason to make special case exceptions at the time as well and decided to make the validation criteria strict.
For now, can we change the system test to not use dot? We can add it to the set of legal characters later though if more use cases arise.

In kafka-683 we changed client.id for a consumer to be config.clientId + "-" + consumerIdString + fetcherId + sourceBroker.id. ConsumerIdString has the format of consumerHostName + currentTimeMillis + randomUUID. The problem is that consumerHostName may have ".", which breaks the naming convention. However, such a client.id is too verbose. For debugging, we need to know the client IP address. We can fix that for every type of request in a separate jira. With client IP address and correlation id, we should have enough information for debugging. So, for now, just attach a patch to revert client.id to just config.ClientId.

Jun Rao
added a comment - 13/Jan/13 21:51 In kafka-683 we changed client.id for a consumer to be config.clientId + "-" + consumerIdString + fetcherId + sourceBroker.id. ConsumerIdString has the format of consumerHostName + currentTimeMillis + randomUUID. The problem is that consumerHostName may have ".", which breaks the naming convention. However, such a client.id is too verbose. For debugging, we need to know the client IP address. We can fix that for every type of request in a separate jira. With client IP address and correlation id, we should have enough information for debugging. So, for now, just attach a patch to revert client.id to just config.ClientId.

Unfortunately, this rule of not allowing a '.' in the clientid seems very restrictive. Here, it is useful for the client id to have the hostname of the fetcher thread. This allows us to look at the kafka request log, take that client id, go the box, take a thread dump and correlate the behavior of the thread from the thread dump and the log4j using the client id string. This saves time required for troubleshooting. Since we have this restriction, we will have to take the client IP from the request log, convert it to a hostname, construct the thread name from this information and then try to understand the thread behavior from the thread dump and log4j. Also, one needs to know Kafka code to figure out how to construct the thread name from all this information, which is even worse.

I think this is making things more difficult for us, which is a down side to disallowing '.' in the client id.

Neha Narkhede
added a comment - 15/Jan/13 00:39 - edited Unfortunately, this rule of not allowing a '.' in the clientid seems very restrictive. Here, it is useful for the client id to have the hostname of the fetcher thread. This allows us to look at the kafka request log, take that client id, go the box, take a thread dump and correlate the behavior of the thread from the thread dump and the log4j using the client id string. This saves time required for troubleshooting. Since we have this restriction, we will have to take the client IP from the request log, convert it to a hostname, construct the thread name from this information and then try to understand the thread behavior from the thread dump and log4j. Also, one needs to know Kafka code to figure out how to construct the thread name from all this information, which is even worse.
I think this is making things more difficult for us, which is a down side to disallowing '.' in the client id.

Thanks for your comments. I agree that allowing dot will make our lives easier.

In the attached patch, I have added dot to the regexes in Topic and Config classes. The regexes have also been modified to escape the last hyphen so that in the future if someone accidentally added a character at the end of the regex group, then the hyphen will not be treated as a range identifier.

Topic has been modified to disallow "." and ".." as topic names.

TopicTest and ConfigTest have been modified accordingly.

After including this patch, we can keep Jun's patch or discard it depending on what is convenient to keep in clientId.

John, can you please check if this patch solves the problem (after reverting Jun's patch so that the root cause is reintroduced)?

Swapnil Ghike
added a comment - 15/Jan/13 01:38 Thanks for your comments. I agree that allowing dot will make our lives easier.
In the attached patch, I have added dot to the regexes in Topic and Config classes. The regexes have also been modified to escape the last hyphen so that in the future if someone accidentally added a character at the end of the regex group, then the hyphen will not be treated as a range identifier.
Topic has been modified to disallow "." and ".." as topic names.
TopicTest and ConfigTest have been modified accordingly.
After including this patch, we can keep Jun's patch or discard it depending on what is convenient to keep in clientId.
John, can you please check if this patch solves the problem (after reverting Jun's patch so that the root cause is reintroduced)?

The patch is sanity tested in a single box in which the hostname doesn't have a "." and in a distributed env where the host names have a "." (besides the parent domain name). In both cases, the patch works fine.

John Fung
added a comment - 15/Jan/13 17:05 Thanks Swapnil for the latest patch.
The patch is sanity tested in a single box in which the hostname doesn't have a "." and in a distributed env where the host names have a "." (besides the parent domain name). In both cases, the patch works fine.