max.uncommitted.offsets defines the maximum number of polled offsets
(records) that can be pending commit before another poll can take place. When this
limit is reached, no further records are polled until the next successful commit brings
the number of pending offsets back below the threshold. To set this parameter, use the
KafkaSpoutConfig builder method setMaxUncommittedOffsets.
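As a sketch (assuming the storm-kafka-client builder API; the broker address, topic name, and limit shown are placeholders, not recommendations), the parameter can be set like this:

```java
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

public class SpoutTuning {
    public static KafkaSpoutConfig<String, String> buildConfig() {
        // "broker1:9092" and "my-topic" are placeholder values.
        return KafkaSpoutConfig.builder("broker1:9092", "my-topic")
                // Cap the number of polled-but-uncommitted offsets; 250000 is illustrative.
                .setMaxUncommittedOffsets(250000)
                .build();
    }
}
```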

Note that these two parameters trade off memory versus time:

When offset.commit.period.ms is set to a low value, the spout commits to Kafka
more often. While the spout is committing to Kafka, it is not fetching new records
or processing new tuples.

When max.uncommitted.offsets increases, the
memory footprint increases. Each offset uses eight bytes of memory,
which means that a value of 10000000 (ten million) uses about 80MB of
memory.

It is possible to achieve good performance with a low commit period and small
memory footprint (a small value for max.uncommitted.offsets), as well
as with a larger commit period and larger memory footprint. However, you should
avoid using large values for offset.commit.period.ms with a low value
for max.uncommitted.offsets.

Kafka consumer configuration parameters can also have an impact on the KafkaSpout
performance. The following Kafka parameters are most likely to have the strongest
impact on KafkaSpout performance:

Kafka consumer parameter fetch.min.bytes specifies the
minimum amount of data the server returns for a fetch request. If the
minimum amount is not available, the server waits until that much data
accumulates before answering the request.

Kafka consumer parameter fetch.max.wait.ms specifies the
maximum amount of time the server will wait before answering a fetch
request, when there is not sufficient data to satisfy
fetch.min.bytes.
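For example (a sketch using the standard Kafka consumer Properties keys; the values shown are illustrative, not recommendations), these two parameters can be set in the consumer configuration passed to the spout:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FetchTuning {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // Wait until at least 1 MB of data is available per fetch (illustrative value).
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1048576");
        // ...but answer after at most 500 ms even if fetch.min.bytes is not met (illustrative).
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");
        return props;
    }
}
```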

Important

For HDP 2.5.0 clusters in production use, you should override the default values of KafkaSpout
parameters offset.commit.period.ms and
max.uncommitted.offsets, and Kafka consumer parameter
poll.timeout.ms, as follows:

Set poll.timeout.ms to 200.

Set offset.commit.period.ms to 30000 (30
seconds).

Set max.uncommitted.offsets to 10000000 (ten
million).
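Applied through the storm-kafka-client builder API (a sketch; the broker address and topic name are placeholders), these recommended overrides look like this:

```java
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

public class ProductionSpoutConfig {
    public static KafkaSpoutConfig<String, String> buildConfig() {
        // "broker1:9092" and "my-topic" are placeholder values.
        return KafkaSpoutConfig.builder("broker1:9092", "my-topic")
                .setPollTimeoutMs(200)              // poll.timeout.ms = 200
                .setOffsetCommitPeriodMs(30000)     // offset.commit.period.ms = 30 seconds
                .setMaxUncommittedOffsets(10000000) // max.uncommitted.offsets = ten million
                .build();
    }
}
```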

Performance also depends on the structure of your Kafka cluster, the distribution
of the data, and the availability of data to poll.

Log Level Performance Impact

Storm supports several logging levels, including Trace, Debug, Info, Warn, and Error.
Trace-level logging has a significant impact on performance and should be avoided
in production. The number of log messages is proportional to the number of records
fetched from Kafka, so a large volume of messages is printed when Trace-level logging
is enabled.

Trace-level logging is most useful for debugging pre-production
environments under mild load. For debugging, if necessary, you can throttle how many
messages are polled from Kafka by setting the max.partition.fetch.bytes
parameter to a low value that is larger than the largest single message stored
in Kafka.
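As a sketch (using the standard Kafka consumer property key; the value shown is illustrative and must exceed your largest message size):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class DebugThrottle {
    public static Properties throttledProps() {
        Properties props = new Properties();
        // Illustrative: limit each partition fetch to 16 KB. This value must still
        // be larger than the largest single message stored in Kafka.
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "16384");
        return props;
    }
}
```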

Debug-level logging has slightly less performance impact than Trace-level logging,
but it still generates a large number of messages. This setting can be useful
for assessing whether the Kafka spout is properly tuned.