Component Tuning Levers

You can tune the performance of your Metron cluster through several services, including
Kafka, Storm, and HDFS. Within these services, you can tune parsers, enrichment, and
indexing (Elasticsearch or Solr).

When you consider tuning your HCP architecture, it is important to note where you can
modify settings. For example, for parser topologies, Storm lets you set the number of tasks
independently of the number of executors. This is important if you want to set the number
of tasks higher than the number of executors to accommodate future performance tuning and
rebalancing without bringing down your topologies. However, for enrichment and indexing
topologies, HCP uses Flux, and Flux provides no method for specifying the number of tasks
independently of the number of executors. By default, the number of tasks equals the
number of executors.
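
The relationship between tasks and executors can be illustrated with a small sketch. It is a simplified model and does not call Storm: the function name `spread_tasks` and the even split are assumptions made for illustration. The sketch relies on two Storm behaviors: the task count of a component is fixed while the topology runs, and the number of executors cannot usefully exceed the number of tasks.

```python
def spread_tasks(num_tasks, num_executors):
    """Return the number of tasks each executor runs, split as evenly as possible.

    Simplified model: the task count is fixed for the life of the topology,
    and only the executor count changes during a rebalance.
    """
    if num_executors < 1:
        raise ValueError("need at least one executor")
    if num_executors > num_tasks:
        # An executor with no task has nothing to run, so the task count
        # is the ceiling for any later rebalance.
        raise ValueError("executors cannot exceed tasks")
    base, extra = divmod(num_tasks, num_executors)
    return [base + 1 if i < extra else base for i in range(num_executors)]


# Parser topology started with 8 tasks on 2 executors (hypothetical numbers).
initial = spread_tasks(8, 2)

# Later rebalance to 8 executors without restarting: still the same 8 tasks.
rebalanced = spread_tasks(8, 8)

# Flux-style default: tasks equal executors, so there is no headroom to grow.
flux_default = spread_tasks(4, 4)
```

In this model, a topology started with tasks equal to executors (the Flux default) can only be scaled down by a rebalance, never up, which is why setting a higher task count up front matters for parsers.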

The following list shows the major properties for each service that you can modify to tune
your cluster:

Kafka
    Number of partitions

Storm
    Kafka spout
        Polling frequency
        Polling timeouts
        Offset commit period
        Max uncommitted offsets
    Number of workers (OS processes)
    Number of executors (threads in a process)
    Number of ackers
    Max spout pending
    Spout and bolt parallelism

HDFS
    Replication factor

Indexing
    Elasticsearch
    Solr

Kafka Tuning

The main lever you can adjust to tune Kafka throughput is the number of partitions.
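
One reason the partition count matters is that, within a consumer group, each partition is read by at most one consumer at a time. The sketch below is a simplified round-robin model, not Kafka's actual assignor; the function names and numbers are hypothetical. It shows that spout executors beyond the partition count receive no partitions.

```python
def assign_partitions(num_partitions, num_spout_executors):
    """Assign partitions to spout executors round-robin (illustrative model only)."""
    if num_spout_executors < 1:
        raise ValueError("need at least one spout executor")
    assignment = {executor: [] for executor in range(num_spout_executors)}
    for partition in range(num_partitions):
        assignment[partition % num_spout_executors].append(partition)
    return assignment


def idle_executors(assignment):
    """Return the executors that received no partition and therefore read nothing."""
    return [executor for executor, parts in assignment.items() if not parts]


# 4 partitions but 6 spout executors: two executors have nothing to read.
too_few_partitions = assign_partitions(4, 6)
idle = idle_executors(too_few_partitions)

# 8 partitions and 4 spout executors: every executor reads two partitions.
balanced = assign_partitions(8, 4)
```

Under this model, the number of partitions caps the useful spout parallelism, so raising spout parallelism past that count adds threads without adding throughput.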

Storm Tuning

There are several Storm properties you can adjust to tune your Storm topologies. Achieving the desired performance can be iterative and will take some trial and error.
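
One way to keep the trial-and-error process manageable is to record a baseline and change one lever per trial. The sketch below assumes hypothetical starting values; the keys are standard Storm topology settings, while the helper names `max_unacked_tuples` and `with_change` are made up for illustration. Because `topology.max.spout.pending` applies per spout task, the ceiling on tuples that are emitted but not yet acked scales with the number of spout tasks.

```python
# Hypothetical baseline values; tune them for your own cluster.
storm_settings = {
    "topology.workers": 2,
    "topology.acker.executors": 2,
    "topology.max.spout.pending": 500,
}


def max_unacked_tuples(settings, spout_tasks):
    """Upper bound on tuples emitted by the spouts but not yet acked or failed."""
    return settings["topology.max.spout.pending"] * spout_tasks


def with_change(settings, key, value):
    """Return a copy of the settings with one lever changed; the baseline is untouched."""
    if key not in settings:
        raise KeyError(key)
    trial = dict(settings)
    trial[key] = value
    return trial


baseline_bound = max_unacked_tuples(storm_settings, spout_tasks=4)

# Next trial: double max spout pending and compare the measured throughput.
trial = with_change(storm_settings, "topology.max.spout.pending", 1000)
trial_bound = max_unacked_tuples(trial, spout_tasks=4)
```

Changing a single setting between runs makes it easier to attribute a change in throughput or latency to that setting.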

Parser Tuning

You can modify certain parser properties to tune your HCP architecture using the Management module. Modifying properties using the Management module is simple and can be performed by any user.

Enrichment Tuning

Because all of the data comes together in enrichment, you will probably need larger settings for enrichment than the parallelism settings you use for parsers. Enrichment settings focus more on the compute workload than on the mapping workload in parsers or the IO-driven workload in indexing. Enrichment makes significant use of caching for performance.
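
The effect of caching on a compute-heavy enrichment step can be sketched with Python's standard `functools.lru_cache`. This is a generic illustration, not HCP's cache implementation: `slow_lookup`, `enrich`, the cache size, and the sample addresses are all hypothetical.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the expensive lookup actually runs


def slow_lookup(ip):
    """Stand-in for an expensive enrichment lookup (hypothetical)."""
    global backend_calls
    backend_calls += 1
    return "internal" if ip.startswith("10.") else "external"


@lru_cache(maxsize=10_000)
def enrich(ip):
    """Cached wrapper: repeated keys are answered from memory."""
    return slow_lookup(ip)


messages = ["10.0.0.1", "8.8.8.8", "10.0.0.1", "10.0.0.1", "8.8.8.8"]
labels = [enrich(ip) for ip in messages]

# Five messages, but only two distinct keys reached the backend.
cache_hits = enrich.cache_info().hits
```

The more often the same keys repeat in the stream, the larger the share of lookups served from the cache, which is why cache size is worth reviewing when enrichment is the bottleneck.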

Index Tuning

Indexing is primarily IO driven. Tuning indexing tends to focus on the search index (Solr or Elasticsearch). Problems with indexing running too slowly often manifest as Kafka not committing in time. This happens when indexing backs up so that batches fail and the Kafka poll interval is exceeded. The underlying issue is with the index rather than with Kafka.
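
A rough arithmetic check shows how a slow index can surface as a Kafka problem. The sketch is a simplified model that assumes every polled record must be written to the index before the next poll. The function names and per-record latencies are hypothetical; the defaults mirror the Kafka consumer defaults for `max.poll.records` (500) and `max.poll.interval.ms` (300000).

```python
def ms_between_polls(records_per_poll, index_ms_per_record):
    """Time spent writing one polled batch to the index, in milliseconds."""
    return records_per_poll * index_ms_per_record


def exceeds_poll_interval(index_ms_per_record,
                          records_per_poll=500,
                          max_poll_interval_ms=300_000):
    """True if indexing one batch takes longer than the allowed poll interval."""
    return ms_between_polls(records_per_poll, index_ms_per_record) > max_poll_interval_ms


# Healthy index: 100 ms per record gives 50 s per batch, within the 300 s limit.
healthy = exceeds_poll_interval(index_ms_per_record=100)

# Backed-up index: 1000 ms per record gives 500 s per batch, over the limit.
backed_up = exceeds_poll_interval(index_ms_per_record=1000)
```

In the second case the consumer misses its poll deadline even though Kafka itself is healthy, which matches the symptom described above.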