Parser Tuning

You can modify certain parser properties to tune your HCP architecture using the
Management module. Modifying properties using the Management module is simple and can be
performed by any user.

Parsers tend to vary a lot. Some will be very high volume receiving thousands of messages
per second and others will be much lower. Rather than using a standard setting for the
number of partitions and parallelism, you should base your settings on the expected data
volume. That said, use the following guidelines:

The spout parallelism should be roughly the same as your Kafka partitions.

Consider data flow when assigning Kafka partitions to parsers.

Keep in mind the aggregate number of partitions when assigning them to partitions.
You do not want to assign the maximum number of partitions to each parser because
that can overload your system.

The parser topologies are deployed by a builder pattern that takes parameters from the CLI
as set by the Management module. The parser properties materialize as follows:

Management UI -> parser json config and CLI -> Storm

The following table lists the parser properties you can modify in the Management
module:

Category

Management UI Property Name

CLI Option

Storm topology config

Num Workers

-nw,--num_workers <NUM_WORKERS>

Num Ackers

--na,--num_ackers <NUM_ACKERS>

Storm Config

<JSON_FILE>, e.g., { "topology.max.spout.pending" : NUM
}

Kafka

Spout Parallelism

-sp,--spout_p <SPOUT_PARALLELISM_HINT>

Spout Num Tasks

-snt,--spout_num_tasks <NUM_TASKS>

Spout Config

<JSON_FILE>, e.g., { "spout.pollTimeoutMs" : 200
}

Spout Config

<JSON_FILE>, e.g., { "spout.maxUncommittedOffsets" : 10000000
}

Spout Config

<JSON_FILE>, e.g., { "spout.offsetCommitPeriodMs" : 30000
}

Parser bolt

Parser Num Tasks

-pnt,--parser_num_tasks <NUM_TASKS>

Parser Parallelism

-pp,--parser_p <PARALLELISM_HINT>

Parser Parallelism

-pp,--parser_p <PARALLELISM_HINT>

All of the Storm parameters are available in the STORM SETTINGS section of the
Management module.

For the Storm config and Spout config properties, you enter the JSON_FILE
information in the appropriate field using the JSON format supplied in the following
table.