Streamparse lets you lay your Python code out in topologies which can be
automatically parallelized across a Storm cluster of machines. This lets you
scale your computation horizontally and avoid issues related to Python’s GIL.
See Parallelism and Workers.

After you create a streamparse project using sparse quickstart, you’ll have
a fabfile.py in that directory. In that file, you can define two
functions (pre_submit and post_submit), each of which is expected to accept four arguments:

topology_name: the name of the topology being submitted

env_name: the name of the environment where the topology is being
submitted (e.g. "prod")

env_config: the relevant config portion from the config.json file for
the environment you are submitting the topology to

options: the fully resolved Storm options

Here is a sample fabfile.py file that sends a message to IRC after a
topology is successfully submitted to prod.
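A minimal sketch of such a fabfile.py follows. The IRC host, channel, and helper functions (send_irc_message, format_deploy_message) are illustrative assumptions, not part of streamparse’s API; only the pre_submit/post_submit hook names and their four-argument signatures come from the documentation above.

```python
# fabfile.py -- sketch of submission hooks for sparse submit.
# IRC_HOST, IRC_CHANNEL, and the helper functions are hypothetical.
import socket

IRC_HOST = "irc.example.com"  # assumed IRC server
IRC_PORT = 6667
IRC_CHANNEL = "#deploys"


def format_deploy_message(topology_name, env_name):
    """Build the notification text for a successful submission."""
    return "Submitted topology {} to {}".format(topology_name, env_name)


def send_irc_message(text):
    """Fire-and-forget IRC notification (minimal, no error handling)."""
    sock = socket.create_connection((IRC_HOST, IRC_PORT), timeout=10)
    try:
        sock.sendall(b"NICK deploybot\r\nUSER deploybot 0 * :deploybot\r\n")
        sock.sendall("PRIVMSG {} :{}\r\n".format(IRC_CHANNEL, text).encode())
    finally:
        sock.close()


def pre_submit(topology_name, env_name, env_config, options):
    """Called by sparse submit before the topology is submitted."""
    pass


def post_submit(topology_name, env_name, env_config, options):
    """Called by sparse submit after a successful submission."""
    if env_name == "prod":
        send_irc_message(format_deploy_message(topology_name, env_name))
```

Keeping the message formatting separate from the network call, as here, makes the hook easy to test without a live IRC server.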

streamparse assumes your Storm servers have Python, pip, and virtualenv
installed. After that, the installation of all required dependencies (including
streamparse itself) is taken care of via the config.json file for the
streamparse project and the sparse submit command.

No, the Java requirements for streamparse are identical to those of Storm itself.
Storm requires Java and bundles Clojure as a dependency, so you do not need
to do any separate installation of Clojure. You just need Java on all Storm
servers.

It is highly recommended that you modify your ~/.ssh/config file if you
need to tweak the settings for the SSH tunnel to your Nimbus server, but
you can also set your SSH password or port in config.json via the
ssh_password or ssh_port environment settings.
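For example, the preferred ~/.ssh/config route might look like the following; the hostname, user, and key path are placeholders:

```
# ~/.ssh/config -- per-host SSH settings for the Nimbus tunnel
Host streamparse-box
    User somebody
    Port 52
    IdentityFile ~/.ssh/id_rsa_streamparse
```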

{
  "topology_specs": "topologies/",
  "virtualenv_specs": "virtualenvs/",
  "envs": {
    "prod": {
      "user": "somebody",
      "ssh_password": "THIS IS A REALLY BAD IDEA",
      "ssh_port": 52,
      "nimbus": "streamparse-box",
      "workers": ["streamparse-box"],
      "virtualenv_root": "/data/virtualenvs"
    }
  }
}

In a small cluster it’s sufficient to specify the list of workers in config.json.
However, if you have a large or complex environment where workers are numerous
or short-lived, streamparse supports querying the Nimbus server for a list of hosts.

An undefined list (empty or None) of workers will trigger the lookup.
Explicitly defined hosts are preferred over a lookup.

Lookups are configured on a per-environment basis, so the prod environment
below uses the dynamic lookup, while beta does not.
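A config.json along these lines would produce that behavior (hostnames are placeholders): prod leaves its workers list empty, triggering the Nimbus lookup, while beta pins its hosts explicitly.

```json
{
  "topology_specs": "topologies/",
  "virtualenv_specs": "virtualenvs/",
  "envs": {
    "prod": {
      "user": "somebody",
      "nimbus": "streamparse-box",
      "workers": [],
      "virtualenv_root": "/data/virtualenvs"
    },
    "beta": {
      "user": "somebody",
      "nimbus": "streamparse-box-beta",
      "workers": ["streamparse-box-beta"],
      "virtualenv_root": "/data/virtualenvs"
    }
  }
}
```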