When reading the items, the DataStore will return a generator. The
items will be ordered lexicographically according to their name.

There is a serialization protocol to store objects in the datastore.
An object is serializable if it has a method __toh5__ returning
an array and a dictionary, and a method __fromh5__ taking an array
and a dictionary and populating the object.
For an example of use see openquake.hazardlib.site.SiteCollection.
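For illustration, here is a minimal sketch of the protocol on a
hypothetical Point class (the class and its fields are invented for
this example, they are not part of the library):

>>> import numpy
>>> class Point(object):
...     def __toh5__(self):
...         # the data to store, plus the metadata attributes
...         return numpy.array([self.x, self.y]), {'kind': 'cartesian'}
...     def __fromh5__(self, array, attrs):
...         # repopulate the object from the stored data and metadata
...         self.x, self.y = array
...         self.kind = attrs['kind']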

A callable object built on top of a dictionary of functions, used
as a smart registry or as a poor man's generic function dispatching
on the first argument. It is typically used to implement converters.
Here is an example:
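The following sketch assumes the add method of
openquake.baselib.general.CallableDict, used as a decorator to register
the underlying functions:

>>> import json
>>> from openquake.baselib.general import CallableDict
>>> format_attrs = CallableDict()  # dispatches on the fmt argument

>>> @format_attrs.add('csv')
... def format_attrs_csv(fmt, obj):
...     items = sorted(vars(obj).items())
...     return '\n'.join('%s,%s' % item for item in items)

>>> @format_attrs.add('json')
... def format_attrs_json(fmt, obj):
...     return json.dumps(vars(obj))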

format_attrs(fmt, obj) calls the correct underlying function
depending on the fmt key. If the format is unknown, a KeyError is
raised. It is also possible to set a keymissing function to specify
what to return if the key is missing.

For a more practical example see the implementation of the exporters
in openquake.calculators.export

Takes an array with duplicate values and categorizes it, i.e. replaces
the values with codes of length nchars in base64. With nchars=2, 4096
unique values can be encoded; if there are more, nchars must be increased,
otherwise a ValueError will be raised.
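The limit is simple arithmetic: base64 has 64 distinct characters,
hence there are 64 ** nchars distinct codes:

>>> 64 ** 2  # maximum number of unique values with nchars=2
4096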

If module_or_package is a module, just import it; if it is a package,
recursively import all the modules it contains. Return the names of
the imported modules as a set. The set can be empty if the modules
were already in sys.modules.

A class to serialize a set of parameters in HDF5 format. The goal is to
store simple parameters as an HDF5 table in a readable way. Each
parameter can be retrieved as an attribute, given its name. The
implementation treats dictionary attributes specially, by storing
them as attrname.keyname strings; see the example below:
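The following illustrative helper (not the actual implementation) shows
the flattening scheme used for dictionary attributes:

>>> def flatten(dic):  # illustrative only
...     out = {}
...     for name, value in dic.items():
...         if isinstance(value, dict):
...             for key, val in value.items():
...                 out['%s.%s' % (name, key)] = val
...         else:
...             out[name] = value
...     return out
>>> flatten({'truncation_level': 3, 'imtls': {'PGA': 0.1}})
{'truncation_level': 3, 'imtls.PGA': 0.1}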

This module defines a Node class, together with a few conversion
functions which are able to convert NRML files into hierarchical
objects (DOM). That makes it easier to read and write XML from Python
and vice versa. Such features are used in the command-line conversion
tools. The Node class is kept intentionally similar to an
Element class; however, it overcomes a limitation of ElementTree: in
particular, a node can manage a lazy iterable of subnodes, whereas
ElementTree wants to keep everything in memory. Moreover, the Node
class provides a convenient dot notation to access subnodes.

The Node class is instantiated with four arguments:

1. the node tag (a mandatory string)
2. the node attributes (a dictionary)
3. the node value (a string or None)
4. the subnodes (an iterable over nodes)

If a node has subnodes, its value should be None.

For instance, here is an example of instantiating a root node
with two subnodes a and b:
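A minimal sketch (the import path is an assumption; in some versions of
the engine the Node class lives in openquake.commonlib.node instead):

>>> from openquake.baselib.node import Node
>>> a = Node('a', {}, 'A1')
>>> b = Node('b', {'attrb': 'B'}, 'B1')
>>> root = Node('root', nodes=[a, b])

A lazy tree can be built in the same way, by passing a generator of
subnodes instead of a list:

>>> def gen_subnodes():
...     yield Node('a', {}, 'A1')
...     yield Node('b', {'attrb': 'B'}, 'B1')
>>> lazytree = Node('root', nodes=gen_subnodes())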

The lazytree object defined here consumes no memory, because the
nodes are not created at instantiation time. They are created as
soon as you start iterating on the lazytree. In particular,
list(lazytree) will generate all of them. If your goal is to
store the tree on the filesystem in XML format, you should use
a writing routine converting one subnode at a time, without
requiring the full list of them. The routines provided by
ElementTree are not suitable for that; however, commonlib.writers
provides a StreamingXMLWriter just for that purpose.
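For instance, something along these lines (a sketch assuming the
StreamingXMLWriter works as a context manager with a serialize(node)
method and accepts a binary stream):

import sys
from openquake.commonlib.writers import StreamingXMLWriter

with StreamingXMLWriter(sys.stdout.buffer) as writer:
    writer.serialize(lazytree)  # writes one subnode at a time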

Lazy trees should not be used unless it is absolutely necessary in
order to save memory; the problem is that if you use a lazy tree the
slice notation will not work (the underlying generator will not accept
it); moreover it will not be possible to iterate twice on the
subnodes, since the generator will be exhausted. Notice that even
accessing a subnode with the dot notation will advance the
generator. Finally, nodes containing lazy nodes will not be pickleable.

A class to make it easy to edit hierarchical structures with attributes,
such as XML files. Node objects must be pickleable and must consume as
little memory as possible. Moreover they must be easily converted from
and to ElementTree objects. The advantage over ElementTree objects
is that subnodes can be lazily generated and that they can be accessed
with the dot notation.

There are several good libraries to manage parallel programming in Python, both
in the standard library and in third party packages. Since we are not
interested in reinventing the wheel, OpenQuake does not provide any new
parallel library; however, it does offer some glue code so that you
can use your library of choice. Currently threading, multiprocessing,
zmq and celery are supported. Moreover,
openquake.baselib.parallel offers some additional facilities
that make it easier to parallelize scientific computations,
i.e. embarrassingly parallel problems.

Typically one wants to apply a callable to a list of arguments in
parallel, and then combine the results. This is known as a
MapReduce problem. As a simple example, we will consider the problem
of counting the letters in a text, by using the following count
function:
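A sketch of such a function (in the engine, task functions may receive
extra arguments such as a monitor, which are omitted here):

>>> import collections
>>> from openquake.baselib.parallel import Starmap
>>> def count(text):
...     return collections.Counter(text)
>>> arglist = [('hello',), ('world',)]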

A Starmap object is an iterable: when iterated over, it produces
task results. It also has a reduce method similar to functools.reduce,
with sensible defaults:

1. the default aggregation function is add, so there is no need to specify it;

2. the default accumulator is an empty accumulation dictionary (see
openquake.baselib.AccumDict) working as a Counter, so there
is no need to specify it.
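With these defaults, aggregating the letter counts is a one-liner:

>>> res = Starmap(count, arglist).reduce()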

You can of course override the defaults, so if you really want to
return a Counter you can do

>>> res = Starmap(count, arglist).reduce(acc=collections.Counter())

In the engine we nearly always use callables that return dictionaries
and we nearly always aggregate with the addition operator, so such
defaults are very convenient. You are encouraged to do the same, since we
found that approach to be very flexible. Typically in a scientific
application you will return a dictionary of numpy arrays.

The parallelization algorithm used by Starmap will depend on the
environment variable OQ_DISTRIBUTE. Here are the possibilities
available at the moment:

- OQ_DISTRIBUTE not set or set to "processpool": use multiprocessing

- OQ_DISTRIBUTE set to "no": disable the parallelization, useful for debugging

- OQ_DISTRIBUTE set to "celery": use celery, useful if you have multiple machines in a cluster

- OQ_DISTRIBUTE set to "zmq": use the zmq concurrency mechanism (experimental)

There is also an OQ_DISTRIBUTE = "threadpool"; however, the
performance of using threads instead of processes is normally bad for the
kind of applications we are interested in (CPU-dominated, with large
tasks such that the time to spawn a new process is negligible with
respect to the time to perform the task), so it is not recommended.
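For instance, to debug a script serially you can disable the
parallelization from Python itself (the variable can also be set in
the shell, of course):

>>> import os
>>> os.environ['OQ_DISTRIBUTE'] = 'no'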

If you are using a pool, it is always a good idea to clean up resources
at the end with

>>> Starmap.shutdown()

Starmap.shutdown is always defined. It does nothing if there is
no pool, but it is still better to call it: in the future, you may change
your mind and use another parallelization strategy requiring cleanup. In this
way your code is future-proof.

A major feature of the Starmap API is the ability to monitor the time spent
in each task and the memory allocated. Such information is written into an
HDF5 file that can be provided by the user or autogenerated. To autogenerate
the file you can use openquake.baselib.datastore.hdf5new() which
will create a file named calc_XXX.hdf5 in your $OQ_DATA directory
(if the environment variable is not set, the engine will use $HOME/oqdata).
Here is an example of usage:
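A sketch, assuming the Starmap accepts the open HDF5 file via an h5
keyword argument, as in recent versions of the engine:

>>> from openquake.baselib import datastore
>>> h5 = datastore.hdf5new()  # creates a calc_XXX.hdf5 file
>>> res = Starmap(count, arglist, h5=h5).reduce()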

After the calculation, or even while the calculation is running, you can
open the calculation file for reading and extract the performance information
for it. The engine provides a command to do that, oq show performance,
but you can also get it manually with a call to
openquake.baselib.performance.performance_view(h5), which will return
the performance information as a numpy array:
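For instance (continuing the sketch above, with h5 still open):

>>> from openquake.baselib.performance import performance_view
>>> performance_view(h5)  # one row per operation: time and memory spent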

The Starmap class has a very convenient classmethod Starmap.apply
which is used in several places in the engine. Starmap.apply is useful
when you have a sequence of objects that you want to split into homogeneous
chunks and then apply a callable to each chunk (in parallel). For instance, in the
letter counting example discussed before, Starmap.apply could
be used as follows:
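A sketch, reusing the count function defined earlier (each chunk of the
text is passed to count as its first argument):

>>> text = 'hello world' * 1000
>>> res = Starmap.apply(count, (text,)).reduce()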

The API of Starmap.apply is designed to extend that of apply,
a builtin of Python 2; the second argument is the tuple of arguments
passed to the first argument. The difference from apply is that
Starmap.apply returns a Starmap object, so that nothing is
actually done until you iterate on it (reduce is doing that).

How many chunks will be produced? That depends on the parameter
concurrent_tasks; if it is not passed, it has a default of 5 times
the number of cores in your machine - as returned by os.cpu_count() -
and Starmap.apply will try to produce a number of chunks close to
that number. The nice thing is that it is also possible to pass a
weight function, as sketched below. Suppose for instance that instead
of a list of letters you have a list of seismic sources: some sources
require a long computation time (such as ComplexFaultSources), some
require a short computation time (such as PointSources). By giving a
heuristic weight to the different sources it is possible to produce
chunks with nearly homogeneous weight; in particular, tasks with
PointSources will contain a lot more sources than tasks with
ComplexFaultSources.
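A sketch of the idea (process_sources is a hypothetical task function;
num_ruptures, which hazardlib sources provide, is used here as the
heuristic weight):

>>> def process_sources(sources):  # hypothetical task function
...     return {'n': len(sources)}
>>> sources = []  # in reality, a list of hazardlib source objects
>>> smap = Starmap.apply(process_sources, (sources,),
...                      weight=lambda src: src.num_ruptures)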

It is essential in large computations to have a homogeneous task
distribution, otherwise you will end up having a big task dominating
the computation time (i.e. you may have 1000 cores of which 999 are free,
having finished all the short tasks, but you have to wait for days for
the single core processing the slow task). The OpenQuake engine does
a great deal of work trying to split slow sources into more manageable
fast sources.

A utility for manually pickling/unpickling objects.
The reason is that celery does not use the HIGHEST_PROTOCOL,
so relying on celery is slower. Moreover, Pickled instances
have a nice string representation and a length giving the size
of the pickled bytestring.

Apply a task to a tuple of the form (sequence, *other_args)
by first splitting the sequence into chunks, according to the weight
of the elements and possibly to a key (see
openquake.baselib.general.split_in_blocks).

Convert an iterable of objects into a list of pickled objects.
If the iterable contains copies, the pickling will be done only once.
If the iterable contains objects already pickled, they will not be
pickled again.

Call the given function with the given arguments safely, i.e.
by trapping the exceptions. Return a pair (result, exc_type),
where exc_type is None if no exceptions occur; otherwise it
is the exception class and the result is a string containing
the error message and traceback.
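A sketch of the contract just described (in some versions of the engine
the return value is a Result object wrapping the same information):

>>> from openquake.baselib.parallel import safely_call
>>> safely_call(int, ('42',))
(42, None)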

Measure the resident memory occupied by a list of processes during
the execution of a block of code. Should be used as a context manager,
as follows:

with Monitor('do_something') as mon:
    do_something()
print(mon.mem)

At the end of the block the Monitor object will have the
following public attributes:

.start_time: when the monitor started (a datetime object)
.duration: time elapsed between start and stop (in seconds)
.exc: usually None; otherwise the exception happened in the with block
.mem: the memory delta in bytes

The behaviour of the Monitor can be customized by subclassing it
and by overriding the method on_exit(), called at the end of the block
and used to display or store the results of the analysis.
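For instance, a hypothetical subclass printing a summary at the end of
the block could look like this:

>>> from openquake.baselib.performance import Monitor
>>> class PrintingMonitor(Monitor):
...     def on_exit(self):
...         # called automatically at the end of the with block
...         print(self.duration, self.mem)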

NB: if the .address attribute is set, it is possible for the monitor to
send commands to that address, assuming there is a
multiprocessing.connection.Listener listening.

A simple way to define command processors based on argparse.
Each parser is associated with a function and parsers can be
composed together by dispatching on a given name (if not given,
the function name is used).
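The underlying pattern can be sketched with plain argparse (this
illustrates the idea only, it is not sap's actual API):

import argparse

commands = {}  # command name -> processing function

def command(func, name=None):
    # register func under the given name (the function name if not given)
    commands[name or func.__name__] = func
    return func

@command
def info(arg):
    print('info about', arg)

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('cmd', choices=sorted(commands))
    parser.add_argument('arg')
    ns = parser.parse_args(argv)
    commands[ns.cmd](ns.arg)  # dispatch on the command name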