When working with dask collections, you will rarely need to
interact with scheduler get functions directly. Each collection has a
default scheduler, and a built-in compute method that calculates the output
of the collection:
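For example, a minimal sketch using dask.array (the array `x`, its shape, and its chunking here are illustrative, not from the original text):

```python
import dask.array as da

# Build a lazy dask collection; nothing is computed yet
x = da.ones((1000,), chunks=(100,))

# .compute() evaluates the graph on the collection's default
# scheduler and returns a concrete result
result = x.sum().compute()
print(result)  # -> 1000.0
```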

You may wish to compute results from multiple dask collections at once.
Similar to the compute method on each collection, there is a general
compute function that takes multiple collections and returns multiple
results. This merges the graphs from each collection, so intermediate results
are shared:
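For example (a minimal sketch; the collection and the two derived results here are illustrative):

```python
import dask
import dask.array as da

x = da.ones((1000,), chunks=(100,))

# A single dask.compute call evaluates both collections; the merged
# graph computes the chunks of `x` only once, shared by sum and mean
total, mean = dask.compute(x.sum(), x.mean())
print(total, mean)  # -> 1000.0 1.0
```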

For most cases, the default settings are good choices. However, sometimes you
may want to use a different scheduler. There are two ways to do this.

Using the get keyword in the compute method:

>>> x.sum().compute(get=dask.multiprocessing.get)

Using dask.set_options. This can be used either as a context manager or to
set the scheduler globally:

# As a context manager
>>> with dask.set_options(get=dask.multiprocessing.get):
...     x.sum().compute()

# Set globally
>>> dask.set_options(get=dask.multiprocessing.get)
>>> x.sum().compute()

Additionally, each scheduler may take a few extra keywords specific to that
scheduler. For example, the multiprocessing and threaded schedulers each take a
num_workers keyword, which sets the number of processes or threads to use
(defaulting to the number of cores). This can be set by passing the keyword when
calling compute:

# Compute with 4 threads
>>> x.compute(num_workers=4)

Alternatively, the multiprocessing and threaded schedulers will check for a
global pool set with dask.set_options:
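As a sketch (assuming a dask version where `dask.set_options` accepts a `pool` keyword; later releases moved this same setting to `dask.config.set`, which the snippet falls back to):

```python
from multiprocessing.pool import ThreadPool

import dask
import dask.array as da

# Hedged for version differences: `set_options` is the API this text
# documents; newer dask exposes the same setting via dask.config.set
set_options = getattr(dask, "set_options", None) or dask.config.set

x = da.ones((1000,), chunks=(100,))

# The threaded scheduler picks up the globally configured pool
with set_options(pool=ThreadPool(4)):
    result = x.sum().compute()
print(result)  # -> 1000.0
```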

Debugging parallel code can be difficult, as conventional tools such as pdb
don’t work well with multiple threads or processes. To get around this when
debugging, we recommend using the synchronous scheduler found at
dask.get. This runs everything serially, allowing it to work
well with pdb:
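For example, running a hand-written task graph through the synchronous scheduler (the graph itself is illustrative):

```python
import dask
from operator import add

# A raw dask graph: the task for 'y' depends on the value of 'x'
dsk = {'x': 1, 'y': (add, 'x', 2)}

# dask.get runs every task serially in the calling thread, so a
# pdb.set_trace() inside any task behaves as in ordinary code
result = dask.get(dsk, 'y')
print(result)  # -> 3
```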