This page describes various ways to set up Dask on different hardware, either
locally on your own machine or on a distributed cluster. If you are just
getting started, you can skip this page: Dask does not require any setup
if you only want to use it on a single computer.

Dask has two families of task schedulers:

Single machine scheduler: This scheduler provides basic features on a
local process or thread pool. It was written first and is the
default. It is simple and cheap to use, but it runs only on a single
machine and does not scale beyond it.

Distributed scheduler: This scheduler is more sophisticated and offers
more features, but it also requires a bit more effort to set up. It can
run locally or distributed across a cluster.
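As a minimal sketch of the single-machine family described above, the example below runs the same small task graph on the local thread pool and on the synchronous (single-threaded) scheduler. The function name inc and the variable names are hypothetical and chosen only for illustration; the scheduler keyword and dask.config.set are the usual ways to pick a single-machine scheduler.

```python
import dask
from dask import delayed


@delayed
def inc(x):
    # A trivial task; a real workload would do more work per call.
    return x + 1


# Build a small task graph. Nothing runs until compute() is called.
total = delayed(sum)([inc(i) for i in range(5)])

# Choose a single-machine scheduler for one call.
threaded_result = total.compute(scheduler="threads")
sync_result = total.compute(scheduler="synchronous")

# Or choose one for every compute() call inside a block.
with dask.config.set(scheduler="synchronous"):
    configured_result = total.compute()

print(threaded_result, sync_result, configured_result)
```

The synchronous scheduler runs everything in the calling thread, which tends to be convenient when debugging because ordinary tracebacks and debuggers work as expected.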

If you import Dask, set up a computation, and then call compute, you
will use the single-machine scheduler by default. To use the dask.distributed
scheduler, you must set up a Client:

    import dask.dataframe as dd
    df = dd.read_csv(...)
    df.x.sum().compute()  # This uses the single-machine scheduler by default

    from dask.distributed import Client
    client = Client(...)  # Connect to distributed cluster and override default
    df.x.sum().compute()  # This now runs on the distributed system
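The Client arguments are elided above. As one hedged illustration, assuming the dask.distributed package is installed, a Client created without a scheduler address starts a local cluster on the current machine. The sketch below keeps everything in a single process (processes=False) and turns off the dashboard so that it runs as a plain script; the function name square and the worker counts are arbitrary choices for the example, not recommended settings.

```python
from dask import delayed
from dask.distributed import Client


def square(x):
    return x * x


# Start a small in-process local cluster. The settings here are
# illustrative assumptions, not values taken from this page.
with Client(
    processes=False,
    n_workers=1,
    threads_per_worker=2,
    dashboard_address=None,  # disable the diagnostics dashboard
) as client:
    total = delayed(sum)([delayed(square)(i) for i in range(4)])
    # With a Client active, compute() goes through the distributed scheduler.
    result = total.compute()

print(result)
```

Leaving the with block closes the client and shuts down the local cluster it created.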

Note that the newer dask.distributed scheduler is often preferable even on
single workstations. It contains many diagnostics and features not found in
the older single-machine scheduler. The following pages explain in more detail
how to set up Dask on a variety of local and distributed hardware.

Single Machine:

Default Scheduler: The no-setup default.
Uses local threads or processes for larger-than-memory processing.

Dask.distributed: The sophistication of
the newer system on a single machine. This provides more advanced
features while still requiring almost no setup.

Distributed computing:

Manual Setup: The command line interface for setting up
dask-scheduler and dask-worker processes. Useful for IT teams or
anyone building a deployment solution.