Dynamic task scheduling optimized for computation. This is similar to
Airflow, Luigi, Celery, or Make, but optimized for interactive
computational workloads.

“Big Data” collections like parallel arrays, dataframes, and lists that
extend common interfaces like NumPy, Pandas, or Python iterators to
larger-than-memory or distributed environments. These parallel collections
run on top of the dynamic task schedulers.

    from dask import delayed

    L = []
    for fn in filenames:                  # Use for loops to build up computation
        data = delayed(load)(fn)          # Delay execution of function
        L.append(delayed(process)(data))  # Build connections between variables

    result = delayed(summarize)(L)
    result.compute()

Dask represents parallel computations with task graphs. These
directed acyclic graphs may have arbitrary structure, which gives both
developers and users the freedom to build sophisticated algorithms and to
handle messy situations not easily managed by the map/filter/groupby
paradigm common in most data engineering frameworks.
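
To make "arbitrary structure" concrete, here is a minimal sketch of an irregular dependency graph that does not reduce to a single map or groupby step. It uses only the standard library's graphlib rather than Dask itself, and the task names are hypothetical, chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
dependencies = {
    "load_a": set(),
    "load_b": set(),
    "clean_a": {"load_a"},
    "join": {"clean_a", "load_b"},  # mixes a cleaned input with a raw one
    "report": {"join", "load_a"},   # reuses an early result directly
}

# Any valid execution order must run a task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The exact order printed may vary, but every task appears after the tasks it depends on. A scheduler is free to run independent tasks such as "load_a" and "load_b" in parallel.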

We originally needed this flexibility to build complex algorithms for
n-dimensional arrays, but have found it to be equally valuable when dealing with
messy situations in everyday problems.

Internally, Dask encodes algorithms in a simple format involving Python dicts,
tuples, and functions. This graph format can be used in isolation from the
Dask collections. Working directly with Dask graphs is rare unless you intend
to develop new modules with Dask. Even then, dask.delayed is
often a better choice. If you are a core developer, then you should start here.
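
As a rough illustration of the dict/tuple/function encoding, the sketch below builds a small graph in which each key names a result and a tuple starting with a callable stands for a function call. The `toy_get` evaluator is a hypothetical helper written for this example only. It is not Dask's scheduler and does no caching or parallel execution.

```python
from operator import add, mul

# Keys name results. A tuple whose first element is callable is a task:
# (func, arg1, arg2, ...). String arguments that match a key refer to
# that key's result.
dsk = {
    "x": 1,
    "y": 2,
    "z": (add, "x", "y"),  # z = add(x, y)
    "w": (mul, "z", 10),   # w = mul(z, 10)
}


def toy_get(graph, key):
    """Recursively evaluate one key. Illustration only, not Dask's scheduler."""

    def evaluate(expr):
        # A task: call the function on its evaluated arguments.
        if isinstance(expr, tuple) and expr and callable(expr[0]):
            func, *args = expr
            return func(*(evaluate(a) for a in args))
        # A list of arguments: evaluate each element.
        if isinstance(expr, list):
            return [evaluate(e) for e in expr]
        # A reference to another key in the graph.
        try:
            if expr in graph:
                return evaluate(graph[expr])
        except TypeError:
            pass  # unhashable literal, cannot be a key
        # Anything else is a literal value.
        return expr

    return evaluate(graph[key])


result = toy_get(dsk, "w")
print(result)  # 30
```

With Dask installed, passing the same dictionary to one of its scheduler `get` functions (for example `dask.get(dsk, "w")`) is expected to compute the same value, though the details may differ between versions.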