export AIRFLOW_HOME=~/airflow
# install from pypi using pip
pip install apache-airflow
# initialize the database
airflow initdb
# start the web server, default port is 8080
airflow webserver -p 8080
----------------------------
notice SQLite is the default DB; it cannot run tasks in parallel, it is only meant for getting started
for production you can use MySQL instead
Airflow scales out with Mesos
DAG - just a container script that connects all the tasks; you can't pass data between tasks via the DAG itself, there is a specific module (XCom) for that.
operator - what actually runs; each operator describes a single task in the workflow

In addition to these basic building blocks, there are many more specific operators: DockerOperator, HiveOperator, S3FileTransferOperator, PrestoToMysqlOperator, SlackOperator… you get the idea!
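To make the DAG/operator split concrete, here is a minimal sketch (DAG name, task ids, and schedule are illustrative; imports are Airflow 1.x-style, matching the quickstart above):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# The DAG is just the container: it holds tasks and their dependencies.
dag = DAG(
    dag_id="example_pipeline",        # hypothetical name
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
)

# Operators are the things that actually run.
extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

extract >> load  # run extract before load
```

Note that `extract >> load` only declares ordering; it does not pass data between the tasks.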

Quick summary of the terms used by Airflow

TASK - a running (instantiated) operator

Hooks are interfaces to external platforms and databases like Hive, S3, MySQL, Postgres, HDFS, and Pig

Airflow pools can be used to limit the execution parallelism on arbitrary sets of tasks.
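A sketch of using a pool (the pool itself is created in the UI under Admin -> Pools; "db_pool" is a hypothetical pool name):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("pool_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

heavy_query = BashOperator(
    task_id="heavy_query",
    bash_command="echo querying",
    pool="db_pool",   # at most <pool slots> such tasks run concurrently
    dag=dag,
)
```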

The connection information to external systems is stored in the Airflow metadata database and managed in the UI
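A sketch of using such a connection through a hook ("my_mysql" is a hypothetical connection id, assumed to be defined under Admin -> Connections):

```python
from airflow.hooks.mysql_hook import MySqlHook

# The hook looks up host, credentials, etc. from the metadata database
# by connection id; nothing sensitive lives in the DAG file itself.
hook = MySqlHook(mysql_conn_id="my_mysql")
records = hook.get_records("SELECT 1")
```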

Queues

When using the CeleryExecutor, the celery queues that tasks are sent to can be specified. queue is an attribute of BaseOperator, so any task can be routed to a specific queue.

XComs let tasks exchange messages; this is the mechanism for passing small pieces of data between tasks.
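A sketch of an XCom push/pull between two PythonOperator tasks (task ids are illustrative; `provide_context=True` is the Airflow 1.x way to get the task instance in the callable):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG("xcom_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

def push_value(**context):
    # the return value of a PythonOperator callable is pushed to XCom
    return 42

def pull_value(**context):
    value = context["ti"].xcom_pull(task_ids="push_task")
    print(value)

push = PythonOperator(task_id="push_task", python_callable=push_value,
                      provide_context=True, dag=dag)
pull = PythonOperator(task_id="pull_task", python_callable=pull_value,
                      provide_context=True, dag=dag)
push >> pull
```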

Variables are a generic way to store and retrieve arbitrary content or settings as a simple key-value store within Airflow.
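A sketch of reading and writing Variables ("env_name" and "my_config" are hypothetical keys):

```python
from airflow.models import Variable

# Variables are stored in the metadata database and can also be
# managed from the UI under Admin -> Variables.
Variable.set("env_name", "staging")
env = Variable.get("env_name")

# JSON values can be deserialized on read; default_var avoids a
# KeyError when the key is missing.
config = Variable.get("my_config", default_var={}, deserialize_json=True)
```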

Branching

Sometimes you need a workflow to branch, or only go down a certain path based on an arbitrary condition which is typically related to something that happened in an upstream task. One way to do this is by using the BranchPythonOperator.
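A sketch of branching on the execution date (task ids and the weekday condition are illustrative): the callable returns the task_id to follow, and the branches not chosen are skipped.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import BranchPythonOperator
from airflow.operators.dummy_operator import DummyOperator

dag = DAG("branch_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

def choose_branch(**context):
    if context["execution_date"].weekday() < 5:
        return "weekday_task"
    return "weekend_task"

branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch,
                              provide_context=True, dag=dag)
weekday = DummyOperator(task_id="weekday_task", dag=dag)
weekend = DummyOperator(task_id="weekend_task", dag=dag)

branch >> [weekday, weekend]
```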

SubDAGs

SubDAGs are perfect for repeating patterns. Defining a function that returns a DAG object is a nice design pattern when using Airflow.
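A sketch of that pattern (names and schedule are illustrative): a factory function returns a DAG, which a SubDagOperator then embeds. By convention the sub-DAG's id is `parent_id.task_id`.

```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {"start_date": datetime(2019, 1, 1)}

def make_subdag(parent_dag_id, child_id, args):
    # The repeated pattern lives here, built once and reused.
    subdag = DAG(dag_id="%s.%s" % (parent_dag_id, child_id),
                 default_args=args, schedule_interval="@daily")
    for i in range(3):
        DummyOperator(task_id="step_%d" % i, dag=subdag)
    return subdag

dag = DAG("subdag_example", default_args=default_args,
          schedule_interval="@daily")

section = SubDagOperator(
    task_id="section_1",
    subdag=make_subdag(dag.dag_id, "section_1", default_args),
    dag=dag,
)
```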

SLAs

Service Level Agreements, or time by which a task or DAG should have succeeded, can be set at a task level as a timedelta.
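A sketch of setting an SLA on a task (task name and the 2-hour window are illustrative); if the task hasn't succeeded within the timedelta of the scheduled time, an SLA miss is recorded:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("sla_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

report = BashOperator(
    task_id="daily_report",
    bash_command="echo report",
    sla=timedelta(hours=2),
    dag=dag,
)
```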

Trigger Rules

Though the normal workflow behavior is to trigger tasks when all their directly upstream tasks have succeeded, Airflow allows for more complex dependency settings.
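A sketch of one such rule (task names illustrative): a cleanup task that runs once all upstream tasks are done, whether they succeeded or failed.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("trigger_rule_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

work = BashOperator(task_id="work", bash_command="echo work", dag=dag)

cleanup = BashOperator(
    task_id="cleanup",
    bash_command="echo cleaning up",
    trigger_rule="all_done",   # default is "all_success"
    dag=dag,
)

work >> cleanup
```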

Jinja Templating

Airflow leverages the power of Jinja Templating and this can be a powerful tool to use in combination with macros (see the Macros section).
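A sketch of templating a bash_command (task name illustrative): `{{ ds }}` is the execution date, and macros such as `macros.ds_add` are available inside templates.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("template_example", start_date=datetime(2019, 1, 1),
          schedule_interval="@daily")

templated = BashOperator(
    task_id="templated",
    bash_command="echo run date: {{ ds }}, next: {{ macros.ds_add(ds, 1) }}",
    dag=dag,
)
```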