Our customers are interested in increasing profitability through data-driven marketing, not in the IT behind it. Keeping operations running smoothly is hard, and our tooling takes care of that so the customer can focus on what matters most.

Last week we reached a milestone: our automated daily workflow processed its one millionth job for one of our customers. At HDA, we automatically run thousands of ETL jobs every day for all our customers, connecting to many different data sources and retrieving, cleaning and transforming data to store for reporting and visualization. The biggest challenge is keeping track of all jobs, whether they succeeded or failed, and, in case of failure, automatically (or sometimes manually) retrying individual jobs so that all data is available for that day.

We developed the tool in house because none of the commercially available data orchestration tools met our requirements:

Schedule recurring and ongoing jobs: streaming, hourly, daily, weekly, or more intricate schedules such as only on the second Monday of the month before 9:15.

Keep track of job states, retry when data sources are temporarily offline, report success, and alert on failure.
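To make these two requirements concrete, here is a minimal Python sketch. It is an illustration only, not eBDB's actual code: the names JobState, TemporaryOutage, is_second_monday_before and run_with_retry are hypothetical, as is the default of three attempts. The first function checks the "second Monday of the month before 9:15" rule; the second runs a job, retries on a temporary outage, and alerts once the attempts are exhausted.

```python
from datetime import datetime, time
from enum import Enum


class JobState(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class TemporaryOutage(Exception):
    """Hypothetical error a job raises when its data source is offline."""


def is_second_monday_before(now: datetime, cutoff: time = time(9, 15)) -> bool:
    """True if `now` is on the second Monday of its month, before `cutoff`."""
    # Monday is weekday 0; the second Monday always falls on day 8..14.
    return now.weekday() == 0 and 8 <= now.day <= 14 and now.time() < cutoff


def run_with_retry(job, max_attempts=3, alert=print):
    """Run `job` until it succeeds or attempts run out.

    Returns (final state, number of attempts made).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return JobState.SUCCEEDED, attempt
        except TemporaryOutage:
            # Assumption: a real scheduler would wait before the next attempt.
            continue
    alert(f"job failed after {max_attempts} attempts")
    return JobState.FAILED, max_attempts
```

A production scheduler would also persist each state change, so that a failed job can be retried manually later, and would wait between attempts rather than retrying immediately.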

The tooling we use is our “elastic Big Data Box”, or eBDB. It is highly scalable and runs in parallel on a few dozen machines in Amazon Web Services (AWS), automatically scaling up and down with the current workload. Because AWS EC2 is pay per use, the tool is set up to use only the capacity it needs, which keeps the solution affordable for our customers.
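The idea of scaling with the workload can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how eBDB decides: the function name desired_workers, the 50 jobs per machine, and the limits of 1 and 36 machines are all hypothetical values chosen for the example.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker=50, min_workers=1, max_workers=36):
    """Machines to run for the current queue, clamped to [min_workers, max_workers]."""
    # Round up so a partially filled batch still gets a machine.
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


print(desired_workers(0))       # empty queue: stay at the minimum
print(desired_workers(120))     # 120 jobs at 50 per machine
print(desired_workers(10_000))  # large backlog: capped at the maximum
```

With pay-per-use pricing, the lower bound keeps the cost of quiet periods small, while the upper bound caps spending during peaks.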