Azure Automation – the beginning of cloud ETL on Azure?

I have maintained a watching brief on what I refer to as “cloud ETL”, that is the ability build ETL routines in a cloud environment and therefore leverage all the benefits that the cloud model brings*. Thus far my main opinion piece in this area is What would a cloud-based ETL tool look like? in which I laid out what features I thought a cloud ETL tool should have:

Data transformation would be done “in the cloud” i.e. I wouldn’t need to own my own hardware in order to run it

Given that I have spent the majority of my career working with Microsoft technologies (in particular their ETL tool, SSIS) I am interested to know whether Microsoft will offer a cloud ETL tool. With that in mind I was interested to discover a new service on Azure that is currently in preview called Azure Automation (read Announcing Microsoft Azure Automation Preview). Azure Automation is essentially a a cloud-based workflow tool and, as I said above, workflow is a feature that I believe a cloud-based ETL tool should encompass:

Workflow. (e.g. Do this, then do that. Do these things in parallel. Only do this if some condition is true. Restart from here in case of failure.)

SSIS developers will of course be aware that SSIS has its own workflow tool (termed the Control Flow). It always kind of bugged me that different Microsoft tools had their own workflow technology. SSIS had one, I believe BizTalk had one, there was another called Windows Workflow Foundation (WWF) and in fact there was a possibility within the SQL Server 2008 timeframe that SSIS would replace its Control Flow with WWF (that never happened and the Program Manager that wanted to do it has since left the SSIS product team).

Azure Automation is built upon Powershell Workflow which in turn is built upon WWF (now simply called Workflow Foundation – WF). It certainly seems as though WF is becoming the foundational workflow technology to rule them all within Microsoft and that is no bad thing in my opinion – it seems foolish to reinvent the wheel every time. Powershell Workflow has the following cmdlets for building workflows:

A checkpoint is a snapshot of the current state of the workflow, including the current values of variables, and any output generated up to that point, and it saves it to disk. You can add multiple checkpoints to a workflow by using different checkpoint techniques. Windows PowerShell automatically uses the data in newest checkpoint for the workflow to recover and resume the workflow if the workflow is interrupted, intentionally or unintentionally. http://technet.microsoft.com/en-us/library/jj574114.aspx

It certainly appears to me that Azure Automation could satisfy my desire for a workflow engine for the purposes of cloud-ETL. Now if only Microsoft were working on cloud-based dataflows too we’d have something akin to SSIS-in-the-cloud .