These 5 slave servers will report to the master and tell it where they can be reached. So all we need to do in our PDI job or transformation is create a master/slave configuration:
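For reference, this is roughly what a slave’s Carte configuration looks like: the masters block tells the slave which master to report to, and report_to_masters enables the registration. A minimal sketch; the names, internal hostnames, and credentials below are placeholders, not the exact files shipped on the AMI:

    <slave_config>
      <!-- The master this slave reports to -->
      <masters>
        <slaveserver>
          <name>cluster-master</name>
          <hostname>10.251.0.1</hostname>
          <port>8080</port>
          <username>cluster</username>
          <password>cluster</password>
          <master>Y</master>
        </slaveserver>
      </masters>
      <report_to_masters>Y</report_to_masters>
      <!-- This slave itself, advertised with its internal address -->
      <slaveserver>
        <name>slave-1</name>
        <hostname>10.251.0.2</hostname>
        <port>8080</port>
        <username>cluster</username>
        <password>cluster</password>
        <master>N</master>
      </slaveserver>
    </slave_config>

Each slave is then started by pointing Carte at its configuration file:

    sh carte.sh slave-config.xml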

To top it off, we define MASTER_HOST and MASTER_PORT as parameters in the job and transformation…
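That way the master’s address is not hard-coded anywhere: the slave server definition inside the job or transformation simply references the parameters as variables. A sketch of how such a definition ends up in the .ktr XML (element names follow the slave server metadata; the values are the parameters defined above):

    <slaveservers>
      <slaveserver>
        <name>cluster-master</name>
        <hostname>${MASTER_HOST}</hostname>
        <port>${MASTER_PORT}</port>
        <master>Y</master>
      </slaveserver>
    </slaveservers>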

So all that’s left to do is specify these parameters when you execute the job…
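In Spoon that means filling in the parameters table of the execution dialog; from the command line, Kitchen accepts the same named parameters via its -param option. A sketch, with a placeholder job path and public hostname:

    sh kitchen.sh -file=/opt/pdi/jobs/cluster-job.kjb \
      "-param:MASTER_HOST=ec2-75-101-xxx-xxx.compute-1.amazonaws.com" \
      "-param:MASTER_PORT=8080"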

As you can see from the dialog, we pass the complete job (including sub-jobs and sub-transformations) over to the “Cluster Master” slave server prior to execution, because it is neither possible nor needed for Spoon to contact the various slave servers directly: they report with their internal IP addresses. We wouldn’t want it any other way, since keeping the traffic on the internal network offers the best performance (and costs less).

Matt, fantastic work as usual. Between this and the resource exporter, you’re evolving with the users’ needs, as always. With the advent of cloud computing and MPP databases, this kind of utility is so useful, whether you’re “cloud computing” with Amazon or on your own dedicated servers. I’m excited to put this to work and hope to be able to provide useful feedback. I’m working now on a project using Vertica (MPP) and our own dedicated cloud of servers. Management of those cloud servers will be a “growing” concern, and all this dynamic management helps so much. Awesome!

I sent you an email at your Pentaho address, but is Kettle started as soon as you run the instances? I’m receiving some “connection refused” messages (from Pentaho running on Windows to the AMIs, which are Linux). I have opened the correct port (8080) and still receive the messages.

This is probably caused by the fact that we start the slave servers on the Amazon internal network, so Spoon can’t reach them directly. We do this for speed and $$ (internal traffic is fast and free of charge). That is why we included the “pass export to remote server” option in the job execution dialog.
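A quick way to check that the master itself is reachable from your side is to request Carte’s status page on the instance’s public hostname (placeholder below); the default Carte credentials are cluster/cluster unless you changed them:

    curl -u cluster:cluster \
      "http://ec2-75-101-xxx-xxx.compute-1.amazonaws.com:8080/kettle/status/"

If that works but the slaves still refuse connections, it confirms they are only listening on the internal network, and the “pass export” option is the way to go.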