Talend Integration Cloud & AWS: 3 Ways to Automate Big Data (Part 2)

Mark Balkenende is a Sales Solution Architects Manager at Talend. Prior to joining Talend, Mark had a long career of mastering and integrating data at a number of companies, including Motorola, Abbott Labs and Walgreens. Mark holds an Information Systems Management degree and is also an extreme cycling enthusiast.

In our last installment, we looked at how to easily configure your data warehouse or data mart to spin up and spin down automatically so that you don’t need to waste valuable compute resources (or money!) running databases when they’re not in use. Now let’s take a look at how you can automate your AWS Redshift cluster environments.

Don’t Let Your Data Warehouse Sit Idle

The other service that Talend Integration Cloud helps you automate is AWS Redshift, which is the data warehouse service on AWS. Remember when I asked in the first installment of this blog if it would be useful to stop a data warehouse or data mart if it was going to just sit idle for days? Well, check out the image below.

These are the configuration details for the tAmazonRedshiftManage component. If you want your Redshift clusters to spin up when in use and spin down when they’re not, a simple, straightforward configuration is all it takes to have the tAmazonRedshiftManage component automatically start or stop an AWS Redshift cluster. In the image above, I am showing how to start a cluster from a previous snapshot, so that the cluster is fully configured and has the historical data that may be needed for processing new data.
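For comparison, here is a rough sketch of what starting a cluster from a snapshot and stopping it again looks like if you script it by hand against the AWS API. It is not how tAmazonRedshiftManage is implemented; it only illustrates the same start/stop idea. It assumes Python with the boto3 SDK, and the function names and the cluster and snapshot identifiers are made up for illustration.

```python
# Sketch only: `redshift` is assumed to be a boto3 Redshift client,
# e.g. created elsewhere with boto3.client("redshift").
# The function names and all identifiers are hypothetical.

def start_cluster_from_snapshot(redshift, cluster_id, snapshot_id):
    """Restore a cluster from a snapshot and block until it is available."""
    redshift.restore_from_cluster_snapshot(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
    )
    # The waiter polls AWS until the restored cluster can accept connections.
    redshift.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_id)


def stop_cluster_with_snapshot(redshift, cluster_id, final_snapshot_id):
    """Delete the cluster, keeping a final snapshot to restore from next time."""
    redshift.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=False,
        FinalClusterSnapshotIdentifier=final_snapshot_id,
    )
    # Block until the cluster is gone, so billing for it has stopped.
    redshift.get_waiter("cluster_deleted").wait(ClusterIdentifier=cluster_id)
```

Taking a final snapshot on shutdown is what lets the next start bring back a fully configured cluster with its historical data, which is the same reason the component configuration above starts from a snapshot.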

This is yet another great example of how combining dynamic AWS database resources and integration flows into a single, cohesive, automated process can not only dramatically simplify IT management, but also reduce costs. Oh, by the way, the Redshift Manage component is also included in the free download of Talend Open Studio for Big Data!

Like PB&J, Talend and AWS are Better Together

The power that comes from pooling all these AWS database processes with Talend’s other data integration and processing capabilities is mind-blowing! Take, for example, the figure below, where I am starting an EMR cluster with Spark and then running a Spark Recommendation Model Refresh process using data from AWS S3 (S3 is AWS’s data storage service). Once the model is complete, I can spin down the EMR cluster so that I am not wasting resources and adding unneeded expenses to my AWS account. Brilliant! This will make my boss very happy!

The figure above is a view of the Talend Integration Cloud environment where I have combined the EMR start and stop functions with a Spark Machine Learning process on the EMR cluster. I then build and refresh a Spark Recommendation model and, once complete, the last step is spinning down or terminating the EMR cluster.
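To put that flow in perspective, here is roughly what the same start, process, and terminate sequence might look like if scripted by hand. This is only a sketch of the pattern, not what Talend Integration Cloud generates. It assumes Python with the boto3 SDK; the function name, the cluster configuration, and the S3 script location are hypothetical.

```python
# Sketch only: `emr` is assumed to be a boto3 EMR client, e.g. created
# elsewhere with boto3.client("emr"). `cluster_config` is assumed to hold
# valid run_job_flow arguments (with Spark listed under Applications and
# KeepJobFlowAliveWhenNoSteps set to True under Instances, so the cluster
# waits for steps). `script_uri` is a hypothetical S3 path to a Spark job.

def refresh_model_on_transient_cluster(emr, cluster_config, script_uri):
    """Start an EMR cluster, run one Spark step, and always terminate it."""
    cluster_id = emr.run_job_flow(**cluster_config)["JobFlowId"]
    try:
        emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
        step = {
            "Name": "Recommendation model refresh",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_uri],
            },
        }
        response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
        step_id = response["StepIds"][0]
        emr.get_waiter("step_complete").wait(ClusterId=cluster_id, StepId=step_id)
    finally:
        # Spin the cluster down even if the Spark step failed.
        emr.terminate_job_flows(JobFlowIds=[cluster_id])
    return cluster_id
```

The try/finally block is the important design choice here: the cluster is terminated whether the model refresh succeeds or fails, so a broken job cannot leave compute resources running.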

With absolutely NO CODING needed, I have achieved the goal of starting computing resources when needed, completing my analytics processing job, and then shutting down the database and Hadoop processing resources.

Manage Your Data and Reports on Your Time

The Spring Release of Talend Integration Cloud now allows you to benefit from the cost efficiencies that come from using AWS cloud environments by automatically starting EMR and Redshift clusters when integration jobs are ready to execute, and spinning them down when jobs are complete. Now, IT departments can better manage the cost of Hadoop and data warehousing jobs, while improving productivity and agility. Welcome, my friends, to big data automation nation!