Hadoop v1.x On-Premise Templates

As promised, here are the templates that support the talk at Innovate and the recent Webinar on Hadoop Automation. To watch the Webcast on demand, go here.

These templates are provided to you as-is: there is no support for them, and Automic holds no liability for their use. If you improve on them, we ask that you share those improvements with the community.

If you have any questions on these templates, please use the community to discuss them.

The exports contain examples for Hadoop installed as a Linux/Unix cluster. All of the templates rely on the Linux/Unix OS agent, and they assume you are running the agent on a machine that has access to the cluster. These templates use FileList and SQLi variables. As a best practice, we put the prompt set on the workflow definition rather than directly on the objects; this lets you quickly swap jobs (say, from a Windows box) into the workflow without having to recreate the prompt set. Use the JOBP when assembling a workflow.

Hive job to execute a Hive Script: TEMPLATE.HIVE.RUN_SCRIPT

To configure for your environment:

Edit the Agent, Login and Path parameters to fit your environment.

Edit the 'HADOOP_SCRIPT_LIST' variable object to point to the directory that contains your Hive Scripts.
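Under the hood, the job effectively invokes the Hive CLI on the agent machine. A minimal sketch of that call is below; the install path, script directory, and script name are placeholders, not values from the template.

```shell
# Hypothetical sketch of what TEMPLATE.HIVE.RUN_SCRIPT runs on the agent.
# All paths and the script name are placeholders -- substitute your own.
HIVE_HOME=/opt/hive             # assumed Hive install path
SCRIPT_DIR=/data/hive/scripts   # the directory HADOOP_SCRIPT_LIST should point to

# Hive CLI: -f executes the statements in a script file
"$HIVE_HOME/bin/hive" -f "$SCRIPT_DIR/daily_load.hql"
```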

Pig Job to execute a Pig Script: TEMPLATE.PIG.RUN_SCRIPT

To configure for your environment:

If you didn't already do this for your Hive scripts, edit the 'HADOOP_SCRIPT_LIST' variable object to point to the directory that contains your Pig scripts.

Edit the Agent, Login, and Path parameters to fit your environment.
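The Pig job works the same way, handing a Pig Latin script to the Pig CLI on the agent. A sketch, with placeholder paths and script name:

```shell
# Hypothetical sketch of what TEMPLATE.PIG.RUN_SCRIPT runs on the agent.
# Paths and the script name are placeholders.
PIG_HOME=/opt/pig
SCRIPT_DIR=/data/hadoop/scripts   # same HADOOP_SCRIPT_LIST directory as the Hive jobs

# Run a Pig Latin script against the cluster (default MapReduce execution mode)
"$PIG_HOME/bin/pig" -f "$SCRIPT_DIR/clickstream.pig"
# add "-x local" to test a script locally without touching the cluster
```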

HDFS Jobs: CHECK_FILE (Use this like you would a file event in HDFS), LIST_FILES, DELETE_A_FILE, DELETE_DIRECTORY_ALL, TRANSFER_FROM_LOCAL_UNIX, TRANSFER_TO_LOCAL_UNIX

No customization needed for these jobs.
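For reference, each HDFS job maps onto a standard Hadoop v1.x filesystem shell command. The paths below are placeholders; the templates prompt for the real values at runtime.

```shell
# Hadoop v1.x "hadoop fs" commands corresponding to each HDFS job
# (all paths are placeholders).

hadoop fs -test -e /data/in/trigger.done && echo "file exists"  # CHECK_FILE
hadoop fs -ls /data/in                                          # LIST_FILES
hadoop fs -rm /data/in/old.csv                                  # DELETE_A_FILE
hadoop fs -rmr /data/tmp                                        # DELETE_DIRECTORY_ALL (recursive)
hadoop fs -put /local/export.csv /data/in/                      # TRANSFER_FROM_LOCAL_UNIX
hadoop fs -get /data/out/report.csv /local/reports/             # TRANSFER_TO_LOCAL_UNIX
```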

Sqoop jobs: IMPORT, EXPORT

To configure for your environment: we created only one Sqoop job, using MySQL as an example. If you are using a different database, you will need to change the connection-string syntax.

Make sure that the MySQL JDBC driver is in the correct path and that the agent on the machine where the job runs is able to access the MySQL database server.
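To illustrate the connection-string syntax, here is a hedged sketch of Sqoop import and export calls against MySQL. The host, port, database, table, username, and HDFS directories are all placeholders, not values taken from the templates.

```shell
# Hypothetical Sqoop calls showing the MySQL JDBC connection-string syntax.
# Host, database, tables, credentials, and HDFS paths are placeholders.

# IMPORT: pull a MySQL table into an HDFS directory
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders

# EXPORT: push HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/sales/summary
```

For a different database, only the `--connect` JDBC URL (and the matching driver JAR on the agent's classpath) needs to change.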