Re: Oozie - Create a Spark workflow

As you said, when it's executed, we don't know on which yarn node the command will be executed. So, with this storage bay mounted on each node, it doesn't matter on which node it's executed (I think :p)

Correct

Regarding the rest. First of all, you don't have to be a Scala developer to schedule a script in Oozie :) Your command should be "./runConsolidator.sh". The important points are that your script has execute permissions and that you define it in "Files".

How it works: this shell action will be a YARN job, so YARN will create a temp folder, e.g. "/yarn/nm/some_id/another_id". All files defined in "Files" for this action will automatically be downloaded into this directory. This directory will be your working directory, so you should run your command with "./" in front, since by default "./" is not in PATH.

NOTE: If your script uses jar files etc., then you should define all of them in "Files" as well, so they will be copied to the working directory.
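To make the "./" point concrete, here is a minimal local sketch in Python. It is not Oozie or YARN code: a temporary directory stands in for the container directory, the helper name `run_in_container_dir` and the script body are made up for illustration, and the fixed PATH is an assumption meant to mimic a PATH that does not contain the working directory.

```python
import os
import stat
import subprocess
import tempfile


def run_in_container_dir(command):
    """Run `command` from a scratch directory standing in for the YARN container dir.

    Returns the script's output, or None if the command could not be found.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Simulate a file listed in "Files" being copied into the working directory.
        script = os.path.join(workdir, "runConsolidator.sh")
        with open(script, "w") as fh:
            fh.write("#!/bin/sh\necho consolidated\n")
        # The script needs execute permission, as mentioned above.
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
        try:
            result = subprocess.run(
                [command],
                cwd=workdir,                    # working directory = container dir
                env={"PATH": "/usr/bin:/bin"},  # assumed PATH without "." in it
                capture_output=True,
                text=True,
                check=True,
            )
        except FileNotFoundError:
            # A bare name is looked up in PATH only, so the local script is not found.
            return None
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_in_container_dir("./runConsolidator.sh"))  # found via the relative path
    print(run_in_container_dir("runConsolidator.sh"))    # bare name, PATH lookup only
```

With the "./" prefix the script is resolved relative to the working directory; without it, the lookup goes through PATH and fails in this setup.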

I suggest proceeding with this approach. Writing the XML by hand tends to get messy, and you need some experience to do it without mistakes. Once you have created a working job from HUE, you can export the XML and start experimenting.

Re: Oozie - Create a Spark workflow

Regarding the rest. First of all you don't have to be a scala developer to schedule a script in oozie :)

Right. As a System / Big Data administrator it's usually not in my scope, but it's certainly better to know :p

So now it works with this XML syntax (workflow.xml). I found the correct way to use the shell action two days ago, and I've implemented several jobs with this workflow, since the syntax is generic and uses variables:

I keep my job.properties with my variables locally, and I put the workflow.xml in HDFS in the specified directory, with the jar in a subdirectory named "lib". It executes the Spark shell without errors, which is perfect!

Thank you for your help with the workflow. It's not easy to understand at the beginning.

I still don't understand all of the syntax. For example, I need to add an action to my workflow to send emails. I found the email action, but it does not work. (I don't have an SMTP server for the moment, so it doesn't matter if the sending itself fails; I just want to get the correct syntax without Oozie errors first.)

Now I'm trying to schedule a job with a coordinator, but it doesn't work. I have an issue with the timezone: I put "Europe/Paris" in the timezone field, but the execution time is not correct. There is always a difference between the time I want and the time displayed.

I've already configured the timezone in the Hue configuration, but there is still a difference: Hue and Oozie do not display the same time (I think Oozie uses UTC, and Hue probably uses my timezone).
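If Oozie is indeed working in UTC (which is its default processing timezone, as far as I know), then the wanted local time has to be converted before it is used as a coordinator start time. Below is a minimal sketch of that conversion in Python. The helper name `to_oozie_utc` is made up for illustration, and the output format `yyyy-MM-ddTHH:mmZ` is assumed to be what the coordinator expects for start and end times.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_oozie_utc(local_time, tz_name="Europe/Paris"):
    """Convert a local wall-clock time ("YYYY-MM-DD HH:MM") to a UTC timestamp string."""
    # Attach the local timezone to the naive wall-clock time.
    local = datetime.strptime(local_time, "%Y-%m-%d %H:%M").replace(
        tzinfo=ZoneInfo(tz_name)
    )
    # Convert to UTC and format it in the assumed coordinator style.
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


if __name__ == "__main__":
    print(to_oozie_utc("2017-07-01 08:00"))  # summer time in Paris (UTC+2)
    print(to_oozie_utc("2017-01-15 08:00"))  # winter time in Paris (UTC+1)
```

The two calls show why the offset is not constant: the same 08:00 in Paris maps to a different UTC hour in summer and in winter, which could explain a gap of one or two hours between the wanted time and the displayed time.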

And my last question: is there a way to register my submitted job in Hue? For example, I've written the workflow on my own and I want to submit it through Hue (easier for the customer).

Re: Oozie - Create a Spark workflow

You mean how the user can submit the job from HUE? If you save the file in HDFS as "workflow.xml", go to the File Browser in HUE. You will notice that if you select the checkbox for this file, a "Submit" action button will appear. So the user can just click it.

There are various cases for sending an e-mail. If you need an e-mail when an error is encountered, or when an action takes too much time, then you have to enable SLAs on this action and define the recipient.

If you need an e-mail saying that the workflow executed successfully, then add a mail action just before the end of the workflow. If any previous action fails, the e-mail action will not be executed, unless of course you have modified the kill transition in one of your actions to lead to this e-mail action.

Re: Oozie - Create a Spark workflow

The developer (customer side) who works with me on the cluster has been trying Apache Airflow, and after one week he can do what we need (workflow, emailing / alerting, re-run, ...) without loading files into HDFS. Apache Airflow is running in standalone mode, and the web UI is better than the Oozie UI.

It seems to be a better solution than Oozie. What do you think about this?

As it is an incubating project, I don't know if it's a good idea, but the web UI is good and it looks easy to manage. I didn't know about this new project, but I think Oozie is outdated compared to Airflow.

For the moment Oozie is on stand-by. They will make a choice between Oozie and Airflow, but I must admit that Airflow looks like the better solution.