OOZIE_CHECK_OWNER: If set to true, Oozie setup/start/run/stop scripts will check that the
owner of the Oozie installation directory matches the user invoking the script. The default
value is undefined and interpreted as false.

If Oozie is configured to use HTTPS (SSL), then the following environment variables are also used:

The original oozie.war
file is in the Oozie server installation directory.

After the Hadoop JARs and the ExtJS library have been added to the oozie.war
file, Oozie is ready to run.

Delete any previous deployment of the oozie.war file from the servlet container (if using Tomcat, delete
the oozie.war file and the oozie directory from Tomcat's webapps/ directory).

Deploy the prepared oozie.war file (the one that contains the Hadoop JARs and the ExtJS library) in the
servlet container (if using Tomcat, copy the prepared oozie.war file to Tomcat's webapps/ directory).

IMPORTANT:
Only one Oozie instance can be deployed per Tomcat instance.

Database Configuration

HSQL is normally used for test cases, as it is an in-memory database and all data is lost every time Oozie is stopped.

If using Derby, MySQL, Oracle or PostgreSQL, the Oozie database schema must be created using the ooziedb.sh
command
line tool.

If using MySQL or Oracle, the corresponding JDBC driver JAR file must be copied to Oozie's libext/ directory, and
it must be added to the Oozie WAR file using the bin/addtowar.sh or the oozie-setup.sh scripts with the -jars option.

The SQL database used by Oozie is configured using the following configuration properties (default values shown):
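As an illustration only, the core JDBC properties pointed at a MySQL server might look like the following sketch in oozie-site.xml. The host name, database name, user and password are hypothetical placeholders (not the default values), and the driver class assumes MySQL Connector/J 5.x, whose JAR must be in libext/ and in the WAR as described above.

```xml
<!-- Hypothetical MySQL settings: replace host, database and credentials -->
<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://db.example.com:3306/oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>oozie-password</value>
</property>
```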

NOTE:
If the oozie.db.schema.create property is set to true (default value is false), the Oozie tables
will be created automatically without having to use the ooziedb command line tool. Setting this property to
true is recommended only for development.

NOTE:
If the oozie.db.schema.create property is set to true, the oozie.service.JPAService.validate.db.connection
property value is ignored and Oozie handles it as set to false.
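For a development setup only, a minimal sketch of enabling automatic table creation with the property named in the notes above (the value true is chosen for illustration; production installations should leave it false and use the ooziedb.sh tool):

```xml
<!-- Development only: let Oozie create its tables at startup -->
<property>
    <name>oozie.db.schema.create</name>
    <value>true</value>
</property>
```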

Once oozie-site.xml has been configured with the database configuration, execute the ooziedb.sh command line tool to
create the database:

Oozie User Authentication Configuration

Anonymous access (*default*) does not require the user to authenticate, and the user ID is obtained from
the job properties on job submission operations; other operations are anonymous.

Pseudo/simple authentication requires the user to specify the user name on the request. This is done by
the PseudoAuthenticator class, which injects the user.name parameter in the query string of all requests.
The user.name parameter value is taken from the client process Java System property user.name.

The token.validity
indicates how long (in seconds) an authentication token is valid before it has
to be renewed.

The signature.secret is the signature secret for signing the authentication tokens. If not set, a random
secret is generated at startup time.

The oozie.authentication.cookie.domain property is the domain to use for the HTTP cookie that stores the
authentication token. For authentication to work correctly across the web consoles of all Hadoop nodes,
the domain must be set correctly.

The simple.anonymous.allowed
indicates if anonymous requests are allowed. This setting is meaningful
only when using 'simple' authentication.
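A sketch of a pseudo/simple authentication setup in oozie-site.xml follows. It assumes that the short names above are suffixes of full property names carrying the oozie.authentication. prefix (as oozie.authentication.cookie.domain does) and that oozie.authentication.type selects the mechanism; the secret, validity and domain values are placeholders.

```xml
<!-- Assumed full names: oozie.authentication.<short name> -->
<property>
    <name>oozie.authentication.type</name>
    <value>simple</value>
</property>
<property>
    <!-- Tokens must be renewed after 10 hours (36000 seconds) -->
    <name>oozie.authentication.token.validity</name>
    <value>36000</value>
</property>
<property>
    <!-- Placeholder secret; if unset, a random one is generated at startup -->
    <name>oozie.authentication.signature.secret</name>
    <value>change-me</value>
</property>
<property>
    <!-- Hypothetical domain shared by the Hadoop web consoles -->
    <name>oozie.authentication.cookie.domain</name>
    <value>.example.com</value>
</property>
<property>
    <!-- Reject anonymous requests (meaningful only with simple authentication) -->
    <name>oozie.authentication.simple.anonymous.allowed</name>
    <value>false</value>
</property>
```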

The kerberos.principal indicates the Kerberos principal to be used for the HTTP endpoint.
The principal MUST start with 'HTTP/' as per the Kerberos HTTP SPNEGO specification.

The kerberos.keytab
indicates the location of the keytab file with the credentials for the principal.
It should be the same keytab file Oozie uses for its Kerberos credentials for Hadoop.
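Under the same naming assumption, a Kerberos (SPNEGO) variant might look like this sketch; the host name, realm and keytab path are hypothetical.

```xml
<property>
    <name>oozie.authentication.type</name>
    <value>kerberos</value>
</property>
<property>
    <!-- Must start with HTTP/; host and realm are placeholders -->
    <name>oozie.authentication.kerberos.principal</name>
    <value>HTTP/oozie.example.com@EXAMPLE.COM</value>
</property>
<property>
    <!-- Same keytab Oozie uses for its Hadoop Kerberos credentials -->
    <name>oozie.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/oozie.keytab</value>
</property>
```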

Oozie Hadoop Authentication Configuration

Oozie works with Hadoop versions which support Kerberos authentication.

Oozie Hadoop authentication is configured using the following configuration properties (default values shown):
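A sketch of enabling Kerberos for Oozie's connections to Hadoop, assuming the HadoopAccessorService property names below (verify them against the oozie-default.xml of your release); the realm, principal and keytab path are hypothetical.

```xml
<property>
    <name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
    <value>true</value>
</property>
<property>
    <!-- Placeholder realm -->
    <name>local.realm</name>
    <value>EXAMPLE.COM</value>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.keytab.file</name>
    <value>/etc/security/keytabs/oozie.keytab</value>
</property>
<property>
    <name>oozie.service.HadoopAccessorService.kerberos.principal</name>
    <value>oozie/oozie.example.com@EXAMPLE.COM</value>
</property>
```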

User ProxyUser Configuration

Proxyuser enables other systems that are Oozie clients to submit jobs on behalf of other users.

Because proxyuser is a powerful capability, Oozie provides the following restriction capabilities
(similar to Hadoop):

Proxyuser is an explicit configuration on a per-proxyuser-user basis.

A proxyuser user can be restricted to impersonate other users from a set of hosts.

A proxyuser user can be restricted to impersonate users belonging to a set of groups.

There are 2 configuration properties needed to set up a proxyuser:

oozie.service.ProxyUserService.proxyuser.#USER#.hosts: hosts from where the user #USER# can impersonate other users.

oozie.service.ProxyUserService.proxyuser.#USER#.groups: groups the users being impersonated by user #USER# must belong to.

Both properties support the '*' wildcard as a value, although this is recommended only for testing/development.
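For instance, to let a hypothetical user named gateway impersonate members of a hypothetical group etl, but only for requests coming from one host, the two properties could be set as in this sketch (#USER# is replaced by the proxyuser's name; the host and group names are placeholders):

```xml
<property>
    <!-- Hosts from which the user "gateway" may impersonate others -->
    <name>oozie.service.ProxyUserService.proxyuser.gateway.hosts</name>
    <value>gateway01.example.com</value>
</property>
<property>
    <!-- Impersonated users must belong to this group -->
    <name>oozie.service.ProxyUserService.proxyuser.gateway.groups</name>
    <value>etl</value>
</property>
```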

User Authorization Configuration

Oozie has a basic authorization model:

Users have read access to all jobs

Users have write access to their own jobs

Users have write access to jobs based on an Access Control List (list of users and groups)

Users have read access to admin operations

Admin users have write access to all jobs

Admin users have write access to admin operations

If security is disabled, all users are admin users.

Oozie security is set via the following configuration property (default value shown):

oozie.service.AuthorizationService.security.enabled=false

NOTE: the old ACL model where a group was provided is still supported if the following property is set
in oozie-site.xml:

oozie.service.AuthorizationService.default.group.as.acl=true

Admin users are determined from the list of admin groups specified in the
oozie.service.AuthorizationService.admin.groups property. Use commas to separate multiple groups; spaces, tabs
and ENTER characters are trimmed.
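A sketch that turns authorization on and designates two admin groups in oozie-site.xml; the group names are hypothetical.

```xml
<property>
    <name>oozie.service.AuthorizationService.security.enabled</name>
    <value>true</value>
</property>
<property>
    <!-- Comma-separated; surrounding whitespace is trimmed -->
    <name>oozie.service.AuthorizationService.admin.groups</name>
    <value>oozie-admins, ops</value>
</property>
```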

If the above property for admin groups is not set, then the admin users are the users specified in the
conf/adminusers.txt
file. The syntax of this file is:

One user name per line

Empty lines and lines starting with '#' are ignored

Oozie System ID Configuration

Oozie has a system ID that is used to generate the Oozie temporary runtime directory, the workflow job IDs, and the
workflow action IDs.

Two Oozie systems running with the same ID will not have any conflict, but when troubleshooting it will be easier
to identify resources created/used by the different Oozie systems if they have different system IDs (default value
shown):

oozie.system.id=oozie-${user.name}
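In oozie-site.xml, giving a hypothetical staging installation its own system ID might look like this sketch:

```xml
<property>
    <!-- Hypothetical ID; the default is oozie-${user.name} -->
    <name>oozie.system.id</name>
    <value>oozie-staging</value>
</property>
```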

Filesystem Configuration

Oozie lets you configure the allowed filesystems by using the following configuration property in oozie-site.xml:
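The property in question is oozie.service.HadoopAccessorService.supported.filesystems; a sketch of it set to its default value:

```xml
<property>
    <!-- Comma-separated list of allowed schemes; * disables the check -->
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs</value>
</property>
```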

The above value, hdfs, which is the default, means that Oozie will only allow HDFS filesystems to be used. Examples of
filesystems that Oozie is compatible with are: hdfs, hftp, webhdfs, and viewfs. Multiple filesystems can be specified as
comma-separated values. Setting the value to * allows any filesystem type, effectively disabling this check.

HCatalog Configuration

Refer to the Oozie HCatalog Integration document for an overview of HCatalog and
integration of Oozie with HCatalog. This section explains the various settings to be configured in oozie-site.xml on
the Oozie server to enable Oozie to work with HCatalog.

Adding HCatalog jars to Oozie war:

For the Oozie server to talk to the HCatalog server, HCatalog and Hive jars need to be in the server classpath.
The hive-site.xml file, which has the configuration to talk to the HCatalog server, also needs to be in the classpath.

The oozie-[version]-hcataloglibs.tar.gz in the Oozie distribution bundles the required HCatalog and Hive jars that
need to be placed in the Oozie server classpath. If using a version of HCatalog bundled in
Oozie hcataloglibs/, copy the corresponding HCatalog jars from hcataloglibs/ to the libext/ directory. If using a
different version of HCatalog, copy the required HCatalog jars from that version into the libext/ directory.
This needs to be done before running the oozie-setup.sh script so that these jars get added to the Oozie WAR file.
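A sketch of the oozie-site.xml entries for HCatalog support follows. The property name oozie.service.URIHandlerService.uri.handlers and the fully qualified service class names are assumptions based on Oozie 4.x packaging, and HCatURIHandler is assumed to live in the same package as FSURIHandler; check them against your release.

```xml
<property>
    <!-- FSURIHandler is the default; HCatURIHandler adds hcat URIs -->
    <name>oozie.service.URIHandlerService.uri.handlers</name>
    <value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value>
</property>
<property>
    <!-- JMSAccessorService is needed only if HCatalog publishes JMS notifications -->
    <name>oozie.services.ext</name>
    <value>
        org.apache.oozie.service.JMSAccessorService,
        org.apache.oozie.service.PartitionDependencyManagerService,
        org.apache.oozie.service.HCatAccessorService
    </value>
</property>
```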

The above configuration defines the different URI handlers which check for the existence of data dependencies defined in a
Coordinator. The default value is org.apache.oozie.dependency.FSURIHandler. FSURIHandler supports URIs with
schemes defined in the configuration oozie.service.HadoopAccessorService.supported.filesystems,
which are hdfs, hftp and webhcat by default. HCatURIHandler supports URIs with the scheme hcat.

PartitionDependencyManagerService and HCatAccessorService are required to work with HCatalog and support Coordinators
having HCatalog URIs as data dependencies. If the HCatalog server is configured to publish partition availability
notifications to a JMS-compliant messaging provider like ActiveMQ, then JMSAccessorService needs to be added
to oozie.services.ext to handle those notifications.

Configure JMS Provider JNDI connection mapping for HCatalog:

<property>
<name>oozie.service.HCatAccessorService.jmsconnections</name>
<value>
hcat://hcatserver.colo1.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.colo1.com:61616,
default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://broker.colo.com:61616;connectionFactoryNames#ConnectionFactory
</value>
<description>
Specify the map of endpoints to JMS configuration properties. In general, endpoint
identifies the HCatalog server URL. "default" is used if no endpoint is mentioned
in the query. If some JMS property is not defined, the system will use the property
defined in jndi.properties. The jndi.properties file is retrieved from the application classpath.
Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers.
hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616
</description>
</property>

Currently HCatalog does not provide APIs to get the connection details to connect to the JMS Provider it publishes
notifications to. It only has APIs which provide the topic name in the JMS Provider to which the notifications are
published for a given database table. So the JMS Provider's connection properties need to be manually configured
in Oozie using the above setting. You can either provide a default JNDI configuration, which will be used as the
JMS Provider for all HCatalog servers, specify a configuration per HCatalog server URL, or provide a
configuration based on a rule matching multiple HCatalog server URLs. For example, with the configuration
hcat://${1}.${2}.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}.com:61616,
a request URL of hcat://server1.colo1.com:8020 will map to tcp://broker.colo1.com:61616, hcat://server2.colo2.com:8020
will map to tcp://broker.colo2.com:61616, and so on.

If there is no JMS Provider configured for an HCatalog server, then Oozie polls HCatalog based on the frequency defined
in oozie.service.coord.input.check.requeue.interval. This config also applies to HDFS polling.
If there is a JMS Provider configured for an HCatalog server, then Oozie polls HCatalog based on the frequency defined
in oozie.service.coord.push.check.requeue.interval as a fallback.
The defaults for oozie.service.coord.input.check.requeue.interval and oozie.service.coord.push.check.requeue.interval
are 1 minute and 10 minutes respectively.
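Expressed as oozie-site.xml entries, those defaults would look like this sketch, assuming the values are given in milliseconds (1 minute = 60000 ms, 10 minutes = 600000 ms):

```xml
<property>
    <!-- 1 minute: polling for HDFS, and for HCatalog without JMS -->
    <name>oozie.service.coord.input.check.requeue.interval</name>
    <value>60000</value>
</property>
<property>
    <!-- 10 minutes: fallback polling when a JMS Provider is configured -->
    <name>oozie.service.coord.push.check.requeue.interval</name>
    <value>600000</value>
</property>
```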

Notifications Configuration

Oozie supports publishing notifications to a JMS Provider for job status changes and SLA met and miss events. For
more information on the feature, refer to the JMS Notifications documentation. Oozie can also send email
notifications on SLA misses.

Message Broker Installation:

For Oozie to send/receive messages, a JMS-compliant broker should be installed. Apache ActiveMQ is a popular JMS-compliant
broker usable for this purpose. See here
for instructions on
installing and running ActiveMQ.

Services:

Add or modify the oozie.services.ext property in oozie-site.xml to include the following services.
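A sketch of that property, assuming the services involved are JMSAccessorService, JMSTopicService, EventHandlerService and SLAService with the fully qualified names used in Oozie 4.x (verify against your release). Merge these with any services already listed in the value, such as the HCatalog ones.

```xml
<property>
    <!-- Assumed class names for JMS notification support -->
    <name>oozie.services.ext</name>
    <value>
        org.apache.oozie.service.JMSAccessorService,
        org.apache.oozie.service.JMSTopicService,
        org.apache.oozie.service.EventHandlerService,
        org.apache.oozie.sla.service.SLAService
    </value>
</property>
```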

Add the oozie.jms.producer.connection.properties property in oozie-site.xml. Its value corresponds to an
identifier (e.g. default) assigned to a semicolon-separated key#value list of properties from your JMS broker's
jndi.properties file. The important properties are java.naming.factory.initial and java.naming.provider.url.

As an example, if using ActiveMQ in a local environment, the property can be set to
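A sketch assuming a broker listening on localhost:61616 and the identifier default; the key#value pairs mirror the ActiveMQ entry in the HCatalog JMS mapping earlier in this document:

```xml
<property>
    <!-- Hypothetical local ActiveMQ broker on port 61616 -->
    <name>oozie.jms.producer.connection.properties</name>
    <value>default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory</value>
</property>
```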

JMS consumers listen on a particular "topic". Hence Oozie needs to define a topic variable with which to publish messages
about the various jobs.

<property>
<name>oozie.service.JMSTopicService.topic.name</name>
<value>
default=${username}
</value>
<description>
Topic options are ${username} or a fixed string which can be specified as default or for a
particular job type.
For example, to have a fixed string topic for workflows, coordinators and bundles,
specify in the following comma-separated format: {jobtype1}={some_string1}, {jobtype2}={some_string2}
where job type can be WORKFLOW, COORDINATOR or BUNDLE.
The following example defines topics for workflow jobs, workflow actions, coordinator jobs, coordinator actions,
bundle jobs and bundle actions:
WORKFLOW=workflow,
COORDINATOR=coordinator,
BUNDLE=bundle
For jobs with no defined topic, default topic will be ${username}
</description>
</property>

Another related property is the topic prefix.

<property>
<name>oozie.service.JMSTopicService.topic.prefix</name>
<value></value>
<description>
This can be used to add a prefix to the topic in oozie.service.JMSTopicService.topic.name. For example: oozie.
</description>
</property>
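Putting the two properties together, a sketch that gives each job type a fixed topic (using the names from the topic.name description above) and sets the oozie. prefix mentioned in the topic.prefix description:

```xml
<property>
    <!-- Fixed topic per job type; untyped jobs fall back to ${username} -->
    <name>oozie.service.JMSTopicService.topic.name</name>
    <value>WORKFLOW=workflow, COORDINATOR=coordinator, BUNDLE=bundle</value>
</property>
<property>
    <name>oozie.service.JMSTopicService.topic.prefix</name>
    <value>oozie.</value>
</property>
```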

Setting Up Oozie with HTTPS (SSL)

IMPORTANT:
The default HTTPS configuration will cause all Oozie URLs to use HTTPS except for the JobTracker callback URLs. This is to simplify
configuration (no changes needed outside of Oozie), and it is okay because Oozie doesn't inherently trust the callbacks anyway;
they are used as hints.

You can use either a certificate from a Certificate Authority or a Self-Signed Certificate. Using a self-signed certificate
requires some additional configuration on each Oozie client machine.

To use a Self-Signed Certificate

There are many ways to create a Self-Signed Certificate; this is just one way. We will be using the keytool program, which is
included with your JRE. If it's not on your path, you should be able to find it in $JAVA_HOME/bin.

1. Run the following command (as the Oozie user); you will be asked a series of questions in an interactive prompt. It will create
the keystore file, which will be named .keystore
and located in the Oozie user's home directory.

keytool -genkey -alias tomcat -keyalg RSA

The password you enter for "keystore password" and "key password for <tomcat>" must be the same; Oozie is configured to use
"password" by default. If you want to use a password other than "password", you will need to change the OOZIE_HTTPS_KEYSTORE_PASS
environment variable.
The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the Oozie Server will be
running.

2. Run the following command (as the Oozie user) to export a certificate file from the keystore file:

To use a Certificate from a Certificate Authority

1. You will need to make a request to a Certificate Authority in order to obtain a proper Certificate; please consult a Certificate
Authority on this procedure.

2. Once you have your .cert file, run the following command (as the Oozie user) to create a keystore file from your certificate:

keytool -import -alias tomcat -file path/to/certificate.cert

The keystore file will be named .keystore
and located in the Oozie user's home directory.

Configure the Oozie Server to use SSL (HTTPS)

1. Make sure the Oozie server isn't running

2. Run the following command (as the Oozie user):

oozie-setup.sh prepare-war -secure

This will configure Oozie to use HTTPS instead of HTTP. To revert to HTTP, simply rerun the command without -secure.

3. Start the Oozie server

Configure the Oozie Client to connect using SSL (HTTPS)

The first two steps are only necessary if you are using a Self-Signed Certificate; the third is required either way.
Also, these steps must be done on every machine where you intend to use the Oozie Client.

1. Copy or download the .cert file onto the client machine

2. Run the following command (as root) to import the certificate into the JRE's keystore. This will allow any Java program,
including the Oozie client, to connect to the Oozie Server using your self-signed certificate.

Where ${JRE_cacerts} is the path to the JRE's certs file. Its location may differ depending on the operating system, but it's
typically called cacerts and located at ${JAVA_HOME}/lib/security/cacerts, though it may be under a different directory in ${JAVA_HOME}
(you may want to create a backup copy of this file first). The default password is changeit.

3. When using the Oozie Client, you will need to use https://oozie.server.hostname:11443/oozie instead of
http://oozie.server.hostname:11000/oozie; Java will not automatically redirect from the http address to the https address.

Connect to the Oozie Web UI using SSL (HTTPS)

1. Use https://oozie.server.hostname:11443/oozie, though most browsers should automatically redirect you if you use
http://oozie.server.hostname:11000/oozie

IMPORTANT: If using a Self-Signed Certificate, your browser will warn you that it can't verify the certificate or something
similar. You will probably have to add your certificate as an exception.

Oozie Share Lib

The Oozie sharelib TAR.GZ file bundled with the distribution contains the necessary files to run Oozie map-reduce streaming, pig,
hive, sqoop, and distcp actions. There is also a sharelib for HCatalog. The sharelib is required for these actions to work; any
other actions (mapreduce, shell, ssh, and java) do not require the sharelib to be installed.

As of Oozie 4.0, the following property is included. If true, Oozie will create and ship a "launcher jar" that contains classes
necessary for the launcher job. If false, Oozie will not do this, and it is assumed that the necessary classes are in their
respective sharelib jars or the "oozie" sharelib instead. When false, the sharelib is required for ALL actions; when true, the
sharelib is only required for actions that need additional jars (the original list from above). The main advantage of setting this
to false is that launching jobs should be slightly faster.
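A sketch of turning the launcher jar off, assuming the property is named oozie.action.ship.launcher.jar (the name is an assumption here; confirm it in the oozie-default.xml of your release). With false, the sharelib must be installed for every action type.

```xml
<property>
    <!-- Assumed name; false means launcher classes come from the sharelib -->
    <name>oozie.action.ship.launcher.jar</name>
    <value>false</value>
</property>
```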

Oozie Coordinators/Bundles Processing Timezone

By default Oozie runs coordinator and bundle jobs using the UTC timezone for datetime values specified in the application
XML and in the job parameter properties. This includes coordinator application start and end times, coordinator
dataset initial-instance, and bundle application kick-off times. In addition, coordinator dataset instance URI templates
will be resolved using datetime values of the Oozie processing timezone.

It is possible to set the Oozie processing timezone to a timezone that is an offset of UTC; alternate timezones must
be expressed using a GMT offset (GMT+/-####). For example: GMT+0530 (India timezone).

To change the default UTC timezone, use the oozie.processing.timezone property in oozie-site.xml. For example:
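Using the India timezone offset mentioned above, the oozie-site.xml entry would be:

```xml
<property>
    <!-- GMT+0530 is the India timezone offset -->
    <name>oozie.processing.timezone</name>
    <value>GMT+0530</value>
</property>
```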

MapReduce Workflow Uber Jars

For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be included via an uber jar. An uber jar is a
jar file that contains additional jar files within a "lib" folder (see
Workflow Functional Specification
for more information). Submitting a workflow with an uber jar
requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow is disabled by default. To enable this feature, use
the oozie.action.mapreduce.uber.jar.enable
property in the oozie-site.xml
(and make sure to use a supported version of Hadoop).
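A sketch of enabling the feature in oozie-site.xml; apply it only on a Hadoop version that supports uber jars (at least 2.2.0 or 1.2.0):

```xml
<property>
    <!-- Disabled by default; requires Hadoop 2.2.0+ or 1.2.0+ -->
    <name>oozie.action.mapreduce.uber.jar.enable</name>
    <value>true</value>
</property>
```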