Specify whether the source data is local on HDFS or on
Microsoft Azure or Amazon S3 in the cloud. If your target is
Azure or S3, you can only use HDFS for the source.

Source Cluster

Select an existing cluster entity.

Source Path

Enter the path to the source data.

Target
Location

Specify whether the mirror target is local on HDFS or on
Microsoft Azure or Amazon S3 in the cloud. If your target is
Azure or S3, you can only use HDFS for the source.

Target Cluster

Select an existing cluster entity to serve as target for the
mirrored data.

Target Path

Enter the path to the directory that will contain the
mirrored data.

Run job here

Choose whether to execute the job on the source or on the
target cluster.

Validity Start and End

Combined with the frequency value to determine the window of
time in which a Falcon mirror job can execute. The workflow job
starts executing after the schedule time and when all the inputs
are available. The workflow ends before the
specified end time, so there is not a workflow instance at end
time. Also known as run duration.

Frequency

How often the mirror job runs. Valid frequency types are
minutes, hours, days, and months.
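The interaction between the validity window and the frequency can be sketched as a short calculation. This is an illustrative model, not Falcon code; the function name and the dates are invented for the example:

```python
from datetime import datetime, timedelta

def instance_times(start, end, frequency):
    """Scheduled instance times for a job whose validity window is
    [start, end) and whose frequency is a timedelta.  The window is
    half-open: no instance is materialized at the end time itself."""
    t = start
    times = []
    while t < end:
        times.append(t)
        t += frequency
    return times

# A 2-hour validity window with a 30-minute frequency yields four
# instances (00:00, 00:30, 01:00, 01:30) and none at the 02:00 end time.
times = instance_times(datetime(2025, 1, 1, 0, 0),
                       datetime(2025, 1, 1, 2, 0),
                       timedelta(minutes=30))
```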

Timezone

The timezone associated with the validity start and end
times. The default timezone is UTC.

Send alerts to

A comma-separated list of email addresses to which alerts are
sent, in the format
name@company.com.

Max Maps

The maximum number of maps used during replication. This
setting impacts performance and throttling. Default is
5.

Max Bandwidth (MB)

The bandwidth in MB/s used by each mapper during replication.
This setting impacts performance and throttling. Default is 100
MB.
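Max Maps and Max Bandwidth throttle the job jointly: each mapper is capped individually, so the aggregate ceiling on replication throughput is their product. A small illustrative calculation (the function is hypothetical, not part of any Falcon API):

```python
def aggregate_bandwidth(max_maps, per_map_mb_s):
    """Upper bound on replication throughput when each of the
    max_maps mappers is throttled to per_map_mb_s MB/s."""
    return max_maps * per_map_mb_s

# With the defaults (5 maps at 100 MB/s each), replication can use
# at most 500 MB/s across the cluster.
ceiling = aggregate_bandwidth(5, 100)
```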

Retry Policy

Defines how workflow failures should be handled. Options
are Periodic, Exponential Backoff, and Final.

Delay

The time period after which a retry attempt is made. For
example, an Attempt value of 3 and Delay value of 10 minutes
would cause the workflow retry to occur after 10 minutes, 20
minutes, and 30 minutes after the start time of the
workflow. Default is 30 minutes.

Attempts

How many times the retry policy should be implemented
before the job fails. Default is 3.
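The periodic retry schedule described above can be sketched as a short calculation. This is illustrative only; the function name and start time are invented:

```python
from datetime import datetime, timedelta

def periodic_retry_times(start, delay, attempts):
    """Times at which a Periodic retry policy re-runs a failed
    workflow: delay, 2*delay, ... after the workflow start."""
    return [start + delay * i for i in range(1, attempts + 1)]

# Attempts=3 and Delay=10 minutes retries at 10, 20, and 30
# minutes after a 12:00 workflow start.
retries = periodic_retry_times(datetime(2025, 1, 1, 12, 0),
                               timedelta(minutes=10), 3)
```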

Access Control
List

Specify the HDFS owner, group, and access permissions for the
cluster. Default permissions are 755 (rwx/r-x/r-x).

Click Next to view a summary of your entity
definition.

(Optional) Click Preview XML to review or edit the entity
definition in XML.
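The XML shown by Preview XML collects the fields above into a Falcon entity definition. A hypothetical fragment, with invented names, dates, and values (the XML Falcon actually generates for a mirror job may differ), might resemble:

```xml
<!-- Illustrative only: the name, dates, owner, and group are invented. -->
<process name="hdfs-mirror-example" xmlns="uri:falcon:process:0.1">
  <frequency>days(1)</frequency>
  <timezone>UTC</timezone>
  <validity start="2025-01-01T00:00Z" end="2025-12-31T00:00Z"/>
  <retry policy="periodic" delay="minutes(30)" attempts="3"/>
  <ACL owner="falcon" group="users" permission="0755"/>
</process>
```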

After verifying the entity definition, click Save.

The entity is automatically submitted for verification, but it is not
scheduled to run.

Verify that you successfully created the entity.

Type the entity name in the Falcon web UI
Search field and press
Enter.

If the entity name appears in the search results, it was
successfully created.

Select existing cluster entities, one to serve as source for
the mirrored data and one to serve as target for the mirrored
data.

Cluster entities must be available in Falcon before a mirror
job can be created.

HiveServer2 Endpoint, Source &
Target

Enter the location of data to be mirrored on the source and
the location of the mirrored data on the target.

The format is
hive2://localhost:10000.

Hive2 Kerberos Principal, Source &
Target

This field is automatically populated with the value of the
service principal for the metastore Thrift server.

The value is displayed in Ambari at Hive > Config >
Advanced > Advanced hive-site >
hive.metastore.kerberos.principal and must be
unique.

Meta Store URI, Source &
Target

Used by the metastore client to connect to the remote
metastore.

The value is displayed in Ambari at Hive > Config >
Advanced > General > hive.metastore.uris.

Kerberos Principal, Source &
Target

This field is automatically populated.

The property is dfs.namenode.kerberos.principal, and the
value has the form nn/_HOST@EXAMPLE.COM. The value must be
unique.

Run job here

Choose whether to execute the job on the source cluster or on
the target cluster.


I want to copy

Select to copy one or more Hive databases or copy one or more
tables from a single database. You must identify the specific
databases and tables to be copied.


Validity Start and End

Combined with the frequency value to determine the window of
time in which a Falcon mirror job can execute.

The workflow job starts executing after the schedule time and
when all the inputs are available. The workflow ends
before the specified end time, so there
is not a workflow instance at end time. Also known as
run duration.

Delay

The time period after which a retry attempt is made. For
example, an Attempt value of 3 and Delay value of 10
minutes would cause the workflow retry to occur after 10
minutes, 20 minutes, and 30 minutes after the start time of the
workflow. Default is 30 minutes.

Attempts

How many times the retry policy should be implemented
before the job fails.

Default is 3.

Access Control
List

Specify the HDFS owner, group, and access permissions for the
cluster.

Default permissions are 755 (rwx/r-x/r-x).

Click Next to view a summary of your entity
definition.

(Optional) Click Preview XML to review or edit the entity
definition in XML.

After verifying the entity definition, click Save.

The entity is automatically submitted for verification, but it is not
scheduled to run.

Verify that you successfully created the entity.

Type the entity name in the Falcon web UI
Search field and press
Enter.

If the entity name appears in the search results, it was
successfully created.

Snapshot-based mirroring is an efficient data backup method because only updated
content is actually transferred during the mirror job. You can mirror snapshots from a
single source directory to a single target directory. The destination directory is the
target for the backup job.

Prerequisites

Source and target clusters must run Hadoop 2.7.0 or higher.

Falcon does not validate versions.

Source and target clusters should either both be secure or both be unsecured.

This is a recommendation, not a requirement.

Source and target clusters must have snapshot capability enabled (the default
is "enabled").

The user submitting the mirror job must have access permissions on both the
source and target clusters.
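The snapshot-capability prerequisite can be checked and enabled from the command line with standard HDFS tooling; the directory path here is a placeholder:

```shell
# Mark a directory as snapshottable (run as the HDFS superuser).
hdfs dfsadmin -allowSnapshot /apps/falcon/mirror-source

# List the directories on which snapshots are currently allowed.
hdfs lsSnapshottableDir
```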

To mirror snapshot data with the Falcon web
UI:

Ensure that you have set permissions correctly, enabled snapshot mirroring,
and defined required entities as described in Preparing to Mirror Data.

At the top of the Falcon web UI page, click Create >
Mirror > Snapshot.

On the New Snapshot Based Mirror page, specify the values for the following
properties:

Source, Cluster

Select an existing source cluster entity. At least one
cluster entity must be available in Falcon.

Target,
Cluster

Select an existing target cluster entity. At least one
cluster entity must be available in Falcon.

Source, Directory

Enter the path to the source data.

Source, Delete Snapshot
After

Specify the time period after which the mirrored snapshots
are deleted from the source cluster. Snapshots are retained past
this date if the number of snapshots is less than the Keep Last
setting.

Source, Keep Last

Specify the number of snapshots to retain on the source
cluster, even if the delete time has been reached. Upon reaching
the number specified, the oldest snapshot is deleted when the
next job is run.

Target, Directory

Enter the path to the location on the target cluster in which
the snapshot is stored.

Target, Delete Snapshot
After

Specify the time period after which the mirrored snapshots
are deleted from the target cluster. Snapshots are retained
past this date if the number of snapshots is less than the
Keep Last setting.

Target, Keep Last

Specify the number of snapshots to retain on the target
cluster, even if the delete time has been reached. Upon
reaching the number specified, the oldest snapshot is
deleted when the next job is run.
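Delete Snapshot After and Keep Last interact: age-based deletion never reduces the snapshot count below the Keep Last value, and the oldest snapshots go first. One way to model the rule (illustrative Python, not Falcon code):

```python
from datetime import datetime, timedelta

def snapshots_to_delete(snapshots, now, delete_after, keep_last):
    """Decide which mirror snapshots are eligible for deletion.

    A snapshot older than delete_after is deleted only while more
    than keep_last snapshots remain; the oldest are deleted first.
    snapshots is a list of creation times."""
    ordered = sorted(snapshots)
    doomed = []
    for t in ordered:
        if len(ordered) - len(doomed) <= keep_last:
            break  # retain at least keep_last snapshots
        if now - t > delete_after:
            doomed.append(t)
    return doomed

# Four snapshots, Keep Last = 2, Delete Snapshot After = 7 days:
# only the two oldest are both past the age limit and above the
# retained-count floor, so only they are deleted.
doomed = snapshots_to_delete(
    [datetime(2025, 1, d) for d in (1, 2, 3, 20)],
    datetime(2025, 1, 21), timedelta(days=7), 2)
```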

Run job here

Choose whether to execute the job on the source or on the
target cluster.

Run Duration Start and End

Combined with the frequency value to determine the window of
time in which a Falcon mirror job can execute. The workflow job
starts executing after the schedule time and when all the inputs
are available. The workflow ends before the
specified end time, so there is not a workflow instance at end
time. Also known as validity time.

Frequency

How often the mirror job runs. Valid frequency types are
minutes, hours, days, and months.

TDE Encryption

Enable to encrypt data at rest. See "Enabling Transparent
Data Encryption" in Using Advanced Features for more
information.

Retry Policy

Defines how workflow failures should be handled. Options
are Periodic, Exponential Backoff, and Final.

Delay

The time period after which a retry attempt is made. For
example, an Attempt value of 3 and Delay value of 10 minutes
would cause the workflow retry to occur after 10 minutes, 20
minutes, and 30 minutes after the start time of the workflow.
Default is 30 minutes.

Attempts

How many times the retry policy should be implemented
before the job fails. Default is 3.

Max Maps

The maximum number of maps used during DistCp replication.
This setting impacts performance and throttling. Default is
5.

Max Bandwidth (MB)

The bandwidth in MB/s used by each mapper during replication.
This setting impacts performance and throttling. Default is 100
MB.

Send alerts to

A comma-separated list of email addresses to which alerts are
sent, in the format name@xyz.com.

Access Control
List

Specify the HDFS owner, group, and access permissions for the
cluster. Default permissions are 755 (rwx/r-x/r-x).

Click Next to view a summary of your entity
definition.

(Optional) Click Preview XML to review or edit the entity
definition in XML.

After verifying the entity definition, click Save.

The entity is automatically submitted for verification, but it is not
scheduled to run.

Verify that you successfully created the entity.

Type the entity name in the Falcon web UI
Search field and press
Enter.

If the entity name appears in the search results, it was
successfully created.

In HDP 2.6, Falcon client-side recipes were deprecated and replaced with more
extensible server-side extensions. Existing Falcon workflows that use client-side
recipes are still supported, but any new mirror job must use the server-side extensions.