Data Lifecycle Manager
Terminology

Data Lifecycle Manager (DLM)

A UI service enabled through DPS Platform. From the DLM UI you can create and manage
replication and disaster recovery policies and jobs.

DLM App or Service

The web UI that runs on the DPS Platform host. The corresponding DLM Engine agent
must be installed on each cluster that participates in replication.

DLM Engine

The agent required for DLM. Also referred to as the Beacon engine, this replication
engine must be installed as a management pack on each cluster to be used in data
replication jobs. The engine maintains, in a configured database, information about
clusters and policies that are involved in replication.

data center

The facility that contains the computer, server, and storage systems and associated
infrastructure, such as routers, switches, and so forth. Corporate data is stored,
managed, and distributed from the data center. In an on-premise environment, a data
center is often composed of a single HDP cluster. However, a single data center can
contain multiple HDP clusters.

IaaS cluster

A full HDP cluster on cloud VMs with Apache services running, such as HDFS, YARN,
Ambari, HiveServer2, Ranger, Atlas, and DLM Engine. Replication behavior is similar
to on-premise cluster replication.

The data is on local HDFS.

cloud data lake or data lake

An HDP cluster on the cloud, using VMs, with data retained on cloud storage. A cloud
data lake requires only a minimal set of services for metadata and governance, such
as Hive Metastore, Ranger, Atlas, and DLM Engine.

The data is in cloud storage.

cloud storage

Any storage retained in a cloud account, such as the Amazon S3 web service.

on-premise cluster

A full HDP cluster in a data center, with Apache services running, such as HDFS,
YARN, Hive Metastore, HiveServer2, Ranger, Atlas, and DLM Engine (Beacon).
Replication behavior is similar to IaaS cluster replication.

The data is on local HDFS.

policy

A set of rules applied to a replication relationship. The rules include which
clusters serve as source and destination, the type of data to replicate, the schedule
for replicating data, and so on.
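As an illustration of the rules a policy captures, the following minimal sketch models a replication policy as a plain data structure. The field names and values here are hypothetical, chosen only to mirror the definition above; they are not the actual DLM policy schema or API.

```python
# Hypothetical model of a replication policy; field names are
# illustrative, not the real DLM policy schema.
from dataclasses import dataclass

@dataclass
class ReplicationPolicy:
    name: str
    source_cluster: str       # cluster holding the data to replicate
    destination_cluster: str  # cluster the data is copied to
    dataset_type: str         # kind of data, e.g. "HDFS" or "HIVE"
    dataset: str              # HDFS path or Hive database name
    frequency_minutes: int    # schedule: how often a job is started

policy = ReplicationPolicy(
    name="hr-backup",
    source_cluster="datacenter-east",
    destination_cluster="cloud-dr",
    dataset_type="HDFS",
    dataset="/data/hr",
    frequency_minutes=60,
)
print(policy.name)  # prints hr-backup
```

Each scheduled run of such a policy corresponds to one job, as defined in the next entry.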

job

An instance of a policy that is running or has run.

source cluster

The cluster that contains the source data that will be replicated to a destination
cluster. Source data could be an HDFS dataset or a Hive database.

destination cluster

The cluster to which an HDFS dataset or Hive database is replicated.

target

The path on the destination cluster to which the HDFS dataset or Hive database is
replicated.
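To make the distinction between source cluster, destination cluster, and target concrete, here is a hedged sketch; all cluster names and paths are hypothetical examples, not values DLM produces.

```python
# Illustrative only: how the three terms relate for one HDFS
# replication. Names and paths are made up for this example.
source_cluster = "onprem-prod"        # cluster holding the original data
source_path = "/data/sales"          # HDFS dataset being replicated
destination_cluster = "cloud-dr"     # cluster receiving the copy
target = "/replicated/data/sales"    # path on the destination cluster

print(f"{source_cluster}:{source_path} -> {destination_cluster}:{target}")
# prints onprem-prod:/data/sales -> cloud-dr:/replicated/data/sales
```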