Caching terminology

A cache directive defines the path to cache, while a cache pool manages groups
of cache directives.

Cache directive

Defines the path to be cached. Paths can point to either directories or files.
Directories are cached non-recursively, meaning only files in the
first-level listing of the directory will be cached.
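
As a rough illustration of the non-recursive rule, the following Python sketch
picks out the files that a directive would cover from a flat list of file
paths. The function name `files_covered` and the sample paths are hypothetical;
this only models the behaviour described above and is not HDFS code.

```python
import posixpath

def files_covered(directive_path, all_files):
    """Return the files a directive on directive_path would cover.

    all_files is a list of absolute file paths (hypothetical sample data).
    A directive on a file covers that file. A directive on a directory
    covers only the files directly inside it, not files in subdirectories.
    """
    directive_path = posixpath.normpath(directive_path)
    if directive_path in all_files:
        return [directive_path]
    return sorted(f for f in all_files if posixpath.dirname(f) == directive_path)

files = ["/data/a.csv", "/data/b.csv", "/data/2024/c.csv"]
print(files_covered("/data", files))        # ['/data/a.csv', '/data/b.csv']
print(files_covered("/data/a.csv", files))  # ['/data/a.csv']
```

Note that `/data/2024/c.csv` is not covered by a directive on `/data`; caching
it would take a separate directive on `/data/2024` or on the file itself.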

Cache directives also specify additional parameters, such as the cache
replication factor and expiration time. The replication factor specifies the
number of block replicas to cache. If multiple cache directives refer to the
same file, the maximum cache replication factor is applied.
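
The "maximum wins" rule can be sketched in a few lines of Python. The
directives are modelled here as hypothetical `(path, replication)` tuples
matched by exact file path, which is a simplification for illustration and not
the NameNode's actual data structure.

```python
def effective_cache_replication(directives, file_path):
    """Return the cache replication factor applied to file_path.

    directives is a list of (path, replication) tuples. When several
    directives name the same file, the largest factor is used. Returns 0
    when no directive names the file (an assumption of this sketch).
    """
    factors = [replication for path, replication in directives if path == file_path]
    return max(factors, default=0)

directives = [("/data/a.csv", 1), ("/data/a.csv", 3), ("/data/b.csv", 2)]
print(effective_cache_replication(directives, "/data/a.csv"))  # 3
```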

The expiration time is specified on the command line as a time-to-live (TTL),
which represents a relative expiration time in the future. After a cache
directive expires, it is no longer taken into consideration by the NameNode
when making caching decisions.
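
To make the relationship between a relative TTL and an absolute expiration
time concrete, here is a minimal Python sketch. The helper names `expiry_time`
and `is_considered` are hypothetical, and the sketch assumes the TTL is counted
from the moment the directive is added.

```python
from datetime import datetime, timedelta, timezone

def expiry_time(added_at, ttl):
    """Convert a relative TTL into an absolute expiration time."""
    return added_at + ttl

def is_considered(expiry, now):
    """A directive is taken into account only until it expires."""
    return now < expiry

added = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = expiry_time(added, timedelta(days=7))
print(expiry.isoformat())                                # 2024-01-08T00:00:00+00:00
print(is_considered(expiry, added + timedelta(days=8)))  # False
```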

Cache pool

An administrative entity that manages groups of cache directives. Cache pools
have UNIX-like permissions that restrict which users and groups have access
to the pool. Write permissions allow users to add and remove cache
directives to the pool. Read permissions allow users to list the cache
directives in a pool, as well as additional metadata. Execute permissions
are unused.
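
The mapping from permission bits to pool operations can be sketched as
follows. This Python example assumes the classic UNIX owner/group/other model
with an octal mode; the function name, the user and group names, and the
omission of any superuser handling are all assumptions of the sketch rather
than a description of how HDFS evaluates permissions.

```python
def pool_permissions(mode, user, user_groups, owner, group):
    """Return which pool operations `user` may perform.

    mode is an octal UNIX-style mode such as 0o750. The owner, group, or
    other permission class is chosen the usual UNIX way. The execute bit
    is ignored because it is unused for cache pools.
    """
    if user == owner:
        bits = (mode >> 6) & 0o7
    elif group in user_groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return {
        "list_directives": bool(bits & 0o4),    # read
        "modify_directives": bool(bits & 0o2),  # write: add and remove
    }

# Hypothetical pool owned by alice, group analysts, mode 750.
print(pool_permissions(0o750, "bob", {"analysts"}, "alice", "analysts"))
# {'list_directives': True, 'modify_directives': False}
```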

Cache pools are also used for resource management. Cache pools can enforce a
maximum memory limit, which restricts the aggregate number of bytes that can
be cached by directives in the pool. Normally, the sum of the pool limits
will approximately equal the amount of aggregate memory reserved for HDFS
caching on the cluster. Cache pools also track a number of statistics that
help cluster users see what is currently cached and determine what else
should be cached.
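
A pool-level byte limit might be modelled along these lines. The `CachePool`
class below is a hypothetical Python sketch, not the HDFS implementation: it
assumes a directive that would push the pool past its limit is rejected, and
it tracks a single statistic (bytes requested) for illustration.

```python
class CachePool:
    """Toy model of a cache pool with an aggregate byte limit."""

    def __init__(self, name, limit_bytes):
        self.name = name
        self.limit_bytes = limit_bytes
        self.directives = {}  # path -> bytes requested by that directive

    @property
    def bytes_needed(self):
        """Aggregate bytes requested by all directives in the pool."""
        return sum(self.directives.values())

    def add_directive(self, path, size_bytes):
        """Add or replace a directive; refuse it if the limit would be exceeded."""
        current = self.bytes_needed - self.directives.get(path, 0)
        if current + size_bytes > self.limit_bytes:
            return False
        self.directives[path] = size_bytes
        return True

pool = CachePool("reports", limit_bytes=100)
print(pool.add_directive("/data/a.csv", 60))  # True
print(pool.add_directive("/data/b.csv", 50))  # False, 110 bytes would exceed 100
print(pool.bytes_needed)                      # 60
```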

Cache pools can also enforce a maximum time-to-live. This caps the expiration
time of directives being added to the pool.
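
The check implied by a pool-level maximum TTL can be sketched in Python as
below. The function name `validate_ttl` is hypothetical, and the sketch assumes
that an over-long TTL is rejected rather than silently shortened.

```python
from datetime import timedelta

def validate_ttl(requested_ttl, pool_max_ttl):
    """Return requested_ttl if the pool allows it, otherwise raise ValueError.

    pool_max_ttl of None means the pool sets no maximum (an assumption
    of this sketch).
    """
    if pool_max_ttl is not None and requested_ttl > pool_max_ttl:
        raise ValueError(
            f"TTL {requested_ttl} exceeds the pool maximum of {pool_max_ttl}"
        )
    return requested_ttl

print(validate_ttl(timedelta(days=3), timedelta(days=7)))  # 3 days, 0:00:00
```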