Using Apache Solr for Ranger Audits

Apache Solr is an open-source enterprise search platform. Apache Ranger can use Apache
Solr to store audit logs, and Solr also provides the search capability over those audit
logs that is exposed through the Ranger Admin UI.

It is recommended that Ranger audits be written to both Solr and HDFS. Audits to Solr are
primarily used to enable search queries from the Ranger Admin UI. HDFS is a long-term
destination for audits -- audits stored in HDFS can be exported to any SIEM system, or to
another audit store.
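
As an illustration of the kind of search that the Solr audit store enables, the sketch
below builds a request URL for Solr's standard /select handler. The collection name
(ranger_audits), the field names (reqUser, evtTime), and the host and port are
assumptions made for this example and are not taken from this section; check them
against your own audit collection schema before use.

```python
from urllib.parse import urlencode


def build_audit_query_url(solr_base_url, user, rows=10, collection="ranger_audits"):
    """Build a Solr /select URL for the most recent audit events of one user.

    Assumptions (not from the documentation above): the audit collection is
    named "ranger_audits" and has "reqUser" and "evtTime" fields.
    """
    params = {
        "q": f'reqUser:"{user}"',  # hypothetical field holding the requesting user
        "sort": "evtTime desc",    # hypothetical event-time field, newest first
        "rows": rows,
        "wt": "json",
    }
    return f"{solr_base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and port, for illustration only.
    print(build_audit_query_url("http://solr.example.com:8886/solr", "alice"))
```

The function only constructs the URL and does not escape special characters inside the
user name; sending the request, and any Kerberos authentication it needs, is left out.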

Because Ranger stores its audit logs in Solr and searches them from the Ranger Admin UI,
Solr must be installed and configured before installing Ranger Admin or any of
the Ranger component plugins. The default configuration for Ranger Audits to Solr uses the
shared Solr instance provided under the Ambari Infra service. Solr is both memory and CPU
intensive. If your production system has a high volume of access requests, make sure that
the Solr host has adequate memory, CPU, and disk space.

SolrCloud is the preferred setup for production usage of Ranger. SolrCloud, which is
deployed with the Ambari Infra service, is a scalable architecture that can run as a
single-node or multi-node cluster. It has additional features such as replication and
sharding, which are useful for high availability (HA) and scalability. You should plan your deployment
based on your cluster size. Because audit records can grow dramatically, plan to have at
least 1 TB of free space in the volume on which Solr will store the index data. Solr works
well with a minimum of 32 GB of RAM. You should provide as much memory as possible to the
Solr process.
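
The sizing guidance above can be turned into a simple pre-installation check. The
following is a minimal sketch, assuming a Linux host and treating 1 TB and 32 GB as
binary units; the function names and the example index path are hypothetical and are
not part of Ranger, Solr, or Ambari.

```python
import os
import shutil

MIN_FREE_DISK_BYTES = 1 * 1024**4  # 1 TB of free space for the Solr index volume
MIN_RAM_BYTES = 32 * 1024**3       # 32 GB of RAM on the Solr host


def check_sizing(free_disk_bytes, total_ram_bytes):
    """Return a list of human-readable problems; an empty list means the host passes."""
    problems = []
    if free_disk_bytes < MIN_FREE_DISK_BYTES:
        problems.append("less than 1 TB free on the Solr index volume")
    if total_ram_bytes < MIN_RAM_BYTES:
        problems.append("less than 32 GB of RAM on the Solr host")
    return problems


def free_disk_bytes(path):
    """Free bytes on the volume that contains `path`."""
    return shutil.disk_usage(path).free


def total_ram_bytes():
    """Total physical memory in bytes (Linux; relies on POSIX sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    # "/" is a placeholder; point this at the volume that will hold the index.
    for problem in check_sizing(free_disk_bytes("/"), total_ram_bytes()):
        print("WARNING:", problem)
```

The thresholds are the minimums stated above, not tuned values; a cluster with a high
audit volume may need considerably more of both.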

It is highly recommended to use SolrCloud with at least two Solr nodes running on different
servers with replication enabled. You can use the information in this section to configure
additional SolrCloud instances.
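
One way to confirm that replicas really are spread across different servers is to
inspect the output of the Solr Collections API CLUSTERSTATUS action. The sketch below
works on an already-parsed JSON response and reports shards whose active replicas all
sit on a single host. The collection name ranger_audits and the sample host names are
assumptions for illustration.

```python
def shards_without_ha(cluster_status, collection="ranger_audits"):
    """Return the names of shards that have active replicas on fewer than two hosts.

    `cluster_status` is the parsed JSON body of a Collections API
    CLUSTERSTATUS response. The default collection name is an assumption.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    weak = []
    for shard_name, shard in shards.items():
        # node_name looks like "host:port_solr"; keep only the host part so that
        # two Solr instances on the same server do not count as two servers.
        hosts = {
            replica["node_name"].split(":")[0]
            for replica in shard["replicas"].values()
            if replica.get("state") == "active"
        }
        if len(hosts) < 2:
            weak.append(shard_name)
    return sorted(weak)


if __name__ == "__main__":
    # Hand-written sample in the shape of a CLUSTERSTATUS response.
    sample = {"cluster": {"collections": {"ranger_audits": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "host1:8886_solr", "state": "active"},
            "core_node2": {"node_name": "host2:8886_solr", "state": "active"},
        }},
        "shard2": {"replicas": {
            "core_node3": {"node_name": "host1:8886_solr", "state": "active"},
        }},
    }}}}}
    print(shards_without_ha(sample))
```

Fetching the status itself is omitted here because the URL, port, and authentication
depend on the deployment.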

Configuration Options

Ambari Infra Managed Solr (default) -- By default, audits to Solr use the shared
Solr instance provided under the Ambari Infra service. There are no additional
configuration steps required for this option. SolrCloud, which is deployed with the
Ambari Infra service, is a scalable architecture that can run as a single-node or
multi-node cluster. This is the recommended configuration for Ranger. By default, a
single-node SolrCloud installation is deployed when the Ambari Infra service is chosen
for installation. Hortonworks recommends that you install multiple Ambari Infra Solr
instances in order to provide distributed indexing and search for Atlas, Ranger, and
LogSearch (Technical Preview). To do this, add Ambari Infra Solr instances to
existing cluster hosts by selecting Actions > Add Service on the Ambari dashboard.

Externally Managed SolrCloud -- You can also install and manage an external
SolrCloud that runs as a single-node or multi-node cluster. It includes features such as
replication and sharding, which are useful for high availability (HA) and scalability.
With SolrCloud, you need to plan the deployment based on the cluster size.

Externally Managed Solr Standalone -- Solr Standalone is NOT recommended for
production use, and should be used only for testing and evaluation. Solr Standalone is a
single instance of Solr that does not require ZooKeeper.

Note

Solr Standalone is NOT recommended and support for this configuration will
be deprecated in a future release.

SolrCloud for Kerberos -- This is the recommended configuration for SolrCloud in
Kerberos environments.