DataStax Enterprise 5.0 Analytics includes integration with Apache Spark. Starting with this version Hadoop is deprecated for use with DataStax Enterprise. DSE Hadoop and BYOH (Bring Your Own Hadoop) are also
deprecated.

Use DSE Analytics to analyze huge databases. DSE Analytics includes integration with Apache Spark. BYOH (Bring
Your Own Hadoop) and DSE Hadoop are deprecated for use with DataStax Enterprise and will be removed in DataStax
Enterprise 5.1.

You can run analytics on Cassandra data using Hadoop that is integrated into DataStax Enterprise. The Hadoop component
in DataStax Enterprise enables analytics to be run across the DataStax Enterprise distributed, shared-nothing architecture.
Hadoop is deprecated for use with DataStax Enterprise. DSE Hadoop and BYOH (Bring Your Own Hadoop) are also
deprecated.

Configuring the Spark history server

Load the event logs from Spark jobs that were run with event logging enabled.

The Spark history server provides a way to load the event logs from Spark jobs that
were run with event logging enabled. It works only if the event log files were
not flushed before the Spark Master attempted to build the history user
interface.

Procedure

To enable the Spark history server:

Create a directory for event logs in the Cassandra file system (CFS):

dse hadoop fs -mkdir /spark/events

On each node in the cluster, edit the spark-defaults.conf
file to enable event logging and specify the directory for event logs:

#Turns on logging for applications submitted from this machine
spark.eventLog.dir cfs:/spark/events
spark.eventLog.enabled true
#Sets the logging directory for the history server
spark.history.fs.logDirectory cfs:/spark/events
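
Because the same settings must be present on every node, a quick check of each node's file can catch a missed edit. The following is a minimal sketch, not part of DataStax Enterprise: the function name check_event_log_conf is hypothetical, the path to spark-defaults.conf must be supplied by the caller because it depends on the installation type, and the check assumes the whitespace-separated "key value" layout shown above.

```shell
# Hypothetical helper: report which history-server properties are missing
# from the spark-defaults.conf file given as the first argument.
# Assumes whitespace-separated "key value" lines, as in the example above.
check_event_log_conf() {
    conf_file="$1"
    missing=0
    for key in spark.eventLog.enabled spark.eventLog.dir spark.history.fs.logDirectory; do
        # A property counts as set when a line's first field equals the key
        # and at least one more field (the value) follows. Commented-out
        # lines such as "#spark.eventLog.enabled true" do not match.
        if ! awk -v k="$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' "$conf_file"; then
            echo "missing: ${key}"
            missing=1
        fi
    done
    # Exit status 0 when all three properties are present, 1 otherwise.
    return "$missing"
}
```

Run it on each node as check_event_log_conf followed by the path to that node's spark-defaults.conf. It prints nothing when all three properties are set.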

Start the Spark history server on one of the nodes in the cluster.

The Spark history server is a front-end application that displays logging
data from all nodes in the Spark cluster. It can be started from any node in
the cluster:

dse spark-history-server start

Note: The Spark Master web UI does not show the historical logs. To work around
this known issue, access the history through the Spark history server web UI on
port 18080.
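
The history can also be reached from the command line. The sketch below assumes that the history server listens on port 18080 as noted above, that curl is installed, and that the Spark version bundled with your DataStax Enterprise release exposes the Spark monitoring REST API under /api/v1. The function names and the localhost default are illustrative only.

```shell
# Hypothetical helper: build the base URL of the history server.
# Host defaults to localhost and port to 18080 (assumptions; pass the
# real host name of the node where the history server was started).
history_server_url() {
    host="${1:-localhost}"
    port="${2:-18080}"
    echo "http://${host}:${port}"
}

# Hypothetical helper: fetch the list of recorded applications as JSON.
# Requires curl and a running history server at the given host and port.
list_history_apps() {
    curl -s "$(history_server_url "$@")/api/v1/applications"
}
```

To view the web UI, open the URL printed by history_server_url in a browser. For the JSON application list, call list_history_apps with the host name of the node that runs the history server.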

When event logging is enabled, all logs are saved by default, which causes
storage use to grow over time. To enable automated cleanup, edit
spark-defaults.conf and set the following options: