DSE Graph and Graph Analytics

DSE Graph allows you to perform OLAP queries using Spark.

Many local graph traversals can be executed in real time at high transactional loads. When the
density of the graph is too high or the branching factor (the number of connected nodes
at each level of the graph) is too large, the memory and computation requirements to answer
OLTP queries go beyond what is acceptable under typical application workloads. Queries with
these characteristics are called deep queries.
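
As a rough illustration of a deep query, the following Gremlin console sketch counts the distinct vertices reachable within four hops of a single starting vertex. The person label, the name property, the knows edge label, and the value 'Alice' are hypothetical and do not come from any schema described here. g is assumed to be a traversal source for a graph that has them.

```groovy
// Hypothetical schema: 'person' vertices with a 'name' property, linked by 'knows' edges.
// Each pass through repeat() expands every vertex reached so far, so the work
// grows roughly as (branching factor)^(number of hops).
gremlin> g.V().has('person', 'name', 'Alice').repeat(out('knows')).times(4).dedup().count()
```

With a branching factor of 100, four hops can produce on the order of 100^4 = 100,000,000 traversers before deduplication, which is why a query like this can exceed what an OLTP workload tolerates.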

Scan queries are queries that touch either an entire graph or large parts of the
graph. They typically traverse a large number of vertices and edges. For example, a query on a
social network graph that searches for posts by users between 25 and 40 years old is a scan
query.
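
The social network example above might look like the following in the Gremlin console. This is only a sketch: the person and post labels, the age property, and the posted edge label are hypothetical names chosen for illustration, and g is assumed to be a traversal source for a graph with that schema.

```groovy
// Hypothetical schema: 'person' vertices with an 'age' property, connected to
// 'post' vertices by 'posted' edges.
// between() includes its lower bound and excludes its upper bound, so 41 is
// used here to keep 40-year-olds in the result.
gremlin> g.V().has('person', 'age', between(25, 41)).out('posted').hasLabel('post').count()
```

Because the age filter has to be evaluated against every person vertex, the traversal touches a large part of the graph no matter how many posts it finally returns.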

For applications that use deep and scan queries, using an OLAP query typically results in
better performance.

Performing OLAP queries using DSE Graph

Every graph created in DSE Graph has an OLAP traversal source, a, that is
available to gremlin-console and DataStax Studio. This traversal source
uses the SparkGraphComputer to analyze queries and execute them against
the underlying DSE Analytics nodes. The nodes must be started with Graph and Spark enabled
to access the OLAP traversal source. For one-off or single-session OLAP queries, alias
database.a to g (where database is the name of the graph) and create the
query. For example, in the Gremlin console:

gremlin> :remote config alias g database.a
gremlin> g.V().count()

If you are performing multiple queries against different parts of the graph, use
graph.snapshot() to return an OLAP traversal source for each part of the
graph. The returned OLAP traversal source is a persisted RDD. For example, in the Gremlin
console:

gremlin> categories = graph.snapshot().vertices('category').create()
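
Once created, the returned traversal source can be queried like any other. The following sketch reuses the categories snapshot from the example above. The name property key is a hypothetical addition for illustration and is not confirmed by this page.

```groovy
// 'categories' is the OLAP traversal source returned by graph.snapshot().
gremlin> categories = graph.snapshot().vertices('category').create()
// Count the vertices held in the snapshot.
gremlin> categories.V().count()
// Hypothetical property key 'name': tally the snapshot's vertices per name value.
gremlin> categories.V().groupCount().by('name')
```

Both traversals run against the same snapshot, so the category vertices do not have to be reloaded for each query.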

When to use analytic OLAP queries

On large graphs, OLAP queries typically perform better for deep queries. However, executing
deep queries as part of an OLTP load may make sense if they are rarely performed. For
example, an online payment provider will favor OLTP queries to process payments quickly, but
may require a deep query if there are indications of fraud in the transaction. While the
deep query may take much longer as an OLTP workload, the application as a whole will
perform better than it would if it were segmented into OLTP and OLAP queries.

Long-running and periodic processes like recommendation engines and search engines that
analyze an entire graph are the ideal use cases for OLAP queries. However, one-off data
analysis operations that involve deep queries or that scan the entire database can also
benefit from being run as OLAP queries. See DSE Graph, OLTP and OLAP for detailed
information on performance differences between OLTP and OLAP queries.
