DSE Search simplifies building search applications on data that is stored in a Cassandra database. It is an
enterprise-grade search solution that scales across multiple datacenters and the cloud. DSE Search integrates
Solr to manage search indexes with a persistent store.

DSE Search architecture

In a distributed environment, such as DataStax Enterprise and Cassandra, the data is spread
over multiple nodes. Deploy DSE Search nodes in their own datacenter so that every node in that
datacenter runs DSE Search.

Data is written to Cassandra first, and then Cassandra updates indexes:

When you update a table using CQL, the Solr document is updated. Indexing occurs
automatically after an update. Writes are durable. All writes to a replica node are recorded
in memory and in a commit log before they are acknowledged as a success. If a crash or server
failure occurs before the memory tables are flushed to disk, the commit log is replayed on
restart to recover any lost writes.
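The durability guarantee described above can be illustrated with a toy model. This is a minimal sketch, not DSE or Cassandra code: the class ToyReplica and all of its method names are hypothetical, and plain Python containers stand in for the commit log, the memory table, and the data flushed to disk.

```python
class ToyReplica:
    """Toy model of a replica's durable write path (hypothetical, not DSE code)."""

    def __init__(self):
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}     # in-memory table, lost on a crash
        self.flushed = {}      # data already flushed to disk

    def write(self, key, value):
        # Record the write in the commit log and in memory before acknowledging it.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        return True  # acknowledged as a success

    def flush(self):
        # Flush the memory table to disk; flushed writes no longer need the log.
        self.flushed.update(self.memtable)
        self.memtable.clear()
        self.commit_log.clear()

    def crash(self):
        # A crash loses everything that was held only in memory.
        self.memtable.clear()

    def restart(self):
        # Replay the commit log to recover writes that were never flushed.
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        return self.flushed.get(key)


replica = ToyReplica()
replica.write("row1", "flushed before the crash")
replica.flush()
replica.write("row2", "only in memory and the commit log")
replica.crash()
replica.restart()
print(replica.read("row1"), "|", replica.read("row2"))
```

In this model the second write survives the crash only because it was appended to the commit log before it was acknowledged.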

DSE Search terms

In DSE Search, there are several names for an index of documents on a single node:

A search core

A collection

One shard of a collection

How DSE Search works

Each document in a search core is unique and contains a set of fields that adhere to a
user-defined schema.

The schema lists the field types and defines how they should be indexed.

DSE Search maps search cores to Cassandra tables.

Each table has a separate search core on a particular node.

Solr documents are mapped to Cassandra rows, and document fields to columns.

A shard is indexed data for a subset of the Cassandra data on the local node.

The Cassandra keyspace is a prefix for the name of the search core and has no
counterpart in Solr.
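The mapping above can be pictured with a short sketch. The helper names search_core_name and row_to_document are hypothetical, and joining the keyspace prefix to the table name with a dot is an assumption made for illustration; the text above states only that the keyspace is a prefix.

```python
def search_core_name(keyspace, table):
    # Assumption: the keyspace prefix and the table name are joined with a dot.
    return f"{keyspace}.{table}"


def row_to_document(row):
    # One Cassandra row becomes one document; each column becomes a field.
    return {column: value for column, value in row.items()}


# Hypothetical keyspace, table, and row used only for illustration.
core = search_core_name("demo_ks", "articles")
document = row_to_document({"id": 42, "title": "Token ranges", "body": "..."})
print(core, sorted(document))
```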

The search request is routed to enough nodes to cover all token ranges.

The query is sent to all token ranges in order to get all possible results.

The search engine considers the token ranges that each node is responsible for,
taking into account the replication factor (RF), and computes the minimum number of
nodes that is required to query all ranges.
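The idea of covering every token range with as few nodes as possible can be sketched as a set-cover problem. The following is an illustration only: it uses a simple greedy heuristic, which is not necessarily the algorithm DSE Search uses and does not always find the true minimum. The function choose_nodes and the node and range names are hypothetical.

```python
def choose_nodes(ranges_by_node):
    """Greedily pick nodes until every token range is covered.

    ranges_by_node maps a node name to the set of token ranges it holds,
    that is, its own ranges plus the ranges it replicates.
    """
    uncovered = set().union(*ranges_by_node.values())
    chosen = []
    while uncovered:
        # Pick the node that covers the most ranges still uncovered;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(ranges_by_node),
                   key=lambda node: len(ranges_by_node[node] & uncovered))
        chosen.append(best)
        uncovered -= ranges_by_node[best]
    return chosen


# Four nodes, four token ranges, replication factor 2:
# each node holds its own range and a replica of its neighbor's range.
ring = {
    "n1": {"r1", "r4"},
    "n2": {"r2", "r1"},
    "n3": {"r3", "r2"},
    "n4": {"r4", "r3"},
}
print(choose_nodes(ring))  # ['n1', 'n3']
```

With a replication factor of 2 on this four-node ring, two nodes are enough to see every token range, so the query does not need to touch all four.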

On DSE Search nodes, the shard selection algorithm for distributed queries
applies a series of criteria to route sub-queries to the nodes most capable of handling them.
Shard routing is token aware, but it is not restricted to particular token ranges
unless the search query specifies a token range.

With Cassandra replication, a Cassandra node or search core contains more than one
partition (shard) of table (collection) data.

Unless the replication factor equals
the number of cluster nodes, the Cassandra node or search core contains only a portion
of the data of the table or collection.
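As a rough worked example of the statement above, the share of a table held by one node can be estimated from the replication factor and the node count. This sketch assumes tokens are evenly distributed across the nodes of a single datacenter, which real clusters only approximate; the function name is hypothetical.

```python
def approx_fraction_per_node(replication_factor, node_count):
    # Assumption: evenly distributed tokens, so each node holds about RF / N
    # of the rows, capped at 1.0 when every node holds a full copy.
    return min(1.0, replication_factor / node_count)


print(approx_fraction_per_node(3, 6))  # 0.5
print(approx_fraction_per_node(3, 3))  # 1.0
```

With a replication factor of 3 on six nodes, each node and its search core hold roughly half of the table; only when the replication factor equals the node count does each node hold all of it.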



Changing a Solr type mapper is rarely if ever done and is not recommended; however, for particular circumstances, such as
converting the Solr LongField to TrieLongField, configure the dseTypeMappingVersion using the force option.

DSE Search supports the stored=false copyField directive in the schema.xml file. Ingested data is copied by the copy field mechanism to the destination field for search, but is not stored
in Cassandra.
