What’s new for Search in DSE 5.1

I am pleased to announce the general availability of DataStax Enterprise (DSE) 5.1 as of April 18th, 2017. We are especially excited about this release for DSE Search, which is built on top of the best distribution of Apache Cassandra™. Here is a quick tour of some of the enhancements and features in DSE Search 5.1. The 5.0 release focused on improving performance and stability and on eliminating complexity for users, and 5.1 continues those themes.

First and foremost, DSE Search 5.1 delivers an upgraded, production-certified version of Apache Solr 6™. We skipped the Solr 5.x line entirely and integrated Apache Solr 6.0.1™.

Component upgrades matter for several reasons, including new functionality. For DSE, new features were one motivation, but the bigger driver for this upgrade was the many improvements, bug fixes and optimizations it brings, and the Solr upgrade delivers those for DSE Search. Combined with our own improvements and bug fixes, DSE 5.1 shows considerable performance gains across the board, from querying to indexing.

One Solr feature that warrants highlighting is the new JSON Facet API, which covers what was formerly handled by facets together with the StatsComponent. Introduced in Apache Solr 5 and re-architected for performance, the API lets users easily run aggregation queries and build statistical-analysis-style searches in an intuitive JSON format. In DSE Search, this functionality is available through Solr's native HTTP API. Traditional facet searches are also still supported through CQL, but they keep the previous, simpler API, which is better suited to cases such as product catalogue groupings.
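For comparison, a traditional facet query issued through CQL might look like the sketch below. It assumes a search-indexed table such as the amazon.metadata table defined later in this post, and uses DSE's JSON form of the solr_query parameter. Treat the field name as illustrative and check the facet options against the DSE Search documentation for your version.

```cql
-- Hypothetical example: facet (count rows) by each value of the categories column.
-- "q" matches all documents; "facet" requests a field facet on categories.
SELECT * FROM amazon.metadata
  WHERE solr_query = '{"q":"*:*", "facet":{"field":"categories"}}';
```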

Arguably, the most exciting feature in DSE Search 5.1 is CQL-based search index management. DSE Search and Apache Solr users alike know that building a Solr core or DSE Search index involves configuration files (solrconfig.xml and schema.xml). These files define the behavior, functionality and even performance of your search capabilities.

As we work to make search functionality more native to the DSE platform, managing a CQL table’s search index is a great place to start. With DSE 5.1, not only can configuration files be inferred and automatically generated for you, but modifying your index configuration and schema is much easier through the new CQL integration.

Rather than cover every feature available, let's walk through an example: creating a search index on an existing CQL table that supports flexible boolean queries but not full-text search.

Starting with this simple CQL schema:

CREATE TABLE amazon.metadata (
    asin text PRIMARY KEY,
    also_bought set<text>,
    buy_after_viewing set<text>,
    categories set<text>,
    imurl text,
    price double,
    title text
);

We’ll begin by creating a default search index on this table.

cqlsh> CREATE SEARCH INDEX IF NOT EXISTS ON amazon.metadata;

We can validate what this command does by executing a CQL DESCRIBE on the table.
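The commands below are one reasonable way to inspect what CREATE SEARCH INDEX did, assuming the DESCRIBE extensions for search indexes in DSE 5.1. Exact output varies by version. DESCRIBE TABLE should include the index DSE attached to the table, and the other two statements print the generated Solr schema and configuration.

```cql
-- Table DDL, including the search index created on it.
DESCRIBE TABLE amazon.metadata;

-- Generated Solr schema and configuration for the active index.
DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON amazon.metadata;
DESCRIBE ACTIVE SEARCH INDEX CONFIG ON amazon.metadata;
```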

This single CREATE command did several things. It generated a Solr configuration file and a Solr schema file inferred from our CQL DDL, uploaded both files, created the Solr core, and started indexing the table's existing data. Without the CQL command, each of those steps would have been a separate manual task.

At this point, the table is configured for full-text search, and any data inserted into DSE will be indexed as well. This is a convenient way to get up and running, but the index now does more than our use case requires. Index functionality directly affects storage: the more an index can do, the more data it has to store. By reconfiguring the search index to provide only basic indexing, we can reduce storage requirements and improve indexing performance.

Consider a scenario where you want to use DSE Search for basic indexing to support boolean queries rather than full-text search. Let's set up this more targeted configuration using the new CQL syntax.

For this scenario, we drop the default index and create a new search index using one of the available index profiles. This profile shrinks the index data as much as possible, since our use case requires no text analysis, phrase searches or joins. We also use the column options to enable docValues on all indexed fields, which greatly improves sorting and even faceting performance.
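A minimal sketch of those two steps follows. It assumes the spaceSavingAll profile, one of the space-saving profiles offered in DSE 5.1. It also assumes the docValues column option, with `*` selecting all columns. Verify the profile name and column-option syntax against the CREATE SEARCH INDEX reference for your release. Whether docValues applies to every column also depends on the field types DSE generates.

```cql
-- Remove the default full-text index created earlier.
DROP SEARCH INDEX ON amazon.metadata;

-- Recreate it: index all columns with docValues enabled (faster sorting
-- and faceting) and apply a space-saving profile (assumed: spaceSavingAll).
CREATE SEARCH INDEX ON amazon.metadata
  WITH COLUMNS * { docValues : true }
  AND PROFILES spaceSavingAll;
```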

So far, so good, but we can do more. Next, let's configure this index for real-time (RT) indexing, also called live indexing, a feature introduced in DSE 4. for high-throughput, low-latency search. To enable live indexing, we first set the corresponding config option to true using a shortcut directive.
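As a sketch, assuming the realtime and autoCommitTime shortcuts accepted by ALTER SEARCH INDEX CONFIG in DSE 5.1, the change could look like the statements below. The autoCommitTime value is purely illustrative, not a recommendation. Tune it for your workload.

```cql
-- Enable live (RT) indexing via the realtime shortcut.
ALTER SEARCH INDEX CONFIG ON amazon.metadata SET realtime = true;

-- Optional, illustrative value: interval in milliseconds between commits
-- that make recent writes visible to search.
ALTER SEARCH INDEX CONFIG ON amazon.metadata SET autoCommitTime = 1000;
```

These statements only edit the pending configuration. They take effect after the index is reloaded.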

Validating our changes confirms that this CQL table now has an optimized search index that supports boolean CQL queries on any field defined in our search schema, and the entire configuration takes only a few minutes. Because the changes have not yet been applied, we must request the PENDING configuration to see them, rather than the current, ACTIVE configuration.
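The pending versions can be inspected with the PENDING variants of DESCRIBE. The sketch below assumes the DSE 5.1 syntax, which mirrors the ACTIVE form used later in this post.

```cql
-- Configuration and schema that have been edited but not yet applied.
DESCRIBE PENDING SEARCH INDEX CONFIG ON amazon.metadata;
DESCRIBE PENDING SEARCH INDEX SCHEMA ON amazon.metadata;
```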

When we're satisfied with the changes, we issue a RELOAD command to the index, which makes the pending configuration and/or schema the new ACTIVE configuration.

cqlsh> RELOAD SEARCH INDEX ON amazon.metadata;

Similarly, if the schema has changed, we need to issue a REBUILD command so that existing data is reindexed under the new schema.

cqlsh> REBUILD SEARCH INDEX ON amazon.metadata;

In our case, this step was not required, because we dropped the index earlier and recreated it with the profile options. We can now verify that the hand-built configuration is applied to the active index.

cqlsh> DESCRIBE ACTIVE SEARCH INDEX CONFIG ON amazon.metadata;

A few test queries confirm that we can query any column, but with strict (exact-match) lookups rather than full-text search.
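The queries below sketch what such lookups might look like with DSE's solr_query syntax. The category value and price range are hypothetical. The sort assumes the docValues enabled earlier.

```cql
-- Exact-value match on a set<text> column (no text analysis applied).
SELECT asin, title, price FROM amazon.metadata
  WHERE solr_query = 'categories:Books';

-- Boolean combination with a numeric range, sorted by price.
SELECT asin, title, price FROM amazon.metadata
  WHERE solr_query = '{"q":"categories:Books AND price:[10 TO 20]", "sort":"price asc"}';
```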