Collections

Fusion collections are Solr collections managed by Fusion. A Solr collection is a distributed index defined by a named configuration stored in ZooKeeper, with these properties:

Number of shards

Documents are distributed across this number of partitions.

Document routing strategy

How documents are assigned to shards.

Replication factor

How many copies of each document are stored in the collection.

Replica placement strategy

Where to place replicas in the cluster.
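To make the shard and replication properties concrete, here is a minimal sketch of how a routing strategy might assign documents to shards. It is not Solr's actual routing algorithm: the CRC32-modulo scheme and the function names shard_for and replica_count are assumptions chosen only for illustration.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Return the index of the shard that a document id maps to.

    Illustrative only: a simple hash-modulo scheme, not Solr's router.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def replica_count(num_shards: int, replication_factor: int) -> int:
    """Total number of shard replicas the cluster has to host."""
    return num_shards * replication_factor


if __name__ == "__main__":
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "-> shard", shard_for(doc_id, num_shards=4))
    print("replicas to place:", replica_count(num_shards=4, replication_factor=2))
```

The same document id always maps to the same shard, which is the property any routing strategy needs so that updates and deletes reach the right partition.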

When you first install Fusion, a collection called "default" is created automatically.
If you haven’t modified the default collection yet, you can view the simplest collection configuration through the Collections API endpoint at http://localhost:8764/api/collections/default/.
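The request can be sketched in Python as follows. The URL pattern comes from the endpoint above; the use of HTTP basic authentication, the credentials shown, and the assumption that the response body is JSON are not stated in this section and should be checked against your installation.

```python
import base64
import json
import urllib.parse
import urllib.request

# Base of the endpoint shown above; adjust host and port for your installation.
FUSION_API = "http://localhost:8764/api"


def collection_url(name: str, base: str = FUSION_API) -> str:
    """Build the Collections API URL for one collection."""
    return f"{base}/collections/{urllib.parse.quote(name, safe='')}/"


def build_request(name: str, user: str, password: str) -> urllib.request.Request:
    """Prepare a GET request. Basic authentication is an assumption here."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(collection_url(name))
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_collection(name: str, user: str, password: str) -> dict:
    """Send the request and parse the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(name, user, password)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical credentials; requires a running Fusion instance.
    print(json.dumps(fetch_collection("default", "admin", "password123"), indent=2))
```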

Solr is the underlying engine which indexes, stores, and searches your data.
Fusion manages Solr collections, manipulates data and queries before passing them to Solr, and provides analytics and monitoring features.

Primary and Auxiliary Collections

In Fusion, the "Primary" collection is the collection that contains your application data,
that is, the set of documents over which indexing and search happen.
Fusion registers the collection name and information about the
Solr cluster that manages this collection.

Note

All collection names should be considered to be case-insensitive,
even though Fusion preserves case in referring to these collections.

If your application uses Fusion’s signals, analytics, or monitoring services,
then Fusion creates a set of auxiliary collections in which to store
signals, query logs, and other log data.
Naming conventions relate auxiliary collections to the primary collection.
An auxiliary collection has the same base name as the primary collection
plus a suffix that indicates the kind of auxiliary collection. For example, the suffix
for a query logs auxiliary collection is "_logs", so for a primary collection named "COLL",
Fusion creates an auxiliary collection named "COLL_logs".
These auxiliary collections use the suffixes "_logs", "_signals", and "_signals_aggr".

Do not create primary collections with names that end in the suffix "_logs", "_signals", or "_signals_aggr".
Such names can only be used for Fusion auxiliary collections, which are created and managed by Fusion directly.
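The naming convention can be sketched as a small helper that derives the auxiliary collection names for a given primary collection. The function name auxiliary_names and its two flags are hypothetical; only the suffixes themselves come from this page.

```python
from typing import List


def auxiliary_names(primary: str, signals: bool = True, search_logs: bool = True) -> List[str]:
    """Return the auxiliary collection names Fusion would pair with `primary`.

    Hypothetical helper: it only mirrors the suffix convention described here.
    """
    names = []
    if search_logs:
        # Query logs collection.
        names.append(primary + "_logs")
    if signals:
        # Signals collection and its companion "_signals_aggr" collection.
        names.append(primary + "_signals")
        names.append(primary + "_signals_aggr")
    return names


if __name__ == "__main__":
    print(auxiliary_names("COLL"))
```

For the primary collection "COLL", the helper returns "COLL_logs", "COLL_signals", and "COLL_signals_aggr".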

Fusion maintains a set of Solr collections which store Fusion’s own
logfiles and other internal information.
These are called System Collections, described below.

Note

Do not create primary collections named "logs", or which begin with "system_".
These names are reserved for Fusion system collections.
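A client could check a proposed primary collection name against these rules before calling Fusion. The sketch below is an assumption about how such a check might look, not a Fusion API; it compares names in lower case because collection names should be treated as case-insensitive.

```python
from typing import List

# Reserved patterns taken from the notes on this page.
RESERVED_SUFFIXES = ("_logs", "_signals", "_signals_aggr")
RESERVED_NAMES = ("logs",)
RESERVED_PREFIXES = ("system_",)


def primary_name_problems(name: str) -> List[str]:
    """Return the reasons `name` is unsuitable for a primary collection."""
    lowered = name.lower()  # names are treated as case-insensitive
    problems = []
    for suffix in RESERVED_SUFFIXES:
        if lowered.endswith(suffix):
            problems.append(f"ends with reserved auxiliary suffix {suffix!r}")
    if lowered in RESERVED_NAMES:
        problems.append("is a reserved system collection name")
    for prefix in RESERVED_PREFIXES:
        if lowered.startswith(prefix):
            problems.append(f"starts with reserved system prefix {prefix!r}")
    return problems


if __name__ == "__main__":
    for candidate in ("products", "Shop_Signals", "logs", "system_backup"):
        print(candidate, "->", primary_name_problems(candidate) or "ok")
```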

Fusion uses ZooKeeper to register information about all collections,
and the Fusion components and services related to a collection.
The Fusion components associated with a collection include:

system_metrics stores information about the running process itself,
such as the amount of memory in the system, the average response time for services, Solr heap size, etc.
The data is polled at regular intervals set by the internal configuration variable com.lucidworks.apollo.metrics.poll.seconds.
This collection doesn’t appear until after the first set of metrics is collected.

Collection Configuration Properties

Collections have three configurable properties, which the Fusion UI sets to default values.
To configure them as appropriate for your application,
create the collection using the
Fusion API service Collections API.

Property

Description

signals

The signals property determines whether to create the auxiliary collections "_signals" and "_signals_aggr".
When creating a collection in the Fusion UI, this property defaults to true.
When creating a collection using Fusion’s API services, this property defaults to false.

searchLogs

The searchLogs property determines whether to create an auxiliary search query logs collection with the suffix "_logs".
When creating a collection in the Fusion UI, this property defaults to true.
When creating a collection using Fusion’s API services, this property defaults to false.

dynamicSchema

The dynamicSchema property always defaults to false.
When dynamicSchema is true,
Fusion and Solr use schemaless mode
to administer search and indexing over that collection.
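Because the defaults differ between the UI and the API, it can help to model them explicitly. The following sketch only encodes the defaults listed in the table above; the function names and the "ui" / "api" labels are hypothetical, and it does not show how the properties are actually passed to the Collections API.

```python
from typing import Dict, Optional


def default_properties(created_via: str) -> Dict[str, bool]:
    """Defaults for a new collection created through the "ui" or the "api"."""
    if created_via not in ("ui", "api"):
        raise ValueError("created_via must be 'ui' or 'api'")
    via_ui = created_via == "ui"
    return {
        "signals": via_ui,       # true in the UI, false through the API
        "searchLogs": via_ui,    # true in the UI, false through the API
        "dynamicSchema": False,  # always defaults to false
    }


def effective_properties(created_via: str,
                         overrides: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    """Apply explicit settings on top of the defaults."""
    properties = default_properties(created_via)
    for key, value in (overrides or {}).items():
        if key not in properties:
            raise KeyError(f"unknown collection property: {key}")
        properties[key] = bool(value)
    return properties


if __name__ == "__main__":
    print(effective_properties("ui"))
    print(effective_properties("api", {"signals": True}))
```

The practical consequence is that a collection created through the API gets no auxiliary collections unless signals or searchLogs is explicitly set to true.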

Signals are events with timestamps that can be used to improve search results.
For more information about signals in Fusion, see the section Signals.

Search logs data is used for Search Query Reporting.
The set of reports available includes most popular documents, queries that generated fewer than a minimum number of results, and search histograms.

The name schemaless mode is misleading: Solr always uses a schema when managing a collection.
In schemaless mode, if a document contains a field not currently in the Solr schema, Solr inspects the field value to determine an appropriate field type, and then adds a new field with that name and type to the schema.
This behavior may be convenient during preliminary application development but is rarely appropriate in a production environment, so the default is false.
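The idea behind schemaless mode can be illustrated with a toy version of field type guessing. This is not Solr's implementation: the type labels, the single date format, and the function names are assumptions made for the sketch.

```python
from datetime import datetime
from typing import Any, Dict


def guess_field_type(value: Any) -> str:
    """Guess a type label from one sample value (illustrative labels only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            return "string"
    return "string"


def add_unknown_fields(schema: Dict[str, str], doc: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `schema` extended with any fields new in `doc`."""
    updated = dict(schema)
    for name, value in doc.items():
        if name not in updated:
            updated[name] = guess_field_type(value)
    return updated


if __name__ == "__main__":
    schema = {"id": "string"}
    schema = add_unknown_fields(schema, {"id": "1", "zip": 94105, "price": 9.99})
    print(schema)
```

In this sketch the type of a new field is fixed by the first document that carries it, so a later document with a non-numeric "zip" value would no longer fit the guessed type. That kind of surprise is one reason to define the schema explicitly in production.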

Collection Profiles

Profiles are used to create pipeline aliases for a specific collection. In Fusion, index and query pipelines are not connected to a specific collection by default, so that a pipeline can be created once and reused across several collections. This flexibility complicates the way that pipelines are used with collections, and profiles provide a shortcut.
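Conceptually, a profile is a named alias that binds a pipeline to a collection. The sketch below models that relationship with a plain in-memory registry; the class ProfileRegistry and the pipeline and profile names are hypothetical and do not correspond to a Fusion API.

```python
from typing import Dict, Tuple


class ProfileRegistry:
    """Toy model: (collection, profile name) -> pipeline id."""

    def __init__(self) -> None:
        self._profiles: Dict[Tuple[str, str], str] = {}

    def set_profile(self, collection: str, profile: str, pipeline: str) -> None:
        """Alias `pipeline` under `profile` for one collection."""
        self._profiles[(collection, profile)] = pipeline

    def resolve(self, collection: str, profile: str) -> str:
        """Look up the pipeline that a profile stands for."""
        try:
            return self._profiles[(collection, profile)]
        except KeyError:
            raise LookupError(f"no profile {profile!r} for collection {collection!r}")


if __name__ == "__main__":
    registry = ProfileRegistry()
    # One pipeline reused by two collections, each through its own profile.
    registry.set_profile("products", "ingest", "shared-index-pipeline")
    registry.set_profile("articles", "ingest", "shared-index-pipeline")
    print(registry.resolve("products", "ingest"))
```

A client that knows only the collection and the profile name can then reach the right pipeline, and the pipeline behind a profile can be swapped without changing the client.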

Index Profiles work with index pipelines for getting content into the system.