A name is optional. It maps an encoded id to a readable label, which is
shown in the Admin tool Indices tab and, as of version 2.9, returned in the
indexValues query response when requested. This lets you index a code and
later translate that code to a human-readable value.

Note that parent elements don’t have to exist in your dataset; they
can be used purely to create structure. In the above example, let’s
assume that Asian exists solely as a grouping element, in other words
the Asian value doesn’t exist in the dataset but we want to use it to
create structure. If someone performs a search for Vietnamese,
Vietnamese users are returned first, followed by Japanese and Chinese
(in no particular order) and then all other ethnicities. The id of
elements that exist solely as parents is irrelevant as long as it
doesn’t conflict with any other element’s id.

As of version 2.9, tree dimension searches can benefit from the engine’s highlighting feature. You
can request that the engine highlight all matches against your tree dimension text and
present those matches in a clear fashion. For more information on how to specify
highlighting in the query request, refer to Overview and
Highlighting Criterion.

If you have a dimension whose elements have little or no discernible
structure, you can use the tree type as well. For example:

A dimension whose values are mutually exclusive of one another.
For example, gender could be considered mutually exclusive (a search
for “Male” would explicitly exclude items associated with “Female”).

String values can be represented by keyword type dimensions. Keyword dimensions are
best used for standardized single word values or tags. When they are indexed, the whole
string is used and no further textual analysis is done.

<dimension id="us_state" type="keyword"/>

Unless faceting is explicitly disabled, each distinct value in a keyword type
dimension is available for faceting. There is no limit to the number of facet
values unless maxBuckets is set and exceeded.

For more information on faceting with keyword type dimensions refer to Faceting.

maxBuckets = "0"

This attribute limits the number of facet buckets that can be created for the dimension. If there are more
buckets than that number, all facet counts will be removed. To disable
facet counts entirely, set maxBuckets to 0.
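
As a sketch of the two settings described above, the following fragments cap or disable facet counts on a keyword dimension. The sku id and the value 100 are illustrative and not taken from a real schema.

```xml
<!-- Illustrative: all facet counts are removed if more than 100 buckets exist -->
<dimension id="us_state" type="keyword" maxBuckets="100"/>

<!-- Illustrative: facet counts are disabled for this dimension -->
<dimension id="sku" type="keyword" maxBuckets="0"/>
```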

The following attribute will impact how the values in your time dimension
are indexed.

format= "unix" | "<format specification>"

Time dimension values are converted from a date/time format specification.
To specify the format of the data, use the format attribute.
If your date format does not include a time zone, then you are strongly encouraged to explicitly
specify a timeZone on your dimension element.
The only limitation to the type of format to use is that the format should not include a space.
This restriction may be lifted in a later release.

Specify “unix” if your time values are in Unix date/time format, otherwise compose
a date/time format string by referencing the following table.

To include a string in the format, enclose the string with single quotes (').

Default: Default US local formatting.

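======  ======================  =================  =====================================
Letter  Date or Time Component  Presentation       Examples
======  ======================  =================  =====================================
G       Era designator          Text               AD
y       Year                    Year               2010; 78
M       Month in year           Month              July; Jul; 07
d       Day in month            Number             10
a       Am/pm marker            Text               PM
H       Hour in day (0-23)      Number             0
k       Hour in day (1-24)      Number             24
K       Hour in am/pm (0-11)    Number             0
h       Hour in am/pm (1-12)    Number             12
m       Minute in hour          Number             30
s       Second in minute        Number             55
S       Millisecond             Number             978
z       Time zone               General time zone  Eastern Standard Time; EST; GMT-05:00
Z       Time zone               RFC 822 time zone  -0800
======  ======================  =================  =====================================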

timeZone="<timezone specification>"

In general, we recommend always specifying a timeZone.

If your format is not unix and does not include a time zone (“Z” or “z”), then you should provide a timeZone
to use when indexing or querying dates and times. Note that when you index a date such as “2013-01-01”, the time of 00:00 is implicit.
Because of this, timeZone is relevant even if you are only indexing dates.

The time zone UTC is used if you do not specify a timeZone.

Some example time zones are UTC, GMT+1, America/New_York, US/Eastern, US/Pacific.
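
Putting format and timeZone together, a time dimension might be declared as in the sketch below. The dimension ids are hypothetical, and type="time" is an assumption about the type name used for time dimensions. The patterns avoid spaces by quoting a literal T between the date and time parts.

```xml
<!-- Hypothetical ids; type="time" is an assumed type name. -->

<!-- Values such as 2013-01-01T14:30:55 carry no zone, so timeZone is set explicitly. -->
<dimension id="created" type="time" format="yyyy-MM-dd'T'HH:mm:ss" timeZone="America/New_York"/>

<!-- Values such as 2013-01-01T14:30:55-0800 carry an RFC 822 zone (Z) in the data. -->
<dimension id="updated" type="time" format="yyyy-MM-dd'T'HH:mm:ssZ"/>

<!-- Unix date/time values need no pattern letters. -->
<dimension id="last_login" type="time" format="unix"/>
```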

In the above example, changesets would contain, for each item,
properties named “user.longitude”, “user.latitude”, and
“user.zipcode”. The property names default to “latitude”,
“longitude”, and “zipcode” if left unspecified. The longitude and
latitude values in the changeset must be specified in degrees.

The engine allows for two kinds of text searches: keyword and
text. A text dimension can be searched
using full free-text searching capabilities. A keyword dimension, on the
other hand, can only be used for case-sensitive exact string matches, but
avoids the overhead incurred in a full-text search.

For example, a changeset property which has a known set of constant values such
as US state abbreviations (‘NY’, ‘MA’, ‘CA’, ‘AK’, etc.) would fit nicely
into a keyword dimension, whereas a paragraph containing text written
by a user about herself would fit into a text dimension.

Text dimension searches can benefit from the engine’s highlighting feature. You
can request that the engine highlight all matches against your search text and
present those matches in a clear fashion. For more information on how to specify
highlighting in the query request, refer to Overview and
Highlighting Criterion.

In addition to single constant values, keyword dimensions can also
be applied to data with multiple values. In this case, an optional delimiters
attribute determines how the underlying data is tokenized. In the
following example, the data from the changeset is tokenized by splitting the
text on ‘,’.

<dimension id="tags" type="keyword" delimiters=","/>

For more information about text dimensions in general, including features related
to internationalization, refer to About Text Dimensions.

The following attributes will impact how the text in your dimension is indexed.

noAnalysis-ref= "<id of word set>"

If certain words should be excluded from the analysis process, then a customer
can create a list of words to be excluded. For more information on
creating a word set of excluded words, refer to Defining Word Sets.
For more information on the text analysis process, refer to Analysis.

New in version 2.8.6.

stemming= "true" | "false"

Stemming is enabled by default for all Text dimensions. Stemming is the process of
finding the root of all familiar words such that full text searches return more
matches. For example, “dog”, “dogs”, “doggy” might all be converted to “dog” when
stemming is enabled. For more exact matching, set stemming to “false”. For more information,
refer to Stemming.

stemmingExclusion-ref= "<id of word set>"

If stemming is enabled, then a customer can define a list of words that should
be excluded from stemming. For more information on
creating a word set for stemming exclusion, refer to Defining Word Sets.

New in version 2.8.6.
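
The stemming attributes described above might be combined as in this sketch. The dimension ids and the word set id brand_names are hypothetical, and type="text" is an assumption about the type name used for text dimensions.

```xml
<!-- Hypothetical ids; type="text" is an assumed type name. -->

<!-- Stemming stays on (the default), except for words in the word set "brand_names". -->
<dimension id="description" type="text" stemmingExclusion-ref="brand_names"/>

<!-- Stemming is turned off for more exact matching. -->
<dimension id="model_name" type="text" stemming="false"/>
```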

accentFolding = "true" | "false"

Accent folding is enabled by default for all text dimensions and can also
be set for keyword dimensions. Accent folding only applies to certain
types of Unicode encoding systems with specially accented characters. Accent
folding can help improve matches.

For example, when accent folding is enabled, a property with value “Montréal” can be
found using a criteria value of “Montreal”. Conversely, a criteria value of “Töt”
could find a property value of “Tot”.

For most English language users, accent folding may not be required. For more exact
matching, set accentFolding to false.
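
For illustration, accent folding could be switched on for a keyword dimension and off for a text dimension as follows. The ids are hypothetical, and type="text" is an assumed type name.

```xml
<!-- Hypothetical ids. -->

<!-- Keyword dimension with folding set, so "Montreal" can match "Montréal". -->
<dimension id="city" type="keyword" accentFolding="true"/>

<!-- Text dimension with folding disabled for more exact matching. -->
<dimension id="title" type="text" accentFolding="false"/>
```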

Stop words are commonly occurring words such as pronouns, articles and conjunctions
that may serve to muddy or bloat a full-text search index. When indexing large blocks
of text, including stop words may help improve text matching, particularly for phrase
matching.

To not use any stop words, specify stopWords="".

For more information on stop words, refer to Stop Words and for more information on
creating a word set for stop words, refer to Defining Word Sets.

fieldPositionIncrementGap = "<position increment gap>"

When the indexer combines changeset properties listed in the dimension’s key
attribute, it separates the text in each property in an attempt to avoid
proximity phrase matches on words at the end of one changeset property with words
at the beginning of the next property.

There may be cases in which you do want proximity phrase matching to cross
changeset properties. For example, if you include both a city and state property
in a text dimension, if fieldPositionIncrementGap is set to 0, then
you could do a proximity search on “Salem, Oregon”.

Default: 100.

New in version 2.8.6.
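
The city and state case above might look like the following sketch. This section does not show how several properties are listed in the key attribute, so the comma-separated form is an assumption, as are the dimension id and type="text".

```xml
<!-- Hypothetical dimension; the comma-separated key list is an assumption. -->
<!-- With a gap of 0, the last word of "city" sits next to the first word of "state", -->
<!-- so a proximity search for "Salem, Oregon" can match across the two properties. -->
<dimension id="location" type="text" key="city,state" fieldPositionIncrementGap="0"/>
```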

synonyms-ref = "<id of thesaurus to use for synonym dictionary>"

This attribute will determine the optional synonym dictionary to use when indexing text.

didYouMean = "true" | "false"

Set this attribute to “true” to enable the Did You Mean? query suggestion feature. When
enabled, the query response will include a didYouMean field that may include suggested
queries in response to what the user searched for.

Default: false.

New in version 2.9.

didYouMeanDictionary = "<Comma-delimited Stop Words>"
didYouMeanDictionary-ref = "<id of word set to use for dictionary>"

If didYouMean has been enabled, this attribute will determine the optional
spelling correction dictionary to use as part of the Did You Mean? suggestion
process.

For more information on creating a word set for a Did You Mean? dictionary,
refer to Defining Word Sets.

New in version 2.9.
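
A minimal sketch of enabling Did You Mean? with a referenced dictionary follows. The dimension id and the word set id product_terms are hypothetical, and type="text" is an assumed type name.

```xml
<!-- Hypothetical ids; "product_terms" stands for a word set defined elsewhere. -->
<dimension id="product_name" type="text"
           didYouMean="true"
           didYouMeanDictionary-ref="product_terms"/>
```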

makeParts = "words" | "numbers" | "both" | "none"

This attribute applies when the word delimiter tokenizer is used. It determines which
parts of words are generated from compound words. To generate word parts from compound
or delimited words, use words. To generate number parts out of compound words, use
numbers. To generate both word and number parts, use both.

Default: words.

concatParts = "words" | "numbers" | "both" | "none"

This attribute applies when the word delimiter tokenizer is used. It determines which
parts of words are concatenated to form new words. For example, the product number
XMY-MD-89-9001 has four parts: 2 words and 2 numbers. Using the value
words will create XMYMD and using the value numbers will generate 899001.

Use attribute concatenateAll to concatenate both number and word parts into a single word.

Default: words.

New in version 2.9.

concatenateAll = "true" | "false"

This attribute applies when the word delimiter tokenizer is used. It determines whether
all word and number parts are concatenated to form a single new word. For example, the product number
XMY-MD-89-9001 has 2 words and 2 numbers. With concatenateAll enabled,
the indexer will create XMYMD899001.

Default: true.

New in version 2.9.

splitParts = "case" | "numbers" | "both" | "none"

This attribute applies when the word delimiter tokenizer is used. Use this
attribute to enable or disable splitting words at case changes or
letter-number transitions. With case transitions, McDonald becomes Mc and
Donald.

Default: case.

New in version 2.9.

stemEnglishPossessive = "true" | "false"

This attribute applies when the word delimiter tokenizer is used. When it is
enabled, then O'Neil's becomes O'Neil.

Default: true.

New in version 2.9.
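
The word delimiter attributes are easiest to read together. The sketch below sets all of them on one hypothetical dimension (type="text" is an assumed type name). The parts listed in the comments restate the XMY-MD-89-9001 example from the descriptions above and are not output captured from the engine; the behavior of concatParts="both" is inferred from the words and numbers cases.

```xml
<!-- Hypothetical dimension. For the value XMY-MD-89-9001: -->
<!--   concatParts="both" is expected to add XMYMD (words) and 899001 (numbers) -->
<!--   concatenateAll="true" adds XMYMD899001 -->
<dimension id="product_number" type="text"
           tokenizer="wordDelimiter"
           makeParts="both"
           concatParts="both"
           concatenateAll="true"
           splitParts="none"
           stemEnglishPossessive="false"/>
```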

stripHtml = "true" | "false"

If the text for the dimension includes HTML tags, this attribute will remove HTML tags
before indexing. Character entities are also converted to UTF-8.

This attribute is used to apply a phonetic algorithm during the analysis phase of indexing.
For more information about phonetic algorithms, refer to Phonetic Analysis.

New in version 2.8.3.

ignoreFieldLength = "true" | "false"

To have the engine ignore the length of the field when it makes its internal relevance adjustments
at query time, set this attribute to true. The default is false.

New in version 2.8.2.

ignoreInverseDocumentFrequency = "true" | "false"

This attribute is used to define the default value for query relevance adjustments. Its presence
does not change index behavior. If the frequency of terms (in the universe of values
for the dimension) is low, then the engine will automatically increase relevance in proportion to
the infrequency. This value can be overridden as part of a query criterion.

Default false for releases < 3.7

Default true for releases >= 3.7

New in version 2.8.2.

tokenizer = "wordDelimiter" | "standard" | "whitespace" | "smartcn"

Determines which tokenizer to use when analyzing the text. For more information, refer to
Tokenization.

Default: wordDelimiter

New in version 2.8.6.

A text dimension also allows changeset properties to be assigned to specific
fields. At query time, the fields can be individually searched and their contributing
relevancies adjusted on a field-by-field level.

The field element is functionally similar to a text dimension element. All of the attributes
that apply to a text dimension element can be used. Any attributes appearing on the parent
dimension element automatically apply to all field elements, unless overridden on the
individual field element itself.
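
The inheritance rule can be pictured with the following sketch. The id and key attributes on the field elements, the property names, and type="text" are all assumptions made for illustration.

```xml
<!-- Hypothetical sketch; attribute names on field are assumed to mirror dimension. -->
<dimension id="listing" type="text" stemming="true" accentFolding="true">
  <!-- Inherits stemming and accentFolding from the parent dimension. -->
  <field id="description" key="description"/>
  <!-- Overrides stemming for more exact matching on titles. -->
  <field id="title" key="title" stemming="false"/>
</dimension>
```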

A dimension whose sole purpose is to be used in a grouped query. The id or values of the
indexed data are used to group the results of a query. In most respects, a groupBy dimension
is close to a keyword type dimension.

To use a groupBy dimension, create a dimension on the values you want to group by. When the
engine executes the query, it will group the results by the values in the associated groupBy dimension.

The query response can include property data for the grouped-by value as well as property data for the best-matching
items within each groupBy group.

NOTE: groupBy dimensions and queries are only supported in a single-server configuration.
Multi-server configurations are not currently supported.

GroupBy is particularly useful for real estate solutions for New Home or Rental Communities in
which a customer searches for a particular floor plan or model but the results are
shown by grouping the results into a particular Community, Complex or other Development.

<dimension id="floorplans" type="groupBy" key="floorplan_id"/>

indexesItemId = "true"

Indicates that the groupBy changeset key is an item id. When this attribute is enabled,
the query result will include changeset properties for the groupBy value since it
has been identified as an item id.

Default: false
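
As a sketch, a groupBy dimension whose key holds an item id might be declared as follows. The dimension id and the community_id property are hypothetical; community_id is assumed to hold the item id of a community item.

```xml
<!-- Hypothetical: community_id is assumed to be the item id of a community item, -->
<!-- so the query result can include that item's changeset properties. -->
<dimension id="community" type="groupBy" key="community_id" indexesItemId="true"/>
```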

legacyGroupBy = "true"

Indicates that the query response for this groupBy dimension should use the format used prior to version 3.0.
This attribute exists for use by customers who were using groupBy prior to version 3.0 and
wish to upgrade their engine without being required to upgrade their query processing logic.