Built-in optimization

The Splunk software includes built-in optimizations that analyze and process your searches for maximum efficiency.

One goal of these optimizations is to filter results as early as possible. Filtering reduces the amount of data to process for the search. Early filtering avoids unnecessary lookups and evaluation calculations for events that are not part of the final search results.

Another goal is to process as much as possible in parallel on the indexers. The built-in optimizations can reorder search processing, so that as many commands as possible are run in parallel on the indexers before sending the search results to the search head for final processing. You can see the reordered search using the Job Inspector. See Analyze search optimizations.

Predicate optimization

Predicates act as filters to remove unnecessary events as your search is processed. The earlier in the search pipeline that a predicate is applied, the more efficient is the use of system resources. If a predicate is applied early, when events are fetched, only the events that match the predicate are fetched. If the same predicate is applied later in the search, then resources are wasted fetching more events than are necessary.

There are several built-in optimizations that focus on predicates:

Predicate merge

Predicate pushdown

Predicate splitting

Types of predicates

There are two types of predicates, simple predicates and complex predicates.

A simple predicate is a single conditional of the form field = value.

A complex predicate is a combination of several conjunctions (AND and OR) and disjunctions (NOT). For example, Field1 = Value1 OR Field2 = Value2 AND Field3 = Value3.

A complex predicate can be merged or split apart to optimize a search.

In Splunk SPL, there are two commands that perform predicate-based filtering, where and search.

An example of using the where command for filtering is:

index="_internal" | where bytes > 10

An example of using the search command for filtering is:

index="_internal" | search bytes > 10

Search-based predicates are a subset of where-based predicates. In other words, search-based predicates can be replaced by where-based predicates. However, where-based predicates cannot be replaced by search-based predicates.

Predicate merge

The predicate merge optimization takes two predicates and merges the predicates into one predicate. For example, consider the following search:

index=main | search a > 10 AND fieldA = "New"

There are two search commands in this example. The search command before the pipe, which is implied at the beginning of every search, and the explicit search command after the first pipe. With the predicate merge optimization, the predicates in the second search command are merged with the predicates in the first search command. For example:

index=main AND a > 10 AND fieldA = "New"

In some cases, predicates used with the where command can also be merged. For example, consider the following search:

index=main | where fieldA = "New"

With the predicate merge optimization, the predicates specified with the where command are merged with the predicates specified with the search command.

Example of a simple predicate pushdown

The sourcetype=access* (status=401 OR status=403) portion of the search retrieves 50,000 events. The lookup is performed on all 50,000 events. Then the where command is applied and filters out events that are not src_category="email_server" . The result is that 45,000 events are discarded and 5,000 events are returned in the search results.

If the search criteria used with the where command is applied before the lookup, more events are filtered out of the results. Only 5,000 events are retrieved from disk before the lookup is performed.

With predicate pushdown, the search is reordered for processing. The where command is eliminated by moving the search criteria src_category="email_server" before the first pipe.

Example of a complex predicate pushdown

In this situation, the eval command is assigning a default value when a field is null or empty. This is a common pattern in Common Information Model (CIM) data models.

The built-in optimization reorganizes the search criteria before processing the search. The where command is moved before the eval command.

... | where x = "hello" | eval x=if(isnull(x) OR x=="", "missing", x)

Predicate splitting

Predicate splitting is the action of taking a predicate and dividing, or splitting, the predicate into smaller predicates. The predicate splitting optimizer can then move the smaller predicates, when possible, to an earlier place in the search.

Because field a is generated as part of the eval command, it cannot be processed earlier in the search. However, field x exists in the events and can be processed earlier. The predicate splitting optimization separates the components of the where portion of the search. The search is reordered to process eligible components earlier.

The portion of the where command x < 10 is moved to before the eval command. This move reduces the number of results that the eval command must process.

Projection elimination

This optimization analyzes your search and determines if any of the generated fields specified in the search are not used to produce the search results. If generated fields are identified that can be eliminated, an optimized version of the search is run. Your search syntax remains unchanged.

For example, consider the following search:

index=_internal | eval c = x * y / z | stats count BY a, b

The c field that is generated by eval c = x * y / z is not used in the stats calculation. The c field can be eliminated from the search.

Your search syntax remains:

index=_internal | eval c = x * y / z | stats count BY a, b

But the optimized search that is run is:

index=_internal | stats count BY a, b

Here is another example:

| tstats count FROM datamodel=internal_audit_logs

For buckets whose data models are not built, this expands to the following fallback search:

Event types and tags optimization

Event typing and tagging can take a significant amount of search processing time. This is especially true with Splunk Enterprise Security where there are many event types and tags defined. The more event types and tags that are defined in the system, the greater the annotation costs.

Transforming searches

When a transforming search is run without the built-in optimization, all of the event types and tags are uploaded from the configuration system to the search processor. This upload happens whether the search requires the event types and tags or not. The search processing attempts to apply the event types and tags to each event.

For example, consider the following search:

index=_internal | search tag=foo | stats count by host

This search needs only the tag foo. But without the optimization, all of the event types and tags are uploaded.
The built-in optimization solves this problem by analyzing the search and only uploading the event types and tags that are required. If no event types or tags are needed, none are uploaded which improves search performance.

For transforming searches, this optimization is controlled by several settings in the limits.conf file, under the [search_optimization::required_field_values] stanza. These settings are turned on by default. There is no need to change these settings, unless you need to troubleshoot an issue.

Data model acceleration searches

Another problem that can be solved by the event type and tags optimization is specifically targeted at data model acceleration. With a configuration setting in the datamodels.conf file, you can specify a list of tags that you want to use with the data model acceleration searches. You specify the list under the [dm.name] stanza with the tags_whitelist setting.

Analyze search optimizations

You can analyze the impact of the built-in optimizations by using the Job Inspector.

Determine search processing time

You can use the Job Inspector to determine whether the built-in search optimizations are helping your search to complete faster.

Run your search.

From the search action buttons, select Job > Inspect job.

In the Job Inspector window, review the message at the top of the window. The message is similar to "The search has completed and returned X results by scanning X events in X seconds." Make note of the number of seconds to complete the search.

Close the Job inspector window.

In the Search bar, add |noop search_optimization=false to the end of your search. This turns off the built-in optimizations for this search.

Run the search.

Inspect the job and compare the message at the top of the Job Inspector window with the previous message. This message specifies how many seconds the search processing took without the built-in optimizations.

View the optimized search

You can compare your original search with the optimized search that is created when the built-in optimizations are turned on.

Run your search.

From the search action buttons, select Job, Inspect job.

In the Job Inspector window, expand the Search job properties section and scroll to the normalizedSearch property. This property shows the internal search that is created based on your original search.

Scroll to the optimizedSearch property. This property shows the search that is created, based on the normalizedSearch, when the built-in optimizations are applied.

Optimization settings

By default, the built-in optimizations are turned on.

Turn off optimization for a specific search

In some very limited situations, the optimization that is built into the search processor might not optimize a search correctly. In these situations, you can troubleshoot the problem by turning off the search optimization for that specific search.

Turning off the search optimization enables you to determine if the cause of unexpected or limited search results is the search optimization.

You turn off the built-in optimizations for a specific search by using the noop command.

You add noop search_optimization=false at the end of your search. For example:

Enter your email address, and someone from the documentation team will respond to you:

Send me a copy of this feedback

Please provide your comments here. Ask a question or make a suggestion.

Feedback submitted, thanks!

You must be logged into splunk.com in order to post comments.
Log in now.

Please try to keep this discussion focused on the content covered in this documentation topic.
If you have a more general question about Splunk functionality or are experiencing a difficulty with Splunk,
consider posting a question to Splunkbase Answers.

0
out of 1000 Characters

Your Comment Has Been Posted Above

We use our own and third-party cookies to provide you with a great online experience. We also use these cookies to improve our products and services, support our marketing campaigns, and advertise to you on our website and other websites. Some cookies may continue to collect information after you have left our website.
Learn more (including how to update your settings) here »