Reduce tsidx disk usage

The tsidx retention policy determines how long the indexer retains the tsidx files that it uses to search efficiently and quickly across its data. By default, the indexer retains the tsidx files for all its indexed data for as long as it retains the data itself. By adjusting the policy to remove tsidx files associated with older data, you can set the optimal trade-off between storage costs and search performance.

The indexer stores tsidx files in buckets alongside the rawdata files. The tsidx files are vital for efficient searching across large amounts of data. They also occupy substantial amounts of storage.

For data that you are regularly running searches across, you absolutely need the tsidx files. However, if you have data that requires only infrequent searching as it ages, you can adjust the tsidx retention policy to reduce the tsidx files once they reach a specified age. This allows you to reduce the disk space that your indexed data occupies.

The tsidx reduction process eliminates the full-size tsidx files and replaces them with mini versions of those files that contain essential metadata. The rawdata files and some other metadata files remain untouched. You can continue to search across the aged data, if necessary, but such searches will exhibit significantly worse performance. Rare term searches, in particular, will run slowly.

To summarize, the main use case for tsidx reduction is for environments where most searches run against recent data. In that case, fast access to older data might not be worth the cost of storing the tsidx files. By reducing tsidx files for older data, you incur little performance hit for most searches while gaining large savings in disk usage.

Estimate the storage savings

Tsidx reduction replaces a bucket's full-size tsidx files with smaller versions of those files, known as mini-tsidx files. It also eliminates the bucket's merged_lexicon.lex file.

The full-size tsidx files usually constitute a large portion of the overall bucket size. The exact amount depends on the type of data. Data with many unique terms requires larger tsidx files. As a general guideline, the tsidx reduction process decreases bucket size by approximately one-third to two-thirds. For example, a 1GB bucket decreases in size to somewhere between 350MB and 700MB.

To make a rough estimate of a bucket's reduction potential, look at the size of its merged_lexicon.lex file. The merged_lexicon.lex file is an indicator of the number of unique terms in a bucket's data. Buckets with larger merged_lexicon.lex files have tsidx files that reduce to a greater degree, because of the greater number of unique terms.

The size of a mini-tsidx file is generally about 5% to 10% of the size of the corresponding original, full-size file. As mentioned earlier, however, the overall reduction in bucket size is smaller than that, typically one-third to two-thirds, because the reduced bucket retains the rawdata file and a number of metadata files in addition to the mini-tsidx files.
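
As a rough illustration of this guideline, a search along the following lines could estimate how much disk space reduction might free in one index. This is a sketch only. It assumes an example index named "newone" and a 10-day reduction age, relies on the tsidxState, endEpoch, and sizeOnDiskMB fields that the dbinspect command reports, and simply applies the one-third to two-thirds guideline rather than measuring actual tsidx file sizes.

```
| dbinspect index=newone
| search tsidxState=full
| where endEpoch < relative_time(now(), "-10d")
| stats sum(sizeOnDiskMB) AS candidate_mb
| eval est_saved_low_mb = round(candidate_mb / 3), est_saved_high_mb = round(candidate_mb * 2 / 3)
```

The two est_saved fields are hypothetical names chosen for this example. Treat the result as a ballpark figure, since the actual savings depend on the number of unique terms in the data.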

How tsidx reduction works

When you enable tsidx reduction, you specify a reduction age, on a per-index basis. When buckets in that index reach the specified age, the indexer reduces their tsidx files.

The reduction process

The tsidx reduction process runs, by default, every ten minutes. It checks each bucket in the index and reduces the tsidx files in any bucket whose most recent event is at least as old as the specified reduction age.
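
The check interval is itself configurable. As a minimal sketch, assuming the tsidxReductionCheckPeriodInSec setting described in the indexes.conf specification and an example index named "newone", the following stanza states the ten-minute default explicitly:

```ini
[newone]
# Seconds between runs of the tsidx reduction process.
# 600 (ten minutes) matches the default; it is shown here only for illustration.
tsidxReductionCheckPeriodInSec = 600
```

Verify the setting name and default against the indexes.conf.spec file for your version before relying on it.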

The reduction process runs on only a single bucket at a time. If multiple buckets are ready for reduction, the process handles them sequentially.

The reduction process is fast. For example, when running on a 1GB bucket, it typically completes in just a few seconds.

Once a tsidx file is reduced, it stays reduced. If you disable the tsidx reduction setting or increase the reduction age, the change affects only buckets that are not already reduced. If necessary, however, there is a way to convert reduced buckets back into buckets with full tsidx files. See Restore reduced buckets to their original state.

Effect of reduction on bucket files

The tsidx reduction process eliminates the full-size tsidx files from each targeted bucket after replacing them with mini versions that contain only essential metadata. The mini-tsidx file consists of the header of the original tsidx file, which contains metadata about each event. In addition, tsidx reduction eliminates the bucket's merged_lexicon.lex file.

The bucket retains its rawdata file, along with the mini-tsidx files and certain other metadata files, including the bloomfilter file.

Full-size tsidx files have a .tsidx filename extension. Mini-tsidx files use the .mini.tsidx extension.

Note: The full-size version of the tsidx file gets deleted only after the mini version has been created. This means that the bucket will briefly contain both versions of the file, with the commensurate increase in disk usage.

Effect of reduction on in-progress searches

If a search is in progress on a particular bucket that qualifies for tsidx reduction, the reduction for that bucket will be delayed until the search on the bucket completes. The mini-tsidx files will be created but deletion of the full-size files will await the search completion.

Note: If the indexer is performing a search that ranges across multiple buckets, including one that is ready for reduction, reduction of the bucket might complete before the search reaches it. As expected, when the search does reach the reduced bucket, it will run slowly on that bucket.

Searches across reduced buckets

Once a bucket has undergone tsidx reduction, you can run searches across the bucket, but they will take much longer to complete. Since the indexer searches the most recent buckets first, it will return results from all non-reduced buckets before it reaches the reduced buckets.

When the search hits the reduced buckets, a message appears in Splunk Web to warn users of a potential delay in search completion: "Search on most recent data has completed. Expect slower search speeds as we search the minified buckets."

A few search commands do not work with reduced buckets. These include tstats and typeahead. A warning is added to search.log if such a search touches a reduced bucket: "The full buckets will return results and the reduced buckets will return 0 results." In addition, for the tstats command only, the following message appears in Splunk Web: "Reduced buckets were found in index={index}. Tstats searches are not supported on reduced buckets. Search results will be incorrect."

Note: Tsidx reduction does not touch tsidx files for accelerated data models, which are maintained in their own directories, separate from the index buckets. Therefore, tstats commands that are restricted to an accelerated data model will continue to function normally and are not affected by this feature.

Configure the tsidx retention policy

By default, the indexer retains all tsidx files for the life of the buckets. To change the policy, you must enable tsidx reduction.

You can also change the tsidx retention period from its default of seven days. A bucket gets reduced only when all events in the bucket exceed the retention period.

Configure in indexes.conf

You can enable tsidx reduction by directly editing indexes.conf. You can enable reduction for one or more indexes individually or for all indexes globally.

To enable tsidx reduction for a single index, place the relevant attributes under the index's stanza in indexes.conf. For example, to enable reduction for the "newone" index and to set the retention period to 10 days:
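
A minimal sketch of such a stanza, assuming the enableTsidxReduction and timePeriodInSecBeforeTsidxReduction settings from the indexes.conf specification:

```ini
[newone]
# Turn on tsidx reduction for this index only.
enableTsidxReduction = true
# Reduction age in seconds: 10 days x 86400 seconds per day = 864000.
timePeriodInSecBeforeTsidxReduction = 864000
```

To apply the same policy to all indexes, the same two settings would typically go at the global level of indexes.conf, for example under the [default] stanza, instead of under an individual index's stanza. Check the setting names against the indexes.conf.spec file for your version.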

Performance impact when you first enable tsidx reduction

Once you enable tsidx reduction, the indexer begins to look for buckets to reduce. It reduces all buckets that exceed the specified retention period. The indexer reduces only one bucket at a time, so performance impact should be minimal.

Determine whether a bucket is reduced

Run the dbinspect search command:

| dbinspect index=_internal

The tsidxState field in the results specifies "full" or "mini" for each bucket.
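
To get an overview rather than inspecting buckets one by one, the results can be summarized by state. A sketch, assuming the tsidxState values described above and the sizeOnDiskMB field that dbinspect reports:

```
| dbinspect index=_internal
| stats count AS buckets, sum(sizeOnDiskMB) AS size_mb BY tsidxState
```

The buckets and size_mb field names are arbitrary labels chosen for this example. Each result row shows how many buckets are in the "full" or "mini" state and how much disk space they occupy.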

Tsidx reduction and indexer clusters

An indexer cluster runs tsidx reduction in the same way, and according to the same rules and settings, as a standalone indexer. However, since only searchable bucket copies have tsidx files to begin with, reduction occurs only on searchable copies. With tsidx reduction enabled, a searchable bucket copy can contain either full-size or mini-tsidx files, depending on the age of the bucket.

You must push changes to the tsidx reduction settings by means of the configuration bundle method. This ensures that all peer nodes use the same settings. Tsidx reduction then occurs at approximately the same time for all searchable copies of a reduction-ready bucket, regardless of which peers they reside on.

If, post-reduction, the cluster must convert a non-searchable copy of a reduced bucket to searchable to meet the search factor, there are two ways that the conversion can proceed:

If another searchable copy of the bucket exists in the cluster, the cluster will stream that copy's mini-tsidx files to the non-searchable copy. When streaming is complete, the copy is considered searchable.

If no other searchable copy of the bucket exists, the cluster has no mini-tsidx files available for streaming to the non-searchable copy. In that case, the cluster must first build full-size tsidx files from the non-searchable copy's rawdata file and then reduce the full-size files. There is no way to create mini-tsidx files directly from a rawdata file.

For more information on how an indexer cluster makes non-searchable copies of a bucket searchable, see Bucket-fixing scenarios.

Restore reduced buckets to their original state

You cannot restore reduced buckets to their original state merely by increasing the age setting for tsidx reduction. That setting does not affect buckets that have already been reduced.

Instead, to revert a bucket with mini-tsidx files to full-size tsidx files:

1. Stop the indexer.

2. In indexes.conf, either disable tsidx reduction or increase the age setting for tsidx reduction beyond the age of the buckets that you want to restore. Otherwise, the bucket will be reduced for a second time soon after you revert it.
