Performance degraded in a search head pooling environment

Symptoms

In a search head pooling environment, you notice that searches take longer than they used to. How do you determine where the performance degradation is coming from? This topic suggests a few tests you can run.

Time some simple commands

Try some basic commands outside of Splunk Enterprise. If either of the following operating system commands takes more than roughly ten seconds to complete, it indicates an issue with the shared storage.

On the search head, in the pooled location, at the *nix command line,

time find /path/to/pool/dir | wc -l

measures how long it takes to enumerate everything under .../dir and count the results.

Another simple command to try is:

time ls -lR /path/to/pool/dir | wc -l

which measures how long it takes to count items in the pool.
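The two checks above can be sketched as a small script with an explicit ten-second threshold. This is only a sketch: POOL_DIR is a placeholder for your pooled location, and where the original commands pipe to wc -l, the output here is simply discarded, since only the elapsed time matters.

```shell
#!/bin/sh
# Time a command and report whether it crossed the ~10 second threshold
# that suggests trouble on the shared storage. POOL_DIR is a placeholder;
# point it at your actual pooled location.
POOL_DIR="${POOL_DIR:-/tmp}"

check() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    elapsed=$(( end - start ))
    if [ "$elapsed" -gt 10 ]; then
        echo "SLOW (${elapsed}s): $*"
    else
        echo "ok (${elapsed}s): $*"
    fi
}

r1=$(check find "$POOL_DIR")       # enumerate everything under the pool
r2=$(check ls -lR "$POOL_DIR")     # long recursive listing of the pool
echo "$r1"
echo "$r2"
```

If either line reports SLOW, investigate the shared storage before looking at Splunk Enterprise itself.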

If you do not have shell access, other tests you can run include:

logging in (which uses a shared token)

accessing knowledge objects.

Compare searches in and out of search head pooling

Run a simple search (for example, index=_internal source=*splunkd.log | tail 20) with and without search head pooling enabled. Compare the timings.
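One way to compare the timings is a small wall-clock wrapper around the splunk CLI. This is a sketch: it assumes the splunk binary is on your PATH and an instance is running, so the actual invocation is shown only as a comment, with a cheap stand-in command used for demonstration.

```shell
#!/bin/sh
# Measure the wall-clock seconds a command takes, so the same search can
# be timed with pooling enabled and then disabled.
time_search() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# With a running Splunk instance you would run, for example:
#   time_search splunk search 'index=_internal source=*splunkd.log | tail 20'
# once with pooling enabled and once with it disabled, then compare the numbers.

# Demonstration with a stand-in command:
elapsed=$(time_search sleep 1)
echo "elapsed: ${elapsed}s"
```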

Use Splunk Enterprise log files

In splunkd_access.log, look for slow, non-real-time searches:

index=_internal source=*splunkd_access.log NOT rtsearch spent>29999

The spent field is measured in milliseconds, so any search matched by this query took more than 30 seconds to return, which counts as a slow search.

If the only slow operations are searches (but not, for example, bundle replication), then your problem might be with your mount point. Run some commands outside of Splunk Enterprise to validate that your mount point is healthy.

If accessing knowledge objects takes a long time, search in metrics.log for the load average:

index=_internal source=*metrics.log load_average

Look in metrics.log for the two to five minutes before and after the duration of the slow-running search.

If the load average is high and you have the Splunk on Splunk (SoS) app installed, look at the CPU graphs in SoS over the same time period to determine whether the system itself is under load.

If the problem is with the mount point, the machine's CPU will not be under load.

If the problem is with the search load, the CPU usage will be high for the duration of the slow search.
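One such command for exercising the mount point directly, outside of Splunk Enterprise, is a small timed write with dd. This is a sketch: MOUNT and the 16 MB size are placeholder choices; point MOUNT at the mount backing the pool.

```shell
#!/bin/sh
# Quick write test against the shared-storage mount point. MOUNT is a
# placeholder (defaulting to /tmp so the sketch runs anywhere); on healthy
# storage a 16 MB write should finish in well under ten seconds.
MOUNT="${MOUNT:-/tmp}"
testfile="$MOUNT/pool_io_test.$$"

start=$(date +%s)
dd if=/dev/zero of="$testfile" bs=1M count=16 2>/dev/null
sync
end=$(date +%s)
rm -f "$testfile"

elapsed=$(( end - start ))
echo "wrote 16 MB to $MOUNT in ${elapsed}s"
```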

Is it a search load problem?

Start by turning off field extractions. Is the search still slow?

Next, turn off real-time all-time searches and remove wildcards from your searches.

If you have the Splunk on Splunk app, check the search load view. If you have the Distributed Management Console, check the Search Activity views.

Consider search scheduling. Have you scheduled many searches to run at the same time? Use the Distributed Management Console Search Activity view to identify search scheduling issues. If you've identified issues, move some of your scheduled searches to different minutes past the hour.
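For illustration, scheduled searches are staggered with the cron_schedule setting in savedsearches.conf; the stanza names below are hypothetical.

```
# Hypothetical stanzas in savedsearches.conf: two searches moved to
# different minutes past the hour instead of both running at the top.
[error summary search]
cron_schedule = 5 * * * *

[capacity report search]
cron_schedule = 20 * * * *
```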
