
Is the Transaction command suitable for large volumes of data and what is the benefit of using this command?


After reading various questions/answers on the topic and the relevant Splunk documentation I am still unsure whether the Transaction command is suitable for large volumes of data (say for example 500,000+ events across two sources).

When I have tested this command on my data sources, it seemingly processes only a portion of the search, with the remaining time period returning zero results.

Would this be due to the high demand the command places on server resources?

Related question:

Does the Transaction command actually concatenate the raw data + fields from various events and combine them into a single event in your search results?

What is the benefit of using this command - how does it assist in aggregating related events in terms of the data returned?

Is it not just as feasible to use the stats command to display your related event data (combined with other eval commands such as coalesce)?


3 Answers

Transaction is just one of several event correlation mechanisms that SPL offers, and its applicability is use-case specific. You can choose between stats, join, append, appendcols, lookup, transaction, and subsearches. Refer to the Splunk documentation on choosing between the various correlation mechanisms.

In your case, transaction is dropping events because there are too many events to correlate. You can add keepevicted=true, but query performance will degrade, because Splunk cannot mark any event as evicted until the search completes.
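As an illustrative sketch (the sourcetypes and the session_id field here are hypothetical, not from the original question), keepevicted can be added to the transaction definition, and evicted transactions can then be identified by the closed_txn field that transaction sets:

```
sourcetype=source_a OR sourcetype=source_b
| transaction session_id keepevicted=true maxspan=10m
| search closed_txn=0
```

With keepevicted=true, transactions that were evicted from memory are kept in the results instead of being silently dropped; closed_txn=0 marks the transactions that did not complete, which can help you see how much of your data was being discarded.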

In your case, stats might work out better. Specify both sourcetypes in the base search so that events from the two sources are brought together, then aggregate on your common fields. If you can provide a sample or mocked-up data example from both sources, we can assist you further.
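A minimal sketch of that approach, assuming hypothetical sourcetypes and that the two sources carry the correlation key under different field names (hence the coalesce mentioned in the question):

```
sourcetype=source_a OR sourcetype=source_b
| eval common_id=coalesce(id_field_a, id_field_b)
| stats min(_time) as start, max(_time) as end, values(action) as actions, count as eventcount by common_id
| eval duration=end-start
```

Because stats aggregates incrementally, each indexer can reduce its own slice of the data before sending partial results to the search head, which is why this form typically scales to volumes where transaction struggles.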

The transaction command is very heavy on resources, so the first consideration is how much processing power you have. Yes, it can be used on large amounts of data, but you need to examine your data and its behaviour carefully and then select the transaction definition options accordingly, including startswith, endswith, maxspan, maxevents, maxpause, keeporphans, unifyends, etc.
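For example, a tightly constrained transaction definition (the sourcetype, field names, and search strings below are hypothetical) bounds how many events Splunk must hold in memory at once:

```
sourcetype=source_a
| transaction clientip startswith="action=login" endswith="action=logout" maxspan=30m maxevents=500 maxpause=5m
```

Here startswith/endswith let Splunk close a transaction as soon as its terminating event arrives, while maxspan, maxevents, and maxpause cap how long an open transaction can grow, all of which reduce the memory pressure that causes events to be dropped.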

The main problem with transaction is that the work must be done on the search head, which means you have two very important downsides. First, there is no map-reduce: the indexers, instead of sharing the processing load and reducing the size of the partial results, are merely file servers for the data. Second, and even more important, nothing can be discarded along the way, which means that as you build aggregate events out of the raw events, you get an enormous explosion of RAM consumption on the search head. Exhausting RAM is what causes transaction to give up, and the worst part is that it does so silently, with no direct indication. The good news is that you can almost always use stats instead. Check out Nick Mealy's Virtual Conf session (March 2016) here:
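As a rough sketch of that rewrite (field names are hypothetical), a grouping like "transaction session_id" can often be replaced with a stats call that computes the same aggregates without materializing combined events on the search head:

```
sourcetype=source_a OR sourcetype=source_b
| stats min(_time) as _time, range(_time) as duration, count as eventcount, values(uri) as uris by session_id
```

range(_time) reproduces the duration field that transaction would have produced, and count reproduces eventcount; because stats distributes its work to the indexers and keeps only running aggregates per group, memory use stays bounded regardless of how many raw events each group contains.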