
All of a sudden today (after running fine for many months), these searches started returning zero results, so the dashboard showed no values and the alert triggered! The application team (for the application we're using Splunk to monitor) told me their application was running just fine when I checked with them to confirm whether the alert was correct in reporting the application down. On performing further checks, it turned out the alert was in fact false. I realized the problem only happens when I specify the time range in the search itself, e.g. (earliest=-60m@h latest=@h) or (earliest=-2m@m latest=-1m@m), in the search app. I ultimately decided to restart the Splunk server, but this did not resolve the issue; the problem persisted for some 20 minutes after the restart. The false alerts lasted about an hour and a half, and then the problem auto-magically resolved itself!
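For what it's worth, one way to sanity-check what time window an inline-qualified search actually used is the `addinfo` command, which exposes the effective earliest/latest boundaries of the search. A minimal sketch (the sourcetype name is a placeholder for your own data):

```
sourcetype=f5ltm earliest=-60m@h latest=@h
| addinfo
| eval window_start=strftime(info_min_time, "%F %T"),
       window_end=strftime(info_max_time, "%F %T")
| stats count by window_start window_end
```

If `window_start`/`window_end` match the hour you expect but `count` is zero, the time modifiers are being honored and the events simply aren't there for that window.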

Has anyone ever experienced this odd phenomenon? Any ideas on what may have suddenly caused this strange behavior, and how I can prevent it from happening in the future?


Just to be clear, are you saying that the original source data kept arriving during this time? Is it not the case that you were missing events? Since you're relying on UDP, it's quite possible that the application was available but no log data was received.

Indeed, the incoming data was still flowing when this odd phenomenon took place. When I specified the time using the "Time Range Picker" and NOT the search field, I could see the events for the desired time range. I'm not missing any events at all, and UDP was not the culprit. Thank you very much for the swift response. Much appreciated.

Those results look alright, though? That's just the latency, and it's pretty small - unless there are other bits you haven't pasted :) I have some UDP data, and sometimes if there's a blip in the network then latency can be introduced; if Splunk searches before the data arrives, it will rightly think there is an issue.

There is in fact a "diff" value gap between 12h20 and 14h30 that I didn't show in the previous comment. So there was an apparent loss of data in that time span, as the reporting also reflects no results for that period. So perhaps, as you suggested, UDP misbehaved or the network had a glitch at that moment. Thank you very much for helping get to the root cause of the problem. Tomorrow I'll try AGAIN to find out from the network team whether they picked up anything odd about the network during that time span that may have kept data from reaching the Splunk server.

1 Answer

Are you sure that the data arrived when it should have? My next thought would be to run something like sourcetype=blah | eval IndexTime=_indextime | eval diff=_time-IndexTime | timechart avg(diff), or something similar for your data, over the hour that you experienced problems, to see whether the events did arrive but were perhaps indexed later due to latency or some other hiccup.
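As a sketch, the same idea with explicit time bounds, a per-minute span, and both the average and worst-case lag (sourcetype name is a placeholder for your data):

```
sourcetype=blah earliest=-60m@h latest=@h
| eval diff = _time - _indextime
| timechart span=1m avg(diff) AS avg_latency_s max(diff) AS max_latency_s
```

Note that `diff` is negative when events are indexed after their event timestamp; a sudden large negative `max_latency_s` over the problem hour would point at indexing lag rather than missing data.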

OK, bad news! The problem is recurring right now as I type. And the data IS flowing in through UDP port 515, as reflected in the tcpdump output. So the problem is NOT the network dropping data packets, as we had concluded. :(

This problem is becoming a daily pain. I've done some log analysis, and it turns out the problem happens when the following entries appear in the log. Unfortunately, it doesn't happen all the time. Help!

10-22-2013 07:37:00.603 +0200 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="udp:515", data_host="172.17.100.75", data_sourcetype="f5ltm"
10-22-2013 09:37:00.586 +0200 WARN AggregatorMiningProcessor - Breaking event because limit of 256 has been exceeded - data_source="udp
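For context, that WARN comes from Splunk's line-merging stage: when SHOULD_LINEMERGE is enabled, merged events are capped at MAX_EVENTS lines (256 by default) and force-broken when the cap is hit, which can also skew event timestamps. If the f5ltm syslog data is really one event per line, a hedged sketch of a props.conf stanza that avoids line merging entirely (the stanza name and line breaker are assumptions about your data):

```
# props.conf on the instance that parses the UDP input
# Assumption: f5ltm events are single lines terminated by newlines
[f5ltm]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

# Alternatively, keep line merging but raise the cap:
# MAX_EVENTS = 512
```

This is only a sketch; the right setting depends on whether your F5 events are genuinely multiline.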