SAM high CPU usage

‏2013-10-14T20:41:32Z


Two of our instances, on two different servers, were showing high CPU utilization, specifically in the Streams Application Manager (SAM) process, at 70-80% CPU usage. These servers receive data through the TCPSource operator, and we were seeing some strange behavior compared to our other instances. The average data rate for this data source is about 200 tuples/sec. On those two servers, the data rate would sometimes drop to 0, stay there for a couple of minutes, and then come back at over 500 tuples/sec, as if the data were being buffered in memory for some reason. The other servers that ingest this data source have never shown this behavior. Since the SAM process CPU usage was high, I looked at the SAM log, and this is the message that was being logged over and over again:

[instancename@instanceowner] [HostIPAddress] [:::AAS] CDISR2539E The session is not valid, or the session has expired.
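
A rough way to gauge how often a message like this repeats is to count the log lines that contain its message ID. The sketch below assumes the SAM log is available as a plain text file. The function name `count_message` and the log path in the usage comment are hypothetical, not part of Streams.

```shell
#!/bin/sh
# Count the lines in a log file that contain a given message ID.
# count_message is an illustrative helper, not a Streams command.
count_message() {
    msg_id="$1"      # e.g. CDISR2539E
    log_file="$2"    # path to a plain-text log file (hypothetical)
    # -F treats the ID as a fixed string; -c prints the number of matching lines.
    grep -c -F -- "$msg_id" "$log_file"
}

# Usage (hypothetical path):
#   count_message CDISR2539E /path/to/sam.log
```

A count that keeps climbing between two runs a minute apart would be consistent with the message being logged in a tight retry loop.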

I tried to stop one of the instances that was showing high SAM CPU usage and got this error:

Stop instance instancename@instanceowner failed.

The operation failed. Exit code = 255

Refer to the Streams Studio Console for more information.

I used the force option and the instance finally stopped. I started the instance again and now the SAM CPU usage is below 1%.

Does the high SAM CPU utilization explain why we were seeing those dropouts on the TCPSource side?

Re: SAM high CPU usage

Without more trace information, it is hard to know the root cause of the CDISR2539E error. Most likely, though, it is the reason SAM was using so much CPU, and it probably affected the application as well.

Because the session was somehow not valid, SAM kept retrying, which drove up its CPU usage and also caused the "normal" stopinstance to fail. If this ever happens again, please enable tracing dynamically (without stopping the instance) using "streamtool setprop -i <iid> InfrastructureTraceLevel=trace", let it run for several more minutes, then collect the logs with "streamtool getlog -i <iid>" and send us a copy so I can investigate.
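
The collection steps above could be wrapped in a small script along these lines. This is only a sketch: the two streamtool commands are the ones quoted in the reply, while the function name `collect_sam_trace` and the `STREAMTOOL` and `WAIT_SECONDS` variables are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: raise the infrastructure trace level on a running instance,
# wait a few minutes, then collect the logs.
# STREAMTOOL and WAIT_SECONDS are illustrative variables, not Streams settings.
STREAMTOOL="${STREAMTOOL:-streamtool}"
WAIT_SECONDS="${WAIT_SECONDS:-300}"   # "several more minutes"; adjust as needed

collect_sam_trace() {
    iid="$1"   # instance ID, i.e. the <iid> placeholder in the commands above
    if [ -z "$iid" ]; then
        echo "usage: collect_sam_trace <iid>" >&2
        return 2
    fi
    # Enable tracing dynamically, without stopping the instance.
    "$STREAMTOOL" setprop -i "$iid" InfrastructureTraceLevel=trace || return 1
    # Let the instance run so the failing session retries are captured.
    sleep "$WAIT_SECONDS"
    # Collect the logs for the instance.
    "$STREAMTOOL" getlog -i "$iid"
}

# Usage:
#   collect_sam_trace <iid>
```

The script does not restore the previous trace level afterwards, so that would need to be done separately once the logs have been collected.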