Bill,
What OS are the IOCs using?
What is the failure frequency?
Can you reproduce it with other Channel Access clients, such as caget from EPICS base? You could run caget in a loop inside a shell script to emulate what your software is doing.
Mark
________________________________________
From: tech-talk-bounces@aps.anl.gov [tech-talk-bounces@aps.anl.gov] on behalf of Bill Lavender [lavender@agni.phys.iit.edu]
Sent: Friday, May 11, 2012 3:21 PM
To: tech-talk@aps.anl.gov
Subject: Channel Access monitoring tools
One of the installations I am responsible for is having trouble with
unexplained PV connection failures. What I have is a program that needs
to do things in a serialized fashion. In other words, for each of
several PVs, it needs to start a connection to a PV, wait for the
connection to complete, and then immediately use that PV.
At present, I am doing something like this:
1. Invoke ca_create_channel() with a connection state change handler.
The connection state change handler sets a flag to indicate that
the connection has completed.
2. Call ca_pend_io() to make sure the request is sent on its way.
3. I then wait inside a loop, periodically calling ca_poll(), until
my connection flag has been set.
4. If the loop has been running for too long, I declare a timeout
and call ca_clear_channel() to get rid of the existing unconnected
PV and then call ca_pend_io() to send that request on its way.
My code did not originally have this ca_clear_channel() call,
but I added it to see if it would help and in the name of
preventing memory leaks. It didn't help.
5. I then go back to step one to create a new channel.
6. If I execute the outer loop from step 1 to step 5 too many times,
then I give up and tell the user that I have timed out.
Increasing the timeouts has not helped. For debugging, I have tried
timeouts as long as 10 seconds and have not seen a change in the frequency
of connection timeouts.
At present, the only client platform that I have that has these timeouts
is Debian 6.0 Linux (Squeeze). The same hardware running Debian 5.0
did not have these problems. The clients are using EPICS Base 3.14.10
and the IOC is using 3.14.12.1.
I am assuming that what I need to do here is to monitor the network
traffic between the Debian 6.0 machine and the IOC and compare it
to the traffic between a Debian 5.0 machine and the same IOC. I see
that there is a Channel Access plugin for Wireshark that I hope will
be helpful. Are there other things that I should be trying?
The Channel Access code is wrapped in some code of my own, so it will
not look quite the same as raw Channel Access code, but if you want
to look at it anyway, look at the function mx_epics_pv_connect()
in this file:
http://svn.csrri.iit.edu/mx/trunk/modules/epics/mx_epics.c
Thanks.
Bill Lavender
lavender@agni.phys.iit.edu