some IO ops with large latency in SDA fc perf

Hopefully this is the right category; I'm not used to this new interface.

Summary:

Using SDA FC PERFORMANCE, I am seeing some IO operations take a REALLY long time to complete, with up to SECONDS of latency. Is anybody else seeing this?

Background:

I am trying to commission a new EVA p6500 (fully loaded with 450 600G SAS drives). I added VMS volumes to an application cluster (via HB shadowing). After about a day, user complaints prompted us to drop these shadow members. FC PERF indicated really poor latency for a small (but user-noticeable) percentage of IOs. I have also noticed the problem on our older EVAs; this one just seems to have it much worse.

I have since tried to simplify troubleshooting by eliminating possible SAN buffer credit issues, 8G fillword issues, etc., by plugging this array directly into the (4G) switches where the hosts are connected. I added a few non-critical shadow members back, and I'm still seeing the latency. The only tool I can see this with is SDA FC PERF, because all other tools (evaperf, portperfshow, etc.) deal with averages, so the problem is masked.
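To illustrate why average-based tools hide this kind of problem, here is a minimal Python sketch. The sample latencies are made up for illustration (they are not measurements from this array), and the power-of-two bucket boundaries are an assumption chosen to resemble a latency histogram, not the actual SDA FC PERF output format.

```python
from collections import Counter


def bucket_ms(latency_ms):
    """Return the upper bound (in ms) of the power-of-two bucket holding latency_ms."""
    upper = 1
    while upper < latency_ms:
        upper *= 2
    return upper


def summarize(latencies_ms):
    """Return (mean latency, histogram keyed by bucket upper bound)."""
    mean = sum(latencies_ms) / len(latencies_ms)
    histogram = Counter(bucket_ms(x) for x in latencies_ms)
    return mean, dict(sorted(histogram.items()))


# Hypothetical sample: 9,980 IOs at 2 ms plus 20 IOs (0.2%) at 1,500 ms.
sample = [2.0] * 9980 + [1500.0] * 20
mean, hist = summarize(sample)

print(f"mean = {mean:.2f} ms")
for upper, count in hist.items():
    print(f"<= {upper:>5} ms: {count}")
```

With these invented numbers the mean stays around 5 ms, which looks healthy, while the histogram still shows 20 IOs landing in the bucket above one second.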

I have been working the problem with HP storage, but thought I would also post here, as little progress is being made. What do your numbers look like? Are they anything like this? THANKS!

Re: some IO ops with large latency in SDA fc perf

Not sure if this is useful, but the attached file shows some stats for one node of a Blade cluster using an EVA 4100.

There are 28 physical disks, all SCSI, and I think 10,000 rpm.

No HB shadowing - all logical disks are configured as mirror sets in the EVA.

I'd guess you've already checked this, but are all cluster nodes directly attached (via fibre) to the EVA, that is, none accessing it via MSCP, for example? We had one case where a node lost its fibre connections and was being served by another node.

Re: some IO ops with large latency in SDA fc perf

Yes Mark, very useful thanks...

I can see you have some of the same problem I do (but maybe not as bad)... I would love to find out what causes an IO to take 512 ms, 1 sec, or even 2 seconds to complete. Our application is possibly sensitive to this latency: we have intermittent failures to get a database lock before it times out.

Re: some IO ops with large latency in SDA fc perf

These are complex controllers, and implementation details such as "thin provisioning" may well cause some I/O operations to have unusually long latencies, as might parallel operations and I/O activity from other FC SAN hosts sharing the controller.

Re: some IO ops with large latency in SDA fc perf

Hi Hoff,

I have been working it with HP for quite some time, but progress has been slow, to understate the situation.

This array is new and not using thin provisioning. It was being used for one app cluster and a mail server that does 100 ops/sec when it's busy... As it stands, I can't keep it in production on this one app cluster because of sluggish response.

No fancy stuff at the moment, just basic vraid1.

My primary motivation for posting here is to get input from other people who have p6500s or p6300s, and to see their FC PERF outputs.