Subject: Re: [dm-devel] SCSI Hardware Handler and slow failover with large number of LUNS

Date: Mon, 06 Apr 2009 10:43:45 -0500

Chandra Seetharaman wrote:

> Hello All,
> During testing with the latest SCSI device handler (scsi_dh) code on RDAC
> storage, Babu found that failover with 100+ LUNs takes about 15 minutes,
> which is not good.
> We found that the problem is that we serialize the activates in dm on
> the workqueue.

I thought we talked about this during the review?

> We can solve the problem in the rdac handler in two ways:
> 1. Batch up the activates (mode selects) and send fewer of them.
> 2. Send the mode selects asynchronously.
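A rough sketch of option 1. It assumes the pending activations have already been collected into an array and that one command can carry several LUNs. MAX_LUNS_PER_CMD, send_combined_cmd() and activate_batched() are hypothetical names, not rdac handler code, and the printf only stands in for building and issuing the real command.

```c
#include <stdio.h>

/* Hypothetical limit on how many LUNs one combined command may carry. */
#define MAX_LUNS_PER_CMD 32

/* Hypothetical stand-in for building and sending one combined mode select. */
static void send_combined_cmd(const int *luns, int count)
{
    printf("one command for %d LUNs, starting at LUN %d\n", count, luns[0]);
}

/* Send activations for n pending LUNs in batches; return commands issued. */
static int activate_batched(const int *luns, int n)
{
    int cmds = 0;

    for (int i = 0; i < n; i += MAX_LUNS_PER_CMD) {
        int count = (n - i < MAX_LUNS_PER_CMD) ? n - i : MAX_LUNS_PER_CMD;

        send_combined_cmd(&luns[i], count);
        cmds++;
    }
    return cmds;
}
```

With 100 pending LUNs and the assumed limit of 32, this issues 4 commands instead of 100 serialized ones.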

I think most of the ugliness in the original async mode was due to
trying to use the REQ_BLOCK* path. With the scsi_dh_activate path, it
should be easier now, because in the send path we no longer have to
worry about the queue lock being held or about the calling context.

I think we could just use blk_execute_rq_nowait to send the IO. Then we
would have a workqueue/thread per something (maybe one per dh module)
that would be queued/notified when the IO completed. The thread could
then process the IO and handle the next stage if needed.

Why use a thread, you might wonder? I think it fixes another issue with
the original async mode, and it makes things easier if the scsi_dh
module has to send more IO. With the thread, the module would not have
to worry about the queue_lock being held in the IO completion path, and
would not have to worry about being run from more restrictive contexts.