> On 06/11/2012 04:32 PM, Boaz Harrosh wrote:
>
> > On 06/11/2012 03:39 PM, Jeff Layton wrote:
> >
> >>>
> >>> But I'm guessing we were wrong to assume that existing setups that
> >>> people perceived as working would have that path, because the failures
> >>> in the absence of that path were probably less obvious.
> >>>
>
> One more thing, the most important one. We have already fixed that in the
> past and I was hoping the lesson was learned. Apparently it was not, and
> we are doomed to do this mistake for ever!!
>
> What ever crap fails times out and crashes, in the recovery code, we don't
> give a dam. It should never affect any Server-client communication.
>
> When the grace periods ends the clients gates opens period. *Any* error
> return from state recovery code must be carefully ignored and normal
> operations resumed. At most on error, we move into a mode where any
> recovery request from client is accepted, since we don't have any better
> data to verify it.
>
> Please comb recovery code to make sure any catastrophe is safely ignored.
> We already did that before and it used to work.
>

That's not the case, and hasn't ever been AFAICT. The code has changed
a bit recently, but the existing behavior in this regard was preserved.
From nfs4_check_open_reclaim:

return nfsd4_client_record_check(clp) ? nfserr_reclaim_bad : nfs_ok;

...if there is no client record, then the reclaim request fails. Doesn't
the RFC mandate that?
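
To spell out how that ternary reads, here is a toy, userspace-only sketch.
Everything in it is a hypothetical stand-in (toy_client, toy_record_check,
the RECLAIM_* values), not the nfsd code itself. It assumes the record check
returns 0 when a record is found and nonzero otherwise, which is what the
quoted line implies:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel status codes; values are arbitrary. */
enum reclaim_status { RECLAIM_OK = 0, RECLAIM_BAD = 1 };

/* Toy client: has_record models whether a stable-storage record was found. */
struct toy_client {
    bool has_record;
};

/* Assumed convention: 0 when a record exists, nonzero otherwise. */
static int toy_record_check(const struct toy_client *clp)
{
    return clp->has_record ? 0 : -1;
}

/* Mirrors the quoted ternary: a missing record means the reclaim is refused. */
static enum reclaim_status toy_check_open_reclaim(const struct toy_client *clp)
{
    return toy_record_check(clp) ? RECLAIM_BAD : RECLAIM_OK;
}
```

So in this model a client with no record always gets the "bad reclaim"
status; there is no path where a failed lookup is quietly treated as success.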