Comments

The firmware flash update is conducted using an RTAS call that is serialized
by lock_rtas(), which uses a spin_lock. rtasd keeps scanning for the RTAS events
generated on the machine. This is performed via a delayed workqueue, invoking
an RTAS call to scan the events.

The flash update takes a while to complete, and during this time any other
RTAS call has to wait. In this case, rtas_event_scan() waits for a long time
on the spin_lock, resulting in a soft lockup.
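
The contention described above can be reproduced in miniature in userspace. This is a sketch under stated assumptions, not kernel code: a POSIX spinlock stands in for the lock behind lock_rtas(), one thread plays the long-running flash call, and the caller plays rtas_event_scan(). The names flash_call() and event_scan_wait_ms() and the timings are hypothetical. In the kernel, the spinning CPU is eventually reported by the soft-lockup watchdog; this model only measures how long the scanner busy-waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model only: this spinlock stands in for the lock taken by
 * lock_rtas(); every modelled "RTAS call" holds it for its full duration. */
static pthread_spinlock_t rtas_lock;
static atomic_bool flash_holds_lock;

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1000L + (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* Models the long-running firmware flash RTAS call (hypothetical name). */
static void *flash_call(void *arg)
{
	long hold_ms = *(const long *)arg;

	pthread_spin_lock(&rtas_lock);
	atomic_store(&flash_holds_lock, true);
	sleep_ms(hold_ms);		/* flash in progress, lock still held */
	pthread_spin_unlock(&rtas_lock);
	return NULL;
}

/* Models rtas_event_scan() arriving while the flash is running.
 * Returns how many milliseconds the scanner busy-waited for the lock. */
static long event_scan_wait_ms(long flash_ms)
{
	pthread_t flasher;
	struct timespec start, end;
	int rc;

	rc = pthread_spin_init(&rtas_lock, PTHREAD_PROCESS_PRIVATE);
	assert(rc == 0);
	atomic_store(&flash_holds_lock, false);

	rc = pthread_create(&flasher, NULL, flash_call, &flash_ms);
	assert(rc == 0);
	while (!atomic_load(&flash_holds_lock))
		;			/* wait until the flash owns the lock */

	clock_gettime(CLOCK_MONOTONIC, &start);
	pthread_spin_lock(&rtas_lock);	/* spins, burning CPU, like spin_lock() */
	clock_gettime(CLOCK_MONOTONIC, &end);
	pthread_spin_unlock(&rtas_lock);

	pthread_join(flasher, NULL);
	pthread_spin_destroy(&rtas_lock);
	return elapsed_ms(&start, &end);
}
```

Calling event_scan_wait_ms(200) should return roughly 200: the scanner spends essentially the whole flash spinning on the lock, and a real flash takes far longer than that.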

Approaches to fix the issue:

Approach 1: Stop all the other CPUs before we start flashing the firmware.

Before the RTAS firmware update starts, all other CPUs should be stopped,
which means no other CPU can be in lock_rtas(). We do not want other CPUs
to execute while the FW update is in progress, and the system will be
rebooted anyway after the update.
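
Approach 1 boils down to an ordering rule: every other potential user of the RTAS lock must be quiesced before the flash begins. Below is a minimal userspace sketch of that rule, assuming one thread is enough to stand in for "all other CPUs" and a mutex stands in for lock_rtas(). All names (event_scanner(), quiesce_then_flash()) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model only: a mutex stands in for the lock behind lock_rtas(),
 * and one thread stands in for every other CPU that might issue RTAS calls
 * (here, the periodic event scan). */
static pthread_mutex_t rtas_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_others;
static atomic_int scans_done;
static atomic_int scans_blocked;	/* scans that found the lock already taken */

static void sleep_ms(long ms)
{
	struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
	nanosleep(&ts, NULL);
}

/* Periodic event scan, re-armed every few milliseconds like delayed work. */
static void *event_scanner(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_others)) {
		if (pthread_mutex_trylock(&rtas_lock) != 0) {
			atomic_fetch_add(&scans_blocked, 1);	/* would spin in lock_rtas() */
			pthread_mutex_lock(&rtas_lock);
		}
		atomic_fetch_add(&scans_done, 1);		/* "scan" for events */
		pthread_mutex_unlock(&rtas_lock);
		sleep_ms(5);
	}
	return NULL;
}

/* Approach 1, modelled: quiesce every other lock user, then flash.
 * Returns how many scans ever found the lock busy. */
static int quiesce_then_flash(long flash_ms)
{
	pthread_t scanner;
	int rc = pthread_create(&scanner, NULL, event_scanner, NULL);

	assert(rc == 0);
	while (atomic_load(&scans_done) < 3)
		sleep_ms(1);		/* let the scan run normally for a while */

	/* Stop the other "CPU" and wait until it has really left its loop,
	 * so nobody can be inside (or waiting for) the lock during the flash. */
	atomic_store(&stop_others, true);
	pthread_join(scanner, NULL);

	pthread_mutex_lock(&rtas_lock);
	sleep_ms(flash_ms);		/* long-running firmware flash */
	pthread_mutex_unlock(&rtas_lock);

	/* The scan is not restarted: the system reboots after the update. */
	return atomic_load(&scans_blocked);
}
```

quiesce_then_flash(50) returns 0 because the scanner has exited before the flash takes the lock. Joining the scanner only after the flash would instead let a scan that fires mid-flash block on the lock, which is the situation the patch is trying to avoid.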

On 08/30/11 11:33, Benjamin Herrenschmidt wrote:
> On Wed, 2011-07-27 at 17:39 +0530, Ravi K. Nittala wrote:
>> [...]
>> Approach 1: Stop all the other CPUs before we start flashing the firmware.
>>
>> Before the rtas firmware update starts, all other CPUs should be stopped.
>> Which means no other CPU should be in lock_rtas(). We do not want other CPUs
>> execute while FW update is in progress and the system will be rebooted anyway
>> after the update.
>
> Shouldn't we resume the event scan after the flash ?

The flash operation is performed at the very end of the reboot path.
So even if we restart the event scan, the thread may not get a chance to
process the events. Hence we thought we would leave it stopped.

Again, we do not have much expertise in deciding which is the best option here.
We could resume the event scan if you think that is needed.

Thanks for the review.
Suzuki

On Tue, 2011-08-30 at 11:47 +0530, Suzuki Poulose wrote:
> The flash operation is performed in the reboot path at the very end.
> So, even if we restart the event scan, the thread may not be able to
> process the events. Hence we thought we would leave it stopped.
> [...]
> We could resume the event scan, if you think that is needed.

No, that's OK, I'll merge the patch as-is then.

Cheers,
Ben.

On Tue, 2011-08-30 at 16:19 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2011-08-30 at 11:47 +0530, Suzuki Poulose wrote:
> > [...]
> > We could resume the event scan, if you think that is needed.
>
> No that's ok, I'll merge the patch as-is then.

Actually, please double-check that you get the dependencies right. The event
scan code is only compiled if CONFIG_PPC_RTAS_DAEMON is set, but the
RTAS flash code depends on a different config option that can be set
independently.

So at the very least you need an #ifdef to guard the cross-call.

Cheers,
Ben.
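
Ben's point maps onto the usual kernel pattern of pairing a real function with an empty static inline stub, selected by the config symbol, so the caller needs no #ifdef of its own. Below is a single-file sketch, assuming the cross-call is a helper named rtas_cancel_event_scan() (hypothetical here) and using an event_scan_running flag purely so the model has something to observe. In a kernel tree the #ifdef and the stub would normally sit in a shared header.

```c
#include <assert.h>

/* Define CONFIG_PPC_RTAS_DAEMON (e.g. with -DCONFIG_PPC_RTAS_DAEMON) to model
 * a kernel built with rtasd. The helper and flag names are hypothetical. */

#ifdef CONFIG_PPC_RTAS_DAEMON
static int event_scan_running = 1;	/* rtasd armed its periodic scan */

/* Only built when the event-scan code is built. */
static void rtas_cancel_event_scan(void)
{
	event_scan_running = 0;
}
#else
static int event_scan_running = 0;	/* no rtasd: nothing to cancel */

/* Empty stub, so the flash code compiles and links without rtasd. */
static inline void rtas_cancel_event_scan(void) { }
#endif

/* The flash-side caller is identical in both configurations. */
static void flash_firmware(void)
{
	rtas_cancel_event_scan();
	/* Either way, no event scan competes for the RTAS lock from here on. */
	assert(!event_scan_running);
	/* The long-running flash RTAS call would be issued here. */
}
```

Built with or without -DCONFIG_PPC_RTAS_DAEMON, flash_firmware() compiles unchanged; only the body of the helper differs.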

On 08/30/11 11:51, Benjamin Herrenschmidt wrote:
> On Tue, 2011-08-30 at 16:19 +1000, Benjamin Herrenschmidt wrote:
>> [...]
>> No that's ok, I'll merge the patch as-is then.
>
> Actually, please dbl check you get the dependencies right. The event
> scan stuff is only compiled if CONFIG_PPC_RTAS_DAEMON is set, but the
> rtas flash code depends on a different config option that can be set
> independently.
>
> So at the very least you need an ifdef to guard the cross-call

Thanks for catching this! Will address it in the next version.
Thanks
Suzuki