Server failed during deployment, cannot reimage.

Question

I had started deployment on 5 nodes when the headnode failed and restarted. When it came back up, the five nodes were still in the Provisioning state, but nothing is happening. In the Provisioning log the last entry is "Reverted", and Properties shows "no ongoing operations can be cancelled"; State: Provisioning. In Operations the last entry is "saving node information to file - state, commited". It has not changed for a long time now.

The nodes themselves were in their waiting loop for authorization from the headnode. I have now switched them off and will boot them up again once I can clear them out of the Provisioning state, so that I can start all over again.

Answers

to force the state to offline. This may also not work if the cluster considers that jobs are still running on the node. In that case, a more heavy-handed solution may be to remove the nodes from the cluster using

Remove-HpcNode -Name "nodename"

This will kick the node out entirely, but should at least allow you to redeploy from scratch.
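
Since five nodes are stuck, the removal can be repeated in a loop. The following is only a sketch: the node names are hypothetical placeholders, and it assumes the commands are run on the headnode with the Microsoft.HPC snap-in available (the HPC PowerShell console loads it already). Remove-HpcNode may ask for confirmation for each node.

```powershell
# Load the HPC cmdlets; harmless if the HPC PowerShell console already loaded them.
Add-PSSnapin Microsoft.HPC -ErrorAction SilentlyContinue

# Hypothetical node names: replace with the names of the five stuck nodes.
$stuckNodes = "NODE01", "NODE02", "NODE03", "NODE04", "NODE05"

foreach ($name in $stuckNodes) {
    # Show what the headnode currently believes about this node.
    Get-HpcNode -Name $name | Format-Table -AutoSize

    # Remove the node from the cluster so it can be redeployed from scratch.
    Remove-HpcNode -Name $name
}
```

Whether the removal succeeds while the headnode still reports the nodes as provisioning is not guaranteed; this only saves typing the command five times.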

Glad to hear from you. I have tried the above with no effect. It seems the headnode is really confused, as it shows the nodes as online and provisioning while they are actually powered down. Also, when I go to the Operations or Provisioning log, I have no option to cancel any operations.

I have installed HPC 2008 SP1, hoping that the new service pack would resolve this, but it had no effect.
