For another bugfix, minor configuration changes, and the addition of a few extensions, we need to restart the JupyterHub (https://max-jhub.desy.de/) today at 19:00. The restart will take only a few seconds, but it will most likely disconnect running kernels. In that case, you'd need to use the control panel to "Start My Server" and relaunch your notebook. The session (i.e. the SLURM job) will persist. Apologies for the inconvenience.
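
If you want to verify after the restart that your session is indeed still there, here is a minimal sketch, assuming the standard SLURM squeue command is available on a Maxwell login node and that the JupyterHub session shows up as an ordinary job under your user (job names and the exact output format may differ):

    import getpass
    import subprocess

    # List your currently running SLURM jobs; the JupyterHub session job
    # should still appear here after the hub restart.
    user = getpass.getuser()
    result = subprocess.run(
        ["squeue", "-u", user, "--format=%i %j %T %M"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)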

On 19th Sep, from roughly 5:00 to 9:00, the home filesystem was unavailable on several nodes in the Maxwell cluster. The problem is solved. For further questions, send an email to maxwell.service@desy.de

The problems regarding the all and allgpu partitions are solved. If you still have issues regarding the scheduling of your batch jobs, please send us a mail at maxwell.service@desy.de

Original Message: After the SLURM update last week we see some problems regarding
the "all" and "allgpu" partitions.
Jobs from "privileged" partitions (exfl, cssb, upex, ...) are
preempting (killing) jobs which were submitted to the all* partitions,
even if the privileged jobs can't use the preempted nodes afterwards
due to constraints in the job definition
(see https://confluence.desy.de/display/IS/Running+Jobs+on+Maxwell).
The privileged job will "kill" a job in the all* partition every 3 minutes until a
matching node is found and the "privileged" job starts.
As this bug is only triggered by pending jobs in the
privileged partitions with extra constraints, not all jobs in the all* queues
will fail; for example, in the last 10 hours no job was preempted in the all* queues.
We filed a bug report with SchedMD (the company we have a SLURM support contract with)
and are looking forward to a solution.
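
If you want to check whether any of your own jobs in the all* partitions were hit by this, here is a minimal sketch, assuming the standard SLURM accounting command sacct is available on a Maxwell login node (the 10-hour window and the field handling are only illustrative):

    import getpass
    import subprocess

    # Query SLURM accounting for your jobs of the last 10 hours and report
    # any all*-partition jobs that ended up in a PREEMPTED state.
    user = getpass.getuser()
    out = subprocess.run(
        ["sacct", "-u", user, "--starttime", "now-10hours", "--noheader",
         "--parsable2", "--format", "JobID,Partition,State"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        if not line:
            continue
        jobid, partition, state = line.split("|")[:3]
        if partition.startswith("all") and state.startswith("PREEMPTED"):
            print(f"job {jobid} in partition {partition} was preempted ({state})")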