The "why" is explained in the mails: sshd child processes are not stopped (only the daemon is), which is a feature of sshd. Then the network is shut down, which is like cutting the wire.

The Arch Linux installation media does not set up or start the network (you did it manually). When the machine reboots, the killall5 commands in rc.shutdown kill _all_ processes (but the network is still up). This is why your connection is closed by the remote host ;)

Yes, I know. But to solve this, there is nothing to do in the sshd rc script. The links above contain two solutions that I proposed. From my point of view, S2 is the better one:
S2: "Do not stop the network in the loop, just omit it, and stop it after the killall5 commands. This also ensures that all daemons and their children are
stopped before the network is shut down."
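
A dry-run sketch of the ordering S2 describes may help. Nothing is stopped or killed here, the steps are only printed. The daemon list, the function name shutdown_order and the exact killall5 signals are assumptions for illustration, not the actual rc.shutdown code.

```bash
#!/bin/bash
# Dry run of ordering "S2": print the steps instead of executing them.
# DAEMONS and shutdown_order are illustrative names (assumptions).

DAEMONS="syslog-ng network sshd crond"

shutdown_order() {
    # Build the reverse of the start order.
    reversed=""
    for d in $DAEMONS; do
        reversed="$d $reversed"
    done
    # Stop every daemon except "network" (the name is hard-coded here).
    for d in $reversed; do
        if [ "$d" = network ]; then
            continue
        fi
        echo "stop $d"
    done
    # killall5 ends leftover processes, including sshd session children.
    echo "killall5 -15"
    echo "killall5 -9"
    # Only now is the network taken down.
    echo "stop network"
}

shutdown_order
```

With this ordering the sshd session processes are gone before the link drops, so the client sees a normal disconnect instead of a hang.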

Definitely not. This is the ugliest solution I could think of and it doesn't work if your network daemon isn't called 'network', but ... let's say, 'net-profiles' or whatever.

Delaying or even entirely skipping network shutdown is something that might be desirable for a number of reasons, but must be implemented in a place where it belongs, like the network, net-profiles, net-auto and so on scripts.

Two questions we need to ask ourselves:
1) Why would anyone even want to shut down the network on shutdown?
2) Should our init scripts know the difference between boot and start, or between stop and shutdown?

OK. Anyway, in the initscripts package the network daemon is called "network". But I agree. What about adding something like a "network hook" and making it independent from the "rc.d" scripts? Then "network hook start" is called before "rc.d start" and "network hook stop" is called after "rc.d stop".

1) I guess that is not necessary in all cases.
2) Can be useful in some scenarios.

While Florian's solution would work most of the time, I believe it is
not robust enough and carries some external dependencies which could
make it fail in some cases. Let me explain.

The hanging ssh sessions problem occurs when, for whatever reason, the
network goes down during shutdown before all sshd sessions are
terminated. So any solution to it should guarantee that sshd sessions
are closed before the network goes down. Also, it should prevent creating
new sessions, so the master sshd should be stopped first. A solution
should be robust enough, that is, it should not depend on the order in
which daemons (including "network") are stopped.

Thomas Bächler asked two important questions earlier in this thread:

1) Why would anyone even want to shut down the network on shutdown?

As it is now, the network is stopped during shutdown. There is an option
(NETWORK_PERSIST) to prevent this for good reasons. Obviously, we cannot
rely on this option, since it is just an option.

2) Should our init scripts know the difference between boot and start,
or between stop and shutdown?

In my opinion, they should not. There is a well defined mechanism in
"initscripts" to attach additional actions to run level changes: hook
functions (see below).

I think it is fragile to depend on the order of daemons listed in
/run/daemons/. If (today) one uses "network", yes sshd will go down
before network with Florian's solution. But what about different
networking setups or future changes in this area?

In the spirit of my analysis above, I suggest a different solution: a
hook script installed in /etc/rc.d/functions.d/ registered to the
"shutdown_start" phase. I have attached such a script I have been
successfully using for a while. The script should be a new component of
the sshd package, that is why my attachment is not a diff.
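
For readers without the attachment, the idea can be sketched as a dry run. This is not the attached script: add_hook and run_hook are stubbed stand-ins whose names and behaviour are my assumption about the initscripts hook interface, simulate_shutdown is hypothetical, and the hook only prints what it would do.

```bash
#!/bin/bash
# Dry-run sketch of a shutdown_start hook. Nothing is stopped or killed.

# Stand-ins for the hook helpers (assumed shape, only one phase is modelled).
hooks_shutdown_start=""
add_hook() {
    # $1 = phase name, $2 = function to run in that phase
    if [ "$1" = shutdown_start ]; then
        hooks_shutdown_start="$hooks_shutdown_start $2"
    fi
}
run_hook() {
    if [ "$1" = shutdown_start ]; then
        for f in $hooks_shutdown_start; do
            "$f"
        done
    fi
}

# The hook: stop the master first so no new sessions appear,
# then end the sessions that are still open.
sshd_close_sessions() {
    echo "stop master sshd (no new sessions)"
    echo "terminate remaining sshd session processes"
}
add_hook shutdown_start sshd_close_sessions

# Shutdown reduced to its ordering (hypothetical helper).
simulate_shutdown() {
    run_hook shutdown_start
    echo "stop daemons (in any order)"
    echo "network down"
}

simulate_shutdown
```

Because the hook runs at the very start of shutdown, the result does not depend on where "network" sits in the daemon list.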

The real question is why NETWORK_PERSIST has no effect (killall kills something before sshd?). Moreover, it is still specific to /etc/rc.d/network. Then again, everything started up by initscripts should go down at reboot/poweroff via the same initscripts.

In my understanding, the only clean solution can be achieved using cgroups: if a server is started after the network, it and all its descendants will go down before the network.

Tom (initscripts dev) said the best solution would be to have initscripts kill all user processes at shutdown before starting to touch system daemons, but unfortunately that's not possible with the current initscripts framework.

I agree with his analysis, and in the meantime Dobcsanyi's solution will do.

Leonid: I do not wish to kill all sshd processes in the stop case of /etc/rc.d/sshd as many users (including myself) make use of sshd's behavior to leave current sessions open even after you've killed the main daemon.

brain0: not sure what you are referring to. It is true that we could add "shutdown" as an additional action in the rc script and let rc.shutdown call both "stop" and "shutdown" for each script. That will have the same effect as what Gaetan proposed (but will have the added benefit that other rc scripts could do the same).

It does not solve the problem of killing user processes before daemons in general, but I don't think that is something we can easily do anyway.

Does it make sense to distinguish stop and shutdown for other daemons? I much prefer adding a shutdown hook to fix this specific SSH issue until initscripts kills user processes before system daemons, rather than having you add a shutdown case that only SSH will use...

Tom, I refer to the problem of this bug report. If we kill all sshd processes on shutdown, but not on regular sshd stop/restart, then we win.

I propose the following: Use some bash magic to provide a shutdown function to each rc.d script that defaults to just calling the stop function. Then, any rc.d script can override it. From rc.shutdown, we then call shutdown instead of stop.
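
A minimal sketch of that idea, assuming rc.d scripts were written as sets of functions (they are modelled here as load_* functions; those names and run_action are hypothetical): the default shutdown is defined first, and a script overrides it simply by defining its own.

```bash
#!/bin/bash
# Sketch: a default shutdown action that falls back to stop.
# Daemons are modelled as functions that define their actions.

# A daemon with no special shutdown behaviour: only stop is defined.
load_crond() {
    stop() { echo "crond: stop master"; }
}

# A daemon that overrides shutdown to also close its sessions.
load_sshd() {
    stop() { echo "sshd: stop master, keep sessions"; }
    shutdown() { stop; echo "sshd: close remaining sessions"; }
}

# Run one action of one daemon in a subshell, so that definitions
# do not leak from one daemon to the next.
run_action() {
    (
        shutdown() { stop; }   # default: shutdown is just stop
        "load_$1"              # the script may redefine shutdown here
        "$2"
    )
}

run_action crond shutdown
run_action sshd stop
run_action sshd shutdown
```

Here "stop" and "restart" keep the current behaviour of leaving sessions open, and only the shutdown path closes them.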

1. The "bug" was filed against ssh, so why does net management suddenly need fixing? E.g. not shutting down the network, etc. As Thomas already said, none of this is generic; it is limited to /etc/rc.d/network. What about wireless servers with netcfg?

2. As long as there is no dep logic in the initscripts, and network (or netcfg) is started _before_ sshd, why should network (or netcfg) care at all about sshd with its users and forks?

3. Why is this problem thought to be reboot/shutdown related? It's a generic issue. From the point of view of sshd, /etc/rc.d/sshd stop ==== shutdown. If you want to kill the master daemon, why not kill it explicitly? If sessions are not cleaned up by stopping sshd, it's a real bug IMHO.

4. Is it architecturally sane to manage daemons through hooks in initscripts? Sshd has its own boot script. I agree with Tom, but really, why can't one just mount an empty cgroup hierarchy from rc.sysinit alongside /run, which can then be populated/used by individual boot scripts as necessary (for instance, sshd/httpd, but not alsa/iptables/ntp)?

> 3. Why is this problem thought to be reboot/shutdown related? It's a generic issue. From the point of view of sshd, /etc/rc.d/sshd stop ==== shutdown. If you want to kill the master daemon, why not kill it explicitly? If sessions are not cleaned up by stopping sshd, it's a real bug IMHO.

No, no, no, no, no, no! We did this once, and people almost got killed (I was one of the potential killers).

Let's say you upgrade your system, and you want to restart sshd, so it utilizes a bugfix in openssl (for example). So you run "rc.d restart openssh" or "rc.d stop openssh && rc.d start openssh". What happens is this: Your ssh session gets killed (along with everyone else's) and you don't see any output from the sshd start. What else happened to people? sshd failed to start (maybe they changed their config file and screwed up, maybe something else broke) and they got LOCKED OUT from their machine (their headless server that is several hundred kilometers away). No way to get back in. Doing this is pretty common, and a sysadmin expects that his sshd sessions will remain open during a restart of the master daemon. This functionality has priority over any inconvenience, like the problem in this bug.

Alright, I'm pushing a new openssh package with Dobcsanyi's fix to [testing]. The fix will be removed as soon as initscripts offers a better solution, but that's heavier work that isn't IMHO warranted just by this specific issue.

The sshd_close_sessions function in /etc/rc.d/functions.d/sshd-close-sessions tries to stop the sshd daemon without first checking whether it is running. This creates a failure message if you don't run the daemon. There is a function in initscripts to check whether a given daemon is running.
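
The guard I have in mind looks roughly like this. It is a sketch only: daemon_is_running is a hypothetical stand-in for the initscripts check, RUNDIR is a temporary directory standing in for /run/daemons, and the function prints instead of stopping anything.

```bash
#!/bin/bash
# Sketch: skip the stop step when the daemon is not running.

RUNDIR=$(mktemp -d)   # stands in for /run/daemons in this sketch

# Hypothetical check: a daemon counts as running if its marker file exists.
daemon_is_running() {
    [ -f "$RUNDIR/$1" ]
}

sshd_close_sessions() {
    if ! daemon_is_running sshd; then
        echo "sshd not running, nothing to do"
        return 0
    fi
    echo "stopping sshd"
    rm -f "$RUNDIR/sshd"
}

sshd_close_sessions      # no marker file yet: nothing to do
touch "$RUNDIR/sshd"     # pretend sshd was started
sshd_close_sessions      # marker present: the stop step runs
rm -rf "$RUNDIR"
```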

This fails again when using pure systemd. Looks like there is no sshd.close-sessions counterpart for it.

Also, openssh keeps the paths where it was logged into busy, making them fail to unmount during reboot (for example, "umount /boot: target is busy" if one reboots when pwd==/boot/* and /boot is on a separate partition).