I always seem to be able to ssh in to my account, chongo@shell.sonic.net, and see my files, but we sometimes cannot see his files via his account, paulnoll@shell.sonic.net. We say "sometimes see the error message" because at other times the ssh connection to paulnoll@shell.sonic.net succeeds and we DO see his files. This "Transport endpoint is not connected" problem comes and goes.

Any ideas why we sometimes see "Transport endpoint is not connected"?

That's the error that gets thrown when the backend ssh process behind sshfs loses its connection. I've seen it after killing the underlying ssh process, but I'm not sure why it would happen on connect.

If it happens again, please note the time. I'm looking through the logs to see what could be causing this.
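
For anyone who wants to catch the time automatically, here is a rough Python sketch. It assumes the message corresponds to the ENOTCONN errno, which is what Linux reports as "Transport endpoint is not connected". The function names and the idea of probing a path with stat() are illustrative assumptions, not anything the server itself runs.

```python
import errno
import os
from datetime import datetime


def mount_is_disconnected(path):
    """Return True if stat() on path fails with ENOTCONN."""
    try:
        os.stat(path)
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        return exc.errno == errno.ENOTCONN
    return False


def check_and_log(path):
    """Return a timestamped log line if path looks disconnected, else None."""
    if mount_is_disconnected(path):
        stamp = datetime.now().isoformat(timespec="seconds")
        return f"{stamp} {path}: Transport endpoint is not connected"
    return None
```

Calling check_and_log() on the home directory from a loop or a cron job, and appending any non-None result to a file, would give a list of times to compare against the server logs.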

The new shell server is down this morning. Last night I received a wall message regarding a reboot "to clear old bind mounts". It's been unavailable since. An ssh session does connect, and accepts password, but then hangs.

gtwrek wrote: The new shell server is down this morning. Last night I received a wall message regarding a reboot "to clear old bind mounts". It's been unavailable since. An ssh session does connect, and accepts password, but then hangs.

oldshell.sonic.net seems fine.

Regards,

Mark

Yup, same issue here. The reliability of the new server has been quite poor.

Indeed it has. We appreciate your patience while we try to figure out this elusive and maddening problem.

It fell over again this morning at 2am. (It is now 3:52am; I have reset it.) Adding memory didn't help. As much as it pains me, we're going to have to stagger the 2am and 3am cron jobs until we can figure out why firing off a dozen cron jobs at once is causing a problem. (It _should_ be fine with that.)
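
To make the staggering idea concrete, here is a minimal Python sketch that spreads a list of jobs evenly across a window and prints crontab-style lines. The job names, the 2am start, and the two-hour window are assumptions for illustration only; nothing here reflects the actual crontabs on the shell server.

```python
def stagger(commands, start_hour=2, window_minutes=120):
    """Spread commands evenly over window_minutes, starting at start_hour:00."""
    # At least one minute between jobs, even if there are many of them.
    step = max(1, window_minutes // max(len(commands), 1))
    lines = []
    for i, cmd in enumerate(commands):
        offset = i * step
        hour = (start_hour + offset // 60) % 24
        minute = offset % 60
        # Standard five-field crontab schedule: minute hour dom month dow.
        lines.append(f"{minute} {hour} * * * {cmd}")
    return lines


if __name__ == "__main__":
    # Hypothetical job names standing in for the real dozen.
    for line in stagger([f"job{i}" for i in range(12)]):
        print(line)
```

With twelve jobs and the defaults, the jobs land ten minutes apart between 2:00 and 3:50 instead of all firing at once.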

yronwode wrote: It seems that you're still working on it. Thanks, if so. I keep getting disconnected. It's not urgent, but I would prefer not having to keep re-connecting.

Scott mentioned turning off some keepalives. That may be counterproductive for this problem. So long as the underlying network connectivity is stable, sending keepalives avoids having NAT devices time out the connection.

[As an aside, I'll mention that back in the early days of the Internet (late 1970s and 1980s) there were folks working on packet radio who hated keepalives because their network connectivity was intermittent. A keepalive sent automatically during a connectivity loss would break their TCP connections, even if those connections were otherwise idle at the time. But that was before NATs created the converse problem.]
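
As an illustration of the keepalive mechanism discussed above, here is a small Python sketch that turns on TCP keepalives for a socket. The OpenSSH client has its own ServerAliveInterval setting for the same purpose; this sketch only shows the TCP-level idea. The function name and the timing values are assumptions, and the right interval depends on how quickly a given NAT device times out idle connections.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, probes=4):
    """Turn on TCP keepalives so an idle connection still sends traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific, so set each one
    # only if this platform's socket module exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before giving up
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The trade-off is the one described above: on a stable network the probes keep a NAT mapping alive, while on an intermittent link a few unanswered probes are enough to tear down a connection that might otherwise have survived.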