I've been seeing a similar thing connecting from Fedora 7, Fedora Rawhide, or
CentOS 5 machines to a CentOS 4 server. For me, it isn't a matter of minutes
after connecting -- it seems to be when something dumps a few K of text to the
screen at once. Maybe something with terminal control codes, even (although it
is ssh that's hung, not the terminal). John, does that match what you're seeing?
I tried upgrading openssh on the server to the RHEL5 version -- didn't help.

It's frustratingly hard to reproduce on purpose. It'll happen in mutt, in ls, in
less, in joe, in vi. But doing the same thing again won't necessarily recreate it.
I need to do more diagnosing next time it happens -- at this point, I was
looking for similar bugs that might offer clues.
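
Since the hang seems to coincide with a few K of text hitting the screen at once, one way to poke at it on purpose might be a small helper that dumps a burst of output inside the ssh session. This is only a sketch: `burst` is a made-up name, and the 4 KiB default is a guess based on the "few K" above, not a known threshold.

```shell
# Hypothetical helper (name and default size are assumptions, not from this bug):
# print $1 KiB of printable text in one go, wrapped at 80 columns.
burst() {
    kib=${1:-4}
    head -c "$((kib * 1024))" /dev/zero | tr '\0' 'x' | fold -w 80
}

# Example: run inside the suspect ssh session, trying a few sizes in a row.
# for n in 1 4 16 64; do burst "$n"; echo; sleep 1; done
```

If the session freezes at some size and not at smaller ones, that would at least narrow down what to look for in strace or tcpdump.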

I don't have a backtrace with debuginfo, but I do have an strace of the running
process. The client is responsive to keystrokes, with the appropriate-looking
selects, reads, and writes. On the server, though, it's just doing this:
$ sudo strace -p 25726
Process 25726 attached - interrupt to quit
select(10, [3 6 9], [], NULL, NULL
and the non-debuginfo backtrace (um, with RHEL5 sshd rebuilt on CentOS 4 to see
if that would help; it doesn't):
#0 0xb7fe87a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0042a64d in ___newselect_nocancel () from /lib/tls/libc.so.6
#2 0x0011f847 in main () from /usr/sbin/sshd
Hmmm. That's interesting....

Just tried typing <enter>~<ctrl-z> to see what would happen. Got the "^Z
[suspend ssh]" output, but didn't seem to actually get a prompt. But then I
accidentally closed the window so I'm not 100% sure. Helpful, I know!

(In reply to comment #13)
> No luck; just happened with openssh-4.7p1-1.fc8 (client).
Damn, so now you can try to bisect versions of the client from various old
Fedoras to see when it broke. Because I think the problem wasn't there in the
3.9p1 version - at least I don't see any such reports on RHEL-4.
But that won't be fun :(.
Note that the old clients should be more or less usable on the latest Fedoras. The
servers not so much, because of SELinux changes.

Yeah, the problem definitely started for me when I upgraded my home system from
a Fedora Core 4-based distribution, and for Paul Stauffer (who I added to the cc
list) on upgrading to a RHEL5-based one.
I'll see what more I can discover. Good times. :)

Actually, the system I'm seeing this problem on is Fedora 7. I have never seen
this problem on any of several FC6 systems I've used regularly, so I'm guessing
it's something that changed between 6 and 7.

I definitely see it on my BU Linux 5.0 ( = CentOS 5 / RHEL5) system too, with
openssh-4.3p2-16.el5.centos.bu50.13. (BU changes are to the config file only, so
it's unlikely that we've done anything that causes the problem. Plus my home
system is currently unmodified rawhide.)
Fedora Core 6 had openssh-4.3p2-10.src.rpm, but then that got updated to
openssh-4.3p2-19.fc6.src.rpm. So it seems a bit odd that that would work and
RHEL5 not.
Maybe the problem is in some supporting library. Wouldn't that be fun.
Paul, can you roll your home system back to 4.3p2-19.fc6?

I don't have any more CentOS4 / RHEL4 servers to test against. (Some of them I upgraded specifically to avoid this bug.) The problem definitely does not appear when connecting to a CentOS 5 or Fedora 9+ system. But I'm suspicious that the problem is still there.

Oh, weird. So, at my new job, we have a lot of machines running under VMware ESX 3.5. I still don't have a RHEL4 server, but the CentOS 5 (w/ latest patches) VM I am working on now exhibits the exact same problem when connecting with openssh-5.1p1-2.fc10.x86_64.
Could be a red herring, but the symptoms are exactly the same (including similar network behavior when I watch with tcpdump).
Even if the VMware issue I'm seeing now happens to be unrelated, I think this is serious enough that we want to *know* the problem is solved, not just hope that it happens to go away with a new release.
This bug probably needs to get the attention of someone on the RHEL side of things before RHEL 6, because it's extremely likely that after that release you'll see a lot of this in shops with RHEL 4 and the new 6 release.

I am really curious what triggers it, because it seems apparent that the problem is not just some kind of version incompatibility. To me it looks more like certain conditions on the network connection, or on the client or server machines, trigger a race condition between the server and client, and the client ends up deadlocked.

Previously, it mostly triggered when I was paging through e-mail in mutt. This time, it came up when I was running system-config-*-tui commands, and I was able to pretty reliably reproduce it using that. (I can't guarantee that this latest example is really the same bug, though.) However, it wasn't limited to full-screen apps -- it'd sometimes happen when doing an ls. It seems to mostly happen when there's a bunch of data dumped to the screen all at once, although it never appeared to happen when doing scp.

It seems to me that this could be an iptables connection tracking issue. I caught it while logging all dropped packets while I was having problems with NFS.
Although this seems to happen on its own at times, I can force the same result by restarting iptables on the server with some running ssh connections. Iptables then starts to drop seemingly random packets from what should be established connections. Sometimes it only drops a percentage of packets, making the connection seem slow (this was the problem I had with NFS), but often it drops so many that the connection is effectively frozen.
My workaround has been to remove the NEW requirement for ssh connections, and accept all tcp on 22.
-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 22 -j ACCEPT
I haven't ever had ssh freeze after this change.
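
To check whether a given server still has the stock rule, something like the following could list port-22 ACCEPT rules that are still limited to state NEW. It's a sketch: `ssh_rules_requiring_new` is a made-up helper, and it assumes the rules are in an iptables-save style file such as /etc/sysconfig/iptables (the usual place on RHEL/CentOS).

```shell
# Hypothetical helper (not part of iptables): print ACCEPT rules for tcp
# port 22 that still carry a "--state NEW" match. Exits non-zero if none.
ssh_rules_requiring_new() {
    rules_file=${1:-/etc/sysconfig/iptables}
    grep -E -- '--dport 22( |$)' "$rules_file" \
        | grep -- '-j ACCEPT' \
        | grep -- '--state NEW'
}

# Example (reading the real rules file usually needs root):
# ssh_rules_requiring_new /etc/sysconfig/iptables
```

No output means the rule on port 22 already looks like the one above, without the state match.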

I used to see this problem a lot when configuring my firewall using shorewall. The problem was that a router somewhere was generating out-of-window packets, which by default get marked as INVALID and dropped. The fix in this case was:
echo 1 > /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
It's a long shot, but I thought it might be helpful to mention this.
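
That echo only lasts until the next reboot. Below is a sketch of turning a /proc/sys path into the matching sysctl key, for anyone who wants to try the setting with sysctl -w or put it in /etc/sysctl.conf. `proc_to_sysctl_key` is a made-up helper name, and on older kernels the conntrack entry may live under a different path, so check that the file exists first.

```shell
# Hypothetical helper: convert a /proc/sys path to its sysctl key by
# dropping the /proc/sys/ prefix and turning the remaining slashes into dots.
proc_to_sysctl_key() {
    printf '%s\n' "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Example (needs root, so shown commented out):
# key=$(proc_to_sysctl_key /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal)
# sysctl -w "$key=1"
```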

We just experienced a problem similar to the one Matt Miller described (Comment #9 From Matthew Miller 2007-09-06 15:55:16 EDT). The only difference is we saw the problem performing fast/large SCP and HTTPS transfers.
To correct the problem, we changed the "client" (the one doing the acks) setting for SACK (a.k.a. TCP selective acknowledgements, a.k.a. selective acks). SACK is an option in the TCP header and is set when a connection uses it.
On the "client" side, you could run the following command:
echo 0 >/proc/sys/net/ipv4/tcp_sack
On the network side, you would want to look for dropped packets at the firewall (maybe related to an unset SACK flag if you are using SACK), or if SACK is set in the network equipment you could disable it.
I'm not sure what the developers of OpenSSH can do to fix this particular issue. The sessions appear to time out on the "server" (the one NOT doing the acks) side, so exploring that might provide some insight.
Your mileage may vary. Good luck.
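
Since turning SACK off affects every TCP connection on the box, it may be worth recording the old value first so it can be put back afterwards. A minimal sketch; `save_tunable` is a made-up name and the backup location is arbitrary.

```shell
# Hypothetical helper: print the current value of a /proc/sys tunable and
# also save it to a backup file so it can be restored later.
save_tunable() {
    tunable=$1
    backup=$2
    [ -r "$tunable" ] || { echo "cannot read $tunable" >&2; return 1; }
    tee "$backup" < "$tunable"
}

# Example (the writes to /proc need root, so shown commented out):
# save_tunable /proc/sys/net/ipv4/tcp_sack /tmp/tcp_sack.orig
# echo 0 >/proc/sys/net/ipv4/tcp_sack
# ... try to reproduce the hang ...
# cat /tmp/tcp_sack.orig >/proc/sys/net/ipv4/tcp_sack
```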

(In reply to comment #34)
Please try turning off TCP window scaling:
echo 0 >/proc/sys/net/ipv4/tcp_window_scaling

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.