After swapping Netscreen 204 for SSG-140: session timeouts

I currently have a pair of SSG-140s in an Active/Passive HA cluster running 5.4.0r8.0 code, and the users are complaining of session timeouts with almost any TCP protocol. I first thought it was limited to SSH, but they have brought me examples of other protocols such as 3306 (MySQL) and 389 (LDAP). We replaced our existing Netscreen 204s with the new SSG-140s; this problem did not occur on the Netscreen 204s running 5.0.0r11.0 code. Setting most SSH clients to send keepalives seems to help.

Has anyone else seen issues with session timeouts (at 30 minutes) on SSG-140s, or any SSG for that matter? I am wondering why it was not an issue with the Netscreen 204s or the Netscreen 25s.

set service <service> timeout <minutes>

Of course you want to be careful with this command, as it will keep sessions for the services you define in the session table for the period of time you specify if they do not close properly.
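As an illustration (the service name and timeout value below are placeholders, and whether a predefined service's timeout can be changed may vary by release), extending the SSH service timeout to eight hours would look roughly like:

```
set service "SSH" timeout 480
```

A long timeout like this keeps idle sessions in the session table, so it trades session-table capacity for convenience.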

So there is no way for the firewall to proxy an expired session using a similar technique (I did not think there was). That might be a cool feature, but probably not easy to implement in the firewall's code. It would be a nice feature to offer as a per-policy advanced option, if anyone is listening.

Yes, this is the SYN proxy functionality you are describing. It is part of the SYN flood screening protection; however, it will not occur for all TCP traffic. The SYN proxy doesn't kick in until the SYN flood threshold is reached. Refer to the Concepts & Examples guides, downloadable from the link in my sig.
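For reference, the SYN flood screen (and therefore the threshold at which SYN proxying begins) is enabled per zone; the numbers below are only illustrative, not recommendations:

```
set zone untrust screen syn-flood
set zone untrust screen syn-flood attack-threshold 625
set zone untrust screen syn-flood timeout 20
```

Until the attack threshold is crossed, SYN packets pass through without being proxied, which is why normal traffic never sees this behavior.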

One last question: is there a way to proxy connections, with the firewall sending a SYN/ACK back to the originating host using a computed cookie value as its Initial Sequence Number? If the originating host responds appropriately with an ACK containing legitimate cookie information, the firewall would create the session and forward the packet.

Thanks Tim, you are absolutely correct; I had come across that difference doing a “get flow” command. I even went as far as to put the command “unset flow tcp-syn-check” back in.

As you said:
I highly recommend against it (mostly because it will turn your firewall into an expensive ACL router!), but you can turn off tcp-syn-checking via “unset flow tcp-syn-check”.

That is the course we are going to take: we will leave “flow tcp-syn-check” enabled!

I think that any application that does not include a keepalive option is inherently broken; there should be no expectation that a session will stay established “forever” without traffic. In that case they should use UDP.

I will go back to my development group and explain the situation to them. I believe the problems were almost exclusively limited to Linux/Unix/Mac applications; I did not hear complaints from the Microsoft side.

I appreciate your explanation, and I am extremely happy you shed light on this.

In 5.4, Juniper enabled “set flow tcp-syn-check” by default, making this a stateful (and decent) firewall. Previously you had to turn that option on to make it stateful.

So with tcp-syn-check on, the firewall will no longer accept packets for sessions that have timed out. This is honestly how it should be: if the application wants to keep using the connection, it should send keepalives (you can configure SSH to do this).
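As an illustration, OpenSSH clients can be told to send application-level keepalives via the client config; the interval and count below are example values, not recommendations:

```
# ~/.ssh/config — send a keepalive probe every 60 seconds
# and give up after 5 unanswered probes
Host *
    ServerAliveInterval 60
    ServerAliveCountMax 5
```

These probes count as traffic through the firewall, so the session's inactivity timer never expires while the connection is alive.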

To confirm this is occurring you can do two things. If you have a copy of a previous (pre-5.4) config, compare it to your current running config and look for that command. Or you can run a “debug flow drop” and you will see the tcp-syn drops. If there is not a valid existing session and the packet is not a SYN packet (or part of an in-progress three-way handshake), the firewall WILL drop it.
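A typical ScreenOS debug session for catching these drops looks roughly like the following; the filter address is a placeholder for one affected client, and limiting the capture with a flow filter keeps the debug buffer readable:

```
set ffilter src-ip 10.1.1.5
debug flow drop
get dbuf stream
undebug all
unset ffilter
clear dbuf
```

Reproduce the failure between `debug flow drop` and `get dbuf stream`, then look for the tcp-syn-check drop reason in the output.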

I highly recommend against it (mostly because it will turn your firewall into an expensive ACL router!), but you can turn off tcp-syn-checking via “unset flow tcp-syn-check”.

This command wasn’t on by default in 5.0 and earlier code trains, and that should explain why you’re now seeing this behavior.

I haven’t heard anything here, but we’ve progressively taken customers from the 5.0.0 release through 5.2 and, recently, the 5.4 release. We occasionally bring on someone new and move them forward several releases because they were on an old release (though typically not something as old as the 5.0 branch on a build as recent as 5.0.0r11).

I always run PuTTY with keepalives for the same reasons as your developers, and without them I expect my sessions to time out. I really don’t think it’s your Netscreen or ScreenOS release, personally. Is it worth setting up a lab test with your 204s, trying to reproduce on 5.0r11 and then again on 5.4r8? It sounds like the SSG is behaving properly and timing out these sessions due to inactivity (when keepalives are turned off). Is it also possible that your developers, being aware of the swap, are just bringing forward some long-standing issues as “new” problems? (Not being skeptical, but I do see a lot of this!)

We had no custom session timeouts before; for the most part I migrated the configuration from the 204. The reason we replaced the 204s with the SSG-140 is that the SSG-140 has two gigabit copper interfaces, and we needed the extra bandwidth. The Netscreen 204 is/was overkill on sessions and VPN, but not on bandwidth (100 Mbps per port on the 204). This firewall is our corporate connection to the Internet; we NAT by policy to a DIP pool of two Class C’s using sticky DIP, and we never had any issues before.
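For context, a policy-based sticky-DIP setup of that kind looks roughly like this sketch; the interface name, DIP id, zones, and addresses are all placeholders:

```
set interface ethernet0/2 dip 5 203.0.113.1 203.0.113.254
set dip sticky
set policy from trust to untrust any any any nat src dip-id 5 permit
```

Sticky DIP keeps a given internal host mapped to the same pool address across sessions, which matters for applications that expect a stable source IP.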

I have not personally seen any issues, but we have a group of developers that started complaining after we swapped the firewall. They were mostly complaining about SSH connections (Linux and Mac users); most use PuTTY with keepalives turned off. This seemed to work fine while we were using the Netscreen 204, so it seems the 204 handled sessions differently than the SSG-140 does. I am just trying to see what I can do to make life better for this group, and to understand why changing hardware made a difference.

Haven’t seen a problem moving from Netscreen to the SSG series. We’ve handled some complex configurations and the transition has been very smooth. Did you migrate the entire 204 config over to the SSG140 (including any custom session timeouts), or did you do a partial migration or a fresh rebuild?

Depending on the answer to the above, I would just treat this as a new problem and look at the usual things: port settings, port counters, and application behavior. BTW, did you choose the SSG140 because the 204s were overkill for your environment? Just wondering... I don’t think it is related.