Strange connection disruption 30minutes + after policy install

I have a strange issue on our firewall in our UAT e-commerce environment.

Since it's UAT, it's not critical, but it can of course cause some grumpy faces among the developers.

By the way, this never happened before R80.10.

I push a policy, and it installs fine. Half an hour later, the servers cannot talk to each other. They live on separate subnets behind different interfaces. Doing another policy push, without changing anything, resolves the issue 100% of the time.

That I can't get my head around. I changed the connection persistence to keep old connections instead of rematching, but to no avail.

fw ctl debugs show nothing abnormal. CPU and memory usage are all fine at the time, and nothing shows in dmesg.

The weird bit for me is that it is always 30+ minutes after the installation. That isn't a specific 30 minutes by the way; it varies between 30 and 45 minutes. However, it's a significant amount of time afterwards that stuff stops.

Interestingly, logs show no traffic at all. After a certain time stuff just stops. There's no drop, no accept, nothing.

tcpdump shows a series of ARPs from the servers behind the firewalls, ARPing for their end destinations.

Is there an issue with the CP not responding to ARP after a policy push? Is the traffic therefore never received on the CP's interface and never processed, so it doesn't show up in the logs?
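One way to check whether ARP requests really are going unanswered is to save the tcpdump text output and count the requests that never see a matching reply. The following is only a rough sketch: it assumes the usual tcpdump ARP line shapes ("ARP, Request who-has X tell Y" and "ARP, Reply X is-at MAC"), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed tcpdump text formats for ARP lines (not taken from this thread):
#   ... ARP, Request who-has 10.2.2.30 tell 10.1.1.20, length 28
#   ... ARP, Reply 10.1.1.1 is-at 00:1c:7f:00:00:01, length 28
REQUEST = re.compile(r"ARP, Request who-has (\S+).*? tell ([^\s,]+)")
REPLY = re.compile(r"ARP, Reply (\S+) is-at ([^\s,]+)")


def unanswered_arp(lines):
    """Return {target_ip: request_count} for targets that never got a reply."""
    requests = Counter()
    answered = set()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY.search(line)
        if match:
            answered.add(match.group(1))
    # Keep only the targets for which no reply appeared anywhere in the capture.
    return {ip: count for ip, count in requests.items() if ip not in answered}
```

Feeding it the lines of a capture taken during the outage would list which addresses are being ARPed for without an answer, which helps show whether the packets ever get past layer 2.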

What would cause the firewall to stop responding and not do anything?

It's the fact that it happens significantly afterwards that throws me. I can't figure out what's happening.

Any suggestions to give me food for thought on Monday morning would be great, thank you!

Re: Strange connection disruption 30minutes + after policy install

Your first order of business is trying to determine if the stoppage is a Gaia issue (ARP, routing, NIC, etc.) or a Check Point issue (SecureXL, INSPECT, NAT, ClusterXL, etc.). In other words, which side of the house is "eating" the traffic, which ironically I just talked about in my speech at CPX360. A few things:

1) When it happens again, try immediately restarting SecureXL with fwaccel off; fwaccel on and see if things suddenly start working.

2) Also save the output of fw ctl arp when things are fine and compare it with the result of the same command when things are not fine.

3) In cpview, baseline the Network part of the Overview screen (throughput, concurrent connections, new connections/sec, etc.) when things are working, then run cpview in historical mode (-t) and have a look at those same numbers (and perhaps other screens of cpview) during a known problem period. That should give you an idea of which side of the house is causing the issue.
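The comparison in step 2 could be done by redirecting fw ctl arp to a text file in each state and diffing the two. Below is a minimal sketch that makes no assumption about the command's output format: it simply treats each non-empty line as one entry. The function name is hypothetical.

```python
def compare_captures(good_text, bad_text):
    """Compare two saved command outputs line by line.

    Returns (missing, added):
      missing - lines present in the working capture but not the broken one
      added   - lines present only in the broken capture
    """
    good = {line.strip() for line in good_text.splitlines() if line.strip()}
    bad = {line.strip() for line in bad_text.splitlines() if line.strip()}
    return sorted(good - bad), sorted(bad - good)
```

Entries that show up in the "missing" list are the ones that disappeared after the policy install, so they are the first place to look.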

Re: Strange connection disruption 30minutes + after policy install

Please confirm the following:

1- Everything was working properly PRIOR to R80.10
2- Are you running the latest JHFA on R80.10? Like Take 56

Since you have an established baseline for when it WILL happen, schedule a TAC case with Check Point so that they can look into this issue WHILE it is happening. Request someone with expertise on this; do NOT settle for some junior engineer, because it will be a waste of time.

Thank god I don't have to deal with R80.10. We're still at R77.30 at the moment. Next cycle, those R77.30 boxes will be replaced by Palo Alto.

Re: Strange connection disruption 30minutes + after policy install

Tim - I was at your speech at CPX, and I was attempting to use the notes I got down about 'what ate it'. However, typically, I can't understand my own notes and I think I am doing it wrong.

Am I right in thinking that zdebug -T drop will show traffic dropped by SecureXL (if it has been)?

It's the last bit that throws me, with regard to Gaia eating the packet and capital I and lower-case i. Could you clarify, if you don't mind?

I will try the steps you mentioned, however. Thanks!

Update - The "30 minutes after" statement is incorrect. I have feedback from other business areas that manage the services behind the firewall, and they saw the issue occur directly after the policy installation.

So - going back to it - doing a policy installation causes the issue, and then another installation will restore service.

I can provide examples next time I purposely cause the problem.

Could this be anything to do with SecureXL tables clearing after a policy installation?

Re: Strange connection disruption 30minutes + after policy install

Could be, as a recalculation of most tables held by SecureXL is performed at that time. I'd try the fwaccel off trick immediately after policy install to help isolate the issue.

Re: Strange connection disruption 30minutes + after policy install

Originally Posted by JPYDX

How about doing it before? Would that cause any problems?

Only question is, why does another policy push solve the issue?

You can do it beforehand, but disabling SecureXL on a firewall with 8 or more cores without a good reason is a bit risky, as it may cause a noticeable performance impact. I think it would be better to push policy, have the issue occur, then quickly run fwaccel off. If BOOM, everything immediately starts flowing, that makes it quite clear where the issue lies.

Every time the firewall policy is installed, there is a recalculation or "sync" between the INSPECT driver and SecureXL, which maintains its own set of state tables. I've seen situations where this procedure can get hung up, and just installing the policy again breaks the loop it is stuck in. Symptoms of this would be error messages like "waiting for policy load" or "too many errors" being shown by fwaccel stat. This is not very common but worth checking out based on the behavior you are describing.
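Checking for those symptoms could be scripted along these lines. This is only a sketch: the two phrases are the ones quoted above, the exact wording printed by fwaccel stat may differ between versions, and the function name is made up for illustration.

```python
# Phrases quoted in the post above; real fwaccel stat wording may vary by version.
WARNING_PHRASES = ("waiting for policy load", "too many errors")


def find_policy_sync_warnings(stat_output):
    """Return the warning phrases found in saved fwaccel stat output."""
    lowered = stat_output.lower()  # match regardless of capitalisation
    return [phrase for phrase in WARNING_PHRASES if phrase in lowered]
```

Running it against output captured right after a policy install, and again once traffic has stopped, would show whether either message appears only in the broken state.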