Re: Firewall priority queues setting

Modifying the Firewall Priority Queue settings is not the proper way to accomplish what you are trying to do; this feature is only intended to ensure that critical firewall control traffic is not inordinately delayed by heavy user traffic flows when a Firewall Worker core reaches 100% utilization. While Priority Queues are enabled by default in R80.10, they do not start actively prioritizing traffic until a Firewall Worker reaches 100% utilization, and even then only on that overloaded Firewall Worker; all other non-overloaded Firewall Workers continue processing traffic first-in, first-out (FIFO).
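If you just want to confirm how Priority Queues are currently set, you can check directly on the gateway; a quick sanity check, assuming the R80.10 syntax documented in the Firewall Priority Queues SK:

fw ctl multik prioq    # shows the current Priority Queues mode (and offers to change it)
fw ctl multik stat     # shows the Firewall Worker instances the queues would apply to

But again, leave the mode alone; changing it won't help with your latency issue.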

You probably just need to tune the firewall to reduce the latency. Please provide output from the following commands and I can advise further:

enabled_blades

fwaccel stat

fwaccel stats -s

grep -c ^processor /proc/cpuinfo

/sbin/cpuinfo

fw ctl affinity -l -r

netstat -ni

fw ctl multik stat

The most likely cause of high latency during busy times is the APCL/URLF policy being forced to inspect high-speed LAN-to-LAN traffic, either due to the inappropriate use of "Any" in the Destination column of the APCL/URLF policy layer, or because the object Internet is not being calculated correctly thanks to incomplete or inaccurate firewall topology definitions.

You have 16 physical cores (32 via SMT) and only 4 SND/IRQ cores, which is the default. As shown above, roughly 80% of the traffic crossing the firewall is being fully accelerated by SecureXL due to the limited number of blades enabled, which is great; however, that means 80% of the traffic can only be processed by the 4 SND/IRQ cores, which is almost certainly causing the bottleneck you are seeing.

I'd suggest disabling SMT/Hyperthreading via cpconfig, which will drop you back to 16 physical cores, then assigning 10 CoreXL kernel instances via cpconfig, which will leave 6 discrete (non-hyperthreaded) cores for SND/IRQ duties. Hyperthreading is actually hurting your performance in this particular situation. Multi-Queue is probably not necessary either and imposes additional overhead, but leave it enabled for now.
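Once the changes are applied and the gateway has rebooted, verification is straightforward; a rough sketch of what I'd expect to see (exact output formats vary a bit by version):

grep -c ^processor /proc/cpuinfo    # should now report 16 with SMT disabled
fw ctl affinity -l -r               # the 6 SND/IRQ cores should be handling the interfaces
fw ctl multik stat                  # should list 10 Firewall Worker instances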

Re: Firewall priority queues setting

No need to adjust the ring buffer size; that is a last resort, and you have zero RX-DRPs anyway. Indiscriminately cranking up the ring buffer size can cause a nasty performance-draining effect known as Bufferbloat between interfaces with widely disparate bandwidth available, at least with the typical FIFO processing of interface ring buffers.
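For anyone who wants to double-check their own gateway, both figures are easy to inspect from expert mode; a quick look, with eth0 standing in as a placeholder for your actual interface name:

netstat -ni       # the RX-DRP column should stay at or very near zero
ethtool -g eth0   # shows current vs. maximum RX/TX ring buffer sizes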

Some Active Queue Management (AQM) strategies such as Controlled Delay (CoDel) can be very effective in mitigating the effects of Bufferbloat on Linux systems with a 3.x or later kernel.
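On a generic Linux box (not something I'd go hand-configuring on a Gaia gateway) attaching CoDel or its fair-queuing variant fq_codel is a one-liner with the standard tc tool; a minimal sketch, with eth0 as a placeholder interface:

tc qdisc add dev eth0 root fq_codel   # replace the default FIFO-style qdisc with fq_codel
tc -s qdisc show dev eth0             # verify the qdisc and watch its drop/mark statistics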

However, for the first time publicly, I present to all of you this very exciting (at least to me) screenshot from R80.20EA in my lab:

CoDel appears to be present in the updated R80.20EA security gateway kernel, and it is enabled by default as shown by the tc -s qdisc show command! Looks like my days of issuing dire warnings about increasing firewall ring buffer sizes are numbered...

Re: Firewall priority queues setting

There is a common mantra that packet loss should be avoided at all costs, yet TCP's congestion control algorithm relies on timely dropping of packets when the network is overloaded, so that all the different TCP-based connections trying to utilize the congested link can settle at a stable transfer speed that is as fast as the network will allow. Oversized ring buffers delay that drop signal: packets sit queued instead of being discarded promptly, so senders keep transmitting too fast and then overcorrect. The resulting jitter can reach the point of a kind of "network choppiness" that causes all the TCP streams traversing the firewall to constantly hunt for a stable transfer speed, and they all end up backing off far more than they should in these types of situations:

1) Step down from a higher-speed to a lower-speed interface (e.g. 10Gbps to 1Gbps) where the lower-speed link is fully utilized

2) Traffic from multiple interfaces all trying to pile into a single fully-utilized interface

3) Traffic from two equal-speed interfaces (e.g. 1Gbps) running with high utilization, yet upstream of one of the interfaces there is substantially less bandwidth, such as a 100Mbps Internet connection

Note that two equal-speed interfaces cannot run into these issues, assuming the bandwidth actually available upstream of both interfaces truly matches their link speed. In my book I tell the sordid tale of "Screamer" and "Slowpoke", two TCP-based streams competing for a fully-utilized firewall interface that has had its ring buffers increased to the maximum size, causing jitter to increase by 16X. Increasing ring buffer sizes is a last resort; generally more SND/IRQ cores should be allocated first, and then perhaps Multi-Queue enabled. I'd suggest reading the Wikipedia article about Bufferbloat.
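You can see this hunting behavior for yourself by watching latency under load; a rough illustration with common tools, where <far_side_host> is a placeholder for a host on the other side of the congested link and iperf3 is assumed to be available on both ends:

ping <far_side_host>                # note the baseline latency and jitter first
iperf3 -c <far_side_host> -t 60 &   # then saturate the link for 60 seconds...
ping <far_side_host>                # ...and watch latency climb as the buffers fill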