>Number: 44418
>Category: kern
>Synopsis: FAST_IPSEC and if_wm kernel panic - may affect the whole network stack
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jan 19 18:55:00 +0000 2011
>Originator: Dr. W. Stukenbrock
>Release: NetBSD 5.1 (HEAD also affected)
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD s0g7 5.1 NetBSD 5.1 (NSW-locationGW) #26: Wed Jan 19 17:05:17
CET 2011 ncadmin@s0g7:/usr/src/sys/arch/amd64/compile/NSW-locationGW amd64
Architecture: x86_64
Machine: amd64
>Description:
The system is a Supermicro X8SIL (3400 chipset) with a Xeon L3406.
Because of problems with the onboard NICs, a dual-port NIC
(82571EB based) is installed as well.
The test uses only the two 82571EB ports; no cable is connected to the
onboard NICs.
I've configured a tunnel with IPComp and ESP; the problem occurs on
outgoing packets.
spdadd -n 172.25.0.0/16 172.16.0.0/16 any -P out ipsec \
ipcomp/tunnel/62.153.101.247-62.153.101.241/use \
esp/tunnel/62.153.101.247-62.153.101.241/require;
Only the outgoing rule is shown. There is a corresponding setup on a
NetBSD 4.0 system at 62.153.101.241.
This system is 172.25.0.1 and 62.153.101.247.
Traffic (an FTP put of a large file) is sent from 172.25.0.2 to
172.16.3.1 through the tunnel.
In general it works fine, but eventually the kernel panics.
Problem description based on the NetBSD source code:
The function key_checkrequest() in /usr/src/sys/netipsec/key.c removes
the sav entry from the isr structure and only afterwards allocates a
new one, so isr->sav is NULL for a short time.
This is done regardless of how many packets are currently using this
isr (or the SP to which the isr is attached)!
In the IPComp code, after compression, ipcomp_output_cb() in
/usr/src/sys/netipsec/xform_ipcomp.c checks the sav in the isr against
a newly allocated sav, and if they do not match, an assertion fails.
A few statements later, ipsec_process_done() in
/usr/src/sys/netipsec/ipsec_output.c also asserts on isr->sav and then
dereferences the pointer isr->sav.
I got lots of kernel panics at the point where isr->sav is
dereferenced, because it is a NULL pointer!
I tried to figure out the reason, because it was not clear from the
source.
I enabled DIAGNOSTIC and DEBUG, which turns on IPSEC_ASSERT.
Now the assertion fails, sometimes in ipcomp_output_cb() and sometimes
in ipsec_process_done(), every time with a NULL pointer...
But in DDB I always find a valid pointer in the isr structure?!
I added some printf()s of the actual values before the assertion for
the NULL case, and found that the pointer reads as NULL while the next
read returns the valid pointer again.
Strange: is another CPU modifying the data, or is it a cache problem
with the L3406?
I searched the whole source tree for places where isr->sav is modified
and found only one, in key_checkrequest().
I changed the way isr->sav is updated: allocate the new sav first, then
do a kind of "atomic" update by assigning the new pointer, so that
isr->sav is never NULL.
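A minimal userspace sketch of that update order, assuming C11
<stdatomic.h>. The struct names are hypothetical stand-ins that only
mirror isr/sav; they are not the NetBSD definitions, and this is not
the actual patch. The writer obtains the replacement first and
publishes it with one atomic exchange; the reader loads the pointer
once and then works only with its local copy instead of re-reading
isr->sav.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct sav { int spi; };
struct isr { _Atomic(struct sav *) sav; };

/* Writer: the replacement is obtained *before* isr->sav is touched and
 * is then published with a single atomic exchange, so a concurrent
 * reader sees either the old pointer or the new one, never NULL.
 * Returns the old pointer. */
static struct sav *replace_sav(struct isr *isr, struct sav *replacement)
{
    assert(replacement != NULL);
    return atomic_exchange_explicit(&isr->sav, replacement,
                                    memory_order_acq_rel);
}

/* Reader: load the pointer exactly once and use only the local copy,
 * rather than checking isr->sav in one place and reading it again
 * later. */
static int current_spi(struct isr *isr)
{
    struct sav *sav = atomic_load_explicit(&isr->sav,
                                           memory_order_acquire);
    assert(sav != NULL);
    return sav->spi;
}
```

Note that this only closes the NULL window. It does not keep the old
SA alive while another CPU may still be using it; that still needs
reference counting or a lock.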
The NULL problems are gone!
Hmm... so it is a multiprocessing-related problem!
I checked the SPL state (by source code analysis) and it is
splsoftnet the whole time.
I checked the splsoftnet() implementation on amd64 and found that it
maps to the assembler stub "splraise" in
/usr/src/sys/arch/amd64/amd64/spl.S, which simply changes the value in
"CPUVAR(ILEVEL)". As far as I can see, it does nothing with respect to
other CPUs.
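To illustrate the point, here is a deliberately simplified model, not
the spl.S code: each CPU has its own priority variable, analogous to
CPUVAR(ILEVEL), and raising it on one CPU writes nothing that another
CPU checks. The level values and function names are made up for the
illustration.

```c
#include <assert.h>

#define NCPU 4

/* Illustrative values only, not NetBSD's IPL numbers. */
enum { MODEL_IPL_NONE = 0, MODEL_IPL_SOFTNET = 4 };

/* One priority level per CPU, like CPUVAR(ILEVEL). */
static int ilevel[NCPU];

/* Raise the level of the given CPU only (never lower it) and return
 * the previous level, in the spirit of splraise(). No other CPU's
 * state is read or written. */
static int model_splraise(int cpu, int level)
{
    assert(cpu >= 0 && cpu < NCPU);
    int old = ilevel[cpu];
    if (level > old)
        ilevel[cpu] = level;
    return old;
}

/* Restore a previously saved level, in the spirit of splx(). */
static void model_splx(int cpu, int old)
{
    assert(cpu >= 0 && cpu < NCPU);
    ilevel[cpu] = old;
}
```

In this model, two CPUs can both be "at splsoftnet" simultaneously,
because nothing in model_splraise() looks at the other entries.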
So if an outgoing packet is being processed and key_checkrequest() is
called, this may run concurrently with the call to ipcomp_output_cb()
from the crypto kernel thread.
That could be the reason for my problem.
I don't actually know whether it is correct for two CPUs to run at
SPLSOFTNET at the same time.
The amd64 implementation suggests that this is a valid situation.
(And I think it would be very slow to contact all other CPUs when
changing the SPL level.)
If any number of CPUs may run in parallel at SPLSOFTNET, then the
current FAST_IPSEC implementation is broken!
The assumption that isr->sav is a valid pointer in
ipsec_process_done() does not hold, because key_checkrequest() may have
changed it to NULL after the NULL check in ipcomp_output_cb(),
resulting in a panic.
This can happen whenever a second packet is forwarded into the tunnel
at the moment the IPComp processing is between the check in
ipcomp_output_cb() and the access to the pointer in
ipsec_process_done().
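The interleaving can be replayed deterministically in userspace. In
this hypothetical model, stage_check() stands for the NULL check in
ipcomp_output_cb(), stage_use() for the later read in
ipsec_process_done(), and the store in between for key_checkrequest()
running on another CPU. On real hardware that ordering is left to
chance, which would be consistent with the panic being intermittent and
with DDB later showing a valid pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct sav { int spi; };
struct isr { struct sav *sav; };

/* Models the NULL check done in ipcomp_output_cb(). */
static int stage_check(const struct isr *isr)
{
    return isr->sav != NULL;
}

/* Models ipsec_process_done(): isr->sav is read *again* here. */
static struct sav *stage_use(const struct isr *isr)
{
    return isr->sav;
}

/* Replays one unlucky interleaving in a fixed order: CPU A checks,
 * CPU B clears the pointer (as key_checkrequest() does before the new
 * sav is installed), then CPU A re-reads it. Returns what CPU A would
 * dereference. */
static struct sav *replay_interleaving(struct isr *isr)
{
    if (!stage_check(isr))  /* CPU A: check passes */
        return NULL;
    isr->sav = NULL;        /* CPU B: transient NULL */
    return stage_use(isr);  /* CPU A: gets NULL despite the check */
}
```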
There must be a mutex to synchronize access to these structures!
remark: I'm not sure whether other parts of the network stack or
device drivers are affected too.
With the change to key_checkrequest() described above, only the NULL
panics in ipsec_process_done() are gone.
I still get additional crashes in if_wm.c when an mbuf extracted from
the send queue is NULL.
This seems similar to the NULL crash above, but I have not had time yet
to look deeper into it.
I also still have the problem that an SA added statically with setkey
sometimes disappears (so far always the outgoing ESP SA). I have not
had time to investigate that one yet, but I think it is related to the
broken MP synchronization too. racoon is not running. It also happens
if only sshd and the login shell are running on the system (all other
processes killed after boot).
At the moment it looks as if the FAST_IPSEC implementation is not
MP-safe, and the whole network stack is only stable because all device
interrupts are processed (and thus serialised) on a single CPU.
At least the wm driver seems to run into problems with its send queue
if multiple CPUs start transmitting packets.
This is the reason why I classified this PR as critical.
remark:
This setup runs fine with NetBSD 4.0 (on other systems, of course, with
only 2 cores), and I cannot say where the main difference in the
FAST_IPSEC and/or if_wm implementation lies.
The change of the splxxx routines from C to assembler should not be the
cause either.
At the moment I think it runs "stable" with 4.0 only because of the
slower machine with fewer cores and the different kernel-internal
thread scheduler. So 4.0 may be affected as well.
>How-To-Repeat:
Set up a tunnel with IPComp and ESP (IPComp alone should be enough
too).
Use a fast machine with at least 4 cores/threads, e.g. a Xeon L3406
(2 cores, 2 threads each).
>Fix:
Not really known to me yet (sorry).
For the NULL panics in ipsec_process_done(), the workaround for
key_checkrequest() described above seems to help.
However, there should be a mutex to synchronise access to the key
structures.
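A minimal sketch of such a mutex, written against POSIX threads so that
it runs in userspace; in the kernel, mutex(9) (kmutex_t) would be the
natural counterpart. The structure and function names are hypothetical.
The point is that the reader's check and use of the pointer happen
under the same lock the writer takes, so the pointer cannot change
between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel structures. */
struct sav { int spi; };
struct isr {
    pthread_mutex_t lock;  /* protects sav */
    struct sav *sav;
};

/* Writer: replace the SA pointer while holding the lock. */
static void isr_set_sav(struct isr *isr, struct sav *replacement)
{
    pthread_mutex_lock(&isr->lock);
    isr->sav = replacement;  /* may be NULL while no SA exists */
    pthread_mutex_unlock(&isr->lock);
}

/* Reader: the NULL check and the dereference are done under the same
 * lock. Returns -1 if no SA is currently installed. */
static int isr_spi(struct isr *isr)
{
    int spi = -1;
    pthread_mutex_lock(&isr->lock);
    if (isr->sav != NULL)
        spi = isr->sav->spi;
    pthread_mutex_unlock(&isr->lock);
    return spi;
}
```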
And the problem seems to be a much more general one!
The complete multi-CPU synchronisation in FAST_IPSEC needs a review; it
seems to be unstable at the moment.
The whole network stack may be affected too, e.g. when accessing
interface structures.
From my current point of view, the whole network stack is affected and
its MP synchronisation needs a review.
>Unformatted: