Re: Performance of libauparse

From: John Dennis <jdennis redhat com>

To: Matthew Booth <mbooth redhat com>

Cc: linux-audit redhat com

Subject: Re: Performance of libauparse

Date: Tue, 30 Sep 2008 15:18:22 -0400

Matthew Booth wrote:

I have been investigating using libauparse in my austream replacement
audit daemon to do some inline data enhancement[1]. austream is
essentially a very thin wrapper which pulls audit records out of the
kernel, wraps them in a UDP syslog packet and sends them to the
network. It is very simple and very fast.

To measure the overhead of libauparse on austream I initialised
auparse as AUSOURCE_FEED, fed each received record into it, and spat
them out unmodified on receiving the AUPARSE_CB_EVENT_READY event.
This added more than an order of magnitude to the time austream spends
in userspace. A brief look at this overhead shows that about 40% is
spent in malloc()/free(), and 25% is spent in strlen, strdup, memcpy,
memmove and friends. I suspect that very substantial gains could be
made in the performance of libauparse by reworking the way it uses
memory, and passing the length of strings around with the strings.
Unfortunately, I suspect this would amount to a substantial rewrite.

Is this something anybody else is interested in? I guess performance
isn't so important if you're just scanning log files in non-real time.

Matt
[1] What I'd really like is a well-defined binary format from the kernel.

auparse is very inefficient in how it handles data. I noticed this when
reading the source code and mentioned to Steve the large number of
strdup() calls used to assemble an event. This is compounded by the fact
that the processed data persists only while the record/event is
current. I am not surprised to see that profiling reveals a significant
proportion of time is spent repeatedly creating and destroying strings.
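
To illustrate what passing the length around with the string could look
like, here is a minimal sketch. It is not how auparse is written: the
struct str_view type and the find_field() helper are hypothetical names
invented for this example. The idea is to hand out a pointer and a length
into the record buffer that was already received, rather than a
strdup()'d copy, so looking up a field costs no malloc()/free() and no
extra strlen() over the value. The tokenizer is deliberately naive and
assumes fields are separated by single spaces with no embedded spaces in
values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a borrowed, non-owning slice of the record buffer.
 * It is only valid for as long as the underlying record is kept alive. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Look for "name=value" in rec[0..rec_len) and return the value as a view
 * into rec. Nothing is allocated or copied.
 * Returns 1 if the field was found, 0 otherwise. */
int find_field(const char *rec, size_t rec_len, const char *name,
               struct str_view *out)
{
    assert(rec != NULL && name != NULL && out != NULL);

    size_t name_len = strlen(name);   /* measured once per lookup */
    size_t i = 0;

    while (i < rec_len) {
        /* Skip the separator(s) before the next token. */
        while (i < rec_len && rec[i] == ' ')
            i++;

        /* A token runs up to the next space or the end of the record. */
        size_t start = i;
        while (i < rec_len && rec[i] != ' ')
            i++;
        size_t tok_len = i - start;

        /* Match "name" followed immediately by '='. */
        if (tok_len > name_len &&
            rec[start + name_len] == '=' &&
            memcmp(rec + start, name, name_len) == 0) {
            out->ptr = rec + start + name_len + 1;
            out->len = tok_len - name_len - 1;
            return 1;
        }
    }
    return 0;
}
```

The trade-off is the lifetime problem mentioned above: a view like this is
only good while the record buffer is current, so a caller that wants to
keep a value longer still has to copy it, but only in that case.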

I also agree the data stream which emerges from audit is rather
difficult to work with. Eric likes to point out we can't change the
kernel, so maybe what we really need (and what has been proposed) is for
auditd to reformat the data before emitting it or writing it to disk
(e.g. assemble records into events, decode strings which have been
hexified, etc.). Currently auparse is responsible for much of this as
part of a post processing step which has to be repeated every time audit
data is read instead of just once as it emerges from the kernel. If
instead the auparse user-level code were folded into auditd, which then
became responsible for formatting the ad hoc data received from the
kernel, the final output from audit could be much more friendly and much
of the rationale for auparse would evaporate.
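
As an example of the kind of one-time reformatting meant here, decoding a
hexified string is a small, self-contained step. The sketch below assumes
the value has already been isolated and consists purely of pairs of hex
digits; hex_decode() and hex_nibble() are hypothetical helper names for
this example, not functions from auparse or auditd.

```c
#include <assert.h>
#include <stddef.h>

/* Convert one hex digit to its value, or return -1 if it is not one. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode in[0..in_len), a run of hex digit pairs, into out.
 * The input length is passed in, so no strlen() is needed here.
 * Returns the number of bytes written, or -1 if the input has an odd
 * length, contains a non-hex character, or does not fit in out_cap. */
long hex_decode(const char *in, size_t in_len,
                unsigned char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    if (in_len % 2 != 0 || in_len / 2 > out_cap)
        return -1;

    for (size_t i = 0; i < in_len; i += 2) {
        int hi = hex_nibble(in[i]);
        int lo = hex_nibble(in[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    return (long)(in_len / 2);
}
```

For instance, the 24 characters 2F746D702F6120622E747874 decode to the 12
bytes of "/tmp/a b.txt". Done once in the daemon, every later reader of
the data would see the decoded form instead of repeating this step.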