Converting the Audit Log to SQL


The Perforce server's audit log records information about access to the content of depot files. As tracking data goes, it answers the questions of who, what, when, where, and how. Every action or p4 command that exposes content to a user gets one or more entries in the log. This data can be important to track and understand, so in this post I'll show how to take the text-formatted audit log, convert it into an SQL database, and put the results to use.

The log has one entry per line, with one line for each file accessed; a single Perforce command can generate multiple lines. There are two entry formats you'll encounter: the first is for a direct client-to-server connection, and the second is for when a proxy sits between the two.
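Schematically, entries look something like the following (a hypothetical excerpt, with the field layout reproduced from memory; the proxied form shows two addresses, so verify the exact layout against your own log):

```
2013/05/10 13:42:31 jdoe@jdoe-ws 10.0.0.5 sync //depot/main/foo.c#3
2013/05/10 13:42:31 jdoe@jdoe-ws 10.0.0.9/10.0.0.5 sync //depot/main/foo.c#3
```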

Commands you might encounter in the audit log include: branch, annotate, diff, deliver, merge, print, revert, resolve, sync, and grep. Note that 'deliver' is a server-to-proxy transfer, and that the IP can be the string 'unknown' if there was a problem.

Now that you have an idea of what the audit log is about, you can take a look at the script (auditlog2sql.pl) I wrote to parse the log and write the data to a SQLite database. Its only dependency is the sqlite3 executable. There's also a version that uses Perl's DBI (Database Interface) module, which makes it easier to adapt the script for use with other databases. There isn't much difference between the two other than the DBI version being 18% slower (something I haven't looked into). There's nothing fancy going on in the script, but it should be useful as a reference.

If your server doesn't already have audit logging enabled, you'll need to either set the P4AUDIT environment variable or pass the '-A' flag when restarting the server.
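For example (the root, log path, and port below are placeholders; substitute your own):

```
$ p4d -r /p4root -A /p4root/logs/audit.log -p 1666

$ export P4AUDIT=/p4root/logs/audit.log
$ p4d -r /p4root -p 1666
```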

Now on to the examples. The first thing we need is some log data to process. To start off, we'll create a new file with two revisions:
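A session along these lines will do (the filename and changelist descriptions are placeholders); note that it's the sync and print at the end that actually generate audit entries:

```
$ echo rev1 > notes.txt
$ p4 add notes.txt
$ p4 submit -d "add notes.txt"
$ p4 edit notes.txt
$ echo rev2 > notes.txt
$ p4 submit -d "update notes.txt"
$ p4 sync notes.txt#1
$ p4 print -q notes.txt#2
```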

One thing to note: if a user adds and submits a new file, there won't be an audit log entry for it. The rationale is that the server hasn't actually sent the user any file content; the audit log tracks access.

The converter script can be called in two ways. The first is to specify both the name of the log file and that of the resulting database. The second is to have the script read the log from its STDIN and write to the default database file, 'p4audit.db'. The database and target table are created if they don't exist; otherwise they're used as-is.
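That is, roughly (I'm assuming a log-then-database argument order here):

```
$ perl auditlog2sql.pl audit.log audit.db
$ perl auditlog2sql.pl < audit.log        # writes to p4audit.db
```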

Once the data is loaded, ordinary SQL queries apply. For example, to list every user who has accessed files under a particular path:

SELECT DISTINCT user FROM p4audit WHERE file LIKE '//depot/secrets/%';

Since the audit log grows as fast as your users pull data, I expect it will be rotated frequently. One way to use auditlog2sql is to convert each rotated log into a separate, named database, then extract data from each one into another, smaller database. SQLite can attach a database to the current session, so your audit pipeline might look like this: rotate the text-based audit log, convert it to a database, load that database, attach the database to extract into, extract the data, close, and repeat. You could keep the source database files on offline storage and use the smaller ones for whatever reports you need.

Here's an example of how to convert two audit logs, connect the resulting databases and run a query between them. The active database you're using when in the SQLite shell is called 'main' by default.
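A sketch of that session (the database names are arbitrary, and the two p4audit tables are hand-built stand-ins with only a user and file column, standing in for what auditlog2sql.pl would produce from real logs):

```shell
# Real conversion step would be, e.g.:
#   perl auditlog2sql.pl audit-jan.log jan.db
#   perl auditlog2sql.pl audit-feb.log feb.db
# Hand-built stand-ins for illustration:
rm -f jan.db feb.db
sqlite3 jan.db "CREATE TABLE p4audit(user TEXT, file TEXT);
INSERT INTO p4audit VALUES('alice','//depot/main/a.c#1');"
sqlite3 feb.db "CREATE TABLE p4audit(user TEXT, file TEXT);
INSERT INTO p4audit VALUES('bob','//depot/main/b.c#1');"

# Open January's database (it becomes 'main'), attach February's,
# and query across both: here, every user seen in either month.
sqlite3 jan.db "ATTACH 'feb.db' AS feb;
SELECT user FROM p4audit UNION SELECT user FROM feb.p4audit;"
```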

Here's a more useful example - it shows how to record all file content your super user has ever seen. The first block is the bootstrap, and the next one is of the form that every subsequent import will follow.
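A sketch of that idea (the names 'superseen.db' and user 'super' are illustrative, and the p4audit table is again hand-built where auditlog2sql.pl would normally create it):

```shell
# Bootstrap: a long-lived database holding every file the super user
# has ever been sent.
rm -f superseen.db p4audit.db
sqlite3 superseen.db "CREATE TABLE files(file TEXT PRIMARY KEY);"

# Stand-in for a freshly converted rotation of the audit log.
sqlite3 p4audit.db "CREATE TABLE p4audit(user TEXT, file TEXT);
INSERT INTO p4audit VALUES('super','//depot/main/a.c#2');
INSERT INTO p4audit VALUES('jane','//depot/main/b.c#1');"

# Import step, repeated for every subsequent rotation: attach the
# fresh conversion and fold in anything new the super user accessed.
sqlite3 superseen.db "ATTACH 'p4audit.db' AS src;
INSERT OR IGNORE INTO files
  SELECT DISTINCT file FROM src.p4audit WHERE user='super';"
```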

If you're not keen on storing the converted audit data in binary form, you can make a text-based SQL dump of the database like this:

sqlite3 p4audit.db .dump

For those curious about how P4toDB relates to this: the difference is that P4toDB reflects the current state of the server (and all its metadata), while the audit log captures content-access history. P4toDB provides one database per P4D instance, while auditlog2sql creates many.

If you come up with interesting audit-log queries, I'd like to hear about them!