[DAC] Audit

Description

Audit: Important actions taken by subjects should be logged for accountability: a chronological record that enables full reconstruction and examination of a sequence of events, e.g. schema changes or data mutations. The audit log should be protected from all subjects except a restricted set with administrative privilege, perhaps only a single super-user.

Auditing should support dynamic scaling transparently and should support multi-tenancy. It should capture enough detail and deliver audit events in a timely, streamlined way. It should be configurable on a per-table basis to avoid the overhead where it is not wanted.

Consider logging audit trails to an HBase table (Bigtable-style schemas are a natural fit for this), as well as external options with Java library support such as syslog. Alternatively, commons-logging may be sufficient, leaving it to the administrator to set up an appropriate commons-logging/log4j configuration for their needs.

Activity

Beware the lessons of the historian: storing data like this in an actual table may cause problems when the systems are offline. I would vote for straight-up normal logging and let people put together a log aggregation infrastructure as needed.

ryan rawson
added a comment - 26/Nov/09 21:31

Audit is always for regulatory needs. The key points, I think, are how to secure audit data as evidence and whether there is enough detail to trace the source and the problem. If the audit data can be delivered to its target in a timely manner, so much the better.

For regulatory compliance, it needs not only to capture all events on the table but also to collect the necessary events from the cluster, such as server-offline information, plus the information needed to analyze the event (metadata and status at that time). That way, third-party software can get the detailed events in time for monitoring, content inspection, or policy enforcement in the company.

linden lin
added a comment - 03/Dec/09 08:57

I like the idea of audit logs going out via commons logging so you could hook up a sink of your choosing (and yes, the sink could be an HBase table; we could write a logger plugin for log4j or some such to do this).

stack
added a comment - 04/Dec/09 05:19

So I think participants on this issue are in basic agreement that we can start with commons logging, presumably feeding into a log aggregation framework. We should put the support in package o.a.h.h.log.audit or similar to facilitate routing and filtering in log4j properties.

Andrew Purtell
added a comment - 04/Dec/09 05:29
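The point of a dedicated audit package is that audit events go through their own named logger, so configuration can route them separately from ordinary server logs. The following is a minimal sketch of that idea, not HBase code. To stay runnable with only the JDK it uses java.util.logging as a stand-in for commons-logging/log4j, and the logger name (an expansion of o.a.h.h.log.audit), the class name, and the method names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class AuditLoggerSketch {
    // Hypothetical logger name, expanded from the "o.a.h.h.log.audit" suggestion.
    static final String AUDIT_LOGGER_NAME = "org.apache.hadoop.hbase.log.audit";

    // Held in a static field so the configured logger is not garbage collected.
    static final Logger AUDIT = Logger.getLogger(AUDIT_LOGGER_NAME);

    /** In-memory sink standing in for syslog, a file, or a remote collector. */
    static class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    /** Attaches a sink that sees only records logged under the audit logger. */
    static CollectingHandler attachSink() {
        CollectingHandler sink = new CollectingHandler();
        AUDIT.setLevel(Level.INFO);
        // Do not forward audit records to the root logger's handlers.
        AUDIT.setUseParentHandlers(false);
        AUDIT.addHandler(sink);
        return sink;
    }

    /** Emits one audit record; the key=value layout is an assumption. */
    static void audit(String user, String action, String table) {
        AUDIT.info("user=" + user + " action=" + action + " table=" + table);
    }
}
```

With log4j, the equivalent separation would normally be expressed in the properties file by attaching an appender to the audit logger name, so administrators can change the sink without touching code.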

There is a security concern about storing the audit log in the same HBase instance. Part of the motivation for auditing is to observe the administrator's behavior.

linden lin
added a comment - 07/Dec/09 03:12

@Linden That makes sense. So, if writing to HBase, write to a different HBase instance? Does emitting audit logs via Apache commons-logging or slf4j, and then hooking the logging system up to different kinds of sinks (writing any necessary plugins), make sense to you?

stack
added a comment - 07/Dec/09 04:48

@stack, requiring users to run another HBase instance for audit isn't a reasonable solution in my view. For capturing enough detail, log4j or another logging solution is fine (it leverages existing implementation effort). My concern is how to transfer the log to different kinds of sinks efficiently.
My draft idea: I recommend a distributed subscriber and receiver model for HBase audit. Each HBase server (or HRegion) is a subscriber to the distributed framework (so there are many subscribers), and a receiver is any sink that receives the content it is interested in from the framework. The key point is that receivers can partition the subscribers' logs for load balancing, for example by topic name, where the topic name is an IP address, table name, key range, and so on.
Thus, HBase only needs to add a client plug-in for the distributed framework (message bus, etc.) and define the log title used by the router (it is static by design).

Normally, the auditing feature is disabled. When a user wants to enable it, they install the specific third-party router cluster (a distributed, scalable framework) and then add the cluster address to the HBase configuration. The HBase cluster then acts as the subscribers for the router cluster. Everything after that is, I think, the customer's task (adding receivers, operating on the log, and so on).

Meanwhile, do we need to support dynamic subscribers and subscriber content in this version?

linden lin
added a comment - 07/Dec/09 10:16

Instead of building a complex audit data management system, I suggest making a log tap that sends audit trace to syslog, either local or remote. Using syslog to audit machines is fairly common and there are a lot of good syslog systems for a variety of levels of paranoia.

ryan rawson
added a comment - 07/Dec/09 10:26

Instead of building a complex audit data management system, I suggest making a log tap that sends audit trace to syslog, either local or remote.

I agree. Via log4j preferably, as we already bundle it. I once had a log4j setup that aggregated into a MySQL db via an rsyslog hierarchy. Not that such a thing is necessarily ideal; the point is that log4j affords a lot of flexibility to the user and is clean and simple to use in the HBase code.

I suggest defining a format for audit logging to conveniently support message routing by regexp.

Andrew Purtell
added a comment - 07/Dec/09 18:05
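To make the "format that supports routing by regexp" suggestion concrete, here is a small sketch in plain Java. The line layout (`AUDIT table=... user=... action=...`), the sink names, and the class itself are hypothetical; nothing here was agreed in the thread. It assumes field values contain no spaces, which a real format would need to enforce or escape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AuditRouteSketch {
    /** Builds one audit line with a fixed prefix and fixed field order (assumed layout). */
    static String format(String table, String user, String action) {
        return "AUDIT table=" + table + " user=" + user + " action=" + action;
    }

    /** One routing rule: lines matching the pattern go to the named sink. */
    static final class Route {
        final Pattern pattern;
        final String sink;

        Route(String regex, String sink) {
            this.pattern = Pattern.compile(regex);
            this.sink = sink;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    AuditRouteSketch addRoute(String regex, String sink) {
        routes.add(new Route(regex, sink));
        return this;
    }

    /** Returns the sink of the first matching rule, or "default" if none match. */
    String route(String line) {
        for (Route r : routes) {
            if (r.pattern.matcher(line).find()) {
                return r.sink;
            }
        }
        return "default";
    }
}
```

Because the prefix and field order are fixed, a rule can be anchored on the table (`^AUDIT table=users `) or on the action at the end of the line, and rules are evaluated in the order they were added.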

(Only choose one):
1. Object Name (recommended): for HBase this is the table name. If the event involves no table, the object name is null. If a client queries metadata from ZooKeeper, substitute a hard-coded table name such as "Hbase Metadata".
2. HRegion Identity
3. RegionServer IP
4. Others.....
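The rules for option 1 can be sketched as a small helper. This is only an illustration of the stated rules: the class name, method name, and the boolean flag marking a ZooKeeper metadata query are assumptions, and the constant reuses the example name from the list.

```java
class AuditTopicSketch {
    // Hard-coded name used when a client queries metadata from ZooKeeper.
    static final String METADATA_OBJECT_NAME = "Hbase Metadata";

    /**
     * Picks the object name for an audit event.
     * Metadata queries get the hard-coded name; otherwise the table name is
     * used as-is, which is null when the event involves no table.
     */
    static String objectName(String tableName, boolean zookeeperMetadataQuery) {
        if (zookeeperMetadataQuery) {
            return METADATA_OBJECT_NAME;
        }
        return tableName;
    }
}
```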
