The buzz around GDPR has led to a situation where customers receive messages claiming that every existing security solution has “something” for it :). It is a good sales strategy, but definitely a painful tactic for Security Officers with a limited budget and a hard nut to crack before 25 May 2018.

Here I would like to review the GDPR requirements (as-is, because the European Data Protection Council has not yet provided certification guidelines) from the DAM perspective and go through the most common questions related to DAM in the GDPR context.

The DAM data classification engine can identify sensitive information with a minimal number of false positives, based on catalog, regular expression, dictionary or custom searches. The results allow you to focus on the most critical assets from the GDPR perspective.
Classification and database discovery processes executed on a schedule rapidly identify changes inside database schemas and network assets.
Awareness of where sensitive data is located is crucial to confirm the efficiency of the data pseudonymization and minimization processes.
Data lake monitoring can be implemented only in the largest corporations; knowing what should be protected is the first step before we spend a limited budget.

Article 24.1 and 24.2 – Data controller duties
These two articles impose a data protection obligation on the data controller as an auditable and controlled process. If we consider databases, data warehouses, big data and file repositories, DAM was created exactly for this.

Article 28 – Data processor duties
When data is processed on behalf of a data controller (a very common situation), the processor must guarantee that access to personal information takes place on the controller's written authorization. Only data access monitoring can provide a real access registry.

Article 30 – Records of processing activities – introduces the requirement of personal information access accountability
Small companies will meet this goal by creating a simple registry based on manual data access descriptions, sometimes enriched by an approval workflow.
However, this low-cost solution comes with complex reporting and the lack of a non-repudiable registry, so you should consider a better mechanism to register access to GDPR-protected data.

Article 32.1(d) – Security of processing points to vulnerability assessment and system hardening
Popular vulnerability assessment platforms treat relational databases rather superficially. DAM, which originated in the RDBMS world, provides rich checks and does not focus only on CVEs and standards (CIS, STIG). Based on years of experience, it also includes analysis of SQL traffic, the influence of configuration changes on the risk score, authorization snapshots and identification of excessive rights.
For the most critical systems, extending your existing VA solution with DAM can be very helpful.

Article 33.3(a) – Data breach notification imposes not only the requirement of prompt notification (within 72 hours).
The breach notification should also contain information about the scale of the leakage or other type of incident. Only DAM solutions can identify this scope (SQL audit) and minimize the damages related to data owner notification and possible fines.
Be aware that:

DLPs (agent and network) cover only data on workstations and remote access. What about local sessions on servers – are you sure that your DLP provides the same SQL structure and session context analysis as a DAM solution specialized for this purpose?

PIMs monitor the access of privileged users to production systems. They are not aware of SQL syntax and session context. PIM should be considered in a GDPR compliance program, but the real value is visible when DAM and PIM are integrated together (directly or on the SIEM level).

Article 34 – Communication of a personal data breach to the data subject
Technically, DAM solutions are able to parse the output of SELECTs, but the usability of this functionality is limited. The size of the outgoing stream is unpredictable and can lead to a situation where the monitoring system needs more hardware resources than the monitored one (especially for a data warehouse).
However, DAM can provide the list of SQL instructions executed inside a suspicious session and simplify recognition of the attack scope. In case of data modification (DML), the audited SQL activity can directly identify the changes and the required remediation.

Does DAM provide protection for applications in the GDPR context?

The 3-tier architecture of most applications (web client, application server, data store) anonymizes access to data at the silo level, so we cannot identify the application user at the SQL level based only on the database user name, which points to an account from the connection pool. However, DAM can be configured to extract this information from SQL, JDBC encapsulation messages, web server logs and other streams. In most cases this kind of integration requires additional implementation effort, in the worst case including application code changes.
So, if the application user context is visible at the DAM level, we can use it the same way as described earlier, with two caveats:

Never kill a session in the connection pool, because the SQL stream inside it belongs to many application users. A killed session will raise exceptions on the application layer and reinitialize the application session for thousands of clients.

Never mask data or rewrite SQL in the connection pool. Masked data will in most cases have an inappropriate format and will lead to application exceptions. Even if the masked data has an accepted format (data tokenization), the information receiver will not be aware of this fact and can make business or legal decisions based on incorrect information – data masking for applications should be implemented on the application or presentation layer.
Rewriting SQL inside a SQL transaction can change its essence and lead to loss of data consistency.

DAM without application user context is still valuable in this stream to identify anomalies, errors and behavioral fluctuations using quantitative analysis.

Can DAM implement data pseudonymization?

Hmm, we should start with a basic question – what is pseudonymization?
I have seen many web articles which directly equate this term with data masking, but I disagree with that approach.

GDPR defines pseudonymization as the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

I treat this definition as a consequence and continuation of the data minimization process. Briefly, if PI data is separated from transactional data (data minimization), the natural relation between these two stores (for example, a customerID) should be used throughout the whole data processing flow. Only on demand and with approval can the customerID be translated into a form which identifies the person.
Can DAM help here? – NOPE.
However, the implementation of data minimization and pseudonymization for existing systems requires complete application redevelopment – who can afford it? So, only new GDPR-ready applications will come with this kind of functionality on board.
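As an illustration of the idea only – the classes below are hypothetical and not part of any product – a minimal Java sketch could keep transactional records pseudonymous and resolve the customerID only on approved demand:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Transactional records carry only a pseudonym (customerID) and no personal data.
class Transaction {
    final long customerId;   // reveals nothing by itself
    final double amount;

    Transaction(long customerId, double amount) {
        this.customerId = customerId;
        this.amount = amount;
    }
}

// Personal data lives in a separate, access-controlled store kept apart from transactions.
class PersonRegistry {
    private final Map<Long, String> people = new HashMap<>();

    void register(long customerId, String fullName) {
        people.put(customerId, fullName);
    }

    // Re-identification happens only on demand and only with approval.
    Optional<String> resolve(long customerId, boolean approved) {
        return approved ? Optional.ofNullable(people.get(customerId)) : Optional.empty();
    }

    // The "right to be forgotten" becomes a single local operation on the registry.
    void forget(long customerId) {
        people.remove(customerId);
    }
}

public class PseudonymizationSketch {
    public static void main(String[] args) {
        PersonRegistry registry = new PersonRegistry();
        registry.register(42L, "Jan Kowalski");

        Transaction t = new Transaction(42L, 99.99);   // processed without any PI
        System.out.println(registry.resolve(t.customerId, true).orElse("unknown")); // Jan Kowalski

        registry.forget(42L);                          // data subject asked to be forgotten
        System.out.println(registry.resolve(t.customerId, true).orElse("unknown")); // unknown
    }
}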

For existing systems we try to avoid identification of personal information using data masking, and here DAM can also be helpful:

preproduction (test) data – why DAM instead of data tokenization?

access outside the application stream:

masking of SELECT output – most DAMs provide this functionality, but efficiency is the main problem

access from the application stream – as I mentioned earlier, application masking should be implemented on the application or presentation layer only

Member states' implementation of GDPR

GDPR is a regulation and unifies the law in the European Union, but in a few of its articles we can find some derogations. A good example is Article 9.4, where health records can be managed in a different way according to a member state's decision. Does it mean that my decision about the scope and type of protection should be postponed until the parliament implements the law?
Definitely not – you should not wait, because your data may contain personal information about citizens of another EU country and you can be sued under that country's law.

DAM and “right to be forgotten”

It is a common question raised during DAM discussions.
Article 17 introduces the data subject's right to have their personal information removed on request. DAM is a monitoring solution: it does not cooperate directly with the DB engine (to cover the SoD requirement) and has no authorization to modify data. So this simple explanation leads to only one correct answer to the title question – DAM is not a component which can be useful for implementing the citizen's right to be forgotten.
By the way, who will agree to data removal on a system with thousands of relations, which can lead to a loss of data consistency discovered a year later? I think that only new systems with fully implemented data minimization and pseudonymization principles will be able to fulfil this right the easy way. If all personal information is separated from transactions, removing or simply encrypting the PI will provide a suitable solution without any additional effort.

Administrative fines mantra

Have you seen any GDPR-related article without a remark about “huge fines up to 20 million euros or 4% of company turnover”?
Do you believe that your government will decide to kill local small and medium companies because of GDPR?
If your answers are negative, you should consider a much more interesting case. In Article 82 the GDPR introduces the citizen's right to compensation, with a body of appeal attached directly to the EU Council. Many organizations seriously consider the costs of civil actions and their possible influence on business.

New type of ransomware
Standard ransomware based on data encryption is not very effective because victims rarely pay (private individuals are not able to pay large amounts of money, backups exist, bitcoin accounts get blocked).
With GDPR, stolen data can become a simple way to extort a ransom from an organization wishing to avoid penalties and massive civil actions.
I think that data currently being gathered from unaware companies is stored somewhere on the darknet as a starting package for a new type of “business” next year. 😦

Summary:

DAM definitely should be considered an important element of any GDPR compliance program because of:

PI processing monitoring

data classification

data masking

unauthorized data access protection

vulnerability assessment

and achieves the best value when it is integrated with PIM, IAM, Encryption and SIEM.

GPU installation

Like most patches, it has to be installed from top to bottom within the existing Guardium domain:

Central Manager

Backup Central Manager (do synchronization)

Aggregators

Collectors

GPU 200 requires the health check patch 9997 to be installed first. The 10.1.2 update can be installed on top of any version of Guardium 10.

The GPU will reboot the appliance. Existing VM Tools will be automatically assigned to the new Red Hat kernel.

Note: Consider an appliance rebuild if you want to use the EXT4 filesystem introduced with the new ISO installer

View/Edit Mode in Dashboards

Now each dashboard opened in the GUI session works in View mode.

Dashboard in View mode

The view mode is useful to make better use of the GUI space for data, especially when a dashboard is informational only.
From my point of view, Guardium administrators will not be happy with that because it is not ergonomic for data investigation. However, if a dashboard has been switched to Edit mode, this setting is saved for the current session.

It would be much more usable if dashboard settings could be stored permanently per dashboard.

Owing to Managed Unit Groups it is possible to create dynamic views filtered by a group of appliances or focused on a selected one. The statistics contain references to the Analyzer and Logger queues, buffer space, memory and disk usage and sniffer restarts.
Additionally, the Events timeline report presents discovered issues; it can be enriched by alerts gathered from the appliances. The alert definition contains additional fields to set up the result for the dashboard:

Alert definition

Data Classification Engine – task parallelization

In large environments with hundreds of databases, the Guardium classification engine's limitation of executing only one job in the queue was very painful. The current version allows these tasks to be parallelized on the appliance. In most cases classification is managed on aggregators or the central manager, where CPU utilization is low, so with the new flag configured by GRDAPI we can review data content faster and more frequently.

grdapi set_classification_concurrency_limit limit=<your_limit>

The maximum limit has to be lower than 100 and not higher than the number of CPU cores available on the appliance multiplied by 2 (for example, on an 8-core appliance the limit can be at most 16).

If you created a classification policy based on many databases like this:

Classification datasources

you should change it into a set of separate policies executed concurrently:

Separated datasources to different policies

Then, if you start a few classification processes together, they will be executed in parallel:

Classification Job Queue

File Activity Monitoring update

The policy builder for files allows creating many actions per monitored resource. Now we can define different behavior in case of file read, modification or deletion.

File policy rule

The UID chain field from the Session entity provides the context of the user and process responsible for the file operation.

File Activity Report

At last we have File Activity reports available out of the box,

File Activity Reports

but I suggest creating a clone of the File Activities report and sorting values in descending order using timestamp and sqlid (the session timestamp does not ensure that events will be displayed in the correct order)

File Activity query definition

New appliance installer

The new ISO installer simplifies the installation process of new appliances (no need to apply GPU 100 and 200). It also removes the problem with new GDP license support on appliances below GPU 100.

The 10.1.2 installer creates EXT4 Linux filesystems and extends the maximum size of supported storage. If you would like to use larger disks on the appliance, the rebuild procedure is needed (GPU 200 does not convert EXT3 to EXT4).

FSM driver deactivation on Linux/Unix

New S-TAPs for Linux/Unix support a new TAP section parameter in guard_tap.ini:

FAM_ENABLE={0|1}

where 0 means that the FSM driver is not activated.

Only manual guard_tap.ini modification is supported at this moment.

Outlier detection (behavioral analysis) – new capabilities

Outlier detection is now also available for file activity. Only one of the two functionalities, DAM or FAM, can be activated on an appliance.

Behavioral analysis can be switched on on aggregators, which allows analyzing user behavior from a wider perspective.

New views, reports and anomaly types have been introduced – a significant update.

Entitlement Optimization

This GPU introduces a completely new user authorization analysis engine. Besides the old Entitlement Reports, we can use the Entitlement Optimization tool, which retrieves user roles and privileges based on a direct connection to the database and identified DDL commands. The tool presents the changes in the database authorizations.

Data in-sight

Summary: An important step towards making data access monitoring easier to manage and more transparent for non-technical users. This GPU is mainly focused on extending existing functionalities and making them more usable and stable.

Central Management is one of the key functionalities which simplifies Guardium implementation and lowers TCO. The possibility to patch, update, reconfigure and report across hundreds of monitored databases is a strong advantage.

Guardium implements this feature by selecting one of the aggregators as a Central Manager (CM). All other Guardium infrastructure units communicate with it and synchronize information. However, CM inaccessibility disrupts this process and does not allow normal environment management. To address these problems, Guardium introduced the backup CM feature in version 9.

It covers two main problems:

planned CM shutdown (patching, upgrade)

CM failure

The backup CM configuration and the switching between primary and secondary units need to be managed correctly to avoid problems on the collector and aggregator layer.

General considerations for the backup CM:

main CM (primary) and CM backup (secondary) need to be accessible by all appliances in the administration domain

quick search and outlier detection configuration should be checked after changes on CM level

switching between CM’s sometimes requires reassigning licenses

Note: Examples in this article refer to a simple Guardium infrastructure with 4 units:

CM Primary (cmp, 192.168.0.80)

CM Backup (cmb, 192.168.0.79)

Collector 2 (coll2, 192.168.0.82)

Collector 3 (coll3, 192.168.0.83)

CM Backup registration

This procedure sets one of the aggregators belonging to the Guardium management domain as a backup CM and sends this information to all units.

Only an aggregator with the same patch level as the primary CM can be defined as the backup CM. This means that the same general, hotfix, sniffer and security patches must be installed on both machines.

Patch list on CM primary (cmp)

Patch list on aggregator (cmb)

The screenshots above show that both units have exactly the same patches on board. If the patch level is not the same, the aggregator cannot be promoted to the backup CM role.

Note: Patch level refers to the same version of Guardium services, MySQL, Red Hat and sniffer. If one unit was patched in the sequence 1, 4, 20, 31, 34 and the second one with 20, 31, 34, they are on the same patch level, because patches 1 and 4 are included in patch 20.

To designate an aggregator as a backup CM, on the primary CM go to Manage->Central Management->Central Management and push the Designate Backup CM button

Central Management view (cmp)

The pop-up window will display all aggregators which are on the same patch level as the CM. Select an aggregator and push the Apply button

backup CM selection (cmp)

A simple message will inform you that the task tied with the backup CM has started and that the process can be monitored

Unfortunately, the “Guardium Monitor” dashboard does not exist in version 10. A simple summary of this process can be monitored in the “Aggregation/Archive Log”, or you can create a report without any filters to see all messages.

Keep in mind that the synchronization is repeated every few hours. In case of a planned downtime of the CM, I suggest invoking the synchronization manually using the Run Once Now button.

If the process finishes successfully, on all units except the backup CM the information about the HA configuration – the IP addresses of both CMs – will be visible in the Managed Unit list

If the promotion is related to a CM failure, the old CM will, after a restart, communicate with the new one and refresh the information about the current status of the administration domain – after a few minutes the list of managed units will be cleared as well.

Guardium does not provide automatic role swapping between CMs. It requires a sequence of steps.

To remove the CM functionality from the orphaned CM, the following CLI command needs to be executed

delete unit type manager

It changes the appliance configuration to a standalone aggregator. Then we can join it to the administration domain again, but this time the domain is managed by the new CM (below, an example of registration from the CLI on cmp)

register management <new_CM_ip_address> 8443

Now the old CM acts as an aggregator and can be designated for the backup CM role

backup CM selection

After this task both CMs have reversed roles

Units patching process

Guardium administration tasks will require a CM switchover only in critical situations. There is no need to switch to the backup CM for standard patching (especially when hundreds of appliances would have to switch between CMs). Even if a patch forces a system reboot or stops critical services on the updated unit for minutes, the temporary unavailability of the unit will not stop any crucial Guardium environment functions (except temporary unavailability of the managed units' portal). So a realistic patching process should look like this:

patch CM

patch CM backup

synchronize CM and CM backup

patch other appliances in the CM administration domain.

“Split brain” situation management

A primary CM failure is not handled automatically. However, this situation is announced on all nodes when accessing the portal

I suggest using your existing IT monitoring system to check the health of the CM units using SNMP or other existing Guardium interfaces, to identify problems faster and invoke the new CM promotion remotely via GRDAPI.

The standard flow for managing a CM failure is:

Analyze CM failure

If the system can be restored, do that instead of switching to the backup CM (especially in large environments)

If system cannot be restored:

Promote backup CM to primary role

Set up another aggregator as the backup CM

Despite the limited portal functionality on orphaned nodes, the backup CM can also be promoted from the GUI

I have tested two “split brain” scenarios (in a small test environment):

CM failure and reassignment to the backup CM

starting a stopped collector when the backup CM has been promoted and the old one is still unavailable

In both cases, after a few minutes the primary CM and the collector identified the situation and correctly managed the connection to the infrastructure.

Summary:

The Central Manager HA configuration is an important feature to avoid breaks in monitoring. Its design and implementation are good; however, some issues with license management and the new quick search features should be addressed in new releases.

Tip: A policy is not directly related to the database where it will be executed. Use a name that describes the analysis logic (for example: Find Sensitive Data in SAP environments)

Tip: Category and Classification labels are elements of the event content generated by Action rules. Use them to simplify the distinction of events on this level

Info: The list of categories is managed by the Categories group (Group Type: Category)

Select a Category, define the Classification label and push the Apply button

New Classification Policy

then push the activated Edit Rules button (Roles allows defining access to this policy for a defined group of users, Add Comments provides the possibility to add remarks in case of a policy change)

New Policy

Classification Policy Rules manages the current list of rules inside a particular policy. We will focus on this in another section of this article

List of classification rules

Classification policy management

The Classification Policy Finder window displays the list of all existing policies. For each policy we can add a comment or go to rule editing

Policy list

Four icons above the policy list allow you to add a new policy, edit, copy or remove the selected one, respectively. Policy copying opens the Classification Policy Clone window, where the name of the source policy is preceded by the Copy of literal. The Save Clone button adds the new policy to the list

Policy clone

We can remove a policy which is not attached to a classification process. In case of removing a policy related to a process, a message will be displayed. In this situation you must first remove the process related to this policy or change the policy reference in the process to another one.

Classification policy rules in detail

Each rule contains some identification fields: Name, Category, Classification and Description. A classification rule is an atomic element and its name should strictly define its functionality (for example: e-mail address, US zip code). Classification Rule Type defines the type of data which will be analyzed using this rule

Rule description and type selection

In most cases our DAM classification policies will refer to the Search for Data rule type.

The order of rules in the policy can be changed easily using the move up/move down icons. These icons are active when the policy contains at least two rules

Policy list

The standard policy behavior is to process rules from top to bottom, and the policy reaches a verdict when some rule matches its pattern. If a rule is matched, the rest of them are not evaluated for the current object. Additional rule parameters can change this behavior.

The Unselect All and Select All buttons allow selecting or deselecting rules in the view – used for rule removal (Delete Selected button).

Collapse All and Expand All help with a fast review of all rules.

Rule parameters review

Logically we can split parameters into 3 groups:

search scope

pattern

search behaviour

Search scope parameters

Table Type – defines types of objects included in the analysis:

Tables

Views (consider the performance impact on a production environment in case a huge number of unused and complex views exists)

Synonyms (not available for some database types)

System Tables (includes system objects)

Table Name Like – limits the scope of the search to a defined object name pattern. Two wildcards are allowed – % means a string of any length, _ refers to a single character. Examples:

CARS – the object with the exact name CARS

C% – object names starting with C

CAR_ – object names starting with CAR and ending with exactly one more character (CARS, CARO, CARP)

If this parameter is empty, all tables are analyzed.

Data Type – defines the data type of the columns which will be analyzed. They correspond to the data types supported inside the particular database engine (binary object types are not analyzed at all)

Date

Number

Text

Column Name Like – limits the scope to column names covered by the defined pattern. Two wildcards are allowed: % and _. An empty field refers to all columns in the table.

Minimum Length, Maximum Length – refer to the defined size of the column (not to the length of the data stored in a particular row). Sometimes they are used together to point to a particular column size. A good practice is to define the minimum length to reduce the number of analyzed columns when the minimum length of the searched value can be assumed (for example 16 characters for a credit card number).

In this example credit card numbers have been detected in 3 columns in the dbo and glo schemas

Classification report

Rule modification excludes glo schema from search scope

Classification rule and schema exclusion group

and changes the classification results (no objects from the glo schema are reported)

Classification report

Exclude Table – restricts the list of scanned tables defined by the data source (if the Table Name Like parameter is used in the rule, it is evaluated on the table list created after the Exclude Table evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object).

The classification returns 3 columns in 2 tables

Classification report

and after rule modification which excludes CC_NOK table

Classification rule and table exclusion group

the results report contains only two records from one table

Classification report

Exclude Table Column – restricts the list of scanned columns defined by the data source (if the Column Name Like parameter is used in the rule, it is evaluated on the column list created after the Exclude Table Column evaluation). Exclusions are defined by a group reference (Application Type – Classifier or Public, Group Type – Object/Field).

The classification returns 3 columns, including column CC from table CC_1

Classification report

and after rule modification which excludes CC column from CC_1 table

Classification rule and table column exclusion group

the excluded column disappears from the results report

Classification report

Limitation: The wildcards % and _ are not allowed in any of the exclusion groups

Pattern parameters

Info: Only one pattern parameter can be used in a rule. The behavioral parameters provide a way to analyze the same column using different patterns.

Search Like – a simple pattern based on two wildcards (% and _). Useful for constants, specific values or as part of a more complex analysis based on a set of rules.

Guardium also offers special pattern tests for a limited set of data types related to parity or checksum control, for example a check of the credit card number according to the Luhn algorithm. This functionality can be switched on using a special naming of the classification rule – the name has to start with the guardium://CREDIT_CARD string.

For example, in the two tables CC_OK and CC_NOK

CC_OK               CC_NOK
4556237137622336    4556237137622335
4929697443528339    4929697443528338
3484057858101867    3484057858101866
4824520549635491    4824520549635490
3767010431320650    3767010431320659
4532861697794380    4532861697794389
5352437717676479    5352437717676478
4539522376654625    4539522376654624
5547728204654151    5547728204654150
5292779270461374    5292779270461373

we have strings representing 16-digit numbers. Table CC_OK contains credit card numbers with a correct checksum according to the Luhn algorithm, in opposition to table CC_NOK.
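To illustrate what this extra test actually verifies, here is a minimal Java sketch of the Luhn check (the class and method names are mine, not part of Guardium):

public class LuhnCheck {

    // Returns true when the digit string has a valid Luhn checksum.
    static boolean isValidLuhn(String number) {
        int sum = 0;
        boolean doubleIt = false;
        // Process digits from right to left, doubling every second one.
        for (int i = number.length() - 1; i >= 0; i--) {
            int digit = number.charAt(i) - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9;
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidLuhn("4556237137622336")); // true  (first value from CC_OK)
        System.out.println(isValidLuhn("4556237137622335")); // false (first value from CC_NOK)
    }
}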

The policy based on a regular expression only

Find Credit Card (regexp only)

discovers both tables as containing credit card numbers

Classification process structure

For the policy with the additional check of Luhn algorithm conformity

Find Credit Card (with checksum)

only the CC_OK table has been recognized as an object with valid credit card numbers

Evaluation Name – the most powerful option in the classification analysis. It allows you to create your own validation function coded in Java (1.7 in the initial G10 release) and implement any checks which cannot be covered by regular expressions.

For example, we would like to find bank account numbers in IBAN notation (widely used in Europe) with checksum control (modulo 97 of the transformed number). This task cannot be managed by a regular expression at all.
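For illustration, a minimal Java sketch of the mod-97 validation logic is shown below. It demonstrates only the checksum algorithm itself; the class and method names are mine, and the actual interface Guardium expects from an evaluation class is not covered here:

import java.math.BigInteger;

public class IbanCheck {

    // Validates the IBAN mod-97 checksum (country-specific format and length are not checked).
    static boolean isValidIban(String iban) {
        String s = iban.replace(" ", "").toUpperCase();
        if (s.length() < 15) {
            return false;
        }
        // Move the first four characters (country code + check digits) to the end.
        String rearranged = s.substring(4) + s.substring(0, 4);
        // Replace letters with numbers: A=10, B=11, ..., Z=35.
        StringBuilder digits = new StringBuilder();
        for (char c : rearranged.toCharArray()) {
            if (Character.isDigit(c)) {
                digits.append(c);
            } else if (c >= 'A' && c <= 'Z') {
                digits.append(c - 'A' + 10);
            } else {
                return false;
            }
        }
        // The IBAN is valid when the resulting number modulo 97 equals 1.
        return new BigInteger(digits.toString()).mod(BigInteger.valueOf(97)).intValue() == 1;
    }

    public static void main(String[] args) {
        // A commonly cited example IBAN; expected output: true
        System.out.println(isValidIban("GB82 WEST 1234 5698 7654 32"));
    }
}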

Compare to Values in SQL – allows comparing the values in the sample against a dictionary defined by an SQL query.

Limitation: The dictionary has to exist on the database where the classification process is executed

For example, we would like to find columns which contain the short names of US states. The table dbo.CC_MAIL_STATE contains a STATE column

Inside the same database engine there is a table glo.STATES with the list of all states

This classification rule uses the list defined by the SQL instruction:

SELECT short_name FROM Glottery.glo.States WHERE country=1

Classification rule

and identifies the STATE column

Classification results

Please notice that the classification process worked on the CLEXAMPLES database only (the scope defined by the data source) and the dictionary source table is not in the result because it is located in the GLOTTERY database.

The SQL instruction used here has some limitations:

it must start with SELECT (you cannot send DML or DDL)

it must not contain a semicolon (you cannot group instructions)

the referenced object must use a fully qualified name (for example database.schema.object for MS SQL)

Compare to Values in Group – compares column values to a list stored in a Guardium group. The group must belong to Application Type PUBLIC or CLASSIFIER and Group Type OBJECTS. A small icon at the right side of the group list allows creating or modifying the dictionary

Search behavior parameters

For example, we have two tables: CC_MAIL with credit cards and mail addresses

and the table CC_NAME where user names exist instead of mail addresses

If we create two independent rules looking for credit cards and mail addresses

Classification policy

the classification process returns only CC columns from both tables

Classification results

because the first rule matched the table and the second one was not evaluated.

This time the Continue on Match flag has been switched on

Rules list

and all credit card and mail columns have been identified

Classification results

In the next policy both rules have been updated with the same Marker – CC_AND_MAIL

Rules list

and the classification policy returns the credit card and mail address columns only from the CC_MAIL table, because only this table contains both patterns together

Classification process structure

Hit Percentage – determines the percentage threshold of values in a sample that must match the pattern for the rule to be classified as satisfied. If this field is empty, the column will be classified even if only one value in the sample matches the pattern.

Important: This parameter allows minimizing the number of false positives in the data classification process.

The use of this parameter also adds to the results the information about the number of unique values in the sample that fulfill the requirements of the rule

Classification results

Show Unique Values, Unique Value Mask – attach the matched values to the classification report. Only unique values are displayed, and a maximum of 2000 of them per column can be included in the report

Classification rule

Classification report

If the attached values are sensitive in nature, the Unique Value Mask field allows masking this data.

The mask must be a regular expression which covers the expected values and strictly defines the part which should remain visible. A regular expression builder is also available to define it and check its correctness. The part of the regexp inside brackets () defines the content of the value which will be displayed in the report (for example .*([0-9]{4})[ ]{0,20}$ means that only the last four meaningful digits will be displayed)

Classification rule

Classification report
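To see how the bracketed group controls what ends up in the report, here is a minimal Java sketch using the example mask above (the printing of the result is mine, not Guardium's):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueValueMask {
    public static void main(String[] args) {
        // Example mask from the rule: only the capturing group is kept for display.
        Pattern mask = Pattern.compile(".*([0-9]{4})[ ]{0,20}$");
        String value = "4556237137622336";

        Matcher m = mask.matcher(value);
        if (m.matches()) {
            // Prints "2336" – the last four digits captured by the group.
            System.out.println(m.group(1));
        }
    }
}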

Continue on Match, One match per column – by default, the classification process flow focuses on the identification of tables with sensitive data. Please consider the table with a credit card, mail address and state

and Classification Policy with 4 rules (Continue on Match is switched off)

Classification policy

Only one column from CC_MAIL_STATE table has been identified

Classification results

because the first rule fulfilled the requirements and the policy moved on to the next table. To change this situation, the Continue on Match flag must be switched on in the rule sequence

Classification policy

which leads to the expected behavior. All sensitive columns in the CC_MAIL_STATE table have been discovered

Classification results

You should also notice that the STATE column has been matched twice because two rules meet the requirements on it (which was expected here). However, we can suppress multiple matches on one column using the One match per column flag. To do that, mark it in the first rule in the sequence that works on that column

Classification policy

The Find State PL rule has not matched the STATE column this time

Classification report

Tip: In most cases the sensitive data classification procedure should point to all columns where this type of data resides, and the Continue on Match flag should be switched on for all rules in the policy.

Relationship discovery

Using a simple trick we can also identify relationships between source data and other objects.

I have a source table with users stored in the glottery.glo.users table

glottery.glo.users table

where the primary key is the id column, and a correct reference to users from other tables should refer to this value. I have created a rule

Rule classification

looking for numeric columns whose values match the list of ids from the source table (SELECT id FROM glottery.glo.users WHERE id<>0 AND id<>1). The WHERE clause omits the values 0 and 1, which can be logical values in some referential tables. I have set the Hit Percentage at a very high level, 98%, to ensure a real relationship between the analyzed object and the users table.

The results clearly show that the users table is referenced in 6 other tables

Classification results

Summary:

Guardium provides many different techniques to identify sensitive data, and a good implementation relies on that. If we know where critical data resides, the real-time policies, correlation alerts and SIEM events will work correctly and point to real threats.

Classification process setup flows

from scratch – each element is created separately, and broader elements can invoke more specialized tasks. Useful for people with good Guardium skills; it allows configuring all existing classification features (Discover->Classification->Classification Policy Builder, Discover->Classification->Classification Process Builder)

end-to-end – a streamlined process which facilitates the creation of the classification process and its automation. Some features are not available, but they can be edited later using the first scenario (Discover->Classification->Classification Sensitive Data)

In the rule view, insert its name and select Search for Data from the Rule Type list

Rule definition

this will refresh the view; then put the following pattern in the Search Expression field:

^[0-9]{16}[ ]{0,20}$

which is a simple representation of a credit card number (16 digits, followed by a maximum of 20 spaces). Then save the rule using the Apply button

Rule definition

we will return to the rule list with the newly created one. Close the pop-up window. The newly created policy is not refreshed in the process view, so we need to reopen the process creation window. Select Discover->Classifications->Classification Process Builder again, enter a name, select our policy – Find CC in Tables – and press the Add Datasource button

Policy definition

another pop-up window – Datasource Finder – displays the list of existing database definitions. Use the + icon to add a new one