Access Control List(ACL) is very common terminology and not confined precisely within OBIEE. Several software use ACL to define security object permissions.An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed on given objects. We have seen ACL when we edit Objects permission or open Catalog object using Catalog Manager. I have noticed “preserve ACL” in OBIEE 10g Catalog Manager if you want to apply permission across objects. Also noticed ACL again on 11g while we embarked a very strange/weird and complicated issue.

We have OBIEE 11g .7 version 140415 running on production since launch past 5 months. Recently couple of customer raised concern about page visibility for different kind of subscription. We have customer could use multiple subscription which can be switched from a portal and based on that switch specific page should have to be visible. However we have experience a issue when a user logged in and seeing Dashboard A and logout and relogin again with different subscription Dashboard A still visible while Dashboard B should have to be visible.

Initially how you attack this problem ? I know what you are thinking of but we have very complicated system with several layers before you land to dashboard page. We have One portal which redirects to IDP (performing TCping Authentication) ,then control goes on Horizontal Cluster nodes via Load Balancer and then control goes to Analytics. So there are 3 way channels before you are landing to OBIEE page. On top of this we have deprecated 11g security system running initialization block. I know this is damn bad and Oracle recommended but we have to live with the crap because Oracle said this can still be functional. So in summary we have no Application Roles / Groups setup on EM and Console and all object based permission defined under Catalog is legacy Catalog Group permission.

So there are various aspect and point we need to carefully verify before we attack this problem. I started debugging this issue from Catalog Object level permission first , nothing seems weird but did little bit tweak here and there and it didn’t resolve the issue.

We thought it could be SSO Ping Auth issue which is bypassing the RPD security and not getting GROUPS variable populated right way. So we added WEBGROUPS and ROLES system variables in RPD and enabled all level of trace on EM and Console on oracle bi security objects including the most granular level of LOGLevel trace in instanceconfig.xml file.

We have found after logout of initial session and login new session is not picking up the permission correctly and it retains the old permission on object.

There are security masking logic on Read , Write and All permissions in each level of Catalog object while can be traceable in sawlog file which clearly depicts the issue ! But unfortunately we don’t have smoking gun yet to resolve the problem.

A search in OTN and googling around doesn’t seem to be matching to right use case ! this was a frustrating issue which bother for 4 days until actual fix has been found and Finally we have isolated our environment from SSO , Cluster , Load balancer everything to segregate the root cause of the issue. We came to know its OBIEE 11g issue for sure. We have tried several areas of tweak to fix this issue mostly on file system level , cleaning cache , clean volatileuserdata, removing cacheduserinfo, removing that user content and re-instate, GUID refresh and nothing works and we were almost out of ideas what to do next 😦 😦

FINALLY there is an idea triggered on my mind if there is any configuration can be done on refresh level which could potentially refresh dashboard object including the permission could resolve the issue. And I realize only Instanceconfig.xml is a place where you can declare on such parameter. So in search of some parameter I found below:

This will reload catalog object permission on 1 minutes interval . So potentially this could fix my issue.

And it Voila… it fixed my problem … Now right user getting correct permissions on multiple subscription switch. But still its not end of the game ! WHY ? Yes, we have seen catalog object refreshing in 1 minutes interval but my problem reoccur if multiple login/logout happens within 1 minute of interval …. Sad !! So this is not going to be a solution. Also reload catalog object frequently is not an good idea as SAW server does a hard IO work each time this happens to crawl across catalog and refresh object permission. Also there is no instanceconfig.xml parameter exist to lowered it down to seconds level 😦

Adding this parameter is as good as “Reload server metadata trigged” by Admin from analytics Admin link. So I was in search how to invoke this automatically rather than required on demand.

Investing more energy on that reveals what is this object permission and seems closely related to ACL and we came to know below details how OBIEE handle the users in Catalog:

The catalog is designed to scale to thousands of concurrent users. To achieve this scaling, the catalog adheres to the following guidelines:

The average user typically only reads from the catalog and rarely, if ever, writes to it. In Release 11g, each user is constantly and automatically updating his or her Most Recently Used file, but each user’s “read” operations still far outweigh the user’s “writes” operations. Therefore, the read-to-write ratio is typically at least 100 to 1.

While a locking mechanism guarantees that only one user can write to an object at a time, it is rare for multiple users to attempt to write simultaneously to the same object. A feature called “lazy locking” allows users to continue reading an object even when another user is updating that object.

Modern file systems cache “small” files directly inside the directory record, such that reading any information on a directory simultaneously loads all small files directly into the operating system’s memory cache. Therefore, it is good practice to keep files in the catalog “small,” especially the frequently “read” .atr metadata files. When these metadata files remain small, then all the .atr files in a directory are loaded into memory with one physical hard disk read. Every file that exceeds the “small” threshold adds another physical hard disk read, which can cause a 100% degradation for each large file. In other words, use care when considering storing arbitrary “Properties” in .atr files.

Reading an object’s .atr metadata file using NFS is far slower than reading it directly from a local disk. For this reason, Presentation Services additionally caches all .atr files internally. This cache can become briefly “stale” when another node in the cluster writes data to the file that is newer than the data that is cached by the current node. Therefore, all nodes are refreshed according to the MaxAgeMinutes element in the instanceconfig.xml, whose default for a cluster is 5 minutes. This default setting commonly achieves the best trade-off between the possibility of stale data and the known performance impact. (The default for an environment without clusters is 60 minutes.)

Ours are pretty similar configuration of having millions of .atr file (as we have very large user base in production) in NAS storage continuously accessed by Sawserver catalog crawler.

Looking much on ACL on OTN reveals not same but similar kind of issue and I came to know about bunch of very important config parameters listed at the end of this blog.

Now I have full fledged smoking gun which could potentially resolve my issue which is below:

So I applied in instanceconfig.xml to see the results and unfortunately sawserver was not started as this is unrecognized tag ! When I see OTN carefully and I found this can be added only 140715 version on wards and not less . So we have asked Oracle to provide patch for 140415. After following up Oracle with Sev 1 issue with several escalation we finally got the fix . We applied it , add this params in instanceconfig and it resolves the issue…. 🙂 🙂 Oracle provided the fix with 3 files , one xsd and 2 saw binary files (.so files) . So the long running nightmare is over !!! 🙂 I am sooo happy…

So we know NOW what is ACL Cache and how it could potentially behave weird and cause lot of trouble to our life and we know what should be the way to get rid of the problem around it. See below some of the other important reference on Cache under instanceconfig.xml file. Recommend is if you Disable ACL cache which Oracle confirmed has no performance impact.

<!– this cache is NOT related to the security stack It is only for the account info in the webcat itself –><Accounts><CleanupFrequencyMinutes>3</CleanupFrequencyMinutes><MaxAgeMinutes>3</MaxAgeMinutes></Accounts>

<!– this cache is NOT related to the security stack It is only for the account info in the webcat itself –><AccountIndex><CleanupFrequencyMinutes>3</CleanupFrequencyMinutes><MaxAgeMinutes>3</MaxAgeMinutes></AccountIndex>

<!– This CatalogAttributes cache contains all webcat objects’ metadata (the owner, permissions, properties, attributes etc).The internal permissions have a their own cache as to who is allowed to access the webcat object.

So to cleanup the internal permissions cache, this CatalogAttributescache must also be cleaned. –>