The following blog discusses Portal v8
installation best practices, things to avoid and some general
information:

General Information on IIM:

IBM Installation Manager (IIM) replaces the ISMP installer and Portal Update Installer (PUI) and manages the life cycle for WAS 8 and WP 8. IIM can install and uninstall core products, update or roll back fix packs and fixes, and modify or remove features.

IIM version 1.5.2 or higher is required.

Silent install supported on all platforms. GUI mode supported ONLY on Windows/Unix platforms. Command line supported ONLY on z/OS.

You can create a response file without actually installing the product, either by manually editing a sample response file that comes with the media or by recording one using IIM.

IIM is composed of a binary directory (from where you run the product) and a data directory (logs are recorded here).

IIM cleans up everything after a failed install. To pause IIM on a failure, create a log.properties file with debug lines. This allows time to collect logs.

When viewing IIM logs for a Portal 8 install, use a browser to open the index.xml file from the IIM logs location for better formatting of the logs. You will need the log.xsl file in the same directory.

To uninstall IIM, use IM_DATA_DIR/uninstall/uninstall for GUI mode or IM_DATA_DIR/uninstall/uninstallc for console mode.

Any content offering can be installed to an existing WAS. Express MUST install its own WAS.

Using an advanced configuration, you can change options such as context root, default home, profile name, etc.

Content offerings (Enable, Extend, WCM and WCM SE) are extended offerings of WP 8 Server and cannot be installed without a WP 8 Server.

WAS ND binaries cannot be shared between multiple WP 8 Server offerings.

Portal v8 Regular Offerings (Server, Enable, Extend, WCM, WCM SE)

Best Practices:

A repository contains the required parts for installation. For Portal, one needs WAS, Portal, content offering, and Setup repositories.

Single set of Portal binaries for each content offering (Server, Enable, Extend, WCM, etc.)

Provides up-sell from Server to WCM offering without reinstall

Select all required iFixes for WAS 8.0.0.3 when installing Portal 8

Extract all electronic images to the same directory.

The Setup disk is unique to each offering. It's different for Server, Enable, Extend, etc. If installing all at the same time, make sure to have the correct assembly.

There is no tracing or logging with launchpad.

Always choose the repository.config from the setup disk (<Setup_Disk_location>/eimage) rather than the repository.config from the individual offerings (WAS, Portal, Portal Offering). When the setup disk's repository.config is chosen, there are no prompts for individual disks.

Every now and then, users with the necessary access (authors, editors, etc.) accidentally restart the workflow on content. This can have unintended consequences, often causing the content item to go through the whole workflow process again before it is published. The following property setting controls access to the button and hides it from users who would otherwise be able to invoke it.

workflowrestart.requires.manager = true

Set this in wcmconfigservice.properties or via the WCM WCMConfigService service.

Notes on the setting: Setting this to true means editors will not see the button; only managers and above will. This means the user will need delete access to restart the workflow. It also adds 'restart workflow' to the actions that can be hidden in the authoring template. This gives more control over the restart workflow button, allowing it to be hidden from authors and approvers.

I would like to give a high-level overview of the various programming entry points that WebSphere Portal provides to give you access to the internals of the Portal, both to make changes and to request information about various aspects of the Portal.

The first thing to understand is that you will almost always have to get a home object, which represents the starting point for interactions in that package. For example, if you wanted information on access control you would use a JNDI lookup to retrieve AccessControlHome, or PumaHome if you wanted information through the Portal user management interface. Once you have the Home, you will need to get either the Model or the Provider object, depending on the needs of the code. Next we will look at the most common packages and their uses.

com.ibm.portal.ac: This package contains access control information for various resources. You can check the ACL data on any item that supports the Identifiable interface (which is almost everything in Portal), including pages, portlets, and containers. You can pass in a resource and either get the entire role data set or check whether certain users, or the current user, have a specific permission. You could use this to customize the visible data in your portlets and on your pages, depending on what is needed for the environment.

com.ibm.portal.auth: This package is useful for customizing the Portal login/logout mechanism. The customizations are only triggered when you try to access a Portal protected resource. They replace the customization that used to be made in the LoginUserAuth classes and are now added to the Portal login filter chain. One thing to be aware of is the difference between implicit and explicit login/logout. An implicit login is where a user is already authenticated to WebSphere Application Server but then attempts to reach a Portal resource. They will then go through the Portal implicit login chain.

com.ibm.portal.model: This package has many providers and will probably see the most use of all the packages. With it you can retrieve a list of the languages the Portal supports, the markups the Portal supports, and information about navigation, content, navigation selection, portlets, themes, and skins, as well as helpers to localize resources when presenting them to users. All of these models are read-only, while the subpackage com.ibm.portal.model.controller gives you write access to all of them. All of them are scoped to the user making the request. With the controller package you can create pages, add portlets to a page and remove them, set skins on a page, and alter properties of all of these, including ordinality. A developerWorks article from support is planned to show examples of the most common questions about using the controller SPI.

com.ibm.portal.navigation: The providers in this package allow you to traverse the navigation tree of the current user. They can be useful for creating breadcrumbs and your own custom navigation portlets. Together with the com.ibm.portal.state package, they can be used to create links that perform a wide variety of actions within Portal. Please see Advanced URL helpers for a detailed sample of using the state package to create links to various Portal resources.

com.ibm.portal.portlet and com.ibm.portal.portletmodel: These two packages provide a lot of information about the various layers of portlets. They can be used to make changes to the portlet configuration and can also retrieve the preferences layers for review.

com.ibm.portal.um: Also known as PUMA, this is the user management part of the SPI. With it you can do user management and queries. For more on PUMA, please see the PUMA whitepaper.

This covers the majority of the SPIs that customers use, and can do 90% of what you need to do in your custom Portal environment for management and administration.

Introduction
HTTP is a stateless protocol. Many web designers have utilized a combination of cookies, parameters in URLs, and other techniques to preserve state information for users visiting their websites. As part of its design, WebSphere Portal simplifies dealing with the stateless nature of HTTP by automatically encoding state information into its URLs. WebSphere Portal also encodes navigation information into its URLs (e.g. language, back button behavior, etc.). Decoding techniques do exist to decipher the URLs, but by default you cannot read a WebSphere Portal encoded URL in plaintext and easily determine what information it contains.

An unfortunate side effect of encoding all of this information into the URLs is that WebSphere Portal URLs tend to be long and ugly out of the box. For example, if you access the default Portal page by its stateless URL - e.g. http://myportalsite.ibm.com:10039/wps/portal, this is automatically redirected to a stateful URL such as:
http://myportalsite.ibm.com:10039/wps/portal/!ut/p/a0/04_Sj9CPykssy0xPLMnMz0vMAfIj83Kt8jNTrMoLivV88tMz8_QLsh0VAVTAxWw!/

This article will discuss techniques to keep the Portal URLs short and clean.

2015.10.30 EDIT: Portal 8.5 cumulative fix 08 has introduced several configuration options which simplify creating stateless URLs. See the 8.5 Infocenter for more details. This blog entry will be updated in the near future to discuss how the new parameters introduced in CF08 interact with the other configuration settings noted in this blog entry.

Out of the Box Settings

For WebSphere Portal 6.1 and later, the out of the box settings will generally result in stateful URLs. There are two primary factors that determine whether or not a stateful URL will be displayed:

1) The type of page - standard page or static page
2) The type of URL - a URL mapping, a Friendly URL, or a Vanity URL (8.5 and newer only)

For #1, Portal 6.1 and earlier utilized standard pages for its default page type. These were shipped with pre-defined page layout templates provided by IBM code, and once portlets were added to the page they largely didn't change. Over time it was determined that a new design was needed to allow for more flexibility, hence the use of static pages. Portal 7.0 and later utilize static pages for the default page type. Static pages offer many advantages over standard pages. One in particular is "You can include portlets as dynamic elements and containers as placeholders for portlets in your pages. You can display these portlets by using server side aggregation, AJAX". The general recommendation would be to utilize static pages in your WebSphere Portal design.

For #2, Portal 6.0 and earlier offered URL Mappings as a short and clean means of managing URLs. URL Mappings were not dependent on the Portal page hierarchy; hence, a URL Mapping of /wps/myportal/thisblogentry may resolve to /wps/myportal/US/en/IBM/Industry/Cloud/Solutions/WebSphere/Portal/Support/This/Blog/Entry. Over time, management of URL Mappings becomes difficult for large Portal sites. Portal 6.1 began offering a new feature called Friendly URLs. Friendly URLs are dependent on the Portal page hierarchy and as a result allow for simplified management of URLs. URL Mappings and Friendly URLs are for the most part mutually exclusive: you should try to design your Portal site with one of the two, but not both, and the recommendation would be to utilize Friendly URLs given that URL Mappings are deprecated as of Portal 8.5.

Finally, in Portal 8.5 a new type of URL, called a vanity URL, was introduced. Vanity URLs are a complement to Friendly URLs. Friendly URLs are still available and are dependent on the site hierarchy, while vanity URLs are not. Using the same example above, a vanity URL of /wps/vanityurl/thisblogentry could be created that would resolve to an actual page in the Portal page hierarchy. Vanity URLs are advantageous in cases where we want a VERY short and clean URL that does not contain the entire page hierarchy information, as a Friendly URL does.

For purposes of this blog entry, we will assume every page in your page hierarchy has either a Friendly URL or URL Mapping to allow for stateless URLs. If a page in your Portal site has neither a Friendly URL nor a URL Mapping and you implement the steps below, some pages may no longer be accessible.
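The difference between a hierarchy-dependent URL and a hierarchy-independent one can be sketched in a few lines. The following is a minimal Python sketch, not Portal code: the page tree, the lookup table, and both function names are hypothetical. It only illustrates that a Friendly URL is resolved by walking the page hierarchy, while a URL Mapping or vanity URL is a flat lookup that does not care where the target page sits.

```python
# Hypothetical page hierarchy: each key is a friendly name, each value holds child pages.
PAGE_TREE = {
    "Home": {
        "Products": {"Portal": {}},
        "Support": {},
    }
}

# Hypothetical flat table, in the spirit of URL Mappings and vanity URLs.
# The key is independent of the target page's position in the hierarchy.
FLAT_LOOKUP = {
    "thisblogentry": ["Home", "Products", "Portal"],
}


def resolve_friendly(path):
    """Resolve a Friendly URL path by walking the page hierarchy level by level."""
    node = PAGE_TREE
    resolved = []
    for segment in path.strip("/").split("/"):
        if segment not in node:
            return None  # no page with this friendly name at this level
        resolved.append(segment)
        node = node[segment]
    return resolved


def resolve_flat(name):
    """Resolve a URL Mapping or vanity URL with a single flat lookup."""
    return FLAT_LOOKUP.get(name)


print(resolve_friendly("/Home/Products/Portal"))
print(resolve_flat("thisblogentry"))
print(resolve_friendly("/Home/Portal"))
```

In this toy model, moving a page in the tree changes its Friendly URL automatically, whereas every flat entry pointing at it has to be maintained by hand, which is the management burden described above for large sites.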

Returning to our stateful vs. stateless discussion, we have 2 * 3 = 6 possible combinations above. With the out of the box settings, here is what should happen:

Note in particular that the fifth combination - Static Page + Friendly URL - is the recommended setting for WebSphere Portal. While an end user may be able to bookmark a short and clean URL, e.g. /wps/myportal/Home/somefriendyurl, WebSphere Portal by design will automatically 302 redirect the user to the stateful URL. The page will render successfully, but the URL in the address bar looks long and ugly. In the next section we'll discuss how to modify that behavior so it stays short and clean.

Primary references

1) The Portal 8.0 and 8.5 Infocenters offer some excellent details on how to make Portal stateless. However, there are multiple settings noted in the Infocenter and not all settings may be required for all Portal deployments. We'll explore each setting one by one.

*Author technical note: I had to manually create a new URL mapping via an XMLAccess script for the Standard Page + URL Mapping test.

Change #2 - friendly.redirect.enabled

2) Our second setting of interest is friendly.redirect.enabled noted on the following Infocenter page. To summarize the effect this setting has on your Portal environment:

- true (default) = if a request comes in on a stateful URL WITHOUT Friendly URL context, a redirect is sent to the URL with the Friendly URL. hasBaseURL will then determine whether it stays stateless or redirects to Friendly URL + stateful --> GOOD for Friendly URLs

- false = no redirect is sent: "a request for a URL mapping will be served an HTTP 200 and the page content rather than a 302 redirect to the URL with navigational state encoded" --> GOOD for URL Mappings

Portal 85x environments should be moving away from URL Mappings. However, if your environment was migrated, you may still have some URL Mappings in place and wish to set this to false. I left this setting at its default in my 8.5 environment, given that I am not using URL Mappings and wanted the remote possibility of an end user coming in from a stateful URL with friendly URL context to be handled. However, this setting also affects 302 vs. 404 behavior in Portal, in addition to stateless vs. stateful behavior, and you may wish to set it to false in a system with Friendly URLs only and no URL Mappings. See this Technote for more details.

Change #3 - URL generation in themes

3) Our third setting involves the navigation.jsp files. OK, so what exactly are these things if the tests in #1 above resulted in stateless URLs? The tests in #1 assume the user is coming into the Portal via a link sent by email, a bookmarked link, etc. If the user is already IN the Portal and navigating around by clicking the navigation at the top/sides of the page, this is where the navigation.jsp files come into play.

For example: If I manually enter my friendly URL, it retains this same friendly URL, short and clean, in the address bar:
http://myportalsite.ibm.com:10039/wps/myportal/Home/Welcome

However, if I hover over the same link at the top of the page and click it - I observed the following result:
http://myportalsite.ibm.com:10039/wps/myportal/Home/Welcome/!ut/p/z1/04_Sj9CPykssy0xPLMnMz0vMAfIjo8ziDVCAo4FTkJGTsYGBe7CBfjheBf5G-lFAaX9_H1d3I38DbwNHQzcDR2dfQ3_3EC9DAxcjrPqRTSJOPx4FUYTcH4VXCcgHWBSgOLEgNzQ0wiDTEQDA0a2H/dz/d5/L2dBISEvZ0FBIS9nQSEh/?uri=nm%3Aoid%3AZ6_00000000000000A0BR2B300GS0
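When testing changes like these, it can help to check mechanically whether a URL you ended up on still carries encoded state. The following is a minimal Python sketch under one loudly stated assumption: that the "/!ut/p/" path segment marks the start of the encoded state, which is based only on the two example URLs shown in this blog entry. The function name split_state is hypothetical.

```python
from urllib.parse import urlsplit

# Assumption taken from the example URLs in this entry: the encoded
# navigational state starts at this path segment.
STATE_MARKER = "/!ut/p/"


def split_state(url):
    """Return (clean_path, has_state) for a URL.

    clean_path is the path with everything from the state marker onward removed.
    """
    path = urlsplit(url).path
    index = path.find(STATE_MARKER)
    if index == -1:
        return path, False
    return path[:index], True


print(split_state("http://myportalsite.ibm.com:10039/wps/myportal/Home/Welcome"))
print(split_state("http://myportalsite.ibm.com:10039/wps/portal/!ut/p/a0/04_Sj9/"))
```

A check like this could be run over the links collected from a rendered page to see which navigation elements still generate stateful URLs.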

For purposes of this blog entry, I edited these files in the Portal 85 default theme. HOWEVER, cumulative fix installation may overwrite these files, and it is NOT RECOMMENDED to change the default Portal 85 theme files. You should copy the theme and modify the copy per your business requirements, implementing the navigation.jsp changes as needed. Once the files were edited, my navigation.jsp looked like the following:

4) The final consideration is WCM content. Short and clean URLs can be created to reference WCM content. That is a task typically associated with the site design of WCM. However, WCM URL generation does not generate friendly URLs. Just as we needed to update the themes in item #3 above to prevent the long and stateful URLs from showing, a similar change will be needed for WCM URL generation.
"
With friendly URLs for web content, you can construct URLs to content items that are clear and concise. Although you can construct friendly URLs that reference web content items, IBM® Web Content Manager itself does not generate friendly URLs by default. However, to cause the web content viewer to generate friendly URLs, you can create a plug-in that implements a content URL generation filter.
"

WCM URL generation unfortunately requires a bit more custom coding than updating a few jsp files. This custom coding was not performed for this blog article. Further, the requirements of the sample code provided note:

"
To use the sample filter that is described here, each page for which a content URL is generated must have these characteristics:

The page must be a web content page with a friendly name.

The page must have a default content association that references the parent site area of the content item.

"

...which may not be applicable in all environments. Also note you must have Portal/WCM v8 or higher to leverage this feature. If you are interested in this feature, please see this article for a step-by-step implementation.

Final Thoughts

- The steps outlined in the Infocenter and this blog entry should be considered a starting point for configuring stateless URLs. As a quick example, the login and logout links in commonActions.jsp will generate stateful URLs by default as keepNavigationalState="true" is preserved.

...this would also need to be updated, and you would need to weigh the consequences of losing navigational state information, e.g. returning the user to the same page they were on prior to logging in, should you change this setting to false. Many other examples exist, and each must be weighed to decide whether to keep or remove functionality.

- Carefully consider which area(s) of the Portal site you wish to have stateless URLs. For those areas, it may be advantageous to create a separate theme with the updated keepNavigationalState="false" jsps and the hasBaseURL parameter in that theme.

- Remember: WebSphere Portal in its design assumes stateful URLs and completely eliminating all stateful URLs will require some significant effort. However, use of Friendly URLs with the configuration options noted above to keep 95%+ of your Portal site clean and consistent is a very real and achievable goal.

The Enterprise edition of Editlive 9 has a Track Changes feature that works much the same way as in many generic document editors. WCM 8001 CF11 enables the enterprise version of Editlive. However, the Editlive editor in WCM does not show the Track Changes feature, as it appears to be disabled.

The Track Changes feature is disabled and hidden by default out of the box in WCM 8.0.0.1 CF11. This is because the feature adds extra content to the markup. This means that any information added as comments will be viewable in the source when the content is rendered. Please be aware of those implications and consider them carefully before enabling the feature.

To enable the Track Changes feature:

A. Copy the ephox folder and the contained editlive_config.xml.jsp file from <WAS>/PortalServer/wcm/prereq.wcm/wcm/config/templates/shared/app/config to the <wp_profile>/PortalServer/wcm/shared/app/config folder

You may have been using either the Search and Browse portlet or a customized search portlet that made use of the searchandbrowse.jar in Portal 7 and/or 6.1.

Beginning in Portal 8, the Search and Browse portlet is no longer available. The inherent features and capabilities of the Search and Browse portlet are now included in the Portal 8 Search Center portlets and page. Refer to the following pages and documentation for more information:

"Starting with Version 8 of WebSphere Portal the Search and Browse
portlet provided with earlier portal versions is no longer available.
The Search Center has been enhanced with advanced search options
previously available in the Search and Browse portlet. Categorization
and taxonomy are no longer available."

In Portal 6.0, themes and skins were typically deployed in the wps.ear application. This included any customized themes and skins for your Portal site. The wps.ear application, for all intents and purposes, is the primary application for WebSphere Portal that runs on top of WebSphere Application Server. Updating a customized theme or skin in Portal 6.0 was often an effort that had to be scheduled during maintenance windows, because an update to the wps.ear file was required. Thankfully, newer versions of Portal no longer require you to include your customized themes and skins in the wps.ear application. Instead, you may deploy such themes and skins in your own application on the Portal server, independent of the wps.ear application.

However, there are occasions in newer versions of Portal where updates to the wps.ear application may still be required. For this blog entry, we will focus on the following APAR:

http://www-01.ibm.com/support/docview.wss?uid=swg1PM23905

The APAR itself introduces new code into the WebSphere Portal product. The new code changes the default behavior of the WebSphere Portal if activated. The new code is NOT activated by the action of installation of the APAR itself. Instead, manual changes must be made to a file inside of the wps.ear application, and the wps.ear application must be redeployed with those changes for the new code to become active.

Steps to update wps.ear for APAR PM23905
0) If in a cluster, execute these steps from the Deployment Manager.

1) Navigate to the /AppServer/bin directory

2) Run the wsadmin command to export the wps.ear file to a temporary directory (I used /tmp)

7) After the very LAST </filter> and before the very first <filter-mapping>, i.e.

Add the following between these two lines:
<filter-mapping>
<filter-name>Home Substitution Filter</filter-name>
<url-pattern>/portal/*</url-pattern>
</filter-mapping>

resulting in:

8) Save changes.

9) Portal v8 only: Open the /tmp/wps_expanded/wps.war/WEB-INF/web_merged.xml file in a text editor. Repeat steps 6-8 in this file.

10) Backup the wps.ear file in the /tmp directory. Thereafter, delete the wps.ear file in the /tmp directory.

11) Run the EARExpander utility to collapse the files in /tmp/wps_expanded (including the web.xml just updated) into a new wps.ear file:
./EARExpander.sh -ear /tmp/wps.ear -operationDir /tmp/wps_expanded/ -operation collapse

*Note: This command will NOT impact the currently running Portal Server loaded into server memory. However, any files contained within the wps.ear file that are referenced by the Portal Server may be impacted. Thus, it is recommended to run this command during a maintenance window to minimize potential impact.

13) If in a cluster, sync nodes. Wait 5-30 minutes for the new wps.ear to be deployed across the cluster.

The original article was targeted primarily towards WebSphere Portal 6.1. Additional actions are required for WebSphere Portal 8.0. In particular, both the web.xml and web_merged.xml files for wps.ear must be updated for changes to go into effect. (Thanks Thomas for the tip!)
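The edit in step 7 (placing the new filter-mapping right after the last filter element) can also be scripted, which helps when the same change has to be repeated in web_merged.xml. The following is a minimal Python sketch using only the standard library. The sample document and the "Existing Filter" entries are hypothetical stand-ins, not the real wps.ear web.xml; only the "Home Substitution Filter" name and the /portal/* pattern come from the steps above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for a web.xml file.
SAMPLE_WEB_XML = """<web-app>
  <filter><filter-name>Existing Filter</filter-name></filter>
  <filter><filter-name>Home Substitution Filter</filter-name></filter>
  <filter-mapping>
    <filter-name>Existing Filter</filter-name>
    <url-pattern>/myportal/*</url-pattern>
  </filter-mapping>
</web-app>"""


def add_filter_mapping(root, filter_name, url_pattern):
    """Insert a <filter-mapping> directly after the last <filter> child of root."""
    children = list(root)
    # Position of the last direct <filter> child; raises ValueError if there is none.
    last_filter = max(i for i, child in enumerate(children) if child.tag == "filter")
    mapping = ET.Element("filter-mapping")
    ET.SubElement(mapping, "filter-name").text = filter_name
    ET.SubElement(mapping, "url-pattern").text = url_pattern
    root.insert(last_filter + 1, mapping)
    return mapping


root = ET.fromstring(SAMPLE_WEB_XML)
add_filter_mapping(root, "Home Substitution Filter", "/portal/*")
print([child.tag for child in root])
```

A real web.xml usually declares an XML namespace on the web-app element, in which case the tag comparison would need the namespace-qualified name. Treat this as a starting point and always keep the backup from step 10.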

WebSphere Portal versions 6.1 and later leverage a number of out of the box caches to improve system performance. When configured to an enterprise LDAP, two specific sets of caches are leveraged to cache data from the LDAP:
- Portal User Management Architecture (PUMA) caches
- Virtual Member Manager (VMM) caches

Rather than going to the LDAP every single time there is a need to access LDAP data, caches will instead be used, significantly reducing load on the LDAP server. A typical flow would go something to the effect of:

-->If cached and not yet expired, it will return the data to the portlet.
-->If not cached, OR, if cache has expired, move onto next stage in flow

4) PUMA will query VMM for the data
*Technical note: Portal itself never directly talks to the LDAP during runtime. In a standalone LDAP configuration, both WAS + VMM code are used to talk to the LDAP. In a federated LDAP configuration, only VMM code is used to talk to the LDAP. The key point is that during runtime Portal itself never talks to the LDAP directly; it always goes through another layer.

5) VMM checks VMM caches to see if LDAP data is cached and not yet expired

-->If cached and not yet expired, it will return the data to PUMA. PUMA will update its caches with the data from VMM and return the data to the portlet.

-->If not cached, OR, if cache has expired, move onto next stage in flow

Both caches operate independently of each other, so it is entirely possible that one cache could time out while the other cache is still active.
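The lookup flow above, with two independently expiring caches in front of the LDAP, can be sketched as follows. This is a minimal Python model, not Portal or VMM code: the TTLCache class, the lookup function, the timeout values, and the dictionary standing in for the LDAP are all hypothetical. A fake clock is used so the expiry behavior can be shown without waiting.

```python
import time


class TTLCache:
    """Tiny time-to-live cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or self.clock() >= entry[1]:
            return None  # not cached, or cached but expired
        return entry[0]

    def put(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)


def lookup(uid, puma_cache, vmm_cache, ldap, trace):
    """Check the PUMA cache, then the VMM cache, then fall through to the LDAP."""
    value = puma_cache.get(uid)
    if value is not None:
        trace.append("puma")
        return value
    value = vmm_cache.get(uid)
    if value is None:
        value = ldap[uid]  # only now is the directory actually queried
        vmm_cache.put(uid, value)
        trace.append("ldap")
    else:
        trace.append("vmm")
    puma_cache.put(uid, value)  # PUMA refreshes its cache with what VMM returned
    return value


now = [0]  # fake clock, in arbitrary time units
puma = TTLCache(10, clock=lambda: now[0])  # hypothetical timeouts
vmm = TTLCache(30, clock=lambda: now[0])
ldap = {"travis": "travistest"}
trace = []
for t in (0, 5, 15, 40):
    now[0] = t
    lookup("travis", puma, vmm, ldap, trace)
print(trace)
```

With these made-up timeouts, the third lookup is a case where the PUMA entry has expired but the VMM entry has not, so the LDAP is still not contacted.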

Direct Updates to data in LDAP do not show in the Portal
While caches offer the advantage of reducing load on your LDAP server, the one disadvantage is that data in the Portal is not guaranteed to be as current as the data in the LDAP. That is, it is entirely possible that an update can be made to the data in the LDAP but not show up immediately in the Portal, as Portal will still be pulling the cached data from either the PUMA caches or VMM caches. Generally speaking, updates made to data in LDAP will be visible in the Portal after 30-60 minutes, once the caches expire.

One potential strategy is to modify the timeout values to a lower value. The following Technote discusses this strategy in more detail:
http://www-01.ibm.com/support/docview.wss?uid=swg21413947

As noted in the Technote, turning off both caches is NOT recommended. In addition to increased CPU/memory usage on the Portal server, you run the risk of overloading your production LDAP server(s), risking an enterprise outage of login services across multiple applications.

OK, so we know there is an advantage to the caches to help with performance, and there is also a disadvantage that the caches may contain stale data relative to what's in the LDAP. Is there a way to get the best of both worlds, i.e. maintain the caches in most cases, but pull the most recent data from the LDAP when needed?

The answer: yes, this is possible. APAR PM16430 was introduced to allow you to programmatically invalidate the current LDAP caches and refresh from the current data in LDAP. http://www-01.ibm.com/support/docview.wss?uid=swg1PM16430

Sample Code to programmatically refresh data from the LDAP
How do we implement the steps in PM16430? Let's go over some prerequisites before we get into specifics:

i. You should be in a federated LDAP configuration. Federated LDAP code is guaranteed to go through VMM (and VMM caches) every time. A standalone LDAP configuration with WebSphere Portal is not guaranteed to use VMM code every time. The APAR will still work with standalone LDAP code, but the recommendation is to have a federated LDAP configuration if you plan on implementing the APAR.

-->See this Infocenter document for how to convert from standalone LDAP to federated LDAP if needed: http://www-10.lotus.com/ldd/portalwiki.nsf/dx/Changing_from_a_standalone_repository_to_a_federated_repository_on_AIX_wp7

ii. You should already have code in place which leverages the PUMA API to query data from the LDAP, OR you should be willing to experiment with creating such code.
iii. Review the following blog entry to be familiar with using JSPs to implement sample code: https://www.ibm.com/developerworks/mydeveloperworks/blogs/PortalL2Thoughts/entry/debugging_your_portal_with_jsps_basics28?lang=en

To get started:
1) Locate the wimconfig.xml file under:
<WP_profile_root>/config/cells/<cellname>/wim/config/wimconfig.xml

In the file, locate the section which corresponds to the LDAP server whose caches you wish to programmatically invalidate. From my lab system, we have:

8) Upload the reloadall.jsp file to your Portal server under <wp_profile>/installedApps/<cellname>/PA_Blurb.ear/Blurb.war/jsp
9) Setup a test page with the Welcome portlet. Configure the Welcome portlet to point to the following jsp:
/jsp/reloadall.jsp

10) Allow the page to refresh. At this point, we are pulling data for the first time from the LDAP, so the information will be current / fresh.
11) Make a direct change in the LDAP to one of the users. For purposes of this article, I changed the "sn" attribute on the "uid=travis" user

from: travistest
to: travischanged

12) Now, at this point, the data in the LDAP is updated, and, the Portal/VMM caches are both stale. The first call we'll make to fetch the LDAP data will be pulling from these caches, and will contain stale/outdated data. The reload() call, now with PM16430 active, will invalidate the caches and pull the data fresh from the LDAP. The second call we'll make to fetch the LDAP data will still be pulling from the caches, however, the reload() call will have pulled fresh data into the caches, and, we'll now display correct updated data from the LDAP.

13) Enjoy! Feel free to implement within your own custom applications as needed.

*Author's technical note: The current code will reload all entries in the LDAP. This may not be a good test case if you have a large LDAP. If you require a smaller test case to begin with, you may update the lookup line of code along these lines: userList = pumaLocator.findUsersByAttribute("uid","travis*");
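The stale-then-fresh sequence from steps 10 to 12 can be illustrated with a small self-contained model. This is a Python sketch, not the PUMA API: the CachedDirectory class and its methods are hypothetical, and its reload method only mimics the effect described for the reload() call (invalidate the cached entry, then refill it from the LDAP).

```python
class CachedDirectory:
    """Toy stand-in for the Portal/VMM caches sitting in front of an LDAP."""

    def __init__(self, ldap):
        self.ldap = ldap  # dict: uid -> attributes, playing the role of the LDAP
        self.cache = {}   # dict: uid -> cached copy of the attributes

    def get_attribute(self, uid, name):
        # Serve from the cache whenever an entry exists, even if the LDAP has changed.
        if uid not in self.cache:
            self.cache[uid] = dict(self.ldap[uid])
        return self.cache[uid][name]

    def reload(self, uid):
        """Invalidate the cached entry and refill it from the LDAP."""
        self.cache.pop(uid, None)
        self.cache[uid] = dict(self.ldap[uid])


ldap = {"travis": {"sn": "travistest"}}
directory = CachedDirectory(ldap)

first = directory.get_attribute("travis", "sn")   # step 10: first fetch fills the cache
ldap["travis"]["sn"] = "travischanged"            # step 11: direct change in the LDAP
stale = directory.get_attribute("travis", "sn")   # step 12: still served from the cache
directory.reload("travis")                        # invalidate and refresh
fresh = directory.get_attribute("travis", "sn")   # served from the refreshed cache
print(first, stale, fresh)
```

Note that the final read still comes from the cache. It is only correct because the reload refreshed the cached entry first, which matches the description in step 12.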

Special Note for 85x and 8001 CF13+

WebSphere Portal originally used private internal APIs to WebSphere Application Server to achieve the cache clearing functionality for the VMM AttributesCache. Beginning in WebSphere 80x, VMM began to offer a public API for any WebSphere Application Server product to leverage. When Portal 80x was released, it was still using the default private APIs and not the public API.

For Portal 8001x, a new APAR has been created - PI19201 - which switches over from use of the private VMM APIs to the public VMM APIs. The APAR is functionally similar to PM16430 for cache invalidation. However, the APAR does NOT require the manual properties store.puma_default.wim.cacheinvalidation.enabled and store.puma_default.wim.cacheinvalidation.repositoryids. The properties were originally required for PM16430 because private APIs were in use, but they are not required for the public APIs. If you have these properties defined and you install 8001 CF13 or later, your reload() code will continue to function; the properties will simply no longer be used.

For Portal 85x, the public VMM APIs were built into the GA code. Therefore, the manual properties store.puma_default.wim.cacheinvalidation.enabled and store.puma_default.wim.cacheinvalidation.repositoryids are not required for 85x if you implement the reload() functionality.

Addendum to Original Posting
New bits of information will be periodically added to this blog entry.

1) Data stored in a property extension database will never be stored in VMM caches. However, it is possible for that data to be stored in PUMA caches. It is possible for another application (say the Deployment Manager) to make an update to the property extension data. Thus, it is possible for the same condition to occur: ApplicationA (DMGR) makes an update to the property extension database, while ApplicationB (Portal) is still actively using caches and may not retrieve the new data. Therefore, reload() should be used to invalidate the PUMA caches and force the updated data to be retrieved from the property extension database.

2) The APAR and reload() functionality does NOT apply to groups and listing members of a group. The results of those calls are stored in the SearchResults cache, whereas the APAR is only applicable to invalidating the AttributesCache. Therefore, if you make a direct update to a group in LDAP (say adding a new member to a group), then the Portal server is not guaranteed to pull this new data immediately. For example, if you search for the group name in the Manage Users & Groups portlet and click on the group, it may show outdated information relative to what's in the LDAP.

However, for access control purposes, such as a user's ability to see pages and portlets based on which LDAP groups they belong to, there is a potential workaround in place via a membership attribute (ibm-allGroups for Tivoli Directory Server, memberOf for Active Directory, etc.). In general, membership attributes offer huge performance gains during user logins, and we recommend implementing a membership attribute if your LDAP server supports it. Normally, without a membership attribute, we query the LDAP for the groups and their members one at a time, eventually determining which groups the individual user belongs to. Think of this as a "Group" operation, similar to the example above with the Manage Users & Groups portlet. With a membership attribute, a single query is made to the LDAP server to pull which groups the user belongs to. Think of this as a "User" operation, and note the particular terminology of the word "attribute". User + attribute means the membership attribute is stored in the AttributesCache, and therefore PM16430+reload()+a membership attribute will allow us to pull updated group information for a user, in addition to the other updated attribute information. Using our example above, if we call reload() for this user immediately after they are added to a group in the LDAP, and we have a membership attribute configured, their updated groups will be successfully pulled from LDAP. Note that due to other Portal Access Control caches in Portal, the changes may not show immediately. If you have a requirement to have such changes show up immediately, please open a PMR with IBM Support, referencing this blog entry and an interest in "PAC invalidation API".

3) WebSphere Application Server has its own code which can be used to pull a user's groups from the LDAP. WebSphere Application Server also maintains a separate set of caches for this information. WebSphere Portal offers a configuration option to allow Portal to reuse group information from WAS. This option is disabled by default, so by default Portal does not reuse group information from WAS. If the option to reuse WAS group information is enabled, then PM16430+reload()+a membership attribute will still refresh information from LDAP in the Portal and VMM caches, but not necessarily in the WAS caches. In this situation, the WAS caches which store group information for a user will be outdated, but the Portal PUMA and VMM caches will be current. However, because Portal is configured to reuse group information from WAS, it will perform this action prior to checking its own PUMA caches or VMM caches. Therefore, the WAS caches may contain stale data and Portal will reuse that stale data. More details on this condition are noted in the following Technote: http://www-01.ibm.com/support/docview.wss?uid=swg21593268

If this condition occurs, a decision will need to be made weighing the need to reuse WAS group information against implementing the functionality offered by PM16430+reload(). If the PM16430+reload() functionality is desired, then disable group reuse as noted in the Technote. If reuse of WAS group information should remain enabled, consider adjusting the timeouts on your caches as noted further up in the original blog entry under the subheading "Direct Updates to data in LDAP do not show in the Portal".

4) Portal 8001 versions CF07-CF12 do not work with PM16430. PUMA caches will invalidate with the reload() call, but the VMM AttributesCache will not invalidate, leading to stale data being returned. A newer cumulative fix that includes PI19201 must be installed to restore the invalidation functionality for the VMM AttributesCache.

WebSphere Portal leverages a large number of caches throughout the product. Caches offer many advantages to the WebSphere Portal product - most notably they allow for significant performance improvements when the Portal server is under load. When users log in to the Portal server, log out, and then log in again, they may not see the most recent data available from the LDAP server. This is discussed in detail in the following blog entry - How to programatically refresh data from LDAP.

This blog entry extends the information contained within that blog entry. While implementing the reload() code within a JSP is interesting as a proof of concept, in practice we need to implement the reload() code in a location where it will be useful. With most implementations of Portal, login is the best place to call reload(). For example, a typical requirement is: I want the user to have the most up-to-date information from LDAP when they log in to the Portal server. The information may change while the user is logged in, and that's OK, but during login updated information must absolutely be pulled from LDAP and displayed in the Portal.

This blog article will discuss implementing custom authentication filters that will leverage reload() code and pull updated user data from LDAP during login. Sample code is available at the end of each respective section for download and implementation.

Explicit Login Implementation

An explicit login filter is triggered when the user types their username and password into a login portlet, or less commonly, when the auto-login URL in Portal is leveraged. Portal servers which do not leverage an enterprise single sign-on (SSO) solution are most likely using an explicit login.

Before we dive into the coding specifics - we will need to first implement a configuration change to the Portal server. When an explicit login filter is called, the user is not yet logged into the Portal server. The user is in the process of being logged in, but, an authenticated context is not yet available in the Portal server. As a result, the PUMA API code calling reload() is being called under the anonymous context of PUMA API, NOT an authenticated context. In the original blog entry discussing reload(), an authenticated context was assumed, but with an explicit login we need to make a configuration change such that the PUMA API code will be able to function under the anonymous context. Perform the following steps:

1) I used Rational Application Developer with a Portal 6.1 runtime. For the code to compile, I needed to add the wp.auth.base.jar file as an external jar. You may copy the wp.auth.base.jar from your /PortalServer/base/wp.auth.base/shared/app/ directory on your Portal server.

2) At the end of the reload() we must set a login redirect for the reload() to take full effect:

Implicit Login Implementation

An implicit login filter is triggered when the user performs a "login in WebSphere Portal when being already authenticated in the IBM WebSphere Application Server". In practice, this means the user logs in to a different system and is then signed on to Portal via SSO. This can be performed with a second WAS system with a shared LTPA key, any TAI implementation (SAML, SPNEGO, Siteminder, WebSEAL, etc.), impersonation and a few other means. In an explicit login filter, Portal is the first to process the user information during the login process. With an implicit login, WAS has already processed part of the user login.

The code for the implicit login is similar to the explicit login with one notable exception - we can call getCurrentUser() because an authenticated context is available to PUMA during an implicit login. It is a bit easier to get the same details with an implicit login, and no configuration changes are required.

Portal 8.5 only: WCM syndication has updated its design such that it may trigger an implicit login when syndication is called. In such cases, we do NOT want to perform a redirect for the syndication user, as that will break syndication. While the reload() code section is a bit simpler for an implicit login, the tradeoff is the redirect section is a bit more complicated. In the code below, my syndication user is the "wpsadmin" user and I am intentionally excluding that user from having a redirect performed. I am OK if reload() does not take full effect for my "wpsadmin" user - I can live with that vs. syndication breaking outright. For all other users, we must still perform a full redirect.

Introduction
WebSphere Portal leverages a large number of caches throughout the product. Caches offer many advantages to the WebSphere Portal product - most notably they allow for significant performance improvements when the Portal server is under load. When a Portal server is restarted, the caches are emptied. In the event of a planned (or unplanned) outage of the Portal server, it can take a bit of time for the caches to become repopulated following the restart. Accessing the Portal server in a "cold" state, before the caches are filled, can leave the end user experience degraded.

There are a few different means to address warming up a Portal site such that the caches are populated before end users ever access the site. One such method would offload individual caches maintained by each WebSphere Portal Server to a centralized set of caches utilized by all Portal servers. Therefore, if an individual Portal server goes down, the caches local to that server are not lost and are instead persisted to the centralized cache. The WebSphere eXtreme Scale product offers such offloading of caching, and a separate article is available discussing its use within WebSphere Portal. In some cases, the cost and/or complexity of implementing such a solution may be too high and an alternative solution is needed. In my specific case - setting up eXtreme Scale on simple lab environments is not worth the time or effort. Yet, after a server restart, I would like my Portal site ready to go so I do not have to wait several minutes when first accessing the Portal site and the Administration area. What can be done to meet this goal - simple and fast?

In this article, we will discuss warming up the Portal server using a series of scripted operating system commands. The code may be freely used/modified as needed for your Portal server - and comments to this blog entry are definitely welcome to expand the functionality.

Script to Warmup Portal Site

1) Create a .bat / .sh file to run your startserver commands. For purposes of this blog entry, my Portal v85 server is running on Linux.
2) For the first part of the script, I'll need to start the ConfigWizard, in a separate profile from the Portal server:

- The wget command was used instead of curl. Why? wget can recursively fetch the resources linked from a page, whereas curl only fetches the URLs it is explicitly given. Both tools have their purposes, and for warming up the Portal site wget is the better choice.

- The -qO/dev/null command line parameter combines two options: -q suppresses wget's own messages, and -O/dev/null writes the downloaded content to /dev/null instead of a file. For this specific scripted approach, output from wget is not needed.
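
As a rough sketch of what this first part of the script might look like (this is not the original listing), the following assumes the ConfigWizard profile lives at /opt/IBM/WebSphere/AppServer/profiles/cw_profile, that its server is named server1, and that the wizard answers at http://localhost:10200/ibm/wizard. All three values are assumptions; substitute the ones from your own installation.

```shell
#!/bin/sh
# Assumed locations (not taken from the original script); adjust as needed.
CW_PROFILE="${CW_PROFILE:-/opt/IBM/WebSphere/AppServer/profiles/cw_profile}"
CW_URL="${CW_URL:-http://localhost:10200/ibm/wizard}"

start_config_wizard() {
    # Start the ConfigWizard server, which runs in its own profile.
    "${CW_PROFILE}/bin/startServer.sh" server1

    # Request the wizard once so it is initialized before anyone opens it.
    # -q silences wget messages; -O/dev/null discards the downloaded page.
    wget -qO/dev/null "${CW_URL}"
}
```

Calling start_config_wizard at the top of the .sh file runs both commands in order.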

3) For the second part of the script, I'll start the Portal server, business as usual:

4) Now at this point, if I were to access the Portal server, the web browser would go incredibly slow as the Portal server caches are not yet warmed up. Let's mimic access to the Portal server as an anonymous user in the next part of the script:

Note that in the wget command we are overriding the user-agent string to send "Mozilla-Loader". Why? Portal treats wget as a web crawler, similar to search engines, and serves Portal resources differently to those user agents than to normal web browsers. By overriding the user-agent string we force Portal to treat wget as a normal web browser / normal user, which accurately mimics Portal users hitting the Portal site.
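
Steps 3 and 4 could be sketched along these lines. The profile path /opt/IBM/WebSphere/wp_profile and the base URL http://localhost:10039 are assumptions for a default Linux lab install, not values from the original script; only the "Mozilla-Loader" user-agent and the -qO/dev/null options come from the text above.

```shell
#!/bin/sh
# Assumed values (not taken from the original script); adjust as needed.
WP_PROFILE="${WP_PROFILE:-/opt/IBM/WebSphere/wp_profile}"
PORTAL_URL="${PORTAL_URL:-http://localhost:10039}"

start_portal() {
    # Step 3: start the Portal server as usual.
    "${WP_PROFILE}/bin/startServer.sh" WebSphere_Portal
}

warm_anonymous() {
    # Step 4: request the anonymous entry point.
    # The user-agent override makes Portal treat wget like a normal browser.
    wget -qO/dev/null --user-agent="Mozilla-Loader" "${PORTAL_URL}/wps/portal"
}
```

In the script, start_portal would be called first and warm_anonymous immediately after it returns.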

5) OK, so that's helpful to warmup the anonymous part of the Portal site, but not logged in users. For that, we'll need to expand the script further:

Note this is a SINGLE line in the file. Here we are using the Portal auto-login URL to log in to the Portal server with a valid username and password. In this case, it happens to be our administration user, but any username / password could potentially be passed in. We are also specifying that the cookies returned from this wget command be saved to a temporary file on the filesystem. We will need the cookies generated by this login process (namely the LTPAToken2 and JSESSIONID cookies) during the next set of steps.
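
A minimal sketch of such a login call is shown here. The function name and its two parameters are illustrative only, and the auto-login URL (including its userid and password parameters) is passed in by the caller rather than guessed. One detail worth knowing: wget's --save-cookies option skips cookies without an expiry time unless --keep-session-cookies is also given, and login cookies such as LTPAToken2 and JSESSIONID are typically session cookies of that kind.

```shell
#!/bin/sh
login_and_save_cookies() {
    # $1 = full auto-login URL, including the username and password parameters
    # $2 = file in which to store the cookies returned by the login
    wget -qO/dev/null --user-agent="Mozilla-Loader" \
         --save-cookies "$2" --keep-session-cookies \
         "$1"
}
```

Because the password ends up on the command line and the cookies on disk, this is best kept to lab systems or to dedicated warm-up accounts.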

6) Now with a user logged in and cookies stored temporarily to a file on the filesystem, let's access various parts of the Portal site:

And you may expand the list further as needed within your own custom scripting. In theory, you could access every single page in your Portal site in this manner, so long as you had a Friendly URL, URL Mapping, and/or Vanity URL associated with each page in the Portal.

7) We are done warming up the site with this single user. Remove the cookies in the file. This does NOT perform a logout from the Portal server; however, it does ensure we do not have potentially sensitive information lying around on the Portal server.
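
Steps 6 and 7 might be sketched as follows. The function names and the example page paths in the usage note are hypothetical; the cookie file is whatever file the login step wrote.

```shell
#!/bin/sh
warm_pages() {
    # $1 = Portal base URL, $2 = cookie file from the login step,
    # remaining arguments = page paths (friendly URLs, URL mappings, ...)
    base="$1"
    cookies="$2"
    shift 2
    for page in "$@"; do
        # Replay the saved login cookies so each request runs as that user.
        wget -qO/dev/null --user-agent="Mozilla-Loader" \
             --load-cookies "$cookies" "${base}${page}"
    done
}

cleanup_cookies() {
    # Deleting the file does not log the user out of Portal;
    # it only removes the saved tokens from the filesystem.
    rm -f "$1"
}
```

For example, warm_pages http://localhost:10039 cookies.txt /wps/myportal /wps/myportal/Home followed by cleanup_cookies cookies.txt would fetch two pages as the logged-in user and then delete the cookie file.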

It is important that we warm up not just the Portal administration user but also a userID that accurately mimics each of our user populations in Portal. Such user populations are typically derived from business requirements - not technical implementation. For example, let's suppose we had a healthcare company - we would typically expect to see a composition of user populations similar to the following:
- Portal Administrators
- Business Administrators
- Members
- Providers

A userID from each of the user populations would need to be included in the warmup script to ensure each user population can be properly warmed up on the Portal site. While some of the portlets, pages, etc. from the Portal will overlap between user populations, not all items will, and having a separate userID for each user population will ensure that population has its caches warmed up in full.

Consideration #2: More complex configurations

Clusters:
The warmup script is targeted towards a standalone system. In a cluster, each Portal server would need to be accessed directly to ensure it is warmed up. Pointing the warmup script to the web server in front of the Portal servers is not sufficient, as the web server may randomly load balance the requests to any given Portal server, not properly warming up each individual server.
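
One way to sketch this for a cluster is to loop over the direct address of every cluster member. The function name and the host names in the usage note are hypothetical.

```shell
#!/bin/sh
warm_cluster_members() {
    # Each argument is the direct base URL of one Portal cluster member,
    # not the address of the web server in front of the cluster.
    for member in "$@"; do
        wget -qO/dev/null --user-agent="Mozilla-Loader" "${member}/wps/portal"
    done
}
```

A call such as warm_cluster_members http://portal1.example.com:10039 http://portal2.example.com:10039 would then touch each server once, regardless of how the web server balances load.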

External Security Managers (ESM) / Single-Sign On (SSO) solutions:
Portal environments which leverage WebSEAL, Siteminder, SPNEGO, SAML, etc. will have difficulty implementing the script in this blog entry. Such configurations are much more complex than passing a single username and password via an auto-login URL. While the script would still work with such configurations - the intention of the script - to accurately warm-up a site to mimic a real Portal user - would not be as accurate with an ESM in place.

Heterogeneous LDAP Data:
An LDAP server is standard in many Portal configurations. In many of them, LDAP data looked up for userA may be cached and reused by userB, userC, etc. if they share that same LDAP data with userA. Most commonly LDAP groups will overlap, and cached data will be returned rather than going out to the LDAP for the group data every time. In LDAPs which contain a large amount of data that does not overlap - e.g. each user is truly unique and may not share similar data with another user - the warmup script will not be helpful in warming up LDAP caches. The warmup script assumes a fair bit of overlap in LDAP data - e.g. the users within each population are homogeneous.

Consideration #3: Impersonation

While this script can be used for known usernames and passwords - in practice in a production environment we will not have access to the passwords of production users. If one or more "dummy" userIDs can be created to accurately mimic various groups of user populations - that can provide an acceptable workaround. However - in practice - such dummy userIDs are difficult to implement in production as well.

Solution: Leverage impersonation programmatically within the script. You can feasibly create a page within Portal, create a custom impersonation portlet leveraging the impersonation API, and then programmatically access the page to kick off the impersonation process.

NOTE: This is pseudo-code only and was not implemented as a part of this blog entry. The pseudo-code assumes the custom impersonation portlet would call request.getQueryString() on the request and begin the impersonation process with that userID, saving the new LtpaToken2 cookie to the "impcookiefile.txt". The next request to /wps/myportal would then issue a JSESSIONID cookie for the impersonated user, which also needs to be saved to the "impcookiefile.txt". In such a setup, you need not know the user's password, yet you can accurately warm up the site with that userID via impersonation.
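
The flow described in the note could be sketched roughly as below. This is an untested illustration only: the custom impersonation portlet does not exist in a default Portal, the page path is hypothetical, and it is an assumption that the request must carry the cookies of an already logged-in user who is allowed to impersonate.

```shell
#!/bin/sh
warm_impersonated_user() {
    # $1 = Portal base URL
    # $2 = cookie file of a logged-in user who may impersonate (assumption)
    # $3 = path of the page hosting the custom portlet (hypothetical)
    # $4 = userID to impersonate, sent as the raw query string
    imp_cookies="impcookiefile.txt"

    # Kick off impersonation and keep the new LtpaToken2 cookie.
    wget -qO/dev/null --user-agent="Mozilla-Loader" \
         --load-cookies "$2" \
         --save-cookies "$imp_cookies" --keep-session-cookies \
         "$1$3?$4"

    # The next request should be issued a JSESSIONID for the impersonated user.
    wget -qO/dev/null --user-agent="Mozilla-Loader" \
         --load-cookies "$imp_cookies" \
         --save-cookies "$imp_cookies" --keep-session-cookies \
         "$1/wps/myportal"
}
```

Further page requests for that user would then load their cookies from impcookiefile.txt.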

WebSphere Portal at its core is a collection of enterprise archive files (.ear) and web application archive files (.war) deployed to a WebSphere Application Server. Most versions of WebSphere Portal ship over 100 applications to the Portal server. Each of these applications may require a location in the server memory to store data temporarily. Most commonly, the location where this data is stored is a session. For a single application on a WAS server - managing the session is a trivial exercise. However, on an application server with over a hundred applications - such as WebSphere Portal - managing each individual session can be a difficult task.

WebSphere Application Server - realizing that managing several hundred applications on a single application server would be difficult - extended the standard javax.servlet.http.HttpSession interface with the IBMSessionExt interface. Per the Javadoc - "The IBMSessionExt interface extends the javax.servlet.http.HttpSession interface of the Servlet API to Invalidate all sessions with the same session id as the current session." Over a period of time, many sessions can accumulate on a Portal server, and with each application potentially needing a session to store data for each individual user, memory can be used up quickly. WebSphere Portal makes use of the invalidateAll() method of the IBMSessionExt interface during a user logout to ensure session data gets cleaned up.

However, even with the use of invalidateAll() to clean up session memory, session leaks can still occur within the Portal server. This can be due to an application setting the lifetime of the session to infinite such that it will never be garbage collected, the application storing an excessive amount of data within the session, and/or the sheer number of sessions on a single Portal server. While monitoring tools exist that can help with tracking session usage - often they are not helpful with diagnosing leaks. This blog entry will discuss utilizing a tool which will help in determining where session leaks may exist.

IBM Monitoring and Diagnostic Tools for Java
IBM offers a tool called IBM Support Assistant (ISA) to perform a wide variety of diagnostics against multiple IBM products. Prior versions of ISA required a download and installation to Portal servers - which often was not feasible for production environments. The current version of ISA at the time this blog entry was authored - v5 - may be downloaded and executed as a standalone program - no installation required. ISA may be downloaded from the following link: http://www-01.ibm.com/support/docview.wss?uid=swg27039277. The download itself is over 1GB in size - be patient! For purposes of this blog entry - I downloaded the ISA5 tool to a Windows 2008 system. Also ensure you have a version of Java installed on the system where you will be executing the following steps.

1) Once the download is completed, extract it and execute the "start.isa" executable.
2) Open ISA5 in a web browser - e.g. https://localhost:10943/isa5/
3) Click on the Tools tab and scroll down until you see the "Memory Analyzer [Desktop]"
4) Click the Launch command, leave the values in the boxes at their defaults.

Analyzing the Session Data

1) Ensure your Portal server(s) operating systems are configured to generate full core dumps. See the following document for details of how to configure the operating system for such a setup.
2a) For Portal servers v80 and later, generate a core dump utilizing the DMGR / WAS Admin Console (Troubleshooting > Java dumps and cores > System Dump)
2b) For Portal servers v70 and earlier, a kill -6 command (*NIX) or task manager (Windows) will be needed to generate the core dump.

3) Wait 2 hours. Generate a second core dump.

4) Copy the core dumps from the Portal server to the server where the ISA5 tool is running.
5) Click on File --> Open Heap dump
*Note: This is how the tool has its menus worded, the tool can open either a core dump or a heap dump and we want to open the core dump. A heap dump will not output the information needed to determine if a session is leaky.

For a Portal v8 system running on WAS v8, the following screen was seen.

*Note - the security username set to anonymous is correct behavior. By default, Portal v8 does not have the "Session Security Integration" setting enabled. Enabling this setting will show the userID associated with a given session. For more details on this setting - see the Portal Security Hardening Guide and the following blog entry.

For a Portal v85 system running on WAS v855, the following screen was seen.

*Note - for Portal v85 the Session Security Integration setting is enabled by default. Hence the Security User Name shows the "wpsadmin" user.

7) Sort by the various columns to analyze the data. Items which should generate red flags for review:
- Any sessions which have >1MB for session size (heap size in the screenshots) - most sessions do not grow this large in size.
- Any sessions which have -1 for the timeout value, and show up in BOTH of the core dumps - these likely will not be garbage collected and may be the sign of a leak.
- Any username with more than 5 sessions per application - is the user logging in multiple times? Why? Or is this the sign of a user trying to break the system?
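
If you prefer to apply these three checks outside the tool, they can be sketched with awk. The sketch assumes you have somehow copied the session table from each core dump into a headerless CSV file with the columns session_id,application,user,size_bytes,timeout; that file layout and the function name are assumptions for illustration, not something the tool is known to produce.

```shell
#!/bin/sh
flag_sessions() {
    # $1 = CSV for the first core dump, $2 = CSV for the second core dump
    # Assumed columns: session_id,application,user,size_bytes,timeout
    awk -F, '
        # First file: remember sessions that already had a timeout of -1.
        FILENAME == ARGV[1] { if ($5 == -1) seen[$1] = 1; next }

        # Second file: sessions larger than 1MB.
        $4 > 1048576 { print "LARGE " $1 " " $2 " " $4 }

        # Second file: timeout of -1 in BOTH dumps.
        $5 == -1 && ($1 in seen) { print "NO_TIMEOUT_IN_BOTH " $1 " " $2 }

        # Second file: count sessions per application and user.
        { count[$2 "," $3]++ }

        END {
            for (key in count)
                if (count[key] > 5) print "MANY_SESSIONS " key " " count[key]
        }
    ' "$1" "$2"
}
```

Each line of output names one red flag; an empty result means none of the three conditions matched.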

One of the questions we often receive is a fairly simple question - "How many users are logged into Portal right now?". Most of the time, the response is not a straightforward answer, but a series of counter-questions which ask "What do we define as the user being logged in?"

- If a user clicks logout, that can be considered a straightforward answer of "they are no longer logged in, therefore, do not count them".
- If a user performs a login to Portal with Firefox, then performs a login to Portal with Chrome, do we consider that one login, or two logins?
- If a user closes their web browser, but doesn't logout from Portal, do we still consider them logged in, or, are they logged out?
- If a user steps away from their machine for a brief period of time, and comes back, the session timeout has tripped but the LTPA timeout has not tripped - do we consider them logged in or logged out at that point?
- If a user adds an item to a shopping cart, but is not yet logged into the Portal site, do we consider that as being "logged in?"

And this is just the tip of the iceberg, without going into the multitude of additional configuration options available in WAS/Portal which could spawn many more questions. For purposes of this blog entry - we will define the number of users logged in to Portal to be roughly equivalent to the number of sessions that exist in the Portal. For Portal sites which require a login to the Portal, this will provide a fairly accurate count of users active in the Portal server. For Portal sites which do not require a login, but do have public sessions enabled, the same logic holds - it is a good approximation of how many users are active on the Portal server. For Portal sites which do not require a login and do not use public sessions, this blog entry will unfortunately not apply, and a different set of tools is needed to make that determination. For the Portal sites which do use a session - we will answer that question in this blog entry.

Setting up Monitoring

WebSphere Application Server ships a built-in set of monitoring code called Performance Monitoring Infrastructure (PMI). PMI can be used to monitor a number of different activities in a WAS server - such as the total number of database connections, the total number of LDAP connections and, for what we are interested in discussing, the total number of active sessions in a Portal server. PMI itself is enabled on a per-application basis, and for WebSphere Portal there are typically over a hundred applications deployed to a Portal server. Enabling PMI across all applications would be expensive and unnecessary to answer our basic question. Instead, we would look to enable PMI on a single application only - wps - and monitor the session count of that application.

To enable PMI for the number of sessions on the wps application - perform the following steps:
1) Login to the Deployment Manager / WAS Console
2) Navigate to Monitoring and Tuning > Performance Monitoring Infrastructure > WebSphere_Portal
3) In the configuration tab, select "Basic" if not already selected.
4) Save changes.
5) If in a cluster, repeat steps #2-#4 for each Portal server in the cluster. Sync nodes.
6) Restart the Portal server(s)

While more advanced PMI beyond the "Basic" setting can be enabled on the Portal server(s) - the Basic level is sufficient to answer our original question posed.

Monitoring Session Count

1) Login to the Deployment Manager / WAS Console
2) Navigate to Monitoring and Tuning > Performance Viewer > Current activity > WebSphere_Portal
3) On the left-hand side, click the Plus (+) next to WebSphere_Portal > Performance Modules > Servlet Session Manager > wps#wps.war
4) Click "Start Monitoring"
5) Repeat steps #2-#4 for each Portal server in the cluster.
6) Navigate back to the wps#wps.war per Portal server to see the session count.

In the screenshot below, a single session / user logged in is shown for demonstration purposes:

In a live production system with hundreds or thousands of users logged in - you can monitor the activity over time to determine when peak loads may exist.

In most cases, this Technote will fix one or more of the following issues:

- Blank lines in Resource Permissions portlet
- Permissions not working for some users, but working for other users
- A recent LDAP migration occurred and permissions need to remain the same in the Portal.

However, the Technote is light on details as to why the CleanupUsers procedure is needed and exactly what data it modifies in the Portal database. This blog entry will provide those details and discuss which use case scenario for CleanupUsers you may wish to run for your Portal environment.

Overview of Database Tables
- Portal stores data about users and groups in its databases. The main item of interest for discussion in this article is the USER_DESC table in the release database.

- Two pieces of information are needed to identify any user or group to Portal:
  - The full distinguished name (DN) of the user or group
  - The externalID attribute (extID) of the user or group

- By default, my USER_DESC table on a clustered system looked like the following:

- There are several built in userIDs that are internal to the Portal server, these should be left alone / never modified.

- The other items in there are the default Portal administrator and the default Portal administrator group. In this case, we see each entry listed TWICE. The UNIQUE_ID field, which corresponds to the external ID attribute, is outdated for 2 of these 4 entries.

- Looking at this table alone, we cannot determine which entry is outdated. We would need to go directly to the data source which contains these user(s) and group(s) to determine the most current value of these two items. Given these items are file repository users and groups, we will check the following file on my lab system:
C:\DMGR\IBM\WebSphere\AppServer\profiles\Dmgr01\config\cells\<cellname>\fileRegistry.xml

- We see the two correct entries in here

…indicating the “86” wpsadmin user and “e4” wpsadmins group in the database are outdated. The CleanupUsers process should be run to clean up this data.

- Our next step will be to clean up the stale data for these two items in the Portal database. Note, the changes will ONLY impact the Portal database. LDAP user repositories, file repositories, etc. are NOT touched by this process.

Test Case #1 - Cleanup, No Migrate
Inspecting the invalidusergroups.xml file, we can see that cleanup-users="invalid" is set by default. This indicates that the duplicate users will be removed from the database completely. In my particular use case with outdated users, this is the correct test case to implement. We already know duplicates exist through visual inspection. We just want to get rid of the duplicate.

Which produced a correct end result in the database:

…indicating the “86” wpsadmin user and “e4” wpsadmins group which were stale in the database were removed.

So what is CleanupUsers actually doing during this time? For each entry in the database, CleanupUsers will attempt to query the live user repositories (LDAP, File Repository, etc.) for that user data. If a match is found, CleanupUsers will compare the data in this database table against the matching data in the live repository. If the live repository contains updated data, CleanupUsers will mark the stale database entry to be deleted.
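
To make the comparison concrete, here is a small illustration of the idea, not the actual CleanupUsers implementation. It assumes two hypothetical text exports with one "extID|DN" pair per line, one taken from the live repository and one from the database, and it reports database entries whose DN exists in the live repository under a different extID.

```shell
#!/bin/sh
find_stale_entries() {
    # $1 = live repository export, $2 = database export
    # Assumed line format in both files: extID|DN
    awk -F'|' '
        # First file: remember the current extID for each DN (case-insensitive).
        FILENAME == ARGV[1] { live[tolower($2)] = $1; next }

        # Second file: same DN but a different extID means the row is stale.
        {
            dn = tolower($2)
            if ((dn in live) && live[dn] != $1) print "STALE " $1 " " $2
        }
    ' "$1" "$2"
}
```

In the duplicate wpsadmin scenario above, only the database row carrying the outdated extID would be reported.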

For this specific use case, why did I have a duplicate wpsadmin user and duplicate wpsadmins group? Answer: Because I was in a cluster.

When you first install Portal in a standalone server configuration, it installs an admin username of choice (I chose wpsadmin), and a hardcoded admin group name of wpsadmins. That created two entries in the database. Later, when I installed the DMGR, I had to specify an administrative user (I also chose wpsadmin here). This wpsadmin user in the DMGR file repository had the same DN, but a DIFFERENT extID than the wpsadmin user in the Portal file repository. What about the wpsadmins group? DMGRs do not require groups to function. This is correct; however, when you augment the Portal profile to the DMGR, it creates a wpsadmins group there as well. Once again, the wpsadmins group in the DMGR file repository had the same DN, but a DIFFERENT extID than the wpsadmins group in the Portal file repository. Once I federated Portal to the DMGR, the DMGR file repository became the master copy and overwrote Portal’s file repository. Portal saw the change in the extID for the wpsadmin user and wpsadmins group (no change in DN, just the extID) and ended up creating duplicates in the database. Hence, if you keep the same user and group name in the file repository for both your DMGR and Portal, you should see at least two entries reported by CleanupUsers.

Before we move to the next test cases, we’ll need to set up live LDAP data to have real data to work with. File repository data can still be used, but this test case makes more sense in the context of an LDAP being used. In this case, the LDAP in use is Tivoli Directory Server, and our ibm-entryuuid attribute will become the extID attribute in the database. Looking at the LDAP data itself:

Note, my database is NOT yet populated with this data. Portal will populate the database on the first lookup of a user or group. Let’s do this via the Manage Users & Groups portlet now.

And the resulting entries in the database

Note that it wasn’t just the user that was populated in there, but also the user’s groups!

As a quick test case, I moved my user to a different location in the LDAP. Do we see duplicates in the database? Answer is NO! Portal code automatically updated the database with the new location correctly. Why? The extID is the primary key! Portal code is able to handle such an update in runtime.

OK, that covers the use case of the DN changing. What about the extID changing? Directly changing the extID in the LDAP isn’t easy and most LDAPs forbid it. In most cases extIDs get changed when an entry in LDAP is deleted and then recreated. I’m going to do that now with my “travis” user.

My end result is the same DN, different extID:

Once the updated data from LDAP is pulled ... duplicates were shown
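
The two outcomes above (a DN change handled cleanly, an extID change producing a duplicate) both follow from the extID acting as the key. Here is a minimal sketch of that idea, assuming an in-memory dictionary in place of the real database table. The function name, DNs, and extID values are hypothetical and are not Portal's actual code or data.

```python
# Hypothetical stand-in for the user table: extID -> DN.
user_table = {}


def record_lookup(ext_id, dn):
    """Store or refresh an entry; return True if a new row was created."""
    is_new = ext_id not in user_table
    user_table[ext_id] = dn  # same extID: the DN is simply overwritten
    return is_new


# First lookup of the user creates a row.
created_first = record_lookup("uuid-aaa", "uid=travis,cn=users,dc=example,dc=com")

# User moved in LDAP: same extID, new DN. The existing row is updated.
created_on_move = record_lookup("uuid-aaa", "uid=travis,cn=staff,dc=example,dc=com")

# User deleted and recreated: same DN, new extID. A second row appears.
created_on_recreate = record_lookup("uuid-bbb", "uid=travis,cn=staff,dc=example,dc=com")

print(created_first, created_on_move, created_on_recreate)
print(user_table)
```

After the last call the table holds two rows with the same DN, which is exactly the kind of duplicate that needs to be cleaned up or migrated.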

In this particular test case, I have an old user ID, ending in “46b”, that may have permissions associated with it in the Portal. Once again I have duplicates, so I’d want to get rid of the old ID while preserving the new ID. After running the script, the stale user is successfully removed.

Test Case #2 - No Cleanup, Migrate Only
This is the recommended use case for most scenarios, and it is what the primary Technote for CleanupUsers recommends: https://www-304.ibm.com/support/docview.wss?uid=swg21377025. This is most often associated with the “blank lines” issue. We will swap out the existing bad IDs for new IDs. Existing data will be updated; no data will be removed. Once this is completed, it is recommended that you test and ensure end users can use your Portal successfully.

Note: The reason we need to migrate data is that the data is spread out across MANY tables in the release database. USER_DESC is a linkage between these tables. However, cleaning USER_DESC by itself is not sufficient. For example, the LNK_USER_ROLE table ALSO contains the extID of the user!!!
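
To make the point about multiple tables concrete, here is a rough sketch using an in-memory SQLite database. Only the table names USER_DESC and LNK_USER_ROLE come from the text above. The column names (OID, EXT_ID, ROLE_NAME), the sample values, and the migrate_ext_id helper are assumptions made for illustration; the real release database schema is far larger, and the real CleanupUsers process does not necessarily work this way.

```python
import sqlite3

# Simplified, hypothetical schema. Column names are assumptions, not the
# actual Portal release database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_DESC (OID INTEGER PRIMARY KEY, EXT_ID TEXT);
    CREATE TABLE LNK_USER_ROLE (ROLE_NAME TEXT, EXT_ID TEXT);
    INSERT INTO USER_DESC VALUES (1, 'old-extid');
    INSERT INTO LNK_USER_ROLE VALUES ('Editor', 'old-extid');
""")


def migrate_ext_id(conn, tables, old_id, new_id):
    """Rewrite old_id to new_id in every listed table, in one transaction."""
    updated = {}
    with conn:  # commits on success, rolls back if any statement fails
        for table in tables:
            cur = conn.execute(
                f"UPDATE {table} SET EXT_ID = ? WHERE EXT_ID = ?",
                (new_id, old_id),
            )
            updated[table] = cur.rowcount  # number of rows changed
    return updated


counts = migrate_ext_id(
    conn, ["USER_DESC", "LNK_USER_ROLE"], "old-extid", "new-extid"
)
print(counts)
```

If only USER_DESC were updated in this sketch, the role assignment in LNK_USER_ROLE would still point at the old extID, which is why a migration has to touch every table that stores it.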

For this test case, we will perform a similar delete and recreate of a test user. However, rather than performing a lookup in the Manage Users & Groups portlet resulting in a duplicate in the database, we will INSTEAD run the CleanupUsers.xml process. This would simulate a real scenario where LDAPs were recently changed and we need to update items in the database to match the new LDAP.

Here’s our original user in the Portal database:

Here’s our recreated user in the LDAP server:

And with our run of CleanupUsers we get a report that this user needs to be cleaned up. Rather than deleting the user, I will instead migrate this user. We should see this also occur in the LNK_USER_ROLE table shown previously:

In USER_DESC, the updated user was created (as the CleanupUsers process did perform a PUMA lookup in this case!):

And the LNK_USER_ROLE table is also updated:

Now I can confidently delete my invalid old user.

Test Case #3 - Cleanup and Migrate Only
This is recommended only if #2 fails, and if #2 fails, we recommend you contact IBM Support via a PMR to determine why. This is a combination of #1 and #2: we will swap out the existing old IDs for new IDs, existing data will be updated, and stale data will be removed. While it may appear advantageous to run this approach because it requires fewer steps, it does not permit testing of functional correctness. That is, if for some reason the user data is not migrated, poof, it's no longer available, because the stale data is invalidated in the same single operation.

*Note: This test case will not be covered with an example given this is a combination of the first two test cases, and, the first two test cases are covered in detail above.

A Note on MemberFixer

This blog entry is focused specifically on CleanupUsers being run against the release database. The MemberFixer procedure operates in a similar manner to the CleanupUsers procedure, only against WCM content in the JCR database. Data in the JCR database depends on data in the release database being correct. Therefore, if you use WCM content in your Portal environment, be SURE to run the CleanupUsers procedure in full before beginning the MemberFixer procedure. This blog entry will not cover MemberFixer in detail. Look for a future blog entry on the details of MemberFixer.

A Note on ExtID and DN changes

If you have a use case where both the extID and DN will be changing, please open a PMR with IBM Support. There are options available to handle this scenario, and a case-by-case evaluation is needed to determine which option is best suited for your Portal environment.

Final Thoughts

Portal as a product offers a means to guard against LDAP data changing. The CleanupUsers process allows for a manual cleanup of data in the Portal database at a time of the Portal administrator's choosing. Its two-step operation (report the bad data, then clean it up) also allows for manual intervention to prevent certain data from being cleaned up if the Portal administrator so decides. This procedure may seem a bit odd at times; however, it does protect the integrity of user data.