Synchronizing Web Content Across Web Publisher Environments

Nicki DuBose

November 7, 2008

Introduction

Several of the articles on our website discuss the technical, business-related, and planning challenges inherent in migrating enterprise content from one repository to another. This article focuses on the unique challenges of migrating web content from one Web Publisher environment to another. We’ll identify the most common web content migration scenarios and discuss options for addressing each of them. We’ll also explore the elements unique to Web Publisher which must be taken into consideration in a web content migration effort.
Based on client requests and feedback from our peers, we’ve chosen to address the most common web content migration scenarios:

To eliminate the need to re-create the latest content in a test or development environment, migrate web content from one environment to another.

To reduce the risk of testing in a production environment, enable the migration of templates from a development or testing environment into production.

What our clients really want is the ability to keep their development, testing, and production environments fresh with the latest and greatest content and templates. Ideally, our migration approach would allow for low-effort, incremental migrations.

Migration Scenarios

Most web content migrations fall into one of the following common scenarios:

Refresh Content

The most common web content migration request we hear from our clients is to refresh (or synchronize) the content in one environment with that of another. For instance, since the content in a production environment is frequently updated and maintained, users often request that we migrate that content from production to a development or test environment to facilitate testing with real-world content scenarios. The reverse is also true. Content authors might create new content in a development environment while testing a new web site feature. To avoid having to recreate that content, they often request that the content created during development be migrated to production.

Refresh Templates

Similar to testing new web site features, developers often make tweaks and changes to templates to support web site changes. Ideally, developers would make the change in a test environment before putting the change into production. For most developers, this means a manual effort of exporting templates, presentation files and rules files from a testing environment and then importing them into the production environment.

Migrate Entire Repository

Another reason to migrate your content is to upgrade some or all of the software components or hardware platform in your Documentum environment. You’ll need a migration plan that allows you to move all of the content and supporting objects and relationships from the source repository to the target. This scenario is present in all types of content migrations and is not unique to web content, so we will not address it in this article. Look for a future article to address this scenario in greater detail.

Web Publisher Migration Challenges

There are a number of concepts in Web Publisher that make it different from other Documentum applications. These concepts add unique challenges to migrating web content, beyond what you might experience in a traditional Documentum migration. We can group these concepts into three categories: Environmental, Relationships, and Content.

Environmental

The following elements are a part of any Documentum repository. The good news is that all of these environmental artifacts, with the possible exception of Categories, can be migrated using DocApps or DAR files (using Composer). Optionally, you could also use a third party migration tool to migrate some or all of your environmental artifacts.
Environmental artifacts are generally the same artifacts found in a typical Documentum repository. These artifacts change infrequently when synchronizing two repositories, so we don’t necessarily need to concern ourselves with migrating these objects after our initial development or implementation effort. Below we’ve identified some of the key environmental artifacts and their role in a Web Publisher environment:

XML applications: One useful feature of Web Publisher is two-way synchronization of properties and content using XML applications. The XML Application enables automatic population of a piece of content’s property values when the content file is checked in or imported into Web Publisher. With two-way population, the opposite is also true. An author can set property values on a piece of content at the time of creation and the corresponding content values will be updated.

Lifecycles: Web Publisher uses a document lifecycle to define the various states that content will occupy during its life. In an out-of-the-box Web Publisher environment, content in its draft state is in the Work In Progress (WIP) lifecycle state. When a piece of content is ready for review and approval it is promoted to the Staging lifecycle state. After the content is reviewed, it is promoted to the Approved state. In addition to these standard lifecycle states, many people choose to define custom lifecycles with custom lifecycle states to support their business process.

Workflows: The workflow defines the promotion path, typically from one user or group to another, that the content will take as it passes through the various lifecycle states. A process workflow will automatically promote a web page to the next state in the lifecycle when certain approval tasks have been completed.

Permission Set Templates & User Defined ACLs: EMC Documentum provides a number of features to manage permissions within Web Publisher. All Documentum objects are assigned a single Access Control List (ACL). The ACL assigned to an object sets the access permissions and restrictions for the object. Permission set templates, which are a special case of ACLs, are applied to a document based on its lifecycle state. Alias sets can be used as placeholders for user names, group names, and ACL names. A combination of alias sets and permission set templates set the ACL for web content. Additionally, some Web Publisher implementations will use custom, user-defined ACLs to set access permissions and restrictions for certain objects as dictated by business processes and security policies.

Custom Folders: In a typical Documentum repository, most folders are of type dm_folder, the standard, out-of-the-box type of folder. But Web Publisher uses a custom type of folder called wcm_channel_fld. When you are migrating documents into a Web Publisher cabinet, you have to make sure that any folders that get automatically created are created as wcm_channel_fld rather than the default dm_folder.

Categories: One of the ways Web Publisher allows users to organize content is via Categories. All Web Publisher solutions will have at least one category instance, the Functional Taxonomy, which organizes content by page type according to the template with which the content was created. Typically, each page type in the system is represented by a category. Categories created to organize content templates by page type are created as functional taxonomy nodes. To associate content to its corresponding category, Web Publisher creates a wcm_category relationship. In essence, the category is another type of folder (a subtype of dm_folder). This type of category is migrated with the corresponding content template in a DocApp or DAR file.
Categories can also be used to define a custom hierarchical taxonomy that can be published to a website as XML. In this case, the root level category is typically a sibling of the functional taxonomy described above. The challenge with this type of category is that it can’t be migrated using a DocApp or DAR file (using Composer). For this reason, it’s important that our migration solution support the migration of this special folder sub-type.

Relationships

Templates are an integral part of the Web Publisher solution. Templates enable authors to create and edit content in a familiar form-like interface. Because templates often drive the content on a web site, it is not uncommon for them to change rather frequently to keep a web site looking fresh and interesting or to incorporate new page features or layouts.
Templates (or document types) are the basis for creating new content in Web Publisher, and are composed of the following set of files:

A defining template upon which new files are based

A rules file which defines the fields in the content creation form (optional)

A presentation file which transforms the XML content into a web page (optional)

A preview/thumbnail file which uses a graphic to identify the templates to content authors (optional)

Web Publisher uses relationships to connect templates to their supporting rules, presentation and preview files. The image below illustrates the relationships between a sample template and its supporting files:

Figure 1: Sample relationships between a template and its supporting objects

To learn more about creating document types, refer to the Web Publisher Developers Guide or the article on this website titled “Web Publisher Template Basics”.
Just as the templates have relationships to their supporting files, Web Publisher creates relationships between the content and each of its supporting objects. The diagram below illustrates some of the common relationships using a sample product page to show the relationships between a piece of web content and its supporting objects.

Figure 2: Relationships for Sample Product Page

Below is a summary of some of the most important relationships that Web Publisher content might have:

Relation Name               Description

wcm_category                Tracks the category for an object
wcm_default_workflow        Tracks the relationship between the template and its default process workflow
wcm_doc_template            Tracks the template from which the content was created
wcm_dynamic_content         Indicates content that uses xDQL queries that require transformation
wcm_editor_link             Tracks other files that are linked to the content file
wcm_ewebeditpro_link        Tracks other files that are linked to the content file when the content file is modified by eWebEditPro*
wcm_html_editor             Indicates when a file is editable by eWebEditPro or Rules Editor**
wcm_layout_template         Tracks the presentation file used to transform the content
wcm_my_template             Tracks the category for a favorite template
wcm_native_edit             Tracks the objects that use a local editor
wcm_process_workflow        Tracks the process workflow attached to content
wcm_publishing_template     Tracks the publishing template for an object
wcm_rules_editor            Tracks if content is checked out by the Rules Editor tool
wcm_rules_template          Tracks the rules file used for content authoring
wcm_pb_blueprint_template   Tracks the relationship for taxonomy management
wcm_template_thumbnail      Tracks the thumbnail graphics for the template

* eWebEditPro is an optional HTML editing tool packaged for use in Web Publisher
** Rules Editor is an optional tool that simplifies creation of rules file XML

Content

We mentioned that the template, together with the rules file, enables the content author to enter the content in a standardized form. The form simplifies adding elements like images and links, and ensures consistency across all pages of a certain type. Web content created using a template and rules file is stored as XML in Web Publisher. To maintain the reference between the content and any images or included text files on the page, Web Publisher embeds the document id of the image or text file into the content XML file.

Figure 3: Image IDs embedded in sample content XML

These embedded object IDs are used within the template to display which object has been selected by the content author when using the out-of-the-box graphic image (or text) selector provided with Web Publisher.

Migration Tools and Web Publisher

The best way to address the challenges of migrating web content from one Web Publisher environment to another is to use a migration tool that is up to the task. Although we may be a little biased, we think that Migration Workbench is the best tool to use for Documentum migrations, and it has several features that were specifically designed to synchronize WCM repositories.
As we stated earlier, one of the main motivators for moving web content from one environment to another is to keep development, testing, and production environments synchronized and fresh with the latest and greatest content and templates regardless of where it was created. This implies an ongoing, or iterative, process rather than the “one-time push” that is commonly associated with content migration initiatives. Therefore, you want to select a tool that supports an iterative approach – meaning you configure the migration process once and run it repeatedly at specified intervals. With each run, any new or updated documents will be migrated to the target system, and any deleted documents will be removed from the target system. Migration Workbench is the only tool we know of that does this.
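To make the iterative idea concrete, here is a minimal Python sketch of how one run might decide what to create, update, and delete. It is not how Migration Workbench works internally; the DocState class, the plan_sync function, and the idea of matching documents by a stable key with a comparable timestamp are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocState:
    """Hypothetical snapshot of one document in a repository."""
    key: str       # assumed stable across repositories, e.g. folder path plus name
    modified: str  # ISO-8601 timestamp in one consistent format, compared as text


def plan_sync(source, target):
    """Return (to_create, to_update, to_delete) keys for one incremental run."""
    src = {doc.key: doc for doc in source}
    tgt = {doc.key: doc for doc in target}
    # New in the source: not yet present in the target.
    to_create = sorted(k for k in src if k not in tgt)
    # Present in both, but the source copy is newer.
    to_update = sorted(
        k for k in src if k in tgt and src[k].modified > tgt[k].modified
    )
    # Gone from the source: remove from the target.
    to_delete = sorted(k for k in tgt if k not in src)
    return to_create, to_update, to_delete
```

Because the plan is recomputed from the current state of both systems, running it repeatedly at an interval only touches what has changed since the last run.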
But supporting an iterative approach is just the first item on the list. For a successful web content migration, the tool you use will also need the following capabilities to address the unique challenges of web content migrations.

Environmental

While DocApps and DARs can be used to migrate most of the environmental elements we discussed above, you also need to consider the effect these elements have on content.
For instance, it’s possible that some of the content in your source repository will be in the middle of a workflow. Depending on your business requirements, you may want to migrate only the content that has been completed and approved. If that’s the case, you’ll need a migration approach that will allow you to exclude certain content from the migration based on criteria you define. Migration Workbench uses migration rules to evaluate property values, such as the lifecycle state, to determine which content to migrate.
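As a rough illustration of such a rule, the Python sketch below keeps only documents whose lifecycle state is in an allowed set. The dictionary key lifecycle_state and the function name are hypothetical stand-ins, not Migration Workbench configuration or Documentum attribute names.

```python
# Assumption: each document is represented as a dict of property values,
# with the lifecycle state name stored under the hypothetical key below.
APPROVED_ONLY = frozenset({"Approved"})


def select_for_migration(documents, allowed_states=APPROVED_ONLY):
    """Return only the documents whose lifecycle state is allowed."""
    return [
        doc for doc in documents
        if doc.get("lifecycle_state") in allowed_states
    ]
```

Passing a wider set, such as {"WIP", "Staging", "Approved"}, would include in-progress content as well.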
You may have a business case for migrating content that is in an intermediate lifecycle state – it’s neither in its initial lifecycle state nor in its final state. For instance, you may want to migrate all content – whether it’s in WIP, Staging, or Approved. Intermediate states, such as Staging, are often “non-attachable”. If content is in a non-attachable state, it cannot be imported into the system in that state. Instead, the content must be imported into the system in an attachable state, such as WIP, and then promoted to the intermediate (or non-attachable) state. The migration tool should support setting an object to an initial, attachable state, such as WIP, and then auto-promote it to a desired state, such as Staging. This is one of the advanced features of Migration Workbench.
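The attach-then-promote sequence can be sketched in Python as follows. The state order comes from the out-of-the-box lifecycle described earlier; treating WIP as the only attachable state is an assumption for this example, and promotion_plan is a hypothetical helper rather than a Migration Workbench feature name.

```python
LIFECYCLE = ["WIP", "Staging", "Approved"]  # out-of-the-box state order
ATTACHABLE = {"WIP"}                        # assumption: only WIP is attachable


def promotion_plan(desired_state, states=LIFECYCLE, attachable=ATTACHABLE):
    """Return (attach_state, promotions) needed to reach desired_state.

    attach_state is the state the object is imported into; promotions is
    the ordered list of states it is then promoted through.
    """
    if desired_state not in states:
        raise ValueError(f"unknown lifecycle state: {desired_state}")
    idx = states.index(desired_state)
    # Walk backwards from the desired state to the nearest attachable state.
    for start in range(idx, -1, -1):
        if states[start] in attachable:
            return states[start], states[start + 1 : idx + 1]
    raise ValueError(f"no attachable state at or before {desired_state}")
```

For content that should end up in Staging, the plan is to attach in WIP and promote once; content already in an attachable state needs no promotions.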
Earlier in this article we discussed how Web Publisher uses a combination of alias sets and permission set templates to set the ACL for the content in a repository. The permission sets and ACLs are generally moved from one repository to another using custom DocApps or DAR files. Your migration tool must allow you to define the rules by which a permission set template was used to attach an ACL to a piece of content in the source system, and then re-apply the correct ACL after the content is imported into the target system. Migration Workbench can determine and attach the correct ACL depending on the lifecycle state or other properties of the document.
Your migration tool must also deal with the fact that Web Publisher uses a custom folder type called wcm_channel_fld. Most migration tools will automatically create a folder if it does not already exist when they try to link a document into it. But Migration Workbench allows you to specify what type of folder should be created. This ensures that all the folders in your repository will be the right type (wcm_channel_fld) rather than the default.
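The folder-creation behavior described above might look something like this Python sketch. The target repository is modeled as a plain dictionary of path to object type, which is purely an assumption for illustration; ensure_folder_path is a hypothetical name, and a real tool would create the folders through the Documentum API instead.

```python
def ensure_folder_path(existing, path, folder_type="wcm_channel_fld"):
    """Create any missing folders along path and return the created paths.

    existing is an in-memory stand-in for the target repository: a dict
    mapping folder path to object type. The top-level cabinet is assumed
    to be present already.
    """
    created = []
    current = ""
    for part in path.strip("/").split("/"):
        current = f"{current}/{part}"
        if current not in existing:
            # Create with the requested type instead of the default dm_folder.
            existing[current] = folder_type
            created.append(current)
    return created
```

Folders that already exist are left untouched, so the function only controls the type of folders it creates itself.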
Finally, since some categories cannot be moved in a DocApp or DAR file, your migration tool must be able to migrate your complete category folder structure prior to migrating the web content into those folders. Migration Workbench can migrate custom folder types such as categories, keeping all the custom attribute values intact.
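One simple way to guarantee that category folders exist before the content that lands in them is to sort the work list so folders come first and parents precede children. The Python sketch below assumes each item is a dict with hypothetical path and kind keys; it illustrates the ordering only, not any particular tool's scheduler.

```python
def migration_order(items):
    """Order items so folders precede documents and parents precede children.

    Each item is a dict with a 'path' and a 'kind' of either 'folder'
    or 'document' (both keys are assumptions for this sketch).
    """
    def sort_key(item):
        is_document = item["kind"] != "folder"  # False sorts before True
        depth = item["path"].count("/")         # shallower paths first
        return (is_document, depth, item["path"])

    return sorted(items, key=sort_key)
```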

Relationships

Because Web Publisher relies so heavily on relationships to tie together the different components of a web page, the migration tool must be able to migrate these relationships easily. Recall the diagram of object relationships earlier in this article; each arrow represents one relationship and all of these relationships must be reassembled in the target system for each piece of web content. This means the tool must maintain a source-to-target mapping for the migrated object IDs, migrate the relationship object and then update the parent and child fields with the new IDs that were generated by Web Publisher when the parent and child objects were imported. Following this sequence is essential.
Other relationships link one migrated object to a non-migrated object, such as workflows which are usually moved from one environment to another using a DocApp or Composer. In this case, the tool must be able to update the migrated relationship object with the Web Publisher-generated ID for the migrated object and then perform a lookup into the target repository to get the ID of the non-migrated object.
Migration Workbench supports both of these scenarios.
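Both scenarios can be sketched with a single Python function. This is an illustration under stated assumptions, not Migration Workbench code: a relationship is modeled as a dict with relation_name, parent_id, and child_id keys, id_map is the source-to-target mapping built while importing objects, and lookup_in_target is a hypothetical callable that finds the target ID of an object that was not migrated (for example, a workflow installed from a DocApp).

```python
def remap_relation(relation, id_map, lookup_in_target):
    """Return a copy of relation with parent and child IDs valid in the target.

    Migrated objects are resolved through id_map. Anything else falls back
    to lookup_in_target, which returns a target ID or None.
    """
    def resolve(source_id):
        if source_id in id_map:
            return id_map[source_id]
        target_id = lookup_in_target(source_id)
        if target_id is None:
            raise LookupError(f"no target object for source ID {source_id}")
        return target_id

    return {
        "relation_name": relation["relation_name"],
        "parent_id": resolve(relation["parent_id"]),
        "child_id": resolve(relation["child_id"]),
    }
```

Raising an error for an unresolved ID, rather than copying the source ID across, keeps a relationship from silently pointing at an object that does not exist in the target.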

Content

To update embedded object IDs, your migration tool should maintain a source-to-target mapping for the object IDs so it can update the IDs embedded in the content XML file with the new ID assigned when the related image or content file was imported. Sequence is important; this step must occur after the related object is imported so the new ID is known. You may also opt to ignore the embedded object IDs. As long as you migrate both the content and the embedded content into the appropriate folders, you’ll find that the resulting web page will render appropriately. If the embedded IDs are not updated in the target content object, then content authors can expect to see errors similar to the one below when editing content. Reselecting the embedded content and saving will resolve the error.
The previously selected Documentum object 09000405800dc63 is unavailable. The object is unavailable either because it was not
found, or you do not have the correct permissions to open the object.
Select an object and click Save to overwrite the previous selection.
Otherwise, make no selection and click Save to continue using the same object.
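For readers who do want to update the embedded IDs, the step described above could be sketched in Python like this. It assumes object IDs appear in the content XML as 16-character lowercase hexadecimal strings and that id_map holds the source-to-target mapping collected during import; rewrite_embedded_ids is a hypothetical helper, not part of any product.

```python
import re

# Assumption: embedded object IDs are 16 lowercase hexadecimal characters.
OBJECT_ID = re.compile(r"\b[0-9a-f]{16}\b")


def rewrite_embedded_ids(xml_text, id_map):
    """Replace source IDs found in xml_text with their target IDs.

    Returns (new_text, unmapped), where unmapped is the set of ID-like
    strings that had no entry in id_map and were left unchanged.
    """
    unmapped = set()

    def swap(match):
        source_id = match.group(0)
        if source_id in id_map:
            return id_map[source_id]
        unmapped.add(source_id)
        return source_id

    return OBJECT_ID.sub(swap, xml_text), unmapped
```

The unmapped set gives a quick list of references worth checking, since any ID left unchanged may produce the "unavailable" message shown above when an author edits the page. It can also contain ordinary text that merely looks like an ID.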

Summary

Our goal in writing this article was to highlight some of the unique challenges inherent in web content migrations between Web Publisher environments. Hopefully you have gained some insight that will help you anticipate these challenges and plan for them in your next web content migration initiative.
Experience has taught us that selecting the right migration tool is key to addressing the challenges unique to web content migrations. Ideally, a migration tool will enable everything from simple, iterative migrations of content to entire repository migrations. Here at Blue Fish, we’ve been working hard at solutions to overcome these challenges. Visit our product information page to learn more about how Migration Workbench can be a part of your web content migration solution. And for more information on the Blue Fish Agile Migration Methodology, including the concept of an iterative approach which we call incremental migrations, please see the article titled “Using Migration Workbench to Address Common Content Migration Issues”.