An Introduction to Web Publishing Jobs in Documentum

Nicki DuBose

May 13, 2008

Overview

Documentum provides a number of out-of-the-box administrative jobs that help administrators ensure the health and integrity of the Documentum environment. For example, there is a job to check the consistency and integrity of the objects in the repository and a job to remove old audit trail entries.
In addition to the standard jobs that ship with the content server, there are several jobs delivered with Web Publisher and Site Caching Services (SCS) that automate the clean-up and publishing of content in web publishing environments. For example, the dm_WebPublish job publishes content from the repository to a website.
This article will provide an overview of the jobs used in the web publishing environment: what they are, recommended configuration, and troubleshooting tips for some of the more common errors.

What is a job?

Jobs in Documentum are used to automate the execution of a program or script. Each job in Documentum represents an automation schedule and execution frequency for something called a method, which in turn holds the reference to the program or script to execute.
Jobs are configured in Documentum Administrator (DA) under Administration->Job Management->Jobs. To view the properties, schedule, method, and object information for a job, click on the information icon in DA.
Selecting the “Pass Standard Arguments” option on the Method tab passes the following default argument values to the method:

docbase_name: [name of docbase]
user_name: [docbase owner]
job_id: [r_object_id of the dm_job object]
method_trace_level: 0

The method uses these arguments to log into the docbase to accomplish its intended purpose.
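To make the shape of these arguments concrete, here is a minimal Python sketch of how a method might read them. It assumes the arguments arrive as "-name value" pairs, which is how the argument strings later in this article are written. The parse_job_args function and the sample values are hypothetical and are not part of Documentum.

```python
def parse_job_args(argv):
    """Turn ['-name', 'value', ...] into {'name': 'value', ...}.

    Assumption: arguments arrive as "-name value" pairs; a name with no
    following value maps to an empty string.
    """
    args = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("-"):
            raise ValueError(f"expected an argument name, got {token!r}")
        name = token[1:]
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith("-")
        args[name] = argv[i + 1] if has_value else ""
        i += 2 if has_value else 1
    return args


# Hypothetical values standing in for the four standard arguments.
example = (
    "-docbase_name my_docbase -user_name dmadmin "
    "-job_id 0800000180000101 -method_trace_level 0"
)
standard = parse_job_args(example.split())
print(standard["docbase_name"], standard["method_trace_level"])
```

A real method would then use the docbase name and user name to open a session; that part depends on the Documentum client libraries and is left out here.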

Figure 1: Pass Standard Arguments on the Method tab of the Job Properties

Another argument commonly used in job configuration is window_interval. This argument defines a range of time, in minutes, in which the job can start outside of its scheduled execution time. For instance, if a job is scheduled to run every day at 2pm and the window_interval is set to 60, then the job can start at any time between 1pm and 3pm. The window is useful when the Content Server is down at the scheduled execution time: once the server is back up, the job will still run if the current time falls within the window. Continuing the 2pm example, if the Content Server went down at 1:30 and was restored at 2:30, the job execution process would attempt to run the job in the remaining half-hour window (2:30 – 3:00). Setting the interval to 1440 (24 hours) will guarantee the job will run, but may also negatively impact performance.
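The window check can be sketched in a few lines of Python. This is only an illustration of the 2pm example, assuming the window extends the same number of minutes before and after the scheduled time; the can_start function is hypothetical and is not how the Content Server implements the check.

```python
from datetime import datetime, timedelta


def can_start(scheduled, now, window_interval_minutes):
    """Return True if `now` falls inside the window around `scheduled`.

    Assumption (taken from the 2pm example): the window extends
    window_interval_minutes before and after the scheduled time.
    """
    window = timedelta(minutes=window_interval_minutes)
    return scheduled - window <= now <= scheduled + window


scheduled = datetime(2008, 5, 13, 14, 0)      # job scheduled for 2pm
restored = datetime(2008, 5, 13, 14, 30)      # server restored at 2:30pm
too_late = datetime(2008, 5, 13, 15, 30)      # server restored at 3:30pm

print(can_start(scheduled, restored, 60))
print(can_start(scheduled, too_late, 60))
```

With a 60-minute window, the 2:30 restore still falls inside the window while the 3:30 restore does not, so the second job run would wait for its next scheduled time.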
The execution of jobs is handled by the agent exec process. On the Content Server, the agent exec process runs continuously, polling the repository for jobs to execute; by default, it polls every 60 seconds. When the process finds more than one job to execute in a cycle, it executes each job in turn, sleeping 30 seconds between jobs. The number of jobs executed in a single polling cycle is configurable; by default, the agent exec process will execute up to three jobs in a polling cycle. To change this value, modify the max_concurrent_key value in the dmcl.ini. For more information on the agent exec process and modifying the polling interval, see the Content Server Administrator’s Guide.
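To see what these defaults mean when several jobs come due at once, here is a simplified Python model of the polling behavior. It is a sketch only: the plan_polling_cycles function is hypothetical, it ignores how long each job takes to run, and it assumes that jobs left over from one cycle are simply picked up in the next one.

```python
def plan_polling_cycles(due_jobs, max_per_cycle=3, sleep_between=30):
    """Group due jobs into polling cycles.

    Returns a list of cycles; each cycle is a list of
    (job_name, seconds_after_cycle_start) tuples. Job run time is ignored.
    """
    cycles = []
    for start in range(0, len(due_jobs), max_per_cycle):
        batch = due_jobs[start:start + max_per_cycle]
        cycles.append([(job, i * sleep_between) for i, job in enumerate(batch)])
    return cycles


due = [
    "Monitor_Lifecycles",
    "Create_Dynamic_Content",
    "WcmObjectBagJob",
    "dm_SCSLogPurgeJob",
]
for number, cycle in enumerate(plan_polling_cycles(due), start=1):
    print(number, cycle)
```

In this model, four jobs that are due together need two polling cycles, which is worth keeping in mind when many jobs share the same schedule and a short window_interval.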

Jobs for Web Publishing

Now that we’ve introduced the basics about Documentum jobs, let’s discuss the jobs used in the web publishing environment. Web Publisher and SCS provide us with a few jobs to aid in creating, promoting, and publishing content. To help the content authors with content creation, we have the Create_Dynamic_Content job and the WcmObjectBagJob. For automatic promotion of content and publishing we have the Monitor_Lifecycles job and the dm_WebPublish_[r_object_id] job, respectively. Lastly, the dm_SCSLogPurgeJob is used for housekeeping of the WebCache logs. Let’s dive into each one of the jobs.

Create_Dynamic_Content

The Create_Dynamic_Content job is used to update certain types of dynamic web pages with the latest content.
In web content management, there is often a requirement to create dynamic web pages – pages that update automatically based on other content in the repository. One of the most common types of dynamic page is a listing page; for example, a web page listing all of the latest press releases or announcements. Each time a user creates or removes a press release, you would expect the press release listing page to be updated automatically.
At Blue Fish, we prefer to create dynamic pages by having the web page query the SCS database. This provides real-time updates and follows a paradigm that most web developers are familiar with. If you create dynamic web pages the way that we do, you won’t need the Create_Dynamic_Content job.
Web Publisher does, however, provide an alternative approach to creating dynamic web pages in which XDQL queries (Documentum queries that produce XML fragments) are embedded into Web Publisher presentation files (XSL stylesheets). Whenever the file is published, the XDQL query is run and the listing page is updated with the latest press releases.
The Create_Dynamic_Content job is used to refresh these web pages on a regular schedule so that the pages are always up to date.
To use this job, Web Publisher must know which pages are dynamically generated using the XDQL method. To indicate that a file has dynamically generated content, its template must be flagged to automatically refresh when related content is modified. To set the flag, select the Content Refresh checkbox on the publishing tab of the content’s Properties page. This type of content is created by the transformation process performed by the Create_Dynamic_Content job, which transforms the content using the presentation file containing the XDQL. The job will only perform this transformation on content in the Active lifecycle state.
An administrator can configure the Create_Dynamic_Content job to run as frequently as is appropriate for the content types. The run frequency should be set based on the freshness requirements of the dynamic content. For instance, a current press release listing page might be updated daily, but an employee listing might only be refreshed monthly. Since the Create_Dynamic_Content job can only be configured once for all content types, the frequency should be set based on the content with the highest freshness requirement.
Note that the Create_Dynamic_Content job does not invoke an SCS publish operation. For the content to be published to the Active website, configure the dm_WebPublish job to run at periodic intervals.
Tip: By default, the Create_Dynamic_Content job is installed in the Active state, so if your site is not using XDQL to generate dynamic content, remember to set the job to Inactive.

Recommended Schedule: varies
Method invoked: wcmCreateTransformation
Arguments: check “Pass standard arguments”

dm_WebPublish_[r_object_id]

The dm_WebPublish job is used to automate the process of publishing content from a repository to a website.
Content is more than just files; it includes the properties or metadata that describe the content, as well as any relationships to the content. To automate the process of publishing all of these elements, you must define a site publishing configuration for Site Caching Services (SCS) in Documentum Administrator (DA). SCS is an application that automates the publishing of content from a Documentum repository to a website – and a required piece of the Web Publisher puzzle.
Site publishing configurations are created for each web site to identify what type(s) of content will be published, their format, and where to publish the content. In the site publishing configuration, an administrator defines the properties to publish as well as their destination (table name) in the SCS database. Once the publishing configuration has been created and tested, an administrator can manually invoke a publish operation by running the site publishing configuration. Running the site publishing configuration will “export” the appropriate content from the Documentum repository to the website location. It is up to a web developer to create the code or application(s) that will then consume the attributes and relationships published to the SCS database.
Alternatively, the site publishing operation can be automated by activating the dm_WebPublish_[r_object_id] job generated when a site publishing configuration is created. For each site publishing configuration defined in DA, there is a corresponding dm_WebPublish job that can be configured to automate the publishing task on a regular schedule. Based on the schedule set on the job, the website can be refreshed with the latest content updates.
Note: The r_object_id in the name of the dm_WebPublish job corresponds to the r_object_id of the dm_webc_config object, that is, the site publishing configuration object.
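Because of this naming convention, you can map between a publishing job and its site publishing configuration with simple string handling. The following Python sketch shows the idea. The helper functions and the sample ID are hypothetical, and the sketch assumes an r_object_id is written as 16 lowercase hexadecimal characters.

```python
import re

PREFIX = "dm_WebPublish_"
OBJECT_ID = re.compile(r"[0-9a-f]{16}")  # assumed r_object_id format


def publish_job_name(config_id):
    """Build the job name for a site publishing configuration ID."""
    if not OBJECT_ID.fullmatch(config_id):
        raise ValueError(f"not a valid r_object_id: {config_id!r}")
    return PREFIX + config_id


def config_id_from_job(job_name):
    """Recover the dm_webc_config r_object_id from a job name, or None."""
    if job_name.startswith(PREFIX):
        candidate = job_name[len(PREFIX):]
        if OBJECT_ID.fullmatch(candidate):
            return candidate
    return None


config_id = "0800000180000abc"  # hypothetical ID, for illustration only
name = publish_job_name(config_id)
print(name, config_id_from_job(name))
```

This can be handy when several sites publish from the same repository and you need to tell which dm_WebPublish job belongs to which configuration.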

Recommended Schedule: Every half hour
Method invoked: dm_webcache_publish
Arguments: defaulted from SCS

dm_SCSLogPurgeJob

The dm_SCSLogPurgeJob is used to delete outdated SCS log files.
Now that we’ve covered a little bit about publishing jobs and SCS, we can move on to the logging that occurs with each publish operation. SCS writes publishing logs to the server’s file system, by default in the $DM_HOME/webcache/temp directory, whenever the trace level on the publishing job is greater than zero or the job fails. As you might imagine, with tracing enabled or multiple publishing failures, you could generate hundreds of log files in a short time.
The dm_SCSLogPurgeJob provided with SCS is intended for clean-up or housekeeping of the SCS publishing logs. The job does exactly what you’d expect: it purges old SCS log files. As with all of the jobs in Documentum, administrators can designate how often and at what date and time to run the job. In addition, you can control how long these log files are kept by setting the cutoff_days argument. For instance, set the cutoff_days argument to 30 to keep the last 30 days of SCS log files.
This job also generates a report of all of the files that were deleted and the directories that were inspected for SCS log files. By default, this report is generated in $DM_HOME/dba/log/[docbase hex id]/sysadmin. The report is overwritten each time the job executes.
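The purge-and-report behavior can be illustrated with a short Python sketch. This is not the dm_SCSLogPurge implementation. It assumes that a file's age is judged by its last-modified time, and the purge_old_logs function and the file names are hypothetical.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 24 * 60 * 60


def purge_old_logs(directory, cutoff_days, now=None):
    """Delete files in `directory` last modified more than cutoff_days ago.

    Returns the sorted names of the deleted files, which could feed a report.
    """
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * SECONDS_PER_DAY
    deleted = []
    for entry in list(os.scandir(directory)):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            deleted.append(entry.name)
    return sorted(deleted)


# Demonstration in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as log_dir:
    for file_name, age_days in [("old_publish.log", 45), ("recent_publish.log", 2)]:
        path = os.path.join(log_dir, file_name)
        with open(path, "w") as handle:
            handle.write("sample\n")
        stamp = time.time() - age_days * SECONDS_PER_DAY
        os.utime(path, (stamp, stamp))
    deleted = purge_old_logs(log_dir, cutoff_days=30)
    remaining = sorted(os.listdir(log_dir))
    print("deleted:", deleted, "kept:", remaining)
```

With cutoff_days set to 30, the 45-day-old file is removed and the 2-day-old file is kept, matching the example in the text.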

Recommended Schedule: Once per day
Method invoked: dm_SCSLogPurge
Arguments: -window_interval 120 -queueperson -cutoff_days 30

Monitor_Lifecycles

The Monitor_Lifecycles job is used to promote and expire dated web content.
In Web Publisher, content authors can create content with future effective and expiration dates to indicate when content should be published to, or removed from, a website. For instance, a content author might write an announcement about a new product scheduled for release in two weeks. The author can draft the announcement and get the appropriate approvals prior to the product release date, but set the page to publish to the website in two weeks. Similarly, you might want to remove a page from a website on a particular date. For example, you would probably want to remove a product coupon page when the coupon reaches its expiration date.
The purpose of the Monitor_Lifecycles job is to automate the promotion and expiration of this type of dated content. The job lets content authors create content well in advance of the publish date and ensures that the content is automatically published (or expired) at a specific date and time. When effective dates are set on the content and the Monitor_Lifecycles job is active, the content is automatically promoted to the Active lifecycle state and subsequently picked up by the dm_WebPublish job for publishing to the website. Similarly, content is expired by the job when the expiration date is reached and then removed from the website on the next publish.
The Monitor_Lifecycles job searches the repository for content in the Approved lifecycle state and compares the effective date on the content to the current date. The job promotes Approved content to the Active lifecycle state in two cases: 1) when the effective date is reached, or 2) when the effective date is blank and “delay publish” is enabled on the lifecycle assigned to the content. In addition to promoting content, the Monitor_Lifecycles job searches the repository for content in the Active lifecycle state, compares the expiration date with the current date, and sets the content to the Expired lifecycle state when the expiration date is reached.
Note: The Monitor_Lifecycles job does not publish content. You must enable the dm_WebPublish job to publish the newly promoted or expired content to the website.
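The promotion and expiration rules can be summarized as a small decision function. The Python sketch below is our reading of the rules described in this section, not the wcmLifecycleMonitor code. The next_state function and its parameter names are hypothetical.

```python
from datetime import datetime


def next_state(state, now, effective=None, expiration=None, delay_publish=False):
    """Return the lifecycle state a content item should move to.

    Dates are datetime objects, or None when the date is blank.
    """
    if state == "Approved":
        if effective is not None and effective <= now:
            return "Active"      # case 1: effective date reached
        if effective is None and delay_publish:
            return "Active"      # case 2: blank date, delay publish enabled
    elif state == "Active":
        if expiration is not None and expiration <= now:
            return "Expired"     # expiration date reached
    return state


now = datetime(2008, 5, 13, 12, 0)
print(next_state("Approved", now, effective=datetime(2008, 5, 1)))
print(next_state("Approved", now, effective=datetime(2008, 6, 1)))
print(next_state("Approved", now, delay_publish=True))
print(next_state("Active", now, expiration=datetime(2008, 5, 12)))
```

Note that the function only changes a lifecycle state. As with the real job, nothing reaches the website until a publish operation runs.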

Recommended Schedule: Every half hour
Method invoked: wcmLifecycleMonitor
Arguments: check “Pass standard arguments”

WcmObjectBagJob

The WcmObjectBagJob is used to increase performance in the new content creation process by pre-creating content objects.
Remember, content is more than just a file; it includes properties or metadata, and relationships to the content. Consequently, creating new content involves creating instances of each of these elements. To improve the performance time for content authors when creating new content in Web Publisher, Documentum provides a job to pre-create these content objects. To illustrate the overhead required to create new content, let’s explore the objects created when a content author chooses to create new content. In addition to creating the actual file that contains the content, the metadata that describes the content is created and written to the repository. Also, relationships are created that tie the content to content template(s), rules file(s), and presentation file(s). Additionally, relationships are created between the content and its corresponding category, workflow and any images or other artifacts linked with the content.
The WcmObjectBagJob can reduce some of the overhead required when creating content by pre-creating a number of these empty content objects.
To enable the WcmObjectBagJob, set the job to Active in DA, then set the new content object count in Web Publisher. To set the new object count in Web Publisher, log on as an administrator and select Admin-> Web Publisher Admin-> Settings-> General tab. The “New Content Object Count” value on the General tab sets the number of objects that will be pre-created when the WcmObjectBagJob runs (the default number is five). The job only pre-creates objects from Approved content templates. If the template, rules file, or presentation file is updated, then any items in the existing “object bag” are replaced with new versions when the job runs.
Note: Content created by the WcmObjectBagJob is created with the document owner attribute defaulted to the docbase owner name instead of the name of the content author. If defaulting the owner of the content to the docbase owner is not desirable for your implementation, we recommend setting the job to Inactive as we do here at Blue Fish.

Recommended Schedule: Nightly
Method invoked: wcmObjectBagMethod
Arguments: check “Pass standard arguments”

Troubleshooting Common Errors

Detailed below are some of the most common problems and errors with the web publishing jobs. These errors can be found in the docbase owner’s Inbox.

These errors indicate a problem with the Java Method Server. If you encounter this type of error, perform the following actions:

Validate that the Java Method Server is running

Validate that the wcm.jar and WcmMethods.jar are in the classpath

These jars are included through the dctm.jar defined in the classpath variable on the content server and the appserver

Validate the existence of the jars as defined in the dctm.jar manifest file

Restart the Java Method Server

[ERROR] [AGENTEXEC 3408] Detected while processing dead job
Monitor_Lifecycles: The job object indicated the job was in
progress, but the job was not actually running. It is likely
that the dm_agent_exec utility was stopped while the job was
in progress.

[ERROR] [AGENTEXEC 1372] Detected while processing dead job
dm_WebPublish_[r_object_id]: The job object indicated the job
was in progress, but the job was not actually running. It is
likely that the dm_agent_exec utility was stopped while the job
was in progress.

The above errors indicate a failure in the agent exec process that can occur for any job type. If you encounter this error, perform the following actions:

Check the window_interval argument configured on the affected job. In some cases, the interval may not be large enough to start the job in the allotted time slot

Restart the docbase to release any locks created by the failed agent exec process
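If you collect these messages from the docbase owner’s Inbox, a short script can pull out which jobs were reported as dead. The Python sketch below assumes the message wording shown above. The find_dead_jobs function is hypothetical, and the number after AGENTEXEC is captured as-is without interpreting it.

```python
import re

DEAD_JOB = re.compile(
    r"\[ERROR\] \[AGENTEXEC (\d+)\] Detected while processing dead job (\S+):"
)


def find_dead_jobs(text):
    """Return (agentexec_number, job_name) pairs found in `text`."""
    flattened = " ".join(text.split())  # undo line wrapping in the message
    return DEAD_JOB.findall(flattened)


message = """[ERROR] [AGENTEXEC 3408] Detected while processing dead job
Monitor_Lifecycles: The job object indicated the job was in
progress, but the job was not actually running. It is likely
that the dm_agent_exec utility was stopped while the job was
in progress."""

print(find_dead_jobs(message))
```

The resulting job names tell you which window_interval settings to review first.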