Web Script: Modified Content List

For a recent customer engagement I was asked to write a web script which they’ve agreed to let me share.

The script was written to give them:

A list of content that has been modified between a given point in time and “now” (the time of execution).

The option to query a single named space or to recurse through its child spaces.

Results returned in a specified XML structure.

In preparing for this post I’ve added some modifications/fixes:

Added two new return formats: JSON and HTML.

Fixed an issue that prevented it from running on the 3.2.x release of Alfresco.

Here are some insights into a few areas of the script:

Lucene Date Queries

There are two specific things to remember about Date queries:

1 – Date Format. Lucene date queries expect the date in ISO 8601 format, using combined date and time in UTC. (Even if you only care about the date, it will complain if you don’t pass the time.)

To do this I had originally used the JavaScript Date prototype from http://delete.me.uk/2005/03/iso8601.html. This works fine in pre-3.2 releases. But due to a change in the 3.2 code line, top-level objects are now sealed, which means you can’t make this type of prototype change to root-level objects in the JavaScript libraries. So I’ve made a simple modification to the Date prototype code, allowing me to pass a Date object into the code, which then does the conversion and returns a properly formatted ISO 8601 string.

I’ve turned the original prototype method into a function, and I pass it a Date object holding the date I want to work with. I also replaced every reference to ‘this’ with that date parameter. I will admit there is some excess code here that may never be called, but this was the quickest change to accomplish what I needed.
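To illustrate the shape of that change, here is a minimal sketch of a standalone function that takes a Date and returns an ISO 8601 string in UTC. It is not the code from the script or from the linked page (which handles more formats); the names `pad` and `toISO8601String` are made up for this example.

```javascript
// Left-pad a number with zeros to the given width.
function pad(value, width) {
  var text = String(value);
  while (text.length < width) {
    text = "0" + text;
  }
  return text;
}

// Hypothetical helper: takes a Date as a parameter instead of
// extending Date.prototype, so it also works when root objects are sealed.
function toISO8601String(date) {
  return date.getUTCFullYear() + "-" +
    pad(date.getUTCMonth() + 1, 2) + "-" +   // getUTCMonth() is zero-based
    pad(date.getUTCDate(), 2) + "T" +
    pad(date.getUTCHours(), 2) + ":" +
    pad(date.getUTCMinutes(), 2) + ":" +
    pad(date.getUTCSeconds(), 2) + "." +
    pad(date.getUTCMilliseconds(), 3) + "Z";
}
```

Because the date is an ordinary argument, nothing on the sealed Date object needs to be touched.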

2 – Only dates are indexed. By default, only the date portion of the property is indexed. In general, most use cases only require the date. But in this case, the customer was interested in what may have changed over the period of an hour.

There are two options for this: code around it, or modify how Alfresco indexes these DateTime properties. In this case I chose to code around it. This means that I am not modifying default behavior in Alfresco…always a plus.

Code Around. We have all the pieces we need to do this: we know the datetimes we use for the start and end of our range query. We also have access to the full datetime property (even if it wasn’t indexed).

I chose to pull this into a function that takes the collection (array) of documents found by our range query and tests whether the datetime property of each node (in this case the modified datetime property) falls between the beginning and end of our range. If so, the node is added to a new filtered results array.
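A rough sketch of such a filter is shown here. It assumes each node exposes its modified date as a JavaScript Date through `properties["cm:modified"]`, the way an Alfresco ScriptNode does; the function name `filterByModified` is made up for this example and is not taken from the script.

```javascript
// Keep only the nodes whose full modified datetime lies within [start, end].
// The Lucene range query matched on the date alone; this pass applies the time.
function filterByModified(nodes, start, end) {
  var filtered = [];
  for (var i = 0; i < nodes.length; i++) {
    // Assumption: the property is available as a Date on each node.
    var modified = nodes[i].properties["cm:modified"].getTime();
    if (modified >= start.getTime() && modified <= end.getTime()) {
      filtered.push(nodes[i]);
    }
  }
  return filtered;
}
```

The range is treated as inclusive at both ends here; whether the end should be exclusive is a design choice that depends on how the caller steps through time windows.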

The other option is to modify how Alfresco indexes these DateTime properties. If you go that route, you then need to rebuild your indexes. Depending on the amount of content in your repository, this may take some time. It may also not be possible to do this immediately, because the SLA you have with your customers won’t permit it until you have scheduled maintenance downtime. (HA clustering is helpful in these situations.)

System Folders

One thing that may show up in these queries is system folders. These folders contain any rules that may be associated with a space. In most queries these folders and their content have no value, so we want to exclude them.

When we are performing queries for content within a space we have a couple of options. Two of the most common are PARENT and PATH.

PARENT queries work directly on a space, without recursion. They return all of the nodes (spaces and content) in that space alone, but none of the nodes in its sub-spaces.

PATH queries are a subset of XPath. They are eagerly evaluated, so they can be memory (caching) and CPU intensive. They are useful if you want more than just what is in the space you are working in, if you don’t know the location of the space, or if a space of that name may exist in multiple locations.

Be smart in your choice. Use PARENT as often as possible.
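To make the difference concrete, here is a sketch of how a query string might be assembled for either scope, combined with a date range on the modified property and an exclusion for system folders. This is an illustration, not the query the script actually builds: the function name `buildQuery` and its option names are made up, and it assumes system folders are typed `cm:systemfolder` and that the path is already in QName form.

```javascript
// Hypothetical query builder. options:
//   recurse - true to search the whole sub-tree (PATH), false for one space (PARENT)
//   nodeRef - nodeRef of the space, used for PARENT queries
//   path    - QName path of the space, used for PATH queries
//   from/to - ISO 8601 strings bounding the modified date
function buildQuery(options) {
  var scope = options.recurse
    ? '+PATH:"' + options.path + '//*"'       // everything below the space
    : '+PARENT:"' + options.nodeRef + '"';    // direct children only
  return scope +
    ' +@cm\\:modified:[' + options.from + ' TO ' + options.to + ']' +
    ' -TYPE:"cm:systemfolder"';               // assumption: type of rule folders
}

var sample = buildQuery({
  recurse: false,
  nodeRef: "workspace://SpacesStore/example-id", // placeholder nodeRef
  from: "2010-01-05T00:00:00.000Z",
  to: "2010-01-05T01:00:00.000Z"
});
```

Only the scope clause changes between the two modes, which keeps the cheaper PARENT form as the default and PATH as the opt-in.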

Note for Java extensions: unless the other clauses in your query are complex, it’s likely more efficient to enumerate (list / ls / dir) a space using the FileFolderService rather than a Lucene query. That is, a simple query of the form “PARENT:[noderef]” would be better implemented using FileFolderService.list().

OK, enough about our design decisions; let’s talk about how to use the web script.

The ‘in’ allows you to tell the web script to just list the content in the space passed in the path parameter. Without the ‘in’ the web script will recurse the space structure from the passed path parameter down.

Next, the DateTime structure is not fixed. Possible (tested) formats are:

mm/dd/yyyy

yyyy/mm/dd

mm/dd/yyyy hh:mm:ss

yyyy/mm/dd hh:mm:ss

The hh expects a 24-hour clock. Milliseconds and time zones are not supported.
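As an illustration of accepting those four formats, here is a small parsing sketch. It is not how the web script itself reads the parameter; the names `DATE_PARAM` and `parseDateParam` are made up, two-digit month/day/time fields are assumed, and the value is interpreted in the server’s local time zone.

```javascript
// Matches yyyy/mm/dd or mm/dd/yyyy, optionally followed by " hh:mm:ss".
var DATE_PARAM = /^(?:(\d{4})\/(\d{2})\/(\d{2})|(\d{2})\/(\d{2})\/(\d{4}))(?: (\d{2}):(\d{2}):(\d{2}))?$/;

// Hypothetical helper: returns a Date, or null if the text matches no format.
function parseDateParam(text) {
  var m = DATE_PARAM.exec(text);
  if (m === null) {
    return null;
  }
  var year, month, day;
  if (m[1] !== undefined) {        // yyyy/mm/dd
    year = m[1]; month = m[2]; day = m[3];
  } else {                         // mm/dd/yyyy
    month = m[4]; day = m[5]; year = m[6];
  }
  // The time part is optional and defaults to midnight (24-hour clock).
  var hours = m[7] !== undefined ? Number(m[7]) : 0;
  var minutes = m[8] !== undefined ? Number(m[8]) : 0;
  var seconds = m[9] !== undefined ? Number(m[9]) : 0;
  return new Date(Number(year), Number(month) - 1, Number(day),
                  hours, minutes, seconds);
}
```

Anything carrying milliseconds or a time zone suffix fails the pattern and returns null, which matches the limitation noted above.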

The query tests against the modified datetime property of the content. (It is fairly easy to change it to test against the created property or any other custom metadata DateTime property.)

There are three return formats: XML (the default), JSON and HTML.

XML The XML format is a simple structure that returns the name of the file, the path to the file and the modification date. It is also the default format.

HTML This final format is more for testing than for actual production use. It returns a simple HTML page with an unordered list of modified items, where each item is a comma-separated list of keys and values separated by colons.

In all of these return formats, the FreeMarker template iterates over a simple collection of the returned ScriptNodes. This makes it easy to extend the output with more of the nodes’ properties, such as nodeRef, creation date, node icons, download URL, etc.

If you have suggestions for improvements or questions, please feel free to comment here or on the Google project. As always, I’m willing to open access to the project to those willing to help improve the code.

Jared Ottley

Jared Ottley is an engineer at Alfresco. As the father of 6 he is raising a geek army. He has a degree in Political Science and drives a Mini.