Best Practice Makes Perfect

Getting back to the subject of performance, let's consider again an agent that processes lots of documents in LotusScript (or Java). The key to a well-performing script is to access as few documents as possible, whether for read or write, and when you do have to access them, to do so in the fast C (and C++) code of the Notes client or Domino server rather than in your script. Beyond the raw speed of compiled code, there are two further reasons to push document access into the C layer:

The view indexer makes use of a "summary" version of the document -- one where any large fields, such as rich text, are not loaded into memory. This causes less disk access and uses less memory.

Taking advantage of compilations of document information (view indexes and the full-text index) lets you locate relevant documents without having to touch non-matching documents at all.

Just as you have the most slack if you take advantage of work previously done by yourself and others, so too will your script have a more relaxing run if you use as much pre-calculated information as you can. Once you create a NotesDocument object, you lose, because it pulls all the document information into memory. The rule of thumb: the fewer NotesDocument objects you create, the faster your script will run. Let's start out, then, with an example of the worst possible way to do it:
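The listing itself does not appear in this copy of the entry. As a stand-in, here is a hypothetical reconstruction based only on the discussion that follows. The field name Somefield and the value "Nevermind" are borrowed from the full-text example further down, and the layout is arranged so that the AllDocuments call lands on line 7. Treat it as a sketch of the anti-pattern, not as the original code.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim coll As NotesDocumentCollection
    Dim doc As NotesDocument
    Set db = session.CurrentDatabase
    Set coll = db.AllDocuments    ' every document in the database
    Dim i As Long
    For i = 1 To coll.Count
        Set doc = coll.GetNthDocument(i)    ' positional lookup on every pass
        doc.Somefield = "Nevermind"         ' assumed field name and value
        Call doc.Save(True, False)          ' saved whether or not anything changed
    Next
End Sub
```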

Before getting into the details of how, I like to think about whether the task is really worth doing. You can save a lot of work by just not doing it. But this is just an example, so let's ignore for a moment that this script is a waste of time, and figure out how to waste time most efficiently. The first problem is that this script touches every document in the database. On line 7, it uses AllDocuments, then scans that collection. So it loads into memory all the data in the whole database, when perhaps you only needed to update three documents (the other 200,000 already had the correct field value). Even worse, the script saves every document in the database, even those that didn't need changing. This not only makes the agent slower, but also makes everything else about the application slower. Every document you change causes extra view indexing, replication, and full-text indexing work, and increases the chance of save conflicts with users who were working on the documents at the time.

(Extra credit for you if you spotted the use of GetNthDocument, which is an easy way to get really poor performance. Use GetNextDocument to iterate through a collection.)

There are a few different ways to speed up the search and target just the documents you need to change. I won't describe them all in detail now, but I plan to cover them in future entries.

Use the timestamp of the document. You can think of what's happening like this: the agent has an "unread list". UnprocessedDocuments (a property of NotesDatabase) is a collection of only the agent's unread documents. UpdateProcessedDoc (a method of NotesSession) marks a document as read. And of course, if the document is edited by someone else, it becomes unread again. This is a very quick way to find documents of interest, provided you're only interested in new or modified ones.
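A minimal sketch of the "unread list" idea might look like the following. It assumes the agent is set up to act on new and modified documents, and it reuses the Somefield / "Nevermind" example from elsewhere in this entry as a stand-in for whatever your agent really does.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim coll As NotesDocumentCollection
    Dim doc As NotesDocument
    Set db = session.CurrentDatabase
    ' Only the documents this agent has not yet marked as processed
    Set coll = db.UnprocessedDocuments
    Set doc = coll.GetFirstDocument
    Do Until doc Is Nothing
        ' Save only when the value actually needs to change (assumes a text field)
        If doc.GetItemValue("Somefield")(0) <> "Nevermind" Then
            Call doc.ReplaceItemValue("Somefield", "Nevermind")
            Call doc.Save(True, False)
        End If
        ' Mark as "read" so the next run skips it unless it is edited again
        Call session.UpdateProcessedDoc(doc)
        Set doc = coll.GetNextDocument(doc)
    Loop
End Sub
```

Note that the loop walks the collection with GetFirstDocument and GetNextDocument rather than GetNthDocument.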

Use a full-text index. If the database is full-text indexed, you can limit the selection of documents by a full-text search; for instance, in this example the search would be not ([Somefield] = "Nevermind"). This is done either in the Document Selection section of an agent, or by using the FTSearch method. The drawbacks here are that (1) you don't always have a full-text index, and (2) not every search is possible; for instance, there is no test for exact equality, so this example search would not return documents where Somefield contains "Nevermind" but also other information, e.g. "Nevermind, Sam".
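Here is a rough sketch of the script version, using the query quoted above. The use of StampAll to write the new value is my own choice for the illustration (it updates the whole collection without a script loop); a GetNextDocument loop would work too. Keep in mind that the results are only as current as the full-text index.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim coll As NotesDocumentCollection
    Set db = session.CurrentDatabase
    If Not db.IsFTIndexed Then
        Print "No full-text index; pick another approach"
        Exit Sub
    End If
    ' Second argument 0 = do not cap the number of matches
    Set coll = db.FTSearch(|not ([Somefield] = "Nevermind")|, 0)
    If coll.Count > 0 Then
        ' Write the value to every match without creating NotesDocument objects in script
        Call coll.StampAll("Somefield", "Nevermind")
    End If
End Sub
```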

Use a macro search. NotesDatabase.Search in LotusScript, or a SELECT formula in a macro agent, lets you identify documents of interest very precisely, albeit not very efficiently, since the entire document has to be loaded into memory to evaluate the selection formula on it. About the best that can be said for it is that, because the iteration through the documents happens in C code rather than in a LotusScript loop, it's not as slow as it might be.
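As a sketch, a macro search for the running example could look like this. The formula is an assumption on my part; unlike the full-text query, it does an exact comparison, so a document containing "Nevermind, Sam" is selected.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim coll As NotesDocumentCollection
    Dim doc As NotesDocument
    Set db = session.CurrentDatabase
    ' Nothing = no cutoff date, 0 = no limit on the number of documents returned
    Set coll = db.Search(|Somefield != "Nevermind"|, Nothing, 0)
    Set doc = coll.GetFirstDocument
    Do Until doc Is Nothing
        ' Every document in the collection needs the change, so no re-check here
        Call doc.ReplaceItemValue("Somefield", "Nevermind")
        Call doc.Save(True, False)
        Set doc = coll.GetNextDocument(doc)
    Loop
End Sub
```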

Use a view. You find or create a view that contains just the documents you want (or that contains a column that's sorted so you can locate the documents by searching for a key). The view selection formula is also macro code, of course, but views have two big advantages over a macro search: first, the view index remembers which documents are in there from before, so it only has to consider documents modified since the view was last used. Second, the view indexer only uses the summary fields of the document, so the rich text of the document doesn't have to be loaded into memory.
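To illustrate, suppose there is a hidden view named "(NeedsNevermind)" whose selection formula is SELECT Somefield != "Nevermind". Both the view name and the formula are made up for this sketch. Since each save removes the document from the view, the script turns off AutoUpdate and fetches the next document before saving the current one.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim view As NotesView
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Set db = session.CurrentDatabase
    Set view = db.GetView("(NeedsNevermind)")    ' hypothetical view name
    If view Is Nothing Then
        Print "View not found"
        Exit Sub
    End If
    view.AutoUpdate = False    ' keep our position stable while documents drop out
    Set doc = view.GetFirstDocument
    Do Until doc Is Nothing
        Set nextDoc = view.GetNextDocument(doc)    ' fetch before modifying doc
        Call doc.ReplaceItemValue("Somefield", "Nevermind")
        Call doc.Save(True, False)
        Set doc = nextDoc
    Loop
End Sub
```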

Combined approaches. Often you can use a quick method (such as timestamp) to limit the number of documents you must consider, then another method (full-text) to narrow it down further, then finally iterate through the remaining documents to identify the ones of interest.
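One possible sketch of a combined approach uses NotesDatabase.UnprocessedFTSearch, which applies the full-text query only to the agent's unprocessed documents, followed by a final check in script. The final check here guards against an out-of-date full-text index; that particular check, like the field name and value, is an assumption for the sake of the example.

```lotusscript
Sub Initialize
    Dim session As New NotesSession
    Dim db As NotesDatabase
    Dim coll As NotesDocumentCollection
    Dim doc As NotesDocument
    Set db = session.CurrentDatabase
    ' Steps 1 and 2: unprocessed documents, narrowed by a full-text query (0 = no limit)
    Set coll = db.UnprocessedFTSearch(|not ([Somefield] = "Nevermind")|, 0)
    Set doc = coll.GetFirstDocument
    Do Until doc Is Nothing
        ' Step 3: confirm in script, since the index may lag behind the data
        If doc.GetItemValue("Somefield")(0) <> "Nevermind" Then
            Call doc.ReplaceItemValue("Somefield", "Nevermind")
            Call doc.Save(True, False)
        End If
        Call session.UpdateProcessedDoc(doc)
        Set doc = coll.GetNextDocument(doc)
    Loop
End Sub
```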

Choosing the best-performing approach generally involves considering your data and thinking about how much total work must be done.