I'm operating on a large corpus of documents where human coders are looking at OCR-ed PDFs. We're trying to facilitate their coding work by highlighting relevant search terms. Redaction actually works pretty well for that, as highlighting with an empty red rectangle draws attention to stuff likely to be of interest.

Let's say, though, that some documents contain 5 matches, some contain 30, and some contain any other number. It would help our project immensely to know how many matches there are in a given document. We also want this to happen automagically; counting by hand would defeat the purpose.

Logging would be the holy grail ... even if there's a redaction log with a bunch of noise in it, we're willing to parse the log to get what we want. Yet, we can't find any evidence that Acrobat (Windows or Mac, v9, but we're willing to shell out for X if it gives us this functionality) logs much of anything it does to a document.

Are you saying that you're using the Search & Redact feature? It's possible with a script to count how many redaction annotations are present. It's also possible with a script to search through a document for a word and automatically add text highlights, and then count how many there are.
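As a rough illustration of the second option (search for a word, add highlights, count them), here is a minimal sketch. The function name `highlightAndCount` is made up for this example, and it assumes the `doc` argument is an Acrobat Doc object (pass `this` when running inside Acrobat). It only matches single words, case-insensitively, so multi-word search terms would need more work.

```javascript
// Hypothetical helper: highlight every occurrence of a single word and
// return how many were found. `doc` is assumed to be an Acrobat Doc object.
function highlightAndCount(doc, term) {
    var count = 0;
    var target = term.toLowerCase();
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p);
        for (var w = 0; w < numWords; w++) {
            // Third argument true asks for the word with punctuation stripped
            var word = doc.getPageNthWord(p, w, true);
            if (word && word.toLowerCase() === target) {
                // Add a text highlight over the word's bounding quads
                doc.addAnnot({
                    type: "Highlight",
                    page: p,
                    quads: doc.getPageNthWordQuads(p, w)
                });
                count++;
            }
        }
    }
    return count;
}

// Inside Acrobat (the search term here is only an example):
// console.println(this.documentFileName + ": " + highlightAndCount(this, "contract"));
```

Returning the count from the function, rather than printing it directly, makes it easy to reuse the same code in a batch process and decide there how to report the number.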

Process two: operating on the documents saved from process one, Search & Redact (marking for redaction) based on one or more search terms.

Now, if marking for redaction doesn't create a "redaction instance" that gets logged / counted, no problem. We don't mind doing the actual redactions on a copy of the PDFs, since we'll retain the un-redacted copies to actually read.

That is, we could OCR in one step; run Search & Redact (applying redactions) in a second step, producing files we'll just toss because the point was only counting / logging; and then run Search & Redact again, marking for redaction but not applying, so that we have little red rectangles around our search hits.

This just demonstrates that you can determine the number of redaction annotations with a script. You can adapt it to suit your needs. For example, you could use it in a batch process and write the number for each file to the JavaScript console by changing the last line to:

console.println(documentFileName + ": " + sum);

When you open the console (Ctrl+J on Windows, Cmd+J on a Mac) after the batch process, it will show one line per file with the file name and the number of redaction annotations.
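For reference, a minimal sketch of a redaction count along these lines might look as follows. It is not necessarily the script discussed above. The function name `countRedactions` is made up for this example, and it assumes that marks created by Search & Redact show up as annotations whose `type` is "Redact".

```javascript
// Hypothetical helper: count redaction marks in a document.
// `doc` is assumed to be an Acrobat Doc object (pass `this` inside Acrobat).
function countRedactions(doc) {
    // getAnnots() returns null when the document has no annotations at all
    var annots = doc.getAnnots();
    if (!annots) {
        return 0;
    }
    var sum = 0;
    for (var i = 0; i < annots.length; i++) {
        // Assumption: redaction marks report their type as "Redact"
        if (annots[i].type === "Redact") {
            sum++;
        }
    }
    return sum;
}

// Inside Acrobat, as the last line of a batch-process script:
// console.println(this.documentFileName + ": " + countRedactions(this));
```

Since this counts marks rather than applied redactions, it should be enough to mark for redaction without applying, so no throwaway redacted copies would be needed.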