But there’s a problem with this approach: I’m supposed to be copying the behaviour of Unix’s wc, not OpenOffice Writer’s word count. Normally, this wouldn’t be a problem – a word count is a word count, a line count is a line count, and Writer should pump out the same numbers as wc.

Not so.

In my last post, I wrote:

According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.

However, upon passing the same text (saved in the text file “count.txt”) through wc, I got the following output:

5302 32230 178845 count.txt

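Writer and wc agree on the number of words, but disagree on the number of lines: 5302 (wc) vs 4257 (Writer). That’s a disagreement of 1045 lines. The last number differs too (178845 from wc vs 173543 characters in Writer), and the gap is exactly 5302, the same as wc’s line count, which suggests Writer isn’t counting the newline characters.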

Brutal.

Anyhow, I’m going to focus on wc’s approach to line counting – simply returning the number of newline characters in the file.
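Here is a minimal sketch of that counting rule in JavaScript, not the actual code from this project. The function name `wcCounts` is made up for illustration, and I'm assuming that "word" means a run of non-whitespace characters and that the last number is the UTF-8 byte length, which is my reading of what wc reports by default.

```javascript
// Hypothetical helper: approximates the three numbers wc prints by default.
function wcCounts(text) {
  // Lines: wc counts newline characters, not visual or wrapped lines.
  let lines = 0;
  for (const ch of text) {
    if (ch === "\n") lines++;
  }

  // Words: runs of non-whitespace characters (assumption: JavaScript's \S
  // is close enough to wc's notion of whitespace for ordinary text).
  const words = (text.match(/\S+/g) || []).length;

  // Bytes: length of the text once encoded as UTF-8.
  const bytes = new TextEncoder().encode(text).length;

  return { lines, words, bytes };
}

console.log(wcCounts("one two\nthree\n")); // { lines: 2, words: 3, bytes: 14 }
```

One consequence of counting newlines: a text that doesn't end with a newline reports one line fewer than you would see on screen, so `wcCounts("no newline").lines` is 0.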

In this version, I’m using Mozilla’s TreeWalker implementation to stitch together the page text. So far it seems to be working all right, but if it falls through, I might switch to something like Andrew Trusty’s code with the jQuery library to do the text stitching.
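The stitching idea looks roughly like the sketch below. It is an illustration of the general TreeWalker approach rather than the code I'm actually running. The function name `stitchText` is hypothetical, and I'm assuming that plain concatenation of text nodes is good enough.

```javascript
// Same value as NodeFilter.SHOW_TEXT in the DOM; spelled out here so the
// function does not depend on a global NodeFilter being defined.
const SHOW_TEXT = 4;

// Hypothetical helper: walks every text node under `root` and joins the text.
// `doc` is the document that owns `root` (for a page, usually `document`).
function stitchText(doc, root) {
  const walker = doc.createTreeWalker(root, SHOW_TEXT, null);
  const parts = [];
  let node;
  // nextNode() returns null once the walker runs out of text nodes.
  while ((node = walker.nextNode())) {
    parts.push(node.nodeValue);
  }
  // Note: this also picks up text inside <script> and <style> elements;
  // a real version would probably want to filter those out.
  return parts.join("");
}
```

In a page this would be called as `stitchText(document, document.body)`, and the result could then be handed to whatever does the counting.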

So there it is. Maybe I’ll keep working on this, pretty it up a bit, etc. However, work starts on Monday, and that’ll probably take up most of my technical attention.