I have tried to keep an eye on that over the past months. While keeping my logs long, I worked on increasing the variety of my wording, but nothing really changed. I still have a pretty bad ranking on log similarity, which I may just have to accept.

4 Answers

I guess the algorithm counts the number of occurrences of each word you put in your logs. So words like "the" in English, which are used very often, will significantly increase the log similarity.
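
To make this guess concrete, here is a minimal sketch in Python of what such a word-count comparison could look like. The real algorithm behind the statistics is not known, so everything here (the function names, the use of cosine similarity, averaging over all pairs of logs) is an assumption for illustration only.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_counts(log):
    # Lowercase the log and count how often each word occurs.
    return Counter(re.findall(r"\w+", log.lower()))


def cosine_similarity(a, b):
    # Compare two word-count tables; 0.0 = no shared words, 1.0 = same word mix.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_similarity(logs):
    # Hypothetical overall score: mean similarity over every pair of logs.
    counts = [word_counts(log) for log in logs]
    pairs = list(combinations(counts, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

With this sketch, "the cache was in the tree" and "the box is under the bridge" score about 0.5 even though the only word they share is "the", which is exactly the effect described above. A real implementation might well ignore such common words.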

For my part, I also love writing long and (I hope) interesting logs. But when I log a series of caches, I always add a "header" part (identical for all caches of the day) and a part dedicated to the cache itself. So my log similarity is very high (45%).

This is the way we log too (by now). The logs are mostly in German, while on holidays they are usually in German and English, and sometimes even a bit of Danish.
The stats say we have an average of 244 words. So overall I would say it "feels" like a similarity of about 35%, but the stats say 64%. This is a bit funny, as it makes us look like copy-and-paste loggers, but we really try to say something unique on every cache (even on a PT if possible, and if we go for one, which is actually not very likely) ;)

I would not be surprised if details about the algorithm are withheld on purpose. Publishing them might make it too easy to fool the algorithm and achieve a good ranking.

I fully agree it is worth trying to write good and interesting logs. However, this is not an easy task, and it is very difficult for an algorithm to judge. My own logs (mostly written in German, plus the local language in most cases) also have a high similarity, which may also be due to the fact that a few words appear frequently in any normal text. I also have to admit I use a 'day header' or 'trip header' part, which may add to the similarity. I would also assume the check may be done on 'text similarity' (checking for identical word sets or sentences), which is also easy to program.
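
As a rough illustration of how simple such a 'text similarity' check could be, here is a small Python sketch that compares two logs by the sentences they have in common. It is only an assumption about how a check of this kind might work, not the method the statistics actually use; the function names and the naive sentence splitting are made up for the example.

```python
import re


def sentences(log):
    # Naive split on '.', '!' or '?'; normalise case and whitespace.
    parts = re.split(r"[.!?]+", log)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def sentence_overlap(log_a, log_b):
    # Share of identical sentences among all distinct sentences in both logs.
    a, b = sentences(log_a), sentences(log_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical 'day header' reused on two caches of the same trip.
header = "Today we went on a tour around the lake. The weather was great."
log_1 = header + " This one was hidden in a tree."
log_2 = header + " Nice spot by the water."
print(sentence_overlap(log_1, log_2))  # 0.5
```

Two of the four distinct sentences are identical, so a two-sentence header on top of a one-sentence individual part already gives 50% here.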

Perhaps shorter logs give more variance. I have an average log length of 41 words and a similarity score of 31%. Although I sometimes write full-length descriptions, I often post logs such as "Another quick find here - TFTC" (I'm not saying I should, and I will write more for a good cache, or for one that needed more effort than usual to find). Personally I'd rather have a higher log length and lose out on the similarity score, since I get a better badge for the log length, whereas the similarity score doesn't seem to be used in any other statistics. However, with over 2000 logs, bringing the average length up is very slow!
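
To put a number on "very slow", here is a quick back-of-the-envelope calculation in Python. The 2000 logs and the 41-word average come from the figures above; the 100-word length for new logs and the 50-word target are made-up values for illustration.

```python
import math


def logs_needed(n_logs, current_avg, new_log_len, target_avg):
    # Number of new logs of a fixed length needed to lift the overall average.
    # Solves (n*a + k*L) / (n + k) >= t for k.
    if target_avg <= current_avg:
        return 0
    if new_log_len <= target_avg:
        raise ValueError("new logs must be longer than the target average")
    return math.ceil(n_logs * (target_avg - current_avg) / (new_log_len - target_avg))


# 2000 logs averaging 41 words; hypothetical: every new log has 100 words.
print(logs_needed(2000, 41, 100, 50))  # 360
```

So even writing 100 words on every single cache from now on, it would take 360 more logs just to move the average from 41 to 50 words.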