Note: If you want to play with TF-IDF, download this spreadsheet. The first tab is a simple TF-IDF calculator. Enter the occurrences of a word, the number of words in the document, the total number of documents and the number of documents containing the phrase. It does the rest. The second tab demonstrates how TF-IDF falls as the number of documents containing a phrase goes up.

Wait! Don’t run! I’m not here to teach you the math behind TF-IDF. Truth is, I barely understand it myself. But Term Frequency, Inverse Document Frequency (TF-IDF – a great phrase for the next SEO cocktail party you attend) contains some crucial lessons for us copywriters.

Here’s a very brief description of TF-IDF and how it works (a little fancy math involved):

TF = Term Frequency

We all know this one:

If our key phrase is “flibbergibbet,” and it occurs 4 times in a document that’s 400 words in length, then the TF for “flibbergibbet” is:

4 / 400 = 1%
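
If you'd rather see that as code than as arithmetic, here's a minimal sketch. The function name term_frequency is my own invention for illustration, not something from a library.

```python
def term_frequency(occurrences: int, total_words: int) -> float:
    """Fraction of the document's words taken up by the term."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return occurrences / total_words


# "flibbergibbet" shows up 4 times in a 400-word document.
tf = term_frequency(4, 400)
print(f"{tf:.2%}")  # 1.00%
```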

Some folks call this keyword density. But we’re past that now.

Inverse Document Frequency

Inverse Document Frequency (IDF) is the inverse of the number of documents in which a phrase occurs. That’s a terrible description – I know that because the mathematicians I know all punched me in the arm after I said it. But it’ll work for our purposes.

In case you want to know:

IDF = log(total documents/number of documents with phrase)

So, if “flibbergibbet” appears in 250 out of 1000 documents, the IDF is:

log(1000/250) = log(4) ≈ 0.6 (that's a base-10 log)
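
Here's the same calculation as a small sketch. It assumes a base-10 logarithm, since that's the base that produces 0.6 for this example (a natural log would give roughly 1.39 instead). The function name is hypothetical.

```python
import math


def inverse_document_frequency(total_docs: int, docs_with_phrase: int) -> float:
    """IDF = log10(total documents / documents containing the phrase)."""
    if total_docs <= 0 or docs_with_phrase <= 0:
        raise ValueError("document counts must be positive")
    return math.log10(total_docs / docs_with_phrase)


# "flibbergibbet" appears in 250 of 1000 documents.
idf = inverse_document_frequency(1000, 250)
print(round(idf, 3))  # 0.602
```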

TF-IDF

TF-IDF is the Term Frequency times the Inverse Document Frequency, or TF*IDF.
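
Putting the two pieces together for our running example looks something like this. It's a sketch that assumes the plain TF*IDF formula with a base-10 log, as described here; real search and text-analysis tools often use smoothed or normalized variants. The tf_idf function and its parameter names are made up for illustration.

```python
import math


def tf_idf(occurrences: int, total_words: int,
           total_docs: int, docs_with_phrase: int) -> float:
    """Plain TF * IDF, using a base-10 log for the IDF part."""
    tf = occurrences / total_words
    idf = math.log10(total_docs / docs_with_phrase)
    return tf * idf


# 4 occurrences in 400 words; phrase appears in 250 of 1000 documents.
score = tf_idf(4, 400, 1000, 250)
print(round(score, 5))  # 0.00602
```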

Here’s the thing about IDF that you must understand: As the number of documents containing a phrase goes up, the TF-IDF score goes down. Have a look at this graph — document frequency goes up as you move to the right:

Yikes. So, the more pages that mention a phrase, the less important that phrase appears on any specific page.
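
You can reproduce the downward slope with a few lines of code. This sketch holds TF steady at 1% and assumes 1000 total documents, the same numbers as the example earlier. The document counts in the list are arbitrary picks of mine.

```python
import math

TOTAL_DOCS = 1000  # assumed corpus size, matching the earlier example
TF = 0.01          # term frequency held constant at 1%


def tf_idf(tf: float, total_docs: int, docs_with_phrase: int) -> float:
    """Plain TF * IDF with a base-10 log."""
    return tf * math.log10(total_docs / docs_with_phrase)


# Arbitrary document counts, increasing from left to right.
doc_counts = [10, 100, 250, 500, 1000]
scores = [tf_idf(TF, TOTAL_DOCS, n) for n in doc_counts]

for n, score in zip(doc_counts, scores):
    print(f"{n:>5} documents -> TF-IDF {score:.5f}")
```

The score starts at 0.02 when only 10 documents contain the phrase and drops all the way to zero once every document contains it.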

What It All Means

We don’t know for certain if the search engines use TF-IDF to determine the importance of a word on a page. But it’s likely they use it or something very like it.

Say you want your website to rank well for our favorite word. You include the word at least 3 times on every single page of your site. That actually reduces the TF-IDF score of each page for “flibbergibbet.”
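
Here's a rough sketch of that scenario. Big assumption: it computes IDF over the pages of a single site only, which is the premise of this example and not a confirmed description of how any search engine works. The 150-page site size and the 1% term frequency are hypothetical numbers.

```python
import math

SITE_PAGES = 150  # hypothetical site size
TF = 0.01         # hypothetical: the phrase is 1% of the words on a page


def site_tf_idf(tf: float, site_pages: int, pages_with_phrase: int) -> float:
    """TF * IDF where IDF is computed over one site's pages only (an assumption)."""
    return tf * math.log10(site_pages / pages_with_phrase)


focused = site_tf_idf(TF, SITE_PAGES, 1)       # phrase on a single page
spread = site_tf_idf(TF, SITE_PAGES, 15)       # phrase on 15 pages
everywhere = site_tf_idf(TF, SITE_PAGES, 150)  # phrase on every page

print(f"{focused:.4f}")     # 0.0218
print(f"{spread:.4f}")      # 0.0100
print(f"{everywhere:.4f}")  # 0.0000
```

Under that assumption, the page on the focused site scores highest, and the site that mentions the phrase everywhere scores zero on every page.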

Of course, there are many, many other ranking factors. Thousands. If your site is 150 pages of fantastic content, and it:

Has a unique, fully descriptive title tag for every page

Has a unique structure for every page

Doesn’t spin or duplicate content

Uses fully-descriptive ALT attributes, etc. etc.

… then TF-IDF probably doesn’t hurt you at all. A visiting search engine can use other signals to determine page relevance.

But content farmers, beware. If you crank out 999 pages of total crap, using your key phrase 5-10 times per page, all you’ve done is make it harder for a search engine to figure out which page is most important for that phrase.

If I were a search engine (and I’m not), I’d take that as a signal of a poorly-organized site.

Wouldn’t you?

The Lesson

In the past, site owners created page after page expounding on a specific key phrase, repeating it time after time in articles that were barely different, poorly written and poorly structured. That’s still a standard “SEO copywriting” tactic. I use quotes because it’s not SEO copywriting at all.

TF-IDF explains why that tactic has lost its power. It also shows why the cliche “If you want to rank, write good stuff,” really is the right strategy. TF-IDF means more isn’t necessarily better. So, write good stuff!

Huh? I thought that the IDF was calculated from *all* known documents – not just the ones on your site – and was thus used to filter out common words whose TF should not be used as a ranking factor. e.g. The word “good” might have 5% TF in an (uncreatively written) blog post about my good times on vacation but that doesn’t mean it’s a relevant result for “good” queries. Search engines can use IDF to determine that – across the interwebs – “good” is a common phrase and its appearance on my page isn’t especially noteworthy / shouldn’t be heavily weighted as a ranking factor.

Seems to me that if single-site IDF was used as a ranking factor many sites would have a harder time ranking for their brand terms or core products than a competing site that only mentions your brand / products on one page.
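
The common-word filtering described in this comment can be sketched with a toy corpus. Everything here is hypothetical: the four "documents" stand in for the whole web, and the idf helper is illustrative, not any search engine's actual method.

```python
import math

# Hypothetical four-document "web"; a real corpus is vastly larger.
corpus = [
    "good times on my good vacation",
    "a good recipe for soup",
    "the flibbergibbet is a good gadget",
    "good news for dog owners",
]


def idf(term: str, documents: list[str]) -> float:
    """log10(total documents / documents containing the term as a whole word)."""
    containing = sum(1 for doc in documents if term in doc.split())
    if containing == 0:
        raise ValueError(f"{term!r} does not appear in any document")
    return math.log10(len(documents) / containing)


print(idf("good", corpus))                     # 0.0
print(round(idf("flibbergibbet", corpus), 3))  # 0.602
```

Because "good" appears in every document, its IDF is zero and its term frequency on any one page counts for nothing, while the rare word keeps its weight.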