Wordtracker's API is much more reliable than scraping data from Overture

The Overture tool has been down more often than it has been up recently.

I primarily focus on the US market and did not realize how popular the international aspects of the old keyword research tool were until I started getting a rash of email complaints after taking the tool down.

So I put the old keyword tool back up, renamed it the international keyword suggestion tool, and defaulted it to using UK values (while still allowing users to grab data from other regional markets). I also zipped it up here, so anyone can install it on their site, and set it to a different default language if they like.

I recently talked to the fine folks at Wordtracker about how unreliable the Yahoo! keyword suggestion tool was, and Wordtracker offered to work with me to power the SEO Book keyword tool using Wordtracker's robust and reliable API.

We now have a CSV export option at the top of the results. And it is pretty sweet! It lists keyword, Wordtracker count, daily estimates for the big 3 engines, and broad and phrase match versions of each keyword :)
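If you want to build similar match-type columns for your own keyword list, here is a minimal sketch. It assumes the common PPC convention where a broad match keyword is written bare and a phrase match keyword is wrapped in double quotes. The column names, function names, and sample counts are made up for illustration and are not Wordtracker's actual export format.

```python
import csv
import io

def match_versions(keyword):
    """Return (broad, phrase) versions of a keyword.

    Assumption: broad match is the bare keyword and phrase match is the
    keyword wrapped in double quotes. Extra whitespace is collapsed.
    """
    cleaned = " ".join(keyword.split())
    return cleaned, f'"{cleaned}"'

def export_csv(rows):
    """Turn (keyword, count) pairs into CSV text with match-type columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Hypothetical column names, chosen for this example only.
    writer.writerow(["keyword", "count", "broad", "phrase"])
    for keyword, count in rows:
        broad, phrase = match_versions(keyword)
        writer.writerow([broad, count, broad, phrase])
    return buffer.getvalue()

# Sample data is invented for the example.
sample = [("seo book", 120), ("keyword   tool", 45)]
print(export_csv(sample))
```

The csv module escapes the quotes in the phrase match column on its own, so the file opens cleanly in a spreadsheet.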

Because Wordtracker's business model relies on selling keyword data, they have a vested interest in keeping it as clean and reliable as possible, and are unlikely to pull a Yahoo.

Wordtracker does not tokenize plural words into their singular versions, so you get to see volumes for both singular and plural to know which is more popular. In fact, if you search for the plural they will still return the singular.

Wordtracker does not arbitrarily alter the word order like Overture did.

Wordtracker's API is much more reliable than grabbing the data from Overture was.

Wordtracker's API allows you to filter out adult keywords.

Yahoo! Search Marketing offers a developer API, but given how rough their transition away from their old keyword tool was, I would much rather use a reliable market leading tool like Wordtracker. Please give it a spin and let me know what you think.

I can plumb around Google blocking it, but there are a limited number of types of webmaster tools that interface with search engines that can be provided to the general public without either being cloned by the search engine or having the search engine serve you some type of retribution for creating them.

Editorial judgements are rarely equitable, and nobody wants to earn sitelinks only to have them appear at the top of the 5th page of the search results for their own brand.

New Media is a Key to Growth (ish)

I have never created a Facebook application and have no intent of doing so, because if I am successful they would likely steal my idea and find a way to ban or silence me and/or halt and clone my project. Which is sorta what Kevin Rose did to a Digg member who created an unofficial Digg group on Facebook.

The Transition From Open to Closed

Sure, the Google Maps API is open today, and so are many other data sources, but after they buy enough marketshare look for that to change. The big networks are only open in markets they are losing. What did Google do to their SOAP search API after they had enough market leverage? They killed it.

Relying on APIs or scraping data from someone else's platform only has value if you can aggregate it from many sources, do it in a way that is hard to block, add substantial value, have alternative data sources, and create something that you know your data sources will not clone for a strategic reason.

Wanted: Writer, Editor, & Marketer...Pay: $0

All these networks pretend that they care about you, but they are vultures. Their data is their data. Their ideas are their ideas....and so are your ideas, unfortunately. If you find yourself becoming someone else's user generated content, or your business can be described as a feature on someone else's product, you are wasting your time.

Joost created an SEO link analysis extension for Firefox that shows link anchor text and PageRank on Yahoo! Site Explorer, Google Webmaster Central, and Microsoft's webmaster portal. I also updated SEO for Firefox to fix a Yahoo! Search error, but to get it to update you have to uninstall and reinstall it because I did not update the versioning data and my programmer is a bit backed up at the moment.

The Website Health Check tool aims to provide a simple and intuitive interface for seeing whether your site has any major SEO issues. The tool queries Google to grab the pages you have indexed there, and looks for issues amongst the first 1,000 results.

If your site is exceptionally large, you can use the date based filters to view a sample of recently indexed pages in Google to see if there are any duplication issues amongst those pages.

Questions Answered by the Website Health Check Tool

Is Google indexing your site? Are they quickly indexing your new pages?

Do you have duplicate content pages getting indexed in Google?

Do you have canonical URL issues?

Are any of your pages in Google missing page titles?

Does your server send correct error messages?
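Some of the checks above can be approximated offline once you have a list of indexed URLs and their titles. What follows is a rough sketch, not the tool's actual code. The `pages` input format and the function names are assumptions, and the canonical check only covers www versus non-www hosts and trailing slash variants.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def normalize(url):
    """Collapse www/non-www and trailing-slash variants into one key."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query

def health_check(pages):
    """pages: list of (url, title) tuples taken from indexed results."""
    missing_titles = []
    by_title = defaultdict(list)
    by_key = defaultdict(list)
    for url, title in pages:
        clean_title = (title or "").strip()
        if clean_title:
            by_title[clean_title.lower()].append(url)
        else:
            missing_titles.append(url)
        by_key[normalize(url)].append(url)
    return {
        # Pages indexed without a page title.
        "missing_titles": missing_titles,
        # The same title on several URLs hints at duplicate content.
        "duplicate_titles": {t: u for t, u in by_title.items() if len(u) > 1},
        # Several indexed URLs that collapse to one key hint at canonical issues.
        "canonical_issues": {k: u for k, u in by_key.items() if len(u) > 1},
    }

# Invented sample data for illustration.
pages = [
    ("http://example.com/", "Home"),
    ("http://www.example.com/", "Home"),
    ("http://example.com/about", ""),
    ("http://example.com/contact", "Contact"),
]
report = health_check(pages)
print(report)
```

A shared title is only a hint, so treat the duplicate list as pages worth a manual look rather than proof of a problem.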

Feedback Needed

This tool is in beta. Please leave feedback below.

I sent the programmer this URL and he would love to get your feedback on what you think of it. We are looking to have version two out before the end of the month.

Features We Are Looking to Add

Allow you to search for not just a site, but a site and a keyword, like [seobook.com seo]

Video About How to Use the Website Health Check Tool

Michael Jenson from Solo SEO recently emailed me about a cool new free SEO tool he created called Index Rank. After seeing my post about Google date based filters, Michael created the Index Rank tool, which allows you to see the growth of a site's profile in Google based on the number of pages indexed over different periods of time. The tool also allows you to compare multiple sites against each other.

Why is this data useful?

Since Google removed the supplemental results label, the next best thing we have to test site trust for lower end longtail pages is how quickly new pages are getting indexed.

If you see a rapid increase in indexing you know that is caused by an increase in domain trust due to better inlinks, an increase in content creation that leveraged unused authority the site was sitting on, solving a crawling issue, improving internal site architecture, or some technical issue that might be associated with creating duplicate content pages.

If everything you create is getting indexed you may consider creating content at a faster rate, perhaps using sub-brands off subdomains.

If you keep pumping out content but are not seeing your indexing stats go up, that is a cue to build links.
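The reasoning above can be boiled down to a rough rule of thumb. This sketch assumes you have collected two numbers by hand for a given period: how many pages you published and how many new pages Google indexed. The function name and the thresholds are arbitrary illustrations, not values taken from Index Rank or Google.

```python
def indexing_signal(published, newly_indexed, high=0.8, low=0.3):
    """Compare newly indexed pages to published pages for one period.

    Returns (ratio, suggestion). The thresholds are illustrative guesses.
    """
    if published <= 0:
        raise ValueError("published must be a positive number of pages")
    ratio = newly_indexed / published
    if ratio > 1:
        # More pages indexed than published can point at duplicate content.
        return ratio, "check for duplicate content"
    if ratio >= high:
        # Nearly everything is getting indexed, so there is unused authority.
        return ratio, "create content faster"
    if ratio <= low:
        # Content is going out but indexing is not keeping up.
        return ratio, "build links"
    return ratio, "keep watching"

# Invented numbers for illustration.
for published, indexed in [(10, 9), (10, 2), (10, 5), (10, 15)]:
    print(published, indexed, indexing_signal(published, indexed))
```

A single period can be noisy, so it is worth comparing a few periods before acting on the suggestion.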

The people from SEO Digger recently put together some research on search spam. Some of the terminology they use (like using the word illicit) is inaccurate, but the trends they discovered align well with what one would expect.

Spam Dominates Longtail Adult & Pill Search Queries

In high money niches, spam sites tended to dominate longer search queries while having less exposure in search results for shorter queries. View the below graph with adult, pills, dating, cars, gifts, and casinos. It shows the normalized density of spam sites ranking in Google by 1, 2, and 3 word queries.
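If you want to run a similar tally on your own data, here is a minimal sketch of one way to compute spam density by query length. How SEO Digger normalized their numbers is not described here, so the approach below (the share of spam results among all results for queries of a given word count) is an assumption, as are the input format and the sample rows.

```python
from collections import defaultdict

def spam_density(results):
    """results: iterable of (query, is_spam) pairs, one per ranking result.

    Returns {word_count: share of results flagged as spam}.
    """
    totals = defaultdict(int)
    spam = defaultdict(int)
    for query, is_spam in results:
        words = len(query.split())
        totals[words] += 1
        if is_spam:
            spam[words] += 1
    return {words: spam[words] / totals[words] for words in sorted(totals)}

# Invented sample rows for illustration only.
sample = [
    ("pills", False),
    ("pills", False),
    ("cheap pills", True),
    ("cheap pills", False),
    ("buy cheap pills", True),
    ("buy cheap pills", True),
]
print(spam_density(sample))
```

Deciding which results count as spam is the hard part, and this sketch leaves that judgement to whoever labels the data.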

Why is Casino an Anomaly?

I believe the reasons casinos appear so tight-knit are:

US advertising laws and gaming laws prohibit some of the common spam related revenue streams

leading online gaming sites have heavily embraced both offline advertising and SEO

people who gamble tend to be quite passionate about gambling

That passion means gamers are more likely to participate in community sites in that niche, which further consolidates traffic streams due to network effects and creates a lot of free on-topic content for some of the major community-driven sites.

Effective Search Spamming Business Models

Given this research, if you were to create a business model revolving around spamming, it makes sense to focus on the long tail of search. Get enough PageRank to get your pages indexed, but do not worry about accumulating enough PageRank to try to rank for core keywords in the spammy niches. Plus, staying away from the core keywords makes your sites less likely to get booted after a manual review and/or a competitor snitching on you.

Spam & Ranking Low Trust New Sites

The same trend seen between real sites and spam sites also shows up between older websites and newer websites.

Older websites that are heavily linked to and heavily trusted dominate the core category related keywords.

Longer search queries have fewer matches in the search database, and are thus more reliant on the on-page aspects of SEO.

Older sites cannot possibly cover all the related longtail search phrases adequately, so newer sites with less authority rank for many of the more accessible longtail keywords.

If you create a new site you can set your sights on ranking for core category keywords, but realize that longtail traffic will come first. If Google lets entire categories get dominated by spam pages then there has to be an associated opportunity to rank real pages.

I just updated SEO for Firefox to include Compete.com website rank and Compete.com monthly uniques. If you leave Compete.com in on demand mode it tends to work quite well. I am also going to ping the guys at Compete.com to ensure the automatic mode gets to be pretty reliable too. Compete.com data is far better than Alexa because it has less of a webmaster bias.