In the past few years, the three big engines have spent big dollars on acquiring data. Now free and/or affordable tools are making search data available to any SEO with a few bucks to spare. In this post, I’d like to share with you what I see as an emerging trend and one around which new businesses can be built.

You can buy scraped data. You can buy panel data. If you’re into site buying seriously, you might have considered buying analytics data independently of a site purchase, directly from the site owner himself or from others in the niche.

So the question is: what value can be extracted in the long term from this trend towards data commodification?

Short term, those people selling the data directly or using it to tweak their PPC algorithms and sell clicks more expensively have got value extraction nicely figured out. So trying to jump in now with a copycat business would likely make you an imitator or an idiot.

In any case, if you provide value immediately and only extract value in the long term, your profits will be much greater (idea paraphrased from Aaron Wall). And the trend is only picking up pace as Google Suggest starts helping you get copyrighted data for free.

In the medium term, I believe that people will be able to add value by auditing the quality of the data.

If they can automate ways (read: create software) to audit data quality, all the better, because then they can scale their business easily.

Of course, automated auditing can only go so far. My friend Scott Hendison created a new tool that helps you analyze the on-page SEO factors on your site. It’s useful for basic stuff, but if you want to get into deeper analysis, a pair of human eyes is necessary. The same will likely be said of new tools made to audit data quality.

One particular way that data auditing might come into its own would be auditing analytics for the sake of advertising and site buying/selling.

For advertising, I can think of three particular applications that would be useful, and would love to hear any ideas you have!

One obvious application would be to have rate card data verified. Publishers with verified data could gain a competitive advantage over publishers whose data is unverified.

Advertisers could also have particulars checked. For instance, how much search traffic a site attracts through keywords relevant to the advertiser (this traffic usually converts well for advertisers), or what the average historical CTR is for ads on a particular site (very important for CPM deals) etc.
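To make those two checks concrete, here is a minimal sketch of how an auditor might compute them from exported analytics. The `KeywordVisits` record, the function names, and the sample numbers are all hypothetical, and matching keywords by substring is a simplifying assumption; a real audit would work from whatever export format the publisher's analytics package provides.

```python
from dataclasses import dataclass


@dataclass
class KeywordVisits:
    """One row of a (hypothetical) search-keyword export."""
    keyword: str
    visits: int


def relevant_search_share(rows, relevant_terms):
    """Fraction of search visits whose keyword contains any relevant term."""
    total = sum(r.visits for r in rows)
    if total == 0:
        return 0.0
    terms = [t.lower() for t in relevant_terms]
    relevant = sum(
        r.visits for r in rows
        if any(t in r.keyword.lower() for t in terms)
    )
    return relevant / total


def historical_ctr(clicks, impressions):
    """Average click-through rate: clicks divided by ad impressions."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Made-up sample data for a mortgage advertiser vetting a publisher.
rows = [
    KeywordVisits("cheap mortgage rates", 300),
    KeywordVisits("celebrity photos", 600),
    KeywordVisits("mortgage calculator", 100),
]
share = relevant_search_share(rows, ["mortgage"])
ctr = historical_ctr(clicks=45, impressions=30000)
print(f"relevant search share: {share:.0%}, historical CTR: {ctr:.2%}")
```

In this made-up case, 400 of the 1,000 search visits are relevant to the advertiser, which is the kind of figure a verified rate card could carry.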

A third item would be to create a tool that figures out what percentage of visitors is shared between sites. The purpose of this is to increase the ads’ effectiveness by showing them to the same audience repeatedly, thus making the message more likely to be remembered/acted on. It’s possible to execute this cheap “retargeting” idea based off link data (i.e. retargeting the same audience as you did initially), but if you have the actual traffic goings-and-comings, then you’re golden!
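If you do have the actual traffic data, the shared-visitor figure is simple set arithmetic. The sketch below assumes each site can supply a list of anonymized visitor IDs that are comparable across sites (a big assumption, since it needs something like a shared panel or a common tracking identifier); the function name and sample IDs are hypothetical.

```python
def shared_visitor_pct(site_a_ids, site_b_ids):
    """Percentage of site A's unique visitors who also visited site B."""
    a = set(site_a_ids)
    b = set(site_b_ids)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


# Made-up visitor IDs; duplicates represent repeat visits.
site_a = ["v1", "v2", "v3", "v4", "v4"]
site_b = ["v3", "v4", "v5"]

a_on_b = shared_visitor_pct(site_a, site_b)  # share of A's audience reachable on B
b_on_a = shared_visitor_pct(site_b, site_a)  # share of B's audience reachable on A
print(f"{a_on_b:.1f}% of A's visitors also visit B")
print(f"{b_on_a:.1f}% of B's visitors also visit A")
```

The percentage is deliberately directional: a small site can share most of its audience with a large one without the reverse being true, and that matters when you decide where the repeat ad should run.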

For site buying, I think businesses could be set up around providing analytics verification and due diligence services. There could even be a [paid-subscription] newsletter featuring sites that were audited and checked up on. Another route would be to set up a specialized auditing agency for servicing the site buying / site selling market. This should help smooth transactions and standardize processes for site purchases above a certain size.

The key to success with this data auditing will be extracting actionable insight from the data.

Data mining will grow as the volume of data overwhelms traditional analysts.

The analysts will resort to tools to help them make sense of this and figure out what to do based on the data at hand. Simultaneously, as the cost of data approaches nil, data mining will become a valuable service that more and more people will consider.

I know from chatting with a certain gentleman on the way back from SMX Advanced that Visible Technologies (not to be confused with Visual Sciences, a powerful but totally un-user-friendly web analytics program), home to none other than the brilliant Todd Friesen, is using QL2 data mining. Leading search marketing companies are already making partnerships with the data analysts…

Other assorted implications of data becoming a commodity:

Collusion in ad pricing – Advertisers can collude to fix ad pricing and/or make the PPC market more open. Retailers who don’t participate and/or share their data will see themselves squeezed by their wholesalers, themselves under pressure from the other retailers who are participating in the data sharing and who are supplied by the same wholesalers.

MFUT sites – After Made-For-Adsense (MFA) sites, here comes the Made-For-Useless-Traffic site. Think of a real estate blogger selling you his search traffic data. He might then optimize for image search traffic to boost his earnings.

That can be worked around by excluding the purchase of non-industry-relevant traffic within the data purchasing contract. However, these kinds of exclusions also risk closing advertisers’ eyes to new markets they could get links from and build their distribution through.

Pollution of the datastream – As I wrote way back in my first Scratchpad column, people are going to be messing with their competitors’ analytics more and more. There’s always been referrer spam, sure, but SEOmoz recently covered bad bots spamming web analytics just to mess with people.

As this datastream pollution becomes more prevalent, services will arise to combat these tactics and make them more prohibitive. Think of the data commoditization equivalent of the virus/antivirus wars.
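As a rough illustration of what the "antivirus" side might look like at its most basic, here is a sketch that drops analytics hits with a blocklisted referrer or a bot-like user agent. The hit format, the blocklisted domain, and the user-agent markers are hypothetical placeholders; a real service would need maintained lists and smarter detection, since determined polluters will fake both fields.

```python
from urllib.parse import urlparse

# Hypothetical blocklist and markers, for illustration only.
SPAM_REFERRER_DOMAINS = {"spam-example.test"}
BOT_UA_MARKERS = ("bot", "crawler", "spider")


def is_polluted(hit):
    """True if a hit has a blocklisted referrer or a bot-like user agent."""
    host = urlparse(hit.get("referrer", "")).hostname or ""
    user_agent = hit.get("user_agent", "").lower()
    spam_referrer = any(
        host == domain or host.endswith("." + domain)
        for domain in SPAM_REFERRER_DOMAINS
    )
    bot_agent = any(marker in user_agent for marker in BOT_UA_MARKERS)
    return spam_referrer or bot_agent


def clean_hits(hits):
    """Return only the hits that pass both checks."""
    return [hit for hit in hits if not is_polluted(hit)]


# Made-up hits: one genuine, one referrer spam, one bot.
hits = [
    {"referrer": "http://www.google.com/search?q=seo", "user_agent": "Mozilla/5.0"},
    {"referrer": "http://www.spam-example.test/buy", "user_agent": "Mozilla/5.0"},
    {"referrer": "", "user_agent": "EvilBot/1.0"},
]
cleaned = clean_hits(hits)
print(f"kept {len(cleaned)} of {len(hits)} hits")
```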

The long arm of the law gets longer – Some groups that will very likely be affected include law enforcement and the military, who will likely partner with ISPs and proxy service providers.

In tandem with this, and to fight spammers who promote nasty crap like kiddie porn, plugins will arise to auto-forward such comment spam in these markets to law enforcement agencies. The agencies will then need to push the data through data mining programs and programs focused on pattern recognition.

The FBI’s first plugin for WordPress will be both a PR coup and a huge tool in its arsenal against pedophiles and child abuse.

ICANN will get its act together, step in, and crack down on the registrars and resellers selling domains to these people, as the commoditization of data makes it more cost-effective to do so. I think ICANN is too greedy an organization to do this unless it becomes more strictly regulated, though I’d love to be proven wrong.

Vertical and Horizontal Data Industry Consolidation – Of course, if we have too much of a good thing… we puke. As the supply of data increases, some data providers will go under and the market will consolidate and stabilize to an extent. That should make the purchasing of data more convenient, but also more expensive.

The more successful data providers, imho, will package it together with value-added services like analysis, audit, frequency/retargeting estimations of shared traffic etc. And SEOs will need to be savvier so that we can take the most effective actions at the earliest possible time.

Here are some action items for you to work on after reading this post:

1) Go buy a day’s membership at Spyfu, or get a few dozen credits at Compete and run a bunch of different reports on your site(s) and your competitors. Write some analysis and write 5 actionable ideas down based on the analysis. You score: a bonus from the boss.

2) Code the reporting program for law enforcement agencies and have it submit the pedophile spam alerts to the relevant enforcement agencies. You score: 1000 links from government websites!

3) Figure out a better way to act than my barren mind can think of and share it in the comments: you score dofollow links plus major props and a mention + links in the next Best of the Z List newsletter!

4) Create a business based off one of the ideas above (and make a donation to my “Foundation for the betterment of Gab’s bank account”)! If you already have, I’d love to hear how you came to the idea and what your focus is. I can keep it to myself if need be, but anything you can share publicly would be awesome (plus I can give you links, like I said).