It was March 10, 2000. The NASDAQ had peaked at an all-time high, the phrase “dot-com” had entered the popular vernacular, and AOL’s “You’ve Got Mail” greeting was part of our daily routine. Over the following two and a half years, the index would fall by nearly 78%. The bubble had burst, trillions of dollars had evaporated, and the fallout was spectacular.

In the early days of dot-com excitement, banner advertisements and CPMs ruled the internet. If publishers had earned a penny every time the word “pageviews” was spoken, the internet might have actually made some real money. But much of the promise of web advertising rested on the assumption that it was no different from the TV commercials we had all grown sick of, or the billboards we had learned to ignore on the way to work. Simply put, flashing products in front of our eyes when we had no interest in them didn’t work.

Caught up in their new dot-com view of the world, investors and entrepreneurs had failed to see that yesterday’s billboards would fail just the same when transplanted onto the parkways of the Information Superhighway. Then, just as the dust settled from one of the largest economic crashes in modern history, Google, the internet’s new poster child, bought a company few of us had ever heard of: Applied Semantics. That purchase would ultimately turn the internet around, giving any publisher with a webpage the promise of earning revenue and turning its new owner into one of the largest revenue generators in history. The technology was relatively simple: scan a page’s content, let advertisers bid in a real-time auction, and place the highest bidder’s advertisement alongside the relevant content. The key to this new magic was how well it matched an advertisement to the content a user was interested in at the very moment they were reading it.
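To make the idea concrete, here is a minimal sketch of such a keyword-matched auction. The ads, keywords, bids, and matching rule below are all invented for illustration; production systems are vastly more sophisticated.

```python
def match_score(page_words, ad_keywords):
    """Fraction of an ad's keywords that appear in the page text."""
    page = set(page_words)
    return sum(1 for k in ad_keywords if k in page) / len(ad_keywords)

def run_auction(page_text, ads, min_score=0.5):
    """Among ads relevant to the page, pick the highest bidder."""
    words = page_text.lower().split()
    eligible = [ad for ad in ads if match_score(words, ad["keywords"]) >= min_score]
    return max(eligible, key=lambda ad: ad["bid"], default=None)

# Hypothetical advertisers with keyword lists and per-click bids
ads = [
    {"name": "diapers",  "keywords": ["baby", "infant"],  "bid": 2.50},
    {"name": "camping",  "keywords": ["tent", "hiking"],  "bid": 1.75},
    {"name": "outdoors", "keywords": ["hiking", "trail"], "bid": 1.20},
]

winner = run_auction("best hiking trail and tent reviews", ads)
```

Note that the diapers advertiser, despite bidding the most, never wins here: relevance gates the auction before price decides it, which is precisely what the banner-ad era lacked.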

When you’re on the edge of your seat watching zombies take over the world in a new episode of “The Walking Dead,” the last thing you want to see is an advertisement for diapers. Yet in the year 2000, exactly those kinds of unrelated advertisements littered the tops of websites, flashed in front of us at double the size, and sometimes took over our screens entirely. Google’s approach was much simpler: the reader saw a small clip of text, no larger than a matchbox, linking to other websites about whatever they were reading at that moment. For the first time, advertisers saw real promise in the internet, with interaction and sales conversions profoundly higher than traditional methods. The dawn of contextual and performance-based advertising had begun.

Google had an incredible run with its new golden goose. Text ruled the desktop web for a long time, but as users became increasingly mobile, everything began to change. Google responded in 2009 with its $750 million acquisition of the mobile advertising platform AdMob, and finally, in the fall of 2011, something very interesting happened: desktop search declined for the first time in history. The world had gone mobile.

Thankfully, YouTube and other media kept engaging the users who had abandoned desktop search, though the medium proved more difficult to monetize than the old web’s text-based pages. Much like the TV commercials we grew to hate in previous decades, YouTube has tried sprinkling video advertisements into the “real” content, but the effort has so far been fraught with problems. Beyond frustration with conversion rates, advertisers are now pressing Google and other platforms, like Facebook, over inaccurate performance reporting and over offensive, unrelated content appearing alongside their precious brands.

On a web page, text can be easily analyzed and matched against advertisements for related content, steering clear of offensive material in the process. For video and other imagery, however, the technology to understand the content has not yet been widely deployed, even at Google. That is likely troubling for the search giant, given how heavily it has invested in Deep Learning Neural Networks designed to process and understand visual media.

Google Brain, one of Google’s experimental projects, uses mathematical formulas to simulate the electrical activity of the human brain. The trick with a simulated brain, much as with a human one, is that it must be trained properly to achieve the desired effect, and that training requires a substantial amount of data. A toddler runs around interacting with his environment, touching toys and putting them in his mouth, gathering swaths of data through his senses, most importantly through three-dimensional depth perception. When training a computer to “see,” however, researchers currently rely on a very limited universe of data: flat, two-dimensional snapshots of objects, each akin to a momentary blink of just one of that toddler’s eyes, are fed to the computer repeatedly until it gradually turns its mistakes into successes. The trained “brain” is then frozen in time, its mathematical weights saved as a file, and replayed whenever unknown objects need to be recognized.
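The train-then-freeze loop described above can be sketched with a single artificial neuron, the building block of such networks. The toy data, learning rate, and epoch count here are invented for illustration; a real vision network has millions of such weights, but the shape of the process is the same.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """Repeatedly show labeled examples, nudging weights so mistakes shrink."""
    random.seed(0)
    w = [random.uniform(-0.1, 0.1) for _ in range(3)]  # two inputs + a bias
    for _ in range(epochs):
        for x1, x2, label in samples:
            pred = sigmoid(w[0] * x1 + w[1] * x2 + w[2])
            err = label - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            w[2] += lr * err
    return w  # the "frozen brain": just a list of numbers saved to a file

def predict(w, x1, x2):
    """Run the frozen weights forward on an unseen example."""
    return sigmoid(w[0] * x1 + w[1] * x2 + w[2]) > 0.5

# Toy training set: output 1 only when both inputs are "on" (logical AND)
samples = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
weights = train(samples)
```

Everything the trained model “knows” lives in those three saved numbers; recognition is simply replaying them against new input, exactly as the frozen-in-time description suggests.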

This entire process is the basis of Deep Learning Neural Networks, which have shown promise in advertising by classifying video into silos like “People,” “Beach,” “Party,” or “Sports.” Advertisers, however, need dramatically more detail to decide where and when to place their advertisements. Disappointingly, these classification labels are far less granular and detailed than the data Nielsen’s set-top boxes collected for traditional TV advertisers of the past. Will the internet ever go “Back to the Future”?
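The granularity problem can be seen in a toy illustration, using entirely made-up detections mapped onto the silo labels above: a model may detect rich, specific, shoppable objects, but once they collapse into coarse silos, the detail an advertiser needs is gone.

```python
# Hypothetical fine-grained detections from a video-recognition model
detections = ["denim shirt", "sunglasses", "surfboard", "beach umbrella"]

# Collapsing them into coarse silos discards the shoppable detail
SILOS = {
    "denim shirt": "People",
    "sunglasses": "People",
    "surfboard": "Beach",
    "beach umbrella": "Beach",
}

silos = sorted({SILOS[item] for item in detections})
# Four distinct, advertisable objects reduce to two broad labels
```

A sunglasses brand bidding on this inventory sees only “People” and “Beach,” with no hint that its exact product is on screen.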

Using the Spott app to identify items on TV. Photo Credit: Appiness PR

Adtech platforms like Luminate, before its acquisition by Yahoo!, and more recently GumGum, have experimented with this method as well. Both companies rely on image recognition to analyze publisher content and superimpose related advertisements on the imagery. Imagine watching that “Walking Dead” episode and, enamored with Rick Grimes’ pocketed shirt, clicking to buy it. This product- and object-based advertising is only the beginning of the visual frontier.

To bring advertising to the promising future that Applied Semantics first gave Google, we need to focus on actually understanding visual content and bringing meaning to its metadata. Once we incorporate cognition into visual media, advertisers will finally see a profound impact on relevance, performance, revenue, and, most importantly, brand safety. The new adtech frontier, and the billions that follow, will belong to those who can process visual content with cognition and understanding.

Brad Folkens is Co-Founder and CTO of CloudSight, where he oversees all technical aspects of the image recognition platform and app development. He earned his Bachelor of Science in Theoretical Computer Science and Mathematics from Northern Illinois University. Prior to creating CamFind, he co-founded CityTech USA, notable for creating the web-based collaborative government website PublicSalary.com.
