I had the great fortune to speak twice at the Intelligent Content Conference in Las Vegas on March 30. I told a colleague there that the conference agenda hit the bull’s-eye of my skills and interests. Several of the talks had the word “AI” in their titles. Other common words included “outside-in,” “big data,” and “content marketing.” For those who know our book, Outside-In Marketing: Using Big Data to Guide Your Content Marketing, it should be obvious that the conference was highly relevant to my interests.

Indeed, everywhere I went, I was able to connect conference talks with my work. But amidst the sea of validation, there were three genuine “Aha!” moments. I present them here.

1. Data-driven content marketing is the future

In the keynote, Joe Pulizzi told the story of how he acquired ICC in 2014. Early that year, he was picking the winners for his annual CMA awards, and he noticed that they were all excellent in the same way: they were splashy, multi-channel, agency-led, branded content campaigns.

Though it didn’t win an award that year, the Dos Equis campaign about “the most interesting man in the world” is an example of the kind of thing Joe referenced. It won several awards prior to 2014 and positively influenced sales of the beer over the course of years. By many accounts, it was a very successful campaign. But it was not a digital-first campaign. The team did very little audience research beforehand. They measured results after the fact, but they didn’t bring data into the planning or execution of the campaign. They followed the original creative concept to a T.

The goal of content marketing is to create the minimum amount of content for the maximum business benefit.

Pulizzi figured at ICC 2014 that this was the gap CMI needed to fill. So he acquired ICC from Ann Rockley and Scott Abel and pivoted the mission of CMI toward data-driven content marketing. In his 2017 keynote, he said we are at the cusp of transforming the entire marketing discipline toward data-driven content marketing. And all of the sessions at the conference were about how to transform a marketing organization to be more data-driven.

2. AI is hard

My IBM colleague Pavan Arora gave a keynote on how Watson Knowledge Studio can be used to build more data-driven marketing. Rather than painting an overly optimistic picture, Pavan focused on the challenges in building artificial intelligence into data-driven marketing systems.

As someone who has been building cognitive systems into the IBM content stack, I wanted to hug him. So many people have asked me if I could “just point Watson at content.” When I start to explain why you can’t simply do that, they take out their phones to check their email. Pavan had a better way of explaining the challenges than I ever did in those conversations. Let me share two highlights.

Data cleansing: Perhaps the hardest part of AI is making sure the data in the system is relevant to the content being created or optimized. It’s easy enough to run a text extraction program against a large repository of generic content, such as DBpedia, the structured database derived from Wikipedia. But many of the entities the system extracts will be irrelevant to your audience because they don’t relate to how you differentiate your business from competitors.
These false positives fall into two categories: so-called laughers and everything else. Laughers are obviously irrelevant phrases that any human with some experience in your company can identify but the system cannot. An example of a laugher from a recent test had to do with the lyrics of a Bob Dylan song. Because IBM ran an ad campaign that associated Watson with Bob Dylan, the main phrases from all of his lyrics also got associated with Watson in the training data. It took a data scientist a few weeks to remove all of those false positives.

The second type of false positive is more difficult to eradicate because these phrases are not obviously irrelevant, even to humans. They might be irrelevant. Or they might be opportunities to extend your content footprint. For example, haptics continually comes up when we extract concepts related to cognitive computing. It might be relevant, especially in mobile device-related contexts. Sometimes you just have to test concepts to see if they resonate with the audience. In any case, AI does not produce certainty, only degrees of confidence. The first step is to set thresholds for those degrees of confidence so you can develop an acceptable level of relevance for your audience.
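The threshold-setting step can be sketched in a few lines. This is a minimal illustration, not Watson Knowledge Studio’s actual API: the entity names, confidence scores, and the `LAUGHERS` blocklist are all hypothetical stand-ins for the output of a text-extraction system plus human review.

```python
# Minimal sketch: keep only extracted entities above a confidence
# threshold, minus a human-curated blocklist of "laughers".
CONFIDENCE_THRESHOLD = 0.75  # tune per audience and domain

extracted = [
    {"entity": "cognitive computing", "confidence": 0.93},
    {"entity": "haptics",             "confidence": 0.61},  # test with audience?
    {"entity": "blowin' in the wind", "confidence": 0.88},  # a "laugher"
]

LAUGHERS = {"blowin' in the wind"}  # obviously irrelevant, flagged by a human

def relevant(entities, threshold=CONFIDENCE_THRESHOLD, blocklist=LAUGHERS):
    """Return entity names above the threshold that no human has flagged."""
    return [e["entity"] for e in entities
            if e["confidence"] >= threshold and e["entity"] not in blocklist]

print(relevant(extracted))  # -> ['cognitive computing']
```

Note that no threshold catches the laugher: it scores 0.88 because the training data really does associate it with Watson. Only the human-maintained blocklist removes it, which is the point about data cleansing above.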

Domain-specific training: Text extraction and other natural language processing (NLP) applications are the easiest parts of the AI process. The harder part is machine learning. NLP only gets you so far. You also have to work through the data with human subject matter experts to help adjust both the data and the algorithms. Once you reach a certain level of accuracy, you can let the system learn by itself. But domain-specific knowledge is necessary even to get into that game. For large and varied organizations such as IBM, you need different trainers for each domain within the company. Then you have to stitch all the domains together with clustering technology. It’s a long and laborious process. And it never really ends, because you have to continually improve the system after you develop a minimum viable product.

Several of the talks at ICC were about AI in content strategy and content marketing, including my own. But Pavan’s was the best one to paint an accurate picture of how hard it is to use AI for content marketing.

3. Less is more

The best talk at the show was by Marcus Tober of Searchmetrics. Of all the things he showed, the one that blew me away was a case study of a company that decided to delete 95 percent of the content on its website. Deciding what to delete, and how to redirect the apparent duplicates to the pages they decided to keep, was a difficult task. But the results were almost instantly positive. The search visibility of the company’s website went up practically overnight. At the time he pulled the data, it was rising by 40 percent per month.

Why? The likely answer is that Google was penalizing the company for duplicate content. For Google, duplicate content is not just pages with identical content. Duplicates can have different designs, serve different brands, and exist in different parts of a URL tree. Google’s own definition of duplicate content suggests that you only need a 60 percent content match between two pages in one domain for Google to consider them “appreciably similar.” Think about how many pages on your website meet that criterion. The company Tober profiled reduced its content to just the unique pages, and Google rewarded it almost instantly. He said he has dozens of clients with similar stories.
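One way to audit your own site for “appreciably similar” pages is to compare them with a simple text-similarity measure. Here is a minimal sketch using Jaccard similarity over word shingles; the 0.6 threshold echoes the 60 percent figure above, but Google’s actual algorithm is not public, so this is an illustration of the idea, not a reproduction of it. The two page texts are made up.

```python
# Flag page pairs whose word-shingle overlap suggests near-duplication.
def shingles(text, k=3):
    """Return the set of k-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity between the shingle sets of two pages."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

page_a = "our widget reduces costs and improves uptime for enterprise teams"
page_b = "our widget reduces costs and improves uptime for enterprise customers"

if similarity(page_a, page_b) >= 0.6:  # echoes the 60 percent threshold
    print("appreciably similar: consolidate and redirect")
```

Running a comparison like this across all page pairs is what makes the “what to delete” decision tractable: anything over the threshold becomes a candidate for consolidation and a 301 redirect to the page you keep.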

There are other good reasons to minimize your content footprint. For one thing, the crawlers and bots have an easier time finding and indexing the content. Also, it’s a lot easier for humans to find the content most relevant to a topic for a company when they don’t have to choose between multiple apparent duplicates. The striking thing about Tober’s talk was that he was able to show that minimizing your content footprint causes an improvement in business results. It’s not a trade-off between competing goods—more content on one side and better business results on the other. Minimizing content is itself a strategy to improve results.

James Mathewson is IBM's Distinguished Technical Marketer for search. He has 20 years of experience in web editorial, content strategy, and SEO for large and small companies. A frequent speaker, lecturer, and blogger, James has published more than 1,600 articles and two books on how web technology and user experience change the nature of effective content. James has two advanced degrees in related subjects from the University of Minnesota.