The days of succeeding with email campaigns by sending a generic message are over. No one disagrees with this point. Everyone sees response rates dropping. The answer? On this point, people in the know also agree: segment your audience and send a targeted message. The debate begins when discussing how to accomplish segmentation.

Much has been written recently on Account Based Marketing (ABM). Simple concept: determine as many attributes as you can about your target accounts, then use those attributes to pick the best companies to reach out to. Great! This IS segmentation. However, there is a brilliant opportunity here, within reach, that is being overlooked.

The opportunity is segmenting your prospects’ (and clients’) titles so you can target them more effectively. Again, no one disagrees that this should be done, but how is it currently being done?

Currently we see people building lists of hundreds (or more) of phrases to match and segment. Example: “Marketing” OR “Marktg” OR “Mrktg”. These lists get long and have some inherent flaws:

The lists are NOT comprehensive. There will always be exceptions. There are some amazing, talented consultants who have their “golden list” of match phrases like this. The problem is that you need those consultants to maintain and modify the lists whenever you need changes. That is neither sustainable nor efficient. This is a brute-force approach.

If you segment on a department like Marketing, you miss the title level, such as VP. Conversely, if you segment on level, like VP, you miss the department. If you try to segment on both, you now have multiple lists of hundreds of phrases, and again, they grow into the thousands if you want them to be comprehensive.

Analyzing connected relationships becomes unmanageable. For example, do you know that you have a VP of Marketing who influences two Directors of Marketing? You can’t establish this connection without multiple pre-calculated attributes.

Having a single attribute like Title Level OR Department makes the logic crazy complex. Trying to build a campaign that includes the “Top Marketing Contact”, “Top Sales Contact” and “Top Operations Contact” is only a dream when you only have a single segmentation bucket.

It’s not fun. Complexity should be hidden, and the average campaign manager should be able to set up a brilliant campaign. Democratize it.
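The last two flaws become short queries once each contact carries both a department and a level. The Python sketch below is a minimal illustration and is not tied to any particular product. It rests on three assumptions: the `Contact` record, the numeric level ranks, and the rule that a more senior contact in the same company and department "influences" the others.

```python
from dataclasses import dataclass

# Hypothetical numeric ranks for title levels: higher means more senior.
STAFF, MANAGER, DIRECTOR, VP = 1, 2, 3, 4


@dataclass
class Contact:
    name: str
    company: str
    department: str  # pre-calculated attribute #1, e.g. "Marketing"
    level: int       # pre-calculated attribute #2, one of the ranks above


def top_contact_per_department(contacts, departments):
    """Most senior contact for each (company, department) pair requested."""
    best = {}
    for c in contacts:
        if c.department not in departments:
            continue
        key = (c.company, c.department)
        if key not in best or c.level > best[key].level:
            best[key] = c
    return best


def influenced_by(contacts, leader, level):
    """Contacts at `level` in the leader's company and department.

    Assumption: seniority within the same department stands in for influence.
    """
    return [
        c for c in contacts
        if c.company == leader.company
        and c.department == leader.department
        and c.level == level
    ]


# Made-up sample contacts for illustration.
contacts = [
    Contact("Ann", "Acme", "Marketing", VP),
    Contact("Bob", "Acme", "Marketing", DIRECTOR),
    Contact("Cy", "Acme", "Marketing", DIRECTOR),
    Contact("Dee", "Acme", "Sales", DIRECTOR),
    Contact("Eve", "Acme", "Operations", MANAGER),
]

top = top_contact_per_department(contacts, {"Marketing", "Sales", "Operations"})
for (company, dept), c in sorted(top.items()):
    print(company, dept, "->", c.name)

vp = top[("Acme", "Marketing")]
print(vp.name, "influences", [c.name for c in influenced_by(contacts, vp, DIRECTOR)])
```

Neither function needs a phrase list. The phrase matching happens once, when the two attributes are calculated.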

After 14 years of data mining and working with massive amounts of contact data, I’ll fast-forward and give the answer: if you segment on both DEPARTMENT and TITLE LEVEL, you can build (I’ll say it again) brilliant, segmented campaigns that are easy to execute. Watch the video for a nice visual walk-through of the concepts.
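Segmenting on both dimensions might look like the minimal Python sketch below. It assumes each title is classified once into a department and a level using two small keyword tables. The table entries are hypothetical and illustrative, not anyone’s “golden list.” The design point is that the tables grow as departments plus levels, rather than departments times levels.

```python
import re

# Hypothetical keyword tables; real ones would be larger and tuned to your data.
DEPARTMENTS = {
    "Marketing": ["marketing", "marktg", "mrktg", "mktg"],
    "Sales": ["sales", "business development"],
    "Operations": ["operations", "ops"],
}

# Ordered from most to least senior, so the first matching level wins.
LEVELS = [
    ("C-Level", ["chief", "ceo", "cmo", "cfo", "coo", "cto"]),
    ("VP", ["vice president", "svp", "evp", "vp"]),
    ("Director", ["director"]),
    ("Manager", ["manager", "mgr"]),
]


def normalize(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and pad so phrases match whole words."""
    words = re.sub(r"[^a-z0-9]+", " ", title.lower()).split()
    return " " + " ".join(words) + " "


def contains_any(text: str, phrases) -> bool:
    return any(f" {p} " in text for p in phrases)


def classify(title: str):
    """Return (department, level) for a raw job title."""
    text = normalize(title)
    department = next(
        (d for d, phrases in DEPARTMENTS.items() if contains_any(text, phrases)),
        "Unknown",
    )
    level = next(
        (lvl for lvl, phrases in LEVELS if contains_any(text, phrases)),
        "Staff",
    )
    return department, level


for t in ["Sr. VP, Mrktg", "Senior Vice President of Sales", "Director of Operations"]:
    print(t, "->", classify(t))
```

With 3 departments and 4 levels, that is 7 small lists instead of 12 combined ones. The gap widens as either dimension grows.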

99% of the Whitney Houston tweets were exactly “RIP Whitney Houston.” Okay… but how did she move you? Do you remember a special dance with that one girl while she was singing? Was her voice so beautiful that it made you tear up? It was for me. Originality is there, but buried on Twitter. I would have enjoyed others’ insights on Whitney, to feel camaraderie in a shared loss. If it existed on Twitter, it was obscured behind all the drone “RIP Whitney Houston” tweets. So instead I played some Whitney songs and told my children who the woman with the beautiful voice was.

Twitter is big data.

“Big data” is making the news. The concept has crept from the back pages of technical publications into the mainstream. It’s a new topic, so the reporters have commandeered it. It’s becoming popular, and that’s too bad. Media feeding frenzies perpetuate the peripheral definition; articles get copied over and over again, and people stop thinking.

With its IPO in the news, Facebook has become the poster child for big data. So what is it? What is big data? Simply put, massive amounts of information about millions, and eventually billions, of people. Big data is making the news because of fear: fear of the possibilities of abuse. It sells newspapers and generates clicks and page views, which means we will be hearing a lot about big data. Scare people and make money.

Facebook is big data.

Google is changing its privacy policy. Another media feeding frenzy. If you use Gmail, Google+, music, shopping, etc., all the privacy policies are melding into one. I like the idea and, I have to admit, I don’t understand the problems people are having. If you use 5 or 10 different Google services, are you really going to read that many different user agreements? I don’t know anyone who actually does. I would prefer to have one policy that covers them all. Google gives these services away; if you don’t like that one, single policy, stop using the service. The chances of people being informed about Google’s policies will increase if there is a single policy. It’s a good thing. Stop the bitching.

Google is big data.

Another bit in the news: The Seattle Times reports that a top porn site, Brazzers, was hacked. According to the article and other news about it, usernames, passwords and real names were stolen. The data is making its way across the Internet on file-sharing sites.

Internet user databases are big data.

In my vision of the world, big data is in its infancy. Don’t freak out for at least 10 years.

Why now? Why is big data coming into the mainstream now? It has been around for many years. Large data providers like Experian, Acxiom, and D&B have been collecting data for a long time. What is different now? To answer “why now,” you must understand the continuum of getting at big data.

11 Big Data Prerequisites

The data must be there – this is the most exciting tipping point. As the CEO of a data-mining software company, I’m still dumbfounded when users expect to get information off the web…that is not there. It must actually exist.

You must be able to flag it – you can’t store everything and must make choices. What is important? When does it happen? Example: a news release with the subject “Nanotechnology.”

You must be able to find it – in the absence of a real-time data stream, you must be able to search through data to find a “flag” for what you are looking for.

You must be able to parse it – this is the analysis of relevant grammatical constituents: identifying the parts you need from within potential noise. Example: parsing out the name of an inventor from an article on nanotechnology.

You must be able to extract it – not the same as parsing. What if the data is in a PDF file or an HTML web page? In many cases, extraction is about access. Is the data I am looking for spread across 5 sub-links of a single web page? Extraction, as it relates to the Internet, also encompasses web crawling.

You must be able to process it – This takes CPU cycles. Bigger problems need bigger computers.

You must normalize it – if you have multiple pieces of data on “The Container Company”, “Container Company, The”, “The Container Co”, etc., how do you merge that data? You must normalize like entities to a standard “canonical form.” Without it, we’ve got the Data Tower of Babel.

You must be able to store it – Big data takes up disk space.

You must be able to index it – If you ever want to find it after you store it, the data needs to be indexed. This also means more disk space.

You must be able to analyze it – big data needs big (or many distributed) CPUs to crunch the numbers and bring order from the chaos.

There must be a payoff – putting together big data is expensive. Without an end goal in mind, there is no reason to bear that cost. Google & Facebook collect, process, index & store data for profit.
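Of these prerequisites, normalization is the easiest to show in code. The minimal Python sketch below assumes that a few hand-written rules are enough to reduce the three “Container Company” variants to one canonical key: move a trailing “, The” to the front, drop the leading article, and expand a small, hypothetical table of abbreviations. Production entity resolution needs far more rules than this, and the record labels are made up.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real one would be much longer.
ABBREVIATIONS = {"co": "company", "corp": "corporation", "inc": "incorporated"}


def canonical_company(name: str) -> str:
    """Reduce a company name to a canonical form used as a merge key."""
    s = name.strip().lower()
    if s.endswith(", the"):          # "container company, the" -> "the container company"
        s = "the " + s[: -len(", the")]
    words = re.sub(r"[^a-z0-9]+", " ", s).split()
    if words and words[0] == "the":  # drop the leading article
        words = words[1:]
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def merge_records(records):
    """Group (company name, record) pairs under their canonical company name."""
    merged = defaultdict(list)
    for name, record in records:
        merged[canonical_company(name)].append(record)
    return dict(merged)


records = [
    ("The Container Company", "record 1"),
    ("Container Company, The", "record 2"),
    ("The Container Co", "record 3"),
]
print(merge_records(records))
```

All three variants land under one key. Each additional variant you encounter becomes a rule, not another hand-maintained match phrase.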

So what is my vision of “big data”? What is being talked about in the media is very short-sighted. I think I know where big data is going, and I’m basing my vision on my prerequisites.

Big Data Thoughts

1: Information is growing beyond the ability of any single source to store and index everything. Therefore, big data can never be “all data.” Facebook and Google cannot store everything, so choices must be made. Google already does this, indexing what it deems relevant.

2: The amount of data about people on Facebook is paltry…compared to the maximum possibilities. Yes, in aggregate, it is the largest set of minimal data. Think for a second about your day. What would it take to record your entire life in HD, from 7 different angles? This future data stream would include everything you heard, read, and generally interacted with.

3: Mass personal data recording is on the horizon. The first phase is already starting. The only limit is reasonable storage. The term for it is “LifeLogging.” There are wearable devices that take a picture every 30 seconds. High-quality LifeLogging technology will be critical in the future. One picture every 30 seconds is 1/900th of video (30 frames per second × 30 seconds = 900 frames). If the LifeLogging device is just the conduit rather than the storage medium, the lifelog could be stored on your home PC. With H.264 video compression, 5.5 hours of 1080p video can be stored on a 32GB thumb drive. That means a single 1TB (terabyte) drive can hold 176 hours of high-definition video (7.3 days of video). It would be expensive today to buy the roughly fifty 1TB drives (about one a week) needed to store a year of your life. It seems crazy… right? Not when you are a historian. In 1992, the average hard drive was around 1GB, 1000 times less than today.
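The storage arithmetic in this paragraph can be checked in a few lines of Python. The sketch assumes 1TB = 1024GB (with 1000GB the figure drops to about 172 hours) and assumes recording 24 hours a day. It only reproduces the paragraph’s own numbers.

```python
import math

GB_PER_THUMB_DRIVE = 32       # from the text: a 32GB thumb drive...
HOURS_PER_THUMB_DRIVE = 5.5   # ...holds 5.5 hours of H.264 1080p video
GB_PER_TB = 1024              # assumption: binary terabyte

hours_per_tb = GB_PER_TB / GB_PER_THUMB_DRIVE * HOURS_PER_THUMB_DRIVE
days_per_tb = hours_per_tb / 24
drives_per_year = math.ceil(365 * 24 / hours_per_tb)  # assumes 24-hour recording

frames_per_photo_interval = 30 * 30  # 30 frames/second over a 30-second interval

print(f"{hours_per_tb:.0f} hours per TB ({days_per_tb:.1f} days)")
print(f"{drives_per_year} x 1TB drives per year")
print(f"one photo every 30 s = 1/{frames_per_photo_interval} of the video frames")
```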

Some ideas to reduce the storage size of LifeLogging:

-Go vector. If you have an avatar created of you, a vectorized version of you could be stored. This type of compression does not exist yet, but it will. LifeLogging in bitmap video is like a tape deck. Vectorizing video with the lifelogee as the center of the story would cut storage by a factor of 1000. It is like the hard drive compared to tape storage. In addition, data stored this way could be accessed very quickly. Bottom line: with the right *Software*, real LifeLogging could be done today. I should save this for another in-depth blog post. I’ve spent many nights thinking about how it all could be done. I’ve got to stop watching sci-fi like The Lawnmower Man before bed.

4: Assume that we are in the 2020s. Based on Moore’s Law, and several others like it, a LifeLogging device will be worn around your neck and record your life in HD. It will probably cost about as much as a premium iPad. At that price, LifeLogging becomes ubiquitous.

5: What did I eat today? What about over the past week, month, or year? Just because that information is recorded as video (me munching an apple) does not mean that it can be analyzed and recognized as Donato-eats-apple. Where did you buy that apple? Can the date you ate it be cross-referenced with the date you bought it at the grocery store?

New industries

Software that analyzes and makes inferences from LifeStreaming will be a multi-billion dollar industry (Donato ate apple, Donato started car, Donato got phone call, Donato was watching the movie Contact). I would expect each major type of world interaction to be handled by a different app or algorithm.

Software that compiles inferences, builds statistics and performs what-ifs on mass LifeStream data will be a multi-billion dollar industry. (23% of people who ate apples 4x per month, where the apples came from Chile and most likely were treated with chemical X, developed cancer by age 55.) These are the types of discoveries we will be able to make that are currently made only by happy accident. (I made up that example…but do eat organic apples.)

Example: compiling a list of the junk (postal) mail I throw out without opening. That is good data. Which one did I open?

Software that manages the rights, payments, connectivity and privacy between life streams will be a multi-billion dollar industry. So what if that apple from Chile was treated with some really nasty pesticide, like a carcinogen? Could the supplier of that apple to the store be tracked? Do you want to know this? What if your wife bought it… and it is not part of your personal data stream? Do you and your wife have a LifeStream sharing agreement?

One person eating one apple does not a trend make. Multiply that by 50 million people over 5 years. This is not science fiction. This is simply faster computers, more memory, and analysis software. It’s a lot of apples. Would I share my eating habits, anonymously, and cross-reference them with my health? Maybe.

I expect that companies will pop up, each with a different set of analysis technology for different niches. It will probably evolve into an App Store model. One company looks at how you interact with media: what you watch and listen to, which theaters you attend. Another knows what you eat. You can choose which feeds to share with the greater LifeStream and take part in a greater community.

By the way, none of this LifeStreaming will be on Facebook, or Google+. No one would trust them. In addition, it would be prohibitively expensive to centrally transmit, store and analyze it. Hmmm, maybe Facebook could be the trend builder? It is well positioned for it. Can you imagine it?

Donato ate an apple
Donato threw core in garbage
Donato did not recycle V8 can
Donato is driving 15 miles over the speed limit

This is the first time in a few years that I have thought of a way for Facebook to survive long-term. In this Facebook, you would never log in to look at what people are doing; you would log in to see the latest trend and how it affected you.

I just hope it does not make it to Twitter and get retweeted by the “RIP Whitney Houston” drones. Once analysis agents can understand (and broadcast) our individual actions, Twitter has no reason to exist.

I’ve tried this and it works every time. How long it will work, who knows.

If you Google a specific page and then search again, this time using the last few words of the excerpt shown in the results, you can use this technique to scroll through an entire web page.

It’s like peering through a small cache window, one section of the page at a time.

It works on resume sites, LinkedIn, and many others. Maybe call it a recursion search?
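The “scrolling” trick is essentially a loop: take the last few words of the current excerpt, search for them, and append whatever new words the next excerpt reveals. The Python sketch below simulates this offline. `search_snippet` is a hypothetical stand-in for a search engine that returns a fixed-size excerpt starting at the first match. Real engines do not guarantee that: their excerpts may start before the match, and a repeated phrase can send you back to an earlier spot.

```python
def search_snippet(page_words, query_words, window=8):
    """Hypothetical search engine: return `window` words of the page starting
    at the first occurrence of `query_words`, or [] if there is no match."""
    n = len(query_words)
    for i in range(len(page_words) - n + 1):
        if page_words[i:i + n] == query_words:
            return page_words[i:i + window]
    return []


def scroll_page(page_text, first_query, window=8, overlap=3):
    """Reconstruct a page by re-searching the tail of each excerpt."""
    page = page_text.split()
    collected = search_snippet(page, first_query.split(), window)
    for _ in range(len(page)):          # safety bound in case excerpts cycle
        if not collected:
            break
        query = collected[-overlap:]    # last few words of what we have so far
        snippet = search_snippet(page, query, window)
        new_words = snippet[len(query):]
        if not new_words:               # reached the end of the page
            break
        collected.extend(new_words)
    return collected


PAGE = ("one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty")
print(" ".join(scroll_page(PAGE, "one two")))
```

The `overlap` parameter plays the role of “the last few words.” Using more words makes a wrong match on a repeated phrase less likely but spends more of the query.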

What is semantic search? To put it simply: semantic search can take, as input, a word like “Java” and offer up other related terms like “J2EE” or “Beans” (both are related to Java). This allows the user to type in a few terms but match many, many terms.

The matching terms are built into an “expert system” that is continually refined over time. Many fancy names are given to these systems, based on how they are built, but basically they are sets of rules.
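At its simplest, such a rule set is a lookup table. The Python sketch below is a toy illustration, not how any commercial engine works. The `RULES` table is hypothetical, and its “EJB” entry is added only to show a chained rule. Expansion follows the rules breadth-first until nothing new turns up.

```python
from collections import deque

# Hypothetical rules: each term maps to directly related terms.
RULES = {
    "java": ["j2ee", "beans"],
    "j2ee": ["ejb"],
}


def expand(term, rules=RULES):
    """Return the term plus every term reachable through the rules, breadth-first."""
    seen = []
    queue = deque([term.lower()])
    while queue:
        t = queue.popleft()
        if t in seen:
            continue
        seen.append(t)
        queue.extend(rules.get(t, []))
    return seen


print(expand("Java"))
```

The user types one term and matches four. With a larger rule set the list grows quickly, which becomes a problem when the search engine limits query length.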

Semantic search is not AI (artificial intelligence). If you hear that, it probably started in a marketing department somewhere.

Companies that have built semantic search engines, while they have not created AI, have spent a tremendous amount of time and resources to build these sets of rules. The better engines can build rules on the fly from a new set of data, like resumes. This is very cool stuff.

Overall, I like semantic search. It has great potential; however, it has great weaknesses if used incorrectly. Built into the engine itself, semantic search can be very powerful, because the semantic processing is done on the search engine side, without any limitations or constraints. Bolted onto a search engine, however, it can do more harm than good.

Here is what I mean. I’ll try to keep my logic simple.

1. The Google search engine has a limit on how many terms can be submitted to it.

2. Semantic search, by its nature, creates permutations of given terms. For example:

“Senior VP of Sales” can be “SVP Sales” or “Senior Vice President of Sales”

Translated into a Boolean expression, that becomes:

“senior vp of sales” OR “SVP sales” OR “senior vice president of sales”

3. After creating permutations upon several concepts, you are out of search terms.
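The Python sketch below makes this arithmetic concrete. It turns per-concept permutations into an OR-ed Boolean query and counts the words. Three things are assumptions: the 32-word cap (based on the limit Google has commonly been reported to enforce), counting each OR as a word, and every permutation list other than the Senior VP example from the text.

```python
QUERY_WORD_LIMIT = 32  # assumption: commonly cited Google query-length cap


def or_group(phrases):
    """("senior vp of sales" OR "svp sales" OR ...) for one concept."""
    return "(" + " OR ".join(f'"{p}"' for p in phrases) + ")"


def word_count(query):
    """Count words, treating each OR as a word (an assumption)."""
    for ch in '"()':
        query = query.replace(ch, " ")
    return len(query.split())


concepts = [
    ["senior vp of sales", "svp sales", "senior vice president of sales"],  # from the text
    ["medical devices", "medical device", "medical equipment", "medtech"],  # hypothetical
    ["resume", "cv", "curriculum vitae"],                                   # hypothetical
    ["seattle", "bellevue", "redmond"],                                     # hypothetical
]

query = ""
for phrases in concepts:
    query = (query + " " + or_group(phrases)).strip()
    n = word_count(query)
    status = "OK" if n <= QUERY_WORD_LIMIT else "over the limit"
    print(f"{n:2d} words: {status}")
```

Under these assumptions, four modest concepts already overrun the cap. That is point 3 in practice.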

I’m a big believer in laws (maybe not speed-limit laws), but more the “laws of the universe” type stuff. I like to understand and deconstruct the rules and see whether each one stands alone or whether I need to recheck my premises. In this spirit, just before the first SourceCon conference, I developed the Seven Laws of Internet Research. I felt there was too much emphasis on memorizing search strings and the latest search engines or sites, but not enough fundamental thought leadership on how to think about searching the Internet.

The first two laws are:

1. The Law of Permutation
2. The Law of Completeness

The Law of Permutation simply states that when searching the Internet, which is not a homogeneous source of data, you must describe what you are looking for in the language of the many vs. the language of the one. (YES, this is what semantic search is doing.)

The Law of Completeness states that you must strive for completeness of search engine results in order to achieve a superior outcome.

Big Question: What happens if semantic search is applied before you reach completeness of results?

Answer: Missing data. Competitors eat your lunch. If you are a salesperson, it means missed sales leads; if you are a recruiter, it means missed resumes or passive candidates.

Does this mean that I am anti-semantic search? No way. I think it has great potential.

Here are my take-aways:

-Semantic search should be inside the search engine for optimal results

-Semantic search will cause data to be missed if applied before reaching completeness of possible results

-When combining a standard search engine and semantic search, it is best to apply the semantic processing AFTER completeness of data has been reached. In reality, this would not be semantic search, but semantic filtering.
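Semantic filtering can be sketched in Python as a post-processing step. Run the broad search first, keep every result, then apply related terms to that complete set. The `RELATED` table and the result strings are hypothetical, and matching on whole words is a simplifying assumption.

```python
import re

# Hypothetical related-term table, applied after retrieval instead of in the query.
RELATED = {"java": {"java", "j2ee", "beans"}}


def semantic_filter(results, concept, related=RELATED):
    """Keep results that mention the concept or any related term (whole words)."""
    terms = related.get(concept, {concept})
    kept = []
    for text in results:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if words & terms:
            kept.append(text)
    return kept


# Pretend these came back from a broad search run to completeness.
results = ["Senior J2EE developer", "Java Beans consultant", "Python developer"]
print(semantic_filter(results, "java"))
```

The filter only removes results from a set that was already collected, so it cannot cause data to be missed at search time. The search itself stays free to spend its query terms on completeness.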

Having worked with many databases as well as having extensive experience in searching the Internet, I thought I’d share some thoughts on the differences between the two.

When I observe people searching the Internet, there is a common mistake I see them make. Most people search the Internet as if they were searching a database. Don’t get me wrong, the Internet does include databases. Thomas Register, Spoke and ZoomInfo are examples of different types of databases. Via different methods, information is added to these data sources, and some sort of query mechanism is provided to subscribers. Can you use the ZoomInfo query on Thomas Register and vice versa? No, these are proprietary systems whose search methods are specialized to the content inside them. Each of these databases is limited and incomplete, but stored in a homogeneous fashion.

“Data normalization” is a phrase that leaves a blank stare on most people’s faces. Here is a secret: it is really simple.

Here is the inside scoop: Technology people have a secret club, complete with handshake and everything. It’s a club that we don’t want outsiders in. So we create these long phrases that make people’s eyes glaze over. Why? Because if everyone understood what we do, we wouldn’t make the big bucks. Being a recovering technologist, I’m on a continual journey to lose my geek speak. So get ready, here is the skinny on Data Normalization.