Blind Five Year Old
http://www.blindfiveyearold.com
SEO, SEM, Marketing and Technology sprinkled with Sports, Parenting and Rants

Analyzing Position in Google Search Console
Tue, 18 Jul 2017 22:26:00 +0000
http://www.blindfiveyearold.com/analyzing-position-in-google-search-console

Clients and even conference presenters are using Google Search Console’s position metric wrong. It’s an easy mistake to make. Here’s why you should only trust position when looking at query data and not page or site data.

Position

Google has a lot of information on how they calculate position and what it means. But the content is pretty dense and none of it really tells you how to read the data or when to rely on it. And that’s where most people are making mistakes.

Right now many look at the position as a simple binary metric. The graph shows it going down, that’s bad. The graph shows it going up, that’s good. The brain is wired to find these shortcuts and accept them.

As I write this there is a thread about there being a bug in the position metric. There could be. Maybe new voice search data was accidentally exposed? Or it might be that people aren’t drilling down to the query level to get the full story.

Too often, the data isn’t wrong. The error is in how people read and interpret the data.

The Position Problem

The best way to explain this is to actually show it in action.

A week ago a client got very concerned about how a particular page was performing. The email I received asked me to theorize why the rank for the page dropped so much without them doing anything. “Is it an algorithm change?” No.

If you compare the metrics day over day it does look pretty dismal. But looks can be deceiving.

At the page level you see data for all of the queries that generated an impression for the page in question. A funny thing happens when you select Queries and look at the actual data.

Suddenly you see that on July 7th the page received impressions for queries that were not well ranked.

It doesn’t take a lot of these impressions to skew your average position.
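To make that concrete, here’s a quick sketch with made-up numbers: position in Search Console is impression-weighted, so a modest burst of impressions on poorly ranked long-tail queries drags the blended average down even when the head terms haven’t moved.

```python
# Illustrative numbers, not real Search Console data: a page ranks ~3 for
# its core queries, then picks up impressions on poorly ranked queries.

def avg_position(rows):
    """rows: list of (impressions, position) pairs.
    Returns the impression-weighted average position, the way Search
    Console aggregates position at the page level."""
    total = sum(imps for imps, _ in rows)
    return sum(imps * pos for imps, pos in rows) / total

core = [(500, 3.0), (300, 4.0)]   # stable head terms
long_tail = [(200, 45.0)]         # new, low-ranked impressions

print(round(avg_position(core), 1))              # 3.4
print(round(avg_position(core + long_tail), 1))  # 11.7
```

The head terms never moved, yet the page-level average more than tripled. That’s the whole trap.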

A look at the top terms for that page shows some movement but nothing so dramatic that you’d panic.

Which brings us to the next flaw in looking at this data. One day is not like the other.

July 6th is a Thursday and July 7th is a Friday. Now, usually the difference between weekdays isn’t as wide as it is between a weekday and a weekend but it’s always smart to look at the data from the same day in the prior week.

Sure enough it looks like this page received a similar expansion of low ranked queries the prior Friday.
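If you want to automate that check, here’s a minimal sketch (with hypothetical impression counts standing in for a real Search Analytics export) that compares each day to the same weekday a week earlier rather than to the day before.

```python
from datetime import date, timedelta

# Hypothetical daily impression counts; real numbers would come from a
# Search Analytics export. Comparing Friday to Thursday conflates weekday
# effects with real change, so compare like weekdays instead.
impressions = {
    date(2017, 6, 30): 2100,  # prior Friday
    date(2017, 7, 6): 1400,   # Thursday
    date(2017, 7, 7): 2050,   # the Friday in question
}

def week_over_week(day, data):
    """Percent change vs. the same weekday in the prior week
    (None if there's no data for that day)."""
    prior = day - timedelta(days=7)
    if prior not in data:
        return None
    return (data[day] - data[prior]) / data[prior] * 100

# Day over day looks like a huge jump; week over week it's roughly flat.
print(round(week_over_week(date(2017, 7, 7), impressions), 1))  # -2.4
```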

There’s a final factor that influences this analysis. Seasonality. The time in question is right around July 4th. So query volume and behavior are going to be different.

Unfortunately, we don’t have last year’s data in Search Analytics. These days I spend most of my time doing year over year analysis. It makes analyzing seasonality so much easier. Getting this into Search Analytics would be extremely useful.

Analyzing Algorithm Changes

The biggest danger comes when there is an algorithm change and you’re analyzing position with a bastardized version of regex. Looking at the average position for a set of pages (i.e. – a folder) before and after an algorithm change can be tricky.

The average position could go down because those pages are now being served to more queries. And in those additional queries those pages don’t rank as high. This is actually quite normal. So if you don’t go down to the query level data you might make some poor decisions.

One easy way to avoid making this mistake is to think hard when you see impressions going up but position going down.
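That heuristic is easy to script. The sketch below, with illustrative field names and numbers, flags pages where impressions rose while average position got worse — the signature of query expansion rather than lost rankings.

```python
# A sketch of the sanity check described above. Field names and numbers
# are invented for illustration; a larger position number is worse.

def flag_query_expansion(pages):
    """Return pages whose impressions went up while average position
    declined — candidates for query expansion, not panic."""
    return [
        p["page"]
        for p in pages
        if p["imps_after"] > p["imps_before"]
        and p["pos_after"] > p["pos_before"]
    ]

pages = [
    {"page": "/widgets/", "imps_before": 800, "imps_after": 1300,
     "pos_before": 3.4, "pos_after": 9.8},   # likely query expansion
    {"page": "/gadgets/", "imps_before": 900, "imps_after": 850,
     "pos_before": 4.1, "pos_after": 8.9},   # impressions also fell: dig in
]

print(flag_query_expansion(pages))  # ['/widgets/']
```

Pages the function flags deserve a query-level look before anyone draws conclusions from the page-level average.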

When this type of query expansion happens the total traffic to those pages is usually going up so the poor decision won’t be catastrophic. It’s not like you’d decide to sunset that page type.

Instead, two things happen. First, people lose confidence in the data. “The position went down but traffic is up! The data they give just sucks. You can’t trust it. Screw you Google!”

Second, you miss opportunities for additional traffic. You might have suddenly broken through at the bottom of page one for a head term. If you miss that you lose the opportunity to tweak the page for that term.

Or you might have appeared for a new query class. And once you do, you can often claim the featured snippet with a few formatting changes. Been there, done that.

Using the average position metric for a page or group of pages will lead to sub-optimal decisions. Don’t do it.

Number of Queries Per Page

This is all related to an old metric I used to love and track religiously.

Back in the stone ages of the Internet before not provided one of my favorite metrics was the number of keywords driving traffic to a page. I could see when a page gained enough authority that it started to appear and draw traffic from other queries. Along with this metric I looked at traffic received per keyword.

These numbers were all related but would ebb and flow together as you gained more exposure.

Right now Google doesn’t return all the queries. Long-tail queries are suppressed because they’re personally identifiable. I would love to see them add something that gave us a roll-up of the queries they aren’t showing.

124 queries, 3,456 impressions, 7.3% CTR, 3.4 position

I’d actually like a roll-up of all the queries that are reported along with the combined total too. That way I could track the trend of visible queries, “invisible” queries and the total for that page or site.
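Even without Google adding this, you can roughly approximate the roll-up yourself: page-level totals include the suppressed long tail, so subtracting the visible query rows from the page totals estimates the “invisible” share. All numbers below are invented for illustration.

```python
# Rough sketch of the roll-up described above, with made-up data.
# Search Analytics reports some queries and suppresses the long tail,
# but page-level totals include everything, so the invisible share
# can be inferred by subtraction.

visible_queries = [
    {"query": "widgets", "impressions": 1200, "clicks": 90},
    {"query": "best widgets", "impressions": 800, "clicks": 60},
]
page_totals = {"impressions": 3456, "clicks": 252}

visible_imps = sum(q["impressions"] for q in visible_queries)
invisible_imps = page_totals["impressions"] - visible_imps
visible_clicks = sum(q["clicks"] for q in visible_queries)
invisible_clicks = page_totals["clicks"] - visible_clicks

print(visible_imps, invisible_imps)      # 2000 1456
print(visible_clicks, invisible_clicks)  # 150 102
```

Tracking those two trend lines over time gets you most of the way to the visible/invisible query roll-up described above.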

The reason the number of queries matters is that as that page hits on new queries you rarely start at the top of those SERPs. So when Google starts testing that page on an expanded number of SERPs you’ll find that position will go down.

This doesn’t mean that the position of the terms you were ranking for goes down. It just means that the new terms you rank for were lower. So when you add them in, the average position declines.

Adding the roll-up data might give people a visual signpost that would prevent them from making the position mistake.

TL;DR

Google Search Console position data is only stable when looking at a single query. The position data for a site or page will be accurate but is aggregated across all queries.

In general, be on the lookout for query expansion, where a site or page receives additional impressions on new terms for which it doesn’t rank well. When the red line goes up and the green goes down, that could be a good thing.

Ignoring Link Spam Isn’t Working
Thu, 06 Jul 2017 16:37:45 +0000
http://www.blindfiveyearold.com/ignoring-link-spam-isnt-working

Link spam is on the rise again. Why? Because it’s working. The reason it’s working is that demand is up, driven by Google’s change from penalization to neutralization.

Google might be pretty good at ignoring links. But pretty good isn’t good enough.

Neutralize vs Penalize

For a very long time Google didn’t penalize paid or manipulative links but instead neutralized them, which is a fancy way of saying they ignored those links. But then there was a crisis in search quality and Google switched to penalizing sites for thin content (Panda) and over optimized links (Penguin).

The SEO industry underwent a huge transformation as a result.

I saw this as a positive change despite having a few clients get hit and seeing the industry throw the baby (technical SEO) out with the bathwater. The playing field evened and those who weren’t allergic to work had a much better chance of success.

Virtually Spotless

This Cascade campaign and claim is one of my favorites as a marketer. Because ‘virtually spotless’ means those glasses … have spots. They might have fewer spots than the competition but make no mistake, they still have spots.

“First, I doubt they can actually pull that off. Second, we’re pretty good at ignoring links, regardless of the site.”

This was Gary’s response to a Tweet about folks peddling links from sites like Forbes and Entrepreneur. I like Gary. He’s also correct. Unfortunately, none of that matters.

Pretty good is the same as virtually spotless.

Unless neutralization is wildly effective within the first month those links are found, it will ultimately lead to more successful link spam. And that’s what I’m seeing. Over the last year link spam is working far more often, in more verticals and for more valuable keywords.

So when Google says they’re pretty good at ignoring link spam that means some of the link spam is working. They’re not catching 100%. Not by a long shot.

Perspective

One of the issues is that, from a Google perspective, the difference might seem small. But to sites and to search marketing professionals, the differences are material.

I had a similar debate after Matt Cutts said there wasn’t much of a difference between having your blog in a subdomain versus having it in a subfolder. The key to that statement was ‘much of’, which meant there was a difference.

It seemed small to Matt and Google but if you’re fighting for search traffic, it might turn out to be material. Even if it is small, do you want to leave that gain on the table? SEO success comes through a thousand optimizations.

Cost vs Benefit

Perhaps Google neutralizes 80% of the link spam. That means that 20% of the link spam works. Sure, the overall cost for doing it goes up but here’s the problem. It doesn’t cost that much.
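The economics are easy to sketch. With invented numbers, even an 80% neutralization rate leaves the expected value of buying links positive, because each attempt is cheap and a link that sticks pays off big.

```python
# Back-of-the-envelope version of the argument above. Every number here
# is made up for illustration; the point is the shape of the math, not
# the specific figures.

cost_per_link = 50          # dollars per attempt: link spam is cheap
success_rate = 0.20         # the 20% that slips past neutralization
value_if_it_works = 2000    # traffic value of a link that counts

expected_value = success_rate * value_if_it_works - cost_per_link
print(expected_value)  # 350.0
```

As long as that number stays positive, neutralization alone can’t drive buyers out of the market; only the risk of a penalty changes the calculation.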

Link spam can be done at scale and be done without a huge investment. It’s certainly less costly than the alternative. So the idea that neutralizing a majority of it will help end the practice is specious. Enough of it works and when it works it provides a huge return.

It’s sort of like a demented version of index investing. The low fee structure and broad diversification mean you can win even if many of the stocks in that index aren’t performing.

Risk vs Reward

Panda and Penguin suddenly made thin content and link spam risky. Sure it didn’t cost a lot to produce. But if you got caught, it could essentially put your site six feet under.

Suddenly, the reward for these practices had to be a lot higher to offset that risk.

The SEO industry moaned and bellyached. It’s their default reaction. But penalization worked. Content got better and link spam was severely marginalized. Those who sold the links were now offering link removal services. Because the folks who might buy links … weren’t in the market anymore.

The risk of penalty took demand out of the market.

Link Spam

I’m sure many of you are seeing more and more emails peddling links showing up in your inbox.

Some of them are laughable. Yet, that’s what makes it all the more sad. It shows just how low the bar is right now for making link spam work.

There are also more sophisticated link spam efforts, including syndication spam. Here, you produce content once with rich anchor text (often on your own site) and then syndicate that content to other platforms that will provide clean followed links. I’ve seen both public and private syndication networks deliver results.

I won’t offer a blow-by-blow of this or other link manipulation techniques. There are better places for that and others who are far more versed in the details.

The response by John Mueller (another guy I like and respect) is par for the course.

“The tricky part about issues like these is that our algorithms (and the manual webspam team) often take very specific action on links like these; just because the sites are still indexed doesn’t mean that they’re profiting from those links.”

The problem? Many of us are seeing these tactics achieve results. Maybe Google does catch the majority of this spam. But enough sneaks through that it’s working.

Now, I’m sure many will argue that there are other reasons a site might have ranked for a specific term. Know what? They might be right. But think about it for a moment. If you were able to rank well for a term, why would you employ this type of link spam tactic?

Even if you rationalize that a site is simply using everything at their disposal to rank, you’d then have to accept that fear of penalty was no longer driving sites out of the link manipulation market.

Furthermore, by letting link manipulation survive ‘visually’ it becomes very easy for other site owners to come to the conclusion (erroneous or not) that these tactics do work. The old ‘perception is reality’ adage takes over and demand rises.

So while Google snickers thinking spammers are wasting money on these links it’s the spammers who are laughing all the way to the bank. Low overhead costs make even inefficient link manipulation profitable in a high demand market.

I’ve advised clients that I see this problem getting worse in the next 12-18 months until it reaches a critical mass that will force Google to revert back to some sort of penalization.

TL;DR

Link spam is falling through the cracks and working more often as Google’s shift to ignoring link spam versus penalizing it creates a “seller’s market” that fuels link spam growth.

What I Learned in 2016
Mon, 02 Jan 2017 20:18:44 +0000
http://www.blindfiveyearold.com/what-i-learned-in-2016

(This is a personal post, so if that isn’t your thing then you should move on.)

2016 was the year where things went back to normal. My cancer was in remission, family life was great and business was booming. But that ‘normal’ created issues that are rarely discussed. Managing success is harder than I expected.

Success

I made it. Blind Five Year Old is a success. Even through my chemotherapy, I kept the business going without any dip in revenue. Looking at the numbers, I’ve had the same revenue four years in a row. That’s a good thing. It’s a revenue figure that makes life pretty darn comfortable.

It wasn’t always like this. Back in 2010 I was always waiting for the other shoe to drop. Even as I put together back-to-back years of great business revenue I still had that paranoia. What if things dried up? But in 2016, cancer in the rear view, I felt bulletproof. The result? I was restless and, at times, unmotivated.

Guilt

You don’t hear a lot about this topic because you feel guilty talking about it. You’ve got to figure you’re going to come off like a douchebag complaining about success when so many others are struggling.

I’ve been dealing with that not just in writing about it but in living it too. While I’ve never been poor, I’ve often lived paycheck to paycheck. At one point I was out of work and $25,000 in debt.

My wife and I lived in an apartment for 10 years, saving like crazy so we could buy a house in the Bay Area. And once bought, we were anxious about making it all work. I had nightmares about being foreclosed on.

But we made it. I worked hard to build my business and we made smart moves financially, refinancing our mortgage twice until we had an amazing rate and very manageable mortgage payment. My wife was the backbone of the household, keeping everything going and making it easy for me to concentrate on the business.

For a long time it was all about getting there – about the struggle. Even as the business soared we then had to tackle cancer. Now, well now things are … easy.

Easy Street

It’s strange to think how easy it is to just … buy what you want. Now, I’m not saying I can run out and buy my own private island. I’m not super-rich. But I’m not concerned about paying the bills. I’m not thinking about whether I can afford to give my daughter tennis lessons or get my wife a leather jacket or buy a new phone. I just do those things.

And that feels strange … and wrong in some ways. Because I know that life isn’t like this for the vast majority.

Of course, I can rationalize some of this by pointing to my work ethic, attention to detail and willingness to take risks. No doubt I benefited from some friendships. I didn’t get here alone. But that too was something I cultivated. I try not to be a dick and generally try to be helpful.

But it’s still unsettling to be so comfortable. Not just because I keenly feel my privilege but also because it saps ambition.

Is That All?

When you’re comfortable, and feeling guilty about that, you often start to look for the next mountain to climb. I think that’s human nature. If you’ve made it then you look around and ask, is that all? Am I just going to keep doing this for the next twenty years?

For me, this presents a bit of a problem. I’m not keen on building an agency. I know a bunch of folks who are doing this but I don’t think it’s for me. I don’t enjoy managing people and I’m too much of a perfectionist to be as hands off as I’d need to be.

I took a few advisor positions (one of which had a positive exit last year) and will continue to seek those out. Perhaps that’s the ‘next thing’ for me, but I’m not so sure. Even if it is, it seems like an extension of what I’m doing now anyway.

Enjoy The Groove

In the last few months I’ve come to terms with where I am. There doesn’t necessarily need to be a ‘second act’. I like what I do and I like the life I’ve carved out for myself and my family. If this is it … that’s amazing.

I remember keenly the ‘where do you see yourself in five years’ question I’d get when interviewing. Working in the start-up community, I never understood why people asked that question. Things change so fast. Two years at a job here is a long time. Opportunities abound. Calamity can upset the applecart. Any answer you give is wrong.

I’m not saying I’m letting the random nature of life direct me. What I’m saying is more like an analogy from basketball. I’m no longer going to force my shot. I’m going to let the game come to me. But when it does I’ll be ready to sink that three.

Staying Motivated

So how do you stay ready? That to me is the real issue when you reach a certain level of success. How do you keep going? How do you stay motivated so you’re ready when the next opportunity comes up?

There’s a real practical reason to keep things going right? The money is good. I’m putting money away towards my daughter’s college education and retirement. Every year when I can put chunks of money away like that I’m winning.

But when you’re comfortable and you feel like you’re on top of the world it’s hard to get motivated by money. At least that’s how it is for me. To be honest, I haven’t figured this one out completely. But here’s what I know has been helping.

Believe In Your Value

Over the last few years there’s been a surge in folks talking about imposter syndrome. While I certainly don’t think I’m a fraud, there’s an important aspect in imposter syndrome revolving around value.

I’m not a huge self-promoter. Don’t get me wrong, I’ll often humble brag in person or via IM and am enormously proud of my clients and the success I’ve had over the last decade. But I don’t Tweet the nice things others say about me or post something on Facebook about the interactions I have with ‘fans’. I even have issues promoting speaking gigs at conferences and interviews. I’m sure it drives people crazy.

What I realized is that I was internalizing this distaste for self-promotion and that was toxic.

That doesn’t mean you’ll see me patting myself on the back via social media in 2017. What it means is that I’m no longer doubting the value of my time and expertise. Sounds egotistical. Maybe it is. But maybe that’s what it takes.

Give Me A Break

Going hand in hand with believing in your own value is giving yourself a break. I often beat myself up when I don’t return email quickly. Even as the volume of email increased, and it still does, I felt like a failure when I let emails go unanswered. The longer they went unanswered, the more epic the reply I thought I’d need to send, which meant I didn’t respond … again. #viciouscycle

A year or so ago I mentioned in an email to Jen Lopez how in awe I was at the timely responses I’d get from Rand. She sort of chided me and stated that this was Rand’s primary job but not mine. It was like comparing apples and oranges. The exchange stuck with me. I’m not Superman. Hell, I’m not even Batman.

I do the very best I can but that doesn’t mean that I don’t make mistakes or drop the ball. And that’s okay. Wake up the next day and do the very best you can again. Seems like that’s worked out well so far.

Rev The Engine

All of my work is online. That’s just the nature of my business. But I find that taking care of some offline tasks can help to rev the engine and get me going online. Folding my laundry is like Liquid Drano to work procrastination.

I don’t know if it’s just getting away from the computer or the ability to finish a task and feel good about it that makes it so effective. I just know it works.

I’m grateful for where I am in my life. I know I didn’t get here alone. My wife is simply … amazing. And I’m consistently stunned at what my daughter says and does as she grows up. And it’s great to have my parents nearby.

There have also been numerous people throughout my life who have helped me in so many ways. There was Terry ‘Moonman’ Moon who I played video games with at the local pizza place growing up. “You’re not going down the same road,” he told me referring to drugs. There was Jordan Prusack, who shielded me from a bunch of high school clique crap by simply saying I was cool. (He probably doesn’t even remember it.)

In business, I’ve had so many people who have gone out of their way to help me. Someone always seemed there with a lifeline. Just the other day I connected with someone and we had a mutual friend in common – Tristan Money – the guy who gave me my second chance in the dot com industry. I remember him opening a beer bottle with a very large knife too.

Kindness comes in many sizes. Sometimes it’s something big and sometimes it’s just an offhand comment that makes the difference. My life is littered with the kindness of others. I like to remember that so that I make it habit to do the same. And that’s as good a place to stop as any.

The Future of Mobile Search
Mon, 29 Aug 2016 19:17:04 +0000
http://www.blindfiveyearold.com/the-future-of-mobile-search

What if I told you that the future of mobile search was swiping?

I don’t mean that there will be a few carousels of content. Instead I mean that all of the content will be displayed in a horizontal swiping interface. You wouldn’t click on a search result, you’d simply swipe from one result to the next.

This might sound farfetched but there’s growing evidence this might be Google’s end game. The Tinderization of mobile search could be right around the corner.

Horizontal Interface

Google has been playing with horizontal interfaces on mobile search for some time now. You can find it under certain Twitter profiles.

There’s one for videos.

And another for recipes.

There are plenty of other examples. But the most important one is the one for AMP.

But you have to wonder how Google will deliver this type of AMP carousel interface with AMP content sprinkled throughout the results. (They already reference the interface as the ‘AMP viewer’.)

What if you could simply swipe between AMP results? The current interface lets you do this already.

Once AMP is sprinkled all through the results wouldn’t it be easier to swipe between AMP results once you were in that environment? They already have the dots navigation element to indicate where you are in the order of results.

I know, I know, you’re thinking about how bad this could be for non-AMP content but let me tell you a secret. Users won’t care and neither will Google.

User experience trumps publisher whining every single time.

In the end, instead of creating a carousel for the links, Google can create a carousel for the content itself.

AMP

For those of you who aren’t hip to acronyms, AMP stands for Accelerated Mobile Pages. It’s an initiative by Google to create near instantaneous availability of content on mobile.

The way they accomplish this is by having publishers create very lightweight pages and then caching them on Google servers. So when you click on one of those AMP results you’re essentially getting the cached version of the page direct from Google.

The AMP initiative is all about speed. If the mobile web is faster it helps with Google’s (not so) evil plan. It also has an interesting … side effect.

Google could host the mobile Internet.

That’s both amazing and a bit terrifying. When every piece of content in a search result is an AMP page Google can essentially host that mobile result in its entirety.

Why make users click if every search result is an AMP page? Seriously. Think about it.

Google is obsessed with reducing the time to long click, the amount of time it takes to get users to a satisfactory result. What better way to do this than to remove the friction of clicking back and forth to each site.

No more blue links.

Why make users click when you can display that content immediately? Google has it! Then users can simply swipe to the next result, and the next, and the next and the next. They can even go back and forth in this way until they find a result they wish to delve into further.

Swiping through content would be a radical departure from the traditional search interface but it would be vastly faster and more convenient.

How much better would it be to search for a product and swipe through the offerings of those appearing in search results?

New Metrics of Success

If this is where the mobile web is headed then the game will completely change. Success won’t be tied nearly as much to rank. When you remove the friction of clicking the number of ‘views’ each result gets will be much higher.

The normal top heavy click distribution will disappear to be replaced with a more even ‘view’ distribution of the top 3-5 results. I’m assuming most users will swipe at least three times if not more but that there will be a severe drop off after that.
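Here’s a toy model of that view distribution, with assumed drop-off rates rather than real data: everyone “views” the first result, most people swipe a few more times, and views fall off a cliff after the assumed third swipe.

```python
# Toy model of the claim above, not real data. The retention rates are
# assumptions: each swipe keeps 75% of viewers for the first three
# positions, then retention drops sharply.

def swipe_share(positions, keep=0.75, cliff=3, cliff_keep=0.35):
    """Fraction of sessions that 'view' each result position in a
    swipe interface. Position 1 is always viewed."""
    shares, viewers = [], 1.0
    for i in range(positions):
        shares.append(viewers)
        viewers *= keep if i < cliff else cliff_keep
    return shares

# Compare with a rough top-heavy click-share curve for positions 1-5.
click_share = [0.33, 0.17, 0.11, 0.08, 0.06]

print([round(s, 2) for s in swipe_share(5)])  # [1.0, 0.75, 0.56, 0.42, 0.15]
```

Under these assumptions the top 3–5 results get a far more even share of attention than the usual click curve, which is exactly why rank would matter less and engagement more.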

When a user swipes to your result you’ll still get credit for a visit by implementing Google Analytics or another analytics package correctly. But users aren’t really on your site at that point. It’s only when they click through on that AMP result that they wind up in your mobile web environment.

So the new metric for mobile search success might be getting users to stop on your result and, optimally, click-through to your site. That’s right, engagement could be the most important metric. Doesn’t that essentially create alignment between users, Google and publishers?

Funny thing is, Google just launched the ability to do A/B testing for AMP pages. They’re already thinking about how important it’s going to be to help publishers optimize for engagement.

Hype or Reality?

Google, as a mobile first company, is pushing hard to reduce the distance between search and information. I don’t think this is a controversial statement. The question is how far Google is willing to go to shorten that distance.

I’m putting a bunch of pieces together here, from horizontal interfaces, to AMP to Google’s obsession with speed to come up with this forward looking vision of mobile search.

I think it’s in the realm of possibility, particularly since the growth areas for Google are in countries outside of the US where mobile is vastly more dominant and where speed can sometimes be a challenge.

TL;DR

When every search result is an AMP page there’s little reason for users to click on a result to see that content. Should Google’s AMP project succeed, the future of mobile search could very well be swiping through content and the death of the blue link.

RankBrain Survival Guide
Thu, 09 Jun 2016 16:30:25 +0000
http://www.blindfiveyearold.com/rankbrain-survival-guide

This is a guide to surviving RankBrain. I created it, in part, because there’s an amazing amount of misinformation about RankBrain. And the truth is there is nothing you can do to optimize for RankBrain.

I’m not saying RankBrain isn’t interesting or important. I love learning about how search works whether it helps me in my work or not. What I am saying is that there are no tactics to employ based on our understanding of RankBrain.

So if you’re looking for optimization strategies you should beware of the clickbait RankBrain content being pumped out by fly-by-night operators and impression hungry publishers.

You Can’t Optimize For RankBrain

I’m going to start out with this simple statement to ensure as many people as possible read, understand and retain this fact.

You can’t optimize for RankBrain.

You’ll read a lot of posts to the contrary. Sometimes they’re just flat out wrong, sometimes they’re using RankBrain as a vehicle to advocate for SEO best practices and sometimes they’re just connecting dots that aren’t there.

Read on if you want proof that RankBrain optimization is a fool’s errand and you should instead focus on other vastly more effective strategies and tactics.

What Is RankBrain?

RankBrain is a deep learning algorithm developed by Google to help improve search results. Deep learning is a form of machine learning and can be classified somewhere on the Artificial Intelligence (AI) spectrum.

Knowing how RankBrain works is important because it determines whether you can optimize for it or not. Despite what you might read, there are only a handful of good sources of information about RankBrain.

“RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.”

Word2Vec is most often referenced when talking about vectors. And it should be noted that Jeff Dean, Greg Corrado and many others were part of this effort. You’ll see these same names pop up time and again surrounding vectors and deep learning.
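To see the vector idea in miniature, here’s a toy example. The hand-made 3-dimensional vectors below are nothing like Google’s actual embeddings (real word2vec vectors have hundreds of dimensions learned from text), but they show the core trick: words with similar meanings point in similar directions, which cosine similarity captures.

```python
import math

# Hand-made toy vectors for illustration only. A UK/US synonym pair
# points in nearly the same direction; an unrelated word doesn't.
vectors = {
    "sneakers": [0.9, 0.1, 0.0],
    "trainers": [0.85, 0.15, 0.05],
    "mortgage": [0.0, 0.2, 0.95],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(round(cosine(vectors["sneakers"], vectors["trainers"]), 2))  # high
print(round(cosine(vectors["sneakers"], vectors["mortgage"]), 2))  # low
```

This is how a system can guess that a query it has never seen is “about” the same thing as queries it has.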

“I think we will have a much better handle on text understanding, as well. You see the very slightest glimmer of that in word vectors, and what we’d like to get to where we have higher level understanding than just words. If we could get to the point where we understand sentences, that will really be quite powerful. So if two sentences mean the same thing but are written very differently, and we are able to tell that, that would be really powerful. Because then you do sort of understand the text at some level because you can paraphrase it.”

I was really intrigued by the idea of Google knowing that two different sentences meant the same thing. And they’ve made a fair amount of progress in this regard with research around paragraph vectors (pdf).

It’s difficult to say exactly what type of vector analysis RankBrain employs. I think it’s safe to say it’s a variable-length vector analysis and leave it at that.

So what else did we learn from the Corrado interview? Later in the piece there are statements about how much Google relies on RankBrain.

The system helps Mountain View, California-based Google deal with the 15 percent of queries a day it gets which its systems have never seen before, he said.

That’s pretty clear. RankBrain is primarily used for queries not previously seen by Google, though it seems likely that its reach may have grown based on the initial success.

Unfortunately the next statement has caused a whole bunch of consternation.

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

This provoked the all-too-typical reactions from the SEO community. #theskyisfalling The fact is we don’t know how Google is measuring ‘importance’ nor do we understand whether it’s for just that 15 percent or for all queries.

Andrey Lipattsev

To underscore the ‘third-most important’ signal boondoggle we have statements by Andrey Lipattsev, Search Quality Senior Strategist at Google, in a Q&A with Ammon Johns and others.

In short, RankBrain might have been ‘called upon’ in many queries but may not have materially impacted results.

Or if you’re getting technical, RankBrain might not have caused a reordering of results. So ‘importance’ might have been measured by frequency and not impact.

Later on you’ll find that RankBrain has access to a subset of signals, so RankBrain could function more like a meta-signal. It kind of feels like comparing apples and oranges.

But more importantly, why does it matter? What will you do differently knowing it’s the third most important signal?

I mean, if you think about, for example, a query like, “Can you get a 100 percent score on Super Mario without a walk-through?” This could be an actual query that we receive. And there is a negative term there that is very hard to catch with the regular systems that we had, and in fact our old query parsers actually ignored the “without” part.

And RankBrain did an amazing job catching that and actually instructing our retrieval systems to get the right results.

I was lucky enough to see this presentation live and it is perhaps the best and most revealing look at Google search. (Seriously, if you haven’t watched this you should turn in your SEO card now.)

It’s in the Q&A that Haahr discusses RankBrain.

RankBrain gets to see some subset of the signals and it’s a machine learning or deep learning system that has its own ideas about how you combine signals and understand documents.

I think we understand how it works but we don’t understand what it’s doing exactly.

It uses a lot of the stuff that we’ve published on deep learning. There’s some work that goes by Word2Vec or word embeddings that is one layer of what RankBrain is doing. It actually plugs into one of the boxes, one of the late post retrieval boxes that I showed before.

Danny then asks about how RankBrain might work to ascertain document quality or authority.

This is all a function of the training data that it gets. It sees not just web pages but it sees queries and other signals so it can judge based on stuff like that.

These statements are by far the most important because they provide a wealth of information. First and foremost, Haahr states that RankBrain plugs in late post-retrieval.

This is an important distinction because it means that RankBrain doesn’t rewrite the query before Google goes looking for results but instead does so afterwards.

So Google retrieves results using the raw query but then RankBrain might rewrite the query or interpret it differently in an effort to select and reorder the results for that query.

In addition, Haahr makes it clear that RankBrain has access to a subset of signals and the query. As I mentioned this makes RankBrain feel more like a meta-signal instead of a stand-alone signal.

What we don’t know are the exact signals that make up that subset. Many will take this statement and theorize that it uses link data or click data or any of sundry other signals. The fact is we have no idea which signals RankBrain has access to, what weight RankBrain gives them, or whether they’re applied evenly across all queries.

The inability to know the variables makes any type of regression analysis of RankBrain a non-starter.

Of course there’s also the statement that they don’t know what RankBrain is doing. That’s because RankBrain is a deep learning algorithm performing unsupervised learning. It’s creating its own rules.

More to the point, if a Google Ranking Engineer doesn’t know what RankBrain is doing, do you think that anyone outside of Google suddenly understands it better? The answer is no.

You Can’t Optimize For RankBrain

You can’t optimize for RankBrain based on what we know about what it is and how it works. At its core RankBrain is about better understanding of language, whether that’s within documents or queries.

So what can you do differently based on this knowledge?

Google is looking at the words, sentences and paragraphs and turning them into mathematical vectors. It’s trying to assign meaning to that chunk of text so it can better match it to complex query syntax.

The only thing you can do is improve your writing so that Google can better understand the meaning of your content. But that’s not really optimizing for RankBrain; that’s just doing proper SEO and delivering a better user experience (UX).

By improving your writing and making it more clear you’ll wind up earning more links and, over time, be seen as an authority on that topic. So you’ll be covered no matter what other signals RankBrain is using.

The one thing you shouldn’t do is think that RankBrain will figure out your poor writing or that you now have the license to, like, write super conversationally you know. Strong writing matters more now than it ever has before.

TL;DR

RankBrain is a deep learning algorithm that plugs in post-retrieval and relies on variable-length text vectors and other signals to make better sense of complex natural language queries. While fascinating, there is nothing one can do to specifically optimize for RankBrain.

Query Classes (Tue, 09 Feb 2016)

Identifying query classes is one of the most powerful ways to optimize large sites. Understanding query classes allows you to identify both user syntax and intent.

I’ve talked for years about query classes but never wrote a post dedicated to them. Until now.

Query Classes

What are query classes? A query class is a set of queries that are well defined in construction and repeatable. That sounds confusing but it really isn’t when you break it down.

A query class is most often composed of a root term and a modifier.

vacation homes in tahoe

Here the root term is ‘vacation homes’ and the modifier is ‘in [city]’. The construction of this query is well defined. It’s repeatable because users search for vacation homes in a vast number of cities.

Geography is often a dynamic modifier for a query class. But query classes are not limited to just geography. Here’s another example.

midday moon lyrics

Here the root term is dynamic and represents a song, while the modifier is the term ‘lyrics’. A related query class is ‘[song] video’ expressed as ‘midday moon video’.

Another simple one that doesn’t contain geography is ‘reviews’. This modifier can be attached to both products and locations.

This often happens as part of a query reformulation when people are looking for the most up-to-date information on a topic and this is the easiest way for them to do so.

Sometimes a query class doesn’t have a modifier. LinkedIn and Facebook (among others) compete for a simple [name] query class. Yelp and Foursquare and others compete for the [venue name] query class.

Or how about food, glorious food.

That’s right, there’s a competitive ‘[dish] recipe’ query class up for grabs. Then there are smaller but important query classes that are further down the purchase funnel for retailers.

You can create specific comparison pages for the query class of ‘[product x] vs [product y]’ and capture potential buyers during the end of the evaluation phase. Of course you don’t create all of these combinations, you only do so for those that have legitimate comparisons and material query volume.

If it isn’t obvious by now there are loads of query classes out there. But query classes aren’t about generating massive amounts of pages but instead are about matching and optimizing for query syntax and intent.
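The matching side of this can be sketched with a few simple patterns. This is a minimal, assumption-laden illustration — the class names and regular expressions below are invented for this example, and real query logs are far messier — but it shows the core idea of bucketing raw queries into a well-defined, repeatable construction of root term plus modifier.

```python
import re

# Hypothetical query classes: a name plus a pattern of root term and modifier.
QUERY_CLASSES = [
    ("vacation-homes-geo", re.compile(r"^vacation homes in (?P<city>.+)$")),
    ("song-lyrics",        re.compile(r"^(?P<song>.+) lyrics$")),
    ("product-comparison", re.compile(r"^(?P<x>.+) vs (?P<y>.+)$")),
]

def classify(query):
    """Return (class_name, modifiers) for the first matching query class."""
    for name, pattern in QUERY_CLASSES:
        match = pattern.match(query.lower().strip())
        if match:
            return name, match.groupdict()
    return None, {}

print(classify("vacation homes in tahoe"))
print(classify("midday moon lyrics"))
```

Run over a query report, a classifier like this tells you what share of your traffic each query class represents.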

User Syntax

One reason I rely on query classes is that they provide a window into user syntax. I want to know how people search.

Query classes represent the ways in which users most often search for content. Sure there are variations and people don’t all query the same way but the majority follow these patterns.

Do you want to optimize for the minority or the majority?

Here are just a few of the ‘[dish] recipe’ terms I thought of off the top of my head.

Look at that! And that’s just me naming three dishes off the top of my head. Imagine the hundreds if not thousands of dishes that people are searching for each day. You’re staring at a pile of search traffic based on a simple query class.

It’s super easy when you’re dealing with geography because you can use a list of top cities in the US (or the world) and then with some simple concatenation formulas can generate a list of candidates.
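Those concatenation formulas are usually built in a spreadsheet, but the same idea can be sketched in a few lines of code. The city list here is a placeholder — in practice you’d pull a list of top US (or world) cities — and the two syntaxes are just the common variants, not an exhaustive set.

```python
# Placeholder city list; swap in a real top-cities list.
cities = ["tahoe", "austin", "san diego"]

def generate_candidates(root, cities):
    """Concatenate a root term with a geographic modifier in both common syntaxes."""
    candidates = []
    for city in cities:
        candidates.append(f"{root} in {city}")  # 'vacation homes in tahoe'
        candidates.append(f"{city} {root}")     # 'tahoe vacation homes'
    return candidates

for query in generate_candidates("vacation homes", cities):
    print(query)
```

Feed the resulting candidate list into keyword research to validate volume for each variant.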

Sometimes you want to know the dominant expression of that query class. Here’s one for bike trails by state.

Here I have a list of the different variants of this query class, one using ‘[state] bike trails’ and the other ‘bike trails in [state]’. Using Google’s keyword planner I see that the former has twice the query volume of the latter. Yes, it’s exact match data, but that’s usually directionally valid.

I know there’s some of you who think this level of detail doesn’t matter. You’re wrong. When users parse search results or land on a page they want to see the phrase they typed. It’s human nature and you’ll win more if you’re using the dominant syntax.

Once you identify a query class the next step is to understand the intent of that query class. If you’ve got a good head on your shoulders this is relatively easy.

Query Intent

Not only do we want to know how they search, we want to know why.

The person searching for ‘vacation homes in tahoe’ is looking for a list of vacation rentals in Lake Tahoe. The person searching for ‘midday moon lyrics’ is looking for lyrics to the Astronautalis song. The person looking for ‘samsung xxx’ vs ‘sony xxx’ is looking for information on which TV they should purchase.

Knowing this, you can provide the relevant content to satisfy the user’s active intent. But the sites and pages that wind up winning are those that satisfy both active and passive intent.

The person looking for vacation homes in tahoe might also want to learn about nearby attractions and restaurants. They may want to book airfare. Maybe they’re looking for lift tickets.

The person looking for midday moon lyrics may want more information about Astronautalis or find lyrics to his other songs. Perhaps they want concert dates and tickets. The person looking for a TV may want reviews on both, a guide to HDTVs and a simple way to buy.

Sometimes the query class is vague, such as [name] or [venue], and you’re forced to provide answers to multiple types of intent. When I’m looking up a restaurant name I might be looking for the phone number, directions, menu, reviews or to make a reservation, to name but a few.

On larger sites the beauty of query classes is that you can map them to a page type and then use smart templates to create appropriate titles, descriptions and more.

This isn’t the same as automation but is instead about ensuring that the page type that matches a query class is well optimized. You can then also do A/B testing on your titles to see if a slightly different version of the title helps you perform across the entire query class.

Sometimes you can play with the value proposition in the title.

Vacation Homes in Tahoe vs Vacation Homes in Tahoe – 1,251 Available Now

It goes well beyond just the title and meta description. You can establish consistent headers, develop appropriate content units that satisfy passive intent and ensure you have the right crosslink units in place for further discovery.

The wrinkle usually comes with term length. Take city names for instance. You’ve got Rancho Santa Margarita clocking in at 22 characters and then Ada with a character length of 3.

So a lot of the time you’re coming up with business logic that delivers the right text, in multiple places, based on the total length of the term. This can get complex, particularly if you’re matching a dynamic root term with a geographic modifier.

Smart templates let you scale without sacrificing quality.
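That length-based business logic can be sketched roughly like this. The 60-character budget and the template copy are assumptions for illustration, not hard rules — the point is preferring the richest title that fits.

```python
def build_title(root, city, count, max_length=60):
    """Pick the richest title variant that fits within the length budget."""
    templates = [
        f"{root.title()} in {city.title()} - {count:,} Available Now",
        f"{root.title()} in {city.title()}",
        f"{city.title()} {root.title()}",
    ]
    for title in templates:
        if len(title) <= max_length:
            return title
    return templates[-1]  # last resort, even if it overflows

# Short city name: the full value-proposition title fits.
print(build_title("vacation homes", "tahoe", 1251))
# Long city name (22 characters): fall back to the plain variant.
print(build_title("vacation homes", "rancho santa margarita", 1251))
```

The same pattern extends to headers and crosslink anchor text, with the logic getting hairier when the root term is dynamic too.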

Rank Indices

The other reason why query classes are so amazing, particularly for large sites, is that you can create rank indices based on those query classes and determine how you’re performing as a whole across that query class.

Here I’ve graphed four similar but distinct query class rank indices. Obviously something went awry in November of 2015. But I know exactly how much it impacted each of those query classes and can then work on ways to regain lost ground.

Query classes usually represent material portions of traffic that impact bottom-line business metrics such as user acquisition and revenue. When you get the right coverage of query classes and create rank indices for each, you’re able to home in on where you can improve and react when the trends start to go in the wrong direction.
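One simple way to build such an index — a sketch, not the only formulation — is an impression-weighted average position across every query in the class, so high-volume queries move the index more than long-tail ones. The query data below is fabricated.

```python
def rank_index(rankings):
    """Impression-weighted average position for one query class.

    rankings: list of (position, impressions) tuples, one per query.
    """
    total_impressions = sum(imps for _, imps in rankings)
    if total_impressions == 0:
        return None
    weighted = sum(pos * imps for pos, imps in rankings)
    return weighted / total_impressions

# Fabricated data for a 'vacation homes in [city]' query class.
vacation_class = [
    (1, 5000),  # 'vacation homes in tahoe' ranks #1 with 5,000 impressions
    (3, 2000),  # 'vacation homes in austin' ranks #3
    (8, 500),   # 'vacation homes in ada' ranks #8
]

print(round(rank_index(vacation_class), 2))
```

Tracked weekly, a single number per query class makes it obvious when an entire class slips, rather than chasing individual keyword wiggles.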

Hopefully you’ve already figured out how to identify query classes. But if you haven’t here are a few tips to get you started.

First, use your head. Some of this stuff is just … right there in front of you. Use your judgement and then validate it through keyword research.

Second, look at what comes up in Google’s autocomplete suggestions for root terms. You can also use a tool like Ubersuggest to do this at scale and generate more candidates.

Third, look at the traffic coming to your pages via Search Analytics within Google Search Console. You can uncover patterns there and identify the true syntax bringing users to those pages.

Fourth, use paid search, particularly the report that shows the actual terms that triggered the ad, to uncover potential query classes.

Honestly though, you should really only need the first and second to identify and home in on query classes.

TL;DR

Query classes are an enormously valuable way to optimize larger sites so they meet and satisfy patterns of query syntax and intent. Query classes let you understand how and why people search. Pages targeted at query classes that aggregate intent will consistently win.

Do 404 Errors Hurt SEO? (Mon, 01 Feb 2016)

Do 404 errors hurt SEO? It’s a simple question. However, the answer is far from simple. Most 404 errors don’t have a direct impact on SEO, but they can eat away at your link equity and user experience over time.

There’s one variety of 404 that might be quietly killing your search rankings and traffic.

404 Response Code

What is a 404 exactly? A 404 response code is returned by the server when there is no matching URI. In other words, the server is telling the browser that the content is not found.

404s are a natural part of the web. In fact, link rot studies show that links regularly break. So what’s the big deal? It’s … complicated.

404s and Authority

One of the major issues with 404s is that they stop the flow of authority. It just … evaporates. At first, this sort of bothered me. If someone linked to your site but that page or content is no longer there, the citation is still valid. At that point in time the site earned that link.

But when you start to think it through, the dangers begin to present themselves. If authority passed through a 404 page I could redirect that authority to pages not expressly ‘endorsed’ by that link. Even worse, I could purchase a domain and simply use those 404 pages to redirect authority elsewhere.

And if you’re a fan of conspiracies then sites could be open to negative SEO, where someone could link toxic domains to malformed URLs on your site.

404s don’t pass authority and that’s probably a good thing. It still makes sense to optimize your 404 page so users can easily search and find content on your site.

Types of 404s

Google is quick to say that 404s are natural and not to obsess about them. On the other hand, they’ve never quite said that 404s don’t matter. The 2011 Google post on 404s is strangely convoluted on the subject.

The last line of the first answer seems to be definitive but why not answer the question simply? I believe it’s because there’s a bit of nuance involved. And most people suck at nuance.

While the status code remains the same there are different varieties of 404s: external, outgoing and internal. These are my own naming conventions so I’ll make it clear in this post what I mean by each.

Because some 404s are harmless and others are downright dangerous.

External 404s

External 404s occur when someone else is linking to a broken page on your site. Even here, there is a small difference since there can be times when the content has legitimately been removed and other times when someone is linking improperly.

Back in the day many SEOs recommended that you 301 all of your 404s so you could reclaim all the link authority. This is a terrible idea. I have to think Google looks for sites that employ 301s but have no 404s. In short, a site with no 404s is a red flag.

A request for domain.com/foobar should return a 404. Of course, if you know someone is linking to a page incorrectly, you can apply a 301 redirect to get them to the right page, which benefits both the user and the site’s authority.

External 404s don’t bother me a great deal. But it’s smart to periodically look to ensure that you’re capturing link equity by turning the appropriate 404s into 301s.

Outgoing 404s

Outgoing 404s occur when a link from your site to another site breaks and returns a 404. Because we know how often links evaporate this isn’t uncommon.

Google would be crazy to penalize sites that link to 404 pages. Mind you, it’s about scale to a certain degree. If 100% of the external links on a site were going to 404 pages then perhaps Google (and users) would think differently about that site.

They could also be looking at the age of the link and making a determination on that as well. Or perhaps it’s fine as long as Google saw that the link was at one time a 200 and is now a 404.

Overall these are the least concerning of 404 errors. It’s still a good idea, from a user experience perspective, to find those outgoing 404s in your content and remove or fix the link.

Internal 404s

The last type of 404 is an internal 404. This occurs when the site itself is linking to another ‘not found’ page on their own site. In my experience, internal 404s are very bad news.

Over the past two years I’ve worked on squashing internal 404s for a number of large clients. In each instance I believe that removing these internal 404s had a positive impact on rankings.

Of course, that’s hard to prove given all the other things going on with the site, with competitors and with Google’s algorithm. But all things being equal eliminating internal 404s seems to be a powerful piece of the puzzle.

Taken a step further, could Google determine the odds of a user encountering a 404 on a site and then use that to demote the site in search results? I think it’s plausible. Google doesn’t want their users having a poor experience, so they might steer folks away from a site they know has a high probability of ending in a dead end.

That leads me to think about the user experience when encountering one of these internal 404s. When a user hits one of these they blame the site and are far more likely to leave the site and return to the search results to find a better result for their query. This type of pogosticking is clearly a negative signal.

Internal 404s piss off users.

The psychology is different with an outgoing 404. I believe most users don’t blame the site for these but the target of the link instead. There’s likely some shared blame, but the rate of pogosticking shouldn’t be as high.

In my experience internal 404s are generally caused by bugs and absolutely degrade the user experience.
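The three buckets this post names — external, outgoing and internal — can be sketched as a simple classifier over broken links found in a crawl. The crawl data below is made up; in practice the source/target pairs would come from something like a Screaming Frog export or server logs.

```python
from urllib.parse import urlparse

def classify_404(source_url, target_url, my_domain):
    """Bucket a broken link by where it originates and where it points."""
    source_host = urlparse(source_url).netloc
    target_host = urlparse(target_url).netloc
    if source_host == my_domain and target_host == my_domain:
        return "internal"  # my site links to my own missing page: fix ASAP
    if source_host == my_domain:
        return "outgoing"  # my site links to someone else's dead page
    return "external"      # someone else links to my missing page

# Fabricated crawl results: (page containing the link, broken target).
broken = [
    ("http://example.com/post-a", "http://example.com/ghost-page"),
    ("http://example.com/post-b", "http://other.com/gone"),
    ("http://referrer.net/page",  "http://example.com/typo-url"),
]

for source, target in broken:
    print(classify_404(source, target, "example.com"), target)
```

Sorting a 404 report this way puts the dangerous internal variety at the top of the fix list instead of buried among harmless external breakage.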

Finding Internal 404s

You can find 404s using Screaming Frog or Google Search Console. I’ll focus on Google Search Console here because I often wind up finding patterns of internal 404s this way.

In Search Console you’ll navigate to Crawl and select Crawl Errors.

At that point you’ll select the ‘Not found’ tab to find the list of 404s Google has identified. Click on one of these URLs and you get a pop-up where you can select the ‘Linked from’ tab.

I was actually trying to get Google to recognize another internal 404 but they haven’t found it yet. Thankfully I muffed a link in one of my posts and the result looks like an internal 404.

What you’re looking for are instances where your own site appears in the ‘Linked from’ section. On larger sites it can be easy to spot a bug that produces these types of errors by just checking a handful of these URLs.

In this case I’ll just edit the malformed link and everything will work again. It’s usually not that easy. Most often I’m filing tickets in a client’s project tracking system and making engineers groan.

Correlation vs Causation

Some of you are probably shrieking that internal 404s aren’t the problem and that Google has been clear on this issue and that it’s something else that’s making the difference. #somebodyiswrongontheinternet

You’re right and … I don’t care.

You know why I don’t care? Every time I clean up internal 404s, it produces results. I’m not particularly concerned about exactly why it works. Mind you, from an academic perspective I’m intrigued but from a consulting perspective I’m not.

In addition, if you’re in the new ‘user experience optimization’ camp, then eliminating internal 404s fits very nicely, doesn’t it? So is it the actual internal 404s that matter or the behavior of users once they are eliminated that matters or something else entirely? I don’t know.

This is particularly true since 404 maintenance is entirely in our control. That doesn’t happen much in this industry. It’s shocking how many people ignore 404s that are staring them right in the face, whether by not looking at Google Search Console or by not tracking down the 404s that crop up in weblog reports or deep crawls.

Make it a habit to check and resolve your Not found errors via Search Console or Screaming Frog.

TL;DR

404 errors themselves may not directly hurt SEO, but they can indirectly. In particular, internal 404s can quietly tank your efforts, creating a poor user experience that leads to a low-quality perception and pogosticking behavior.

Acquisition SEO and Business Crowding (Wed, 20 Jan 2016)

There’s an old saying that if you can’t beat ’em, join ’em. But in search, that saying is often turning into something different.

If you can’t beat ’em, buy ’em.

Acquisition search engine optimization is happening more often as companies acquire or merge, effectively taking over shelf space on search results. Why settle for having the top result on an important term when I can have the first and second result?

Should this trend continue you could find search results where only a handful of companies are represented on the first page. That undermines search diversity, one of the fundamentals of Google’s algorithm.

This type of ‘business crowding’ creates false choice and is vastly more dangerous than the purported dread brought on by a filter bubble.

Acquisition SEO

SEO stands for search engine optimization. Generally, that’s meant to convey the idea that you’re working on getting a site to be visible and rank well in search engines.

However, you might see diminishing returns when you’re near the top of the results in important query classes. Maybe the battle with a competitor for those top slots is so close that the effort to move the needle is essentially ROI negative.

In these instances, more and more often, the way to increase search engine traffic, and continue on a growth trajectory, is through an acquisition.

That’s not to say that Zillow or Trulia is doing anything wrong. But it brings up a lot of thorny questions.

Search Shelf Space

About seven years ago I had an opportunity to see acquisition SEO up close and personal. Caring.com acquired Gilbert Guide and suddenly we had two results on the first page for an important query class in the senior housing space.

It’s hard not to get Montgomery Burns at that point and look at how you can dominate search results by having two sites. All roads lead to Rome as they say.

I could even rationalize that the inventory provided on each platform was different. A Venn diagram would show a substantial overlap but there were plenty of non-shaded areas.

But who wants to maintain two sets of inventory? That’s a lot of operational and technical overhead. Soon you figure out that it’s probably better to have one set of inventory and syndicate it across both sites. Cost reduction and efficiency are powerful business tenets.

At that point the sites are, essentially, the same. They offer the same content (the inventory of senior housing options) but with different wrappers. The idea was awesome but also made my stomach hurt.

(Please note that this is not how these two sites are configured today.)

Host Crowding

The funny thing is that if I’d tried to do this with a subdomain on Caring.com I’d have run afoul of something Google calls host crowding.

For several years Google has used something called “host crowding,” which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen.

In essence, you shouldn’t be able to crowd out competitors on a search result through the use of multiple subdomains. Now, host crowding, or clustering as it’s sometimes called, has seen an ebb and flow over time.

I know I was complaining. My test Yelp query is [haircut concord ca], which currently returns 6 results from Yelp. (It’s 8 if you add the ‘&filter=0’ parameter on the end of the URL.)

I still maintain that this is not useful and that it would be far better to show fewer results from Yelp and/or place many of those Yelp results as sitelinks under one canonical Yelp result.

But I digress.

Business Crowding

The problem here is that acquisition SEO doesn’t violate host crowding in the strict sense. The sites are on completely different domains. So a traditional host crowding algorithm wouldn’t group or cluster those sites together.

But make no mistake, the result is essentially the same. Except this time it’s not the same site. It’s the same business.

Business crowding is the advanced form of host crowding.

It can actually be worse since you could be getting the same content delivered from the same company under different domains.

The diversity of that result goes down and users probably don’t realize it.

Doorway Pages

When you think about it, business crowding essentially meets the definition of a doorway page.

Doorways are sites or pages created to rank highly for specific search queries. They are bad for users because they can lead to multiple similar pages in user search results, where each result ends up taking the user to essentially the same destination.

When participating in business crowding you do have similar pages in search results where the user is taken to the same content. It’s not the same destination but the net result is essentially the same. One of the examples cited lends more credence to this idea.

Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page

In business crowding you certainly have multiple domain names but there’s no funnel necessary. The content is effectively the same on those multiple domains.

Business crowding doesn’t meet the letter of the doorway page guidelines but it seems to meet the spirit of them.

Where To Draw The Line?

This isn’t a cut-and-dried issue. There’s quite a bit of nuance involved if you were to address business crowding. Let’s take my example above from Caring.

If the inventory of Caring and Gilbert Guide were never syndicated, would that exempt them from business crowding? If the inventories became very similar over time, would it still be okay?

In essence, if the other company is run independently, then perhaps you can continue to take up search shelf space.

But what prevents a company from doing this multiple times and owning 3, 4 or even 5 sites ranking on the first page for a search result? Even if they’re independently run, over time it will make it more difficult for others to disrupt that space since the incumbents have no real motivation to improve.

With so many properties they’re very happy with the status quo and are likely not too concerned with any one site’s position in search as long as the group of sites continues to squeeze out the competition.

Perhaps you could determine if the functionality and features of the sites was materially different. But that would be pretty darn difficult to do algorithmically.

Or is it simply time based? You get to have multiple domains and participate in business crowding for up to, say, one year after the acquisition. That would be relatively straightforward but would have a tremendous impact on the mergers and acquisitions space.

If Zillow knew that they could only count on the traffic from Trulia for one year after the acquisition they probably wouldn’t have paid $3.5 billion (yes that’s a ‘b’) for Trulia. In fact, the deal might not have gotten done at all.

So when we start talking about addressing this problem it spills out of search and into finance pretty quickly.

What’s Good For The User?

At the end of the day Google wants to do what is best for the user. Some of this is altruistic. Trust me, if you talk to some of the folks on Google’s search quality team, they’re serious about this. But obviously if the user is happy then they return to Google and perform more searches that wind up padding Google’s profits.

Doing good by the user is doing good for the business.

My guess is that most users don’t realize that business crowding is taking place. They may pogostick from one site to the other and wind up satisfied, even if those sites are owned by the same company. In other words, search results with business crowding may wind up producing good long click and time to long click metrics.

It sounds like an environment ripe for a local maximum.

If business crowding were eliminated then users would see more options. While some of the metrics might deteriorate in the short-term, would they improve long-term as new entrants in those verticals provided value and innovation?

There’s only one way to find out.

Vacation Rentals

One area where this is currently happening is within the vacation rentals space.

In this instance two companies (TripAdvisor and HomeAway) own the first six results across five domains. This happens relatively consistently in this vertical. (Please note that I do have a dog in this fight. Airbnb is a client.)

A Venn diagram of inventory between TripAdvisor properties would likely show a material overlap but with good portions unshaded. They seem to have a core set of inventory that is on all properties but aren’t as aggressive with full-on syndication.

Let me be clear here. I don’t blame these companies for doing what they’re doing. It’s smart SEO and it’s winning within the confines of Google’s current webmaster guidelines.

My question is whether business crowding is something that should be addressed. What happens if this practice flourishes?

Is the false choice being offered to users ultimately detrimental to users and, by proxy, to Google?

The Mid-Life Crisis of Search Results

Search hasn’t been around for that long in the scheme of things. As the Internet evolved we saw empires rise and fall as new sites, companies and business models found success.

Maybe you remember Geocities or Gator or Lycos or AltaVista or Friendster. Now, none of these fall into the inventory-based sites I’ve referenced above but I use them as proxies. When it comes to social, whether you’re on Facebook or Instagram or WhatsApp, one company is still in control there.

The days in which successful sites could rise and fall – and I mean truly fall – seem to be behind us.

The question is whether search results should reflect and reinforce this fact or if it should instead continue to reflect diversity. It seems like search is at a crossroads of sorts as the businesses that populate results have matured.

Can It Be Addressed Algorithmically?

The next question that comes to mind is whether Google could actually do anything about business crowding. We know Google isn’t going to do anything manual in nature. They’d want to implement something that dealt with this from an algorithmic perspective.

I think there’s a fairly straightforward way Google could do this via the Knowledge Graph. Each business is an entity and it would be relatively easy to map the relationship between each site as a parent-child relationship.

Some of this can be seen in the remnants of Freebase and their scrape of CrocTail, though the data probably needs more massaging. But it’s certainly possible to create and maintain these relationships within the Knowledge Graph.

Once done, you can attach a parent company to each site and apply the same sort of host crowding algorithm to business crowding. This doesn’t seem that farfetched.
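To make the idea concrete, here’s a minimal sketch of what a host-crowding-style algorithm applied at the parent-company level might look like. Everything here is an assumption for illustration: the domain-to-parent mapping, the `max_per_parent` limit and the simple demotion logic are invented, not Google’s actual implementation.

```python
# Hypothetical sketch: apply a host-crowding-style limit at the
# parent-company level. The entity map and limit are assumptions.

# Toy Knowledge Graph lookup: domain -> parent entity.
PARENT_COMPANY = {
    "trulia.com": "Zillow Group",
    "zillow.com": "Zillow Group",
    "homeaway.com": "HomeAway",
    "vrbo.com": "HomeAway",
    "airbnb.com": "Airbnb",
}

def crowd_by_business(ranked_domains, max_per_parent=2):
    """Demote results past the first `max_per_parent` per parent company."""
    seen, kept, demoted = {}, [], []
    for domain in ranked_domains:
        parent = PARENT_COMPANY.get(domain, domain)  # unknown -> own parent
        seen[parent] = seen.get(parent, 0) + 1
        (kept if seen[parent] <= max_per_parent else demoted).append(domain)
    return kept + demoted  # demoted results drop below the rest

results = ["zillow.com", "trulia.com", "homeaway.com", "vrbo.com", "airbnb.com"]
print(crowd_by_business(results, max_per_parent=1))
# trulia.com and vrbo.com fall below airbnb.com
```

With a limit of one result per parent, the second Zillow Group and HomeAway properties drop below the independent sites — which is exactly the kind of reshuffling that would make these acquisitions less valuable.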

But the reality of implementing this could have serious implications and draw the ire of a number of major corporations. And if users really don’t know that it’s all essentially the same content I’m not sure Google has the impetus to do anything about it.

Too Big To Fail (at Search)

Having made these acquisitions under the current guidelines, could Google effectively reduce business crowding without creating a financial meltdown for large corporate players?

SimilarWeb shows that Trulia gets a little over half of its traffic from organic search. Any drastic change to that channel would be a material event for the parent company.

Others I’ve mentioned in this post are less dependent on organic search to certain degrees but a business crowding algorithm would certainly be a bitter pill to swallow for most.

Selfishly, I’d like to see business crowding addressed because it would help one of my clients, Airbnb, to some degree. They’d move up a spot or two and gain additional exposure and traffic.

But there’s a bigger picture here. False diversity is creeping into search. If you extrapolate this trend search results become little more than a corporate shell game.

On the other hand, addressing business crowding could dramatically change the way sites deal with competitors and how they approach mergers and acquisitions. I can’t predict how that would play out in the short or long-term.

What do you think? I’m genuinely interested in hearing your thoughts on this topic so please jump in with your comments.

That Time I Had Cancer

(This is a highly personal post so if that isn’t your thing then you should move on.)

On Friday, October 23 I breathed a sigh of relief as my oncologist told me that my six month PET/CT scan was clear. I am cancer free!

High Noon

It’s an odd thing to sit on that thin crinkly sheet of sanitary paper in a small bland room staring at your oncologist’s framed diplomas, trying to keep yourself distracted from the news you’re about to receive. You get to thinking about how the vowels and consonants that make up that crucial sentence can change the course of your life.

On October 3rd my family celebrated my 44th birthday by eating at Fleming’s Steakhouse. My birthday is an important date but not exactly for the traditional reason. It was a year ago on that date that I was diagnosed with Follicular Lymphoma after landing in the emergency room after dinner.

Part of my decision to eat at Fleming’s was to thumb my nose at cancer and what it had done to me. In the year leading up to my diagnosis I’d had ever more frequent bouts of stomach pain. Over time I figured out that it was often linked to eating steak.

Since the end of my treatment I’d been feeling great. I could eat and drink anything again. So I was going to go all Genghis Khan on things and get a truly epic steak for my birthday.

But later that night I didn’t feel well. I had pain and other symptoms that felt all too familiar. Over the course of the next few days I was in various levels of discomfort. I was waking up in the middle of the night. I even had to dip back into my stash of anti-nausea medicine so I could drive my daughter to school.

I was freaking out. I was certain my Lymphoma was back.

I took walks with my wife around the neighborhood and talked about how we might handle things and what it might mean. It wasn’t so much having to go through the chemotherapy again that scared me. I could handle that. And I knew that the treatment was effective. But the question was for how long? If it came back so quickly, how long would I be able to use that treatment? And would I be consigned to doing three rounds of chemotherapy a year just to … stay alive?

I was psyching myself up to tackle whatever it was that was put in front of me. I refused to lose and knew I had to be in the right frame of mind. I was really more concerned about how it would impact my wife and daughter. Being a spectator to a loved one going through something like this is no picnic. I didn’t want my daughter to grow up with me constantly going through chemotherapy. Don’t get me wrong, it’s better that than me not being there but it made me sad and very angry.

I finally sucked it up and bullied my way into getting a blood test at my oncologist’s office and moved my PET/CT scan up by two weeks. I got a copy of the initial blood test results and my white blood cell count was elevated. I feared the worst.

A few days later I was able to go back into the oncologist’s office to review my blood test.

Science

The nurse practitioner I saw regularly during my treatments (who is awesome) gave me the news. While my white blood cell count was slightly elevated the liver enzyme that was the best indicator of cancer was … normal. It looked like I had some sort of infection but, from where she stood, it wasn’t cancer. She theorized that it might be my gall bladder since it had looked a bit inflamed on one of my early PET/CT scans.

So I wasn’t nearly as terrified as I might have been sitting in that room waiting for the news. Because I’d handicapped things since getting this additional information. There was an alternate theory for my symptoms. And it was based in science and interpreted by experts. Who should I believe? My own passing analysis or hard chemistry and decades of expertise?

I was still crazy worried but my (dominant) logical side was able to talk down the emotional side from going completely apeshit. Sure enough it turned out I had nothing to worry about. I’d kicked cancer’s ass and it had decided not to come back for another beating.

This is a good segue to talking about what it’s like having cancer.

That’s Not Helping

Almost all of the messages I received were positive and helpful. But just like when you’re expecting a child and suddenly see pregnant women everywhere, I noticed a lot more posts about cancer.

One of the things that irked me the most were posts that claimed traditional treatment (chemotherapy) is just a big pharmaceutical profit center. The idea being that they don’t want to cure cancer, they just want to treat it.

Screw you.

I’m not joking. If you’ve shared something like that you’ve hurt people. Full stop. No wiggle room. Because what you’re saying is that I’m stupid for trusting my oncologist. You’re also throwing shade on a group of doctors who truly do care about the people who are unlucky enough to get cancer. And yes, it’s just luck.

I don’t want to hear about anti-cancer diets. Again, when you post about that you’re basically telling people they ate their way to their cancer. Think about that. That’s a pretty shitty thing to do to someone. “Hey, you probably got cancer because you ate wrong.”

If you smoke two packs of cigarettes a day or eat five pounds of bacon every week you’re certainly upping your odds. But most people don’t fall into these categories. I certainly don’t. I never smoked. I haven’t had fast food in twelve years and gave up soda five years ago. I got cancer. It wasn’t my fault.

I researched my ass off when I got my diagnosis. See, I’m pretty good at digging things up on Google. Hell, I knew so much that I had a decent conversation with my oncologist about the potential treatment regimens I had available. Yeah, she was impressed I referenced the BRIGHT study comparing the two treatment protocols.

Yet, look what happened? I had myself tied up in knots thinking my cancer was back. But all it took was looking at one liver enzyme level and knowing that my other readings were “always all over the place” (i.e. – noise) to know that it wasn’t.

You don’t know better.

I’m not saying you should blindly accept everything as fact. But the tin foil hat conspiracy theory stuff is not helping anyone. You should not be messing around with the mental state of someone with cancer.

Positive Work

Staying positive while you’re going through cancer is … work. I know I lectured you about how science is what truly matters but it’s not always that black and white. I believe that staying positive and believing that you’re going to beat cancer helps. When sick I often visualize my white blood cells attacking and destroying whatever is trying to take me down.

I would often chant ‘I’m going to be okay’ over and over again for long periods of time. It was just something I would do reflexively to convince myself. To give myself strength. To give whatever my body was doing extra momentum to kill what was trying to kill me.

But you’d have to be certifiably insane (or on some seriously good meds) to not think about the alternative from time to time. I’m an introspective guy too so I could go deep down that rabbit hole if I let myself. So it was an effort to stay positive. It was … a persona I had to create to ensure I survived.

Silence

After my sixth and final round of chemotherapy and the resulting clean PET/CT scan I stopped updating my CaringBridge page and essentially stopped talking about cancer. I didn’t even return emails from a few friends and family congratulating me on the great news. #sorry

The thing is, I was tired. I’d been thinking about cancer every day for the last seven months. Sure, my day to day life hadn’t changed that much but it really had … consumed me. No matter what you were doing it was always lurking there in the back of my mind. I didn’t want to think about it anymore. Even though it was a good result I just wanted to move forward and have things go back to normal.

It’s also strange for me to process. I’m proud to be a cancer survivor but I also don’t want to wave that around like some sort of ‘card’ I can play. Many of you also heaped praise on me for how I wrote about and approached my cancer. While I sincerely appreciate those kind words it sometimes makes me uncomfortable. What I did and my writing about it helped me. So in many ways I feel like I’m being complimented on being selfish.

But I’m glad that others have taken something positive from my journey. I sincerely wish all of those going through cancer (or any hardship) the very best.

Friends

I was also overwhelmed with the outpouring of support from friends and colleagues. That was … very special. Of course I expected some responses. I’m not a complete social pariah. But what I got was so much more than what I expected. I hope I can give some of that back to you (in a less dire way) in the future.

I can’t thank all of you enough for the kind words, unexpected Tweets, random IMs and emails. I hope you know just how much that out-of-the-blue message can mean to someone. It certainly made me think about reaching out to folks, even if I’d lost touch, which is something I’m apt to do.

One special thank you to Leeza Rodriguez who provided some incredible insight and recommendations, particularly on dealing with nausea. Because of her help I was able to find the right mix of drugs in a shorter amount of time. It made the last three cycles of chemotherapy far more manageable.

Overall I’m just humbled by your collective kindness.

Winning

I will still periodically hoot or shout or grin like a maniac thinking about how I did it. I beat cancer! Doing so was very difficult but also not so bad either. I try to downplay it sometimes but why should I really?

I had the idea for the title of this post for a few months. I was sort of scared of it. Because it treats cancer in an almost flippant way. Then I had my little rollercoaster ride and I thought my fear was warranted. I’d taken things too lightly. Karma.

But as you can see I wound up using the title. I won and cancer doesn’t deserve my respect. It may come back at some point but I’m not going to let myself think that’s going to happen anytime soon. And if it does come back I’ll kick its ass again.

Future

You always hear how having cancer or having a brush with death changes you. Suddenly every day is supposed to be more precious. Priorities are supposed to change and you’ll do those things that you were putting off for some future date.

Maybe that’s how it is for some people. But not me. Part of this is because I’m already living like that. I’m my own boss and make a very good living. I work from a great home and get to spend my days with my gorgeous wife. I am really there as my daughter grows up. My parents live 45 minutes away and I see them at least a couple times a month.

I’ve lived in places ranging from Washington, DC to San Diego. I’ve traveled abroad and can afford vacations to Hawaii or anywhere else I want to really. I get to sit and binge watch Dark Matter. My daughter and I rush out to the back yard to stare up together and watch the International Space Station pass overhead.

Are click-through rates on search results a ranking signal? The idea is that if the third result on a page is clicked more often than the first, it will, over time, rise to the second or first position.

I remember this question being asked numerous times when I was just starting out in the industry. Google representatives employed a potent combination of tap dancing and hand waving when asked directly. They were so good at doing this that we stopped hounding them and over the last few years I rarely hear people talking about, let alone asking, this question.

Perhaps it’s because more and more people aren’t focused on the algorithm itself and are instead focused on developing sites, content and experiences that will be rewarded by the algorithm. That’s actually the right strategy. Yet I still believe it’s important to understand the algorithm and how it might impact your search efforts.

Following is an exploration of why I believe click-through rate is a ranking signal.

Occam’s Razor

Though the original principle wasn’t as clear cut, today’s interpretation of Occam’s Razor is that the simplest answer is usually the correct one. So what’s more plausible? That Google uses click-through rate as a signal or that the most data driven company in the world would ignore direct measurement from their own product?

It just seems like common sense, doesn’t it? Of course, we humans are often wired to make poor assumptions. And don’t get me started on jumping to conclusions based on correlations.

The argument against is that even Google would have a devil of a time using click-through rate as a signal across the millions of results for a wide variety of queries. Their resources are finite and perhaps it’s just too hard to harness this valuable but noisy data.

The Horse’s Mouth

It gets more difficult to make the case against Google using click-through rate as a signal when you get confirmation right from the horse’s mouth.

Now, perhaps Google wants to play a game of semantics. Click-through rate isn’t a ranking signal. It’s a feedback signal. It just happens to be a feedback signal that influences rank!

Call it what you want, at the end of the day it sure sounds like click-through rate can impact rank.

[Updated 7/22/15]

Want more? I couldn’t find this quote the first time around but here’s Marissa Mayer in the FTC staff report (pdf) on antitrust allegations.

According to Marissa Mayer, Google did not use click-through rates to determine the position of the Universal Search properties because it would take too long to move up on the SERP on the basis of user click-through rate.

In other words, they ignored click data to ensure Google properties were slotted in the first position.

It’s pretty clear that any reasonable search engine would use click data on their own results to feed back into ranking to improve the quality of search results. Infrequently clicked results should drop toward the bottom because they’re less relevant, and frequently clicked results bubble toward the top. Building a feedback loop is a fairly obvious step forward in quality for both search and recommendations systems, and a smart search engine would incorporate the data.

So is Google a reasonable and smart search engine?

The Old Days

There are other indications that Google has the ability to monitor click activity on a query by query basis, and that they’ve had that capability for dog years.

We hold them to a very high click through rate expectation and if they don’t meet that click through rate, the OneBox gets turned off on that particular query. We have an automated system that looks at click through rates per OneBox presentation per query. So it might be that news is performing really well on Bush today but it’s not performing very well on another term, it ultimately gets turned off due to lack of click through rates. We are authorizing it in a way that’s scalable and does a pretty good job enforcing relevance.

So way back in 2007 (eight years ago folks!) Google was able to create a scalable solution to using click-through rate per query to determine the display of a OneBox.

That seems to poke holes in the idea that Google doesn’t have the horsepower to use click-through rate as a signal.
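The OneBox system described in that quote amounts to a simple per-query feedback loop, which can be sketched in a few lines. The threshold, minimum-impressions guard and all numbers below are invented for illustration; the quote only confirms the mechanism, not the values.

```python
# Illustrative sketch of the per-query OneBox logic described above:
# show the OneBox only while its observed click-through rate for that
# query stays above an expected threshold. All numbers are invented.

def show_onebox(impressions, clicks, expected_ctr=0.10, min_impressions=100):
    """Keep the OneBox on for a query unless its CTR falls below expectation."""
    if impressions < min_impressions:
        return True  # not enough data yet; keep showing it
    return clicks / impressions >= expected_ctr

print(show_onebox(impressions=5000, clicks=650))  # True: 13% CTR clears 10%
print(show_onebox(impressions=5000, clicks=200))  # False: 4% CTR, turn it off
```

Per the quote, news might clear the bar on ‘Bush’ today but get turned off on another term — the same logic, evaluated query by query.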

We are looking to see if we show your result in a #1, does it get a click and does the user come back to us within a reasonable timeframe or do they come back almost instantly?

Do they come back and click on #2, and what’s their action with #2? Did they seem to be more pleased with #2 based on a number of factors or was it the same scenario as #1? Then, did they click on anything else?

We are watching the user’s behavior to understand which result we showed them seemed to be the most relevant in their opinion, and their opinion is voiced by their actions.

This and other conversations I’ve had make me confident that click-through rate is used as a ranking signal by Bing. The argument against is that Google is so far ahead of Bing that they may have tested and discarded click-through rate as a signal.

Duane’s remarks also tease out a little bit more about how click-through rate would be used and applied. It’s not a metric used in isolation but measured in terms of time spent on that clicked result, whether they returned to the SERP and if they then refined their search or clicked on another result.

When you really think about it, if pogosticking and long clicks are real measures then click-through rate must be part of the equation. You can’t calculate the former metrics without having the click-through rate data.

However, critics will point out that the result in question is once again at #4, indicating that click-through rate isn’t a ranking signal.

But clearly the burst of searches and clicks had some sort of effect, even if it was temporary, right? So might Google have developed mechanisms to combat this type of ‘bombing’ of click-through rate? Or perhaps the system identifies bursts in query and clicks and reacts to meet a real time or ‘fresh’ need?

Either way it shows that the click-through behavior is monitored. Combined with the admission from Udi Manber it seems like the click-through rate distribution has to be consistently off of the baseline for a material amount of time to impact rank.

In other words, all the testing in the world by a small band of SEOs is a drop in the ocean of the total click stream. So even if we can move the needle for a small time, the data self-corrects.

But Rand isn’t the only one testing this stuff. Darren Shaw has also experimented with this within the local SEO landscape.

Darren’s results aren’t foolproof either. You could argue that Google representatives within local might not be the most knowledgeable about these things. But it certainly adds to a drumbeat of evidence that clicks matter.

But wait, there’s more. Much more.

Show Me The Patents

For quite a while I was conflicted about this topic because of one major stumbling block. You wouldn’t be able to develop a click-through rate model based on all the various types of displays on a result.

A result with a review rich snippet gets a higher click-through rate because the eye gravitates to it. Google wouldn’t want to reward that result from a click-through rate perspective just because of the display.

Or what happens when the result has an image result or an answer box or video result or any number of different elements? There seemed to be too many variations to create a workable model.

The second patent seems to build from the first with the inventor in common being Hyung-Jin Kim.

Both of these are rather dense patents and it reminds me that we should all thank Bill Slawski for his tireless work in reading and rendering patents more accessible to the community.

I’ll be quoting from both patents (there’s a tremendous amount of overlap) but here’s the initial bit that encouraged me to put the headphones on and focus on decoding the patent syntax.

The basic rationale embodied by this approach is that, if a result is expected to have a higher click rate due to presentation bias, this result’s click evidence should be discounted; and if the result is expected to have a lower click rate due to presentation bias, this result’s click evidence should be over-counted.

Very soon after this the patent goes on to detail the number of different types of presentation bias. So this essentially means that Google saw the same problem but figured out how to deal with presentation bias so that it could rely on ‘click evidence’.
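The correction the patent describes — discount click evidence when presentation inflates clicks, over-count it when presentation suppresses them — can be sketched as a simple normalization. The expected-CTR figures and display categories below are invented for illustration; only the discount/over-count principle comes from the patent.

```python
# Sketch of the bias correction the patent describes: normalize observed
# clicks by the click rate expected from the result's display alone.
# The expected-CTR figures are invented for illustration.

# Expected CTR by presentation feature (assumed values).
EXPECTED_CTR_BY_DISPLAY = {
    "plain": 0.10,
    "review_stars": 0.16,        # eye-catching rich snippet
    "buried_below_image": 0.06,  # display disadvantage
}

def adjusted_click_evidence(clicks, impressions, display="plain"):
    """Discount clicks a showy display earned; over-count clicks a
    disadvantaged display earned, relative to a plain result."""
    observed_ctr = clicks / impressions
    bias = EXPECTED_CTR_BY_DISPLAY[display] / EXPECTED_CTR_BY_DISPLAY["plain"]
    return observed_ctr / bias

# The same observed 16% CTR reads very differently by presentation:
print(adjusted_click_evidence(160, 1000, "review_stars"))        # no real edge
print(adjusted_click_evidence(160, 1000, "buried_below_image"))  # strong signal
```

A 16% CTR on a result with review stars is merely expected, while 16% on a result buried below an image block is evidence users genuinely prefer it.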

Then there’s this rather nicely summarized 10,000 foot view of the issue.

In general, a wide range of information can be collected and used to modify or tune the click signal from the user to make the signal, and the future search results provided, a better fit for the user’s needs. Thus, user interactions with the rankings presented to the users of the information retrieval system can be used to improve future rankings.

Again, no one is saying that click-through rate can be used in isolation. But it clearly seems to be one way that Google has thought about re-ranking results.

But it gets better as you go further into these patents.

The information gathered for each click can include: (1) the query (Q) the user entered, (2) the document result (D) the user clicked on, (3) the time (T) on the document, (4) the interface language (L) (which can be given by the user), (5) the country (C) of the user (which can be identified by the host that they use, such as www-google-co-uk to indicate the United Kingdom), and (6) additional aspects of the user and session. The time (T) can be measured as the time between the initial click through to the document result until the time the user comes back to the main page and clicks on another document result. Moreover, an assessment can be made about the time (T) regarding whether this time indicates a longer view of the document result or a shorter view of the document result, since longer views are generally indicative of quality for the clicked through result. This assessment about the time (T) can further be made in conjunction with various weighting techniques.

Here we see clear references to how to measure long clicks and later on they even begin to use the ‘long clicks’ terminology. (In fact, there’s mention of long, medium and short clicks.)
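The per-click record the patent enumerates, and the time-based long/medium/short assessment, map cleanly onto a small data model. The field names follow the patent’s (Q), (D), (T), (L), (C) labels; the second thresholds are my own assumptions, since the patent doesn’t publish cutoffs.

```python
# A minimal model of the per-click record the patent enumerates and a
# time-based long/medium/short classification. Thresholds are assumed.

from dataclasses import dataclass

@dataclass
class Click:
    query: str       # (Q) the query the user entered
    document: str    # (D) the result clicked
    seconds: float   # (T) time until the user returned to the SERP
    language: str    # (L) interface language
    country: str     # (C) user country

def classify(click, short=10, long=60):
    """Longer views are 'generally indicative of quality'."""
    if click.seconds < short:
        return "short"   # likely a pogostick back to the results
    if click.seconds < long:
        return "medium"
    return "long"        # a satisfied, 'good' click

c = Click("follicular lymphoma treatment", "example.com/treatment", 95.0, "en", "US")
print(classify(c))  # long
```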

But does it take into account different classes of queries? Sure does.

Traditional clustering techniques can also be used to identify the query categories. This can involve using generalized clustering algorithms to analyze historic queries based on features such as the broad nature of the query (e.g., informational or navigational), length of the query, and mean document staytime for the query. These types of features can be measured for historical queries, and the threshold(s) can be adjusted accordingly. For example, K means clustering can be performed on the average duration times for the observed queries, and the threshold(s) can be adjusted based on the resulting clusters.

This shows that Google may adjust what they view as a good click based on the type of query.
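Since the quote specifically mentions K-means clustering on average duration times, here’s a rough stdlib-only sketch of that step: cluster mean staytimes per query, then set a different ‘good click’ bar per cluster. The data and cluster count are invented; the patent only names the technique.

```python
# Rough sketch of the clustering idea in the quoted passage: run 1-D
# k-means over mean document staytimes per query, then derive a
# "good click" threshold per cluster. Stdlib only; data is invented.

def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D k-means: returns the sorted cluster centers."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Mean staytime (seconds) for observed queries: navigational queries
# resolve quickly, informational queries hold attention far longer.
staytimes = [4, 5, 6, 7, 70, 80, 90, 110]
print(kmeans_1d(staytimes, k=2))  # [5.5, 87.5]
```

A five-second visit is a perfectly ‘good’ click for a navigational query but a terrible one for an informational query — the clusters let the thresholds adjust accordingly.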

But what about types of users? That’s when it all goes to hell in a handbasket, right? Nope. Google figured that out.

Moreover, the weighting can be adjusted based on the determined type of the user both in terms of how click duration is translated into good clicks versus not-so-good clicks, and in terms of how much weight to give to the good clicks from a particular user group versus another user group. Some user’s implicit feedback may be more valuable than other users due to the details of a user’s review process. For example, a user that almost always clicks on the highest ranked result can have his good clicks assigned lower weights than a user who more often clicks results lower in the ranking first (since the second user is likely more discriminating in his assessment of what constitutes a good result).

Users are not created equal and Google may weight the click data it receives accordingly.
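The weighting the patent describes for discriminating versus always-click-the-top users might look something like this. The specific weight formula and values are my invention; the patent only says less discriminating users’ good clicks get lower weights.

```python
# Sketch of the user-weighting idea quoted above: a user who almost
# always clicks the top result is less discriminating, so their "good
# clicks" count for less. The weight formula is invented.

def user_weight(click_positions):
    """Weight a user's clicks by how often they stray below position 1."""
    if not click_positions:
        return 1.0
    top_rate = click_positions.count(1) / len(click_positions)
    # Always-first clickers -> 0.5; discriminating clickers -> up to 1.5.
    return 1.5 - top_rate

print(user_weight([1, 1, 1, 1]))  # 0.5: almost always clicks #1
print(user_weight([3, 2, 5, 1]))  # 1.25: more discriminating
```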

But they’re missing the boat on topical expertise, right? Not so fast!

In addition, a user can be classified based on his or her query stream. Users that issue many queries on (or related to) a given topic (e.g., queries related to law) can be presumed to have a high degree of expertise with respect to the given topic, and their click data can be weighted accordingly for other queries by them on (or related to) the given topic.

Google may identify topical experts based on queries and weight their click data more heavily.

Frankly, it’s pretty amazing to read this stuff and see just how far Google has teased this out. In fact, they built in safeguards for the type of tests the industry conducts.

Note that safeguards against spammers (users who generate fraudulent clicks in an attempt to boost certain search results) can be taken to help ensure that the user selection data is meaningful, even when very little data is available for a given (rare) query. These safeguards can include employing a user model that describes how a user should behave over time, and if a user doesn’t conform to this model, their click data can be disregarded. The safeguards can be designed to accomplish two main objectives: (1) ensure democracy in the votes (e.g., one single vote per cookie and/or IP for a given query-URL pair), and (2) entirely remove the information coming from cookies or IP addresses that do not look natural in their browsing behavior (e.g., abnormal distribution of click positions, click durations, clicks_per_minute/hour/day, etc.). Suspicious clicks can be removed, and the click signals for queries that appear to be spammed need not be used (e.g., queries for which the clicks feature a distribution of user agents, cookie ages, etc. that do not look normal).

As I mentioned, I’m guessing the short-lived results of our tests are indicative of Google identifying and then ‘disregarding’ that click data. Not only that, they might decide that the cohort of users who engage in this behavior won’t be used (or their impact will be weighted less) in the future.
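The two safeguards the patent spells out — one vote per cookie/IP for a given query-URL pair, and discarding users whose behavior looks unnatural — reduce to a filtering pass like the one below. The clicks-per-hour cutoff and data shapes are assumptions for illustration.

```python
# Sketch of the two safeguards quoted above: (1) one vote per cookie/IP
# for a given query-URL pair, and (2) drop users whose behavior looks
# unnatural. The clicks-per-hour cutoff is invented.

def filter_clicks(clicks, max_clicks_per_hour=30):
    """clicks: list of (cookie_ip, query, url, clicks_per_hour) tuples."""
    suspicious = set()
    for cookie_ip, _query, _url, rate in clicks:
        if rate > max_clicks_per_hour:
            suspicious.add(cookie_ip)  # abnormal browsing; drop entirely
    votes, kept = set(), []
    for cookie_ip, query, url, _rate in clicks:
        if cookie_ip in suspicious:
            continue
        vote = (cookie_ip, query, url)  # democracy: one vote per pair
        if vote not in votes:
            votes.add(vote)
            kept.append(vote)
    return kept

raw = [
    ("u1", "hotels", "a.com", 3),
    ("u1", "hotels", "a.com", 3),    # duplicate vote, dropped
    ("bot", "hotels", "a.com", 500), # abnormal rate, user discarded
]
print(filter_clicks(raw))  # [('u1', 'hotels', 'a.com')]
```

A burst of coordinated test clicks fails both checks: each participant gets one vote at most, and accounts with abnormal click distributions are thrown out wholesale.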

What this all leads up to is a rank modifier engine that uses implicit feedback (click data) to change search results.

Here’s a fairly clear description from the patent.

A ranking sub-system can include a rank modifier engine that uses implicit user feedback to cause re-ranking of search results in order to improve the final ranking presented to a user of an information retrieval system.

It tracks and logs … everything and uses that to build a rank modifier engine that is then fed back into the ranking engine proper.
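In the abstract, a rank modifier engine is just a blend of the base relevance score with an implicit-feedback score. This toy version is purely illustrative: the linear blend and the `alpha` weight are my assumptions, not anything the patent discloses.

```python
# A toy "rank modifier engine": blend the base ranking score with an
# implicit-feedback score derived from click data. The linear blend
# and alpha weight are assumptions for illustration.

def rerank(results, feedback, alpha=0.2):
    """results: {doc: base_score}; feedback: {doc: click score in [0, 1]}.
    Returns docs ordered by the click-modified score."""
    def modified(doc):
        # Docs with no click data get a neutral 0.5 feedback score.
        return (1 - alpha) * results[doc] + alpha * feedback.get(doc, 0.5)
    return sorted(results, key=modified, reverse=True)

base = {"a.com": 0.90, "b.com": 0.85, "c.com": 0.80}
clicks = {"a.com": 0.2, "b.com": 0.9}  # users clearly prefer b.com
print(rerank(base, clicks))  # ['b.com', 'a.com', 'c.com']
```

Even a modest feedback weight flips the top two results once the click evidence consistently disagrees with the base ranking.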

But, But, But

Of course this type of system would get tougher as more of the results were personalized. Yet, the way the data is collected seems to indicate that they could overcome this problem.

Google seems to know the inherent quality and relevance of a document, in fact of all documents returned on a SERP. As such they can apply and mitigate the individual user and presentation bias inherent in personalization.

Perhaps it’s a semantics game and if we asked if some combination of ‘click data’ was used to modify results they’d say yes. Or maybe the patent work never made it into production. That’s a possibility.

But looking at it all together and applying Occam’s Razor I tend to think click-through rate is used as a ranking signal. I don’t think it’s a strong signal but it’s a signal nonetheless.

Why Does It Matter?

You might be asking, so freaking what? Even if you believe click-through rate is a ranking signal, I’ve demonstrated that manipulating it may be a fool’s errand.

The reason click-through rate matters is that you can influence it with changes to your title tag and meta description. Maybe it’s not enough to tip the scales but trying is better than not, isn’t it?

Those ‘old school’ SEO fundamentals are still important.

Or you could go the opposite direction and build your brand equity through other channels to the point where users would seek out your brand in search results irrespective of position.

Over time, that type of behavior could lead to better search rankings.

TL;DR

The evidence suggests that Google does use click-through rate as a ranking signal. Or, more specifically, Google uses click data as an implicit form of feedback to re-rank and improve search results.

Despite their denials, common sense, Google testimony and interviews, industry testing and patents all lend credence to this conclusion.