Posts Tagged ‘search engine optimization’

Well, it’s after Thanksgiving and I finally get back to the blog. Feels good. This is the next installment about site performance analysis and how to deal with a site with worrisomely slow page loading times. It turns out I had a case study right under my nose: this site, the OnlineMatters blog. Recently, I showed a client my site and watched as it loaded, and loaded, and… loaded. I was embarrassed but also frustrated. I had just finished my pieces on site performance and knew that this behavior was going to cause my rankings in the SERPs to drop, even before Google releases Caffeine. While I am not trying to publish this blog to a mass audience – to do that I would need to write every day – I still wanted to rank well on the keywords I care about. Given what I do, it’s an important proof point for customers and prospects.

So I am going to take advantage of this negative and turn it into a positive for you. You will see how to perform various elements of site analysis by watching me debug this blog in near real time. Yesterday, I spent three hours working through the issues, and I am not done yet. So this first piece will take us about halfway there. But even now you can learn a lot from my struggles.

The first step was to find out just how bad the problem was. The way to do this is to use Pingdom’s Full Page Analysis tool. This tool not only tests page loading speeds but also visualizes which parts of the page are causing problems. An explanation of how to use the tool can be found here, and you should read it before trying to interpret the results for your site. Here is what I got back when I ran the test:

A load time of 11.9 seconds? Ouch! Since Pingdom runs this test on its own servers, the speed is not influenced by my sometimes unpredictably slow Comcast connection.

Pingdom showed I had 93 items loading with the home page, of which the vast majority were images (a partial listing is shown below). There were several (lines 1, 36, 39, 40, 41, 54) where a significant part of the load time occurred during rendering (that is, after the element had been downloaded into the browser); this is indicated by the blue part of the bar. But in the majority of cases, the load time was mainly caused either by the time from the first request to the server until the content began downloading (the yellow portion of the bar), or by the time from downloading until rendering began (the green portion). This suggested that:

1. I had too big a page, because the download time for all the content to the browser was very long.

2. I might have a server bandwidth problem.

But rather than worrying about item 2, which would require a more extensive fix – either an upgrade in service or a new host – I decided to see how far I could get with some simple site fixes.
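For readers who want the bar colors as logic: a resource’s bottleneck is simply whichever phase contributes the most time. A minimal sketch (the file names and timings here are invented for illustration, not Pingdom’s actual data format):

```python
# Toy model of Pingdom's per-resource timing bars: "wait" (yellow,
# request sent until content starts downloading), "download" (green),
# and "render" (blue, after the element is in the browser).

def dominant_phase(timing):
    """Return the phase that contributes most to a resource's load time."""
    return max(timing, key=timing.get)

resources = {
    "header.png":   {"wait": 1.2, "download": 0.3, "render": 0.1},
    "style.css":    {"wait": 0.2, "download": 0.1, "render": 0.9},
    "post-img.png": {"wait": 0.8, "download": 1.5, "render": 0.1},
}

# Classify each resource by its dominant phase.
bottlenecks = {name: dominant_phase(t) for name, t in resources.items()}
# header.png is dominated by server wait time, style.css by rendering,
# post-img.png by raw download size.
```

Whether the yellow, green, or blue portion dominates tells you where to spend your effort: server/host, file size, or page rendering.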

The first obvious thing to fix was the size of the home page, which was 485 KB – very heavy. I tend to write long (no kidding?) and add several images to posts, so it seemed only natural to reduce the number of entries on my home page below the 10 I had it set for. I set the allowable number in WordPress to 5 entries, saved the changes, and ran the test again.

Miracle of miracles: My page now weighed 133 KB (respectable), had 72 total objects, and downloaded in six seconds. That was a reduction in load time by almost 50% for one simple switch.

Good, but not great. My page loading time was still 6 seconds – it needed to be below 2. So more work was needed.

If you look at the picture above, you can just make out that some of the slowest-loading files – between 4 and 6 of them – were .css or JavaScript files. Since these files are part of WordPress, I chose to let them go for the moment and move on to the next obvious class of files: images, which usually represent about 80% of page loading time. There were between 6 and 10 files – mainly .png files – that were adding substantially to download times. Most of these were a core portion of the template I was using (e.g. header.png), so they affected the whole site and, more importantly, they had been part of the blog before I ever made one entry. The others were the icons in my Add-to-Any toolbar, which also showed on every post on the site.

Since I developed the template myself using Artisteer when I was relatively new to WordPress, I hypothesized that an image compression tool might make a substantial improvement for little effort.

Fortunately, the YSlow Firefox plugin, a site performance analyzer we will examine in my next entry, includes Smush.it, an image compression tool created by Yahoo! that is easy to use, identifies and shows just how much bandwidth it saves, produces all the compressed files at the push of a single button, and delivers excellent output quality.

So I ran the tool (I sadly did not keep a screenshot of the first run, but a sample output is below). Smush.it reduced image sizes overall by about 8% and significantly compressed the template elements. I then downloaded the smushed images and uploaded them to the site.

As you can see below – my home page was now 89.8 KB, but my load time had increased to 8.8 seconds! – and note on the right of the image that several prior runs confirmed the earlier 6 second load time. So either compression did not help or some other factor was at play.

In fact, rendering times had dropped from measurable amounts (e.g. 0.5 seconds) to milliseconds – so the smaller file sizes had improved rendering performance. Download times, however, had increased – once again pointing to my host. But before going there, I wanted to see if there were any other elements on the site I could manipulate to improve performance.

More in the next post. BTW, as I go to press this a.m., my site speed was 5.1 seconds – a new, positive record. Nothing has changed – so more and more I’m suspecting my ISP and feeling I need a virtual private server.

NOTE: Even more important: as I go to press Google has just announced that it is adding a site performance tool to Google Webmaster Tools in anticipation of site performance becoming a ranking factor.

In my last post, I discussed the underlying issues regarding site loading times and SEO rankings. What I tried to do was help the reader understand why site loading times are important from the perspective of someone designing a search engine that has to crawl billions of pages. The post also outlines a few of the structures that they would have to put in place to accurately and effectively crawl all the pages they need in a limited time with limited processing power. I also tried to show that a search engine like Google has a political and economic agenda in ensuring fast sites, not just a technical agenda. Google wants as many people/eyeballs on the web as possible, so it is to their advantage to ensure that web sites provide a good user experience. As a result, they feel quite justified in penalizing sites that do not have good speed/performance characteristics.

As you would expect, the conclusion is that if your site is hugely slow you will not get indexed and will not rank in the SERPs. What is “hugely slow”? Google has indicated that slow is a relative notion, determined by the loading times typical of sites in your geographical region. Having said that, relative or not, from an SEO perspective I wouldn’t want a site where pages take more than 10 seconds on average to load. We have found from the sites we have tested and built that average load times higher than approximately 10 seconds per page have a significant impact on being indexed. From a UE perspective, there is some interesting data suggesting that the limit on visitors’ patience is about 6-8 seconds. Google has studied this data, so it would probably prefer to set its threshold in that region. But I doubt it can. Many small sites are not that sophisticated, do not know these kinds of rules, and do not know how to check or evaluate their site loading times. Beyond this, there are often problems with hosts that cause servers to run slowly at times; Google has to take that into account as well. So I believe the timeout has to be substantially higher than 6-8 seconds – but 10 seconds as a crawl limit is a guess.

I have yet to see a definitive statement by anyone as to what the absolute limit is for site speed before indexing ceases altogether (if you have a reference, please post it in the comments). I’m sure that if a bot comes to a first page and it exceeds the bot’s timeout threshold in the algorithm, your site won’t get spidered at all. But once the bot gets past the first page, it has to do an ongoing computation of average page loading times for the site to determine if the average exceeds the built-in threshold, so at least a few pages would have to be crawled in that case.
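To make the guesswork concrete, here is a toy model of those two cutoffs – a hard timeout on the first page, then a running average across the site. Both thresholds use the 10-second guess from above; nothing here is a published Google value.

```python
# Toy crawler model: abandon the site if the first page exceeds a hard
# timeout, otherwise keep a running average of page load times and stop
# crawling once the average crosses a threshold. Both limits are the
# 10-second guesses from the text, not anything the engines publish.

FIRST_PAGE_TIMEOUT = 10.0   # seconds; assumed hard limit
AVERAGE_THRESHOLD = 10.0    # seconds; assumed site-wide limit

def pages_crawled(load_times):
    """Return how many pages get crawled before the bot gives up."""
    if not load_times or load_times[0] > FIRST_PAGE_TIMEOUT:
        return 0          # first page too slow: site not spidered at all
    crawled, total = 0, 0.0
    for t in load_times:
        total += t
        crawled += 1
        if total / crawled > AVERAGE_THRESHOLD:
            break         # running average exceeded: bot moves on
    return crawled

pages_crawled([12.0, 1.0, 1.0])   # first page too slow: nothing crawled
pages_crawled([2.0, 3.0, 2.5])    # fast site: everything crawled
```

The point of the model: a slow first page costs you everything, while a slow average costs you your deeper pages.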

Now here’s where it gets interesting. What happens between fast (let’s say loading times under 1-2 seconds – actually still fairly slow, but a number Matt Cutts indicates is OK in the video below) and the timeout limit? And how important is site speed as a ranking signal? Let’s take one question at a time.

When a site is slow but not slow enough to hit any built-in timeout limits (which are not tied to the number of pages), a couple of things can happen. We do know that Google allocates bot time by the number of pages on the site and the number of pages it has to index or re-index. So for a small site that performs poorly, it is likely that most of the pages will get indexed. Likely, but not guaranteed: it all depends on the cumulative time lag the site creates versus the average. If a site is large, then you can almost guarantee that some pages will not be indexed, as the cumulative time lag will ultimately hit the threshold the bots set for a site with that number of pages. By definition, some of your content will not get ranked, and you will not get the benefit of that content in your rankings.

As an aside, there has been a lot of confusion around the revisit-after meta tag, which takes this form: <meta name="revisit-after" content="5 days">. The tag supposedly tells the bots how often to come back and reindex the specific page (in this case, every 5 days). The idea is that you can improve the crawlability of your site by telling the bots not to index certain pages all the time, but only some of the time. I became aware of this tag at SMX East, when one of the “authorities” on SEO mentioned it as usable for this purpose. The trouble is that, from everything I have read, the tag is completely unsupported by any of the major engines, and was only ever supported by one tiny search engine (SearchBC) many years ago.

But let’s say you are one of the lucky sites where the site runs slowly but all the pages do get indexed. Does Google, or any of the other major search engines, use the site’s performance as a ranking signal? All your pages are in the index, so you would expect them to be ranked based on the quality of their content and their authority derived from inbound links, site visits, time-on-site, and other typical ranking signals. Performance, you might conclude, is not a likely candidate for a ranking signal and isn’t important.

If you thought that, then you were wrong. Historically, Google has said – and Matt Cutts reiterates this in the video below – that site load times do not influence search rankings. But while that may be true now, it may not be in the near future. And this is where Maile’s comments took me by surprise. In a small group session at SMX East 2009, Maile was asked about site performance and rankings. She indicated that for the “middle ground” sites that are indexing but loading slowly, site performance may already be used to influence rankings. Who is right? I can’t say. These are both highly respected professionals who choose their words carefully.

Whatever is true, Google is sending us signals that this change is coming. Senior experts like Matt and Maile don’t say these things lightly; their statements are well-considered and probably approved positions they are asked to take. This is Google’s way of preventing us from getting mad when the change occurs – Google has the fallback of saying “we warned you this could happen.” Which, from today’s viewpoint, means it will happen.

Conclusion: Start working on your site performance now, as it will be important for SEO rankings later.

Oh and, by the way, your user experience will just happen to be better, which is clearly the real reason to fix site performance.

With so many websites and web pages being available and with varying hardware and software configurations, it may be beneficial to identify which web documents may lead to a desired user experience and which may not lead to a desired user experience. By way of example but not limitation, in certain situations it may be beneficial to determine (e.g., classify, rank, characterize) which web documents may not meet performance or other user experience expectations if selected by the user. Such performance may, for example, be affected by server, network, client, file, and/or like processes and/or the software, firmware, and/or hardware resources associated therewith. Once web documents are identified in this manner the resulting user experience information may, for example, be considered when generating the search results.

It does not appear that Yahoo! has implemented any aspect of this patent yet, and who knows what the Bing agreement will mean for site performance and search. But clearly this is a “problem” the search engine muftis have set their sights on, and I would expect that if Google implements it, others will follow.

Yesterday, a reasonably well-known blogger, Derek Powazek (whose article – against my strongest desire to give it any further validation in the search engine rankings, where it now ranks #10 – gets a link here, because at the end of the day the Web is about transparency and I truly believe that any argument must win out in the realm of ideas), let out a rant against the entire SEO industry. The article, and the responses both on his website and on SearchEngineLand, upset me hugely for a number of reasons:

The tone was so angry and demeaning. As I get older (and I hope wiser), I want to speak in a way that bridges differences and heals breaches, not stokes the fire of discord.

I believe the tone was angry in order to evoke strong responses, which build links, which in turn rank the piece high in the search engines. Link building is a tried-and-true, legitimate SEO practice – which undercuts Derek’s entire argument that understanding and implementing a well-thought-out SEO program is so much flim-flam. Even more important to me: do we need to communicate in angry rants in order to get attention in this information- and message-overwhelmed universe? Is that what we’ve come to? I sure hope not.

The article’s advice about user experience coming first was right (and has my 100% agreement). But its assumptions about SEO, and therefore its conclusions, were incorrect.

The article’s erroneous conclusions will hurt a number of people who could benefit from good SEO advice. THAT is probably the thing that saddens me most – it will send people off in a direction that will hurt them and their businesses substantially. Good SEO is not a game. It has business implications and by giving bad advice, Derek is potentially costing a lot of good people money that they need to feed their families in these tough times.

The number of responses in agreement with his blog was overwhelming relative to the number that did not agree. That also bothered me – that the perception of our industry is such that so many people feel our work does not serve a legitimate purpose.

The comments on Danny Sullivan’s response to Derek were few, but they were also pro-SEO (of course). Which means that the two communities represented in these articles aren’t talking to each other in any meaningful way. You agree with Derek? Comment to him. You agree with Danny? Comment there. Like attracts like, but it doesn’t ultimately lead to the two communities bridging their differences.

I, too, started to make comments on both sites. But my comments rambled (another one of those prerogatives I maintain in this 140-character world), and so it became apparent that I would need to create a blog entry to respond to the article – which I truly did not want to do because, frankly, I don’t want to "raise the volume" of this disagreement between SEO believers and SEO heretics. But I have some things to say that no one else is saying, and they go to the heart of the debate on why SEO IS important and is absolutely not the same thing as good user experience or web development.

So to Danny, to Derek, and to all the folks who have entered this debate, I hope you find my comments below useful and, if not, my humble apologies for wasting your valuable time.

Good site design is about the user experience. I started my career in online and software UE design when that term was an oxymoron. My first consulting company, started in 1992, was inspired by David Kelley, my advisor at Stanford, CEO of IDEO (one of the top design firms in the world), and now founder and head of the Stanford School of Design. I was complaining to David about the horrible state of user interfaces in software and saying that we needed an industry initiative to wake people up. His response was "If it’s that bad, go start a company to fix it." Which I did. That company built several products that won awards for their innovative user experience.

That history, I hope, gives credibility to my next statement: I have always believed, and will always believe, that good site experience trumps anything else you do. Design the site for your customer first. Create a "natural" conversation with them as they flow through the site and you will keep loyal customers.

Having said that, universal search engines do not "think" like human beings. They are neither as fast nor as capable of understanding loosely organized data. They work according to algorithms that attempt to mimic how we think, but they are a long way from actually achieving it. These algorithms, as well as the underlying structures used to make them effective, must also run in an environment of limited processing power (even with all of Google’s server farms) relative to the volume of information, so they have made trade-offs between accuracy and speed. Examples of these structures are biword indices and positional indices. I could go into the whole theory of information architecture, but suffice it to say that a universal search engine needs help in interpreting content in order to determine relevance.
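As an illustration of one of those structures, here is a toy positional index – entirely my own sketch, not how any production engine stores its data. For each term it records which documents contain it and at which positions, which is what makes phrase queries cheap:

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_match(index, first, second):
    """Doc ids where `second` immediately follows `first`."""
    hits = set()
    for doc_id, positions in index[first].items():
        following = index[second].get(doc_id, [])
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits

docs = {
    1: "site performance matters",
    2: "performance of the site",
}
index = build_positional_index(docs)
phrase_match(index, "site", "performance")   # only doc 1 has the phrase
```

Structures like this buy speed at the cost of storage – exactly the kind of accuracy/speed trade-off described above.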

Meta data is one area that has evolved to help the engines do this. So, first and foremost, by expecting this information, the search engines expect and need us to include data especially for them – data that has nothing to do with the end-user experience and everything to do with being found relevant and precise. This is the simplest form of SEO. There are two points here:

Who is going to decide what content goes into these tags? Those responsible for the user experience? I think not. The web developers? Absolutely positively not. It is marketing and those who position the business who make these decisions.

But how does marketing know how a search engine thinks? Most do not. And there are real questions of expertise here – albeit, for this simple example, small ones that marketers can (and are) learning. What words should I use for the search engines to consider a page relevant, and which of them go into the meta data? For each meta data field, what is the best structure for the information? How many marketers, for example, know that a title tag should only be 65 characters long, that a description tag needs to be limited to 150 characters, that the words in anchor text are a critical signaling factor to the search engines, or that alt text on an image can help a search engine understand the relevance of a page to a specific keyword? How many know the data from the SEOmoz Survey of SEO Ranking Factors showing that the best place to put a keyword in a title tag is in first position, and that relevance drops off exponentially the further back in the title the keyword sits? On this last point, there isn’t one client who hasn’t asked me for advice. They don’t and can’t track the industry and changes in the algorithms closely enough to follow this. They need SEO experts to help them – trained and experienced professionals in the SEO industry – and this is just the simplest of SEO issues.
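To make the simplest of these rules concrete, here is a hedged sketch of a title/description audit. The 65- and 150-character cutoffs are the rough figures cited above (they vary by engine and change over time), and the function and message names are my own:

```python
TITLE_LIMIT = 65         # rough title-tag cutoff cited in the text
DESCRIPTION_LIMIT = 150  # rough meta-description cutoff

def audit_meta(title, description, keyword):
    """Return a list of warnings for a page's title and description."""
    warnings = []
    if len(title) > TITLE_LIMIT:
        warnings.append(f"title is {len(title)} chars; may be truncated")
    if len(description) > DESCRIPTION_LIMIT:
        warnings.append(f"description is {len(description)} chars")
    if keyword.lower() not in title.lower():
        warnings.append("keyword missing from title")
    elif not title.lower().startswith(keyword.lower()):
        warnings.append("keyword is not in first position of title")
    return warnings

audit_meta("Site Performance and SEO | OnlineMatters",
           "How slow page loads affect your search rankings.",
           "SEO")
# flags that the keyword is not in first position of the title
```

A marketer can apply rules like these by hand; the point is that someone has to know the rules exist and keep them current.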

How about navigation? If you do not build good navigational elements into deeper areas of the site (especially large sites) that are specifically for search engines and/or you build it in a way that a search engine can’t follow (e.g. by the use of Javascript in the headers or flash in a single navigation mechanism throughout the site), then the content won’t get indexed and the searcher won’t find it. Why are good search-specific navigational elements so important? It comes back to limited processing power and time. Each search engine has only so much time and power to crawl the billions of pages on the web, numbers that grow every day and where existing pages can change not just every day but every minute. These engines set rules about how much time they will spend crawling a site and if your site is too hard to crawl or too slow, many pages will not make it into the indices and the searcher, once again, will never find what could be hugely relevant content.

Do UE designers or web developers understand these rules at a high level? Many now know not to use Javascript in the headers, to be careful how they use flash and, if they do use it in the navigation, to have alternate navigational elements that help the bots crawl the site quickly. Is this about user experience? Only indirectly. It is absolutely positively about search engine optimization, however, and it is absolutely valid in terms of assuring that relevant content gets put in front of a searcher.

Do UE designers or web developers understand the gotchas with these rules? Unlikely. Most work in one organization with one site (or a limited number of sites). They haven’t seen the actual results of good and bad navigation across 20 or 50 or 100 sites and learned from hard experience what is a best practice. They need an SEO expert, someone from the SEO industry, to help guide them.

Now let’s talk about algorithms. Algorithms, as previously mentioned, are an attempt (and a crude one based on our current understanding of search) at mimicking how searchers (or with personalization a single searcher) think so that searches return relevant results to that searcher. If you write just for people, and structure your pages just for readers, you are doing your customers a disservice because what a human can understand as relevant and what a search engine can grasp of meaning and relevance are not the same. You might write great content for people on the site, but if a search engine can’t understand its relevance, a searcher who cares about that content will never find it.

Does that mean you sacrifice the user experience to poor writing? Absolutely, positively, without qualification not. But within the structure of good writing and a good user experience, you can design a page that helps/signals the search engines, with their limited time and ability to understand content, what keywords are relevant to that page.

Artificial constraint, you say? How is that different than the constraints I have when trying to get my message across with a good user experience in a data sheet? How is that different when I have 15 minutes to get a story across in a presentation to my executive staff in a way that is user friendly and clear in its messaging? Every format, every channel for marketing has constraints. The marketer’s (not the UE designer’s and not the web developer’s) job is to communicate effectively within those constraints.

Does a UE designer or web developer understand how content is weighted to create a ranking score for a specific keyword within a specific search engine? Do they know how position on the page relates to how the engines consider relevance? Do they understand how page length affects the weighting? Take this example. If I have two pages, the second of which contains two exact copies of the content on the first, which is more relevant? From a search engine’s perspective they are equally relevant, but if a search engine just counted all the words on the second page, it would rank higher. A fix is needed.
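A quick sketch of the problem with raw word counting (toy sentences of my own):

```python
from collections import Counter

page_one = "site performance matters for seo rankings".split()
page_two = page_one * 2   # the same content, duplicated verbatim

# Raw term frequency doubles even though no new information was added.
tf_one = Counter(page_one)
tf_two = Counter(page_two)

tf_one["performance"]   # 1
tf_two["performance"]   # 2 – naively, page two looks "twice as relevant"
```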

One way that many search engines compensate for page length differences is through something called pivoted document length normalization (write me if you want a further explanation). How do I know this? Because I am a search engine professional who spends time every day learning his trade, reading on information architecture and studying the patents filed by the major search engines to understand how the technology of search can or may be evolving. Because – since I can’t know exactly what algorithms are currently being used – I run tests on real sites to see the impact of various content elements on ranking. Because I do competitive analysis on other industry sites to see what legitimate, white hat techniques they have used and content they have created (e.g. videos on a youtube channel that then point to their main site) to signal the relevance of their content to the search engines.
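For the curious, the core of pivoted document length normalization can be sketched in a few lines. The slope value is illustrative – real engines tune these constants, and actual scoring functions include term and document statistics this sketch omits:

```python
# Sketch of pivoted document length normalization (Singhal, Buckley &
# Mitra): instead of dividing a term's weight by the raw document
# length, divide by a value "pivoted" around the average length.

def pivoted_norm(doc_length, avg_length, slope=0.25):
    """Pivoted normalization factor for a document of a given length."""
    return (1.0 - slope) * avg_length + slope * doc_length

def term_score(term_freq, doc_length, avg_length):
    """Toy per-term score: frequency over the pivoted length factor."""
    return term_freq / pivoted_norm(doc_length, avg_length)

# A page and its doubled copy: raw counting would score the doubled
# page twice as high; pivoted normalization pulls the scores back
# toward each other without flattening all length differences.
short = term_score(term_freq=1, doc_length=100, avg_length=100)
doubled = term_score(term_freq=2, doc_length=200, avg_length=100)
```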

And to Derek’s point, what happens when the algorithms change? Who is there watching the landscape for any change, like an Indian scout in a hunting party looking for the herd of buffalo? Who can help interpret the change and provide guidance on how to adapt content to maintain the best signals of relevance for a keyword? Derek makes this sound like an impossible task and a lot of hocus-pocus. It isn’t, and it’s not. Professional SEO consultants do this for their clients all the time by providing good maintenance services. They help their clients’ content remain relevant – and, hopefully, ranking high in the SERPs – in the face of constant change.

So to ask again, do UE designers or product managers understand these issues around content? At some high level they may (a lot don’t). Do web developers? Maybe, but most don’t because they don’t deal in content – it is just filler that the code has to deal with (it could be lorem ipsum for their purposes). Do any of these folks in their day-to-day struggles to do their jobs under tight time constraints have the time to spend, as I do, learning and understanding these subtleties or running tests? Absolutely, positively not. They need an SEO professional to counsel them so that they make the right design, content and development choices.

I’ll stop here. I pray I’ve made my point calmly and with a reasoned argument. Please let me know. I’m not Danny Sullivan, Vanessa Fox, Rand Fishkin, or Stephan Spencer, to name a few of our industry’s leading lights. I’m just a humble SEO professional who adores his job and wants to help his clients rank well with their relevant business information. My clients seem to like me and respect what I do, and that gives me an incredible amount of satisfaction and joy.

I’m sorry, Derek. I respect your viewpoint and I know that you truly believe what you are saying. But as an honest, hard-working SEO professional, I couldn’t disagree with you more.

I have avoided (like the plague) weighing in on the tempest Matt Cutts unleashed at SMX Advanced in June regarding Google’s change to the handling of the nofollow attribute for PageRank sculpting. I have avoided it for two reasons:

In my mind, more has been made of it than its true impact on people’s rankings.

As far as I’m concerned, in general (and note those two words) the use of nofollow is a last resort and a crutch for less-than-optimal internal cross-linking around thematic clusters. When internal cross-linking is done right, I don’t believe the use of the nofollow attribute is that impactful.

Bruce Clay had a great show on Webmaster Radio on the subject of the nofollow controversy, and he was basically of the same opinion as me. Many more heavyweights than I care to name have also weighed in. So adding my comments to the mix isn’t all that helpful to my readers or to the SEO community generally.

But I was searching today for some help on undoing 301 redirects when I found this section on the SEOmoz blog (click here for the whole article) from 2007 that provides some historical context for these conversations – so I thought I’d share it here. My compliments to Rand Fishkin of SEOmoz, whose content is reproduced below:

“2. Does Google recommend the use of nofollow internally as a positive method for controlling the flow of internal link love?

A) Yes – webmasters can feel free to use nofollow internally to help tell Googlebot which pages they want to receive link juice from other pages

(Matt’s precise words were: The nofollow attribute is just a mechanism that gives webmasters the ability to modify PageRank flow at link-level granularity. Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt’ed out), but nofollow on individual links is simpler for some folks to use. There’s no stigma to using nofollow, even on your own internal links; for Google, nofollow’ed links are dropped out of our link graph; we don’t even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level.)

B) Sometimes – we don’t generally encourage this behavior, but if you’re linking to user-generated content pages on your site who’s content you may not trust, nofollow is a way to tell us that.

C) No – nofollow is intended to say “I don’t editorially vouch for the source of this link.” If you’re placing un-trustworthy content on your site, that can hurt you whether you use nofollow to link to those pages or not.”

If you look on the About page of my blog, you’ll see that one of the key audiences I am concerned with is search marketers who, for one reason or another, came late to the game. While I have been doing online products and marketing since 1992 (think about that one for a second…), I did come late to the search marketing party, because at the time these markets evolved I hired people to sweat the details of day-to-day implementation. I was actually pretty knowledgeable and could do many things myself that most CMOs couldn’t – e.g. develop extensive keyword lists and upload them into the AdWords Editor, or write VB scripts – but I was still a long way from all the intricacies of the practice.

And let’s start with that as my first statement on the science of online marketing: developing online strategies is relatively easy. It is in the details/intricacies of implementation that great search marketers make their bones. Details come in many forms and, in the interest of time, I will not go into categorizing these. We’ll do that at the end of the series on “From the Trenches.” In the meantime, we’ll just work through them for each area that I’ll cover.

The initial portion of this series will focus on Search Engine Optimization, since this is a very hot topic in the current economy. The approach – given this is a blog – will be to do relatively short modules on one subject within each major topic. Each module will begin with the name of the section and then the topic at hand (e.g. Keyword Analysis – Building the Initial Keyword Universe). I am going to add presentation material in the form of audio PowerPoints, which will provide a bit more extensive coverage of each topic. How long will the presentations be? Not sure yet. We’ll have to try it out and see – after all, I’m learning how to present in blog mode just as you are learning how to learn in blog mode.

The sections for basic SEO will run as follows:

Introduction to SEO

Keyword Analysis

Site Architecture Issues

On-Page Content and Meta Data

Link Building

Combining the Basics into an SEO Program

Looking forward to these sessions. I expect to start them shortly – once I get the presentation technology set up.