
6 Ways To Minimise Wasted Crawl Budget

‘Crawl Budget’: you’ve probably heard the phrase many times, but what is it and what does it mean? The Internet is an ever-expanding resource for information, and as much as Google would love to crawl every piece of content that exists on it, that simply isn’t possible. As such, domains are assigned a crawl budget – something you must consider in your SEO campaigns.

It’s important to Google (and other search engines) to crawl and index ‘the good stuff’ on the Internet, so to do this while making the most of their limited resources, they allocate each domain a certain amount of crawl budget.

The crawl budget assigned to a domain is how much time the search engines will spend crawling it each day. This budget varies from domain to domain, as it is based on a huge number of factors, including the authority and trust of a website, how often it’s updated and much more.

Make The Most Of Your Budget

So, as Google allocates your website a finite crawl budget, isn’t it a good idea to ensure they’re able to search your website efficiently? Well, of course.

It’s important that Google (and in turn users) are able to navigate around your site with ease. This increases the likelihood of Google being able to crawl those important pages on your website and improves the experience for users on your website.

There are a number of common errors found across many websites that can really waste crawl budget. I have highlighted 6 of them, along with ways to ensure the wastage of your allocated crawl budget is kept to a minimum.

1. Internal And External Linking Issues

There are a number of errors to be aware of when it comes to internal and external linking issues. It goes without saying that if Google and other search engines crawl your website and are met with continual link errors, valuable crawl budget is being wasted. Below are two types of linking issues every webmaster should be aware of:

Internal Redirects

As a rule of thumb, redirects should be 301 (permanent) redirects wherever possible (as opposed to 302) in order to flow ‘link juice’ through to the new page. However, internal links should point directly to the live destination rather than through a redirect: every extra hop a crawler has to follow takes more time to reach the destination page. This wastes valuable crawl budget and means search engines spend less time looking at the live pages you want them to crawl.

Whilst reviewing internal redirects, you also want to ensure no redirect chains or loops exist on your website, as these make it much more difficult for both users and crawlers to access your pages. A number of desktop SEO spider tools, such as Screaming Frog, can help identify technical issues like these.
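To make the idea concrete, here is a minimal sketch of redirect-chain and loop detection, assuming you have already exported a ‘URL → redirect target’ map from a crawler such as Screaming Frog (all URLs below are hypothetical examples):

```python
def find_redirect_chains(redirects):
    """Return redirect chains (more than one hop) and redirect loops.

    `redirects` maps each redirecting URL to the URL it points at.
    """
    chains, loops = [], []
    for start in redirects:
        path = [start]
        current = start
        while current in redirects:
            current = redirects[current]
            if current in path:          # back at a URL we already visited: a loop
                loops.append(path + [current])
                break
            path.append(current)
        else:
            if len(path) > 2:            # start -> hop -> destination = a chain
                chains.append(path)
    return chains, loops

# Hypothetical export: each entry is "URL -> redirect target"
redirect_map = {
    "/old-page": "/interim-page",
    "/interim-page": "/new-page",   # /old-page reaches /new-page via two hops
    "/a": "/b",
    "/b": "/a",                     # /a and /b redirect to each other: a loop
}
chains, loops = find_redirect_chains(redirect_map)
```

Any chain found should have its internal links repointed straight at the final destination, and any loop needs to be broken before crawlers (and users) can reach those pages at all.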

Broken Links

It is of course important to ensure no broken links exist on your website. Not only are they detrimental to a user’s experience on your site, they also make it very difficult for crawlers to navigate your website – if a crawler can’t get to a page, it can’t index it. Run regular link checks across the website so that any broken links are fixed as soon as they are discovered; these checks can be done using a variety of tools, such as Google’s Search Console and Screaming Frog.
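As a simple sketch of the kind of check these tools perform, assuming you have already crawled your site and collected the HTTP status code for each linked URL (the URLs and codes below are made up for illustration):

```python
def broken_links(status_by_url):
    """Return the URLs that respond with a client or server error (4xx/5xx)."""
    return sorted(url for url, status in status_by_url.items() if status >= 400)

# Hypothetical crawl export: URL -> HTTP status code
crawl_results = {
    "/": 200,
    "/contact": 200,
    "/old-whitepaper.pdf": 404,   # broken: fix the link or redirect the URL
    "/api/report": 500,           # broken: server error on a linked page
}
report = broken_links(crawl_results)
```

In practice you would feed this from a Screaming Frog or Search Console export, then either fix the link, restore the page, or 301 the dead URL to its closest live equivalent.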

2. Internal Linking Structure

Meaningful and user-friendly internal linking helps to pass link value and keyword relevancy around your website whilst also allowing users and robots to navigate through your pages. By not ensuring internal links are used where relevant, you’re missing an opportunity to channel users and robots through your site and build keyword relevancy through natural use of keyword anchor text.

By ensuring proper interlinking is in place and pages are linked to where relevant, you’re making the most of the crawl budget that has been allocated to your website, vastly improving site crawlability.

3. Page Speed

Page speed is an important factor for improving site crawlability. Not only is this an important ranking factor, it can also determine whether or not those all-important pages on your website get seen by search engines.

It may be common sense, but the faster a website loads, the more time crawlers can spend crawling different pages on it. Along with increasing the number of pages that get crawled, improved page speed also provides the user with a greater experience on your website (winning all around). So make sure time is spent improving the speed of your website – if not for site crawlability, then for the user!

4. Robots.txt

Blocking

If used correctly, a robots.txt file can make the crawling of your website more efficient; however, it is often used incorrectly and, when it is, can greatly harm the crawlability and indexation of your website.

When blocking pages via robots.txt you’re telling a crawler not to access the page or index it, so it’s important to be certain that the pages being blocked do not need to be crawled and indexed. The best way to determine this is by asking yourself: would I want my audience to see this page in the search engine results pages?

By efficiently instructing crawlers to not crawl certain pages on your website, crawlers are able to spend their crawl budget navigating pages that are important to you.
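As a simple illustration, a robots.txt file that keeps all crawlers out of low-value areas of a site might look like the following (the paths here are hypothetical examples – yours will differ):

```txt
User-agent: *
Disallow: /checkout/
Disallow: /search
Disallow: /admin/
```

Each `Disallow` rule stops compliant crawlers from spending budget on that path, leaving more of it for the pages you actually want crawled.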

Sitemap

As the robots.txt file is one of the first places a crawler looks when first going to a website, it is best practice to use this to direct search engines to your sitemap. This makes it easier for crawlers to index the whole site.
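This is a single line in the robots.txt file (the domain below is a placeholder for your own):

```txt
Sitemap: https://www.example.com/sitemap.xml
```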

5. URL Parameters

URL parameters are often a major cause of wasted crawl budget, especially on ecommerce websites. Google Search Console (formerly Webmaster Tools) offers the easiest way to indicate to Google how to handle the parameters in URLs found across your website.

Before using the ‘URL Parameter’ feature, it’s important to understand how parameters work, as you could end up excluding important pages from the crawl. Google provides a handy resource to help you learn about this.
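To illustrate why parameters waste crawl budget, here is a small Python sketch (standard library only) showing how several parameter variations of a URL can collapse to one canonical version; the parameter names and URLs are hypothetical:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumed names for parameters that never change page content
TRACKING_PARAMS = {"utm_source", "utm_medium", "sessionid"}

def canonicalise(url):
    """Strip tracking parameters and sort the rest, so duplicate URLs collapse."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k not in TRACKING_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), ""))

# Three URLs a crawler might find that all serve the same page
urls = [
    "https://example.com/shoes?colour=red&utm_source=newsletter",
    "https://example.com/shoes?utm_source=twitter&colour=red",
    "https://example.com/shoes?colour=red",
]
unique = {canonicalise(u) for u in urls}
```

Without help from canonical tags or parameter handling rules, a crawler may treat all three addresses as separate pages and spend three pages’ worth of budget on one piece of content.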

6. XML and HTML Sitemap

Sitemaps are used by both users and search engines to discover important pages around your website. An XML Sitemap is specifically used by search engines, helping crawlers discover new pages across your website.
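A minimal XML Sitemap has the following shape (the URL and date here are placeholder examples):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2017-06-01</lastmod>
  </url>
</urlset>
```

Each `<url>` entry lists one page you want crawled, with `<lastmod>` giving crawlers a hint about when it last changed.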

HTML Sitemaps are used by both users and search engines and are again useful in helping crawlers find pages across your site. As Matt Cutts discusses in the video below, it is best practice to have both an XML and HTML Sitemap in place on your website.

Final Thoughts

It is clear that there are a number of ways a website can make the most of the crawl budget it has been allocated. Making it easy for a crawler to navigate your site ensures that your important pages are being seen. By following the 6 tips I have provided, you will greatly improve your website’s crawlability.

Have any more tips on improving site crawlability? I’d love to hear them, leave a comment below or contact me via Twitter @LukeTheMono



Although this is partly true, all of the big search engines do obey the rules set in a robots.txt file. Of course, another blocking method is to password-protect the relevant directory; that way, no web crawler will be able to access and index the confidential/private content.

Hmm.. Yes, big search engines respect robots.txt, but having a backlink from another website to a page blocked by robots.txt may still result in that page being indexed.

Just clarifying the part “you’re telling a crawler not to access the page or index it”. Crawling and indexing are different. You can explicitly tell search engines not to index something by using meta robots noindex.

Of course, using Meta robots noindex tags is fine if you explicitly don’t want pages indexed, but blocking via robots.txt is a good way to conserve crawl budget, which is what the tips in this post aim to achieve.

If it is important that a page isn’t indexed, I would definitely recommend Meta noindex, but for the benefit of saving crawl budget for more valuable pages, blocking via robots.txt is an effective technique.

What do you think?
