Crawl Optimization Issues and How to Fix Them #semrushchat

Crawlers (or “spiders”) are scripts that Google and other search engines use to look in every nook and cranny of your website.

By doing this, they are able to understand the general purpose of a page and decide whether it is worthy of being included in the SERP. Unfortunately, crawling often surfaces issues and errors that block spiders from doing their jobs and hurt your rankings.

In SEMrush Chat, we discussed different crawling optimization issues and ways to fix them with Dawn Anderson @dawnieando and our other participants.

A robots.txt is a plain-text file that lets you control how crawlers behave by restricting them from accessing certain pages. It helps when a website has a lot of content that doesn’t belong in organic search: download pages, specific “thank you” pages, etc.
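As a minimal sketch, a robots.txt that keeps crawlers away from download and “thank you” pages (the paths and domain below are hypothetical) might look like this:

```text
# robots.txt — served from the site root, e.g. https://example.com/robots.txt
User-agent: *
Disallow: /downloads/
Disallow: /thank-you/

# Point crawlers at the sitemap as well
Sitemap: https://example.com/sitemap.xml
```

Rules under `User-agent: *` apply to all crawlers; you can also add per-bot sections (e.g. `User-agent: Googlebot`) for finer control.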

Adding new content to your website is always important, because it signals to Google that your site is developing. But let’s not forget about sitemaps – where all this new content should be listed – and other important techniques.
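For illustration, a minimal XML sitemap listing a new page (the URL and dates are hypothetical) follows the standard sitemaps.org format:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/new-post</loc>
    <lastmod>2016-05-10</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each new piece of content gets its own `<url>` entry; submitting the sitemap in Google Search Console tells crawlers where to look first.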

The best way to think about internal links is as pathways for crawlers, so it’s important to optimize your website’s internal link structure. Avoid using redirects too often, because doing so can stop crawlers from indexing linked pages.

Headlines are important. They’re, well, not the first, but definitely the second or third thing crawlers look at to determine how useful a page would be to someone who types a query into a search box. Remember to include the keywords you’re targeting in your titles and headings, and use appropriate tags for your pages and publications.
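As a small sketch, placing a target keyword in the title tag and main heading (the keyword and copy below are hypothetical examples) could look like:

```html
<head>
  <!-- Target keyword appears early in the title tag -->
  <title>Crawl Optimization: How to Fix Common Crawling Issues</title>
</head>
<body>
  <!-- One h1 per page, echoing the same keyword -->
  <h1>Crawl Optimization Issues and How to Fix Them</h1>
</body>
```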

Fetch as Google is a tool inside Google Search Console; it’s very useful when a marketer wants to evaluate a page as Google would. Don’t forget to submit new pages for indexing so crawlers can find them quicker.

Before fixing common crawling issues, each marketer must identify them first. So how exactly do you do that? We asked our participants.

The very first things to consider are the Google Search Console tools that work in conjunction with crawlers and provide marketers with fresh information about the crawling process. The next thing you should do is use some third-party crawlers that simulate Googlebot’s behavior.
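As a small illustration of the first check any Googlebot-simulating crawler performs, Python’s standard-library robotparser can test whether a given user agent may fetch a URL under a site’s robots.txt (the rules and URLs below are hypothetical):

```python
# Minimal sketch: test crawlability of URLs against robots.txt rules,
# as a Googlebot-simulating crawler would before fetching a page.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents
rules = """\
User-agent: *
Disallow: /downloads/
Disallow: /thank-you
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A regular content page is crawlable; a disallowed path is not.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/downloads/x"))  # False
```

In a real simulation you would call `parser.set_url(".../robots.txt")` and `parser.read()` to fetch the live file instead of parsing a string.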

Most of our participants agree that Google’s native tools are almost godlike when it comes to determining issues and errors, and that if you also use third-party bots, it will make your webpage as crawler-friendly as possible.

Stephen Kenwright @stekenwright adds that Search Console is best for qualitative feedback, while server logs are best for quantitative feedback: spot the problem in WMT, then gauge its scale in your logs to improve your site for crawlers.

Let’s recap all these opinions and techniques. To identify spider traps, SEOs should use GSC and Screaming Frog to crawl websites themselves.

Above, we talked about what companies should do to get search engines to index as many pages as possible. The next thing we’re about to cover is how to keep certain pages from being indexed. In what situations do we need this? Let’s find out.

Agent Palmer (@AgentPalmer) was one of the first to answer, saying that temporary content (like promotional or landing pages used for a short time or for social media) might not need crawling.

As we mentioned above, duplicate content can be a big problem for companies that are growing quickly, as they may not be able to manage all the content on their site at the same time. Some of us choose to use rel=canonical tags, but the easiest way to do this is to mark a certain page with a “noindex” robots meta tag (note that robots.txt can block crawling, but it isn’t a reliable way to keep a page out of the index).
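For illustration, both options (with hypothetical URLs) live in the duplicate page’s head — and note that noindex belongs in a robots meta tag, not in robots.txt:

```html
<head>
  <!-- Option 1: point crawlers at the preferred version of the page -->
  <link rel="canonical" href="https://example.com/original-article" />

  <!-- Option 2: let crawlers fetch the page but keep it out of the index -->
  <meta name="robots" content="noindex, follow" />
</head>
```

The `follow` in option 2 lets crawlers still pass through the page’s links even though the page itself stays out of the index.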

Companies that respect your personal data are usually concerned with security. That’s why some of them restrict search crawlers from accessing certain pages. Chris Desadoy @EliteYouTubePro noted that companies also may not want to have their contact pages deep-crawled for another obvious reason: spam. But it depends on the company.

Indeed, aggregation pages are often not worth including in the index, because too many links and related pieces of content can confuse spiders and make your site’s profile look spammy to Google.

A “crawl budget” is the number of pages Google or other search engines will crawl during each visit. The bigger your crawl budget, the more crawler visits you get, and therefore the better your chances of appearing near the top of search engine results pages. So how do you measure your crawl budget, and what should you do to increase it?

Dan O Brian @DanBlueChief was the first to answer this question by sharing a useful concept packed into six simple words: Test and tweak, test and tweak. And Google Search Console, which was mentioned by Sam Barnes @Sam_Barnes90, will help you track crawls after making changes in order to test your results.

Redirects are not advised, and neither are broken links that return a 404 error or duplicate content. Another thing to pay attention to is the loading speed of your pages. Keep your website fast and Google will like it.

The rel=”nofollow” attribute helps restrict spiders from following a link. While it may seem strange, it’s used by marketers who want to improve their crawl budget: the purpose of adding the attribute is to keep search weight on targeted pages. Next comes a checklist from Martin Kůra.
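A minimal sketch of the attribute in use (the link target is a hypothetical example):

```html
<!-- Crawlers are asked not to follow this link, so crawl budget and
     link equity stay concentrated on the pages you actually want crawled -->
<a href="https://example.com/login" rel="nofollow">Log in</a>
```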

In order to improve their crawl budget, marketers should optimize their site’s structure and URLs, remove 404s, improve page speed and minimize redirects.

Above we mentioned Screaming Frog and Google Search Console, but are there any other tools marketers can use to predict crawler behavior and fix possible issues?

Here’s what our participants added to the list.

After a recent change in design or usability, you may need crawlers to revisit your site. What techniques should you use to attract them? The next question reveals the best practices for redesigning and tweaking your site.

Adding a new sitemap is a simple yet effective decision. You can also use an automated sitemap generator, but you should always check your automated sitemaps manually to avoid critical issues or mistakes.

Fetch as Google is a first-rate tool here, because it can submit new or redesigned pages to the index, and Googlebot will find them very quickly. Peter Nikolow suggests the following checklist.

In conclusion, we’d like to sum up all the steps for getting a website recrawled as quickly as possible after implementing a new design or adding new content: submit a new sitemap, add links to any new content, promote your new content on social media, and finally, fetch and render using Google Search Console.

That’s it for today! Thank you for your attention and brilliant answers.

Special thanks goes to Dawn Anderson @dawnieando for her expertise. See you at the next SEMrush Chat next Wednesday to discuss a new topic: “SEO and UX.”

Elena Terenteva, Product Marketing Manager at SEMrush. Elena has eight years of public relations and journalism experience, working as a broadcasting journalist and PR/content manager for IT and finance companies.
Bookworm, poker player, good swimmer.
