
JavaScript SEO – How to Crawl JavaScript Rich Websites


SEO for JavaScript websites is considered one of the most complicated fields of technical SEO. Fortunately, there are more and more data, case studies, and tools that make it a little easier, even for technical SEO rookies.

Why is crawling JavaScript complicated?

The answer to this question is rather complex and could easily fill a separate article. To simplify, let's just say that it is all about computing power. With classic HTML-based websites (server-rendered PHP, plain CSS, etc.), crawlers can “see” a website's content just by analyzing the code.

With websites built on JavaScript and dynamic content, a crawler has to read and analyze the Document Object Model (DOM). Such a website also has to be fully rendered, after all of its code has been loaded and processed. The simplest tool we can use to see a rendered website is… a browser. This is why crawling JavaScript is often referred to as crawling with “headless browsers”.
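To see why, here is a minimal sketch, using only Python's standard library, of what a non-rendering crawler extracts from a page whose content is injected by a script. The sample markup is made up for illustration:

```python
from html.parser import HTMLParser

# Simplified raw source of a JavaScript-driven page: the visible
# content is injected by a script at runtime, so the static HTML
# contains only an empty container.
RAW_SOURCE = """
<html><body>
  <div id="app"></div>
  <script>
    document.getElementById('app').textContent = 'Casual - Watch Full Episodes';
  </script>
</body></html>
"""

class TextExtractor(HTMLParser):
    """Collects visible text the way a non-rendering crawler would."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Skip script bodies and whitespace; keep only visible text.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

extractor = TextExtractor()
extractor.feed(RAW_SOURCE)
print(extractor.chunks)  # [] - no visible text without executing the script
```

The non-rendering crawler sees an empty container; only a browser, headless or not, that actually executes the script would see the injected text.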

Crawling JavaScript websites without rendering or reading DOM

Before moving forward, let me show you an example of a JavaScript website that you all know: http://www.hulu.com/. To make it even more specific, let’s have a look at the “Casual” TV show landing page – http://www.hulu.com/casual.

Here’s where it gets tricky. If you use the usual tools – e.g. the “Inspect” feature in Google Chrome – you won’t see the source as it was delivered. What you’ll see instead is the DOM: the code after JavaScript has been executed.

Basically, what you see above is code already “processed” by the browser.

To see what the source code looked like before rendering, you need to use the “View Page Source” option.

After doing so, you will quickly notice that the content you saw on the page isn’t actually present in the code.

As you can see above, a crawler that doesn’t render JavaScript can’t process this website’s content and, therefore, the crawled data is useless.
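The same “View Page Source” check can be automated: fetch the raw HTML and look for a snippet of the visible text. A minimal sketch follows; the URL fetch itself is omitted, and both the `content_in_source` helper and the sample markup are illustrative, not a real crawler's API:

```python
import re

def content_in_source(page_source: str, snippet: str) -> bool:
    """Return True if a visible text snippet appears in the raw
    (unrendered) HTML source, ignoring case and extra whitespace."""
    normalize = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return normalize(snippet) in normalize(page_source)

# Raw source of a JavaScript-driven page: the content only arrives
# at runtime, so the snippet check fails against the static HTML.
raw = '<html><body><div id="app"></div><script>load();</script></body></html>'
print(content_in_source(raw, "Watch Casual online"))  # False
```

If this check returns False for text that is clearly visible in the browser, the page depends on JavaScript rendering and a source-only crawler will miss it.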

How to start crawling JavaScript?

The simplest possible way to start with JavaScript crawling is by using Screaming Frog SEO Spider. Few people know that, since version 6.0, Screaming Frog supports rendered crawling.

If you already have Screaming Frog installed on your computer, all you have to do is go to Configuration → Spider → Rendering, select “JavaScript,” and enable “Rendered Page Screen Shots.”

After setting this up, we can start the crawl and see a rendered screenshot of each page.

That’s it – we are now successfully crawling JavaScript with Screaming Frog.

Word of warning

Please bear in mind that the data you get from Screaming Frog shows how correctly rendered JavaScript should look. However, Google doesn’t necessarily crawl JavaScript the same way. This is why so many JS-heavy websites invest in prerendering services.
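To give a feel for what a prerendering setup does, here is a minimal, hypothetical sketch: serve a pre-rendered HTML snapshot to known crawlers and the normal JavaScript app to everyone else. The token list and function name are illustrative assumptions, not any real service's API:

```python
# Substrings commonly found in search engine crawler user agents.
# This list is illustrative only; real setups maintain a much
# longer, regularly updated list.
BOT_UA_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider")

def should_serve_prerendered(user_agent: str) -> bool:
    """Decide whether a request looks like a search engine crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_UA_TOKENS)

print(should_serve_prerendered("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(should_serve_prerendered("Mozilla/5.0 (Windows NT 10.0) Chrome/58.0"))  # False
```

Crawlers then receive static HTML with the content already in the source, while regular visitors get the full JavaScript experience.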

Let me show you an example – as you saw above, Screaming Frog properly crawled and rendered this URL: www.hulu.com/casual. However, this URL isn’t properly indexed by Google.

Here is the proof – Google’s cache:

And – if you don’t believe that the screenshot above proves that Google isn’t always crawling JavaScript properly, let me show you one more example.

Let’s copy and paste content from the Casual TV show landing page:

Unfortunately, the content from this page isn’t indexed in Google.

Summary

JavaScript is here to stay, and we can expect even more of it in the coming years. JavaScript can get along with SEO and crawlers, as long as SEOs are consulted in the early stages of designing your website’s architecture.