JavaScript & SEO: Making Your Bot Experience As Good As Your User Experience

The author's views are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

Understanding JavaScript and its potential impact on search performance is a core skillset of the modern SEO professional. If search engines can’t crawl a site or can’t parse and understand the content, nothing is going to get indexed and the site is not going to rank.

The most important questions for an SEO relating to JavaScript: Can search engines see the content and grasp the website experience? If not, what solutions can be leveraged to fix this?

Fundamentals

What is JavaScript?

When creating a modern web page, there are three major components:

HTML – Hypertext Markup Language serves as the backbone, or organizer of content, on a site. It provides the structure of the website (e.g. headings, paragraphs, list elements, etc.) and defines static content.

CSS – Cascading Style Sheets are the design, glitz, glam, and style added to a website. It makes up the presentation layer of the page.

JavaScript – JavaScript is the interactivity and a core component of the dynamic web.

JavaScript is either placed in the HTML document within <script> tags (i.e., it is embedded in the HTML) or linked/referenced. There are currently a plethora of JavaScript libraries and frameworks, including jQuery, AngularJS, ReactJS, EmberJS, etc.


What is AJAX?

AJAX, or Asynchronous JavaScript and XML, is a set of web development techniques combining JavaScript and XML that allows web applications to communicate with a server in the background without interfering with the current page. Asynchronous means that other functions or lines of code can run while the async script is running. XML used to be the primary language to pass data; however, the term AJAX is used for all types of data transfers (including JSON; I guess "AJAJ" doesn’t sound as clean as "AJAX" [pun intended]).

A common use of AJAX is to update the content or layout of a webpage without initiating a full page refresh. Normally, when a page loads, all the assets on the page must be requested and fetched from the server and then rendered on the page. However, with AJAX, only the assets that differ between pages need to be loaded, which improves the user experience as they do not have to refresh the entire page.

One can think of AJAX as mini server calls. A good example of AJAX in action is Google Maps. The page updates without a full page reload (i.e., mini server calls are being used to load content as the user navigates).
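A "mini server call" can be sketched roughly as follows. This is a minimal illustration, not how any particular site implements it: the function name `refreshSection`, the `page` object, and the `/api/reviews` URL are all hypothetical, and the fetch function is passed in so the sketch can run outside a browser.

```javascript
// Sketch of an AJAX-style update: request only the data that changed and
// patch it into the page state, instead of reloading the whole document.
// `fetchImpl`, `page`, and the URL passed in are hypothetical stand-ins.
async function refreshSection(fetchImpl, page, url) {
  const response = await fetchImpl(url); // background request
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json(); // JSON rather than XML
  // Only the reviews section changes; the rest of the page is left alone.
  return { ...page, reviews: data.reviews };
}

// In a browser, the global fetch would be passed in:
//   refreshSection(fetch, currentPage, '/api/reviews');
```

Because only the changed data travels over the network, the rest of the page state stays exactly as it was.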

What is the Document Object Model (DOM)?

As an SEO professional, you need to understand what the DOM is, because it’s what Google is using to analyze and understand webpages.

The DOM is what you see when you “Inspect Element” in a browser. Simply put, you can think of the DOM as the steps the browser takes after receiving the HTML document to render the page.

The first thing the browser receives is the HTML document. After that, it will start parsing the content within this document and fetch additional resources, such as images, CSS, and JavaScript files.

The DOM is what forms from this parsing of information and resources. One can think of it as a structured, organized version of the webpage’s code.

Nowadays the DOM is often very different from the initial HTML document, due to what’s collectively called dynamic HTML. Dynamic HTML is the ability for a page to change its content depending on user input, environmental conditions (e.g. time of day), and other variables, leveraging HTML, CSS, and JavaScript.

Simple example with a <title> tag that is populated through JavaScript:

HTML source

DOM
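The difference between the two can be sketched in a few lines. This is an illustration under stated assumptions: the markup, the title text, and the `populateTitle` helper are hypothetical, and a plain object stands in for the browser's `document` so the sketch runs anywhere.

```javascript
// Hypothetical initial HTML source: the <title> element arrives empty.
const htmlSource = '<html><head><title></title></head><body></body></html>';

// Sets the title on a document-like object, the way a script on the page
// would. In a browser, the global `document` would be passed in.
function populateTitle(doc, title) {
  doc.title = title;
  return doc;
}

// Stand-in for the browser's document, so the sketch runs outside a browser.
const dom = populateTitle({ title: '' }, 'JavaScript & SEO Guide');

console.log(htmlSource.includes('JavaScript & SEO Guide')); // false
console.log(dom.title); // JavaScript & SEO Guide
```

The title text never appears in the HTML source; it exists only in the DOM after the script has run.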

What is headless browsing?

Headless browsing is simply the action of fetching webpages without the user interface. It is important to understand because Google, and now Baidu, leverage headless browsing to gain a better understanding of the user’s experience and the content of webpages.

Crawlability

Are bots able to find URLs and understand your site’s architecture? There are two important elements here:

Blocking search engines from your JavaScript (even accidentally).

Proper internal linking, not leveraging JavaScript events as a replacement for HTML tags.

Why is blocking JavaScript such a big deal?

If search engines are blocked from crawling JavaScript, they will not be receiving your site’s full experience. This means search engines are not seeing what the end user is seeing. This can reduce your site’s appeal to search engines and could eventually be considered cloaking (if the intent is indeed malicious).

The easiest way to solve this problem is through providing search engines access to the resources they need to understand your user experience.

!!! Important note: Work with your development team to determine which files should and should not be accessible to search engines.
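A quick way to spot an accidental block is to check script paths against your robots.txt rules. The sketch below is deliberately simplified and is not a full robots.txt parser: it only reads `User-agent: *` groups and plain prefix `Disallow` rules, and the file contents and paths are hypothetical.

```javascript
// Simplified check: does a robots.txt block a given path for all crawlers?
// Only handles "User-agent: *" groups and plain prefix "Disallow" rules;
// real robots.txt matching also supports Allow rules, wildcards, and
// groups that list several user agents.
function isBlockedForAllBots(robotsTxt, path) {
  let inWildcardGroup = false;
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      inWildcardGroup = value === '*';
    } else if (field === 'disallow' && inWildcardGroup && value !== '') {
      if (path.startsWith(value)) return true;
    }
  }
  return false;
}

// Hypothetical robots.txt that accidentally hides the script directory.
const robotsTxt = 'User-agent: *\nDisallow: /assets/js/\nDisallow: /admin/';
console.log(isBlockedForAllBots(robotsTxt, '/assets/js/app.js')); // true
console.log(isBlockedForAllBots(robotsTxt, '/assets/css/site.css')); // false
```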

Internal linking

Internal linking should be implemented with regular anchor tags within the HTML or the DOM (using an <a href="https://www.example.com"> HTML tag) versus leveraging JavaScript functions to allow the user to traverse the site.

Essentially: Don’t use JavaScript’s onclick events as a replacement for internal linking. While end URLs might be found and crawled (through strings in JavaScript code or XML sitemaps), they won’t be associated with the global navigation of the site.

Internal linking is a strong signal to search engines regarding the site’s architecture and importance of pages. In fact, internal links are so strong that they can (in certain situations) override “SEO hints” such as canonical tags.
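To see why onclick navigation is weaker than an anchor tag, consider a rough link extractor. This is only a sketch of the idea, not how any search engine actually discovers links: it uses a regular expression for brevity, and the markup is hypothetical.

```javascript
// Rough sketch: collect href values from <a> tags in an HTML string.
// A regex is used for brevity; a real crawler works from the parsed DOM.
function extractAnchorHrefs(html) {
  const hrefs = [];
  const anchorPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    hrefs.push(match[1]);
  }
  return hrefs;
}

// Hypothetical navigation: one real link, one onclick handler.
const navigation = `
  <a href="/products">Products</a>
  <span onclick="window.location='/pricing'">Pricing</span>
`;
console.log(extractAnchorHrefs(navigation)); // only "/products" is found
```

Both items work for a user who clicks, but only the anchor tag exposes a URL as a link in the markup.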

URL structure

The Lone Hash (#) – The lone pound symbol is not crawlable. It is used to identify anchor links (aka jump links). These are the links that allow one to jump to a piece of content on a page. Anything after the lone hash portion of the URL is never sent to the server and will cause the page to automatically scroll to the first element with a matching ID (or the first <a> element whose name attribute matches it). Google recommends avoiding the use of "#" in URLs.

Hashbang (#!) (and escaped_fragments URLs) – Hashbang URLs were a hack to support crawlers (Google now wants to avoid them, and only Bing still supports them). Many a moon ago, Google and Bing developed a complicated AJAX solution, whereby a pretty (#!) URL with the UX co-existed with an equivalent escaped_fragment HTML-based experience for bots. Google has since backtracked on this recommendation, preferring to receive the exact user experience. With escaped fragments, there are two experiences:

Original Experience (aka Pretty URL): This URL must either have a #! (hashbang) within the URL to indicate that there is an escaped fragment or a meta element indicating that an escaped fragment exists (<meta name="fragment" content="!">).

Escaped Fragment (aka Ugly URL, HTML snapshot): This URL replaces the hashbang (#!) with “_escaped_fragment_” and serves the HTML snapshot. It is called the ugly URL because it’s long and looks like (and for all intents and purposes is) a hack.

pushState History API – PushState is navigation-based and part of the History API (think: your web browsing history). Essentially, pushState updates the URL in the address bar and only what needs to change on the page is updated. It allows JS sites to leverage “clean” URLs. PushState is currently supported by Google, when supporting browser navigation for client-side or hybrid rendering.

A good use of pushState is for infinite scroll (i.e., as the user hits new parts of the page the URL will update). Ideally, if the user refreshes the page, the experience will land them in the exact same spot. However, they do not need to refresh the page, as the content updates as they scroll down, while the URL is updated in the address bar.

Example: A good example of a search engine-friendly infinite scroll implementation, created by Google’s John Mueller (go figure), can be found here. He technically leverages replaceState(), which doesn’t include the same back button functionality as pushState.
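The three URL behaviors in this section can be sketched as follows. Treat this as an illustration under assumptions: the function names and example URLs are hypothetical, the escaped-fragment mapping escapes only a handful of reserved characters, and `historyLike` stands in for `window.history` so the sketch runs outside a browser.

```javascript
// What the server actually receives: path and query, never the fragment.
function requestTarget(url) {
  const u = new URL(url);
  return u.pathname + u.search;
}

// Simplified hashbang -> _escaped_fragment_ mapping from the deprecated
// AJAX crawling scheme. Only a few reserved characters are escaped here.
function toEscapedFragmentUrl(url) {
  const u = new URL(url);
  if (!u.hash.startsWith('#!')) return null; // not a hashbang URL
  const fragment = u.hash
    .slice(2)
    .replace(/[%#&+\s]/g, (c) => encodeURIComponent(c));
  const joiner = u.search ? '&' : '?';
  return u.origin + u.pathname + u.search + joiner + '_escaped_fragment_=' + fragment;
}

// Infinite-scroll style URL update. replaceState swaps the current history
// entry; pushState would add a new entry (and a back button step) instead.
function onSectionVisible(historyLike, pageNumber) {
  historyLike.replaceState({ page: pageNumber }, '', `/articles?page=${pageNumber}`);
}

console.log(requestTarget('https://example.com/services?ref=nav#contact'));
// /services?ref=nav
console.log(toEscapedFragmentUrl('https://example.com/products#!category=shoes'));
// https://example.com/products?_escaped_fragment_=category=shoes
```

In a browser, `onSectionVisible(window.history, 3)` would update the address bar to the third page's URL without reloading the page.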

Obtainability

Search engines have been shown to employ headless browsing to render the DOM to gain a better understanding of the user’s experience and the content on page. That is to say, Google can process some JavaScript and uses the DOM (instead of the HTML document).

At the same time, there are situations where search engines struggle to comprehend JavaScript. Nobody wants a Hulu situation to happen to their site or a client’s site. It is crucial to understand how bots are interacting with your onsite content. When you aren’t sure, test.

Assuming we’re talking about a search engine bot that executes JavaScript, there are a few important elements for search engines to be able to obtain content:

If the user must interact for something to fire, search engines probably aren’t seeing it.

Google is a lazy user. It doesn’t click, it doesn’t scroll, and it doesn’t log in. If the full UX demands action from the user, special precautions should be taken to ensure that bots are receiving an equivalent experience.

If the JavaScript occurs after the JavaScript load event fires plus ~5-seconds*, search engines may not be seeing it.

*John Mueller mentioned that there is no specific timeout value; however, sites should aim to load within five seconds.

If there are errors within the JavaScript, both browsers and search engines may stop executing the script partway through and miss the sections of the page that depend on the code that never ran.
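The effect of a script error can be sketched as below. This is a toy model rather than real rendering code: each function stands in for a script that injects one section of the page, and the names and the `isolateErrors` flag are hypothetical.

```javascript
// Each step stands in for a script that injects one section of the page.
// With isolateErrors = false, the first failure ends the run, mimicking an
// uncaught error; with true, the failure is contained to its own section.
function renderSections(steps, isolateErrors) {
  const rendered = [];
  for (const step of steps) {
    try {
      rendered.push(step());
    } catch (err) {
      if (!isolateErrors) break; // nothing after the error runs
    }
  }
  return rendered;
}

const steps = [
  () => 'header',
  () => { throw new Error('reviews widget is undefined'); },
  () => 'product description',
];

console.log(renderSections(steps, false)); // [ 'header' ]
console.log(renderSections(steps, true)); // [ 'header', 'product description' ]
```

In the first run the product description never reaches the page, so neither a user nor a bot would see it.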

How to make sure Google and other search engines can get your content

1. TEST

The most popular solution to resolving JavaScript is probably not resolving anything (grab a coffee and let Google work its algorithmic brilliance). Providing Google with the same experience as searchers is Google’s preferred scenario.

Google first announced being able to “better understand the web (i.e., JavaScript)” in May 2014. Industry experts suggested that Google could crawl JavaScript well before this announcement. The iPullRank team offered two great pieces on this in 2011: Googlebot is Chrome and How smart are Googlebots? (thank you, Josh and Mike). Adam Audette’s 2015 piece, Google can crawl JavaScript and leverages the DOM, confirmed this. Therefore, if you can see your content in the DOM, chances are your content is being parsed by Google.

All of these studies are amazing and help SEOs understand when to be concerned and take a proactive role. However, before you determine that sitting back is the right solution for your site, I recommend being actively cautious by experimenting with small sections. Think: Jim Collins’s “bullets, then cannonballs” philosophy from his book Great by Choice:

“A bullet is an empirical test aimed at learning what works and meets three criteria: a bullet must be low-cost, low-risk, and low-distraction… 10Xers use bullets to empirically validate what will actually work. Based on that empirical validation, they then concentrate their resources to fire a cannonball, enabling large returns from concentrated bets.”

Consider testing and reviewing through the following:

Confirm that your content is appearing within the DOM.

Test a subset of pages to see if Google can index content.

Manually check quotes from your content.

Fetch as Google and see if content appears.

Fetch as Google supposedly occurs around the load event or before timeout. It's a great test to check whether Google will be able to see your content and whether or not you’re blocking JavaScript in your robots.txt. Although Fetch as Google is not foolproof, it’s a good starting point.
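When checking quotes by hand, it helps to know whether a phrase lives in the raw HTML source or only in the rendered DOM. A minimal sketch of that comparison follows; the helper names and the sample markup are hypothetical, and the tag stripping is a rough approximation rather than a real HTML parser.

```javascript
// Strip scripts and tags so a quote is matched against visible text only.
function visibleText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')
    .replace(/<[^>]+>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Reports where a quote from the page can be found.
function locateQuote(htmlSource, renderedDom, quote) {
  if (visibleText(htmlSource).includes(quote)) return 'in HTML source';
  if (visibleText(renderedDom).includes(quote)) return 'in DOM only (JavaScript-rendered)';
  return 'missing';
}

// Hypothetical page: the source is an empty app shell, the DOM has content.
const source = '<div id="app"></div><script>render()</script>';
const rendered = '<div id="app"><p>Free shipping on all orders</p></div>';
console.log(locateQuote(source, rendered, 'Free shipping on all orders'));
// in DOM only (JavaScript-rendered)
```

Quotes that come back as "in DOM only" are the ones worth checking in search results, since they depend on the bot executing JavaScript.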

After you’ve tested all this, what if something's not working and search engines and bots are struggling to index and obtain your content? Perhaps you’re concerned about alternative search engines (DuckDuckGo, Facebook, LinkedIn, etc.), or maybe you’re leveraging meta information that needs to be parsed by other bots, such as Twitter summary cards or Facebook Open Graph tags. If any of this is identified in testing or presents itself as a concern, an HTML snapshot may be the only option.

2. HTML SNAPSHOTS

What are HTML snapshots?

An HTML snapshot is a fully rendered page (as one might see in the DOM) that can be returned to search engine bots (think: a static HTML version of the DOM).

If search engines (or sites like Facebook) cannot grasp your JavaScript, it’s better to return an HTML snapshot than not to have your content indexed and understood at all. Ideally, your site would leverage some form of user-agent detection on the server side and return the HTML snapshot to the bot.

At the same time, one must recognize that Google wants the same experience as the user (i.e., only provide Google with an HTML snapshot if the tests are dire and the JavaScript search group cannot provide support for your situation).
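Server-side user-agent detection can be sketched roughly as follows. This is a minimal illustration, not a recommended production setup: the token list is an assumption to be adjusted after testing, the function names are hypothetical, and Googlebot is deliberately left out because Google prefers to receive the same experience as users.

```javascript
// Hypothetical list of crawler user-agent tokens that may need a snapshot.
// Googlebot is intentionally absent (see the note above).
const SNAPSHOT_BOT_TOKENS = ['bingbot', 'facebookexternalhit', 'twitterbot', 'linkedinbot'];

function needsSnapshot(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return SNAPSHOT_BOT_TOKENS.some((token) => ua.includes(token));
}

// Picks which version to return. Both must carry the same content; the
// snapshot is only the pre-rendered form of the JavaScript-driven page.
function chooseResponse(userAgent, snapshotHtml, appShellHtml) {
  return needsSnapshot(userAgent) ? snapshotHtml : appShellHtml;
}

console.log(needsSnapshot('facebookexternalhit/1.1')); // true
console.log(needsSnapshot('Mozilla/5.0 (Windows NT 10.0; Win64; x64)')); // false
```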

Considerations

When considering HTML snapshots, you must consider that Google has deprecated this AJAX recommendation. Although Google technically still supports it, Google recommends avoiding it. Yes, Google changed its mind and now wants to receive the same experience as the user. This direction makes sense, as it allows the bot to receive an experience more true to the user experience.

A second consideration relates to the risk of cloaking. If the HTML snapshots are found to not represent the experience on the page, it’s considered a cloaking risk. Straight from the source:

“The HTML snapshot must contain the same content as the end user would see in a browser. If this is not the case, it may be considered cloaking.” – Google Developer AJAX Crawling FAQs

Benefits

Despite the considerations, HTML snapshots have powerful advantages:

Knowledge that search engines and crawlers will be able to understand the experience.

Certain types of JavaScript may be harder for Google to grasp (cough... Angular (also colloquially referred to as AngularJS 2) …cough).

Other search engines and crawlers (think: Bing, Facebook) will be able to understand the experience.

Bing, among other search engines, has not stated that it can crawl and index JavaScript. HTML snapshots may be the only solution for a JavaScript-heavy site. As always, test to make sure that this is the case before diving in.

Site latency

When browsers receive an HTML document and create the DOM (although there is some level of pre-scanning), most resources are loaded as they appear within the HTML document. This means that if you have a huge file toward the top of your HTML document, a browser will load that immense file first.

The concept of Google’s critical rendering path is to load what the user needs as soon as possible, which can be translated to → "get everything above-the-fold in front of the user, ASAP."

However, if you have unnecessary resources or JavaScript files clogging up the page’s ability to load, you get “render-blocking JavaScript.” Meaning: your JavaScript is blocking the page’s potential to appear as if it’s loading faster (also called: perceived latency).

Render-blocking JavaScript – Solutions

If you analyze your page speed results (through tools like Page Speed Insights Tool, WebPageTest.org, CatchPoint, etc.) and determine that there is a render-blocking JavaScript issue, here are three potential solutions:

!!! Important note: It's important to understand that scripts must be arranged in order of precedence. Scripts that are used to load the above-the-fold content must be prioritized and should not be deferred. Also, any script that references another file can only be used after the referenced file has loaded. Make sure to work closely with your development team to confirm that there are no interruptions to the user’s experience.
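One common way to keep a non-critical script from blocking rendering is the async attribute (or defer, which waits until HTML parsing has finished). The sketch below shows the scripted equivalent; `loadScriptAsync` and the script URLs are hypothetical, and `doc` is the browser's `document` or a stand-in for it.

```javascript
// Injects a script without blocking HTML parsing. `doc` is the browser's
// document (or a stand-in); the script URL is hypothetical.
function loadScriptAsync(doc, src) {
  const script = doc.createElement('script');
  script.src = src;
  script.async = true; // download in parallel, run as soon as it is ready
  doc.head.appendChild(script);
  return script;
}

// Markup equivalents:
//   <script async src="/js/analytics.js"></script>
//   <script defer src="/js/carousel.js"></script>
//
// In a browser:
//   loadScriptAsync(document, '/js/analytics.js');
```

As the note above says, scripts needed for above-the-fold content should stay prioritized rather than being loaded this way.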

TL;DR - Moral of the story

Crawlers and search engines will do their best to crawl, execute, and interpret your JavaScript, but it is not guaranteed. Make sure your content is crawlable, obtainable, and isn’t developing site latency obstructions. The key = every situation demands testing. Based on the results, evaluate potential solutions.

Thanks: Thank you Max Prin (@maxxeight) for reviewing this content piece and sharing your knowledge, insight, and wisdom. It wouldn’t be the same without you.

About alexis-sanders —

Alexis Sanders is an Account Manager with Merkle, a full-service digital marketing agency. At Merkle, she helps Fortune 500 accounts drive their organic performance. She has a passion for many areas in SEO, including: performance analytics, semantic search, local, mobile, technical SEO, and developing stellar user experiences. In her free time, she serves as an active leader in Toastmasters International, practices Judo and Jiu Jitsu, and enjoys the arts.

Love your work. There are real gems in the #! story. Perhaps this is my Microsoft bias showing through, but anything that only Bing does is probably a bad idea. Friends don't let friends use Bing.

Do you have a good example of files that should or should not be accessible to search engines? I could follow the methodology of internal linking and the URL structure, but I am still a little fuzzy on the big picture idea of when to use it.

I forgot one thing. I live in Spain, and here we have a problem with cookies. When a user enters any website, the site has to show a popup with a notice about cookies, placed in the header or footer of the page. This matters because showing the notice means implementing JavaScript that is not ideal in SEO terms, and it also slows down the page's loading speed. I have been running tests, and the download can take more than 10% longer than if the JavaScript were not inserted. In any case, the law here in Spain forces you to display this message, and there are significant penalties for not doing so.

This post is really well defined and executed. The most important thing is the DOM. Many SEOs are not very aware of it. It is equally important to understand how JavaScript works with the DOM interface. I feel that the Fetch and Render implementation in Search Console was one of the remarkable steps from Google.

Thanks for publishing the details. It is not often someone in the marketing world groks technical specifics. I have specialized in single page apps and web performance for almost a decade now and discovered many of these principles to be truths a long time ago. I have a couple of thoughts.

I don't recommend the History API. While it looks good in theory, it is a mess in practice. You are much better off managing your content and assets yourself. The best practice here is to cache them in the browser. When I first started building mobile first SPAs localStorage was the way to do this. Today IndexedDB, unless you can use a service worker. Service workers and caching make my original architecture native in the browser, plus offload the processing to a separate thread.

Speaking of threads, you want to minimize the amount of client-side JS needed to parse and execute. JS execution is rendering blocking. This is why the Page Insights tool indicates tag managers are killing your performance.

Way too many sites add too much JavaScript to their payload. 90-95% is never used. However, the browser must download the file, uncompress it (if properly compressed), parse the script, then execute the script. On your developer's i7 with 16GB of RAM running over the bus there is no perceived latency. Add the network and most likely a mobile processor, and the 20ms needed to parse jQuery turns into 1000+ms. Because the browser executes all rendering processes (HTML, CSS & JS) on the same thread, nothing else happens. The page just sits there.

Now add 500-800kb of 3rd party scripts, which you do not control, and your page is sitting there, doing nothing, or worse, once it is done parsing that script it re-renders the entire page and most likely jumps around the screen. Limit your 3rd party scripts to only 1 analytics provider (like Google Analytics) and only the remarketing pixels you absolutely need. If it is possible to place the pixel without JavaScript, choose that option. 3rd party scripts are killing the web these days.

You also mentioned some of the popular fast food frameworks like Angular, React & Ember. These frameworks are all super slow (I don't care how Facebook markets React). Soasta released research they conducted on over 400 Billion user sessions last year at Velocity and show a definite consumer drop off when fast food frameworks were used.

Kudos for mentioning the critical rendering path. I have been trying to teach this to developers for years. They don't seem to care. This is driving the browser vendors nuts! Understanding how the browser renders is critical (see what I did there LOL) to getting your content on the screen and a way the user can interact. You have less than 3 seconds and honestly you have 1 second before they become disengaged and start watching cat videos on YouTube :). That 3 seconds is mobile, 3G, on a crappy phone BTW.

And that is my final point. Always test your experience on a slow network connection, on a crappy phone. If it loads quickly there, it loads quickly everywhere.

History API as a mess - I'd be interested to hear more about your thoughts on this.

Leveraging PWAs (including IndexedDB capabilities) - PWAs are a really exciting field that crosses web dev, SEO, and (even) mobile app dev. There is a lot of opportunity for sites without a native app experience (on Android) and improving the UX. It's really fascinating to see some of the things one can do with PWAs (my favorite for e-commerce is the ability to purchase items offline (and process upon re-connection). I find this to be incredibly seamless and impressive). PWAs and their capabilities/considerations for SEO is likely a whole whitepaper/blog post/presentation/guide/novel. TL;DR - I like where your mind's at here.

Too much JS/Elements - Yep. #WebObesity. We have too much of everything. I love Google's mantra - the fastest request is the one not sent (lol). It is important to make sure that all resources are being used and that dead code/markup is not dragging performance down. There are tools that can help identify unnecessary elements.

Frameworks/Libraries - It's going to be an uphill battle with developers to talk frameworks and libraries (the big mac is too tempting!!). However, I love the research and it's definitely a worth-while conversation!

CRP and Developers relationship - Thank you! Definitely an uphill battle with this one. Hopefully, the more people talk about it, the more it'll be built into the development process!

We think a lot alike ;) Are you sure you have not been in the audience at one of my talks :) BTW, PWAs are more than Android. Edge is shipping full support in the next update. Plus the way I have been building sites uses the same techniques, so you can 'polyfill' on iOS and other outdated platforms.

Alexis, have you done any testing around the weight of content when it's rendered on the page with JS instead of HTML appearing directly in the source code?

I've seen Google is spotty, at best, when understanding a site's JS and remain incredibly skeptical about its ability to a) crawl JS and b) weight the content the same as if it were to appear in the source code. Great intro of JS to forward onto a non-technical SEO team :).

Great article - I do have a couple questions for people who work in the SEO field who know Javascript/other coding languages...

I work mainly with Local SEO clients, however have a few national clients as well. I've been working in the SEO field for about 2 years now, and hope to open my own agency soon.

I'd love to learn how to code Javascript, however only if it'll be beneficial for my career. I'd love to get some opinions on if it's worthwhile to learn, if a different language might be better, or if I should keep advancing my article writing (which I know has room for improvement).

It's important to understand JavaScript, but beyond that it's not worth your time to memorize/know intricacies that can easily be looked up in seconds on GitHub or Stack Overflow.

I always encourage people to learn and play around with Python. It's incredibly powerful and has so many libraries to do nearly anything!!! Plus, it's an essential player in most Machine Learning/Data Science applications, which is the future (imo).

There are a ton of great resources online. YouTube is also a great place to get started and watch demos that you can try out.

Hi Alexis! Thank you for your post! It's really interesting. Your explanations of how JavaScript works and how crawlers can understand it are very clear. Your point about asynchronous JavaScript is very interesting too. In fact, I'm now working to avoid render-blocking JavaScript on my web pages.

This is brilliant. I am actually trying to get very technical with my SEO and want to learn a lot more JavaScript. In fact, does anyone know any really good technical (JavaScript) SEO courses near the Manchester area?

From what I've seen, Google is doing a pretty good job indexing textual content that's populated with JavaScript. There are of course extenuating circumstances, which make testing important. Every implementation has its own quirks.

I'm hopeful for Bing. I've heard they crawl headless, but in terms of indexing textual content I haven't seen that they're there yet with clients or public sites. There's a neat public site by @stepchowfun, http://www.doesgoogleexecutejavascript.com/, that I'll check every few weeks for Bing, and typically I'll see the "does not execute JavaScript" message populate within the SERP (Google shows the opposite, positive messaging, within the SERP).

Great piece Alexis! Some great insights there in to the more technical sides of it. We are currently progressing our js knowledge, and to have some things to bear in mind is great. The critical path comes up a lot on Google training tools, so glad you mentioned that.

The CRP is a really great concept Google introduced. It seems that it is sometimes harder to put in practice (especially retroactively); however, the more knowledgeable dev teams are about such concepts, the more it can be ingrained into the initial creation process.

Thanks for sharing. It will really help us. HTML, CSS, and JavaScript all play a major role in developing a webpage. We should keep all of these strategies in mind, because page load time also increases bounce rate. Most people will not wait for a website to load.

Congratulations, incredible work. The illustrated examples are fantastic too. It has been very helpful for using async with JavaScript. Can any library be loaded with the async method? Thanks so much.

Hi Alexis, I just signed up to Moz and am finding a lot of interesting articles. I'm really getting into this SEO stuff (not understanding it all just yet, but getting there). I have a query about the internal links part of your blog and the #. Can you tell me if this tag at the end of my link harms my crawlability within Google? E.g., driving-lessons-in-burton#dm, as this is what appears in the browser bar once clicked from a button on my site. Thanks in advance, Mark

Google will not specifically crawl or index /driving-lessons-in-burton#dm (however, since it crawled /driving-lessons-in-burton the content on the page will likely be indexed and ranking for related queries (assuming #dm's content lives on driving-lessons-in-burton)).

Hi Alexis - lots to take in! Thanks for the heads-up about ‘The Lone Hash ’ I imagine it could impact seo for one page websites? - for example, using URLs such as #services #portfolio #contact to reach a section on the page. Obviously it can be harder to rank a single page website for multiple keywords anyway so maybe best avoided, although the layout is popular with some clients.

Great tutorial Alexis..! firstly, i am going to bookmark. The post cleared my doubts and Interesting to see dancing JavaScript dead-pool toy and Vin Diesel for DOM :)

True, JavaScript/AJAX is a great option for a great web experience, but unless it's being indexed, it's probably useless! I am using Fetch as Google, but sometimes the API calls do not complete during the fetch, so I personally prefer the server-side rendering option. One more thing: what's your opinion of the isomorphic JavaScript approach? Is it really useful?

I was exactly looking for this certain information for a long time. Thanks a lot Alexis for this post.

Hi Kuldeep. I completely agree. If search engines are not able to index content, both the UX and BX (bot experience) suffer. In terms of the fetch/render, it can definitely be a bit spotty (which is not confidence inspiring). For Google, I would still stick to testing, because the Google team really wants the UX.

So, isomorphic JavaScript is a form of server-side rendering that works in unison with the client side. High-level SEO answer - It can solve a JS/SEO problem elegantly. Longer answer - it also comes with some dev challenges. AirBnB has a solid write-up and Lullabot covers the challenges well.

I liked your article. Quite extensive and with lots of information. I think the best option is to make websites that are easy for search engines. Building sites with some programming languages will not help us in our work as SEOs. So creating search engine-friendly websites is a good idea.

My chat site doesn't use a lot of JavaScript. I checked it on Google's speed test, and it shows I still have to reduce a lot of scripts on my site. If Google doesn't follow its own guidelines, why does it push others to follow them in order to achieve better on-page SEO? Just check google.com: it literally hits all the red marks on a page speed test.

Interesting. I've found Page Speed Insights to be really useful, but I've also seen it return with some non-relevant recommendations (example, mentioning tag managers as render-blocking, technically they should be synchronous since they manage other tags).

Interesting. If they're not using the AJAX crawling scheme, it might be worth updating that; however, if they are, even though the method is deprecated by Google, it is still supported. I wouldn't condone it, but situation is vital here.

Great information. It's really helpful for an SEO person. Now I can easily understand basic JavaScript and DOM-related information, and I have a deeper understanding of everything regarding crawlability. Thanks for sharing.