International SEO: what problem is there with the home page?

Every agency doing international SEO faces a series of problems that can be resolved in many different ways.

Despite the numerous posts and presentations about international SEO published all around the Web, search engine optimization of multilingual and multiregional websites always implies multiple challenges and trials for any SEO expert. This is due to the astounding variety of situations that can arise, for which it is hard to figure out specific and verified solutions that can also be easily transferred from one case and applied into another.

The scenarios can be so diverse that notwithstanding multiple recommendations for or against one or another course of action, in the end, it is the SEO expert’s responsibility to correctly evaluate the multiple variables involved in a project and to adopt the best solution in each particular case.

This solution is, rather frequently, a compromise between the optimal scenario from an SEO point of view, which is technically possible, and what is desirable from a corporate standpoint.

The issue of the home page in multilingual and multiregional websites

There is one aspect in particular about which it is rather difficult to find information: what we call here “the home page problem”.

And really: when a website has multiple-language versions, or even specific versions for the many different countries and markets, what content should be displayed at the root page of the domain, the default home page?

Even though, as I’ve previously mentioned, the range of possible cases is almost infinite, in this post we are going to consider the scenario of international versions distributed into subdirectories. This content structure is, by default, always my favorite option.

In this scenario, we could structure the different versions as subdirectories by language only (for example: http://www.mydomain.com/es/, http://www.mydomain.com/en/, and so on) or by language and country (for example: http://www.mydomain.com/es/MX or http://www.mydomain.com/es-MX).

And in each case, we would have to configure the correct international targeting from Google Search Console and the corresponding hreflang/alternate link elements, if we wish to target one specific country.

In this scenario, each subdirectory would display the default home page for its version.

Thus, in the first case, http://www.mydomain.com/es/ would be the default page for our Spanish-speaking audience, and http://www.mydomain.com/en/ would be the default page for our English-speaking audience.

And in the second case, http://www.mydomain.com/es/MX/ or http://www.mydomain.com/es-MX/ would be the default page for our audience in Mexico.
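To make the two directory layouts concrete, here is a minimal Python sketch that derives the hreflang value matching each kind of subdirectory. The helper name `hreflang_from_path` is hypothetical, and the sketch assumes that the first path segments are always a two-letter language code, optionally followed by a two-letter country code. A real site would rather check against an explicit list of its versions, so that ordinary two-letter paths are not misread.

```python
import re

# Assumption: versions live at /xx/, /xx/YY/ or /xx-YY/, where xx is a
# two-letter language code and YY an optional two-letter country code.
_VERSION_PREFIX = re.compile(r"^/([a-zA-Z]{2})(?:[/-]([a-zA-Z]{2}))?(?:/|$)")


def hreflang_from_path(path):
    """Return an hreflang value such as 'es' or 'es-MX', or None if the
    path does not start with a language (and country) subdirectory."""
    match = _VERSION_PREFIX.match(path)
    if match is None:
        return None
    language = match.group(1).lower()
    country = match.group(2)
    if country:
        return f"{language}-{country.upper()}"
    return language
```

With this sketch, both `/es/MX/` and `/es-MX/` map to the same value, `es-MX`, while `/es/` maps to `es` and the root path `/` maps to nothing at all, which is precisely where the home page problem starts.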

Perfect!

Except for one little detail: what content would our users find if they accessed the root domain, http://www.mydomain.com/?

And what is even more important from the point of view of an SEO: what content would Google find at the URL with the most popularity on our website?

This dilemma is what we call “the home page problem”.

We call it a problem because there is no obvious answer to this question. There are many options, and each has its own advantages and disadvantages.

Let’s consider some possible solutions:

Pre-home page with a language selector.

Page at the root domain with a default preferred language or language/country.

Content personalized by IP address or user-agent (UA) at the root domain.

Non-indexable root domain URL.

Now, let’s see the advantages and disadvantages of each solution.

Pre-home page with a language/country selector

This solution proposes placing a language and/or country selector as (essentially) main content at the root domain.

For example, zara.com implemented this solution, and the root domain is exclusively dedicated to establishing the language and market of the user who is browsing the page.

The server analyzes the IP address and user-agent of the user to establish the default option in the language/country selector, and once these settings are confirmed by the visitor, they are stored in a cookie file in the user’s browser.

This way, next time the same user visits the root domain of the website, this configuration of the cookie file will automatically redirect them to the chosen subdirectory, which could be the one corresponding, for example, to Spain in Spanish: www.zara.com/es/es.
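The cookie-driven behavior described above could be sketched roughly as follows. This is only an illustration and not Zara’s actual implementation: the cookie name `market`, the stored values, the paths and the choice of a 302 status are all assumptions made for the example.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "market"  # hypothetical cookie name

# Hypothetical mapping from the stored cookie value to a subdirectory.
MARKET_PATHS = {
    "es-es": "/es/es/",
    "es-en": "/es/en/",
    "mx-es": "/mx/es/",
}


def handle_root_request(cookie_header):
    """Decide what to do with a request to the root URL.

    Returns (status, location). A visitor with a valid stored choice is
    redirected to that subdirectory; anyone else, including crawlers
    that send no cookies, gets the selector page (200, no redirect).
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    stored = cookie[COOKIE_NAME].value if COOKIE_NAME in cookie else None
    if stored in MARKET_PATHS:
        return 302, MARKET_PATHS[stored]
    return 200, None
```

Note that a request without cookies always receives the selector page, so a bot that does not accept cookies still finds the links to every version there.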

Advantages

It enables us to establish language and market variables, to correctly configure the product portfolio, currency, prices, applicable taxes, shipping options, as well as many other variables required for an online store to function properly.

Given that these variables are stored in a cookie file, the user’s next visits to our website will direct them to the correct version in a rather transparent way, which favors engagement and conversion.

There isn’t a version that has more “weight” than others, when it comes to popularity concentration. The root domain equally distributes its popularity juice between the many different home versions, as it usually includes links to every one of them (as we can see in the cached version of the home page of Zara):

The root domain of Zara.com includes links allowing search engines to discover the rest of its website’s versions.

Disadvantages

Usually, the root domain is the URL receiving the most internal and external links. If we only include a language/country selector as the main content of this URL, we are wasting part of the strength of our most powerful URL, which ends up diluted among the home pages of each country.

It adds an extra click for users visiting our page for the first time, counting from the moment they land at the root domain, to when they finally get to the products they are really interested in. We already know that the smaller the distance in clicks between the home page and any other page is, the better our rankings and conversion ratio get.

If we make navigation conditional on the existence of these cookie files or session variables (for example, by forcing a redirection to the root domain, so that the user can set their language and country options), search engine bots may have difficulty crawling our website correctly, as they do not accept cookies.

When is it recommended to use this option?

When our audience is very fragmented between many different countries and there isn’t a country that concentrates a significantly greater share of our demand.

When language and country variables are crucial for the personalization of the offer: types of offered products, available product families, currency, prices, shipping options, applicable taxes, etc.

When strong brand awareness among users reduces the usual share of organic traffic, so the site depends less on good rankings for generic/non-branded search queries.

When the domain authority is so high, that despite our root domain’s popularity juice being distributed into multiple subdirectories, the brand still holds an advantage over its competitors.

Which aspects should we take into consideration?

We should set the “default” content that search engines can crawl in the absence of cookies and session variables.

We must take into account the effect of any personalization the server applies based on the originating IP address or the User-Agent of the browser. The IP geolocation and User-Agents of the search engine bots should not prevent them from crawling the multiple versions of our website.

We must configure the alternate/hreflang link element in the different versions of the home page (only there), pointing to the root domain with an “x-default” option (besides the corresponding alternate/hreflangs to the home pages of all versions).

We must include alternate/hreflang link elements at the root domain, pointing to the home page of each version, besides the self-referential alternate/hreflang link.

In this case, the home page displayed by Google in the results when we search the brand (for example, “zara”) is the one stated in the corresponding alternate/hreflang element. Thus, if we search “Zara” in Google.es, the displayed home page will be www.zara.com/es.
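The alternate/hreflang configuration recommended for this scenario can be sketched with a small Python helper. The function name `hreflang_links` and the example versions are hypothetical; the same set of link elements would be placed at the root domain and on the home page of each version, with the root domain declared as “x-default”.

```python
from html import escape


def hreflang_links(root_url, versions):
    """Build the alternate/hreflang link elements for the home pages.

    `versions` maps an hreflang value (e.g. 'es' or 'es-MX') to the
    path of that version's home page. The root domain is added as the
    x-default entry.
    """
    base = root_url.rstrip("/")
    links = []
    for code, path in sorted(versions.items()):
        href = escape(base + path)
        links.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{href}" />'
        )
    root_href = escape(base + "/")
    links.append(
        f'<link rel="alternate" hreflang="x-default" href="{root_href}" />'
    )
    return links


if __name__ == "__main__":
    example = {"es": "/es/", "en": "/en/"}  # hypothetical versions
    for line in hreflang_links("http://www.mydomain.com/", example):
        print(line)
```

Because every page in the set emits the full list, each version also carries its own self-referential entry.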

Home page with a preferred language or language/country by default

This scenario suggests setting up the default home page that corresponds to the largest market for our business at the root domain.

For example, apple.com by default displays the home page of its version for the US market at the root domain, and the rest of the countries “hang” from their corresponding subdirectories.

If we access www.apple.com/es, we will find Apple’s website for Spain, with its corresponding home page, whilst if we just enter www.apple.com in the address bar, the default page for the US market will be displayed.

Advantages

Compared to the previous option, setting up the default home page of our largest market at the root domain has the advantage of this content being supported by the popularity of the URL address with the greatest number of links of our website.

It shortens the distance in clicks from the moment a user lands on our website to when they see something they’re interested in. By skipping the language selector, users are exposed to our products, promotions, etc. from the very first moment, favoring a stronger engagement, lower bounce rate on the home page, better conversion rate, etc.

It concentrates the greater portion of popularity in the version of the website competing in the most profitable market for the brand.

Both users and search engines can browse our website, without the need for cookies or session variables, minimizing the risk of indexability issues.

Disadvantages

Users who don’t speak the language used in the main version might find themselves a little lost before they find the language selector to access their preferred version.

It is much easier for users to browse the different versions and compare the prices of products in the various markets. This is something that often makes the owners of online stores feel kind of uncomfortable, as price differences between countries can raise suspicions among customers.

When is it recommended to use this option?

When one of the markets we cater to concentrates a large share of the total demand, so it is worth prioritizing the positioning of one particular version over all the available ones.

When we want to identify a website, at a corporate level, with one country in particular, such as the place of origin of a company or the country to which an institution belongs.

Which aspects should we take into consideration?

In this scenario, there are no problems when it comes to crawlability of the website in the absence of cookies or session variables.

In principle, it is not recommended to configure any alternate/hreflang link elements with the “x-default” option of default content. However, it is recommended to provide default content directories in every language, to which we can direct users whose countries don’t have a specific version. For example, apple.com has the /lae/ subdirectory for the entire Latin American audience browsing the Internet in English, which doesn’t correspond to any specific country directory of that geographic area.

Sometimes it is important to prioritize user experience over SEO guidelines.
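The idea of default content directories for users without a country-specific version can be sketched as a simple lookup with fallbacks. Only the /lae/ directory comes from the apple.com example above; the country directories, the country-to-region table and the function name are hypothetical.

```python
# Hypothetical country-specific directories.
COUNTRY_DIRS = {"ES": "/es/", "MX": "/mx/"}

# Hypothetical grouping of countries without their own directory.
REGION_OF = {"PE": "latam", "BO": "latam", "PY": "latam"}

# Default content directories per (region, language). Only /lae/
# (Latin America, English) is taken from the example in the text.
FALLBACK_DIRS = {("latam", "en"): "/lae/"}


def pick_directory(country, language):
    """Return the directory to offer a visitor: the country version if
    it exists, else a regional default for their language, else the
    root home page (the largest market's version)."""
    if country in COUNTRY_DIRS:
        return COUNTRY_DIRS[country]
    region = REGION_OF.get(country)
    return FALLBACK_DIRS.get((region, language), "/")
```

In this sketch the result is meant to drive a suggestion or a link in the language selector rather than a forced redirect, so users and bots can still reach every version directly.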

Personalized content

This scenario corresponds to websites hosted on a server where we’ve implemented some kind of sniffer tool, capable of identifying the IP address of the user and the language of their browser and/or operating system, and of using this data to display personalized content on the home page. In this case, the personalized content would correspond to the location of the IP address and/or the language of our user.

If we look at it from the visitor’s perspective, to be met with content in their language that is also relevant to their location obviously represents a very positive experience.

The problem appears when it comes to crawlability by search engines.

Even though Google has, since 2015, crawled websites from IP addresses located all around the world, most of its visits still come from IPs located in the United States, and the user-agent of its crawlers identifies as a user browsing the Internet in English. We need to keep these two aspects in mind, as in this case they influence the “default” version that Google would see.

That is, whilst every user visiting the website would see the correct content corresponding to their language and/or country, the home page that Google would find at the root domain is the one that the server would display to a North American user, provided the website offers an appropriate version for that user profile.

And then, following the links to the rest of the versions, it would find the rest of the available home pages.

In conclusion, we can say that this implementation is similar to the previous option, where Google would treat one version as the “preferred” one, and the rest of them as “secondary”.

Except that in the previous scenario it is us who get to decide which version we want to give higher priority to, and in this case the version tracked as the main one depends on factors that are much more complex and harder to predict.
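To see why the version served to a crawler is hard to predict, it helps to look at how a server might pick a version from the browser language. The following is a minimal sketch that only considers the Accept-Language request header; a real setup would usually combine it with IP geolocation. The function names and the set of available versions are hypothetical.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, ordered
    by descending q-value (tags with q=0 are dropped)."""
    preferences = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        preferences.append((tag.strip().lower(), quality))
    # sort() is stable, so equal q-values keep the order of the header.
    preferences.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, quality in preferences if quality > 0]


def choose_version(header, available, default):
    """Pick the version to display at the root domain.

    `available` is a set of lowercase codes such as {'en', 'es'}.
    When nothing matches, or no header is sent, `default` is used;
    that default is what a client sending no language information
    would receive.
    """
    for tag in parse_accept_language(header or ""):
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default
```

Whatever value we pass as `default` is, in effect, our choice of the version shown to any visitor or bot the server cannot classify.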

Advantages

The main advantage of this option resides in better user experience. Any person from any country and language accessing the root domain will see content that was specifically designed for their segment of audience.

Disadvantages

It is much more difficult to control how search engines will crawl the website, as their crawling bots can identify themselves as users coming from different IP addresses and geolocations, as well as browsers configured to different languages.

If the website is later crawled from different IPs, it is possible that the root domain will be indexed alternately in different languages.

There is a risk of duplicate content being detected between the crawled content at the root domain (for example, www.mydomain.com), which Google would see by default, and the one corresponding to the home page of the US version (for example, www.mydomain.com/us).

If we choose to include dynamic canonical link elements, we will lose the popularity of the root domain, as it might not be indexed.

When is it recommended to use this option?

When our top priority is user experience and the website is supported by other means of traffic generation, so that organic search accounts for only a relative share of the total traffic.

If we choose to implement dynamic canonical link elements, the root domain URL won’t be indexed, and the behavior in this case will be exactly as described in the next scenario.

Which aspects should we take into consideration?

If North America is not our priority or we don’t have a specific version for this market, we must plan the content that the server will display by default when it can’t find the variables required to personalize the content, as this is what Google will probably see.

In this scenario, we could have, on one hand, a case of content indexed in several languages under the same URL (as a result of several visits by Googlebot from IP addresses geolocated in different countries); or on the other hand, indexation of the same content with two different URLs (duplicate content), as Google would find the same default content in the root domain and the home page of the subdirectory oriented to the North American market.

For that reason, it is recommended to include a dynamic canonical link element pointing to the canonical URL corresponding to each different case. This way, when Google crawled the “North American” content, it would find at the root domain a canonical link element pointing to www.mydomain.com/us/. This means that it wouldn’t index this content with the root domain URL, but with the URL of the correct subdirectory (which would help us avoid duplicate content, but at the same time would de-index the root domain).
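A dynamic canonical of this kind amounts to emitting, at the root domain, a canonical link element that matches whichever version the server has just decided to display. A minimal sketch, with a hypothetical function name:

```python
from html import escape
from urllib.parse import urljoin


def canonical_for_root(root_url, served_version_path):
    """Return the canonical link element to print at the root domain
    when it is displaying the content of `served_version_path`."""
    target = urljoin(root_url, served_version_path)
    return f'<link rel="canonical" href="{escape(target)}" />'


if __name__ == "__main__":
    # Root domain displaying the "North American" content.
    print(canonical_for_root("http://www.mydomain.com/", "/us/"))
```

The trade-off stays the same as described above: the duplicate disappears, but so does the root domain from the index.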

Non-indexable URL

This last scenario combines certain aspects of the cases we’ve studied earlier in this post. In this scenario, the URL address corresponding to the root domain is not indexed.

This can happen due to the configuration of some sort of redirection at the root domain (redirection that is frequently personalized in a similar way to that of the previous case, taking into account the geolocation of the IP address or language identified by the user-agent of a browser), or due to the implementation of canonical link elements pointing to URLs that correspond to the different versions.

In this particular case, no content will benefit from the popularity of the URL with the highest number of inbound links of the entire domain (the root domain), so it will behave as it would in the scenario where the default page was entirely dedicated to a language/country selector.

Advantages

A personalized redirection is almost transparent for the user, so their user experience, on a content level, will be just as satisfactory as with the personalized home page seen in the previous (third) scenario.

Disadvantages

If all the redirections are 301, the root domain won’t be indexed. This means losing the popularity of the page with the most weight on the domain.

The IP address/language detection, and the subsequent redirection will imply an additional delay for users in their access to the content of our website.

When is it recommended to use this option?

When we use a content management system that requires us to “start off” from a subdirectory, or there is another technical constraint demanding it to work this way.

Which aspects should we take into consideration?

If we personalize redirections on the home page to redirect users towards the subdirectory corresponding to their IP and/or language, then these redirections should always be 301.

Alternatively, and if we wanted any of these versions to be identified as the default version, this redirection could be 302. In that case, the canonical link element of the home page of this subdirectory should point to the root domain. Only one of the versions should be configured this way, and the alternate/hreflang link elements should be set in accordance with this configuration (all of them pointing to each subdirectory, except the one set by default, which should point to the root domain). Evidently, in this case we would step out of the scenario of the non-indexable URL, as the root domain would, in fact, be indexed.
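The two configurations described in this section can be summarized in a small planning helper. This is a sketch under stated assumptions: the function name, the dictionary keys and the example versions are hypothetical, and it simply encodes the rules above (301 everywhere, or a 302 plus a canonical and hreflang pointing to the root domain for the one default version).

```python
def plan_root_setup(root_url, versions, default_code=None):
    """Describe, for each version, how the root domain relates to it.

    `versions` maps an hreflang value to the path of its home page.
    With no `default_code`, every redirection from the root is a 301
    and the root is not indexed. With a `default_code`, the
    redirection to that version is a 302, and its canonical and
    hreflang target are the root domain instead of the subdirectory.
    """
    base = root_url.rstrip("/")
    root = base + "/"
    plan = {}
    for code, path in versions.items():
        home = base + path
        is_default = code == default_code
        plan[code] = {
            "redirect_status": 302 if is_default else 301,
            "canonical": root if is_default else home,
            "hreflang_href": root if is_default else home,
        }
    return plan
```

For example, calling it with `default_code="en"` yields a 302 and the root URL for the English version, while every other version keeps a 301 and its own subdirectory URL.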

Conclusion: there is no obvious solution

When testing all these behaviors server-side, tools like HMA!, VPN and Web Sniffer come in very handy, as they enable us to modify our IP address, as well as our user-agent to better understand a server’s configuration.

Extensions like Web Developer for Firefox are also incredibly useful for this purpose because they enable us to block cookies and even edit their values.

By playing with all these elements, we will be able to identify the different personalization settings of the server, and it will be much easier for us to understand how Google is crawling the content.
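Part of this testing can also be scripted. The sketch below uses only the Python standard library to request a URL with a chosen user-agent and browser language, without cookies and without following redirections, so that we can see the status code and the redirection target the server returns. The function names are hypothetical, and this only varies the request headers: changing the IP address still requires a VPN or a proxy.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirection."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response then surfaces as an HTTPError


def build_probe(url, user_agent, accept_language=None):
    """Prepare a cookie-less request with the given headers."""
    headers = {"User-Agent": user_agent}
    if accept_language:
        headers["Accept-Language"] = accept_language
    return urllib.request.Request(url, headers=headers)


def run_probe(request, timeout=10):
    """Send the request and return (status, Location header or None)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as error:
        return error.code, error.headers.get("Location")


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    # Placeholder URL from this post; replace it with the site under test.
    probe = build_probe("http://www.mydomain.com/", googlebot, "es-ES")
    print(run_probe(probe))
```

Comparing the results for several combinations of user-agent and language gives a first picture of the server’s personalization rules before checking them from other IP addresses.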

However, our job does not always involve analyzing the indexability of a configuration that has already been implemented. Sometimes we need to offer recommendations regarding the best possible setup for a correct positioning in international search engines, such as Yandex or Google. In this instance, it is very important to analyze each possible scenario and the aspects we should prioritize in every case, and to consider the demands of our technological infrastructure in order to choose the most appropriate option for our home page.

What do you think? What is the best solution to the “home page problem” when it comes to doing international SEO?

Comments

Hola Fernando! As the others said, it is certainly the best post on the subject that I have ever read. Thank you so much for this great post.

Your post helped me to understand some things, but multi-regional and multi-language things are driving me crazy. We run an international e-commerce site and have different regional/language versions. We don’t have many products, but we have a version for each country in its own language and also in English.

Google didn’t index our regional versions, but it did index our main version.

Let me tell you what we did:
-set up hreflang tags
-set up canonicals pointing to the canonical URL corresponding to each different case
-set up specific subdirectories
https://www.website.com/ (en-us site and our main site)
https://www.website.com/en-gb
https://www.website.com/en-ca
https://www.website.com/fr-ca
https://www.website.com/fr-fr
https://www.website.com/en-fr
https://www.website.com/es-es
https://www.website.com/en-es..…
-set up automatic GEO IP redirects (301 redirects)
-created a sitemap index and a different sitemap for each regional/language version
-created translations for each different language (but pages using English have same content)

It was not working, so we tried “fetch as google” and understood that Googlebot was being redirected to the USA/main version and didn’t crawl our regional sites. At that point, we thought that the automatic GEO IP redirect was the problem and we tried to make some changes. We are using JS redirects in this solution.

We implemented the following logic to try to solve the situation and let Google index our site:
-When someone visits any of our pages, if no cookie is enabled: all redirects between locales are disabled, so they land on the requested page and are not redirected
-When someone visits any of our pages, if a cookie is enabled: they are redirected based on the cookie or on the browser’s language and IP settings.

But even so, we are still having problems. Google is not indexing our different regional versions. We tried “fetch as google” again and it looks like Googlebot is not redirected, but if you use “fetch and render” it is redirected.
Right now I don’t know what the problem is. I am thinking about the 301 redirects; maybe we should change them to 302. I am also thinking about all those pages with English content; maybe we have many pages sharing the same content and it would make sense to remove all English content pages that are not for US, CA or UK.

Hi, Fernando. I really never thought that I would find a solution to my problem just by googling, and then I hit your post. I suspected that this solution was what I needed but I was not sure. Thank you so much! In my case I was searching for this:

Alternatively, and if we wanted any of these versions to be identified as the default version, this redirection could be 302. In that case, the canonical link element of the home page of this subdirectory should point to the root domain. Only one of the versions should be configured this way, and the alternate/hreflang link elements should be set in accordance with this configuration (all of them pointing to each subdirectory, except the one set by default, which should point to the root domain). Evidently, in this case we would step out of the scenario of the non-indexable URL, as the root domain would, in fact, be indexed.

Because I have duplicate URLs between mydomain.com/en/ and mydomain.com. The latter is not used. I think the key, as you say, is putting a canonical in just one language version, in this case mydomain.com/en, pointing to mydomain.com, and not putting any canonical in mydomain.com/fr/ or mydomain.com/es/. I am keeping the hreflang without changes.

I will give it a try and see if mydomain.com/en/ disappears from Google search, because normally with this change I should see only the canonical version, mydomain.com

Hi Youssef. Thanks a lot for your feedback. I am very glad that my post was helpful for you. I think you've got it right. Put a canonical in mydomain.com/en/ pointing to mydomain.com. Also, don't forget to change links pointing to mydomain.com/en/ and point them to mydomain.com instead. You could also set a canonical in both mydomain/fr/ and mydomain/es/ but in this case, both canonicals should be pointing to their own URLs, not to the root domain. If you give it a try, I will appreciate if you share the results here. Thanks a lot.

The result now is that mydomain.com/en/ doesn’t show in the search results anymore. So it’s OK, I don’t have duplicate content anymore.
Just wanted to point out that the first option is what I used before:

If we personalize redirections on the home page to redirect users towards the subdirectory corresponding to their IP and/or language, then these redirections should always be 301

But now I avoid 301 redirects whenever I can because of a nasty behavior of Chrome caching 301 redirects. So if a user hits mydomain.com and a 301 redirects them to mydomain.com/en/, but the user then changes the language to Spanish, for instance mydomain.com/es/, the next time he/she hits mydomain.com it will redirect to the English version again and will not take the cookie into account because of the 301 cache behavior.

Hi Fernando. This is without question the best post I've seen on this topic. Thank you. I'd like to ask your opinion on a variation of the "Personalized Content" option that I'm considering. Imagine a global e-commerce site where US is the most important market, but there are at least 15 other critical markets worldwide. Each and every URL on the core domain (www.example.com), including the homepage (www.example.com/homepage), employs a redirect (could be 301 or 302 -- I'd prefer the former) that sends the user to the correct language-region variant of that URL (www.example.com/en-us/homepage), based on a combination of the user's geoIP and browser language. Hreflang would be implemented sitewide to help Google with targeting and to reduce the likelihood of duplicate content issues. The x-default would point to en-ww URLs. Country-specific prices and currency (USD in the USA, GBP in the UK, etc) appearing on the pages would also be a strong signal to the search engines that the pages are not duplicates. A couple things concern me about this setup, especially the redirects themselves. What are your thoughts? Thanks again for explaining this topic with such clarity.

Hi Brian. It makes me really happy that you liked my post. Thank you. In the scenario that you describe, you would have a serious problem with Googlebot because most of the time Googlebot identifies itself as an English-speaking user with a US geolocated IP. If I understand your setup right, you would configure 301 redirects on each and every URL of the site. So, you are forcing every browser (including robots) to go to the "right" URL, as far as the server can tell from the user-agent string and the IP geolocation. If this is your desired behaviour, then you would be forcing Googlebot to browse just the English/US targeted version of the site. And even if it can find the links to crawl other versions, the redirects on these other versions would force it to go to the US version again. The most probable result would be that Google would index just the English/US targeted version of the Web site, in spite of the fact that you included alternate/hreflang link elements. So, I would not recommend that setup and would limit IP/UA based redirects just to the root domain. Besides, you should also consider that IP geolocation is far from 100% accurate, so you should leave the user the ultimate decision about which version to browse.

Fernando it's very kind of you to take the time to respond individually. Much appreciated! One thing I explained poorly: the 301-redirects would not be on every URL, but only on the non-regional URL variants. So www.example.com/page would redirect, but www.example.com/en-us/page would not redirect. I would expect regional URL variants to rank the majority of the time. (We will need to use non-regional URLs for other purposes, not related to search). Does this change your opinion at all? Otherwise, what you wrote about the accuracy of IP geotargeting is very true. Good point. Thanks again.

Hi Debashis. I don't recommend using the keyword meta tag at all. Search engines ignore it and it only serves to inform your competitors which keywords are important for you. Adding H1 tag on your home page depends mostly on the WP template you are using.

You are right. I was considering just the SEO perspective of it. In case your CMS or your internal search engine needs the info from the keywords meta tag, then you should consider implementing it with relevant data.

Thanks for the great info!
I have implemented multilingual website for Betconstruct, where the default language is EN, and there are /es /ru versions as well!
I have chosen a tactic which is not listed in this post, which is the following:

I have removed /en from the URL as it is the default language so now the URLs for English look like this:
betconstruct.com/landing-page
and the Spanish version of the same page is
betconstruct.com/es/landing-page!

I still pass the language code of the default EN language to the search engines through the HTML lang attribute lang="en-US"

If the US market is your priority, your choice looks fine to me. It is the same strategy followed by Apple and it is described in the post: you have chosen to display the content targeted to your most important market by default. However, using HTML lang is not the right way to target international markets. You should use international targeting on Google Search Console and implement alternate/hreflang link elements on your HTML code.