Stuck in geolocation hell: full pages via CDN

We are a large international ecommerce brand operating in 10+ countries. We have a .com site and use a subdomain for each country we operate in. Our site is centrally hosted in Spain and a CDN delivers all of our content, including HTML.

Webmaster Tools accounts have been set up with the correct geotarget selected for each country subdomain. A Webmaster Tools account has also been set up for our WWW. subdomain (which contains only a 'choose your location' entry page), and its geotarget has been left unselected.

We have major geotargeting problems whereby the incorrect country subdomain appears in the search results. For example, US. appears in google.co.uk and UK. appears in Google's US results.

WWW. also outranks a lot of country subdomains - for example, in the US search results our WWW. outranks US. Our branded results show sitelinks which contain links to the wrong country - for example, a US user clicking on a sitelink will often find themselves on our Swedish site viewing prices in euros, not dollars. We've added hreflang alternate tags to the site and have seen a marked improvement, however it's not been good enough. I'm going to change the Webmaster Tools geotarget for our WWW. page from unselected to 'unlisted' to see if this has an impact, however I'm not confident that this is the root cause of all the problems.
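For reference, the per-page annotations we generate look roughly like the output of this minimal Python sketch (the subdomains, locales and `example.com` URLs are illustrative placeholders, not our real site):

```python
# Minimal sketch: build the reciprocal hreflang <link> elements that
# every country version of a page should carry. All hostnames and the
# subdomain/locale mapping here are illustrative, not real URLs.

LOCALES = {
    "uk": "en-GB",
    "us": "en-US",
    "se": "sv-SE",
    "www": "x-default",  # the 'choose your location' entry page
}

def hreflang_tags(path):
    """Return the <link> tags each country version of `path` should include."""
    tags = []
    for sub, lang in LOCALES.items():
        tags.append(
            '<link rel="alternate" hreflang="%s" '
            'href="https://%s.example.com%s" />' % (lang, sub, path)
        )
    return tags

for tag in hreflang_tags("/blue-widgets"):
    print(tag)
```

The key point the sketch illustrates is that the full set of alternates (including the page itself and an x-default) goes on every version of the page, identically.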

As I understand it, Google primarily uses the server IP to geolocate a website and then falls back on GWT settings, on-page signals and inbound links. These fallback signals aren't working for us, so I believe we would be best suited to direct our efforts into improving the IP signal. Presumably sites that rely on the fallback methods also find it harder to rank highly in search results, as Google won't have as great a confidence level in the site's relevancy?

The obvious solution is to overhaul our hosting and CDN (hosting each subdomain in the respective country, delivering the HTML from our own servers and using the CDN only for images, JavaScript and CSS), however I wanted to ask Google and the community if they knew of any other solutions, as this overhaul is a tough and expensive sell to our IT team.

Could an option be to have the CDN develop a special rule for Google IPs - to serve UK. requests from UK servers, US. requests from a US server and so forth? Our CDN delivers content from the nearest server to the user's IP. Because Google crawls entirely from the US, all of our country subdomains must appear hosted in the US. Matt Cutts says to 'treat Googlebot as you would a regular user', however Googlebot doesn't replicate the actions of a regular user because it crawls entirely from the US. I'd view this new CDN rule as a way of treating Google closer to how we treat the majority of our users (the majority of UK. requests come from UK IPs, so we'd deliver from a local UK server). However, please let me know if the Google team would view this as a breach of the TOS. I don't want to break any rules, but at the same time we need to escape the geolocation hell that we're currently in!
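To make the idea concrete, the rule I have in mind would key origin selection on the requested subdomain rather than on the requester's IP - roughly this logic (hostnames and origin names are made-up placeholders, and whether a given CDN supports such a rule is an open question):

```python
# Sketch of the proposed CDN rule: choose the origin by the requested
# country subdomain instead of the client's location, so that Googlebot
# (crawling from the US) sees uk.example.com served from a UK origin.
# All hostnames and origin names here are illustrative placeholders.

ORIGINS = {
    "uk.example.com": "uk-origin.example.net",
    "us.example.com": "us-origin.example.net",
    "se.example.com": "se-origin.example.net",
}
DEFAULT_ORIGIN = "es-origin.example.net"  # the central hosting in Spain

def pick_origin(host):
    """Route each country subdomain to an in-country origin server."""
    return ORIGINS.get(host, DEFAULT_ORIGIN)
```

Note this routes by hostname for every requester, not just for Googlebot, which is what keeps it away from 'special treatment for crawlers' territory.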

Unfortunately the business is very sensitive about publishing its woes, so I can't publish the website details; however, I can send this info to Google off-forum if it helps.

Without the site URL it is difficult to comment. Look with Fetch as Googlebot and see if the alternate hreflang attribute is in the source code and is correct. Look at the source code of the Google cache of a few pages that are not working well in search results, and see if they were cached after you added hreflang.

Hi Cristina, Yes, I'm trying to convince the business that we need to publish the URL in order to secure the necessary assistance from Google, however it's a very large FTSE brand and they feel uncomfortable doing this. I can send details to Google employees, as it will remain private, but I'm not able to share with the wider community publicly, which is a real pain.

Fetch as Googlebot shows the hreflang code is correct, and it is reflected in Google's cached copy too. Hreflang has been on the site for two full weeks now, so the majority of the site has been recached, and there has been a vast improvement, but problems still remain.

For example, one sitelink in the UK search results is a US URL which has been cached with hreflang. The equivalent UK URL has also been cached with hreflang, so the reciprocal annotation is there, but it's not working.
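A quick way to sanity-check a pair like that is to confirm the annotations really are reciprocal - Google ignores hreflang unless each page's alternate set points back at the other. A small Python sketch of that check (the URLs and the `annotations` data are illustrative placeholders standing in for what you'd scrape from each page's source):

```python
# Check that hreflang annotations between two pages are reciprocal.
# `annotations` maps a page URL -> {hreflang value: alternate href},
# as scraped from each page's <link rel="alternate"> elements.
# URLs below are illustrative placeholders.

def is_reciprocal(annotations, url_a, url_b):
    """True if url_a lists url_b as an alternate and vice versa."""
    a_targets = set(annotations.get(url_a, {}).values())
    b_targets = set(annotations.get(url_b, {}).values())
    return url_b in a_targets and url_a in b_targets

# Illustrative data for the UK/US pair discussed above.
annotations = {
    "https://uk.example.com/blue-widgets": {
        "en-GB": "https://uk.example.com/blue-widgets",
        "en-US": "https://us.example.com/blue-widgets",
    },
    "https://us.example.com/blue-widgets": {
        "en-US": "https://us.example.com/blue-widgets",
        "en-GB": "https://uk.example.com/blue-widgets",
    },
}
```

If this check passes on the live cached copies and the wrong version still surfaces, the problem is likely elsewhere (link signals, crawl timing) rather than in the markup itself.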

>> As I understand it Google primarily uses the server IP to geolocate a website and then falls back on GWT settings, on-page signals and inbound links. These fallback signals aren't working for us so I believe we would be best suited to direct our efforts into improving the IP signal. Presumably sites who rely on the fallback methods also find it harder to rank highly in search results as Google won't have as great a confidence level in the sites relevancy?

>> The obvious solution is to overhaul our hosting and CDN (hosting each subdomain in the respective country, delivering the HTML from our servers and use the CDN only for images, Javascript and CSS) however I wanted to ask google and the community if they knew of any other solutions as this overhaul is a tough and expensive sell to our IT team.

I would say this is a lot of work for a small amount of gain, if any. Since the Googlebots all come from a US IP, I assume they would only see the US IP anyway even if you did this.

>>We have a .com site and use a subdomain for each country we operate in. Our site is centrally hosted in Spain and a CDN delivers all of our content including HTML.

Does EACH of the sub-domains etc. have entirely unique and useful content? Or are large chunks of it from the main site, or do they simply refer to the main site? Either way, this explains the situation perfectly. This would include product descriptions etc.

Also, do you actually have a reason to display the sub-sites at all if this is the case? Wouldn't a single consolidated site for each language, rather than separating them by country, be the most useful thing here? I believe it would probably only be the local laws that differ for each site?

I can't say much more than that without seeing the site and having details I'm afraid but they are my general thoughts on the issues you are having.

>>Since the Googlebots all come from a US IP, I assume they would only see the US IP anyway even if you did this.

I was suggesting hosting each country subdomain on its own IP. Google would see our UK HTML being delivered from a UK IP - only the images, JavaScript etc. would be served by the CDN and come from a US IP.

>> Does EACH of the sub-domains etc. have entirely unique and useful content? ... Wouldn't a single consolidated site for each language rather than separating them via country be the most useful thing here?

No, it's almost entirely duplicate content across the subdomains - product descriptions are the same, and category-level pages only vary by product mix. I'd love to have a single consolidated site for each language, however in practice it isn't that straightforward. Take our US & UK sites as an example: both sites are English language, so in theory they could be condensed into a single site, with prices dynamically changing based on IP location, however... a large number of products are exclusive to each country. The business categorically does not want to show a US-exclusive product to a UK audience and vice versa - but a single site would a) show the products in both US & UK search results and b) I don't think we could get away with changing the HTML by IP for each market.

Also, stock levels differ across countries - product A may have stock in the US but not in the UK - and again the business categorically does not want to show UK out-of-stock items to a UK audience. Currently the separate UK & US sites only list in-stock products. Up to 20% of UK products may be out of stock when compared against the US. A single site would require us to have 20% of unavailable stock listed on our UK site (and a similar equivalent on the US site). Clicking on products which then say 'out of stock' is a really bad user experience (especially as the majority of products are unlikely to come back in stock again).

I can't see a way around these differing product and stock problems except to keep them on separate sites, but I would love to hear any suggestions if you have any. The above example is US & UK, but we have other English-language subdomains for Canada and Australia - with upwards of 5,000 products on each, it's not possible to write differing, unique content for each: a) it's a lot of resource, and b) there are only so many ways that product descriptions can be written and still make sense.

If you know of any sites that are overcoming these difficulties I'd love to know. In the meantime I'm going to continue chasing down the IP-based options.

Google will basically pick the one it likes best from the two of them. The IP signal isn't a big enough factor IMHO to make much of a difference here. I very much doubt this will address the problem.

The only realistic way I can see of doing this, then, would be to eliminate all the duplicate content. This will mean a lot of work, as you will have to rewrite all the content on all the sites which share the same language. The text on each one can then refer specifically to the country it relates to, which should make it more prominent in searches relating to that country.

Basically, your current plan seems doomed to failure to me. It's not an approach I would suggest, as I don't believe it will achieve the results you are looking for. Many UK sites hosted in the UK still rate below their .com equivalents hosted in the US, regardless of the hosting. The larger, more popular of the sites will almost certainly have more links etc. back to it, so will display more prominently either way.

Thanks very much for your time and thoughts. You're entirely correct that we have a duplicate content nightmare, and I completely agree that the duplicate content is a major problem - it's something that I'm actively pursuing, but I don't think it should be pursued exclusively; the hosting/CDN aspect needs investigating too, as there's some interesting activity going on...

I do see both copies (e.g. UK and US) of our pages stored in Google's index, with the cached copies of each URL confirming this. Google aren't looking at both copies and deciding to only store one in the index; they are storing both. Don't get me wrong - there are instances when Google does remove one and, for example, returns the US version in both the UK & US SERPs - this is definitely a duplicate-content-caused issue.

The reason I'm looking at the IP/CDN setup is that, when Google are indexing both the UK and US versions of a page, they often (5-10% of the time) show the UK listing in the US SERPs, or the US listing in the UK SERPs. This feels like very strange behaviour and not a typical duplicate-content scenario, hence wanting to look more closely at the IP/CDN setup. I've read on these forums about some problems relating to CDNs and geolocation (I decided to use 'geolocation hell' in this thread's title after reading it on a similar thread about CDNs & geolocation) and wanted to find out if anyone else has ever experienced this - but I can assure you the actual duplicate content issue is being pursued too, and I share your feelings on it.

Yep, that's how Google handles duplicate content: they store both pages, but when it comes to choosing a page to display, they only show one of the duplicates; the second page simply gets filtered from the search results. Which it chooses is generally based on the page rating. Unless it's a deliberately local search, it's very unlikely the geolocation of the server will influence this enough to put the one you want ahead of the main page, simply due to the many more high-quality links the main page is likely to have.

> This feels like very strange behaviour and not a typical duplicate content caused scenario, hence wanting to look more closely at the IP/CDN setup.

That's actually pretty standard behaviour; it's just picking out the page that has the best quality signals of the two. You'll probably find that the ones you are seeing pop up more often have more high-quality websites linking directly to that page, or similar quality signals.

I'm trying to think of a site structure for a company like yours that would incorporate this into an individual site, but unfortunately it's actually quite difficult. I was thinking of something like the single site you mentioned above, but filtering both prices AND products by user location, so it would only display the products for your location, BUT if it was the Google or Bing bot then display both.

I'm not sure, however, whether Google might class that as a type of cloaking, which would be bad. I would assume that if you had a clickable section called 'products available in other countries' containing the other products, that would actually solve this - so basically it would just change which section (the main section or the 'hidden' other-countries section) is shown, depending on the user's IP.

For any IP-based solution, however, I would strongly advise giving the user the ability to select a country manually to override the automatic detection, simply due to the use of VPNs/proxies as well as mobile traffic, which may appear to come from a location other than the user's actual location.
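That override rule can be kept very simple: an explicit choice (stored in a country-selector cookie, say) always wins over the GeoIP lookup. A hedged Python sketch, with the supported-country set and cookie mechanism as assumptions:

```python
# Sketch: resolve which country site to display, letting the user's
# explicit choice (e.g. a country-selector cookie) override IP-based
# detection, which can be wrong for VPN, proxy and mobile traffic.
# The supported set and default below are illustrative assumptions.

SUPPORTED = {"GB", "US", "SE", "CA", "AU"}
DEFAULT_COUNTRY = "US"

def resolve_country(ip_country, cookie_choice=None):
    """Explicit user choice wins; otherwise GeoIP; otherwise the default."""
    if cookie_choice in SUPPORTED:
        return cookie_choice
    if ip_country in SUPPORTED:
        return ip_country
    return DEFAULT_COUNTRY
```

Validating the cookie against the supported set also guards against stale or tampered values falling through to a broken site version.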

Thanks Zihara. It's not currently on the site, but it's in the development stack along with meta language tags for Bing. Schema is also in there - pretty much anything related to geolocation is in the stack. These are mostly meta-related, but I don't want to ignore the IP/CDN impact, as I understand it's the primary method for geolocating sites/users.

Thank you so much for the time you've spent on this, I really do appreciate it. On the dupe content getting filtered out - if I search for 'blue widgets' in the UK then uk.example.com/blue-widgets ranks, and if I search the same in the US then us.example.com/blue-widgets ranks. However, when you search for our brand name in the UK and the blue widgets page appears in the sitelinks, it's the us.example.com/blue-widgets version that gets used. The US version does have more inbound links than the UK version, so I think you are correct in terms of linking signals and believe it is the most likely cause, however...

In the above 'blue widgets' example, the US version also appears in our Australian and Canadian sitelinks - so I'd need to link-build each version of the page for each language, which irks me, as I'm a "spend your time improving the website for users and the rest will follow" kind of guy, and link building every country version both goes against the grain for me and doesn't feel a sustainable way to resolve this.

I'm also struggling to put together a suitable site structure - I'd also considered filtering prices and products by IP (with a country selector option), and although I know we could do this from a development perspective, it's too similar to the BMW situation of showing different content to users and search engines for my liking. As it stands, the 'best fit' is an approach similar to how L'occitane deal with this - they take a perfectly good site and then slightly tweak it for each market to avoid duplicate content. For example: http://uk.loccitane.com/angelica,83,1,29954,0.htm & http://usa.loccitane.com/angelica,82,1,29278,0.htm

It just feels wrong doing this; you end up spending your time writing and developing for search engines rather than users. We spend our time writing good product descriptions, making them the best we can - I really don't want to spend $250k on a team of people to rewrite them for each market, making each version slightly different from every other version, when they are already the best descriptions they can be. Purposefully making tweaked descriptions doesn't feel like a step forward. As Google get better and better at detecting duplicate content, how sustainable is this option going to be? Making a page 30% different might work now, but in 6 months' time this might be 40% - it's a game I really don't want to play. It'll work in the short term, but it doesn't feel like a long-term solution.

I don't think I'm wrong in thinking it SHOULD be fine to have a US & UK version of the same page if there are valid scenarios for doing so - international companies do sell varied product stock, and have differing stock levels across markets. Google normally get most things right in the end, so I think focusing on geotargeting the site correctly might be the best long-term approach, if not short-term. In which case, the best course of action is to work on all geo-related areas and lobby Google to either a) state their view of how we should structure our site or b) consider making it acceptable to have duplicates across sites.

If you think up any other ideas for a suitable site structure please let me know!

Thanks again Steven, I've really appreciated the time you've put into my problem.

While I agree with you that it should be possible, unfortunately it puts too much emphasis on the webmaster to make sure every country is covered, or it's too exploitable by MFA sites. They could just copy your page, fill it full of ads, then add geolocation tags for a country you don't have a specific site for, and that would then display instead of the legitimate site for searches from that country.

It's a bit of a nightmare when you have to think about how the scammers/spammers could hack even the simplest thing like that. I would agree the best option would be for Google to allow you to make a single international site; then you could simply include tags at the top of each page that basically say 'if country = x, use this page instead', which would provide specific instructions to the search engines about which page you would rather display for each country. However, I think this is probably a while off yet! Having the code on the main international page telling the SEs about the other pages is the only way I can see of doing it for the search engines that isn't fairly easy for the bad people to exploit.

My site uses a ccTLD (.de). The site is hosted with a German ISP (1&1), and most of my visitors are from Germany.

I would like to understand whether there is a negative impact from serving my site via a CDN. The page (HTML) and the embedded objects (gif, js, css, ...) would be served via the CDN, and my domain would be CNAMEd via DNS to point to a CDN domain.

My site is currently hosted in Germany, but as far as I understand, if I use a CDN it will be visible via a US IP address to the Google search bot. Would that have a negative impact on my so-called in-country ranking?

On the other hand, I have been believing that using a CDN would improve my page download time and therefore also my SERP ranking, but based on this discussion I am not sure anymore. Would the page download time/performance improvement help to lower the risk (in case there is a risk)?

Would be great if somebody could help me understand this in more detail.

There already is a solution for 'if country = x, use this page instead' in the form of the hreflang alternate tags. We've implemented this and seen some really good results, it just hasn't worked perfectly (the blue widgets example).

I think Google must go through a process of asking "is this a legitimate site? Does it have a minimum level of quality links? Does it have a minimum level of social 'follows'?" If the answers to these questions are all yes, then the whole domain (and subdomains) should be trusted, and the hreflang alternate tags trusted above the other signals such as server IP and inbound link location. If a webmaster actively states that X site is for X country, then that should be trusted, rather than suspecting that the webmaster doesn't know what they're doing or has copied it from a friend's template. Even that concern could be addressed with a GA/GWT file upload & verification process.

If a scammer/MFA does clone a site including the content and tags on their own site, then that site should have to go through the "is this a legitimate site?" process.

I've really enjoyed your thoughts, thanks again for taking the time to offer them. If you're ever in the big smoke looking for an SEO position, send me a message :)

As you've got a .de site, that's already a very explicit instruction to Google that you're targeting DE. I don't think you should have any worries about using a CDN, but... as with everything SEO-related - make sure you can measure everything.

Use software to measure ranking positions, make sure your analytics is working properly and you have a GWT account, then make the change confident that if anything unexpected does happen, you'll be able to spot it.

I'd recommend a code freeze whilst you do this. You don't want to make other changes that may affect your site's performance at the same time, as this will cloud your analysis and decrease your confidence in knowing what has happened.

Also - let us know how you get on afterwards; it's always good to hear from other webmasters.

Google - do you have a recommended approach for international sites like ours that have slightly differing product offerings, and differing stock levels? That doesn't involve showing 'Not for sale in this country' or 'Out of Stock in this Country' messages?

- There's no need to artificially use local IP addresses for each country-site

- Duplicate content for multi-regional sites is fine & common (it's not perfect, but we live with it)

- Use hreflang wherever possible

- Geotargeting does not guarantee that only one location's sites are visible in search

With the hreflang, keep in mind that you need to do this on a per-page basis, and it has to be confirmed from the other language/location pages. When using that markup, we try to swap out the URLs for better, local versions, where we have them. This doesn't change ranking though.

It sounds like you're doing most things right (it's hard to confirm without a URL). As with other algorithms, we're always working on improving how our algorithms bubble up geotargeted results, but it's important to keep in mind that geotargeting is not something that you can always rely on. Just as a user may click on a link leading to the "wrong" country version, it's possible that we may show the wrong one in search as well (for whatever reason; for example, maybe the user searched in German but actually lives in the UK?). This is something that the site needs to handle on its own once a user is on the site. A common workaround is to use JavaScript to show a banner to users on the "wrong" version of the website, making it possible for users to stay there if they prefer (and for crawlers to crawl the content directly), while still directing them to the site that offers them the best user experience because it's localized for them.
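The decision behind that banner is small: compare the visitor's inferred country against the country the current page targets, and suggest (never force) the local version. Sketched here in Python for brevity, with the country-to-subdomain mapping and example.com hostnames as illustrative assumptions:

```python
# Sketch of the wrong-country banner decision: if the visitor's
# inferred country doesn't match the country this page targets,
# return the local URL to suggest in a banner; otherwise None.
# The mapping and hostnames are illustrative placeholders.

COUNTRY_SUBDOMAIN = {"GB": "uk", "US": "us", "SE": "se"}

def banner_suggestion(page_country, user_country, path):
    """URL to offer in the banner, or None when no banner is needed."""
    if user_country == page_country:
        return None
    sub = COUNTRY_SUBDOMAIN.get(user_country)
    if sub is None:
        return None  # no local site exists for this visitor
    return "https://%s.example.com%s" % (sub, path)
```

Because the banner only suggests a switch, crawlers and users who genuinely want the other version can still reach the content directly.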

If you have specific search results where you're seeing bad results due to geotargeting, I'd love to pass those on to the team to review.

We have multilingual websites with different ccTLDs. Content across all the websites is almost the same, and we geotargeted all the ccTLDs to their respective countries using Google Webmaster Tools. All of the websites are served via CDN systems.

Now the problem is that when I search in Google for site:http://wego.co.in/ , I can see the co.in domain indexed in Google, but if I look at the cached page it's showing the wego.com.au version. This problem appears in Google but not in other search engines. Is there any reason why Google is showing the wrong URL in its cached version? I checked with my tech team, and honestly we are not doing any redirection for the search engines.

Thanks for your time. My problem is that when users try to land on the Wego portal from the Google SRP, they can see the Australian website's metadata in the SRP, and that will affect my CTR from the Google SRP. Why can't Google identify the correct pages when it crawls my portal?

Thanks for your time on this, John. As time has progressed we've seen Google get better and better at displaying the correct URLs in their respective markets (it must have just taken a while for it to make its way into Google's system). At this stage I'm not concerned about the SEO benefits or costs of doing this, only that users are seeing the relevant results for their market.