Reverse proxies, IPs and bad neighborhoods

ergophobe

1:25 am on Dec 10, 2012 (gmt 0)

I've been testing services like Cloudflare, Incapsula and other reverse proxies (RP). The main idea is that the RP acts as a CDN, speeding up page loads, and also includes various security measures to block bad bots and the like.

The thing is, using one of these services implies a few things:

1. Your site is no longer served from its original IP (the IP of your server); it will instead have the IP of the proxy server.

2. It will share those IPs with other sites using the service. In the case of Cloudflare, that means 350,000 sites are on the same set of IPs. That includes some big players (StumbleUpon) but no doubt also some more unsavory characters.

So I'm just wondering what some of the implications of that are. There are a lot of upsides (speed, security), but unlike a classic CDN, where people usually just offload static supporting resources, here you're sort of offloading the whole shebang.

Any particular thing you would watch out for in terms of SEO?

Obviously, you would want to verify that Google and Bing at least can crawl the site. What else?
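One piece of that crawl check can be done before you even flip the DNS: confirm that whatever robots.txt the proxy ends up serving still lets Googlebot and Bingbot through. Here's a minimal sketch using Python's standard `urllib.robotparser`; the robots rules below are hypothetical examples, and in practice you'd fetch the live file from behind the proxy rather than a string.

```python
# Sanity-check robots.txt rules offline before/after fronting the site
# with a reverse proxy. The rules here are made-up examples.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot has its own group with an empty Disallow, so it may fetch pages.
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))   # True
# An unnamed bot falls under the * group and is kept out of /admin/.
print(rp.can_fetch("SomeBot", "https://example.com/admin/secret"))  # False
```

The same two calls, pointed at the proxied hostname with `rp.set_url(...)` and `rp.read()`, make a quick regression test when you switch the service on.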

goodroi

3:16 pm on Dec 10, 2012 (gmt 0)

Historically speaking, from a pure SEO perspective, many people would have preferred to keep websites on separate IPs. IMHO, if your website is big enough and popular enough to need these services, then IP quality signals are not a top issue.

If your website is popular enough that you need to use these tools then you most likely already have so many backlinks and usage signals that any possible negative signal from the IP level is negated.

Just make sure that Googlebot can crawl and index the site with no technical issues.

When I deal with popular websites I often find it is more profitable for the overall business to improve user experience even if it results in a small hit to perfect SEO.

Sgt_Kickaxe

3:24 pm on Dec 10, 2012 (gmt 0)

The CDN is also an extra potential point of failure, so I would suggest working on other aspects of the site to improve user experience, as goodroi said, at least until it's obvious that not having a CDN is causing slowdowns under heavy traffic. There are countless other ways of speeding up a site, and they should all come first, imo. Sites that rely on local traffic for sales should get a local host and not worry too much about a CDN, at least not until traffic dictates otherwise.

- The potential issues should be covered by the CDN provider directly; check out their support documents.
- The tighter your security is, the more visitors you will label wrongly and block.
- The more security checks your CDN performs, the slower your site becomes.

Are you having slowdowns due to heavy traffic?

ergophobe

11:34 pm on Dec 10, 2012 (gmt 0)

If your website is popular enough

The site I'm concerned about is indeed a top-shelf site with a solid corporate reputation, great backlinks and decent traffic, but nothing like massive viral traffic. I'm new onboard and haven't seen the analytics yet, but I'm guessing it's only around 200K visitors per month.

That said, these services have a lot to offer less popular sites. I'm testing a couple of them with less important sites that get only 5K to 10K page views per month. The benefits are things like:

- reducing the number of requests on the origin server
- serving up assets from servers closer to the user
- splitting requests between the cache and the origin server (again a speed advantage)
- identification of bad bots, spammers and so forth, and various filtering as a first line of defense against comment spammers. I'll know how effective this is in another week, when I can compare Mollom/Akismet data before/after applying these.
- some offer various file aggregations (combining CSS files or, sometimes, inlining them), and image caching optimized to device size (essentially "responsive" images).

So you can get a nice performance boost as reported by YSlow or PageSpeed even if the site only gets one request a day. Okay, in practice you need enough requests to keep assets live in the cache, but a page request per minute would do that for shared assets.
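That back-of-the-envelope can be sketched with a toy TTL cache: an asset stays warm as long as the gap between successive requests never exceeds the cache TTL. The numbers (a 5-minute TTL, one request a minute) are hypothetical, and real edge caches also evict under memory pressure, so treat this as illustration only.

```python
# Toy model of edge-cache warmth: a cached copy is fresh for `ttl`
# seconds after it was fetched from the origin; a hit does not extend it.
def cache_hits(request_times, ttl):
    """Count requests served from cache vs. fetched from the origin."""
    hits = misses = 0
    expires_at = None               # when the cached copy goes stale
    for t in request_times:
        if expires_at is not None and t < expires_at:
            hits += 1               # still fresh: served from the edge
        else:
            misses += 1             # cold or stale: refetch from origin
            expires_at = t + ttl    # fresh copy cached at fetch time
    return hits, misses

# One request per minute for an hour against a 5-minute TTL:
# only the periodic refreshes miss.
print(cache_hits(list(range(0, 60 * 60, 60)), ttl=300))  # (48, 12)

# One request per day against the same TTL: every request is a cold miss.
print(cache_hits([0, 86_400], ttl=300))                  # (0, 2)
```

So the "one request a day" site gets the asset-optimization benefits but not much cache benefit, while even modest steady traffic keeps shared assets hot.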

But that brings us to....

Are you having slowdowns due to heavy traffic?

working on other aspects

You're certainly right that it's preferable to simply reduce requests at the origin if possible.

That's phase I and if we get the numbers we want, we'll stop there. I think we can handle this on our end by reducing the number of requests, static caching (possibly a local reverse proxy like Varnish rather than a service like Cloudflare), and so on.

But I'm trying to think ahead in case we can't get there, and just generally trying to raise the topic. The site is already using sprites, for example, so I can't reduce requests that way. At a certain point, without a redesign (not in the budget), it may need extra help.

So to answer your question: I don't think the slowdowns are due to high traffic, but I've just come aboard and don't really have the data to support that. So hopefully not, but if so, I want to be prepared and understand the implications.

The CDN is also an extra potential point of failure

1. It is - if the proxy goes down, your site is unreachable even when the origin is fine.

2. It isn't - if the origin server is down or slow, the reverse proxy can still serve up cached documents and keep your site online when it would otherwise be offline.

So before deciding on impact on uptime, I'd have to think about the whole system and watch the numbers, but it's not a simple is/isn't thing.
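The "it isn't" half is essentially a serve-stale policy (roughly what Cloudflare markets as Always Online, though the sketch below is my toy model, not the vendor's actual algorithm; the function and variable names are made up):

```python
# Toy serve-stale reverse-proxy logic: prefer a fresh origin response,
# but fall back to the last cached copy when the origin is unreachable,
# instead of surfacing an error. The origin is simulated by a callable.
def fetch(origin, cache, url):
    try:
        body = origin(url)       # try the origin server first
        cache[url] = body        # keep a copy for bad days
        return body, "origin"
    except ConnectionError:
        if url in cache:
            return cache[url], "stale-cache"  # origin down: serve stale
        raise                    # never cached: the outage is visible

def healthy_origin(url):
    return "<html>hello</html>"

def dead_origin(url):
    raise ConnectionError("origin unreachable")

cache = {}
print(fetch(healthy_origin, cache, "/"))  # ('<html>hello</html>', 'origin')
print(fetch(dead_origin, cache, "/"))     # ('<html>hello</html>', 'stale-cache')
```

The flip side, per point one, is that nothing here helps if the proxy itself is what fails - which is why it's the whole-system numbers that matter.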

local traffic for sales should get a local host

Thanks for that observation. It doesn't apply to the site in question, but this was one of the issues I was thinking about in terms of the SEO effect of a CDN. Any additional thoughts? My thought is that since local hosting is relatively rare, it's far more important to have the address, including zip code, and a phone number with a local area code on the site (even if the main phone number is toll-free).

- the tighter your security is the more visitors you will label wrongly and block.
- the more security checks your CDN performs the slower your site becomes.

Good points. It's relatively hard to see who is getting blocked illegitimately. For testing, I've turned security down to the lowest settings, so it does the fewest checks and blocks only the most nefarious traffic (90% of the blocked IPs are from China and Russia, and the test site has little to no appeal there; I don't think I've seen a single US IP blocked yet; on the flip side, I'm still letting most bots through).

Anyway, thanks for the thoughtful feedback guys. I appreciate it.

Alex_TJ

11:37 pm on Dec 17, 2012 (gmt 0)

I've found Cloudflare to be more reliable than our hosting, which is top tier, so your point two above applies. In almost a year we've never seen anyone blocked mistakenly (there's a messaging service, and you'd think at least one out of a hundred would use it if they were legit). In my opinion Cloudflare is a no-brainer, even the free version.

ergophobe

5:00 am on Dec 18, 2012 (gmt 0)

Thanks Alex.

I have to say that I had totally forgotten about Cloudflare and was reminded of it because I was incorrectly blocked (at least I think it was incorrect - my AV definitions are up to date and it's a brand-new computer).

Still, I appreciate your observation based on a year of real-world use. That was my guess, but I don't have the track record with it to know.