There has been a lot of talk about site speed recently – no wonder: with Google factoring page load speed into its search ranking algorithm, it’s becoming critically important for websites, including blogs. I was a bit surprised to read the list of the slowest loading websites on the TechMeme Leaderboard, led by Scobleizer. Why surprised? Because just days ago I read Robert bragging about how fast his site was – or so he thought:

Do you own a website or blog, like I do? Is it as fast as it absolutely could be? I thought so, after all, my site is hosted at Rackspace where we have huge datacenters and huge pipes and a great CDN.

Yet even he thought he could improve by installing CloudFlare, a CDN with a good deal of security features which has received a lot of publicity recently. Well, we don’t have those huge pipes, so why not try it? I installed CloudFlare both here @ Enterprise Irregulars and over @ CloudAve. The installation process was absolutely, amazingly easy – especially for a change involving DNS / nameserver setup, which normally is as painful as a root canal.

Next I checked page load times using Pingdom’s load test, which CloudFlare itself recommends: our load times not only did not speed up, they got somewhat slower after the transition to CloudFlare. I read a zillion reviews and 99% of them claim a speed improvement… so this thing must be working… at least for everyone else. But speed issues aside, I decided to stay with CloudFlare at least for a while, as it promises to protect one’s website from all sorts of malware and attacks – ironically, Public Enemy No 1 LulzSec survived several attacks since they were protected by CloudFlare.

OK, it’s not only LulzSec, all the good guys are also praising CloudFlare, so I am trying to love it… but numbers are numbers, and the first stats I’ve received from CloudFlare are rather shocking.

After the first day, CloudFlare registered over 60k pageviews between the two sites, which breaks down to 34K for CloudAve and somewhat less for Enterprise Irregulars. Wow, that’s nice traffic – too bad it’s unrealistic. There’s some blurb on the CloudFlare site on why their stats will be higher than those of JavaScript-based reporting tools, but I seriously doubt that explains this much growth. And what are those 6K threats? Where have all those been hiding before? But hey, what do I know… perhaps they really are right… except:

Let me get this straight: the 30-day numbers are actually lower than what we got in just one day? Something just does not compute. And it’s really too bad, because if I can’t trust these numbers, then why should I trust the threat control listing, or CloudFlare’s own numbers which show my sites run about 50% faster than before (contrary to Pingdom’s stats)? I am trying to like this service, simply on the basis that so many smart people can’t be wrong, but these numbers are baffling.

What’s your experience? If you have a website, do you use a CDN, perhaps CloudFlare? Please chime in below. The CloudFlare team is welcome to comment, too.

Threats:
The confusing thing is that the reporting reflects *all* known threats on that overview page. If you go to your Threat Control panel, however, it is only going to reflect visitors that were actually challenged based on your security level settings (if threshold not met for security level = no challenge presented to visitor). If you have blocks or other custom rules in place as well, then these wouldn’t appear in your dashboard for challenged visitors.

Speeding up sites:
Most sites see at least a 40%-60% improvement in page loading times (will vary by site, site size, resources on site, etc.).

The difference between the last 30 days and yesterday is a little hard to explain & we’re working on a fix to make it the same across the board (broken down by hour or 15 mins., for example, based on the sorting chosen).

A bit more technical color: The yesterday vs. last 30 days bug is a particular annoyance of mine and we’re working on how to get it fixed. To give you some sense, in any given minute we generate several million log lines. In order to process this volume of logs we need to reduce them at various steps. These steps get broken down into “units” that for the last day are in 15-minute increments, but for the last month are in 1-day increments. Doing a lookup for the 15-minute increments to calculate yesterday, and then doing it across the 1-day increments for the rest of the month, turns out to double the lookup time. To get around that we cheated so that the Last 30 Days number is really the Last 30 Days except yesterday. After you’ve been on the system for a while it averages out. In the beginning it looks weird. It really bugs me, and bugs me now even more that you’ve written an article pointing it out. We’ll get it fixed, but that’s the explanation of what’s going on and why it is happening.
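To make the explanation above concrete, here is a minimal sketch of that kind of two-level rollup. It is only an illustration built from the description in this comment, not CloudFlare’s actual pipeline; the function names, the dates, and the hit counts are all made up. It shows how a “last 30 days” total that skips yesterday can come out lower than yesterday alone for a site that has only just joined.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def rollup_15min(hit_times):
    """Reduce raw hit timestamps to counts per 15-minute bucket."""
    buckets = defaultdict(int)
    for t in hit_times:
        start_minute = t.minute - t.minute % 15
        buckets[t.replace(minute=start_minute, second=0, microsecond=0)] += 1
    return dict(buckets)


def rollup_daily(quarter_hour_buckets):
    """Reduce the 15-minute buckets further to counts per calendar day."""
    days = defaultdict(int)
    for start, count in quarter_hour_buckets.items():
        days[start.date()] += count
    return dict(days)


def yesterday_total(quarter_hour_buckets, today):
    """'Yesterday' is summed from the fine-grained 15-minute buckets."""
    yesterday = today - timedelta(days=1)
    return sum(c for start, c in quarter_hour_buckets.items()
               if start.date() == yesterday)


def last_30_days_total(daily_buckets, today):
    """'Last 30 days' with the shortcut described above: daily buckets only,
    so yesterday (still held in 15-minute buckets) is left out."""
    yesterday = today - timedelta(days=1)
    oldest = today - timedelta(days=30)
    return sum(c for day, c in daily_buckets.items() if oldest <= day < yesterday)


# Hypothetical new site: 5 hits late on its first evening, 60 hits yesterday.
today = date(2011, 7, 6)
hits = ([datetime(2011, 7, 4, 23, 50)] * 5
        + [datetime(2011, 7, 5, 12, m) for m in range(60)])

quarter_hours = rollup_15min(hits)
daily = rollup_daily(quarter_hours)

print(yesterday_total(quarter_hours, today))  # 60
print(last_30_days_total(daily, today))       # 5
```

Once a site has a month of history, the missing day is a small share of the total, which matches the remark that the numbers “average out” over time.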

I wish I had as clear an answer for the Pingdom question. They are a partner and, usually, their service appears to track the experience of users. However, in some cases they will report radically different load times than browsers themselves report or any other speed monitoring service. We optimize for real world browser performance, not for Pingdom or any other testing tool. We’re working with them to figure out what is going wrong. Our working theory is that it has something to do with Browser Integrity Checks and/or Hotlink Protection. Other than the load-time-that-doesn’t-track-reality-in-some-cases issue, we love Pingdom, use them internally as one of our many monitoring services, and are proud to have them as a partner. We’ll get to the bottom of the problem and get it resolved.

PS – I’m 100% confident in the hits and uniques numbers. Any deviation from Google Analytics there can be explained by the Javascript issue you reference.

Page views depend on how you count them. AJAX-driven sites will tend to over-report Page Views (since a “page view” is a not-entirely-compatible concept in something that is AJAX-driven). That said, we use the industry standard method for counting, the same method Facebook uses. Of course, that’s part of why Facebook reports nearly 900 billion monthly page views: each “poke” counts as a Page View.

Thanks for the explanation – I understand some, but not all … but hey, I am not deeply technical:-)

I’m still stuck with the stats – yes, I get that you capture more than JavaScript-based tools, so I expected somewhat higher numbers. 10%, 20%, 30% more? Who knows. But I am seeing uniques double / triple, and pageviews 7-8x compared to Google Analytics, WordPress, Statcounter, Quantcast, which all deviate from each other a little, but are in the same ballpark. 7-8x is just mindblowing: either those numbers are wrong, or the entire monitoring industry is worth crap, if they are all wrong…

I flicked on all stats systems within the cPanel account for this domain and left it for a few days. When I came back and checked, it had detected wild stats of 6’000 – 8’000 between the different systems. I know they all use the same file to log the stats.

Now, this leads me to think that there is either one of two things happening:
1) The stats are correct and off site/off server stats systems such as Google Analytics and Statcounter are being blocked. Does the Firefox “do not track” option prevent some tracking? Are visitors using something that is blocking external tracking scripts? After all, all of the systems that are showing low numbers are external, including our advert server.
2) Cloudflare and cPanel are picking up some activity and translating it to visits and loads. I know this is unlikely, but I cannot prove #1.

My concern here is that if this extreme variation between stats can happen, I need to know why and how. Naturally, when you see 29 page loads, you do not expect the site to be using many resources, but in reality it is experiencing 12’000 page loads per day. That’s a factor of about 414x, or 41’379%!
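For anyone who wants to check the arithmetic on the two figures quoted above (29 and 12’000 page loads), a quick sketch follows. The function name is just for illustration. Note that 41’379% is the ratio expressed as a percentage; the percentage increase is 100 points lower.

```python
def compare_counts(low, high):
    """Compare two counts of the same thing from different stats tools."""
    ratio = high / low
    return {
        "ratio": ratio,                                # how many times larger
        "ratio_as_percent": ratio * 100,               # high as a % of low
        "percent_increase": (high - low) / low * 100,  # growth from low to high
    }


stats = compare_counts(29, 12_000)
print(round(stats["ratio"]))             # 414
print(round(stats["ratio_as_percent"]))  # 41379
print(round(stats["percent_increase"]))  # 41279
```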

Presently we have two sites both with a similar setup which are experiencing very similar differences. We have one other site that has less of a difference. All of the others seem to be normal.

I keep seeing that the only reason for a difference in stats is the script used, but I have tested with html tracking code and that produced the same low numbers. In fact, the only difference between javascript and html tracking code over 7 days was an average of 1 extra page load per day for the html code. That shows me that of the 29 page loads, there was likely an extra 1 page load from a user who was not using javascript.
But I cannot accept the differences we are seeing between on-server tracking and off-server tracking. Even if CloudFlare does track a call returning html as a page load, it does not explain the difference between the cPanel stats and off-server stats.

The hits / unique visitor numbers are definitely correct with CloudFlare and are more accurate than any beacon-based tracking (e.g., Google Analytics or Quantcast). Beacon-based services don’t actually report hits because they have no way of tracking them. That’s ok for their purpose — typically beacon-based analytics is used by the advertising industry to measure how many impressions a site will get for an ad and, since the ads themselves are served by JavaScript-based tags, a JavaScript-based beacon yields the appropriate answer — but there are other important metrics these measurements don’t capture. For instance, we often hear about users complaining that their server is slow when Google Analytics doesn’t show any increased traffic. However, when you look at CloudFlare’s data you’ll see at the same time the site slowed down there was a huge increase in visits from the Google, Bing, and Yahoo search crawlers. While advertisers may not care about this traffic, a server admin who has to pay for the bandwidth and server resources it uses certainly does.
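The gap between the two counting methods can be sketched with a toy request log. This is only an illustration of the argument above, not how CloudFlare or Google Analytics is implemented: the `Request` class, the user agent names, and the traffic mix are invented, and it assumes for simplicity that crawlers never fire the JavaScript beacon.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    path: str
    user_agent: str
    runs_javascript: bool  # assumption: crawlers in this sketch never run the beacon


def server_side_hits(requests):
    """A proxy or web server sees every request, whoever sent it."""
    return len(requests)


def beacon_hits(requests):
    """A JavaScript beacon only reports clients that execute the tag."""
    return sum(1 for r in requests if r.runs_javascript)


def hits_by_agent(requests):
    """Break the server-side view down by user agent."""
    return Counter(r.user_agent for r in requests)


# Made-up traffic: 3 human visits and 7 crawler requests.
traffic = (
    [Request("/", "Browser", True)] * 3
    + [Request(f"/post/{i}", "Googlebot", False) for i in range(5)]
    + [Request(f"/post/{i}", "bingbot", False) for i in range(2)]
)

print(server_side_hits(traffic))              # 10
print(beacon_hits(traffic))                   # 3
print(hits_by_agent(traffic).most_common(1))  # [('Googlebot', 5)]
```

In this toy case the server did more than three times the work the beacon reports, which is the kind of gap a server admin cares about and an advertiser does not.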

Page Views are a tricky beast. Google Analytics actually has this easier than we do. Where our base unit of measurement is a “hit” (i.e., a request for a resource from a server, whether it be HTML, an image, JavaScript, CSS, or anything else), Google Analytics’ is a page view (i.e., a request for a base HTML object). All page views are hits, but not all hits are page views. This means we actually see more data than an analytics program like Google Analytics, but our challenge is categorizing only certain hits into page views.

We follow the most widely accepted industry standard to do this. Unfortunately, that standard is entirely unsatisfying. To explain why, think about your experience on Facebook. What, on Facebook, should count as a page view? When you load the page the first time, of course. But when you comment on a wall post? Thumb through a photo gallery? Poke someone? The idea of a “page view” is born out of a web where every interaction caused the page to reload. In modern, AJAX-driven sites, where the URL in the browser may stay on the same “page” but data is loaded in dynamically, the challenge is trying to map the old analytics standard everyone still uses to the new reality of the web. So in our case, and in Facebook’s since they use the same industry standard, those AJAX requests which count as HITS also get counted as “page views.” Put another way, if Facebook were using Google Analytics to report their Page View numbers, they would be a fraction of the nearly 1 trillion they are closing in on.
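A rough sketch of the hit versus page view distinction follows. It assumes the rule is simply “a hit whose response is HTML counts as a page view”; the exact classification CloudFlare uses is not spelled out here, and the field names and sample hits are hypothetical. It shows why AJAX requests that return HTML fragments inflate the page view count compared with counting full page loads only.

```python
def is_page_view(hit):
    """Assumed rule: a hit counts as a page view if the response is HTML."""
    media_type = hit["content_type"].split(";")[0].strip().lower()
    return media_type == "text/html"


# Made-up hits from one visit to an AJAX-heavy page.
hits = [
    {"path": "/", "content_type": "text/html; charset=utf-8", "ajax": False},
    {"path": "/style.css", "content_type": "text/css", "ajax": False},
    {"path": "/logo.png", "content_type": "image/png", "ajax": False},
    {"path": "/comments?page=2", "content_type": "text/html", "ajax": True},
    {"path": "/poke", "content_type": "text/html", "ajax": True},
]

total_hits = len(hits)
page_views = sum(1 for h in hits if is_page_view(h))
full_page_loads = sum(1 for h in hits if is_page_view(h) and not h["ajax"])

print(total_hits)       # 5
print(page_views)       # 3
print(full_page_loads)  # 1
```

A beacon that fires once per full page load would report 1 here, while the HTML-based rule reports 3 for the same visit.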

The take away I think is this: Google Analytics (and similar services) and CloudFlare both are accurate in what they’re measuring and both have a place in the analytics landscape. GA is great if you’re trying to measure for advertising impressions. CloudFlare is great if you’re trying to measure actual server resources used (hits/bandwidth/bot traffic). There’s a reason that one of the first features we included was the ability to install Google Analytics on all your pages with a single click: we think that data complements the analytics picture we provide and, together, provides a full picture of what someone running a website needs to know to do it well.

Well you do… Your site is hosted at PressHarbor.com and is located in a well-connected data center with bandwidth from multiple Tier 1 providers. I think perhaps the reason you are not seeing the difference you thought you would see was because your performance at PressHarbor was quite excellent. You are on a well-tuned server in a high-performance network. Perhaps the difference would be more dramatic for hosts that can’t quite offer the level of “out of the box” performance we do.

You said: “I am seeing uniques double / triple, and pageviews 7-8x compared to Google Analytics, WordPress, Statcounter, Quantcast, which all deviate from each other a little, but are in the same ballpark.”

Yes that is totally normal. They are seeing all sorts of traffic that Google Analytics, WordPress, Statcounter, and Quantcast never see. Remember back in the day when we hosted your sites on Blogware? Blogware would give you the real traffic stats which counted access from bots and spiders and other non-humans, and when you tried Google Analytics you wanted to know where all your traffic went? In some cases Blogware users saw 10% of the page views they thought they had – because they didn’t realize how many hits their sites got from bots and spiders. That’s still true today and now you are seeing it in reverse.

As they point out, you can’t sell this traffic. It doesn’t change the answer to “How many people read my site?” That question is still best answered by Google Analytics.

But when your hosting provider says – wow, look how much bandwidth you are using! The Cloudflare numbers are more reflective of the total amount of traffic your site has to serve… When Google decides to crawl through 10,000 of your dynamically generated pages (oh the CPU!).

OK, I am starting to get it… CloudFlare numbers are accurate for assessing resource usage, vs “how many times my blog was read”. Perhaps pageview is not the best terminology to use.

The Cloudflare dashboard shows both pageviews and hits and provides this explanation:
– Page Views counts the number of requests to your site which return HTML.
– Hits returns the total number of requests to your site.

But here’s the kicker: the numbers shown are the same for both. Shouldn’t hits be > pageviews?

I envy you guys. I’m still trying to figure out how to add GA to CloudFlare. I read it’s just a button click but where’s the button? It’s not under settings. My biggest complaint is that CF’s dashboard/settings template is not user friendly. The analytics method of tracking views/hits is confusing, not the theory, but the actual reading of it. I have CF pro and I understand that numbers are tracked every 15 minutes but what next? If you add up the numbers in these incremental boxes will that equal the total? According to WP Stats I have 892 page views so far today (holiday and all that), but I have over 4K (3K+ visitors) using CF analytics, and around 600 views on GA. It goes without saying that this makes me happy as I run a web magazine, but who should I really believe? I hate to sound obtuse but this seemed to be an ideal forum for such questions, and I apologize if I wandered off topic.

Pros:
– Using the free service and experienced only a few downtimes so far
– Witnessed a small drop in spam so their threat control is working (read the cons regarding this issue)
– They lowered the load on my server
– The service serves static content faster than my server would from one location

Cons:
– Threat control panel is plain awful and it keeps on adding threats even if you whitelist IPs from a certain country or countries. I even got some complaints from visitors, entered in the additional message they can leave when they solve the CAPTCHA
– Cookies sent with every file they serve (very bad practice for static files and an issue when it comes to new laws against cookie serving)
– Stats are higher than ones on the server so I wouldn’t rely on them

Have similar, strange results…
Pageviews according to Piwik and Analytics (before I jumped to CloudFlare): 100 pageviews per day
Pageviews according to CloudFlare: 6000 pageviews per day…
That cannot be true!!
Don’t know a reason or solution…

we also are seeing massive deviation in VISITOR stats between cloudflare and our js analytics (backed up by image for non-js enabled users). the js figures much more closely match our experience of what’s happening. The cloudflare numbers don’t make sense.

@Chris, agree. But now I have a new observation: while CloudFlare numbers are way overstated, I do think I lose some in the other, js based stats. Whenever I take CloudFlare offline for a few days, I see a 30% or so increase in WP Stats, Google Analytics, etc. With CloudFlare back on, the 30% disappears (could it be all the bots, etc. CloudFlare filters out?).

I am starting to think what CloudFlare labels “Unique visitors” is quite close to the total views (not unique) in these other analytics.

Old thread I know, but it’s the best discussion on this anywhere. I have a theory on the whole stats thing, which I experience as well (426k views/month on CF vs 144k on GA):

It’s possible that Cloudflare is logging anything it connects to the server, *including* non-2XX code responses. For example, my organization uses the root domain, and 301 redirects any www. subdomain traffic to the corresponding root URL. On CF, this might be marked as 2 separate requests.

On the other hand, I’m not sure this would explain the discrepancy between the 36k monthly *uniques* on GA, versus the 112k on CF. Any thoughts?
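The redirect theory above can be sketched like this. It is only an illustration of the guess, not confirmed Cloudflare behavior: the log entries and hostnames are invented, and the redirect is given status 301 for the example. One visit to a www. URL produces two requests at the edge, but only the final 2XX response renders a page that a JavaScript tag could count.

```python
# Each entry: (host, path, HTTP status). Hypothetical log for two visits:
# one arriving via the www. subdomain (redirected), one going direct.
log = [
    ("www.example.org", "/about", 301),  # redirect to the root domain
    ("example.org", "/about", 200),      # the page the visitor actually sees
    ("example.org", "/contact", 200),    # a direct visit, no redirect
]


def count_all_requests(entries):
    """Count every request, whatever its status code."""
    return len(entries)


def count_successful(entries):
    """Count only 2XX responses, i.e. pages that actually rendered."""
    return sum(1 for _, _, status in entries if 200 <= status < 300)


print(count_all_requests(log))  # 3
print(count_successful(log))    # 2
```

On its own this can at most double the request count for redirected visits, so it would not account for a threefold gap in uniques.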