Is Log Analysis for Web Analytics a Dead Subject?

I’ve been involved in web analytics in one capacity or another for about 11 years. Back in 1998, when I was first getting started while working for Webtrends, there were only two ways to get stats from your website: either get a program to crunch your site’s logs (of which there were many), or pay some ridiculous sum for a tool like Aria or NetGenesis or HitList, which used packet sniffers placed in front of your web server to track various interactions with your site.

In either case, the entire subject of web analytics assumed that every interaction that your users were doing with your site was going to result in another page request back to the server, which you could then track via a log file.

Wow, how times have changed. Try going to a site like the main video channel for the Church of Scientology, or this one for the Volunteer Ministers. You can complete an hour-long stay at either website and still have looked at only one HTML page. That doesn’t make log-based analytics too entertaining, especially when your videos are hosted off-site.

In late 2004, I did a project for a company in Atlanta, testing roughly 30 different web analytics solutions to work out the best one for them. At that point in time, of all the different packages I reviewed, there was a pretty even split: about half of the web analytics companies were making a go of it with a shrink-wrap, software-based solution hosted on the client side, and the other half were ASPs.

Many of the organizations I was working with were not all that interested in turning to an ASP-based setup, for various security reasons, and also because ASPs aren’t too great at doing analytics for intranet sites, where the client can’t access the external internet.

At that point in time, the landscape looked something like this:

| Analytics Product | ASP or Software | Analysis Type |
| --- | --- | --- |
| Webtrends | Either | Log analysis or page-tagging |
| Datanautics G2 | Software | Log analysis & packet sniffer |
| DeepMetrix LiveStats xSP | Software | Log analysis |
| Pilot HitList | Software | Log analysis & packet sniffer |
| Sane NetTracker | Software | Log analysis or page-tagging |
| Sawmill | Software | Log analysis |
| SPSS NetGenesis | Software | Log analysis, page-tagging, and packet sniffer |
| Urchin | Software or ASP | Log analysis or page-tagging |
| Eloqua | ASP | Page-tagging |
| Elytics EAS | ASP | Page-tagging |
| Manticore Virtual Touchstone | ASP | Page-tagging |
| Omniture SiteCatalyst | ASP | Page-tagging, but they’d import your old logs for a fee |
| SageMetrics SageAnalyst | ASP | Log analysis and page-tagging |
| WebSideStory HBX | ASP | Page-tagging, but they’d import your old logs for a fee |

As you can see, it was about half-and-half, with most products still clinging to log analysis, but the more progressive (and sometimes completely frightening) products going exclusively to page-tagging in order to capture and correlate interactions with your sites.

But now, boy has the landscape changed. After Google bought Urchin and transformed it into a free product, countless companies have now successfully experimented with Google Analytics and found it (and with it, the whole premise of tagging pages) to be a reliable and insightful way of tracking interactions with pages.

Also, the fact that it’s free has forced a lot of companies to either (a) drop the web analytics business altogether, or (b) dramatically change their model so as to differentiate themselves from Google Analytics and move way upmarket.

In any case, it’s a subject for another blog post as to what companies have to do to differentiate themselves from Google Analytics in order to make it worth the cash for users to upgrade from a free product.

The main question here, though, is whether log analysis still has any value or relevance in the market. What do you get from a log analysis tool these days that you can’t get from a pixel tracker?


10 comments on “Is Log Analysis for Web Analytics a Dead Subject?”

The web server is always going to produce a log file, so there will always be a place for log file analysers. This is as true of the Web as it is of any other market – firewalls, mail servers, and much of the new breed of security devices.

One problem you get with Google et al. is that there is no history, and the moment you stop there is no future. Sure, the analysis is good, but you are stuck with it once you choose it. Log analysis and the logs are always there if things change – and why not run both?

Sawmill also now has a JavaScript page-tagging option; you can run it in house (so you can do your intranet too) or host it, so Sawmill does both (and you can still run Google too!).

Graham – I definitely agree, and there is always going to be a place for analysis of a log file of some manner. I’m just of the opinion that while the former purpose of log analysis was marketing & behaviour analysis, the main driving force I see now is IT.

As you can tell from some of my newer posts, I’m becoming a fan of Splunk – which all of a sudden has made so many of my logs much more relevant again.

If we analyze log files coming from a web page, I think it’s OK if we use an ASP solution or Google Analytics (in this case we know the historical information will be lost someday).
But if we want to measure the audience of a video (MP4, FLV, or MP3 if audio) played directly through a player – Windows Media Player, QuickTime Player, VLC, or a Flash player (not tagged by you) – log analysis is the only way.
Does anybody know a good product focused on video and audio analytics based on log files?

The problem with tagging links to PDFs or other non-HTML files is that it doesn’t take into account any direct traffic from search engines or other pages. If the JavaScript can’t execute, what can you do?

I think that’s one of the main areas where you do indeed still have to monitor your logs. But for this type of report, which you can’t easily integrate into your Webtrends/Google Analytics/Omniture reports, you just get the data with a tool like Splunk. Over the last year, I’ve become quite a Splunk addict, as it makes questions like that easy to ask – especially ad-hoc ones that you weren’t expecting management to ask you for.

As you state, obviously a PDF reader is not going to execute Javascript – so indeed the only way to do this is by analyzing logs.
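To make the idea concrete, here’s a minimal sketch (not from the original discussion) of pulling referrers for PDF hits straight out of a server log. It assumes Apache’s “combined” log format, and the sample lines are made up:

```python
import re
from collections import Counter

# A rough sketch, not a drop-in tool: assumes Apache "combined" log
# format, so the field positions will differ if your LogFormat does.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

def pdf_referrers(lines):
    """Tally referrers for successful (200) PDF requests."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if (m and m.group('path').lower().endswith('.pdf')
                and m.group('status') == '200'):
            hits[m.group('referrer')] += 1
    return hits

# Hypothetical sample lines; in practice you'd read access.log.
sample = [
    '1.2.3.4 - - [27/Feb/2012:10:00:00 -0500] "GET /whitepaper.pdf HTTP/1.1" '
    '200 4096 "http://www.google.com/search?q=whitepaper" "Mozilla/5.0"',
    '1.2.3.5 - - [27/Feb/2012:10:01:00 -0500] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0"',
]
print(pdf_referrers(sample).most_common())
```

A report like this is exactly the kind of thing page tags can never see, since the PDF request goes straight to the server without any JavaScript running.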

We run a Windows server cluster and had started to move from WebTrends to GA. Tracking external traffic to our PDFs is a big problem that we did not anticipate. Now we are evaluating – do we go back to WT?

Splunk is pretty rad. It’s free as long as you’re ingesting less than 500MB/day worth of logs, with absurdly powerful query / charting / reporting capability. I’d try downloading it & pointing it at the logs that contain your external PDF hits. Especially if you do some pre-processing of the logs to grep -v out data that you don’t need, I’m sure you won’t hit 500MB/day. And Splunk lets you do any sort of query you want in real time, instead of the WebTrends way of relying mostly on canned reports rather than ad-hoc queries.
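As a sketch of that pre-processing step (a Python stand-in for the grep -v approach; the extension list is only an example you’d tune to your own site), you can strip static-asset requests before the logs ever reach Splunk:

```python
# Drop requests for static assets we don't need to index, so the
# daily log volume stays under the free license's ingestion cap.
# The NOISE extensions here are illustrative, not a recommendation.
NOISE = ('.gif', '.jpg', '.png', '.css', '.js', '.ico')

def filter_log(lines):
    """Yield only the log lines worth indexing."""
    for line in lines:
        # The request ("GET /path HTTP/1.1") sits between the first
        # pair of double quotes in common/combined log formats.
        parts = line.split('"')
        request = parts[1] if len(parts) > 1 else ''
        fields = request.split(' ')
        path = fields[1] if len(fields) > 1 else ''
        if not path.lower().endswith(NOISE):
            yield line

# Hypothetical sample lines for illustration.
sample = [
    '1.2.3.4 - - [27/Feb/2012:10:00:00 -0500] "GET /logo.png HTTP/1.1" 200 512 "-" "UA"',
    '1.2.3.4 - - [27/Feb/2012:10:00:01 -0500] "GET /report.pdf HTTP/1.1" 200 4096 "-" "UA"',
]
print(list(filter_log(sample)))
```

Only the PDF hit survives the filter; the image request is dropped before it counts against your daily quota.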

mswas – February 27, 2012

Do you know if Splunk can handle multiple logs created by load balancing systems?

It depends on what’s in the logs. If you’re trying to stitch together sessions that appear on multiple load-balanced servers, it’s easiest if you’re logging a cookie or other unique identifier that Splunk can use to establish that the hits came from a single session. Otherwise, you could tie it together with IP & user agent, but that’s not as reliable. Either way, yes – you can handle multiple logs on load-balanced servers. That’s what I’m doing at work right now: we’ve got 4 active Apache servers serving the site at any one time, and Splunk ingests all their logs simultaneously & can report on them easily.
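The stitching idea can be sketched in a few lines of Python. This is only an illustration of what Splunk does internally, under the assumption that each server logs a session cookie (the `sid=` field and sample hits below are hypothetical):

```python
from collections import defaultdict

def stitch_sessions(log_files):
    """Merge hits from several servers' logs, grouped by session id.

    Assumes each line carries a whitespace-delimited "sid=..." field
    (a logged session cookie); lines without one are ignored.
    """
    sessions = defaultdict(list)
    for lines in log_files:
        for line in lines:
            for field in line.split():
                if field.startswith('sid='):
                    sessions[field[4:]].append(line)
                    break
    return sessions

# Hypothetical hits from two load-balanced servers: visitor abc123
# touched both boxes, but regrouping by cookie reunites the session.
server_a = ['10.0.0.1 GET /home sid=abc123', '10.0.0.2 GET /about sid=xyz789']
server_b = ['10.0.0.1 GET /pricing sid=abc123']
merged = stitch_sessions([server_a, server_b])
print(len(merged['abc123']))  # prints 2
```

In a real clickstream you’d also sort each session’s hits by timestamp before analyzing the path, since the merged logs arrive in per-server order.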