Notes on usability and related things by a project manager who manages electronic publishing projects.

May 28, 2012

When they were younger, both my kids were fans of Dr Seuss's Sleep Book (as I was in my turn). Among the delights for a small bed-time reader, Dr Seuss provides real-time statistics about the number of people currently asleep, and (like a good statistics provider) publishes his methodology:

"We find out how many, we learn the amount,
By an Audio-Telly-o-Tally-o Count.
On a mountain, halfway between Reno and Rome,
We have a machine in a plexiglass dome
Which listens and looks into everyone's home.
And whenever it sees a new sleeper go flop,
It jiggles and lets a new Biggel-Ball drop.
Our chap counts these balls as they plup in a cup,
And that's how we know who is down and who's up."

There's also a wonderfully goofy illustration of the machine, and I think it was this, rather than any predestination to work on usage statistics, that made this one of my favourite parts of the book when I was a child.

In the real world, web usage statistics sometimes seem to offer the power (and intrusion) of the Audio-Telly-o-Tally-o Count, only to snatch it away again, and offer subtly different statistics, with various caveats.

As an example, suppose you own a website, and understandably want to know "how many visits did people make to my site in the last week?"

If you had an Audio-Telly-o-Tally-o Count, your "chap" would magically listen and look into everyone's home or office, find people actually visiting the site... and then all that remains is to count the Biggel-Balls.

Of course, that's not what a usage stats tool does.

When someone requests a page from your website, their browser sends one or more requests to your website's "webserver" for the text, images and other content that the customer needs to view the page. Along with each request comes some information about the customer's computer: its IP address, operating system, screen size, and some details of the browser the customer is using. It also often tells us the URL of the page the customer came from. The webserver can record all this (in a file called a "server log"), along with the time of the request, for analysis later. This combination of facts is fairly unique to the computer - think of it as being like a footprint. When the customer requests the next page, all this happens again, creating a further "footprint". As the customer visits more pages, his or her computer leaves a series of these "footprints" - following their visit for analysis purposes is a bit like following a trail of footprints down a beach.
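To make the "footprint" idea concrete, here is a minimal sketch in Python. It assumes the widely used Apache/nginx "combined" log format; the example log line and the field names are invented for illustration, not taken from any particular analytics tool.

```python
# A minimal sketch: parse one "combined"-format server log line into
# the facts that make up a visitor's "footprint".
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def footprint(log_line):
    """Extract the footprint facts from one log line (None if it doesn't parse)."""
    m = LOG_PATTERN.match(log_line)
    if m is None:
        return None
    request_parts = m.group("request").split(" ")
    return {
        "ip": m.group("ip"),
        "time": m.group("time"),
        "page": request_parts[1] if len(request_parts) > 1 else "",
        "referrer": m.group("referrer"),      # where the customer came from
        "user_agent": m.group("user_agent"),  # browser and OS details
    }

# An invented example line, in the combined log format.
line = ('203.0.113.7 - - [28/May/2012:10:15:32 +0000] '
        '"GET /index.html HTTP/1.1" 200 5120 '
        '"http://example.com/other-page" "Mozilla/5.0 (Windows NT 6.1)"')
print(footprint(line))
```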

For completeness I should say that not all analysis runs from server logs. In a popular alternative, the webserver includes a small program with each webpage, which the customer's browser runs when it assembles the page. This "small program" (used, for example, by Google Analytics) causes a message to be sent out to a log file for analysis later. And there are other methods. For the purposes of the discussion here, it comes to much the same thing.

As another aside, it is of course sometimes possible to require every user to log in, and then to follow their individually-identified activity with a cookie. That provides more detailed information, but is not always desirable (in some circumstances requiring people to log in makes them go away instead; not everyone will accept cookies; and so on).
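As a sketch of how login changes the analysis (the cookie name "uid" is invented for illustration): when a request carries an identifying cookie, the analysis can key on that identity instead of the fragile footprint.

```python
# Hypothetical sketch: prefer a login cookie over the IP + user-agent
# footprint when deciding who a request belongs to.
def visitor_key(fp, cookies):
    if "uid" in cookies:                  # individually identified user
        return ("user", cookies["uid"])
    return ("footprint", fp["ip"], fp["user_agent"])
```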

Pushing on with the "footprints on the beach" analogy, it's worth noting that we have a science fiction or fantasy beach here - trails can suddenly start as if someone had been teleported in by futuristic technology or magic (e.g. the customer came in from a bookmark, or typed the URL of our page rather than following a link that we can detect). Similarly, trails of footprints almost always suddenly stop (e.g. the customer stopped using their browser, or went to another site). The way the Internet works means that customers don't have to do anything formal to leave your site; they just stop requesting pages.

Imagine now a detective following these trails of footprints around the beach. How do the clues compare with the Audio-Telly-o-Tally-o Count?

The detective has the following problems:

"Footprints" are fairly unique to a given computer, but not completely so. If Big Corporation Inc. has bought a batch of identical computers and successfully forbids its staff from customizing them in any way, then all the computers will have identical footprints. The detective may struggle to sort out all those size 42 Converse sneakers. The usual counting rule is to count all this as one user (technically, one "unique browser"), whereas the Audio-Telly-o-Tally-o Count can magically see several people, and so drops several Biggel-Balls.

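A hedged sketch of that counting rule, reusing the footprint dictionaries from the earlier example (real tools use richer, but broadly similar, combinations of fields):

```python
# Count "unique browsers" by collapsing identical footprints into one.
# Several humans behind identical corporate machines will, wrongly but
# unavoidably, be counted as a single browser.
def unique_browsers(footprints):
    return len({(fp["ip"], fp["user_agent"]) for fp in footprints})
```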
The Audio-Telly-o-Tally-o Count magically sees exactly where people stop using your website and do something else - and where they are still on the website, but not requesting pages. The detective only has the observation that the footprints stopped (the standard is to declare that a user session has ended if there are no more page requests for 30 minutes). Clearly this is arbitrary - the Audio-Telly-o-Tally-o Count might know that the user is still avidly reading a long web page, or has broken off to answer the phone, and so on. So the Count might drop a single Biggel-Ball where the detective counts a new session each time there is a 30-minute gap.
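Here is a minimal sketch of that 30-minute rule, assuming we already have the request timestamps for one browser (the function name and structure are mine, not a standard API):

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # the conventional, arbitrary cut-off

def count_sessions(request_times):
    """Count sessions for one browser: a new session starts whenever
    the gap since the previous request exceeds 30 minutes."""
    times = sorted(request_times)
    if not times:
        return 0
    sessions = 1
    for earlier, later in zip(times, times[1:]):
        if later - earlier > SESSION_GAP:
            sessions += 1
    return sessions

# Three requests, with a long lunch break in the middle: two sessions.
stamps = [datetime(2012, 5, 28, 10, 0),
          datetime(2012, 5, 28, 10, 10),
          datetime(2012, 5, 28, 13, 0)]
print(count_sessions(stamps))  # -> 2
```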

The detective is counting the "footprints" of a computer, not the people behind it. So imagine a public library which has one computer, on which people come and go all day, many of them looking at your website. The Audio-Telly-o-Tally-o Count magically follows this, counting the people coming and going. The detective, following computer-generated "footprints", does not know that a different human is now filling those shoes. If there's a 30-minute break, of course, the detective assumes a new session; but if the queue at the computer is moving swiftly enough, that won't happen often, and several different humans will be counted as one visit.

The Audio-Telly-o-Tally-o Count magically watches as a user switches from their desktop PC to their laptop or mobile device, or their computer at home, and can tell that this is one human continuing his or her visit. But each of these devices has a different footprint for the detective - the trainers suddenly stop, and a pair of heels carry on down the beach. So (unless the customer identifies themselves, e.g. by logging in) the detective counts a new visit each time the user switches device.

So, since we do not have an Audio-Telly-o-Tally-o Count, we can't count "visits" exactly in the common-sense meaning of the term. We can count "unique browsers" and "sessions" and combine those into a "visit" - a statistic which has some sources of error, but at least the major sources of error are known and the statistic is captured by a known and reproducible method. Note that the methodology is such that errors will usually result in under-counting: probably better for the business and its advertisers than getting an inflated idea of the traffic. It's currently the best that can be done, not because of the limitations of your usage analytics tool or usage analytics people, but because of limits on what you can actually measure, and the decisions you have to make to interpret it.

Until recently, publishers typically sold by the wholesale model. The publisher and wholesaler agree a price and a number of printed books to be shipped to the wholesaler. It is then the wholesaler's business to decide what price to charge their customer. Since the collapse of the Net Book Agreement in the 1990s, a UK publisher has had no legal right to dictate retail prices. That has probably meant that many of the books I have bought since then were cheaper for me as a consumer. But it has caused some discomfort in the publishing industry, and probably not just moaning about having to work harder to survive in an industry that has become more competitive.

Critics of the wholesale model say that it leads towards monopolies in book-selling. A wholesaler doing large volume can arm-wrestle better with publishers over terms, and then gain competitive advantage from that lower price. That can drive up market share, enabling the seller to get better terms out of the publisher next time. And so it goes. Moreover, sellers with enough volume can afford to sell some titles as loss-leaders (i.e. making a loss on every copy sold but hoping to make it up in other ways). The later Harry Potter novels were famously used as loss-leaders, with stories of small bookshops buying stock from supermarkets and reselling it, as they could get a better price there than from the publisher or their usual wholesaler. Only those with deep pockets can loss-lead, and trade magazines at the time had a lot of copy about the fairness of all this.

Perhaps because of these problems, a number of publishers are using the Agency model for eBooks. In this, the publisher and wholesaler agree a price and a percentage commission that the wholesaler gets. With eBooks, it would not be necessary to agree a quantity of copies - the agent could sell what they can and the publisher could find out how things went from the sales data the seller provides. Apple use this model for iTunes, for example.
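As a quick worked example of the split, with purely illustrative numbers (the 30% commission is a commonly quoted figure, not taken from any particular contract):

```python
# Agency model: the publisher sets the retail price; the agent keeps
# an agreed commission on each sale. Illustrative numbers only.
price = 10.00                  # retail price set by the publisher (GBP)
commission_rate = 0.30         # agreed agent's commission
agent_cut = price * commission_rate       # 3.00 to the agent
publisher_revenue = price - agent_cut     # 7.00 to the publisher
print(f"agent: {agent_cut:.2f}  publisher: {publisher_revenue:.2f}")
```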

But once publishers are again able to set prices (nowadays without the legal cover of the Net Book Agreement), they have to be very careful what they discuss with each other, lest they move into cartel land.

It is a confusing time for the industry - I was recently discussing the fate of fiction publishing with my friend the author and lecturer Anthony Nanson. We couldn't decide whether a combination of price-cutting and piracy would render today's publishing models mostly uneconomic, or whether there would be a new golden age.

The quotation below is from Malcolm Gladwell's much-discussed New Yorker article on social media and activism:

"The platforms of social media are built around weak ties. Twitter is a way of following (or being followed by) people you may never have met. Facebook is a tool for efficiently managing your acquaintances, for keeping up with the people you would not otherwise be able to stay in touch with. That’s why you can have a thousand “friends” on Facebook, as you never could in real life.

This is in many ways a wonderful thing. There is strength in weak ties, as the sociologist Mark Granovetter has observed. Our acquaintances—not our friends—are our greatest source of new ideas and information. The Internet lets us exploit the power of these kinds of distant connections with marvellous efficiency. It’s terrific at the diffusion of innovation, interdisciplinary collaboration, seamlessly matching up buyers and sellers, and the logistical functions of the dating world. But weak ties seldom lead to high-risk activism."

His example of "high-risk activism" is involvement in the US Civil Rights movement - activists ran a substantial risk of threats, abuse and violence.

The article has drawn several rebuttals - by Chris Lake on eConsultancy and by Leo Mirani in the Guardian, for example - but these authors disagree with Malcolm Gladwell less than they might appear to. Mr Gladwell does not (as you can see from the quote above) say that social media are ineffective or useless, just that they are only good for certain things (and potentially VERY good at those).

What blurs this further is that the people who follow your tweets or are your Facebook friends quite likely do include those who most passionately and unconditionally wish you or your cause success. But they are probably right up at the top of the Zipf curve, greatly outnumbered by people to whom you are not massively important (at least not yet). Zipf curves seem very common in volunteer- or activist-powered areas: you get a Zipf curve when a small number of people contribute a lot, a larger number contribute some, and most contribute hardly at all - here's an earlier Usability Notes post on the Zipf curve and user-generated sites.
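To make that shape concrete, here is a toy illustration with invented numbers: if the k-th most active contributor makes roughly C/k contributions (the classic Zipf pattern), the committed few dominate and the long tail contributes very little each.

```python
# Toy Zipf-like contribution counts: the k-th ranked contributor makes
# about 1000/k contributions. Invented numbers, purely to show the shape.
contributions = [round(1000 / rank) for rank in range(1, 101)]
print(contributions[:5])    # the committed few: [1000, 500, 333, 250, 200]
print(contributions[-5:])   # the long tail: [10, 10, 10, 10, 10]
```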

There are many cases where getting many to do a little is more effective than getting few to do a lot. As a humble example, take the Oxford Oxfam Group street collection for the victims of the January 2010 earthquake in Haiti (I'm picking this example because I'm currently the Chair of the Oxford Oxfam Group). Facebook was notably successful as one means of spreading the word that we needed collectors. Standing on an Oxford street corner for some hours on a winter Saturday strikes me as a reasonable-sized ask: it's cold and boring, though of course well worthwhile for the money collected. But, let's face it, it's most unlikely to risk the intimidation, abuse or violence that Civil Rights campaigners faced.

October 09, 2009

Flurry provide a usage statistics service for smartphone applications, collecting data when the application is downloaded or used. As a result they have interesting data on the state of the smartphone market.

The data probably underestimate the situation, since far from all eBook apps will have Flurry embedded in them.

In another analysis "Mobile apps: Models, Money and Loyalty" Flurry have also looked at how frequently apps are used, and how likely users are to return to them after 90 days. That suggests that books are used intensely (maybe 10 times a week), but are not used much after 90 days. The customer has finished the book by then, presumably.

Flurry Analytics places a lightweight agent into an application, so that performance data are tracked, logged and reported back for analysis. This information is confidential and available only to the developer to analyze in aggregate. Individual user data is not identifiable. Developers are provided a wealth of metrics around usage behavior, any custom event they choose to track and technical information about the device, firmware version, carrier and more.

October 08, 2009

I've finally found some demographic data for iPhone and iPod touch users. The data come from a survey done by AdMob and comScore, as published on the AdMob blog and reviewed by BusinessWire (which has figures not included in the blog post). The survey was in the first half of 2009 (presumably in the US, though this is not stated). I was after age data, which gives the following picture:

iPod touch ownership was highest among the 13-17 age group (46%), falling sharply after age 25: 23% of iPod touch users were 18-24, 12% were 25-34 and 12% were 35-49; after that it's single figures. iPhone owners are older: only 6% were 13-17; 20% were 18-24; 27% were 25-34; 31% were 35-49. This is pretty much what one would expect: iPod touches are cheaper and can be bought for a one-off payment (no ongoing mobile phone bill), so they are a possible (generous) Christmas or birthday present. In the UK at least, you still have to sign up for some hefty costs to own an iPhone. So probably another case of "you can tell the men from the boys by the cost of their toys", as a friend of mine used to say. Speaking of men and boys, over 70% of the owners (of both iPhone and iPod touch) were male.

These data may go some way to explaining an interesting observation from "Just Another iPhone Blog" - "Do Crap Apps have legs?" The author comments on the curious popularity of "Crap Apps" (applications that are trivial and/or puerile).

October 07, 2009

The Rocky Mountain News was Colorado's oldest newspaper, founded in 1859. It published its last edition in February 2009. John Temple, the last Editor, has a fascinating article on what he thinks went wrong and what lessons publishers can learn from the paper's demise, especially from the Rocky Mountain News' attempts to move online. In a nutshell, he thinks that the web operation suffered from being thought of as something that had to serve and make revenue for the old "core" (print) product, and was saddled with the practices, rules and mindsets of the print publication. Fascinating and sometimes painful reading for print publishers trying to manage an online product as well.