Category: Anything you want

And can you imagine showing this to your head of research, I mean, your HEAD OF RESEARCH, and saying “I wanna go to this… I REALLY wanna go to this…”. They’ll probably look at you and say:

“Kid, we don’t like your kind, and we’re gonna send your fingerprints off to Washington.”

And friends, somewhere in Washington enshrined in some little folder, is a study in black and white of my fingerprints. And the only reason I’m singing you this song now is cause you may know somebody in a similar situation, or you may be in a similar situation, and if you’re in a situation like that there’s only one thing you can do and that’s walk into the shrink wherever you are, just walk in and say “Shrink, you can get anything you want, at Alice’s Restaurant.” And walk out. You know, if one person, just one person does it they may think he’s really sick and they won’t take him. And if two people, two people do it, in harmony, they may think they’re both faggots and they won’t take either of them.

And three people do it, three, can you imagine, three people walking in singin a bar of Alice’s Restaurant and walking out. They may think it’s an organization. And can you, can you imagine fifty people a day, I said fifty people a day walking in singin a bar of Alice’s Restaurant and walking out. And friends, they may think it’s a movement.

PS so OU folks, when we gonna put together movies like this to advertise each and every course on the courses and quals site?! DMPB, would that fall under your remit? Or would Ian be fighting you for it?;-) heh heh

One of my favourite quotes (and one I probably misquote – which is a pre-requisite of the best quotes) is William Gibson’s “the future is already here, it’s just not evenly distributed yet”…

Several times tonight, I realised that the future is increasingly happening around me, and it’s appearing so quickly I’m having problems even imagining what might come next.

So here for your delectation are some of the things I saw earlier this evening:

SnapTell: a mobile and iPhone app that lets you photograph a book, CD or game cover and it’ll recognise it, tell you what it is and take you to the appropriate Amazon page so you can buy it… (originally via CogDogBlog);

Shazam, a music recognition application that will identify a piece of music that’s playing out loud, pop up some details, and then let you buy it on iTunes or view a version of the song being played on Youtube (the CogDog also mentioned this, but it was arrived at tonight independently);

So just imagine the “workflow” here: you hear a song playing, fire up the Shazam app, it recognises the song, then you can watch someone play a version of the song (maybe even the same version) on Youtube.

A picture of a thousand words?: if you upload a scanned document onto the web as a PDF document, Google will now have a go at running an OCR service over the document, extracting the text, indexing it and making it searchable. Which means you can just scan and post, flag the content to the Googlebot via a sitemap, and then search into the OCR’d content; (I’m not sure if the OCR service is built on top of the Tesseract OCR code?)
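The OCR end of this is Google’s magic, but the “flag the content to the Googlebot via a sitemap” step is easy enough to sketch. Here’s a minimal Python fragment – the URLs are invented by me – that builds a sitemap listing your scanned PDFs so the crawler knows where to look:

```python
import xml.etree.ElementTree as ET

def pdf_sitemap(pdf_urls):
    """Build a minimal sitemap XML string listing scanned PDFs,
    so a crawler can find (and, with luck, OCR) them."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url in pdf_urls:
        u = ET.SubElement(urlset, "url")
        ET.SubElement(u, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical scanned documents we want indexed:
xml_out = pdf_sitemap(["http://example.org/scans/minutes-1997.pdf"])
```

Serve the resulting XML as your sitemap (or register it with Google’s webmaster tools) and the scans become discoverable.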

Barely three months ago, Youtube added the ability to augment videos with captions. With a little bit of glue, the Google translate service will take those captions and translate them into another language for you (Auto Translate Now Available For Videos With Captions):

“To get a translation for your preferred language, move the mouse over the bottom-right arrow, and then over the small triangle next to the CC (or subtitle) icon, to see the captions menu. Click on the “Translate…” button and then you will be given a choice of many different languages.” [Youtube blog]

Another (mis)quote, this time from Arthur C. Clarke: “any sufficiently advanced technology is indistinguishable from magic”. And by magic, I guess one thing we mean is that there is no “obvious” causal relationship between the casting of a spell and the effect? And a second thing is that if we believe something to be possible, then it probably is possible.

PPS I guess I should have listed this in the list above – news that Google has (at least in the US) found a way of opening up its book search data: Google pays small change to open every book in the world. Here’s the blog announcement: New chapter for Google Book Search: “With this agreement, in-copyright, out-of-print books will now be available for readers in the U.S. to search, preview and buy online — something that was simply unavailable to date. Most of these books are difficult, if not impossible, to find.”

But today I saw something that brought home to me the consequences of aggregating millions of tiny individual actions, in this case photo uploads to the flickr social photo site.

From my reading of the post, the purple overlays in the images above – not the blue bounding boxes – are generated automatically by clustering geotagged and placename-tagged images and extrapolating a well contoured shape around them.

That is, from the photos tagged “London” [that is, photos that are tagged with London in Yahoo’s WOE service], the algorithm creates the purple “London city” overlay in the above diagram.
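I have no idea what algorithm flickr actually use, but to make the idea concrete here’s a toy, pure-Python stand-in: wrap the geotagged points for a place in their convex hull (Andrew’s monotone chain). The real overlays are better contoured – they hug the photos more tightly – but the principle is the same:

```python
def convex_hull(points):
    """Wrap a set of (lon, lat) points in their convex hull
    (Andrew's monotone chain algorithm)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product OA x OB
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each list ends with the other's starting point, so drop it.
    return lower[:-1] + upper[:-1]

# Toy "photos tagged London": four corner shots and one in the middle.
photos = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(photos)
```

Run over those five toy “photos”, the interior point drops out and the four corners come back as the polygon overlay.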

For each and every photo upload, there is maybe a tiny personal consequence. For millions of photo uploads, there are consequences like this… (From millions of personal votes cast, there’s the possible consequence of change…) [Update: apparently, flickr received its 3 billionth upload at the start of November…]

And it struck me that even the relatively unsophisticated form of signals intelligence that is traffic analysis was capable of changing the face of war. So what are the consequences of traffic analysis at this scale?

What are the possible consequences? What are we walking into?

(Of course, following a brief moment of “I want to stop contributing to this; I’m gonna kill my computer and go and grow onions somewhere”, I then started wondering: “hmm, maybe if we also mine the info about what camera took each photo, and looked up the price of that camera, we might be able to generate socio-economic overlays over different neighbourhoods, and then… arrghh… stop, no, evil, evil…”;-)

So to add to the mix, here’s a couple more things that the web made easy this week. Firstly, the Google Visualisation API was extended so that it could consume data in a simple format from your own data sources. That is, if you allow your own database to output data in a simple tabular structure, the Google visualisation API makes it trivial to generate charts and graphs from that data. Secondly, Google added RSS feed support to their Google alerts service. This makes it easy to subscribe to an RSS feed that will alert you to new results on Google for a particular search. What really surprised me was how, after setting up a couple of alerts, they appeared without me doing anything (or maybe that should be – without me changing something to say “no”?) in my Google Reader account.
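For what it’s worth, the “simple tabular structure” involved is just JSON describing columns and rows of cells. Here’s a hedged sketch (column ids and figures invented by me) of the kind of datatable a data source might hand back to the Visualization API:

```python
import json

def to_datatable(columns, rows):
    """Serialise a simple table into the 'cols'/'rows' JSON structure
    the Google Visualization API consumes from a data source."""
    return json.dumps({
        "cols": [{"id": cid, "label": label, "type": ctype}
                 for cid, label, ctype in columns],
        "rows": [{"c": [{"v": v} for v in row]} for row in rows],
    })

# e.g. monthly visitor counts pulled from a (hypothetical) database query:
payload = to_datatable(
    [("m", "Month", "string"), ("n", "Visitors", "number")],
    [("Oct", 1200), ("Nov", 1350)],
)
```

Point a Google chart at a URL serving that payload and the graphing comes for free.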

Small components is one thing.

Small components loosely coupled is another – and one where many of us see value.

Small components automatically wired together is yet another thing – and one that is increasingly going to happen. A consequence I hadn’t anticipated of setting up a Google RSS alert was that the feed appeared automatically in my feed reader.

Yesterday, an unanticipated consequence of me adding my blog URL to my Google Profile page was that several other URLs I control were automatically suggested to me as things I might want to add to my profile.

Whenever I go into Facebook, the platform suggests to me a list of people I might know, whom I might want to “friend”.

Now this recommendation may be because we share a large number of friends, or it might be that I’ve appeared in the same photograph as some of these people… How would Facebook know? Maybe Microsoft, their search provider, told them: Why “People” Tags? describes how the beta version of Microsoft Live Photo Gallery automatically identifies faces in photos and then prompts you to tag them with people’s names… Google already does this, of course, in Picasa, with its “name tags“.

And finally…a chance clickthru from someone on the Copac developments blog, which lists OUseful.info in the blogroll, alerted me through my blog stats to this post on Spooky Personalisation (should we be afraid?) which discusses the extent to which “adaptive personalisation” may appear “spooky” to the user.

(A serendipitous link discovery for me? Surely… Spooky? Maybe!;-)

And that maybe is going to be an ever more apparent unanticipated consequence of the way in which it’s getting so much easier to glue apps together? Spookiness…

It’s strange to think that the web search industry is only 15 years or so old, and in that time the race has been run on indexing and serving up results for web pages, images, videos, blogs, and so on. The current race is focused on chasing the mobile (local) searcher, making use of location awareness to serve up ads that are sensitive to spatial context, but maybe it’s data that is next?

(Maybe I need to write a “fear post” about how we’re walking into a world with browsers that know where we are, rather than “just” GPS enabled devices and mobile phone cell triangulation? ;-) [And, err, it seems Microsoft are getting in there too: Windows 7 knows where you are – “So just what is it that Microsoft is doing in Windows 7? Well, at a low level, Microsoft has a new application programming interface (API) for sensors and a second API for location. It uses any of a number of things to actually get the location, depending on what’s available. Obviously there’s GPS, but it also supports Wi-Fi and cellular triangulation. At a minimum.”]

So… data. Take for example this service on the Microsoft Research site: Data Depot. To me, this looks like a site that will store and visualise your telemetry data, or more informally collected data (you can tweet in data points, for example):

Want to ‘datablog’ your running miles or your commute times or your grocery spending? DataDepot provides a simple way to track any type of data over time. You can add data via the web or your phone, then annotate, view, analyze, and add related content to your data.

Services like Trendrr have also got the machinery in place to take daily “samples” and produce trend lines over time from automatically collected data. For example, here are some of the data sources they can already access:

Weather details – The high and low temperatures on weather.com for a specific zipcode.

Amazon Sales Rank – Sales rank on amazon.com

Monster Job Listings – Number of job results from Monster.com for the given query in a specific city.

At the moment, the API will let you pull datatable-formatted data from your database into the Google namespace. But suppose the next step is for the API to make a call on your database using a query you have handcrafted. Now add in the fact that Google has already sussed out how to Crawl through HTML forms – parsing a form and then automatically generating and posting queries using those forms to find more links from deep within a website – and you can see how giving the Google API a single query on your database would tell them some “useful info” (?!;-) about your database schema – info they could use to scrape and index a little more data out of your database…

Now of course the Viz API service may never extend that far, and I’m sure Google’s T&C’s would guarantee “good Internet citizenry practices”, but the potential for evil will be there…

And finally, it’s probably also worth mentioning that even if we don’t give the Goog the keys to our databases, plenty of us are in the habit of feeding public data stores anyway. For example, there are several sites built specifically around visualising user-submitted data (if you make it public…): Many Eyes and Swivel, for example. And then of course, there’s also Google Spreadsheets, DabbleDB, Zoho Sheet etc etc.

One of the foundational principles of the Web 2.0 philosophy that Tim O’Reilly stresses relates to “self-improving” systems that get better as more and more people use them. I try to keep a watchful eye out for business books on this subject – books about companies who know that data is their business; books like the somehow unsatisfying Competing on Analytics, and a new one I’m looking forward to reading: Data Driven: Profiting from Your Most Important Business Asset (if you’d like to buy it for me… OUseful.info wishlist;-).

For those of you who don’t know of Tesco, it’s the UK’s dominant supermarket chain, taking a huge percentage of the UK’s daily retail spend, and is now one of those companies that’s so large it can’t help but be evil. (They track their millions of “users” as aggressively as Google tracks theirs.) Whenever you hand over your Tesco Clubcard alongside a purchase, you get “points for pounds” back. Every 3 months (I think?), a personalised mailing comes with vouchers that convert points accumulated over that period into “cash”. (The vouchers are in nice round sums – £1, £2.50 and so on. Unconverted points are carried over to the convertible balance in the next mailing.) The mailing also comes with money off vouchers for things you appear to have stopped purchasing, rewards on product categories you frequently buy from, or vouchers trying to entice you to buy things you might not be in the habit of buying regularly (but which Tesco suspects you might desire!;-)

Anyway, that’s as maybe – this is supposed to be a brief summary of corner-turned pages I marked whilst on holiday. The book reads a bit like a corporate briefing book, repetitive in parts, continually talking up the Tesco business, and so on, but it tells a good story and contains more than a few gems. So here for me were some of the highlights…

First of all, the “Clubcard customer contract”: more data means better segmentation, means more targeted/personalised services, means better profiling. In short, “the more you shop with us, the more benefit you will accrue” (p68).

This is at the heart of it all – just like Google wants to understand its users better so that it can serve them with more relevant ads (better segmentation * higher likelihood of clickthru = more cash from the Google money machine), and Amazon seduces you with personal recommendations of things it thinks you might like to buy based on your purchase and browsing history, and the purchase history of other users like you, so Tesco Clubcard works in much the same way: it feeds a recommendation engine that mines and segments data from millions of people like you, in order to keep you engaged.

Scale matters. In 1995, when Tesco Clubcard launched, dunnhumby, the company that has managed the Clubcard from when it was still an idea to the present day, had to make do with the data processing capabilities that were available then, which meant that it was impossible to track every purchase, in every basket, from every shopper. (In addition, not everything could be tracked by the POS tills of the time – only “the customer ID, the total basket size and time the customer visited, and the amount spent in each department” (p102)). In the early days, this meant data had to be sampled before analysis, with insight from a statistically significant analysis of 10% of the shopping records being applied to the remaining 90%. Today, they can track everything.

Working out what to track – first order “instantaneous” data (what did you buy on a particular trip, what time of day was the visit) or second order data (what did you buy this time you didn’t buy last time, how long has it been between visits) – was a major concern, as were indicators that could be used as KPIs in the extent to which Clubcard influenced customer loyalty.
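To make the first order/second order distinction concrete, here’s a little invented Python sketch – none of this is from the book – deriving second order signals from two consecutive baskets:

```python
from datetime import date

def second_order(this_visit, last_visit):
    """Derive 'second order' signals from two consecutive baskets:
    the gap in days between visits, what's new this time, and what's
    been dropped. Each visit is a (date, set_of_products) pair."""
    (d1, basket1) = last_visit
    (d2, basket2) = this_visit
    return {
        "days_between": (d2 - d1).days,
        "new_items": basket2 - basket1,
        "dropped_items": basket1 - basket2,
    }

delta = second_order(
    this_visit=(date(2008, 11, 8), {"bread", "milk", "nappies"}),
    last_visit=(date(2008, 11, 1), {"bread", "milk", "beer"}),
)
```

First order data is just what’s in each visit; the second order signals only exist once you difference one visit against the previous one.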

Now I’m not sure to what extent you could map website analytics onto “store analytics”, but some of the loyalty measures seem familiar to me. Take, for example, the RFV analysis (pp95-6):

Recency – time between visits;

Frequency – “how often you shop”

Value – how profitable is the customer to the store (if you only buy low margin goods, you aren’t necessarily very profitable), and how valuable is the store to the customer (do you buy your whole food shop there, or only a part of it?).
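As a back-of-envelope illustration (all the numbers and thresholds here are mine, not the book’s), RFV is simple enough to sketch in a few lines of Python:

```python
from datetime import date

def rfv(visits, today):
    """Score a shopper's visit history on the three RFV axes:
    Recency (days since last visit), Frequency (roughly, visits per
    month) and Value (total margin contributed). 'visits' is a list
    of (date, margin) pairs; the month length is a crude 30 days."""
    dates = sorted(d for d, _ in visits)
    recency = (today - dates[-1]).days
    span_months = max((dates[-1] - dates[0]).days / 30.0, 1)
    frequency = len(visits) / span_months
    value = sum(m for _, m in visits)
    return {"recency": recency,
            "frequency": round(frequency, 1),
            "value": value}

profile = rfv(
    [(date(2008, 10, 4), 3.20),
     (date(2008, 10, 18), 4.10),
     (date(2008, 11, 1), 2.75)],
    today=date(2008, 11, 8),
)
```

Which, for web folk, is not a million miles from “days since last visit, visits per month, revenue per visitor”…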

Working out what data to analyse also had to fit in with the business goals – the analytics needed to be actionable (are you listening, Library folks?!;-). For example, as well as marketing to individuals, Clubcard data was to be used to optimise store inventory (p124). “The dream was to ensure that the entire product range on sale at each store accurately represented, in selection and proportion, what the customers who shopped there wanted to buy.” So another question that needed to be asked was how should data be presented “so that it answered a real business problem? If the data was ‘interesting’, that didn’t cut it. But adding more sales by doing something new – that did.” (p102). Here, the technique of putting data into “bins” meant that it could be aggregated and analysed more efficiently in bulk and without loss of insight.

Returning to the customer focus, Tesco complemented the RFV analysis with the idea of a “Loyalty Cube” within which each customer could be placed (pp126-9).

Contribution: that is, contribution to the bottom line, the current profitability of the customer;

Commitment: future value – “how likely that customer is to remain a customer”, plus “headroom”, the “potential for the customer to be more valuable in the future”. If you buy all your groceries in Tesco, but not your health and beauty products, there’s headroom there;

Championing: brand ambassadors; you may be low contribution, low commitment, but if you refer high value friends and family to Tesco, Tesco will like you:-)

By placing individuals in separate areas of this chart, you can tune your marketing to them, either by marketing items that fall squarely within that area, or if you’re feeling particularly aggressive, by trying to move them through the different areas. As ever, it’s contextual relevancy that’s the key.

But what sort of data is required to locate a customer within the loyalty cube? “The conclusion was that the difference between customers existed in each shopper’s trolley: the choices, the brands, the preferences, the priorities and the trade-offs in managing a grocery budget.” (p129).

The shopping basket could tell a lot about two dimensions of the loyalty cube. Firstly, it could quantify contribution, simply by looking at the profit margins on the goods each customer chose. Secondly, by assessing the calories in a shopping basket, it could measure the headroom dimension. Just how much of a customer’s food needs does Tesco provide?

(Do you ever feel like you’re being watched…?;-)

“Products describe People” (p131): one way of categorising shoppers is to cluster them according to the things they buy, and identify relationships between the products that people buy (people who buy this, also tend to buy that). But the same product may have a different value to different people. (Thinking about this in terms of the OU Course Profiles app, I guess it’s like clustering people based on the similar courses they have chosen. And even there, different values apply. For example, I might dip into the OU web services course (T320) out of general interest, you might take it because it’s a key part of your professional development, and required for your next promotion).

Clustering based on every product line (or SKU – stock keeping unit) is too highly dimensional to be interesting, so enter “The Bucket” (p132): “any significant combination of products that appeared from the make up of a customer’s regular shopping baskets. Each Bucket was defined initially by a ‘marker’, a high volume product that had a particular attribute. It might typify indulgence, or thrift, or indicate the tendency to buy in bulk. … [B]y picking clusters of products that might be bought for a shared reason, or from a shared taste” the large number of Buckets required for the marker approach could be reduced to just 80 Buckets using the clustered products approach. “Every time a key item [an item in one of the clusters that identifies a Bucket] was scanned [at the till], it would link that Clubcard member with an appropriate Bucket. The combination of which shoppers bought from which Buckets, and how many items in those Buckets they bought, gave the first insight into their shopping preferences” (p133).
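A toy sketch of the Bucket idea (the key items and Bucket names here are invented by me – the real ones came out of the product clustering the book describes):

```python
# Invented key-item -> Bucket mapping; the real Buckets were built
# from clusters of products bought for a shared reason or taste.
KEY_ITEMS = {
    "own-brand baked beans": "thrift",
    "single malt whisky": "indulgence",
    "24-pack toilet roll": "bulk buy",
}

def bucket_counts(scanned_items):
    """Each time a key item crosses the till, credit the shopper's
    count for the matching Bucket (non-key items carry no signal)."""
    counts = {}
    for item in scanned_items:
        bucket = KEY_ITEMS.get(item)
        if bucket:
            counts[bucket] = counts.get(bucket, 0) + 1
    return counts

basket = ["own-brand baked beans", "milk",
          "own-brand baked beans", "single malt whisky"]
prefs = bucket_counts(basket)
```

The per-shopper counts across all 80 Buckets are then the “first insight” feature vector.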

By applying cluster analysis to the Buckets (i.e. trying to see which Buckets go together) the next step was to identify user lifestyles (p134-5). 27 of them… Things like “Loyal Low Spenders”, “Can’t Stay Aways”, “Weekly Shoppers”, “Snacking and Lunch Box” and “High Spending Superstore Families”.

Identifying people from the products they buy and clustering on that basis is one way of working. But how about defining products in terms of attributes, and then profiling people based on those attributes?

Take each product, and attach to it a series of appropriate attributes, describing what that product implicitly represented to Tesco customers. Then by scoring those attributes for each customer based on their shopping behaviour, and building those scores into an aggregate measurement per individual, a series of clusters should appear that would create entirely new segments. (p139)

In the end, 20 attributes were chosen for each product (p142). Clustering people based on the attributes of the products they buy produces segments defined by their Shopping Habits. For these segments to be at their most useful, each customer should slot neatly into a single segment, each segment needs to be large enough to be viable for it to be acted on, as well as being distinctive and meaningful. Single person segments are too small to be exploited cost effectively (pp148-9).
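Again as a toy sketch (three invented attributes standing in for the real twenty), scoring a shopper on the attributes of what they buy might look something like this:

```python
# Hypothetical per-product attribute scores (the book used twenty
# attributes per product; three will do for illustration).
ATTRS = {
    "organic muesli": {"health": 0.9, "premium": 0.6, "convenience": 0.1},
    "frozen pizza":   {"health": 0.1, "premium": 0.2, "convenience": 0.9},
    "fresh salmon":   {"health": 0.8, "premium": 0.8, "convenience": 0.3},
}

def shopper_profile(purchases):
    """Aggregate the attribute scores of everything a shopper buys
    into one per-shopper vector; segments would then come from
    clustering these vectors."""
    totals, n = {}, 0
    for product, qty in purchases:
        for attr, score in ATTRS[product].items():
            totals[attr] = totals.get(attr, 0.0) + score * qty
        n += qty
    return {attr: round(total / n, 2) for attr, total in totals.items()}

profile = shopper_profile([("organic muesli", 2), ("fresh salmon", 1)])
```

Cluster those per-shopper vectors and out pop the Shopping Habits segments.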

Here are a few more insights that I vaguely seem to remember from the book, that you may or may not think are creepy and/or want to drop into conversation down the pub:-)

calorie count – on the food side, calorie sellers are the competition. We all need so many calories a day to live. If you do a calorie count on the goods in someone’s shopping basket, and you have an idea of the size of the household, you can find out whether someone is shopping elsewhere (you’re not buying enough calories to keep everyone fed) and maybe guess when a competitor has stolen some of your business or when someone has left home. (If lots of shoppers from a store stop buying pizza, maybe a new pizza delivery service has started up. If a particular family’s basket takes a 15% drop in calories, maybe someone has left home)?
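A quick invented sketch of the calorie-count trick (the thresholds are plucked from the air, apart from the 15% drop mentioned above):

```python
def calorie_flags(weekly_calories, household_need, drop=0.15):
    """Flag weeks where a basket's calorie count falls well short of
    the household's estimated need (shopping elsewhere?) or drops
    sharply week-on-week (someone left home?). The 0.8 'need' factor
    is invented; the 15% week-on-week drop is the rule of thumb."""
    flags = []
    for i, cals in enumerate(weekly_calories):
        if cals < household_need * 0.8:
            flags.append((i, "shopping elsewhere?"))
        if i > 0 and cals < weekly_calories[i - 1] * (1 - drop):
            flags.append((i, "someone left home?"))
    return flags

# A household needing ~14,000 kcal a week whose basket suddenly shrinks:
flags = calorie_flags([14000, 13800, 11000], household_need=14000)
```

Crude, but it shows how a till receipt becomes a household-level signal.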

life stage analysis – if you know the age, you can have a crack at the life stage. Pensioners probably don’t want to buy kids’ breakfast cereal, or nappies. This is about as crude as useful segmentation gets – but it’s easy to do…

Beer and nappies go together – young bloke has a baby, has to go shopping for the first time in his life, gets the nappies, sees the beers, knows he won’t be going anywhere for the next few months, and gets the tinnies in… (I think that was from this book!;-)

A little while ago, I posted some notes I’d made whilst reading “Scoring Points”, which looked at the way Tesco developed its Clubcard business and started using consumer data to improve a whole range of operational and marketing functions within the Tesco operation (The Tesco Data Business (Notes on “Scoring Points”)). For anyone who’s interested, here are a few more things I managed to dig up on Tesco’s data play, and their relationship with Dunnhumby, who operate the service.

[UPDATE – most of the images were removed from this post because I got a take down notice from Dunnhumby’s lawyers in the US…]

Firstly, here’s a couple of snippets from a presentation by Giles Pavey, Head of Analysis at dunnhumby, presented earlier this year. The first thing to grab me was this slide summarising how to turn data into insight, and then $$$s (the desired result being to shift customer behaviour from less to more profitable!):

In the previous post, I mentioned how Tesco segment shoppers according to their “lifestyle profile”. This is derived from the data a shopper generates – what they buy, when they buy it, and what stories you can tell about them as a result.

So how well does Tesco know you, for example?

(I assume Tesco knows Miss Jones drives to Tesco on a Saturday because she uses her Clubcard when topping up on fuel at the Tesco petrol station…).

Clustering shopped for items in an appropriate way lets Tesco identify the “Lifestyle DNA” of each shopper:

(If you self-categorise according to those meaningful sounding lifestyle categories, I wonder how well it would match the profile Tesco has allocated to you?!)

It’s quite interesting to see what other players in the area think is important, too. One way of doing this is to have a look around at who else is speaking at the trade events Giles Pavey turns up at. For example, earlier this year was a day of impressive looking talks at The Business Applications of Marketing Analytics.

Not sure what “Marketing Analytics” are? Maybe you need to become a Master of Marketing Analysis to find out?! Here’s what appears to be involved:

So what are “geodemographics” (or “geodems”, as they’re known in the trade;-)? No idea – but I’m guessing it’s the demographics of particular locales?

Here’s one of the reasons why Tesco are interested, anyway:

And finally (for now at least…) it seems that Tesco and dunnhumby may be looking for additional ways of using Clubcard data, in particular for targeted advertising:

Tesco is working with Dunnhumby, the marketing group behind Tesco Clubcard, to integrate highly targeted third-party advertising across Tesco.com when the company’s new-look site launches next year.
Jean-Pierre Van Lin, head of markets at Dunnhumby, explained to NMA that, once a Clubcard holder had logged in to the website, data from their previous spending could be used to select advertising of specific relevance to that user.
[Ref: Tesco.com to use Clubcard data to target third-party advertising (thanks, Ben:-)]

Now I’m guessing that this will represent a change in the way the data has been used to date – so I wonder, have Tesco ClubCard Terms and Conditions changed recently?

Looking at the global reach of dunnhumby, I wonder whether they’re building capacity for a global targeted ad service, via the back door?

Does it matter, anyway, if profiling data from our offline shopping habits are reconciled with our online presence?

In “Diving for Data”, (Supermarket News, 00395803, 9/26/2005, Vol. 53, Issue 39), Lucia Moses reports that the Tesco Clubcard in the UK “boasts 10 million households and captures 85% of weekly store sales”, along with 30% of UK food sales. The story in the US could soon be similar, where dunnhumby works with Kroger to analyse “6.5 million top shopper households”, (identified as the “slice of the total 42 million households that visit Kroger stores that drive more than 50% of sales”). With “Kroger claim[ing] that 40% of U.S. households hold one of its cards”, does dunnhumby’s “goal … to understand the customer better than anyone” rival Google in its potential for evil?!

I’ve been a fan of the potential of augmented reality for some time (see Introducing Augmented Reality – Blending Real and Digital Worlds for some examples why…) but there have so far always been a couple of major stumbling blocks in the way of actually playing with this stuff. One has been the need to download and install the AR application itself; the other has been to get a hard copy, or print out, of the registration images that are used as the base for the digital overlay.

So when I saw this demo of a browser based Flash Augmented Reality application (via TechCrunch), I realised that the application installation barrier could soon be about to crumble… (though there is still potentially a compute power issue – the image registration and tracking is computationally expensive, which means the Flash app is not yet as reliable as a compiled, downloaded application).

The issue of having to print out the registration image still remains, however.

[Cue sideways glance to camera, and TV presenter mode;-)] Or does it?

Because it struck me that I have a portable, programmable image service to hand – my iPod touch. So maybe I could just display the registration image on that, and show it to my laptop…

(A copy of the registration image is at http://is.gd/9ABh if you want to give it a go. The application code itself can be found at FLARToolkit.)

It also strikes me that maybe training the AR package on an image shown in an actual iPhone would be another way to go – making use of the iPhone/iPod Touch itself to help frame the image? (My iPod touch has a well defined black border around the edge of the screen after all…)