Data-themed articles, essays, and studies

In Indices We Trust

Search engines serve primarily to direct us quickly to information after we enter a few words into our phone or browser. Business and research systems would be useless without a similar facility to convert queries or selections into numbers or plots. By bulk, data system indices can be as large as – or even larger than – the data content itself.

These indices certainly do their job – they’re so good that we’re often directed down to a single piece of information, without even asking.

On the other hand: I have two books in front of me – one a work of fiction, one a work of history. The first has no index at all. The second has a (rather good) 17-page index out of about 700 pages – about 2.5% by bulk, with a granularity running from one page (about 250 words) to several pages (around 1,000 words).

Now, if we delivered a data system without an index, or one that forced users to spend a minute reading a page to find what they wanted, our user community would immediately rise up in unhappy protest. Indices, granular and fast, are expected.

But are indices, granular and fast, always sensible? Well, rather like search engines, indices are here to stay. Still, no one would index a typical story or newspaper article – we read it from the beginning to the end (for the story) or until we’ve learned enough (for the article). With many reference works, necessary context only comes from looking at several pages, at least. Granular indexing actually has a cost, and that cost is loss of context.

And so, an irony: what makes information systems so good can also make analytics more difficult. The standalone numbers, phrases, and dissociated bits of information naturally delivered by granular indexing can encourage us to see individual pieces, rather than the relevant whole.

Indices are here to stay – what we’re still learning to do really well is to supply relevant context where it’s needed. We’re pretty good at delivering parts and less good at delivering relevant wholes.

Supplying documentation in proximity to numbers (rather than as a dissociated product) is a great start – for people rarely read separate documentation, especially when – ahem – we have to read pages of text to find something. “Advanced” analytics also has a role to play in discerning context, and acts in some cases to restore the context fractured by our delivery of information in small pieces.

It’s fantastic when these tools can deliver a quick view to tell us what a number really means – the value is typical, or reliable, or unusual. That said, context isn’t always a matter of getting a quick view – sometimes what we need is a directive to look more deeply, or to examine the assumptions on which our answers rest. Those burdens are often placed on our users, who can understandably assume that what they see is all there is to get. Usually, it isn’t.
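
For the simpler case, the quick view, here is a minimal sketch in Python of what "typical or unusual" might look like in practice. Everything in it is an assumption made for illustration: the function name describe_in_context, the five-observation minimum, and the 3.5 cutoff are hypothetical choices, and the median-based score is just one common way to flag outliers, not a recommendation for any particular system.

```python
from statistics import median


def describe_in_context(value, history, cutoff=3.5, min_history=5):
    """Label a number as typical or unusual relative to earlier values.

    Hypothetical helper: the cutoff and the minimum history length are
    illustrative assumptions, not established standards.
    """
    if len(history) < min_history:
        # Without enough earlier values, the honest answer is "unknown".
        return "too little history to judge"

    centre = median(history)
    # Median absolute deviation: a spread measure that resists outliers.
    spread = median([abs(x - centre) for x in history])

    if spread == 0:
        # Every earlier value sits on the median; any departure stands out.
        return "typical" if value == centre else "unusual"

    # 0.6745 rescales the deviation so the score is comparable to a
    # z-score when the data are roughly normal.
    score = 0.6745 * abs(value - centre) / spread
    return "unusual" if score > cutoff else "typical"


weekly_orders = [100, 102, 98, 101, 99, 103, 97]
print(describe_in_context(101, weekly_orders))
print(describe_in_context(140, weekly_orders))
print(describe_in_context(140, [100, 102]))
```

A label like this travels with the number rather than living in separate documentation. It does nothing, of course, for the harder case: it cannot tell a user which assumptions sit behind the history itself.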