Thursday, June 7, 2007

Data literacy is the new R

Catching up on some old Economist issues, I came across an article, "Of bytes and briefs," in the May 19th issue. The article was about how electronic communications have raised new questions regarding discovery in the legal system (such as what must be turned over in response to a request to produce documents). The time required to comb e-data for proper disclosure is apparently becoming onerous (read: extremely expensive). The article also cites judges' need for better education about data, so that they can rule more soundly on proper discovery practices.

Yet another example of how data illiteracy becomes a societal problem. About a year ago, the "CSI effect" got a lot of press, with concerns that CSI led jurors to expect too much by way of evidence (though the jury is still out on whether CSI is the fundamental culprit). Much as I love CSI, I cringe whenever they show software packages with zooming and search capabilities beyond what is technically feasible. Privacy is, of course, another big issue here, with people not understanding data provenance or the power and risks of data-mining algorithms.

Computing has long felt like a "new R" that should join reading, 'riting, and 'rithmetic as a fundamental component of basic education. "Computing" is a broad term, though, and mentions of "computer literacy" raise the neck hairs of many a computer scientist, given the term's association with merely being able to use office productivity software. Those are useful skills, but not ones that meet the usual requirements for university-level credit.

That led several people to propose that 'rogramming was the appropriate "new R": as computation (rather than computers) became fundamental to so many fields (witness biology), understanding what could and could not be computed and automated grew increasingly important. I have a lot of sympathy for this view, and I strongly believe that a basic education in computation is essential for anyone working in science, digital media, or any other field whose practice is touched by computers. But I'd never push my parents to learn to program (and I hope my sister has finally forgiven me for convincing her to take a CS course her freshman year).

Data literacy, on the other hand, is a much better candidate. It touches everyone who uses modern societal infrastructure. It can be motivated concretely (via privacy) and tied to everyday human experience (e.g., for CSI viewers). It's timely, as the verb "to google" now comes up in casual conversation outside of tech circles. It has more substance than the vocational feel of learning to use office software. And, unlike programming, it doesn't require hours of practice building artifacts (which, much as I enjoy it, admittedly turns off many people).

_This_ is the computing-related material the masses need. Universities should develop and offer it; eventually it should migrate down to pre-college education. What would it take to give non-techies a basic education in data mining, privacy, data provenance, search, and information lifespan? Many of us could design a substantive course on this material that used programming. How to do it without programming, while still reaching the level of understanding that programming would enable, is a fascinating challenge.

What's the R, then? 'rovenance is the best I have so far. 'rivacy is both too narrow and too broad. Taking a European twist, 'rmatics gets the gist, but lacks verbal flow. Any suggestions, either for the name or for other topics that should make the list for a data literacy 'rriculum?