XARK 3.0

Xark began as a group blog in June 2005 but continues today as founder Dan Conover's primary blog-home. Posts by longtime Xark authors Janet Edens and John Sloop may also appear alongside Dan's here from time to time, depending on whatever.

This isn't some "customize this page" button that lets you fiddle with fonts and colors. Instead, Google News' Dutch beta is the first little step toward a media future that is fast approaching all of us.

Here's the revolutionary concept: Rather than ask you for your preferences, this new class of tools figures out your preferences without your conscious input. This is going to be a Future Shock moment for many of us, because our unconscious preferences are likely to be far more accurate in delivering us the stuff we really want.

All right -- there's a fair chance that Google knows what I want better than I know myself, given my chaotic, incoherent way of searching for news. But if that's the case, do I want to know they know me so well?

Allow me to rephrase: Yes, we want to know that they know us so well. The real questions are, What control do we want to exercise over that ability? What protections do we require from unauthorized "knowing?" How can we use these powerful and beneficial 21st century tools without being victimized by them?

Mega-media and Discovery Informatics

One of the reasons that media is in such chaotic disarray in the early 21st century stems directly from our inability to grasp how rapidly the flow of information increased in just the past 12 years. In the early 1990s, researchers struggled to find the data they needed. Today, researchers struggle to filter out the glut of unneeded data. It's as if we moved from the Sahara to the Amazon but forgot to update our wardrobe.

Consequently, debates about Old Media versus New Media are rife with
romantic notions about the ability of educated, alert media consumers
to separate good information from bad. Don't bet on it. The volume of
news-media content and blogosphere commentary already exceeds the
capability of unassisted human intelligence, and news is just one slice
of the larger information pie. When I first wrote about this subject in
February 2005, our global civilization was producing 20 terabytes of
information -- the equivalent of all the books in our Library of
Congress -- every single day. Chew on that for a minute -- and
know that in that minute, more information than you will ever read in
your life was just created and communicated.

Putting that information to work for humanity requires tools that
scale to the size of the problem. And that's what this new Google
project represents: the application of non-human intelligence to a
human environment that is evolving at a super-human pace. But just as
human tools created this chaotic information glut, so too can human
tools bring order to it.

The science behind these new tools is so new it didn't even have a
name (Discovery Informatics) until 2003. To understand its basic
concepts, set aside your ideas about computers as number-crunching
tools that help us answer questions. Discovery Informatics applies
non-human intelligence to find better questions.
Scientists and cops were the first to require such pattern-seeking
tools, but it was already obvious in early 2005 that it was only a
matter of time before the rest of society would need them, too. As I
wrote back then:

The ability to find digital
needles in data haystacks is a nifty
trick for a variety of government interests: intelligence,
counterintelligence, law enforcement, etc. (Jim Young, director of the
Discovery Informatics program at the College of Charleston) points out
that the
National Security Agency, which specializes in eavesdropping on
international communications, has become one of the best job markets
for new statisticians.

And guess who else is snapping up graduates these days? Google.

From today's perspective, the search-engine giant's interest in DI
graduates looks obvious. First Google brought us the world. Now Google
will attempt to make sense of it for us.

What Google says

Here's Google's Dutch introduction page. I don't speak Dutch, and the robotic translation is predictably clunky and constrained by Internet-adapted language (does "bladwijzers" mean "booklet indicators" or "bookmarks"? Does "labels en opmerkingen" mean "labels and observations," or is this the Dutch way of saying "tags and categorizations?"), but what follows is my translation of the robotic translation, warts and all:

* We provide the search results that are most relevant for you. Google personally arranges search results on the basis of your previous searches. In the beginning you will not see much difference, but the more you use Google, the more your search results will improve.

* Your searches will be managed to reflect the web pages, images, headlines and Froogle-results on which you have clicked in previous searches. You can remove items at each desired moment from your search history.

* You can make online bookmarks for your favorite Internet sites and share tags and observations that you can use everywhere. You can search later in your tags and notes, and access these bookmarks from any computer by logging on.

What it means

This may strike some of us as weird, but we already have machines that learn. Consider the US Postal Service's problem: How do you teach a computer to recognize the handwritten number 9? Answer: Show the computer a few million examples. Postal Service software has been doing this for years, and the result is illuminating: its handwriting recognition system now exceeds human abilities to decipher messy scrawl.
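The learn-by-example idea can be sketched in a few lines. This is a toy illustration, not the Postal Service's actual system: each "digit" here is an invented 5x3 bitmap, and a messy scrawl is classified by finding the closest labeled example.

```python
# Toy nearest-example digit recognition. The bitmaps and labels are
# invented for illustration; real systems train on millions of scans.

TRAINING = {
    "9": [
        (1,1,1, 1,0,1, 1,1,1, 0,0,1, 0,0,1),
        (1,1,1, 1,0,1, 1,1,1, 0,0,1, 1,1,1),
    ],
    "4": [
        (1,0,1, 1,0,1, 1,1,1, 0,0,1, 0,0,1),
    ],
    "7": [
        (1,1,1, 0,0,1, 0,0,1, 0,0,1, 0,0,1),
    ],
}

def hamming(a, b):
    """Count the pixels where two bitmaps disagree."""
    return sum(x != y for x, y in zip(a, b))

def classify(scrawl):
    """Return the label of the closest training example."""
    best_label, best_dist = None, float("inf")
    for label, examples in TRAINING.items():
        for ex in examples:
            d = hamming(scrawl, ex)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# A "messy" 9: one pixel flipped from a clean training example.
messy_nine = (1,1,1, 1,0,1, 1,1,1, 0,0,1, 0,1,1)
print(classify(messy_nine))  # → 9
```

The more examples you pour in, the better the nearest match gets -- which is the whole trick: nobody writes a rule that says what a 9 looks like.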

What Google is proposing in Holland sounds an awful lot like
something that I discussed with Jim Young -- the Discovery Informatics
visionary at our local college -- over coffee late last year. I was
doing recon for my current assignment -- helping my executive editor
chart the future of online news -- and Jim was there representing his unique program.

I wanted to build high-powered information tools for Charleston.net
users, but Jim insisted that I grasp the larger concept: Why not build
tools that anticipated a user's needs by searching for deep
patterns in an individual's site usage? The technology is available, he
said, and the algorithms that could be adapted to such a project are
already performing Non-Obvious Relationship Analysis for the government or rearranging grocery displays at your local Harris Teeter.

We brainstormed several products and features, including an "on-off"
switch and a detailed history editor for protecting private information
-- an absolutely essential control. There are all sorts of benefits to
having a computer that anticipates my needs, but without human feedback
on what it learns about me, there's a likelihood that it will someday
behave like an overly precocious child. I want a computer that finds the best discussions of the latest Paul Graham essay for me, but I most definitely don't want a chipper cyber-agent that blurts out things like "I've found some great sub-continent group-sex porn you'll really like!" while my kids are sitting in the room.

If I've translated the Dutch properly, the Google News personal
agent beta sounds an awful lot like the dream prototype that Jim and I
sketched out that day at Kudu Coffee,
but with an added feature: the ability to store results, preferences,
favorites and tags in an online account that would be accessible via
your unique log-in. Also, rather than anticipating all kinds of user
needs, this product seems to focus on re-sorting the results of news
searches to better target an individual's demonstrated interests. We're
not witnessing the birth of Skynet here.
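The "re-sorting by demonstrated interest" described above is mechanically simple. Here's a minimal sketch of what it might look like under the hood -- assumed mechanics with invented data, not Google's actual algorithm:

```python
# Re-rank search results so topics the user has clicked before rise to
# the top, keeping the original relevance order as a tiebreaker.
# The results, topics, and click history below are all hypothetical.

from collections import Counter

def rerank(results, click_history):
    """Sort results by how often the user clicked each result's topic."""
    clicks = Counter(click_history)
    # sorted() is stable, so equal click counts keep relevance order.
    return sorted(results, key=lambda r: -clicks[r["topic"]])

results = [
    {"headline": "Markets rally on jobs report", "topic": "business"},
    {"headline": "Shrimp harvest forecast improves", "topic": "science"},
    {"headline": "Council debates zoning change", "topic": "local"},
]

# This user has clicked science stories most often in past searches.
history = ["science", "local", "science", "science", "business"]

for r in rerank(results, history):
    print(r["headline"])
```

Note that the user never filled out a preferences form; the ranking shifts purely because of what they clicked -- which is exactly the "unconscious preferences" point above.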

Yet we shouldn't underestimate the historic significance of this moment. In the same way that cars expanded our ability to travel and Lotus 1-2-3 extended our ability to manage business information and e-mail expanded our ability to communicate and blogs expanded our ability to publish, so too will these tools expand our ability to find the things that we want: stories, products, people, relationships, pets -- you name it.
Within 10 years, you'll have these cyber agents performing all sorts of
tasks on your behalf, and it's a safe bet that many of them will have
"Google" in their name.

Will it be a good trade? Well, that depends on us. Can we avoid
passionate arguments about romantic media ideas that have been obsolete
for five years? Can we address the conundrums of electronic privacy in
ways that account for technology? Can we write laws that allow
individuals (and groups) to make the most of the information resources
at hand while protecting our civil rights in the process?

Who knows? But the discussion is no longer just abstract talk in a quiet coffee house.

AUTHOR'S POSTSCRIPT: Here's my column, Digging for Truth in the Data Age, which was published on Aug. 30, 2004, in The Post and Courier. The column is not available online.

It's time to write a fond epitaph for the Information Age. Like it or not, we've entered the Data Age, the era in which we recognize that a glut of information doesn't make us smart, any more than buying a dictionary makes us Shakespeare.

Scientists Fred Holland and Paul Sandifer use the term in their work at
the Hollings Marine Lab on James Island. They live in a world saturated
by data — more pieces of information than the logical human mind could ever order, arrange or imagine.

Want to understand what is happening to the Lowcountry shrimp harvest?
Have at it. Thanks to modern technology, we have easy access to
everything from satellite images to historical weather logs to
digitized shrimp gene sequences.

That's what the Information Age was supposed to do: give us the scattered puzzle pieces that fit together to form The Big Picture.

But here's a more appropriate analogy: Information Age
technologies have proven instead to be wildly efficient at burying us
in the pieces from millions of jigsaw puzzles, all mixed up and
practically indistinguishable.

This data
surplus is most obvious in the world of science. Holland, the director
of the Hollings Marine Lab, frames things this way: "So we have all
this data. The challenge is, how do we add value to it and make sense of it?"

A dramatic example of this process comes from University of South
Carolina physics professor Dave Tedeschi. In 2003, Tedeschi and a group
of colleagues announced that they had found evidence confirming the
discovery of an exotic new subatomic particle — not in a lab somewhere,
but hiding in old data they just happened to have lying around.

It's not like the physicists were slack the first time around: The data from their particle accelerator experiments is measured in terabytes -- a terabyte is a million million bytes of computer information. You need a machine to recognize a pattern against that much background noise -- unless you're very, very intuitive.

And at least the scientists are professionally equipped to deal with the challenges of the Data Age. The rest of us are struggling.

Example: One explanation for the increasingly harsh tone this election
year is the accelerating fragmentation of political media, a potential
blessing but an enormous test of society's ability to process
conflicting data.
Hate President Bush? Google can provide in seconds any number of Web
sites that will provide you with facts to support that feeling. Hate
anybody who criticizes Bush? Ditto. Just turn on the radio.

Without functional institutions equipped to integrate the complex data of 21st-century life, citizens typically wind up just picking sides. Raw data becomes a cultural Rorschach test, and what we see is generally what we expected to find in the first place.

So we're not just disagreeing — we're speaking in different languages.

The promise of the Data Age is that the truth really is in there, somewhere. But our age
has a curse, too: apophenia, the tendency to see patterns that may or
may not exist. As science-fiction visionary William Gibson wrote in his
blog earlier this year: "Want to see the Virgin Mary on a tortilla?
Look long enough."

The model of the Information Age
was the computer network, but the new model looks a lot more like an
old analog radio dial, searching for a signal in a vast sea of static.
The future belongs to those who prove most adept at finding it.

Comments


Dan,

I found this article thoughtful and exciting. However, I need a little clarification: how does this technology differ from, say, Amazon's very strange ability to predict what I want to buy -- or what I might like even though I don't know it exists (and they are surprisingly right) -- based on my purchases cross-referenced against similar consumers? Is this a different filtering system? (This is not a criticism; I'm genuinely interested.)

Very similar. But suggesting something like related titles (Amazon) or movie preferences (Netflix) is operating in a very limited environment -- a few thousand options with relatively obvious correlations.

What this suggests is that by tracking your online choices, software can essentially intuit deeper patterns and offer non-obvious suggestions. A hypothetical: If I'm searching for used car deals and commenting regularly on global climate blogs, maybe there's a non-obvious correlation between thriftiness and environmental concerns and some third interest. Like Amazon and Netflix, such a system could give me suggestions and learn from my responses.
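The Amazon-style version of this is easy to caricature in code. A toy sketch with invented interests (real recommenders use far richer signals than simple co-occurrence counts):

```python
# "People interested in X were also interested in Y": count how often
# pairs of interests appear together across users, then recommend the
# strongest co-occurring interest. All data here is hypothetical.

from collections import Counter
from itertools import combinations

USERS = [
    {"used cars", "climate blogs", "home solar"},
    {"used cars", "climate blogs"},
    {"used cars", "coupon sites"},
    {"climate blogs", "home solar"},
]

def cooccurrence(users):
    """Count how often each pair of interests appears together."""
    pairs = Counter()
    for interests in users:
        for a, b in combinations(sorted(interests), 2):
            pairs[(a, b)] += 1
    return pairs

def recommend(interest, users):
    """Suggest the interest that most often co-occurs with this one."""
    scores = Counter()
    for (a, b), n in cooccurrence(users).items():
        if a == interest:
            scores[b] += n
        elif b == interest:
            scores[a] += n
    return scores.most_common(1)[0][0] if scores else None

print(recommend("used cars", USERS))
```

The non-obvious part comes when the paired "items" aren't book titles but click paths, search phrases, and reading times -- the same counting trick, applied to much subtler signals.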

Even something as simple as sorting Google News search results based on individual click patterns is potentially playing with far more subtle relationships than Amazon's recommendation system, which is still revolutionary in its own right. Because if it's reading my click paths in response to its ranked results, it's learning things about me I may not even know.

Another thing: Such systems can make highly accurate predictions without identifying the values and variables that produce the accuracy. Bob Chapman has a neural network setup that accurately forecasts shrimp harvests when you pour a bunch of data into the front end. Bob didn't write the program to weight certain variables above others: He wrote the program so that the software can "learn" from previous results.
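That "learn from previous results" loop can be shown at miniature scale. The sketch below uses a single linear unit rather than a real neural network, and the data is synthetic -- nothing like Chapman's actual shrimp model -- but it makes the point: nobody tells the program which variable matters most; it finds the weights by correcting its own errors.

```python
# A single linear unit that learns its own weights from prediction
# errors. Inputs, targets, and the "rainfall/temperature" story are
# all invented for illustration.

def train(samples, epochs=2000, lr=0.01):
    """samples: list of (inputs, target). Returns learned weights, bias."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            # Nudge every weight against its share of the error.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Pretend harvest = 2*rainfall + 1*temperature (the program isn't told).
data = [((1, 2), 4), ((2, 1), 5), ((3, 3), 9), ((0, 1), 1)]
w, b = train(data)
pred = w[0] * 2 + w[1] * 2 + b  # forecast for rainfall=2, temperature=2
print(round(pred, 1))  # should land close to 6.0
```

After training, the weights sit near 2 and 1 -- values the programmer never supplied, which is the sense in which the software "learned" them.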

This is absolutely fascinating. I remember my high school biology teacher warning us about the speed at which information would double; back then I couldn't see how it could possibly be a problem. Now, all I have to do is need a specific fact and I have to wade through piles of sources.

On a side note, I met a very interesting man from NJ, the other day. While pumping gas he told me about plans for a plant that will "help" our local shrimping industry. It was quite possibly the most fascinating conversation I've had with a stranger in years.

This is genomics...revisited. The bioinformatics field that has been essential to genomics -- actually, to all of the -omics (proteomics, metagenomics, metabonomics) -- allows us to make sense of all of the sequence or peptide data we generate. The amount of data is mind-boggling: one microbial metagenomic library could generate a full genome's worth of sequence data for 30 microorganisms, and that's one sample. In science, we're already fully immersed in the data age -- and let me add, it's pretty fun. But you have to let yourself take a breath and come up for air from time to time. There's no limit; we can fish for anything now.

One evolving problem: a single person or lab can generate so much data that they can't process it all. Making that data accessible to the public speeds up science, but then limits opportunities for the lab that generated the data to begin with. Science needs to change with respect to how we are measured -- it won't only be "I got 'x' number of publications last year" but "I got 'x' number of publications last year, which generated 'x' number of publications from other laboratories." There's reluctance -- it's not so much that scientists don't want to share; they just don't have confidence that the intrinsic value of the data they obtained -- data for data's sake -- will be recognized.