John W. Tukey said something about how valuable it is to think about the world while pawing through a set of data: that’s the essence of “exploratory data analysis.” In real life, the most fruitful time we spend is when we are mulling over what a set of data might mean. “Concluding” and “confirming” get more press, but they are a lot less fun and may be much less useful. Back when I was a real data geek at the University of Colorado, I remember getting quite bored once all the wrinkles of a dataset were worked out, but being completely engaged and focused while we were still exploring how to retrieve and interpret the data.

So I regard Marc Smith’s Twitter diagrams with awe, some envy, and some skepticism. Each one seems like a tour de force, but they always leave me wanting more. NodeXL makes collecting Twitter data so easy, but I always walk away wondering what it is that I’ve seen. It seems to me that no single view of a set of data is more interesting than all the others: what’s interesting (and useful) is when we can look at it from different angles, such as:

If I look at the data from one point of view, I see something slightly different from what I see from another (e.g., the two graphs in this posting).

I chatted with Alice MacGillivray (@4km), who made this observation: “One pattern I noticed is that ‘outsiders’ tended to be people who were tracking the OCE tag AND were already connected to at least one of us. Not surprising but raises challenges for offsite engagement.” That’s a reminder that part of our intention was to engage people who were not at the face-to-face event and to demonstrate how that can happen.

I think that data about communities and social interaction is even richer in possible meanings, so we should always resist closing in on “this is what it means.” We need to draw more stories out of our vast treasure troves of data, and more statements such as:

That reminds me that what I was really trying to do, or should have tried to do, at that event was X.

And the outlier-person or outlier-topic that’s most interesting for future exploration is X.

The next time I’m involved with that group, I’m going to do X or say X.

When I showed it to X, she said X, which is really interesting, but I can’t repeat it here.

From a community leader's perspective, a good lead or a follow-up conversation HAS to be more valuable than any participation data that we might use to justify ourselves or our work! (It's a lot more interesting, in any case.)

From an analytical perspective, having an idea of "bad leads" or "dead-end hypotheses" seems just as valuable. I would love to hear an expert (like Marc Smith) muse and mumble as he looks at a dataset like this. (In a follow-up email, he provided a link to a slide deck on future directions for NodeXL: http://www.slideshare.net/Marc_A_Smith/20120622-w… .)