In this post, I’ll slightly change gears and present some thoughts on a more research-like use of this data. First, an introduction to what drove this thinking.

“Why do we need to provide navigation to communities? There’s nothing going on in them anyway!”

A few years back, as we were considering some changes in the navigational architecture on our intranet, I heard the above statement and it made me scratch my head. What did this person mean – there is nothing going on in communities? There sure seemed to be a lot of activity that I could see!

A quick bit of background: Though I have not discussed much about our community program outside of the mailing lists, every community had other resources that they utilized – one of the most common being a site on our intranet. On top of that, at the time of the discussion mentioned above, communities actually had a top spot in the global navigation on our intranet – which provided the typical menu-style navigation to top resources employees needed. One of the top-level menus was labeled “communities” and, as sub-menu items, it included a subset of the most strategic / active communities. This was a very nice and direct way to guide employees to these sites (and through them to the other resources available to community members, like the mailing lists I’ve discussed).

Back to the discussion at hand – As we were revisiting the navigational architecture, one of the inputs was usage of the various destinations that made up the global navigation. We have a good web analytics solution in place on our intranet (the same we use on our public site) so we had some good insight on usage and I could not argue the point – the intranet sites for the communities simply did not get much traffic.

As I considered this, a thought occurred to me – what we were missing is that we had two distinct ways of viewing “usage” or “activity” (web site usage and mailing list membership / activity) and we were unable to merge them. An immediate question occurred to me – what if, instead of a mailing list tool, we used an online forum tool of some sort (say, phpBB or something similar)? Wouldn’t that merge together these two factors? The act of posting to a forum or reading forums immediately becomes different web-based activities that we could measure, right?

Given the history of mailing list usage within the company, I was not ready to seriously propose that kind of change, but I did set out to try to answer the question – Can we somehow compare mailing list activity to web site usage to be able to merge together this data?

The rest of this post will discuss how I went about this and present some of the details behind what I found.

The Basic Components

The starting point for my thinking was that the rough analogy to make between web sites and mailing lists is that a single post to a mailing list could be thought of as equivalent to a web page. The argument I would make is that (of course, depending on the software used), for a visitor to read a single post using an online forum tool, they would have to visit the page displaying that post. So our first component is

Pc = the number of posts during a given time period for a community

In reality, many tools will combine together a thread into a single page (or, at least, fewer than one page per comment). If you make an assumption that within a community, there’s likely an average number of posts per thread, we could define a constant representing that ratio. So, define:

Rc = the ratio of posts per thread within a community for a given time period

Note that while I did not discuss it in the context of the review of activity metrics, it’s possible with the activity data we are gathering to identify threads, and so we can compute Rc:

Tc = total threads within a community for a given time period

Rc = Pc / Tc
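As a minimal sketch of these first three components, assuming each post record carries a thread identifier (the Post record shape and the sample values here are hypothetical, not the actual activity data):

```python
from collections import namedtuple

# Hypothetical record shape; the real activity data may look different.
Post = namedtuple("Post", ["thread_id", "author"])

def thread_ratio(posts):
    """Return (Pc, Tc, Rc) for one community's posts in one time period."""
    p_c = len(posts)                               # Pc: number of posts
    t_c = len({post.thread_id for post in posts})  # Tc: number of distinct threads
    r_c = p_c / t_c if t_c else 0.0                # Rc = Pc / Tc
    return p_c, t_c, r_c

# Made-up sample: six posts spread over three threads.
sample = [
    Post("t1", "ann"), Post("t1", "bob"), Post("t1", "ann"),
    Post("t2", "cy"), Post("t2", "ann"), Post("t3", "bob"),
]
print(thread_ratio(sample))  # (6, 3, 2.0)
```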

Now, how do we make an estimate of how many page views members would generate if they visited the forum instead of having posts show up in their mailbox? The first (rough, and quite poor) guess would be that every member would read every post. This is not realistic, and getting an accurate answer would likely require some analysis directly with community members. That being said, I think, within a constant factor, the number of readers can be approximated by the number of active members within the community (it’s true that any active member can be assumed to have read at least some of the posts – their own). A couple more definitions, then:

Mc = the number of members of a community at a given time

Ac = the number of active members within a community for a given time period

In addition to assuming that active members represent a high percentage of readers, I wanted to reflect the readership (which is likely lower) among non-active members (AKA “lurkers”). We know the number of lurkers for a given time period is:

Lc = the number of lurkers within a community over a given time period = (Mc – Ac)

So we can define a factor representing the readership of these lurkers

PRc = the percent of lurkers who would read posts during a given time period (PR means “passive reader”)

Can we approximate PRc for a community from data we are already capturing? At the (fuzzy) level of this argument, I would think that the ratio of active to total members is probably echoed within the lurker community, so it can serve as an estimate of the share of lurkers who will read any given post in detail:

PRc ~= Ac / Mc
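To make the reader estimate concrete, here is a small sketch of the quantity Ac + PRc * Lc using the approximation above. The function name and the sample numbers are made up for illustration:

```python
def estimated_readers(members, active):
    """Estimate readers per post as Ac + PRc * Lc, with PRc ~= Ac / Mc."""
    if members <= 0:
        return 0.0
    lurkers = members - active        # Lc = Mc - Ac
    passive_rate = active / members   # PRc ~= Ac / Mc (an assumption, not measured)
    return active + passive_rate * lurkers

# Made-up community: 200 members, 50 of them active in the period.
print(estimated_readers(members=200, active=50))  # 87.5
```

With these numbers, a quarter of the 150 lurkers are assumed to read, adding 37.5 passive readers to the 50 active ones.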

The Formula

So, with the basic components defined above, the formula that I have worked out for computing a proxy for web site traffic from mailing lists becomes:

Uc = the “usage” of a community as reflected through its mailing list

= Pc * (Ac + PRc * Lc) / Rc

= Pc * (Ac + Ac / Mc * Lc) / Rc

= Pc * (Ac + Ac / Mc * (Mc – Ac)) / Rc

= (2 * Pc * Ac – Pc * Ac^2 / Mc) / (Pc / Tc)

= (2 * Ac * Tc – Ac^2 * Tc / Mc)

So with that, we have a formula which can help us relate mailing list activity to web site usage (up to some perhaps over-reaching simplifications, I’ll admit!). All of these factors are measurable from the data we are collecting and so I’ll provide a couple of sample charts in the next section.
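The derivation can be checked numerically. The sketch below computes Uc both from the first line of the formula and from the simplified last line; the function names and the sample quarter are hypothetical:

```python
def usage_long(posts, threads, members, active):
    """Uc = Pc * (Ac + PRc * Lc) / Rc, with PRc ~= Ac / Mc and Rc = Pc / Tc."""
    ratio = posts / threads           # Rc
    lurkers = members - active        # Lc
    passive_rate = active / members   # PRc
    return posts * (active + passive_rate * lurkers) / ratio

def usage(threads, members, active):
    """Simplified form: Uc = 2 * Ac * Tc - Ac^2 * Tc / Mc."""
    return 2 * active * threads - active ** 2 * threads / members

# Made-up quarter: 120 posts in 40 threads, 200 members, 50 of them active.
print(usage_long(120, 40, 200, 50))  # 3500.0
print(usage(40, 200, 50))            # 3500.0
```

Note that Pc cancels out of the simplified form: once posts are folded into threads, only the thread count, the membership and the number of active members remain.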

Some Samples

Here are a few samples of measuring this “usage” over a series of quarters in various communities.

As you will see in the samples, this metric shows a wide variance in values between communities, but relative stability of values within a community.

[Chart: Small Community Usage Metric]

The first sample shows data for a small community. As before, I have obfuscated the data a bit, but you can see a big jump early in the lifecycle and then an extended period of low-level usage. The spike represents the formal “launch” of the community, when a first communication went out to potential members and many people joined. The drop-off to low-level usage shown here represents, I believe, a challenge for the community to address in order to become more vital (of course, it could also be that other ways of observing “usage” of the community might show that it actually is very vital).

The second sample shows data for a large, stable community – you’ll note that the computed value for “usage” is significantly higher here than in the above sample (in the range of around 30,000-40,000, as opposed to the 500-1,000 range that the small community stabilized around).

[Chart: Large Community]

How does this relate to the title of this post?

Well, after putting the above together, I realized that if you ignore the Rc factor (which converts the measurement of these “member-posts” into a figure purportedly comparable to web page views), you get a number that represents how much of an impact the flow of content through a mailing list has on its members – indirectly, a measure of how much information or knowledge could be passing through a community’s members.

The end result calculation would look something like:

Kc = the knowledge flow within a community for a given period

= (2 * Pc * Ac – Pc * Ac^2 / Mc)

This concept depends on making the (giant) leap that the “knowledge content” of a post is equivalent across all posts, which is obviously not true. For the intellectual argument, though, one could introduce a factor that could be measured for each post and replace Pc (which has the effect of treating the knowledge content of every post as “1”) with the sum of those per-post scores across a community (where each post is scored from 0 to 1 on a scale representing that post’s “knowledge content”).

I have not done that analysis, however (it would be a very subjective and manually intensive task!), and, within an approximation that’s probably no less accurate than all of the assumptions above (said with appropriate tongue-in-cheek), one could argue that multiplying Kc by a constant factor (representing the average knowledge content of a post in the community) has the same effect.

Further, if you use this calculation primarily to compare a community with itself over time, you will likely find that the constant factor does not change over time and you can simply remove it from the calculation (again, with the qualifier that you can then only compare a community to itself!), and you are left with the above definition of Kc.
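A rough sketch of Kc and of the per-post scoring idea follows. The scores are invented for illustration (I have no real scoring data); the uniform score of 0.5 simply stands in for the “constant factor” argument:

```python
def knowledge_flow(posts, members, active):
    """Kc = 2 * Pc * Ac - Pc * Ac^2 / Mc (every post counts as 1)."""
    return 2 * posts * active - posts * active ** 2 / members

def weighted_knowledge_flow(scores, members, active):
    """Same shape, with Pc replaced by the sum of per-post scores in [0, 1]."""
    return knowledge_flow(sum(scores), members, active)

# Made-up quarter: 120 posts, 200 members, 50 of them active.
k_c = knowledge_flow(120, 200, 50)
# Hypothetical scoring where every post is rated 0.5.
k_uniform = weighted_knowledge_flow([0.5] * 120, 200, 50)
print(k_c, k_uniform)  # 10500.0 5250.0
```

With a uniform score, the weighted value is just that score times Kc, which is why the factor drops out when a community is only compared with itself.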

Validating this Analysis

So far, I’ve provided a fairly complicated description of this compound metric and a couple of sample charts that show this metric for a couple of sample communities. Some obvious questions you might be asking:

What’s the value in this metric? Is it actionable?

How valid is this metric in the sense of really reflecting “usage” (much less any sense of “knowledge flow”)?

To be honest, so far, I have not been very successful in answering these questions. In terms of being actionable – this data might lend itself to the types of actions you take based on web analytics. However, there is not an obvious (to me) analog to the conversion that is a fundamental component of web analytics. It seems more likely an after-the-fact measure of what happened instead of a forward-looking tool that can help a community manager or community leader focus the community.

In terms of validity, I’m not sure how to go about measuring whether this metric is “valid”. Some ideas that come to my mind, at least as points of comparison, include:

Comparing this metric to the actual usage of a community’s web site (via our web analytics tool); do they correlate in some way?

Comparing this compound metric to the simpler metric of posts to the community’s mailing lists – how do these compare and why does (or does not) this compound metric provide any better insight?

Taking a different approach to this formula – I think understanding how this metric changes as you hold some parts constant and change others would help understand what it “means”.

For example, if membership and posts remain the same, but the # of different posters changes, what happens?

If posts and active members remain the same but total membership changes, what happens?
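These two “what happens” questions can be explored directly with the simplified formula. The sketch below is only an illustration with made-up numbers, and it assumes that holding posts constant also holds the thread count Tc constant:

```python
def usage(threads, members, active):
    """Uc = 2 * Ac * Tc - Ac^2 * Tc / Mc."""
    return 2 * active * threads - active ** 2 * threads / members

# Hold threads (40) and membership (200) constant; vary active members.
by_active = {a: usage(40, 200, a) for a in (10, 50, 100, 200)}
print(by_active)   # {10: 780.0, 50: 3500.0, 100: 6000.0, 200: 8000.0}

# Hold threads (40) and active members (50) constant; vary membership.
by_members = {m: usage(40, m, 50) for m in (50, 100, 200, 1000)}
print(by_members)  # {50: 2000.0, 100: 3000.0, 200: 3500.0, 1000: 3900.0}
```

Under this formula, usage rises with the number of active members and reaches Mc * Tc when every member is active. With active members and threads fixed, it rises with membership but never exceeds 2 * Ac * Tc, so adding lurkers has a diminishing effect.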

I’d be very happy to hear from someone who might have some thoughts on how to validate this metric or (perhaps even better) poke holes in what its failings are.

Summing Up

Whew! If you’re still with me, you are a brave or stubborn soul! A few thoughts on all of this to summarize:

I do believe that this type of analysis could be useful to understand the flow through a community over time; I think it needs significantly more research to get to a better formula, though the outline above could be a starting point;

I have not been able to really validate the ideas expressed here in any way except intuitively, so take with an appropriate grain of salt;

I think this type of analysis could also be applied in a variety of other contexts – use of a community Wiki, use of a community blog, attendance at “physical space” meetings, attending virtual knowledge share events, use of community workspaces, etc.; I have not tried this, yet, though;

With that last comment in mind, I believe that a key idea here is that this type of compound metric provides an avenue to combine the measurement of knowledge sharing across all of a community’s avenues – raising the possibility of providing something like a “Dow Jones Index” for a community’s knowledge sharing – perhaps collapsing down to a single, measurable quantity that you can track over time.

And, yes, I do recognize that such a metric is, at best, on shaky ground and likely not really supportable. I raise this idea because I was once asked to generate a single “knowledge sharing index” that would cover the corporation and this type of analysis could lead in that direction. (For the record, when faced with that question, we resisted spending time