Last updated April 18, 2003. Previous version available here. Changes include correction of an embarrassing statistical error (helpfully pointed out by data God Gerry Wyckoff), some helpful text edits suggested by Nate Kurz, new research directions suggested by the two already named plus Berkman friends, and changes made to the to-do list based on projects already completed.

Summary
GAP is a project designed to document the attention media sources pay to the different nations of the world. GAP performs automated searches on media websites and calculates how many stories each website offers per million people in a nation. On three of the four sites currently monitored, GAP uses statistical regression to estimate how many stories we'd expect per nation and reports the variation from these estimates. The map above displays these variations on data from news.google.com on 4/9/2003. Countries in white are experiencing average attention (between half as many to twice as many stories as anticipated); countries in deep red are experiencing more than 4 times as many stories as anticipated, and countries in light red are experiencing 2x to 4x as much attention as predicted. Similarly, deep blue countries are experiencing 1/4 as many stories as anticipated or fewer, while light blue are experiencing 1/2 to 1/4 as many stories as predicted.

This research is in an extremely early stage, and I'm certainly not ready to publish it. I have serious concerns about methodology (most of which are documented on this page) and am hoping to get feedback and help from friends in addressing some of these methodology concerns before publication. That said, there are already some very interesting implications of the data: nations in South America, Central Asia and Africa appear to be systematically underrepresented, while nations in Western Europe and the Middle East seem to be systematically overrepresented.

Current data sets produced by GAP 'bots are available here. NB: Substantial changes were made to the country keyword lists on April 18th, so data sets created prior to that date and following that date do not map perfectly to one another.

Why?
Why is global attention important? I see three major reasons: trade, aid and intervention.

Trade - As trade becomes global, it becomes crucial for nations to be globally visible as possible trading partners. India's IT revolution has been a triumph of both education and marketing - not only have India's universities developed tremendous capacity for training top IT professionals, India has also "branded" Bangalore and Hyderabad as world-class IT centers. As a result, multinational corporations have felt comfortable outsourcing major IT projects to Indian firms, spurring a high-value industry. Some middle-income nations have been engaging in branding that is almost corporate, producing inserts for magazines like Newsweek International to promote their nations as products. GAP attempts to look at how successful different nations have been at "getting their brand out".

Aid - There is a small, finite amount of money contributed by individuals and governments to provide humanitarian aid in developing and conflict-ridden nations. This money has a tendency to go towards the conflict most visible at any particular moment - one might term this the "Live Aid" effect. Nations with less well-publicized needs tend to go wanting. After US intervention in Afghanistan, substantial commitments were made by organizations and governments to rebuilding that nation. At the time, many international aid groups expressed concern that other nations were also in need of assistance and that aid to Afghanistan - the popular conflict - might detract from aid to other nations. Now that Afghanistan is no longer as prominent in global media, it's becoming clear that some of these pledged funds will not arrive and Afghanistan, too, may find itself short on reconstruction funding, as those new funds head to Iraq.

Intervention - Individual nations and multilateral coalitions have a tendency to intervene in high-visibility conflicts and to ignore conflicts in less visible nations. As a number of activists have pointed out, a justification of US intervention in high-visibility Iraq on human rights grounds ignores the low-visibility, but severe, human rights violations occurring in Sudan. Global attention makes it more likely that the UN and other peacekeeping organizations will involve themselves in the prevention of genocides - global attention likely prevented many deaths in the Balkan conflicts, while lack of attention permitted the massacres in Rwanda in 1994 to occur without outside interference.

Here's another more personal set of reasons why I think global attention is important.

Methodology
At its most basic level, GAP is simply a set of web searches performed automatically. A simple 'bot program (many thanks to Chris Warren for this key bit of code) reads a list of search keywords and presents them to a search engine. It accepts the page returned by the engine, parses it to find the total number of results returned for that particular search and writes that number to memory. Once all searches on a particular engine are performed, the bot crunches some numbers and presents an HTML chart of results.
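To make the loop described above concrete, here is a minimal sketch in Python (the real bot is a Perl script, and this is not its code). The result-count pattern is purely an assumption: it supposes the results page contains a phrase like "of about 12,300", and every engine words this differently. The names parse_total_results, survey and fetch_page are made up for illustration.

```python
import re

# Hypothetical pattern: assumes the results page reports its total in a
# phrase such as "1 - 10 of about 12,300". Real engines differ.
COUNT_PATTERN = re.compile(r"of (?:about )?(\d[\d,]*)")


def parse_total_results(page_text):
    """Return the total hit count reported on a results page, or None."""
    match = COUNT_PATTERN.search(page_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def survey(keywords, fetch_page):
    """Run one search per country and collect the reported totals.

    `keywords` maps a country name to its query string. `fetch_page` is
    any callable that takes a query string and returns the page text, so
    the network code stays outside this sketch.
    """
    return {country: parse_total_results(fetch_page(query))
            for country, query in keywords.items()}
```

Passing the page fetcher in as a function keeps the parsing testable on saved pages without hitting a live search engine.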

Currently bots are surveying four sites:

news.google.com - Claims to index 4,500 news sources with a back catalog of 30 days.

www.altavista.com/news - Searching over a 30 day period. Unclear how many sources indexed, but Altavista news appears to be powered by Moreover, who claim 2,700 indexed sources for news.

query.nytimes.com - Indexes the last 30 days of news stories from the New York Times, and AP, Reuters and other wire stories included on the NYT website

search.cnn.com - Searching the cnn.com catalog, which appears to index stories as far back as 1996

A slightly different set of keywords is used for each engine, because each engine handles boolean search queries differently. CNN does not appear to handle them at all, and NYT handles them poorly - if more than two keywords are included on a "NOT", the NYT engine appears to handle them as an "AND". I'm aware that I'm ineptly querying Google - with my new copy of _Google Hacks_ in hand, I plan to revise the Google keywords in the next revision. The keywords I'm using are visible in the second column of the HTML results pages. A discussion of why the keywords are screwed up and how to improve them is in the "problems" section of this page.

Using population numbers from the 2002 CIA World Factbook, the bot calculates stories per million inhabitants of a nation and includes that in the output as well. In three of the four cases, it goes on to calculate a few more numbers - estimated hits, estimated stories per million and variation from the estimate. The source of these three numbers is a little complicated. In a perfect world, we would expect each nation to have the same number of stories per citizen - in other words, we'd expect Mexico, with 100 million citizens, to have ten times as many stories as Malawi at 10 million. (To nobody's surprise, Malawi gets a lot fewer than 10% of Mexico's stories...)

Graph stories per million against population and a pattern becomes quickly apparent. Big countries have a lot fewer stories per million than small countries. There's a pretty logical explanation for this. Assume for the moment that China's ratio of stories per million (observed at roughly 16 on AltaVista) holds true for the rest of the globe. We then anticipate 1.6 stories for tiny little Tonga. However, every nation, no matter how small, turns up multiple search matches on the large engines - nations tend to get press for appearing in world cup soccer qualifiers or attending UN summits. Even if Tonga loses every soccer match, it's still going to show up enough to skew statistics. The same problem appears in reverse if we attempt to use a Tongan ratio for the rest of the world - we suddenly discover China needs roughly 3 million stories to be proportional, outpacing everyone's story load by a factor of five.
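The arithmetic in the last two paragraphs is simple enough to sketch. The Python below is illustrative only: the function names are invented, and the population of 100,000 used for Tonga in the usage note is back-calculated from the 1.6 figure above rather than taken from the Factbook.

```python
def stories_per_million(stories, population):
    """Stories found for a nation, per million inhabitants."""
    return stories / (population / 1_000_000)


def proportional_estimate(rate_per_million, population):
    """Stories a nation would get if one per-million rate held everywhere."""
    return rate_per_million * (population / 1_000_000)
```

For example, proportional_estimate(16, 100_000) applies China's observed rate of roughly 16 stories per million to a nation of 100,000 people and gives about 1.6 stories, which is the Tonga problem in miniature.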

Starting from the assumption that the number of stories in a given nation was a function - probably a nonlinear one - of population, I graphed stories against population on different data sets. Four outliers quickly became obvious - Iraq, Kuwait, Qatar and Guam. The first three are receiving an unusual volume of media attention because of the conflict in Iraq; Guam comes up high because both AltaVista and Google index the Agana Pacific Daily News and KUAM-TV, both of which produce a high volume of Guam news stories in relation to Guam's modest population. Story counts for these four nations were more than two standard deviations away from the mean stories per million on my initial Google data sets, so I removed them from my correlation curves.

The curve that best fits the remaining 188 data points is of the following form: stories = m * population^n. Variants of this curve fit the Google, AltaVista and CNN data sets with correlations ranging between R=0.6963 and R=0.7080. What's especially interesting on those three data sets is that, while there's a great deal of variation in "m", there's almost none in "n". On April 11th, 2003:

Site         Value of M    Value of N    R
AltaVista    0.0194        0.6853        0.6963
CNN          0.0108        0.6718        0.7080
Google       0.0421        0.6793        0.7059

In other words, while these three different sites return different volumes of results (proportional to m), the distribution of these results in relationship to population fits similarly shaped curves (a function of x^n) for each data set. While it's not especially surprising that a single source's curves would not vary much over a series of days - after all, 29/30ths of Google and AltaVista's collections are unchanged from one day to the following - it is quite surprising that three different sites would show such similar power-law distributions. I was especially surprised to discover that CNN, which is indexing 7 years of content, revealed a near-identical curve to the search engines indexing 30 days' worth of content.

It has been harder to find a correlation for NY Times data because the story counts are much smaller, including numerous zero results. I'm using Excel to do curve fitting, and it refuses to fit power curves against data sets that include zeros. (If anyone feels like running these sets on Data Desk or Mathematica, please feel free and let me know what correlations you get...) Trying a NYT data set with zero points removed gives an R=0.64 correlation against a power curve where n=0.58, a significantly steeper curve than on the other three data sets. Until I have more confidence in the NYT numbers, I'm not using them to calculate estimated story numbers.

After m and n are calculated for the three large sources, they are plugged back into the bot, which then uses the resulting equation to predict how many stories each country should produce and how many it actually does. Each source has its own values of m and n. Currently, I'm calculating these values based on historical data - in the future, I'd like to be able to calculate them on the fly for each data set and look for variations from those predictions. (This would allow the numbers to remain accurate even if Google decided to run with a catalog half its usual size, for instance.) To do this, I'd need to be able to do non-linear regression within the Perl script, rather than in Excel - if anyone has good formulas to do so, please let me know.
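One way to estimate m and n inside a script rather than in a spreadsheet is to take logarithms: if stories = m * population^n, then log(stories) = log(m) + n * log(population), which is a straight line that ordinary least squares can fit. The sketch below is in Python rather than Perl, uses invented function names, and is a linearized fit, so its results may differ somewhat from a true nonlinear regression or from whatever Excel reports. It also skips zero counts, which is a choice, not a fix, for the NYT problem described above.

```python
import math


def fit_power_law(populations, stories):
    """Fit stories = m * population**n by least squares on log-log data.

    Returns (m, n, r), where r is the Pearson correlation of the logged
    values. Points with a zero or negative value are skipped because
    their logarithm is undefined.
    """
    pairs = [(math.log(p), math.log(s))
             for p, s in zip(populations, stories) if p > 0 and s > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive data points")
    count = len(pairs)
    mean_x = sum(x for x, _ in pairs) / count
    mean_y = sum(y for _, y in pairs) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("data has no spread; cannot fit a curve")
    n = sxy / sxx                            # slope of the log-log line
    m = math.exp(mean_y - n * mean_x)        # intercept, back on the raw scale
    r = sxy / math.sqrt(sxx * syy)
    return m, n, r


def expected_stories(population, m, n):
    """Story count the fitted curve predicts for a given population."""
    return m * population ** n
```

Running fit_power_law on each day's counts (after dropping the outlier nations) would give a fresh m and n per data set, which could then be fed straight into expected_stories.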

Finally, the script calculates the variation between how many stories were expected (as a function of population) and how many were actually found by the bot. Results are color-coded and displayed on the table. Countries that received more than four times as many results as anticipated by the equation are colored deep red. Countries that received two to four times as many as anticipated are coded in light red. Less than a quarter as many stories as expected and the color is deep blue; half to a quarter of anticipated stories is light blue. Remaining nations are colored in white or beige to distinguish them from untracked nations, which will be colored grey. Currently, this data is mapped on world maps by hand, a painstaking and time-consuming process. I'm currently writing a Perl script that will place appropriately colored dots, proportional in size to a nation's population, on the map.
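The color rules above amount to bucketing the ratio of actual to expected stories. Here is a minimal Python sketch. The function name is invented, and how the exact boundary values (a ratio of precisely 2, 4, 1/2 or 1/4) are assigned is my assumption, since the text does not pin them down.

```python
def attention_color(actual, expected):
    """Map the ratio of actual to expected stories to a map color.

    Assumes expected > 0. Boundary handling is an assumption: exactly
    4x counts as light red and exactly 1/4 counts as light blue.
    """
    ratio = actual / expected
    if ratio > 4:
        return "deep red"
    if ratio >= 2:
        return "light red"
    if ratio < 0.25:
        return "deep blue"
    if ratio <= 0.5:
        return "light blue"
    return "white"
```

A nation with 300 stories against an estimate of 100 would come out light red; one with 10 against 100 would come out deep blue.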

Problems and concerns
I've got lots of them. Here are some of the highlights:

Keywords - It's very difficult to use the same keywords for every nation. While most nations give believable responses from using the common name for the nation as a quoted phrase, there are some notable exceptions.

Chad, Georgia - A search for "Georgia" will give you very little information about the Caucasus and a great deal about sports teams in the southern US. A search for "Chad" gets you lots of guys with that nickname and very few stories about the Sahara. In both cases, I'm using the capital of the nation as the search term, knowing it's skewing my results low.

Guinea - Guinea, Guinea Bissau, Papua New Guinea and Equatorial Guinea are all nations. There's also a Gulf of Guinea, a disease called Guinea Worm, not to mention guinea pigs and guinea hens. An accurate search for the nation of Guinea would require a search string that looks something like "Guinea NOT "Guinea Bissau" NOT "Papua New Guinea" NOT "Equatorial Guinea" NOT etc." Some engines will handle that, others won't. Two fixes were recently made: "Niger" became "Niger NOT "Niger Delta" NOT "Nigeria"", and Democratic Republic of Congo became "Congo NOT Brazzaville". Unfortunately, since CNN does not support even simple Boolean "NOT"s, it's searching for far less exact keywords.

Guam, Australia, Canada - As discussed above, Guam gets a huge amount of coverage because two news sources that primarily cover Guam are indexed by Google. Similarly, Canada and Australia have a large number of media sources listed in major engines, and tend to get heavy coverage. Bug? Feature? Given that this reflects the reality of the current indexing situation, I'm removing Guam from correlation studies, but otherwise including these results.

American Underrepresentation - The USA appears to be pretty well represented in the charts produced by GAP. I suspect it's actually underrepresented by a factor of ten or more. That's because most stories set in the USA don't involve the phrase "United States" - instead, they've got the state or city name where the story is taking place. One could obtain a great deal more precision by doing a search for all US states and major cities. Of course, if I did this, I'd need to do the same thing for Canada, Britain and other large, media-rich nations. Instead of opening that can of worms, I'm thinking about expanding US to "UNITED STATES or US or USA".

Actually, I took a fairly different tack on this. The more I thought about my "United States" results, the more I thought they were grossly inaccurate. As of April 18th, the US is not part of the data set. If I can find a solution to this problem that satisfies me, it will be re-entered into the set. In the meantime, I think the data's cleaner without a severely inaccurate estimate.
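The exclusion-style search strings described under Keywords above can be assembled mechanically from a country name and a list of terms to rule out. This Python sketch is hypothetical: the function name and the supports_not flag are invented for illustration, and real engines each have their own Boolean syntax that would need checking.

```python
def build_query(name, exclusions=(), supports_not=True):
    """Build a search string such as: Guinea NOT "Guinea Bissau".

    Multi-word terms are wrapped in double quotes. When the engine does
    not support NOT (as reported for CNN), the exclusions are dropped
    and the plain name is returned.
    """
    def quote(term):
        return '"%s"' % term if " " in term else term

    query = quote(name)
    if supports_not:
        for term in exclusions:
            query += " NOT " + quote(term)
    return query
```

Keeping one exclusion list per country and generating the per-engine strings from it would make it easier to see exactly where the engines end up searching on different terms.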

Is that a nation?
For a number of the "nations" in the data set, the question of that nation's existence is a political one. I began this project with a data set derived from the CIA World Factbook. I trimmed the data to include only entries with a population of greater than 100,000. The existence of some strange entities - West Bank and Gaza Strip as separate areas - is the result of the CIA's decision, which I've not yet been moved to change. Mayotte is a territory of France, but is also claimed by Comoros. At the moment, I'm listing it as an independent entity - should it be incorporated into France? I don't know. Western Sahara claims to be independent, but is also claimed by Morocco. It's hard to know where to draw and not draw these lines.

All PR is good PR
GAP doesn't attempt to make any distinction between "good" and "bad" PR. I believe this is a feature, not a bug, but I'm anticipating possible criticism here. There are two reasons for not making this distinction. One is that it's very difficult for humans to make this distinction and near impossible for an AI system to do so. Even if I had the skills - or time! - to code a text analysis system that could attempt to classify search results as "good" or "bad", it would not be able to do so with a high degree of confidence. Furthermore, I'm not convinced this distinction needs to be made. Countries that are seeking international aid or intervention would benefit from "bad" PR. I think it would be a highly interesting study to look at perceptions of a nation based on search engine queries, but that's way beyond the scope of this study.

Bad Bots
Google explicitly prohibits automated querying of its engine. In fairness, Google tries to soften the blow of this prohibition by offering an API. Unfortunately, said API only allows queries of the main database, not the special news collection. I suspect, were I to search through the TOS on the other three sites I'm searching, I'd discover that this sort of research is frowned upon. Chris and I have modified the bots to make them fairly polite - they sleep after each request and try not to overwhelm servers. Still, I've had some unusually repeatable errors on Google and I'm beginning to wonder whether my bot will be able to continue searching news.google.com for an extended period of time. I certainly plan on building an API-compliant bot once the interface makes searching of news.google.com possible, and would happily do so for the other sites should the APIs be made available. In the meantime, though, I'll just worry about bots being blocked.

I had good conversations with Chris Warren and Ben Edelman about this. Ben, the king of bad bots, managed to convince me that, while Google may have an argument for blocking commercial bots, it's hard to justify a prohibition against bots that research Google itself. This, combined with his argument that the Google API may not return data in exactly the same way the search page returns it to a user, has me feeling significantly less bad about my 191 daily "illicit" queries. I've temporarily taken this off my mental list of worries.
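The "polite" behavior mentioned above, sleeping after each request, looks roughly like this in Python. It is a sketch only: the five-second default is an arbitrary assumption, not the delay the actual bots use, and the function and parameter names are invented.

```python
import time


def polite_fetch_all(queries, fetch_page, delay_seconds=5.0, sleep=time.sleep):
    """Fetch one page per query, pausing after every request.

    `queries` maps a label (for example a country name) to a query
    string. `fetch_page` is any callable that returns page text for a
    query. `sleep` can be swapped out so the pause is testable.
    """
    pages = {}
    for label, query in queries.items():
        pages[label] = fetch_page(query)
        sleep(delay_seconds)  # give the server a rest before the next hit
    return pages
```

At 191 queries a day, even a generous pause between requests keeps a full run well within a few minutes to an hour.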

Dirty Data
I picked the worst possible week to start this project, from a statistical point of view. Global media obsession with the war in Iraq and the spread of SARS have caused a small number of nations to dominate international media. I've tried to counter the most egregious influences by pulling four countries out of my correlation data sets, but I still may be dealing with a radically distorted media picture. That said - I don't think this is that huge a problem. The CNN data, which draws from a much longer time period, and hence is less susceptible to Iraq and SARS distortions, fits a very similar curve to the curves Google and AltaVista are fitting, suggesting to me that the stories per population distribution may be fairly constant over time. We'll know more after we get a "normal" month of data sometime in the future.

M, N and how much correlation is enough?
I'm not a statistician, but I know enough to know that R=0.7 is a good, but not overwhelming positive correlation. Were I concerned purely with demonstrating that web stories are a nonlinear, continuous function of population, I'd be more obsessed with that figure. Instead, I've got a trickier statistical problem. If R=1.0, that would demonstrate perfect correlation between population and web stories... which we know not to be true. (For a simple example of this, take a look at Japan and Nigeria, which have almost identical populations. Japan generally gets 3-4x as many stories as Nigeria across search sites.) In other words, there is no possible "perfect" model for media distribution, because media distribution in the real world is imperfect. Instead, we need to model a curve that we know will have substantial deviation. I feel reasonably confident that I'm taking the right approach to this, but would be very, very grateful for outside opinions, especially those of my friends in the hard sciences.

And, indeed, I got very good feedback from Professor Wyckoff, who helpfully pointed out that I was an idiot, and reading an R^2 value for my correlation, instead of the R correlation I should have been reading. This was awfully good news - instead of seeing correlations in the neighborhood of .5, I'm actually seeing them in the R=0.7 range for stories/population and even higher for the stories/GDP figures. That said, Gerry gave me a whole pile of other things to worry about. It's quite possible that my data may not fit a normal distribution. In that case, the Pearson (R) correlations will need to be taken with a grain of salt, as they are imperfect without reasonably normal data. Two things I'm currently trying to figure out - just how normal is my data? And, if the data is profoundly non-normal, are there non-parametric techniques I can use to get similar results with this data set? If you're sufficiently statistically literate to understand those two questions and feel like pointing me in the right direction, help is always appreciated.
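For reference, the R versus R^2 mix-up is just a squaring: R=0.7 corresponds to R^2 of about 0.49, which is where the ".5" readings came from. On the non-parametric question, one standard candidate is Spearman's rank correlation, which is simply the Pearson correlation computed on ranks instead of raw values and so does not assume normally distributed data. The Python below is a sketch with invented function names. It is offered as one possible direction, not as the answer Gerry had in mind.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))
```

Because it only looks at ordering, spearman_rho gives a perfect score to any relationship that always rises together, curved or not, which makes it a useful cross-check on the Pearson figures.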

As I should have predicted, a number of folks have shared my discomfort with the notion of the "perfect distribution" curves. Nate, in particular, made a convincing case that, as there's no "natural law" that would predict a smooth distribution, it's absurd to talk about deviation from this. I'm responding to this argument two ways. One is to begin referring to these functions as "estimators". In other words, they're not statements of how media should be distributed, but estimators that show differences from scenarios where media is distributed as a function of population or GDP. Second, I'm looking at a new set of directions in the research that rely on comparisons between different data sources, rather than comparisons to an estimator - for instance, how does the New York Times differ from a set of twenty daily international newspapers?

Slow data change
Search engines change slowly - most attempt to grow, rather than replace their current catalogs. I chose news.google.com as my first target because, if they are to be believed, their collection turns over every 30 days, allowing us to see different data sets on a monthly basis. We won't have that opportunity with CNN or several other possible targets. And searches that look at a search engine's entire catalog rather than a subset are unlikely to change very much on a daily basis. Again, a bug, or just a reflection of reality? I don't know yet.

Conclusions and correlations
Or, "so what does this tell us, anyway?" Well, it's early, and I haven't tried very hard to cross-correlate data yet. Still, here are a couple of observations I've made so far:

People talk about you if you're rich
Looking at the results of Google, AltaVista and CNN.com searches on April 13, 2003, the top twenty countries, in terms of GDP per capita, are pretty well represented in the media. In fact, Google colors 2 white (between half and twice expected results), 13 light red (two to four times expected results) and five deep red (more than four times expected results). AltaVista colors one (Austria) white, 11 light red and eight deep red. CNN also colors Austria white, 12 light red and 7 deep red. (Don't cry for Austria - they still average 60% more stories than most nations their size.) Just as a side note, this subset includes enormous nations like the US, Japan and Germany, as well as tiny ones like Luxembourg and Iceland.

People don't talk about you if you're poor. Unless you're being shot at. Sometimes.
Looking at the same data set, the results for the twenty poorest nations in terms of GDP per capita are radically different. On all three sets, the majority of countries are colored light blue (half to a quarter of stories anticipated) or dark blue (less than one quarter of stories anticipated.) Afghanistan is deep red on all three sets, and the Gaza Strip is light red on CNN, dark red on the other two. The distribution is slightly less stark than the top 20. Google has 2 deep red, 5 white, 3 light red, and 10 deep blue. AltaVista has 2 deep red, 4 white, 5 light blue and 9 deep blue. CNN has 1 deep red, 1 light red, 7 white, 4 light blue and 7 deep blue.

Conflict doesn't guarantee attention, however. Of those bottom 20, 9 have had major conflicts in the past 5 years. While Gaza and Afghanistan have received substantial attention, Rwanda and Eritrea have received only average attention. Burundi, Democratic Republic of Congo and Sierra Leone received less than average attention, and Guinea-Bissau and Ethiopia have received far less than average attention.

Violence doesn't necessarily lead to attention
The good folks at the Department of Peace and Conflict Research at Uppsala University in Sweden publish a fascinating report on conflict in the world from 1946-2001. The report attempts to document every conflict in the world, civil or international, small or large, during that time period. According to the report, 34 nations have been involved in "intermediate" conflicts or "war" level conflicts between 1998 and 2001, the end of the study. ("Intermediate" conflicts are ones where more than 1000 people die in total, but not in a single year. Wars have more than 1000 deaths in a single year. "Minor" conflicts cause fewer than 1000 deaths.) Nations make the list if they've hosted a conflict - the US makes it for 9/11 and the UK for the "troubles" in Northern Ireland. The list also includes perpetual hotspots like Israel and the nascent Palestinian states, Afghanistan and the Indian/Pakistani border.

It also includes conflicts in a lot of places people can't place on a map, including Uganda, Sudan and Burundi. As it turns out, conflict doesn't look like a strong correlator to attention. Of the 34 nations, 7 are colored dark red (on average, from our three engines on 4/13), 3 light red, 9 white, 11 light blue and 4 dark blue. In pure numerical terms, it looks like conflict areas may be slightly below average in terms of media attention. I hope to take a closer look at more recent data, and also to attempt to correlate attention to casualties. If anyone has a good source of current conflicts and estimated casualties, I'd be very grateful for the reference.

GDP and attention
There appears to be a correlation between total GDP and attention that's much stronger than the correlation between GDP per capita and stories. Graphing a nation's total GDP in purchasing power parity dollars versus stories on Google on 4/13 gives us the graph that follows below. (Iraq, Kuwait, Qatar and Guam are not in this set for reasons mentioned previously. Western Sahara is eliminated because I lack consistent GDP information.) Correlation is R=0.816, suggesting a fairly strong correlation. Of course, discovering that a nation's wealth correlates to the attention we pay to it surprises no one. Still, it's nice to have hard data to support that cynical presupposition.

An obvious next step - which only became obvious during my multihour drive to Harvard yesterday - is to start tracking story counts and their deviation from a GDP curve. I'm going to try to hack something tonight to get that data into the results file by this weekend.

Many of the best suggestions I've received have come from this data set. Gerry has made the suggestion that I look at a couple dozen possible variables and use discriminant function analysis to see which best predict story counts. Towards that end, I'm starting to collect World Bank and other data that might correlate. (And trying to learn enough about DFA to actually do some of this analysis.) A number of cool possible correlators have been suggested - number of Internet hosts, literacy, distance from the US, language, religion, race, imports/exports, foreign direct investment, tourism...

Next steps

Correct obvious keywording errors and make strings more Google-boolean compliant

Find a better keyword sense for the US and other nations where I anticipate underrepresentation

Build a graphics program that plots variation on a global map for each data set, each day, automatically. (I've started on this, but PerlMagick is kicking my butt and I need to stop coding and work on my other projects for a few weeks.)

Track each data set on a week by week basis to see how m and n change.

Build bots to poll new sources - any suggestions? Suitable candidates return honest total numbers of stories for a query, not rounded or capped totals.

Rewrite GoogleBot to be API compliant - on hold until Google opens news.google.com to the API

Cross-correlate to other data sets. I'd like to look at regional correlations, correlation to oil production, correlation to deaths in conflict, etc. Any suggestions for other sets to correlate to?

I'm open to suggestions for other next steps, as well as any and all questions about methodology, source data, etc. Please email me at ethan@geekcorps.org. Should there be sufficient interest, I'd be happy to start a mail list for discussions about next steps on the project - let me know if you'd be interested in commenting or participating on an ongoing basis.

Thanks to the great feedback I've gotten thus far, here are some projects I'm likely to work on in the near future:

Creating an "influential media" index. If we're going to look at "attention", it's probably worth distinguishing between more and less influential media sources. Rather than attempt to classify each of Google's 4500 sources, I'm looking at identifying a set of 10-20 most influential media properties. This set would attempt to be global and would include the BBC World Service, CNN, the London Times, the Washington Post, etc. A bot (possibly a multithreaded bot that writes to a database, if I get fancy) would poll these sources daily and chart their deviation from population and GDP estimators, as well as from each other. Once we've accumulated sufficient data from the influential bot, it would be interesting to research correlation between attention in influential media with foreign direct investment or foreign aid.

Multiple factor correlation. Is GDP the best possible correlation to story volume? Is there some combination of factors that correlates better than any one factor? I'm working on developing the data sets and statistical chops to be able to take this project on.

Local versus international media. Many of the comments I've gotten have to do with curiosity over local versus international media. This has led me to think about two subsequent studies. One would look at several newspapers and attempt to calculate what percentage of their stories are local, regional, national or international. There's a strong suspicion that media outlets in large nations are more nationally focused than those in small nations and that the US may be significantly more locally focused than other nations. Another thought looks at the relationship between distance, death and prominence of news. Looking at the front page of a newspaper for an extended period of time, we'd look at stories that involved the (non-natural) death of people. We'd track how many deaths were reported and where they occurred in relation to the paper's location. We'd attempt to see if there was a formula that predicted whether a death made the front page, based on how many people died and their distance from the media outlet. Suspicion is that deaths on other continents would need to be orders of magnitude larger than local deaths to make the front page.

AP Wire analysis. Andrew McLaughlin and Diane Cabell, both of Berkman, each independently suggested this cool experiment. Track the AP news wire for a year to determine how many stories were available on a given nation. Then look at a specific media outlet to see which ones it chose or chose not to run. Do newspapers run very little news on Africa because little is available from wire services? Or do they deprioritize available stories due to perceived lack of reader interest?

The Media Coefficient. One of the most interesting statistics in development economics is the Gini coefficient. It measures the difference between the actual distribution of wealth in a country and the theoretical, perfectly equal distribution. The result is a number between 0 (perfect equality) and 1 (perfect inequality). Using the same technique, we could compute media inequality by comparing the actual story count of a nation to a "perfect" media distribution, where everyone in the world gets equal attention from the media. (Jonathan Zittrain points out that the analogy between media and money may well break down. While a world where everyone had the same amount of money might be a very nice place, a world where everyone were equally famous might be very strange. Or might not...) Alternatively, one could calculate differences between an individual outlet's curve and a mean curve, like the proposed Influential Media Index.
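As a rough illustration of how such a media coefficient might be computed, the Python sketch below builds a Lorenz curve of story share against population share and returns one minus twice the area under it, which is the usual Gini construction. The function name is invented, and the sketch makes a simplifying assumption: everyone inside a nation is treated as receiving that nation's average attention, so it measures inequality between nations only.

```python
def media_gini(populations, stories):
    """Population-weighted Gini coefficient of story counts.

    Nations are sorted from least to most stories per person, then the
    Lorenz curve (cumulative story share against cumulative population
    share) is integrated with trapezoids. Assumes every population is
    positive and that at least one story exists in total.
    """
    rows = sorted(zip(populations, stories), key=lambda row: row[1] / row[0])
    total_population = sum(p for p, _ in rows)
    total_stories = sum(s for _, s in rows)
    cumulative_population = 0.0
    cumulative_stories = 0.0
    area = 0.0
    for population, count in rows:
        next_population = cumulative_population + population / total_population
        next_stories = cumulative_stories + count / total_stories
        width = next_population - cumulative_population
        area += width * (cumulative_stories + next_stories) / 2
        cumulative_population = next_population
        cumulative_stories = next_stories
    return 1 - 2 * area
```

If every nation had the same number of stories per person the result would be 0. The more the stories pile up on a small share of the world's population, the closer it gets to 1.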

Thanks
Many thanks to Chris Warren for coding the original bot, to the Berkman Center for hosting these pages (and bots!), to Rachel and everyone else who's patiently listened to me babble about this project over the past couple of weeks. Many thanks to Gerry Wyckoff, Nate Kurz, Andrew McLaughlin, Diane Cabell, and Jonathan Zittrain for their helpful input and direction.