FLOCKS, HERDS, & STORIES

Web Science 2011

Flocks, Herds, and Stories

temporal coherence and the long tail

Mark Bernstein

Eastgate Systems, Inc.

I had intended here to offer some
Explanations and apologies
But there's no time. Prepare to shed some tears
And hang on tight.

“There are many ways in which we could ruin the Web.”

Wendy Hall (Hypertext 2011)

Last week at Hypertext, Dame Wendy Hall
Reminded us of what we might forget:
The Web is large and new, it flourishes,
It seems to go from strength to strength, and yet
We do not know how strong it really is.
We must remember that we still could wreck the web.

The Death Of Surfing

“Web-surfing is dead. Sure, users may check out a few new sites every now and then...most users will probably spend the majority of their time with a small number of websites that meet their requirements.”

Jakob Nielsen (Jan 1996)

A cheerful Jakob Nielsen once forecast
That the web's early froth would soon subside
And leave us with a few large sites that would
Provide the stuff that common readers want,
Leaving the failed and unsuccessful sites
Unfrequented, unvisited, to wither on the vine.

Integrating the long tail

The long tail remains viable if its integral is large.

If the integral becomes small, larger interests and governments will eventually discard it.

The key is that the long tail remain long.
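One rough way to read "its integral is large" is as the share of all visits that lands outside the few busiest sites. The sketch below computes that share. The function name `tail_share`, the cutoff `head_size`, and the visit counts are all hypothetical and chosen only for illustration; the talk does not say how the integral should be measured.

```python
def tail_share(traffic, head_size):
    """Fraction of all visits that go to sites outside the `head_size` busiest."""
    ranked = sorted(traffic, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[head_size:]) / total

# Hypothetical daily visits for eight sites (illustrative numbers only).
visits = [400, 200, 100, 100, 50, 50, 50, 50]
print(tail_share(visits, 2))  # 400 of the 1000 visits fall outside the top two
```

If this share shrinks toward zero, the tail has stopped mattering in aggregate, however many sites it still contains.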

This has not happened yet, and the long tail
Still seems to flourish. Blogging, to be sure
Is not quite what it was, but Twitter is,
And Facebook seems to make a lot for Zynga
And someday might for us. Besides, we still
Have lots of blogs and lots of other sites,
Folkloric or caloric, scholarly or fun.
They're doing fine for now, it seems, and so
What should we fear?

As We May Have Thought?

One lesson of the short 20th century: crowds are not always smart.

Lesson of 2011: the net does not route around power.

I want to argue here
That beyond familiar hazards lies one more:
The tricky danger that mere traffic noise –
Seemingly harmless, familiar, not a threat –
Presents when it confronts the zero lower bound
Without a viable recovery scheme.

Web traffic is noisy

We all know that Web traffic is very noisy, especially for low traffic sites.

Trained to be a chemist, I was taught
To first look for signal and, that found,
To always check the noise. For if no noise appears
It's likely that your signal is unsound.
Finding some noise, you ought give some thought
To measuring its size from trough to spike.
We all know well that server loads will veer
From high to low, and back, from day to day.

It’s always noisy

Traffic fluctuates all the time.

Sometimes we think we know why.

From hour to hour, alike from week to week.
We can explain it, just as the news
Can always tell us what the market thinks:
"Stocks moved down today on fears of fresh
Inflation. Tech stocks gained on un-
Employment news." But my experience is
There's always something happening, and the noise
Is never really easy to explain.

What we’d expect

Poisson Distribution

Mean = n

Std. Deviation = √n

We do expect some noise because you can't
Have half a visit. Readers are discrete
Like cars upon the road. At the high-traffic bound
This doesn't matter much. But at the low
Each choice turns out to matter that much more.

Poisson

Speed: 2
Spacing: 0
Prob.: 0.05

Poisson first studied this, and the key thing
To know is the expected variance
Is just the mean; so if the mean is N
The variance is N as well, and so:
Look at our logs. We observe -- especially
In the tail -- a lot more noise than this.
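The Poisson baseline can be checked numerically. The sketch below draws daily visit counts with Knuth's multiplication method and compares the sample variance with the mean. The means of 4 and 100 visits a day, the seed, and the helper names are assumptions made for illustration, not figures from the talk.

```python
import math
import random
from statistics import mean, pvariance

def poisson_sample(rng, lam):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def relative_noise(n):
    """Expected standard deviation as a fraction of the mean: sqrt(n) / n."""
    return 1 / math.sqrt(n)

rng = random.Random(2011)  # fixed seed so the run is repeatable
for lam in (4, 100):       # hypothetical low-traffic and busier sites
    days = [poisson_sample(rng, lam) for _ in range(5000)]
    print(lam, round(mean(days), 2), round(pvariance(days), 2), relative_noise(lam))
```

In each row the variance should land near the mean, while the relative noise falls from one half at 4 visits a day to one tenth at 100. A log that is much noisier than this baseline points to readers who are not acting independently.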

Not Poisson

Speed: 2
Spacing: 10
Prob.: 0.05

Poisson assumes that no one interacts.
But if we interact the noise may change.
If our cars hit the brake in traffic, we
First find that clumps and traffic jams
Increase the noise.

The noise can go down, too

Speed: 2
Spacing: 10
Prob.: 0.5

The noise can go down, too,
As here, where traffic is so dense
That by the time they reach our sampling zone
The cars have all assumed a common speed and spacing.
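A simple way to say whether counts are noisier or quieter than Poisson is the variance-to-mean ratio, which is about 1 for independent arrivals. The sketch below applies it to two made-up series of cars counted per interval. The function name and both series are hypothetical; they are not output from the traffic model on the slides.

```python
from statistics import mean, pvariance

def dispersion_index(counts):
    """Variance-to-mean ratio of per-interval counts (about 1 for Poisson arrivals)."""
    return pvariance(counts) / mean(counts)

jammed = [0, 0, 12, 0, 0, 12, 0, 0]   # hypothetical: cars arrive in clumps
steady = [3, 3, 3, 3, 3, 3, 3, 3]     # hypothetical: dense, evenly spaced traffic
print(dispersion_index(jammed))  # well above 1: noisier than Poisson
print(dispersion_index(steady))  # zero: quieter than Poisson
```

Both series average three cars per interval, so the difference in the ratio comes entirely from how the cars interact.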

Independent Browsing

Count: 100
Follow: 0
Avoid: 4
Stop: 1000

The same result is found for better models.
Here independent browsers move through space
Indifferent to what other browsers do.

Flocks Browse Together

Count: 100
Follow: 16
Avoid: 4
Stop: 1000

But here, instead, these readers flock together,
Following their whim unless they see a friend
Nearby, but clinging to their closest friend
If any wanders past.
We may distinguish here
The HERD, in which a pundit does decide
Where everyone should go, from what I here
Propose, a simple FLOCK, where no one is in charge
Yet nonetheless these organized behaviors do emerge.

Temporal correlation boosts the noise as when
A classroom full of students visits you today
Because a visit to your website was assigned.
They won't be back today. A year from now
The next year's class may visit you again.
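The classroom effect can be sketched with a toy model in which readers decide to visit in groups rather than one by one. Everything here is an assumption for illustration: 100 readers, a 5% daily chance of visiting, and groups of 20 that always move together. It is a much cruder model than the flocking simulation on the slides.

```python
import random
from statistics import mean, pvariance

def daily_visits(rng, readers, group_size, p_visit, days):
    """Visits per day when readers arrive in groups that decide together."""
    groups = readers // group_size
    return [
        group_size * sum(rng.random() < p_visit for _ in range(groups))
        for _ in range(days)
    ]

rng = random.Random(2011)  # fixed seed so the run is repeatable
alone = daily_visits(rng, readers=100, group_size=1, p_visit=0.05, days=2000)
flock = daily_visits(rng, readers=100, group_size=20, p_visit=0.05, days=2000)
print(mean(alone), pvariance(alone))
print(mean(flock), pvariance(flock))
```

Both runs have the same expected traffic, about five visits a day. In this toy model the variance grows in proportion to the group size, since fewer independent actors are making the decisions.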

Narrative Drive

Storytelling and narrative

Unfolding events

We want to know what happens next

Some of you may know that for quite a few years
I've worked as a publisher of hypertext fiction.
We once were the darling of postmodern critics,
And later the – something – despised by their rivals.
I mention this story not just for your sympathy
(Though that's always welcome) but rather because
I want to distinguish the high modern fiction
We publish at Eastgate from broader concerns
For narrative that I've expressed in the paper.

Again, correlated behavior leads to aggregate browsing

People like stories, we all want to know
What happens next. We'll tune in tomorrow
To learn how things went, to hear of our friends –
Even our friends whom we don't really know,
Even our friends who don't really exist:
Especially our friends who may not be real.

Fewer Actors: More Noise

Herds

Flocks

Narratives

We visit tomorrow to see how things went,
Perhaps we might mention the case to some friends,
Or write a short note in our weblog about it.
Either way, herd or flock, stories focus the web.

A Test-Tube Blogosphere

Each site has n links to other sites

Each writer visits sites picked from their list of links

Depth first browsing to depth D

Moderate probability P of adding link to newly found site

Small (or zero) probability R of adding link to a site we notice in referrer logs

A simple test-tube blogosphere
Will quickly illustrate
The dangers our sites face when they
Confront the lower bound.
To start, we have some sites.
Each has N outbound links
To other sites they like or use
For regular updates.
Each day, each writer chooses
A few links to pursue
And Markovly they follow up
To see what might be new.
Sometimes, a site discovered
Is added to its list.
And we might sometimes take a look
At sites that link to us.
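The model described above can be sketched in a few dozen lines. This is a minimal reconstruction under stated assumptions, not the simulation used for the talk. It assumes that each browsing session is a random walk of `depth` hops, that a newly added link overwrites a randomly chosen old one so every list keeps its fixed size, and that a site checks its referrer log once a day. All function and parameter names are hypothetical.

```python
import random

def make_web(rng, count, links_per_site):
    """Give every site a list of distinct outbound links to other sites."""
    return [rng.sample([t for t in range(count) if t != s], links_per_site)
            for s in range(count)]

def swap_in(rng, links, site, target):
    """Link `site` to `target`, overwriting one old link (list size is fixed)."""
    if target != site and target not in links[site]:
        links[site][rng.randrange(len(links[site]))] = target

def one_day(rng, links, reading_list, depth, p_add, p_log):
    """Every writer browses; returns the day's visit count for each site."""
    count = len(links)
    visits = [0] * count
    referrers = [set() for _ in range(count)]
    for writer in range(count):
        for _ in range(reading_list):
            here = writer
            for _ in range(depth):
                there = rng.choice(links[here])
                if there != writer:
                    visits[there] += 1
                    referrers[there].add(here)  # the page the click came from
                    if rng.random() < p_add:
                        swap_in(rng, links, writer, there)
                here = there
    for site in range(count):
        # Assumption: each site checks its referrer log once a day.
        if referrers[site] and rng.random() < p_log:
            swap_in(rng, links, site, rng.choice(sorted(referrers[site])))
    return visits

def linked_sites(links):
    """Number of sites that at least one other site still links to."""
    return len({t for outbound in links for t in outbound})

rng = random.Random(2011)  # fixed seed so the run is repeatable
web = make_web(rng, count=50, links_per_site=3)
history = []
for day in range(200):
    one_day(rng, web, reading_list=4, depth=2, p_add=0.1, p_log=0.0)
    history.append(linked_sites(web))
print(history[0], history[-1])
```

With `p_log=0.0`, a site that loses its last inbound link can never be reached again, so the count in `history` can only fall or hold steady. Setting `p_log` above zero lets a forgotten site return whenever its writer's own browsing shows up in someone else's log.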

Few Links, no referrer logs

Count: 50
Links per site: 3
Reading list: 4
Logs: 0

When links are sparse and logs ignored, it's true
That nearly all the traffic goes to some successful sites.
The others publish links to what they read
– just like the rest;
The sites that still have traffic are in red, the rest are blue.

The zero lower bound

No site starts with an advantage

Fluctuations make some sites easier to discover

The traffic-rich grow richer

Once traffic falls enough, you're out of the game

A site that has no links no longer can be found,
And so, quite soon what once was our long tail
Decays to form a grim but stable web
In just the way that Nielsen once foretold.

Few links, referrer logs

Count: 50
Links per site: 3
Reading list: 4
Logs: 0.05

Static, dreary, dull and dead: our tail
Is now no longer long. What can we do
To shake things up? Our bloggers might pursue
Some inbound links discovered in their logs.

Discovery helps

Referrer logs let low-traffic sites be found

Search engines work too

Sparse links hasten convergence

Even the poor may occasionally strike it rich

Googling one's self would also do the trick,
Or keyword search, or even buying ads.
The same grim logic holds: our tail again grows short.
But now a site, though blue, can rise again
To shine in splendid redness for a time.
If links are sparse, even the lucky rich
May fall from grace and hear the baying hound,
That grim, unlucky reaper: the zero lower bound.

More links, no referrer logs

Count: 50
Links per site: 10
Reading list: 4
Logs: 0

Add more links, we're better off.
This observation is not new:
It's why we study hypertext.
Mindless link farms don't help much,
And simply linking’s not enough
Since if we hit the zero lower bound
Our site turns blue, and our mood blues too.

More links, no referrer logs 2

Count: 100
Links per site: 20
Reading list: 4
Logs: 0

The hope here is, add links enough
And readers too: you still might lose
The longest part of the long tail...

More Links

Much better!

The zero lower bound still looms.

...But still retain a vibrant "middle class"
Of many sites too busy to endure
That fatal time of loss, but which
Need not consolidate – or anyway
That won't collapse right now.

More links and referrer logs

Count: 50
Links per site: 10
Reading list: 4
Logs: 0.1

As you've foreseen, if we provide more links
And use our logs to rediscover sites
About lost love, the plate that time forgot,
Or synthesis of octatetraenes:
Whatever floats your boat: as you expect
The genre tropes that shape this talk compel
Our problems are resolved – and all is well.

Conclusions

So: we need lots of links, and backlinks too,
if we are not to wreck that fragile Web.

Links shape stories.

Narrative, rhetoric, argumentation: the fabric from which the web is constructed and construed.

Artists are not children. Scientists are not barbarians. Capitalism is not our fault.

So: we need lots of links, and backlinks too,
if we are not to wreck that fragile Web.
And have we links enough? Is our familiar Web
That final, happy Web that we just saw?
Or have we launched downhill? How can we tell
What we've already lost?
My second point: those links
Shape stories, expectations that – when violated
as I'm doing right now – drive our readers mad.
Fiction and rhetoric are not artistic toys:
They are the raw material with which Web Science works.