The World Free Web

Eric Ries has spotted an irony in the Slashdot effect: When a site is
Slashdotted, the information on it becomes harder to reach not because
it has become scarce, but because it is being copied to thousands upon
thousands of other machines. Shouldn't this make it easier to get
instead of harder? In today's editorial, Eric suggests a way to turn
the situation around so popular information becomes instantly more
accessible rather than less.

For those who don't already know, FreeNet is a fully distributed
information storage system somewhat akin to Gnutella, except that it
is anonymous and far more resilient to attack. The basic concept is
that each node in the network replicates content that passes through
it on the way to another node. This has many advantages, including the
fact that popular information tends to propagate across the network
and become more abundant over time. I won't go into all of the
features of the FreeNet project here because you can read all about it
on their Web page.

Compare FreeNet to the current WWW architecture. As information on a
Web server becomes more popular, it becomes more difficult for users
to access. Witness the impressive "Slashdot effect" which occurs when
thousands of users suddenly overwhelm a Web server. The Slashdot
effect is caused by the centralization of information. When
information is centrally located, this location becomes a single point
of failure for the distribution of that information. The irony of the
Slashdot effect is that it is caused by users making thousands of
copies of the relevant information. It's not as if the information has
become scarce -- quite the opposite. Users ought to be able to share
that information with each other, decentralizing it.
[Editor's note: In fact, this already happens. What do you see almost
immediately in the comments to a story that links to a movie trailer
or the photos of Hemos's wedding? "Here's a mirror" and "Here's
another". Making this more efficient by turning it into part of the
system would actually just be the next step in an already-established
practice.]

I propose that FreeNet be integrated into the Mozilla cache structure,
allowing users to form a sort of "browsers' cooperative" in which
pages are freely shared in a giant collective caching structure. (Of
course, this should not be limited just to the Mozilla browser, but I
think it's a good starting point.)

I have given the structure of this cooperative some thought, and I
want to give an overview of how I think the system should work. First
comes a technical overview, which details the relatively simple
integration work that needs to be done. Second, I will give a more
abstract view of how I think the social structure of such a
cooperative ought to be formed. I have dubbed this system the World
Free Web (or WFW, not to be confused with the WWF or WCW).

Technical Overview

When a user makes a request for a page using this new enhanced WFW
browser, three things would happen simultaneously:

1. The browser makes an HTTP request to the requisite server.

2. The browser checks the on-disk and in-memory caches.

3. The browser submits a FreeNet request to the local FreeNet node.

Now, whichever of these methods returns a valid result first is
displayed to the user. The user need not have any knowledge of which
method was used, although if they get outdated or garbage data, they
can always hit "shift reload" which should force the browser to use
method #1 to re-fetch the data. This is similar to the way many proxy
servers work today. Behind the scenes, as each page is inserted into
the browser cache, it is also inserted into the local FreeNet
node. The entire process is transparent to the end user.
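As a rough sketch, the race between the three methods might look like
this in Python. The fetch functions here are stubs standing in for the
real browser, cache, and FreeNet lookups; whichever returns a valid
result first wins:

```python
import concurrent.futures

def fetch_http(url):
    # Method #1: direct HTTP request to the origin server.
    # Stubbed: pretend the origin answered.
    return "<html>page from origin</html>"

def fetch_local_cache(url):
    # Method #2: on-disk and in-memory browser cache lookup.
    # Stubbed as a cache miss.
    return None

def fetch_freenet(url):
    # Method #3: ask the local FreeNet node for a copy keyed by the URL.
    # Stubbed as a miss.
    return None

def wfw_fetch(url):
    """Race all three lookup methods; display whichever answers first."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            pool.submit(fn, url): name
            for name, fn in [("http", fetch_http),
                             ("cache", fetch_local_cache),
                             ("freenet", fetch_freenet)]
        }
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result is not None:   # first valid result wins
                return futures[future], result
    return None, None                # every method came up empty
```

A "shift reload" would simply bypass the race and call the HTTP method
directly.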

In the course of normal operation, this whole scheme will behave in a
very similar way to normal Web browsing. There are really only two
cases in which the FreeNet node would provide a page faster than the
HTTP request:

When the site in question is down or under heavy load.

When there is high network lag between the user and the site.

Philosophically, this scheme has one primary benefit: Each user of the
Web, even a non-techie type, becomes a contributor to the network
infrastructure instead of simply a drain on resources. This is much
like the old days of Usenet, when each person shared her news feed
with others. More on this below.

Practically, there are many more extended benefits. For instance, one
of the problems with distributed systems such as FreeNet is the lack
of feedback or ratings on the quality of the information. The WFW can
automatically provide reliable feedback on the validity of
information. If a user hits the "shift reload" button after getting a
page from FreeNet, that page is likely to be of suspect quality (it is
either a bogus result or out of date). Large Web sites will no longer
have a monopoly on the ability to handle a large number of users. This
is just an example of the kinds of things users can do when they band
together.
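A minimal sketch of that feedback loop, tallying forced reloads as
implicit negative votes on a FreeNet key. The function names and the
threshold are inventions for illustration, not part of FreeNet:

```python
from collections import Counter

SUSPECT_THRESHOLD = 3         # assumed cutoff; would need tuning
suspect_reports = Counter()   # FreeNet key -> forced-reload count

def report_forced_reload(key):
    """The user forced a re-fetch after a FreeNet-served page:
    count it as implicit negative feedback on that key."""
    suspect_reports[key] += 1

def trust_freenet_copy(key):
    """Stop serving the FreeNet copy once enough users have flagged it."""
    return suspect_reports[key] < SUSPECT_THRESHOLD
```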

But it gets better. Once this starts to catch on, proxy servers and
Web servers can be adapted to start participating in the
system. Imagine a new HTTP response code that indicates that the
server is too busy to handle your request right now, but that the data
you want was just inserted into FreeNet with a given key. Small sites
get to leverage the bandwidth and storage of their users to reduce
costs.
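To make the idea concrete, here is a hypothetical client-side handler.
The status code 599 and the X-Freenet-Key header are inventions for
illustration; no such code or header exists in HTTP today:

```python
# Hypothetical "too busy, fetch from FreeNet instead" status code.
BUSY_REDIRECT_TO_FREENET = 599

def handle_response(status, headers, fetch_freenet):
    """If the origin is overloaded but has pushed the page into
    FreeNet, fall back to the FreeNet key it advertises."""
    if status == BUSY_REDIRECT_TO_FREENET:
        key = headers.get("X-Freenet-Key")
        if key:
            return fetch_freenet(key)
    return None   # not a busy-redirect; handle normally elsewhere
```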

FreeNet itself benefits in a number of ways. Since many more people
are using FreeNet just by using their browsers, the amount of
information and overall storage capacity in FreeNet is increased by
several orders of magnitude. All the virtues of FreeNet's design
become stronger as the number of users increases. Having more nodes
increases the overall resilience of the network to attack, and having
nodes run transparently by "normal" users makes it harder to accuse
FreeNet users of engaging in suspicious activity.

Social Organization

To be successful, I believe the WFW should work as a true grassroots
user movement. To support this, the WFW will have to add slightly to
the underlying FreeNet protocol. However, a few things should be
noted. First, WFW nodes could still act as normal
FreeNet nodes, performing all the operations that the typical FreeNet
node would. The rules I am about to outline would only apply to WFW
nodes talking to other WFW nodes, and need not apply when they are
talking to normal FreeNet nodes.

First of all, one of the big problems with FreeNet as it currently
stands is the bootstrapping process of finding out about other FreeNet
nodes. Currently, FreeNet maintains an optional central repository of
nodes which is available via the Web. This is not a great long-term
solution, as it reintroduces centralization into a system that should
be fully distributed. My proposal is that the WFW be a closed "club"
structure. In order to join, you have to get an existing member to
sponsor you. In many cases, this member could just be your ISP, but it
does not need to be.

Each WFW node could have an ACL that keeps track of other nodes that
the current node is willing to accept requests from. When a node is
introduced into the system via a sponsorship, at first this node will
only be allowed to make requests via the sponsoring node. The new
node will also handle requests routed through its sponsor, but as it
fulfills
these requests (and hence, becomes more and more useful to the rest of
the network), other nodes will start to accept direct connections from
it. This produces the proper incentives to marginalize the effects of
spammers. If you are going to sponsor people, your node will be the
primary victim of any malicious activity they engage in, and you will
be able to cut off their access if they do engage in such
behavior. Only after a node has proved its utility to the rest of the
network will it gradually be brought closer to the strongly-connected
center, and if it starts to change its behavior, it will gradually be
pushed out towards the periphery.
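The sponsorship and reputation rules above might be sketched like this.
The class, its method names, and the promotion threshold are all
invented for illustration, not part of any FreeNet protocol:

```python
class WFWNode:
    """Sketch of the sponsorship model for WFW-to-WFW connections."""

    def __init__(self, name):
        self.name = name
        self.acl = set()      # nodes we accept direct requests from
        self.reputation = {}  # node name -> requests it has fulfilled

    def sponsor(self, newcomer):
        # A sponsored newcomer may talk to us immediately; everyone
        # else waits until it proves itself useful.
        self.acl.add(newcomer.name)
        self.reputation[newcomer.name] = 0

    def record_fulfilled(self, node_name):
        # Each fulfilled request nudges the node toward the center.
        self.reputation[node_name] = self.reputation.get(node_name, 0) + 1

    def accepts(self, node_name, threshold=5):
        # Accept direct connections from sponsored nodes, or from
        # nodes that have fulfilled enough requests.
        return (node_name in self.acl
                or self.reputation.get(node_name, 0) >= threshold)
```

A misbehaving node would simply stop accumulating fulfilled requests
and drift back below the threshold on each peer it talks to.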

Clearly, there is much more work to be done. The WFW is a first step
towards accomplishing a more intelligent mainstream net architecture
which recognizes that information cannot and should not be controlled
by an elite few. But this is just an outline, a sketch of what's
coming. I am hoping to get people involved in a development effort --
a few from the FreeNet team, a few from the Mozilla team at first --
but then there's plenty more work to be done. If this is to succeed,
it will have to be a community effort. Consider this your official
invitation. If you'd like to get involved, the project has a
home at
http://enzyme.sourceforge.net/WFW/.

While you're thinking about these issues, you might want to check out
Professor David Gelernter's latest manifesto, The
Second Coming.

Eric Ries (eries@CatalystRecruiting.com)
is working on a BS in Computer Science and a BA in Philosophy at Yale
University. He is currently CTO of the Internet startup company
Catalyst Recruiting (http://www.CatalystRecruiting.com/)
and its cousin, the Enzyme open-source project (http://enzyme.sourceforge.net/). His
previous work experience ranges from Microsoft to the San Diego
Supercomputer Center. He has been published on Java and other topics
in both books and magazines. He was co-author of The Black Art of
Java Game Programming, among others, and was the Games &
Graphics editor for the Java Developer's Journal. His complete resume
is available at http://i.am/EricRies/.

T-Shirts and Fame!

We're eager to find people interested in writing editorials on
software-related topics. We're flexible on length, style, and topic,
so long as you know what you're talking about and back up your
opinions with facts. Anyone who writes an editorial gets a freshmeat
t-shirt from ThinkGeek in
addition to 15 minutes of fame. If you think you'd like to try your
hand at it, let jeff.covey@freshmeat.net
know what you'd like to write about.

Recent comments

> One of the posters mentioned a problem
> with bandwidth... such as do you really
> want to download a 16MB file from some
> guy with a 28.8 connection? Well... I'm
> not a programmer, but I am an
> engineer... If you know who has the
> file, couldn't you simultaneously fetch
> portions of it from multiple sources?
> In this case, the more available
> sources, the easier -- your end -- of
> the pipe saturates... in this case a
> good thing.
>
> -- Phenym

Hmmm... interesting idea. I suspect that the overheads in coordinating the partial downloads from all over the place might kill the idea, but it's worth exploring...
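One simple way to split the work, assuming the file size and the list
of sources are known in advance: give each source one contiguous byte
range. This is only a sketch of the planning step, not a real
protocol; the coordination overhead the reply worries about is exactly
what it leaves out:

```python
def plan_ranges(size, sources):
    """Split a file of `size` bytes into one contiguous byte range
    per source, so each peer serves a different slice."""
    n = len(sources)
    chunk = -(-size // n)   # ceiling division: bytes per source
    plan = []
    for i, src in enumerate(sources):
        start = i * chunk
        end = min(start + chunk, size) - 1
        if start <= end:    # skip sources left without a slice
            plan.append((src, start, end))
    return plan
```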

Bandwidth Problem
One of the posters mentioned a problem with bandwidth... such as do you really want to download a 16MB file from some guy with a 28.8 connection? Well... I'm not a programmer, but I am an engineer... If you know who has the file, couldn't you simultaneously fetch portions of it from multiple sources? In this case, the more available sources, the easier -- your end -- of the pipe saturates... in this case a good thing.

security: how does a client know the data is from the desired source, and how does a source know that only the desired client(s) got it? I'd suggest PGP/SSL-type signatures and encryption. Be aware that data targeted at specific clients cannot be cached, except specifically for those recipients.

freshness: large amounts of (all?) content are dynamic to some degree. How do you avoid serving stale content, while still caching? http already supports object meta-data in the form of Expires headers and also supports freshness-checking with If-Modified-Since (IMS) requests. The problem is, how do you know in advance when the data might have changed by (time to do an IMS request), or even better should definitely have changed by (no point in caching anymore)?
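A small sketch of that freshness check using HTTP's existing metadata
(Expires and Last-Modified/If-Modified-Since), which any WFW-style
cache could reuse as-is:

```python
import email.utils
import time

def is_fresh(headers, now=None):
    """Decide from an Expires header whether a cached copy can be
    served without revalidation."""
    now = now if now is not None else time.time()
    expires = headers.get("Expires")
    if not expires:
        return False   # no metadata: revalidate to be safe
    expiry = email.utils.parsedate_to_datetime(expires).timestamp()
    return now < expiry

def conditional_headers(cached_headers):
    """Build the IMS revalidation request the comment describes."""
    h = {}
    if "Last-Modified" in cached_headers:
        h["If-Modified-Since"] = cached_headers["Last-Modified"]
    return h
```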

storage: with limited distributed storage space, how do you optimise hit rates? This requires optimising distribution of data and determining what data to actually cache. Throwing away suspected stale stuff will free up space, but might miss out on IMS hits. Keeping stuff that is fresh is a waste if no one is going to request it.

bandwidth: huge amounts of storage are useless without the bandwidth needed to get at it. How do you use that bandwidth effectively? Do you really want someone on the other side of the globe to be fetching 16M objects from your little part of the distributed cache over your 28.8k modem? Would they even want to? This is possibly the biggest problem with distributed, client-based caches; servers have all the bandwidth, so why fetch from a low-bandwidth cache? Putting large caches at high-bandwidth distribution points is nearly always more effective than having low-bandwidth clients share caches.

These requirements all interact and contradict each other. It's a big juggling act. I'd have to say http is pretty comprehensive in its attempts at ensuring at least freshness (if very ad hoc in design); unfortunately, not everyone is using it to its full capability.

The most exciting solution for bandwidth I've seen is rproxy, which uses the rsync algorithm for doing delta-based updates to stale cached objects. This could replace IMS fetches entirely, and reduce bandwidth significantly on misses, even for totally dynamic content.

Freenet is a cool idea, but I see its primary aim as dynamically distributing data to protect it from censorship. Efficient distribution of data is one of the technical problems that must be overcome to meet that aim. I'd be quite surprised (and pleased) if they manage to do this so effectively that it becomes a general solution to data distribution problems outside their primary objective.

META information
It seems to me that the problem can be fixed with HTTP and some XML tags (or HTML META tags). What is needed is the information:
What mirrors exist
What mirrors should be used

On the other hand, there is the "What does the browser do with the information?" problem. We had Mosaic, which interpreted the LINK element -- others did not. The best way would be the development of a tag set that could also be integrated into XHTML -- I suppose there is one already that makes it possible to decide what to do next after the header of a document is retrieved. It is also possible to distinguish between stable and dynamic parts of a page (maybe only the ads are dynamic). I don't think we need more new software -- we need widely accepted protocols.

I like the description of the browser side in the article (like the "What's related"), but I think buttons and menus should be more flexible (functions should be loadable dynamically, not built-in!). I can see that the classical Web page is coming to an end. Why shouldn't every Web page have a pull-down menu for its contents? If an index is recognized by the browser, it could display it however it or the user wants. This would result in compatibility with every browser type (even Lynx).

XML was invented so that one would not have to implement new features and elements every time someone has an idea. It is time for the WWW that these ideas get supported more effectively. And this will also mean: not limited to browsers at all!