Britannica Online and
Wikipedia

A disruptive technology can do more than cost a
business money. Sometimes the disruption extends so deep that the
virtues of the business’s past become problems, and
techniques that would previously have been vices suddenly become
virtues. The emergence of Wikipedia and its overshadowing of the
Encyclopedia Britannica is one case where the rules changed
decisively in favor of an upstart challenger.

Applicable Web 2.0
Patterns

The collaborative encyclopedia approach ushered in by Wikipedia
capitalizes on several Web 2.0
patterns:

Software as a Service

Participation-Collaboration

Rich User Experience

The Synchronized Web

Collaborative Tagging

You can find more information on these patterns in Chapter 7,
Specific Patterns of Web 2.0.

From a Scholarly to
a Collaborative Model

The Encyclopedia Britannica was originally published in 1768 as
a three-volume set, emerging from the intellectual churn of
Edinburgh. It grew quickly, reaching 21 volumes by 1801, and over
the next two centuries, it solidified its reputation as a
comprehensive reference to the world. Producing the printed tomes
was a complex and expensive
enterprise, requiring editors to judge how long to leave an edition
in print, how much to change between editions, what new material to
cover, and who should cover it.

The possibility of an electronic edition was in many ways a
relief at first. The Encyclopedia Britannica took huge strides
during the computer revolution to survive a changing world. In the
mid-1990s, the static book publisher tried bundling an Encyclopedia
Britannica CD with some PCs. That experiment was short-lived, as it
soon became obvious that any publishing effort in the new digital
age had to be dynamic. The company then migrated its entire
encyclopedia set to the Web, where it was free of many of the
edition-by-edition obstacles to updating that had limited its print
and CD editions.

Although this was a daring move, and Britannica continues to
sell its content online, the model behind the encyclopedia’s
creation soon faced a major challenge from newcomer Wikipedia.
Whereas Encyclopedia Britannica had relied upon experts and editors
to create its entries, Wikipedia threw the doors open to anyone who
wanted to contribute. While it seemed obvious to many that an
encyclopedia created by volunteers—many of them non-experts,
many of them anonymous, and some of them actually out to cause
trouble—just had to be a terrible idea, Wikipedia has thrived
nonetheless. Even Wikipedia’s founders didn’t quite
know what they were getting into—Wikipedia was originally
supposed to feed into a much more formal, peer-reviewed Nupedia.

In Wikipedia, rather than one authority (typically a committee
of scholars) centrally defining all subjects and content, people
all over the world who are interested in a certain topic can
collaborate asynchronously to create a living, breathing work.
Wikipedia combines the collaborative aspects of wiki sites
(websites that let visitors add, remove, edit, and change content)
with the presentation of authoritative content built on
rich hyperlinks between subjects to
facilitate ultra-fast cross-references of facts and claims.

Wikipedia does have editors, but everyone is welcome to edit.
Volunteers emerge over time, editing and re-editing articles that
interest them. Consistency and quality improve as more people
participate, though the content isn’t always perfect when
first published. Anonymous visitors often make edits to correct
typos or other minor errors. Defending the site against vandals (or
just people with agendas) can be a challenge, especially on
controversial topics, but so far the site seems to have held up.
Wikipedia’s openness allows it to cover nearly anything,
which has created some complications when editors have deleted pages
they didn’t consider worthy of inclusion. It’s always a
conversation.

The shift from a top-down editorial approach to a bottom-up
approach is a painful reversal for people who expect only expert
advice when they look up something—and perhaps an even harder
reversal for people who’ve built their careers on being
experts or editors. Businesses facing this kind of competition need
to study whether their business models are sustainable, and whether
it is possible to incorporate the bottom-up approach into their own
work.

Personal Websites and Blogs

The term blog is
short for
weblog, a personal log (or
diary) that is published on the Internet. In many cases, blogs are
what personal websites were initially meant to be. Many early
website gurus preached the idea that online content should always
be fresh and new to keep traffic coming back. That concept holds
just as true now as it did then—the content has just shifted
form.

Applicable Web 2.0
Patterns

Many blogs embrace a variety of the core patterns discussed in
Chapter 7,
Specific Patterns of Web 2.0, such as:

Participation-Collaboration

Collaborative Tagging

Declarative Living and Tag Gardening

Software as a Service

Asynchronous Particle Update (the pattern behind AJAX)

The Synchronized Web

Structured Information (Microformats)

Shifting to Blogs and
Beyond

Static personal websites were, like most websites, intended to
be sources of information about specific subjects. The goal of a
website was to pass information from its steward to its consumers.
Some consumers might visit certain websites (personal or otherwise)
only once to retrieve the information they sought; however, certain
groups of users might wish to visit again to receive updated
information.

In some ways, active blogs are simply personal websites that are
regularly updated, though most blog platforms support features that
illustrate different patterns of use. Because there are no hard
rules about how frequently either a blog or a personal website should
be updated, and because neither can be classified in a general sense,
it is probably impossible to identify clear differences as patterns.
However, a few key points do differentiate blogs:

Blogs are built from posts—often short posts—which
are usually displayed in reverse chronological order (newest first)
on an organizing front page. Many blogs also support some kind of
archive for older posts.

Personal websites and blogs are both published in HTML. Blog publishing, however, usually
uses a slightly different model from traditional HTML website
publishing. Most blog platforms don’t require authors to
write HTML, letting them simply enter text for the blog in an
online form. Blog hosting generally requires users to know less about
the underlying infrastructure than classic HTML publishing does. Blogs’
ease of use makes them attractive to Internet users who want a web
presence but have not yet bothered to learn about HTML, scripts,
HTTP, FTP, and other technologies.

Blogs often include some aspects of social networking.
Mechanisms such as a blogroll (a list of other blogs to which
the blog owner wishes to link from his blog) create
mini-communities of like-minded individuals. A blogroll is a great
example of the Declarative Living pattern documented in Chapter 7,
Specific Patterns of Web 2.0. Comment threads can also
help create small communities around websites.

Blogs support mechanisms for publishing information that can be
retrieved via multiple patterns (like Search and Retrieve, Push, or
Direct Request). Instead of readers having to request the page via
HTTP GETs, they can subscribe to feeds (including Atom and RSS) to
receive new
posts in a different form and on a schedule more convenient to
them.

Standard blog software (e.g., Blogger or WordPress) has evolved
well beyond simple tools for presenting posts. The software allows
readers to add their own content, tag content, create blogrolls,
and host discussion forums. Some blog management software lets
readers register to receive notifications when there are updates to
various sections of a blog. The syndication functionality of RSS
(or Atom) has become a core element of many blogs. Many blogs are
updated on a daily basis, yet readers might not want to have to
reload the blog page over and over until a new post is made (most
blog authors do not post exact schedules listing the times their
blogs are updated). It is much more efficient if the reader can
register interest and then receive a notification whenever new
content is published. RSS also describes the content so that
readers can decide whether they want to view the actual blog.
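
Feeds are also easy to consume programmatically. As a rough
illustration, the following Python sketch polls a feed and reports
only the entries it hasn’t seen before; it is a minimal sketch
assuming the third-party feedparser library and a hypothetical feed
URL, not the implementation of any particular reader.

    import feedparser  # third-party library: pip install feedparser

    FEED_URL = "http://example.com/blog/rss.xml"  # hypothetical feed URL

    seen_ids = set()  # a real reader would persist this between runs

    def check_for_new_posts():
        feed = feedparser.parse(FEED_URL)
        for entry in feed.entries:
            # Feed entries carry structured metadata (title, link, id),
            # so a reader can decide whether a post is worth visiting
            # without fetching and re-parsing the whole blog page.
            entry_id = entry.get("id", entry.get("link"))
            if entry_id not in seen_ids:
                seen_ids.add(entry_id)
                print(entry.get("title", "(untitled)"), "->",
                      entry.get("link"))

    check_for_new_posts()

Run on a schedule, a script like this gives the reader the effect of
notification without the blog author having to publish one.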

Blogs are also moving away from pure text and graphics. All
kinds of blog mutations are cropping up, including mobile blogs (known as moblogs), video blogs, and even group
blogs.

Developers are adding tools that emphasize patterns of social
interactions surrounding blogs. MyBlogLog.com
has software that uses an AJAX widget to place the details of
readers of a blog on the blog page itself so that you can see who
else has been reading a specific blog. Figure 3.18,
“Screenshot of MyBlogLog.com blog widget” shows the
latest readers of the Technoracle blog at the time of this
writing.[39]

Figure 3.18. Screenshot of
MyBlogLog.com blog widget

Most blog software also offers the ability to socially network
with like-minded bloggers by adding them to your blogroll. Having
your blog appear on other people’s blogrolls helps to elevate
your blog’s status in search engines, as well as in blog
directories such as Technorati that track blog popularity. It also
makes a statement about your personality and your stance on a
variety of subjects.
Figure 3.19, “Example of a blogroll from
Technoracle.blogspot.com” shows an example of a
blogroll.

Figure 3.19. Example of a blogroll
from Technoracle.blogspot.com

A blogroll is a good example of the Declarative Living and Tag
Gardening pattern, as the list of fellow bloggers in some ways tags
the person who posts it. By making a statement regarding whose
blogs they encourage their readers to read, blog owners are
declaring something about themselves. Blog readers can learn more
about blog writers by looking at who they have on their blogrolls.
For example, in
Figure 3.19, “Example of a blogroll from
Technoracle.blogspot.com”, knowing that John Lydon is in
fact Johnny Rotten, the singer for the Sex Pistols, may imply to a
reader that the owner of Technoracle has a somewhat disruptive
personality and will try to speak the truth, even if it’s
unpopular.

Blogs lowered the technical barrier for getting a personal
presence on the Internet, making it much easier for many more
people to join the conversation. Blogs have also changed the
patterns of dissemination of information. Rather than simply
reading a news story on a particular topic, interested readers can
also find related blogs and find out what the average person thinks
about that topic. Blogs represent a new kind of medium and offer an
alternative source for people who want more than news
headlines.

More recently, blogs have evolved beyond their basic form. Blogs
have become one of many components in social networking systems
like MySpace and
Facebook: one component in pages people use to connect with others,
not merely to present their own ideas. Going in a different
direction, Twitter has stripped blogging down to a 140-character
minimalist approach, encouraging people to post tiny bits of
information on a regular basis and providing tools for following
people’s feeds.

Screen Scraping and Web
Services

Even in the early days of the Web, developers looked for
ways to combine information from multiple sites. Back then, this
meant screen
scraping—writing code to dig through loosely
structured HTML and extract the vital pieces—which was often
a troublesome process. As Web 2.0 emerged, more and more of that
information became available through web services, which presented
it in a much more structured and more readily usable form.

Applicable Web 2.0
Patterns

These two types of content grabbing illustrate the following
patterns:

Service-Oriented Architecture

Collaborative Tagging

You can find more information on these patterns in Chapter 7,
Specific Patterns of Web 2.0.

Intent and Interaction

In the earliest days of the Web, screen scraping often meant
capturing information from the text-based interfaces of terminal
applications to repurpose them for use in web applications, but the
same technology was quickly turned to websites themselves. HTML is,
after all, a text-based format, if a loosely (or sometimes even
chaotically) structured one. Web services, on the other hand, are
protocols and standards from various standards bodies that provide
programmatic access to resources in a predictable way. XML enabled
the web services revolution by making it easy to create structured,
labeled, and portable data.

Note

There is no specific standardized definition of web services that explains the exact set
of protocols and specifications that make up the stack, but there
is a set that is generally accepted. It’s important to
examine the web services architecture document from the W3C to get
a feel for what is meant by “web services.” When this
book refers to “web services,” it doesn’t
specifically mean SOAP over HTTP, although this is one popular
implementation. RESTful services available via the Web are just as
relevant.

One major difference between the two types of interactions is
intent. Most owners of resources that have been screen-scraped did
not intend to allow their content to be repurposed. Many were, of
course, open to others using their resources; otherwise, they
probably wouldn’t have posted the content on the Internet.
However, designing resources
for automated consumption, rather than human consumption, requires
planning ahead and implementing a different, or even parallel,
infrastructure.

A classic example of the shift from screen scraping to services
is Amazon.com. Amazon provides a tremendous
amount of information about books in a reasonably structured
(though sometimes changing) HTML format. It even contains a key
piece of information, the Amazon sales rank, that isn’t
available anywhere else. As a result, many developers have written
programs that scrape the Amazon site.
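
To make the fragility of scraping concrete, here is a minimal Python
sketch of the screen-scraping approach, using the requests and
BeautifulSoup libraries. The URL and the id attribute it looks for
are hypothetical assumptions, not Amazon’s actual markup; real
scrapers break whenever the page structure changes.

    import requests
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    # Hypothetical product page; the markup assumed below is illustrative.
    html = requests.get("http://example.com/books/0596101607").text
    soup = BeautifulSoup(html, "html.parser")

    # Dig the sales rank out of loosely structured HTML. If the site
    # renames the id or restructures the page, this silently breaks.
    rank_node = soup.find(id="sales-rank")
    print(rank_node.get_text(strip=True) if rank_node else "rank not found")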

Rather than fighting this trend, Amazon realized that it had an
opportunity. Its network of Amazon Associates (people and companies
that help Amazon sell goods in exchange for a commission) could use
the information that others were scraping from the site. Amazon set
out to build services to make it easier for its associates to get
to this information—the beginning of a process that has led
Amazon to offer a variety of web services that go far beyond its
product information.

Most web services work falls under the Service-Oriented
Architecture (SOA) pattern described in Chapter 7,
Specific Patterns of Web 2.0. SOA itself doesn’t
depend on the web services family of technologies and standards,
nor is it limited to the enterprise realm, where it is most
ubiquitous. Web services are built on a set of standards and
technologies that support programmatic sharing of information.
These usually include XML as a foundation, though JSON has proven popular lately for
lightweight sharing. Many web services are built using SOAP and the
Web Services Description Language (WSDL), though others take a
RESTful approach. Additional useful specifications include the SOAP
processing model,[40] the XML Infoset[41] (the abstract
model behind XML), and the OASIS Reference Model for SOA (the
abstract model behind services deployed across multiple domains of
ownership).
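
By contrast, a service designed for programmatic consumption hands
the consumer structured data directly. The sketch below assumes a
hypothetical RESTful JSON endpoint; it is not any particular
vendor’s API, but it shows why consuming a service is more robust
than scraping presentation markup.

    import requests

    # Hypothetical RESTful endpoint returning structured JSON.
    response = requests.get("http://example.com/api/books/0596101607")
    book = response.json()

    # The fields are named by contract rather than buried in layout,
    # so the consumer is insulated from page-design changes.
    print(book["title"], "- sales rank:", book["salesRank"])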

While web services and SOA are often thought of as technologies
used inside enterprises, rather than publicly on the Internet,
the reality is that there is a wide spectrum of uses in both public
and private environments. Open public services are typically
simpler, while services used internally or for more specific
purposes than information broadcast and consumption often support a
richer set of capabilities. Web services now include protocol
support for expressing policies, reliable messaging,
secure messaging, a security context, domains of trust, and several
other key features. Web services have also spawned an industry for
protocols and architectural models that make use of services such
as Business
Process Management (BPM), composite services, and service
aggregation. The broader variety of web services standards has been
documented in many other books, including Web Services Architecture and Its
Specifications by Luis Felipe Cabrera and Chris Kurt
(Microsoft
Press).

Content Management
Systems and Wikis

As the Web evolved from the playground of hobbyists to the
domain of commercial users, the difficulty of maintaining sites
capable of displaying massive amounts of information escalated
rapidly. Content management systems (CMSs) such
as Vignette leaped into the gap to help companies manage their
sites. While CMSs remain a common component of websites today, the
model they use is often one of outward publication: a specific
author or organization creates content, and that content is then
published to readers (who may be able to comment on it). Wikis take a
different approach, using the same system to both create and
publish information and thereby allowing readers to become writers
and editors.

Applicable Web 2.0
Patterns

The patterns illustrated in this discussion focus on collaboration:

Participation-Collaboration

Collaborative Tagging

You can find more information on these patterns in Chapter 7,
Specific Patterns of Web 2.0.

Participation and Relevance

Publishing is often a
unilateral action whereby content is made available and further
modifications to the content are minimal. Those who consume the
content participate only as readers.

Wikis may look like ordinary websites presenting content, but
the presence of an edit button indicates a fundamental change.
Users can modify the content by providing comments (much like blog
comments), use it to create new works (mashups), and, in some cases,
create specialized versions of the original. Their participation
gives the content wider relevance, because collective intelligence
generally provides a more balanced result than the input of one or
two minds.

The phrases “web of participation” and
“harnessing collective intelligence” are often used to explain Web 2.0.
Imagine you owned a software company and you had user manuals for
your software. If you employed a static publishing methodology, you
would write the manuals and publish them based on a series of
presumptions about, for example, the level of technical knowledge
of your users and their semantic interpretations of certain terms
(i.e., you assume they will interpret the terms the same way you
did when you wrote the manuals).

A different way to publish the help manuals would be to use some
form of website—not necessarily a wiki, but something
enabling feedback—that lets people make posts on subjects
pertaining to your software in your online user manuals. Trusting
users to apply their intelligence and participate in creating a
better set of software manuals can be a very useful way to build
manuals full of information and other text you might never have
written yourself. The collective knowledge of your experienced
software users can be instrumental in helping new users of your
software. For an example of this pattern in use, visit http://livedocs.adobe.com and see how Adobe Systems
trusts its users to contribute to published software manuals.

Directories
(Taxonomy) and Tagging (Folksonomy)

Directories are built by small groups of experts to help people
find information they want. Tagging lets people create their own
classifications.

Applicable Web 2.0
Patterns

The following patterns are illustrated in this discussion:

Participation-Collaboration

Collaborative Tagging

Declarative Living and Tag Gardening

Semantic Web Grounding

Rich User Experience

You can find more information on these patterns in Chapter 7,
Specific Patterns of Web 2.0.

Supporting Dynamic
Information Publishing and Finding

Directory structures create hierarchies of resource
descriptions to help users navigate to the information they seek.
The terms used to divide the hierarchy create a taxonomy of
subjects (metadata keywords) that searchers can use as guideposts
to find what they’re looking for. Library card catalogs are
the classic example, though taxonomies come in many forms. Within a
book, tables of contents and especially indexes often describe
taxonomies.

Navigation mechanisms within websites also often describe
taxonomies, with layers of menus and links in the place of tables
of contents and a full-text search option in place of an index.
These resources can help users within a site, but users’
larger problem on the Web has often been one of finding the site
they want to visit. As the number of sites grew exponentially in
the early days of the Web, the availability of an incredible amount
of information was often obscured by the difficulty of finding what
you wanted. The scramble for domain names turned into a gold rush
and advertisers rushed to include websites in their contact
information—but many people arrived on the Web looking for
information on a particular subject, not a particular
advertiser.

The answer, at least at the beginning, was directories.
Directory creators developed taxonomic classification systems for
websites, helping users find their way to roughly the right place.
Online directories usually started with a classification system
with around 8 to 12 top-level subjects. Each subject was further
classified until the directory browser got to a level where most of
the content was very specialized. The Yahoo!
directory was probably the most used directory in the late 1990s,
looking much like Figure 3.20,
“The Yahoo! directory”. (You can still find it at
http://dir.yahoo.com.)

Similarly, clicking on “Countries” in the
subcategory listing shown in Figure 3.21,
“Subcategories under the Regional category” yielded
an alphabetical list of countries, which could be further
decomposed into province/state, city, community, and so on, until
you reached a very small subset of specific results.

Directories have numerous problems. First and foremost, it is
very difficult for a small group—even a small group of
directory specialists—to develop terms and structures that
readers will consistently understand. Additionally, there is the
challenge of placing information in the directory. When web
resource owners add pages to the Yahoo! directory, they navigate to
the nodes where they think the pages belong and then add their
resources from there. However, other people won’t necessarily
go to the same place when looking for that content.

Say, for example, you had a rental car company based in
Vancouver, British Columbia, Canada. Would you navigate to the node
under
Regional→Countries→Canada→Provinces→British
Columbia→Cities→Vancouver, and then add your content? Or
would you instead add it under Recreation &
Sports→Travel→Transportation→Commuting, or perhaps
Business & Economy→Shopping and
Services→Automotive→Rentals? Taxonomists have solved this
problem by creating polyhierarchies, where an item can be
classified under more than one node in the tree. However, many
Internet directories are still implemented as monohierarchies, where
only one node can be used to classify any specific object. While
polyhierarchies are more flexible, they can also be confusing to
implement.
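
A small sketch may make the distinction clearer. In the hypothetical
Python structures below, a monohierarchy stores each item under
exactly one path, while a polyhierarchy lets the same rental car
company be reached from several of the paths above; the category
names simply echo the Yahoo! examples and are assumptions for
illustration.

    # Monohierarchy: each item lives under exactly one path in the tree.
    monohierarchy = {
        ("Regional", "Countries", "Canada", "British Columbia", "Vancouver"):
            ["Vancouver Rental Cars"],
    }

    # Polyhierarchy: the same item may be classified under several nodes,
    # so searchers can reach it from whichever path seems natural to them.
    polyhierarchy = {
        "Vancouver Rental Cars": [
            ("Regional", "Countries", "Canada", "British Columbia",
             "Vancouver"),
            ("Recreation & Sports", "Travel", "Transportation", "Commuting"),
            ("Business & Economy", "Shopping and Services", "Automotive",
             "Rentals"),
        ],
    }

    def paths_for(item):
        # Every classification path that leads to the item.
        return polyhierarchy.get(item, [])

    print(paths_for("Vancouver Rental Cars"))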

Another problem concerns terminology. Although terms such as
“vehicles for hire” and “automobiles for
lease” are equally relevant to your rental car
company, users searching for those terms will not be led to your
website. Adding non-English-speaking users to the mix presents a
whole new crop of problems. Taxonomists can solve these problems
too, using synonyms and other tools. It just requires an
ever-greater investment in taxonomy development and
infrastructure.

Hierarchical taxonomies are far from the only approach to
helping users find data, however. More and more users simply
perform searches. Searches work well for textual content but often
turn up false matches and don’t apply easily to pictures and
multimedia. As was demonstrated in our earlier discussion of
Flickr, tagging offers a much more flexible approach—one that
grows along with a library of content.

The most effective tagging systems are those created by lots of
people who want to make it easier for themselves (rather than
others) to find information. This might seem counterintuitive, but
if a large number of people apply their own terms to a few items,
reinforcing classification patterns emerge more rapidly than they
do if a few people try to categorize a large number of items in the
hopes of helping other people find them. For those who want to
extract and build on folksonomies, selfish tagging can be
tremendously useful, because people are often willing to share
their knowledge about things in return for an immediate search
benefit to themselves.

Delicious, which acts as a gigantic bookmark store,
expects its users to create tags for their own searching
convenience. As items prove popular, the number of tags for those
items grows and they become easier to find. It may also be useful
for the content creators to provide an initial set of tags that
operate primarily as seed tags—that is, a way of encouraging
other users to add their own tags.
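
The way reinforcing patterns emerge from selfish tagging is easy to
simulate. In this sketch, each hypothetical user tags the same
bookmark purely for their own retrieval; aggregating the tags with a
counter makes the consensus vocabulary visible, which is roughly how
folksonomy sites surface popular tags.

    from collections import Counter

    # Each user tags the same bookmark for their own convenience.
    user_tags = {
        "alice": ["python", "tutorial", "programming"],
        "bob":   ["python", "howto"],
        "carol": ["programming", "python", "beginners"],
    }

    # Aggregating the selfish tags reveals the emergent classification.
    tag_counts = Counter(tag for tags in user_tags.values() for tag in tags)

    # The most common tags become the de facto category for this bookmark.
    print(tag_counts.most_common(3))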

More Hints for Defining Web
2.0

Tim’s examples illustrate the foundations of Web 2.0, but
that isn’t the end of the conversation. Another way to look
at these concepts is through a meme (pronounced “meem”)
map. A meme map is an
abstract artifact for showing concepts and their relationships.
These maps are, by convention, ambiguous. For example, if two
concepts are connected via a line, you can’t readily
determine what type of relationship exists between them in tightly
defined ontological terms. Figure 3.23,
“Meme map for Web 2.0” depicts the meme map for Web
2.0, as shown on the O’Reilly Radar website.

Figure 3.23. Meme map for Web
2.0

This map shows a lot of concepts and suggests that there are
“aspects” and “patterns” of Web 2.0, but it
doesn’t offer a single definition of Web 2.0. The logic
captured in the meme map is less than absolute, yet it declares
some of the core concepts inherent in Web 2.0. This meme map, along
with the Web 2.0 examples discussed earlier in the chapter, was
part of the conversation that yielded the patterns outlined in
Chapter 7,
Specific Patterns of Web 2.0. Concepts such as
“Trust your users” are primary tenets of the
Participation-Collaboration and
Collaborative Tagging patterns. “Software that gets better
the more people use it” is a key property of the
Collaborative Tagging pattern (a.k.a. folksonomy). “Software
above the level of a single device” is also represented by
the Software as a Service and Mashup patterns.

Reductionism

Figure 3.24,
“Reductionist view of Web 2.0” shows a reductionist
view of Web 2.0. Reductionism holds that
complex things can always be reduced to simpler, more fundamental
things, and that the whole is nothing more than the sum of those
simpler parts. The Web 2.0 meme map, by contrast, is a largely
holistic analysis. Holism,
the opposite of reductionism, says that
the properties of any given system cannot be described as the mere
sum of its parts.

Figure 3.24. Reductionist view of Web
2.0

In a small but important way, this division captures an
essential aspect of the debates that surround Web 2.0 and the next
generation of the Web in general: there is one set of thinkers who
are attempting to explain what’s happening on the Web by
exploring the fundamental precepts, and another set who seek to
explain it in terms of the things we’re actually seeing happen
on the Web (online software as a service, self-organizing
communities, Wikipedia, BitTorrent, Salesforce, Amazon Web
Services, etc.). Neither view is complete, of course, though
combining them could help.

In the next part of the book, we’ll delve into deeper detail, some
of it more technical, and try to distill core patterns that will be
applicable in a range of scenarios.


Sign up today to receive special discounts, product alerts, and news from O'Reilly.