The Social Life of Books

Visualizing Communities of Interest via Purchase Patterns on the WWW

by Valdis Krebs

One of the cardinal rules of human networks is "Birds of a feather
flock together". Friends of friends become friends, and coworkers
of coworkers become colleagues. Dense clusters of connections emerge
throughout the social space. The usual pattern found throughout
social structures [and many other complex systems] is dense intra-connectivity
within clusters with sparse inter-connectivity between
clusters.

One day, while searching for a book on amazon.com, I started thinking
about Amazon's value-added service -- Customers who bought this
book also bought these books. Amazon lists the top 6 books that
were bought by individuals who also bought the book currently being
browsed. I wondered...

How do these listed books relate?

Are they 'books of a feather'?

Or, are they different -- complementary?

What do these books say about the community buying them?

Who are these people?

What are their goals and interests?

Are these people I should know [obviously our interests overlap]?

Being a student of networks, I knew the inquiry would not stop
at the books listed on this web page. What would happen if I joined
these individual lists into a network?

The key to understanding the dynamics of networks is reading the
emergent patterns of connections that surround an individual,
or that are present within and around a community of interest.
I wanted to see the network in which my book of interest was embedded.
Seeing those connections would give me insight into the 'network
neighborhood' surrounding this book and hopefully help me make a
smarter purchase.

I decided to trace the network out one and two steps from the focus
book. This is a common procedure in social network analysis when
studying ego networks -- the immediate relationships of a
chosen individual. An ego network allows us to see who is
in one's network neighborhood, how they are interconnected,
and how this structure may influence ego.
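
The tracing procedure can be sketched in a few lines of Python. This is
only a minimal illustration under stated assumptions: the also_bought
table and its titles are hypothetical placeholders [not data collected
from amazon.com], and ego_network is a name invented for this sketch.
It walks outward from the focus book with a breadth-first search, stops
after two steps, and then keeps every link between books that made it
into the neighborhood.

```python
from collections import deque

# Hypothetical "customers also bought" lists; the titles are placeholders.
also_bought = {
    "Focus": ["A", "B"],
    "A": ["Focus", "C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": [],
    "E": ["F"],
}

def ego_network(lists, ego, max_steps=2):
    """Return (distance, edges) for all books within max_steps of ego."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        book = queue.popleft()
        if distance[book] == max_steps:
            continue  # do not expand beyond the chosen radius
        for other in lists.get(book, []):
            if other not in distance:
                distance[other] = distance[book] + 1
                queue.append(other)
    # Keep every link whose two endpoints are both inside the neighborhood.
    edges = {
        frozenset((book, other))
        for book in distance
        for other in lists.get(book, [])
        if other in distance and other != book
    }
    return distance, edges

distance, edges = ego_network(also_bought, "Focus")
for book, steps in sorted(distance.items(), key=lambda item: item[1]):
    print(steps, book)
```

Links are treated as undirected here, on the assumption that a purchase
pairing connects both books equally; books three or more steps away
[E and F in this toy table] are left out.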

To continue my exploration I had to choose a book as my focal point,
or ego. I chose Tom Petzinger's The New Pioneers. After all,
that book was the reason I had originally visited amazon.com --
before I got sidetracked. As I collected the data, I started wondering
again...

What themes would I see...

in the books?

in their connections?

What other topics are Tom's readers interested in?

Will Tom's book end up in the center of one large, massively
interconnected cluster -- a single community of interest?

Or, will it end up linking together otherwise disconnected clusters
-- diverse communities of interest?

Below is the network surrounding The New Pioneers. Each
node represents a book. A red line links books that were purchased
together. The buying pattern of the books has self-organized into
emergent clusters that I have named for the content of each cluster.
It is obvious that Tom's book does span a diversity of interests!

Next we examine the network measures of each node/book, to see
which nodes are well positioned in the web of connections. The most
common measure in social network analysis is network centrality, which
we use here to assess each node's 'positional advantage'. The network
has two parts: 1) the Complexity cluster and 2) the other three
interconnected clusters, which form one large network component.
The highest scoring nodes in the Complexity cluster are Open Boundaries
and Complexity Advantage -- they received identical scores. The
scores in the large network component, in declining order, are as
follows:

[tie] Management Challenges in the 21st Century

[tie] Business @ the Speed of Thought

Dance of Change

Innovator's Dilemma

Information Rules

New Rules for the New Economy

Customers.com

The top two books received the highest scores because they are
instrumental in connecting/bridging the three clusters [Internet
Economy, Old School, New School]. Without these bridging connections
there would be more holes in the network such as those
that surround the currently isolated Complexity cluster. Notice
that more connections do not necessarily translate to network
benefits -- Information Rules has the most connections but
not the highest network score. In networks, advantage comes not
from the number of connections one has but from where those
connections lead. The golden rule in networks is the same as
in real estate -- location, location, location. In real estate
it is physical location -- geography. In networks it is virtual
location -- determined by the pattern of connections surrounding
a node.
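
To make the difference between 'many connections' and 'good location'
concrete, here is a small sketch. It is an illustration only: the text
above does not say which centrality measure produced the scores, so
betweenness centrality [how often a node sits on the shortest paths
between other nodes] is an assumption made for this example, and the
node names are hypothetical. The toy network has two tight clusters
joined only through a node called Bridge.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                           # nodes in order of discovery
        preds = {v: [] for v in graph}       # predecessors on shortest paths
        paths = {v: 0 for v in graph}        # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical edges: two fully connected clusters linked only via "Bridge".
edges = [
    ("X1", "X2"), ("X1", "X3"), ("X1", "X4"),
    ("X2", "X3"), ("X2", "X4"), ("X3", "X4"),
    ("Y1", "Y2"), ("Y1", "Y3"), ("Y1", "Y4"),
    ("Y2", "Y3"), ("Y2", "Y4"), ("Y3", "Y4"),
    ("X1", "Bridge"), ("Bridge", "Y1"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

degree = {book: len(neighbors) for book, neighbors in graph.items()}
between = betweenness(graph)
for book in sorted(graph, key=between.get, reverse=True):
    print(book, degree[book], between[book])
```

In this toy network X1 and Y1 have the most connections [four each],
yet Bridge, with only two, scores highest because every path between
the two clusters runs through it.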

Another common network measure is structural equivalence.
It reveals which nodes play a similar role in a network. Equivalent
nodes may be substitutable for one another in the network. As
an author, I would not like my book to be substitutable
with many other books! As a reader, I would like equivalent choices.
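
One rough way to look for equivalent books is to compare the sets of
neighbors of two nodes. The sketch below is an approximation under
stated assumptions: strict structural equivalence requires identical
ties, whereas this example substitutes a Jaccard overlap score between
0 and 1; the graph, the 0.75 threshold, and the function names are all
hypothetical.

```python
def neighbor_similarity(graph, a, b):
    """Jaccard overlap of the neighbor sets of a and b, ignoring a and b themselves."""
    neighbors_a = graph[a] - {b}
    neighbors_b = graph[b] - {a}
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0

def substitutes(graph, book, threshold=0.75):
    """Books whose neighborhoods overlap with book's at or above the threshold."""
    return sorted(
        other for other in graph
        if other != book and neighbor_similarity(graph, book, other) >= threshold
    )

# Hypothetical network: P and Q are linked to exactly the same books.
graph = {
    "P": {"R", "S", "T"},
    "Q": {"R", "S", "T"},
    "R": {"P", "Q"},
    "S": {"P", "Q"},
    "T": {"P", "Q", "U"},
    "U": {"T"},
}

print(neighbor_similarity(graph, "P", "Q"))
print(substitutes(graph, "P"))
```

A score of 1.0 means the two books share every neighbor; lowering the
threshold widens the list of candidate substitutes.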

Another value-added service that Amazon provides is reader-submitted
book reviews. A person considering the purchase of a particular
book may be aided by the many reviews that accumulate over time.
Unfortunately the reviews can be skewed. An author with a large
personal network can quickly get a dozen or more glowing reviews
of his/her latest book posted to amazon.com. Customers who are
comparison shopping based on reader reviews alone may be misled.

There is a similar phenomenon with web pages -- many webmasters
have become quite adept at formatting the content of their web pages
and meta tags so that their web sites hit near the top in many search
engines. The creators of a new search engine, Google, recognized
this trickery. They created algorithms that scored a web page based
on the number of other pages hyperlinked in to it. The links-in
are further adjusted by the popularity of the pages linking in to
the focus page. This severely limits the use of 'alchemy of content'
to score better with search engines. The social network analysis community
has had a measure like Google's for many years. It was developed
to trace the diffusion of innovation in a professional community.

In the Google search engine, if no one else points to your web
page then you get bottom billing; if many popular web pages [those
that have many links pointing to them] point to yours then you get
top billing in the search results. It is easy for the webmaster
to alter content, but not context [the pattern of incoming hyperlinks
to a web site]. It is amazing how well this social network approach
to searching the web works. Google usually lists the most useful
pages right at the top of the returned search results. IBM is developing
a similar search engine -- that looks at hubs and authorities in
the webspace -- under its CLEVER project.
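
The idea of scoring a page by its incoming links, weighted by the
popularity of the pages doing the linking, can be sketched as a simple
iterative calculation. This is a simplified illustration in the spirit
of the description above, not Google's actual algorithm: the damping
value of 0.85, the fixed number of iterations, the page names, and the
function name link_scores are all assumptions made for this example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting each link by the linker's own score.

    links maps every page to the list of pages it points to; every target
    must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new[other] += damping * score[page] / n
        score = new
    return score

# Hypothetical web: nobody points to "orphan" or "blog".
links = {
    "home": ["guide"],
    "blog": ["guide", "home"],
    "guide": ["home"],
    "orphan": ["guide"],
}

scores = link_scores(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Pages that nobody points to end up with only the baseline score, while
a page that is pointed to by well-scored pages rises to the top; changing
a page's own content has no effect on its score.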

Could these community of interest maps work in a similar capacity
with other consumer items? If I am not familiar with a product,
an author, an artist, a vintage, or a brand, I would like to judge
an item by the company it keeps -- its network neighborhood.

Who points to it?

What communities is it a member of?

Is it central in the community?

Does it bridge communities?

Are there equivalent alternatives?

It appears that as a customer of Amazon I could make smarter
decisions by viewing the embeddedness of various items they sell
in communities of interest -- especially if I did not have much
experience with the items I am considering purchasing.

What are some network rules-of-thumb we can distill from this
analysis?

If you have read one nonfiction book of a structurally equivalent
pair, you may not be in a rush to read the second [the second book
probably covers the same information as the first book]. On the
other hand, you may wish to read all structurally equivalent fiction
titles [can't get enough of those cyber-thrillers].

If you liked books A, B, and C and want to read something
similar, find which books are linked to A AND B AND C. You can
only see this in the network; you cannot see it in Amazon's
individual lists unless you open three browser windows and compare
the lists yourself.

If you want to read just one book about topic X, find the
book with the highest network centrality in the cluster of topic
X books. This follows the Google philosophy and may reveal a
book with excellent 'word of mouth'.

If the book you are looking for is not in stock, find which
books are structurally equivalent to the book you were searching
for. These will provide similar content and are available now.
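
The 'linked to A AND B AND C' rule of thumb amounts to intersecting
neighbor sets. A minimal sketch, assuming the book network is stored as
a dictionary that maps each title to the set of titles it is linked to;
the edges and the function name linked_to_all are hypothetical.

```python
def linked_to_all(graph, liked):
    """Books linked to every title in liked, excluding the liked titles themselves."""
    liked = set(liked)
    if not liked:
        return set()
    common = set.intersection(*(graph[book] for book in liked))
    return common - liked

# Hypothetical purchase links between placeholder titles.
edges = [
    ("A", "B"), ("A", "D"), ("A", "E"),
    ("B", "D"), ("B", "E"),
    ("C", "D"), ("C", "F"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(linked_to_all(graph, ["A", "B", "C"]))
```

In this toy network only D is linked to all three liked books; E is
linked to A and B but not to C, so it drops out.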

An irony in Amazon's drive to sell more books to its existing customers
through value-added information is that these services could provide
an opportunity to the businesses that Amazon competes against. All
those local booksellers that have been going out of business from
the onslaught of mega-retailers such as Borders, Barnes & Noble,
and Amazon can now 'mine the data' on the amazon.com and bn.com
web sites to create smarter book orders for their own clientele.
Rather than compete on discounting bestsellers -- a game they cannot
win -- local booksellers could show their customers other purchase
options using the book networks. For instance, they could recommend
Petzinger's book to those customers who have interests in business,
the internet, and complexity science. It is one of the few books
that link to all three communities of interest. With this type of
data analysis local booksellers may again thrive in their niche.
In a balanced ecosystem the larger species [e.g. Amazon, Barnes &
Noble, Borders] help form a niche for the smaller species
[e.g. the local booksellers] and they all co-evolve.

A book author and/or publicist could use the knowledge of existing
book networks to position a book where there is a hole
in the network. A publisher could view evolving book networks
-- they may change weekly -- to adapt its marketing efforts. Amazon,
of course, is still the big winner -- they have the data, and
a rich upside of untapped possibilities for analyzing the
data and applying the findings.