Comparison

The third part of this assignment is to compare the Amazon intellectual
web with the Web of Knowledge intellectual web in terms of four variables:
structure, cyclicality, content, and denseness.

Given that I have generated and saved many Amazon-derived intellectual
webs, I will draw upon several in this discussion. One cannot generalize
about the nature of all Amazon webs from a particular instance, and using
several lets us see some patterns that we might otherwise miss. In a similar
vein, one cannot generalize about the nature of intellectual webs derived
from Web of Knowledge based on a single instance. In both cases, though,
thinking about the nature of the collections may offer insights that could
be confirmed by much more extensive experimentation.

First, though, we have to decide what these terms mean. If we are to
use them to measure our webs, they must be operationalized. We'll base
this on dictionary.com definitions:

Structure

"The way in which parts are arranged or put together to form
a whole; makeup: triangular in structure."

The structure of an intellectual web, then, is the shape of the
resulting network of entities and relationships, a notion taken straight
from the study of topologies in network theory. Some possibilities:

Linear: all but two entities have one incoming and one outgoing link,
and each of the two end entities has only a single link

Circular: a linear structure whose ends are joined, so that every entity
has one incoming and one outgoing link

Tree topology: a structure in which every non-terminal node has one
incoming link (except the root, which has none) and two or more outgoing
links. When every non-terminal node has exactly two outgoing links, it is
the special case of a binary tree.

Star: a single entity in the middle has a relationship with every
other entity; all the rest have a relationship only with the central
node.

Fully connected network: every entity has a relationship with every
other entity

Hybrid topology: a network consisting of a combination of other types
of topologies
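The topology definitions above can be turned into a rough classifier. What follows is a minimal sketch, not the method used to analyze the webs in this paper. It assumes a web is given as a list of entity names plus (a, b) link pairs, it ignores link direction, and it labels any other connected, acyclic web a "tree" even when some node has only one outgoing link, which is looser than the definition above. The function name `classify_topology` is hypothetical.

```python
from collections import defaultdict


def classify_topology(nodes, links):
    """Label a small web with one of the topologies described above.

    `nodes` must list every entity that appears in `links`, which holds
    (a, b) pairs. Link direction is ignored, a simplifying assumption.
    """
    nodes = list(nodes)
    n = len(nodes)
    # Normalize links to unordered pairs, dropping duplicates and self-links.
    edges = {frozenset((a, b)) for a, b in links if a != b}
    degree = {v: 0 for v in nodes}
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        degree[a] += 1
        degree[b] += 1
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first search to confirm that every entity is reachable.
    seen = set()
    stack = nodes[:1]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adjacency[v] - seen)
    if len(seen) != n:
        return "disconnected"

    m = len(edges)
    degrees = sorted(degree.values())
    if m == n * (n - 1) // 2:
        return "fully connected"
    if degrees[-1] == n - 1 and degrees[:-1] == [1] * (n - 1):
        return "star"
    if m == n and all(d == 2 for d in degrees):
        return "circular"
    if m == n - 1 and degrees[:2] == [1, 1] and all(d == 2 for d in degrees[2:]):
        return "linear"
    if m == n - 1:
        return "tree"  # connected and acyclic, but not a path or a star
    return "hybrid"


print(classify_topology("abcde", [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]))
# linear
```

The checks run from most to least specific, so a web that fits a narrow definition (star, circular, linear) is never reported as the broader "tree" or "hybrid".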

In these terms, my intellectual web from Web of Knowledge has a hybrid
topology with mostly linear characteristics: seven of the ten nodes have
either one or two links, which is the defining characteristic of linearity.
Only nodes #2, #5, and #9 have more than two links. Borgman's article
(#2) is the center of a small star-shaped subnetwork.

The Amazon intellectual webs I have created, in contrast, are generally
hybrid topologies with a tendency toward full connectivity. Consider
the "high school classics" web: CD and MD are outliers; MP and MF sit
on linear strands; all the rest are linked to three or more other nodes.

These patterns depend on the particular search rather than reflecting
any structure innate to either Amazon.com or Web of Knowledge. While following
citations, one could easily pick a set that produces any of the topologies
mentioned above. From Borgman's opus with 220 citations, for instance,
selecting ten papers published, say, between six and seven months before
her paper would likely produce a star topology, since papers published so
close together rarely cite one another and each would be linked only to
Borgman's paper. Conversely, picking papers far apart in time and on subjects
as disparate as possible would likely result in a very linear topology.
Following a "normal" research pattern, in which one follows citations to
closely related articles, one would expect a hybrid topology with a fair
number of interconnections on each node.

Amazon.com webs might tend towards a slightly more natural grouping,
as the books on a "similar to" list are all, by definition,
similar to the original. We have seen clumping around authors and multi-volume
sets, for instance. However, just as with the Web of Knowledge, it is
possible to create webs that travel far and wide with few links joining
the various nodes.

Cyclicality

"Of, relating to, or characterized by cycles: a cyclic pattern
of weather changes.
Recurring or moving in cycles: cyclical history."

In general, the sample space of ten entities specified for each of the
two intellectual webs is too small to allow cyclical patterns to develop.
One interesting Amazon web, the "China web", did have a barbell pattern
in which two groups of tightly related books were joined through a single
link. This is the only element of cyclicality I discerned in a web limited
to a single medium. In personal webs that span media (starting with a
musician's biography, for instance, and jumping into recordings), I noticed
a similar barbell pattern several times. With larger sample sizes, one could
envision this pattern recurring more frequently.
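In graph terms, the barbell's single joining link is a bridge: a link whose removal splits the web into two pieces. The following brute-force sketch finds such links, assuming an undirected web and a hypothetical threshold `min_group` for how many entities each side needs before it counts as a "bell". The function names are illustrative; removing each link in turn is fine for ten-node webs but would be slow on large graphs.

```python
from collections import defaultdict


def components(nodes, edges):
    """Return the connected components of an undirected web as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    remaining, groups = set(nodes), []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:  # depth-first search from `start`
            for neighbor in adjacency[stack.pop()]:
                if neighbor not in group:
                    group.add(neighbor)
                    stack.append(neighbor)
        remaining -= group
        groups.append(group)
    return groups


def barbell_links(nodes, links, min_group=3):
    """Links whose removal splits a connected web into two groups of at least
    `min_group` entities each (a brute-force bridge search)."""
    # Treat links as undirected and drop duplicates such as (a, b) and (b, a).
    edges = sorted({tuple(sorted(pair)) for pair in links})
    if len(components(nodes, edges)) != 1:
        return []  # the web is already disconnected
    found = []
    for edge in edges:
        rest = [e for e in edges if e != edge]
        groups = components(nodes, rest)
        if len(groups) == 2 and all(len(g) >= min_group for g in groups):
            found.append(edge)
    return found


# Two tightly linked triangles joined by the single link c-x.
nodes = ["a", "b", "c", "x", "y", "z"]
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("x", "c")]
print(barbell_links(nodes, links))  # [('c', 'x')]
```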

Looking at the larger picture, both the Amazon database and the citation
database would tend to exhibit cyclicality over various scales. Moving
from topic to topic, or from one academic discipline to another, one
envisions groups of books or citations joined together. Again, though,
there are so many differences (in citing behavior among academic
disciplines, for instance) that it is hard to make any generally
valid observation other than "things will vary". White, in "Authors
as citers over time," makes an interesting observation: "Citing
styles in identities differ: 'scientific-paper style' authors
recite heavily, adding to core; 'bibliographic-essay style'
authors are heavy on unicitations, adding to scatter; 'literature-review
style' authors do both at once." (2000) The author's citing style,
then, will also affect the patterns and cyclicality in a given Web of
Knowledge web.

Content

"Something contained, as in a receptacle. Often used in the
plural: the contents of my desk drawer; the contents of an aerosol can."

If the webs are the receptacles, then their contents are very similar:
in a Web of Knowledge (WOK) web, they are lists of citations joining
academic papers; in an Amazon Similarity Explorer (ASE) web, they are
lists of links from one book to another. The content will be the same
regardless of the subject matter of the individual searches, although
subject matter does affect structure and density, as we have seen and
will see.

Denseness

"Crowded closely together; compact"

The dictionary definition makes little sense in the context of an intellectual
web, nor does the physics definition of density as mass divided by volume.
However, if we define link density as the number of actual links in a
network divided by the number of potential links, we get a measure that
gives some sense of the "linkedness" of the collection.

As described earlier, our "open citation linking" WOK web is
very linear, resulting in a low density. Counting directed links, a
ten-node web has 90 possible links (each node could link to each of the
other nine); with 11 links present, the WOK web has a density of roughly
.12, or 12% of all possible connections. The "high school literature"
Amazon web, on the other hand, has 25 links, a density of roughly .28,
more than double that of my WOK web.
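The density arithmetic can be checked with a few lines of code. This sketch assumes, as the figure of 90 possible links implies, that links are counted as directed, so a web of n nodes has n(n - 1) potential links; the `directed=False` option halves that count for comparison. `link_density` is a hypothetical helper name.

```python
def link_density(n_links, n_nodes, directed=True):
    """Actual links divided by potential links.

    A directed web of n nodes has n * (n - 1) potential links;
    an undirected web has half as many.
    """
    if n_nodes < 2:
        raise ValueError("density needs at least two nodes")
    potential = n_nodes * (n_nodes - 1)
    if not directed:
        potential //= 2
    return n_links / potential


wok = link_density(11, 10)     # 11 of 90 possible links
amazon = link_density(25, 10)  # 25 of 90 possible links
print(f"WOK {wok:.2f}, Amazon {amazon:.2f}, ratio {amazon / wok:.1f}")
# WOK 0.12, Amazon 0.28, ratio 2.3
```

Because both webs have ten nodes, the ratio of their densities is simply the ratio of their link counts, 25 / 11.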

My experience playing with the Amazon Similarity Explorer is that webs
are generally denser in this environment than in Web of Knowledge.
Again, one can create any topology (and thus any density) one desires,
but a typical collection of books from Amazon.com, chosen through the
"Similar to" links, is going to be denser. Intuitively, this makes
sense. In WOK, the information organization is one-way: one paper cites
another, and the cited paper, having been published earlier, generally
cannot cite it back. On Amazon.com, the opposite is true: links between
books deemed "similar" by the Amazon web creators are extremely likely
to be mutual, going in both directions. Now, we saw earlier a case where
it was user performance, not true similarity, that linked "HTML Help
Authoring" and "Stupid White Men"; in fact, the latter does not link
to the former. So again, generalization is difficult.
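The one-way versus mutual distinction can be quantified as reciprocity: the share of directed links whose reverse link is also present. The sketch below uses small, made-up link lists (not data from the webs discussed here) to show that a citation web scores 0 while a mostly mutual "similar to" web scores close to 1. The `reciprocity` helper is hypothetical.

```python
def reciprocity(links):
    """Share of directed links (a, b) whose reverse link (b, a) is also present."""
    link_set = {(a, b) for a, b in links if a != b}  # ignore duplicates and self-links
    if not link_set:
        return 0.0
    mutual = sum(1 for a, b in link_set if (b, a) in link_set)
    return mutual / len(link_set)


# Made-up examples: citations point only from newer papers to older ones...
citations = [("paper 3", "paper 1"), ("paper 3", "paper 2"), ("paper 2", "paper 1")]
# ...while "similar to" links are usually returned in both directions.
similar_to = [("Book A", "Book B"), ("Book B", "Book A"),
              ("Book A", "Book C"), ("Book C", "Book A"),
              ("HTML Help Authoring", "Stupid White Men")]  # one-way link
print(reciprocity(citations), reciprocity(similar_to))  # 0.0 0.8
```

Since every mutual pair adds two links where a one-way relationship adds one, higher reciprocity pushes density up for the same set of related books, which is consistent with the intuition above.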

Other comparisons

A few other metrics might be considered: user friendliness, reliability,
and completeness come to mind as interesting variables.

Neither ASE nor WOK is particularly user-friendly. Of course, ASE is
a hack, not a commercial product, but it benefits from Amazon's (usually)
intelligent handling of queries: it is very tolerant of errors and returns
something regardless of the input. On the other hand, I laughed as I read
Atkin's paper describing how much WOK had improved over earlier versions.
I would not have liked to try to use the earlier tools; the current one
is still cryptic and intolerant at times.

In terms of reliability, experiences with both ASE and WOK leave questions
in my mind. Two issues are open.

First, both have returned different answers to identical queries
submitted about a week apart. Given that the underlying databases
are constantly being developed, this may be inevitable; it is
still unsettling not to get the same answer back when you run the same
query after a relatively short time. The reference system for books is
also more stable than the one used for citing references: WOK uses a
proprietary tagging system to identify articles, while Amazon uses the
standard ISBN. Going forward, this will help Amazon and hinder WOK.
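Drift between two runs of the same query can be measured by comparing the result lists item by item, which is only meaningful when each item carries a stable identifier such as an ISBN. A minimal sketch using Jaccard similarity (shared results divided by all distinct results) follows; the identifiers are placeholders and the `compare_runs` helper is hypothetical.

```python
def compare_runs(first, second):
    """Compare the result identifiers returned by two runs of the same query."""
    a, b = set(first), set(second)
    union = a | b
    return {
        "dropped": sorted(a - b),  # returned the first time only
        "added": sorted(b - a),    # returned the second time only
        # Jaccard similarity: 1.0 means the two runs returned the same set.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


week_1 = ["id-1", "id-2", "id-3"]  # placeholder identifiers, not real ISBNs
week_2 = ["id-2", "id-3", "id-4"]
print(compare_runs(week_1, week_2))
# {'dropped': ['id-1'], 'added': ['id-4'], 'jaccard': 0.5}
```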

The second reliability issue is exclusive to Amazon: sometimes
it returns values that simply make no sense, and often the return values
change over time. Recently, for instance, a keyword search on "education"
was consistently returning "Harry Potter and the Order of the Phoenix
(Book 5)" with no other books associated with it. During the same
session, a search on "music education" returned a string of guitar
lesson manuals. One day later, the search for "education" returned the
result list from the prior day's "music education" search. I'm
scratching my head, wondering whether it is a bug in my code or whether
Amazon is tailoring its AWS responses to previous queries! I suspect
it is Amazon reacting to my searches: every AWS transaction includes a
unique identifier, so Amazon knows when a query is coming from ASE.

Completeness is an issue in any reference system. My experience with
WOK shows that a citation database is only useful if it indexes the
journals containing the research that interests you. Amazon covers a
much larger percentage of works in print than WOK covers of the citations
that have been made. WOK could be a useful adjunct to other sources,
but it is clearly insufficient as a primary research tool, at least in
the field of digital libraries.

Finally, there is a certain intellectual deceit at the heart of WOK's
core business. Every search display shows two fields: "Cited References"
and "Times Cited". A casual user will view both as equally credible
and inclusive, but they are not. "Cited References" is in fact a
complete list of the references made by the current paper. However,
as we have seen, "Times Cited" counts only citations from the journals
included in the WOK database. Looking through the "Cited References"
for the articles in my personal web, I'd estimate that more than half
of the links were not live; that is, more than half of the references
made by these articles pointed to works not included in WOK. Assuming
that the papers citing these articles are drawn from a similar mix of
sources, I conclude that "Times Cited" has missed at least half of the
follow-up citations.

Conclusions

In many ways this project was a flop:

I wanted to learn some nitty-gritty web service protocols; instead,
I found a very nice API that let me interface cleanly and easily with
Amazon.com's database.

I wanted to learn more about OpCit and the open citation movement;
instead, I found a commercial citation database that doesn't include
the literature on the open citation movement.

I wanted to build a cool tool to explore Amazon.com; instead, the
"similar to" relationships returned from Amazon were mostly
boring and repetitive.

I was hoping that WOK would be a new reference tool that I could use
heavily in the future. Instead, I ran into a hard-to-use tool that does
not include the literature in which I'm interested.

But it had its good moments too:

Writing ASE was fun; I hope you enjoy playing with it a little.

I learned about some very active researchers I hadn't encountered
before.

It provided a nice context for thought as I was reading the information
retrieval user interface assignments.