Noesis: Is it a library with built-in searching or a search engine with a built-in library?

By Peter Suber

03/04/02

Every discipline has
a rapidly growing body of literature on the Web. Many hard-working volunteers in
every field have built Web directories of this literature. Some have even built
discipline-specific search engines. As the scholarly content on the Web grows,
life gets more and more difficult for these directory and search engine editors.
Think about the problems they face. They must try to cover the field, or their
own topic within the field, comprehensively. They must distinguish worthy
literature from unworthy. They must discover new sites within a reasonable time
and add them if they are worthy. They must fix or delete dead links. The
directory editors must organize their contents to help users navigate. If they
can, they should offer searching, not only of the links and their annotations,
but of the full-text files to which they point. Finally, they must use methods
that scale up as the relevant body of literature continues to grow. Methods that
worked five years ago when the Web was small no longer work today.

Noesis (noesis.evansville.edu) is an
online library and search engine for the field of philosophy that solves these
problems. Moreover, the software enabling it to solve them is transferable to
any other discipline.

I’m one of the two co-editors of Noesis.
My partner, Tony Beavers, deserves the credit for envisioning and implementing
the features of this powerful software. In what follows, I can make immodest
claims for Noesis because I’m praising Tony.

Noesis Today

Noesis has a board of topic editors, each
with a different specialization within the field. The topic editors are
responsible for monitoring their corners of the field for old, new, and worthy
content. The Noesis software gives them a Web form for adding sites, which is
much easier than writing HTML code or sending e-mail to another human editor who
then writes HTML code. (Noesis also gathers new content by inviting user
submissions, which are evaluated by the editors.) Topic editors may organize
their topic area according to the sub-topics of their choice. Users can browse
or search the entire Noesis collection or any sub-collection produced by an
individual editor. By dividing the labor among the editors, an entire discipline
can be covered comprehensively and kept up-to-date. If one editor has too large
a topic to cover adequately, we need only divide the topic and add
another editor.

Gateway Selection Filters

Noesis uses several kinds of peer review to identify and recommend
worthy sites. The first is at the gateway, when editors use their professional
judgment to decide what deserves to be included. In addition to the criteria
invoked in the gateway decisions, Noesis currently requires (with a few
exceptions) that the texts be written by Ph.D.s. As we’ll soon see, Noesis
supports other, higher kinds of quality control that sort out the better from
the worse among the texts that make it into the collection.

Adjustable-Scope Searching

Searching is the glory of Noesis. Because
Noesis stores all its texts in a database, it can index them for searching much
more quickly than a traditional search engine can crawl a series of Web sites.
For the same reason, it can fine-tune the construction of the index. Traditional
searchable collections only support all-or-nothing searching: if a file contains
the search string, then a link to the file appears on the hit list, and
otherwise not.

But Noesis is an adjustable-scope search
engine. Users can search the whole collection, any sub-collection created by a
topic editor, the collection of works by a given author, the collection of works
from a given journal or set of journals, or the custom collection created by the
user. Noesis also classifies its texts by genre (essays, reviews, course
syllabi, and so on) and lets users filter any search by genre. Finally, editors
only need to collect links to desirable texts; Noesis will automatically provide
full-text searching of those texts.

Adjustable-scope searching allows users
to add another layer of peer review to their research. If you trust the peer
review judgments made by the editors of journals A, B, and C, then you can set
the scope of Noesis to search just those journals.
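
To make the idea of scope concrete, here is a minimal sketch in Python. It is not the Noesis implementation; the `Text` record, the `search` function, and the sample library are all hypothetical names invented for illustration. It only shows how a full-text match can be narrowed by journal, author, or genre before the hit list is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    # Hypothetical record for one text in the collection.
    title: str
    author: str
    journal: str
    genre: str
    body: str


def search(texts, query, journals=None, authors=None, genres=None):
    """Return titles of texts whose body contains `query`,
    restricted to whichever scopes the caller supplies."""
    needle = query.lower()
    hits = []
    for text in texts:
        if journals is not None and text.journal not in journals:
            continue
        if authors is not None and text.author not in authors:
            continue
        if genres is not None and text.genre not in genres:
            continue
        if needle in text.body.lower():
            hits.append(text.title)
    return hits


# Invented sample data, for illustration only.
LIBRARY = [
    Text("On Forms", "A. Author", "Journal A", "essay",
         "Plato on the theory of forms"),
    Text("A Commentary Reviewed", "B. Writer", "Journal B", "review",
         "A new commentary on Plato"),
    Text("Ancient Philosophy Syllabus", "C. Teacher", "Journal D", "syllabus",
         "Week 1: Plato, Republic"),
]

trusted = search(LIBRARY, "plato", journals={"Journal A", "Journal B"})
```

Here `trusted` contains only the hits from the two journals whose editors the user has chosen to trust, while an unscoped call searches the whole library.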

When updating its search index, Noesis
automatically purges dead links. The next version of the software will put dead
links in a special offline graveyard for post-mortem analysis. Most of the time,
dead links mean that content has been moved, not deleted. With a little effort,
the new location can be found and the link revived.
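
The graveyard idea can be sketched as follows. This is an illustration under stated assumptions, not the planned software: `sweep_links` and `revive` are hypothetical names, and the liveness check is passed in as a function so that the sketch makes no claim about how links are actually tested.

```python
def sweep_links(index, is_alive):
    """Split an index of {url: title} into live entries and a graveyard.

    `is_alive` is any callable that reports whether a URL still resolves;
    how it decides is left to the caller.
    """
    live, graveyard = {}, {}
    for url, title in index.items():
        target = live if is_alive(url) else graveyard
        target[url] = title
    return live, graveyard


def revive(live, graveyard, old_url, new_url):
    """Move an entry out of the graveyard once its new location is found."""
    live[new_url] = graveyard.pop(old_url)


# Invented example: one link has moved to a new address.
index = {
    "http://example.org/plato.html": "Essay on Plato",
    "http://example.org/old/kant.html": "Essay on Kant",
}
live, graveyard = sweep_links(index, lambda url: "/old/" not in url)
revive(live, graveyard,
       "http://example.org/old/kant.html",
       "http://example.org/kant.html")
```

Keeping the dead entries rather than discarding them is what makes the later revival possible: the title and other metadata survive while someone looks for the new address.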

Noesis Tomorrow

The version of Noesis now online is 2.0. Noesis 3.0 will have two key
features that we’ve already proved to work, so it’s not premature to sketch
here how they could enhance research.

I said that in 2.0, users could create a
custom collection to help organize and search a subset of the master collection.
A custom collection could contain texts relevant to a course, a dissertation, or
an essay.

In Noesis 3.0, user control over custom
collections is set free to flourish. The first key feature in 3.0 is that users
can create as many custom collections as they want. That might mean one for each
course, each essay, each research interest. By default, all Noesis collections
are public, so the collections you make for your courses can be used by your
students. Each collection has a unique URL, making it easy to tell your students
where to look.

At first only Noesis-approved editors
will have the authority to add new items to the master collection, that is, to
make the gateway decisions about relevance and worth. Other Noesis users will
only be able to make custom collections from the items in the master collection.

But eventually all users will be able to
make Noesis collections from any content anywhere on the Web. We can give up the
gateway control because Noesis will contain other, more effective forms of peer
review and quality control.

Collection Building

The second key feature in Noesis 3.0 is that
users can "adopt" collections built by other users. If you build a collection on
Plato, and another scholar builds one on Aristotle, then I could start a
collection on Greek philosophy by adopting both of these pre-existing
collections (see Figure 1). You retain control over your Plato collection and
update it whenever and however you like. When you do, my collection subsuming it
is automatically updated.

This allows a team of scholars to divide
the labor of covering a large subject like Greek philosophy. The final
collection on Greek philosophy can be searched as a whole by users who don’t
know and don’t care how it is constituted. The sub-collection on Plato is a bona
fide, separately searchable Noesis collection that might in turn have adopted
smaller sub-collections.
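
One way to picture adoption is as a collection that holds references to other collections rather than copies of them. The sketch below assumes exactly that; the `Collection` class and its methods are hypothetical and are not taken from the Noesis software.

```python
class Collection:
    """A named set of URLs that can also adopt other collections."""

    def __init__(self, name):
        self.name = name
        self.own_items = set()   # URLs added directly by this editor
        self.adopted = []        # references to other editors' collections

    def add(self, url):
        self.own_items.add(url)

    def adopt(self, other):
        self.adopted.append(other)

    def items(self, _seen=None):
        """Return this collection's URLs plus those of everything it adopts.

        `_seen` guards against two collections adopting each other.
        """
        seen = set() if _seen is None else _seen
        if id(self) in seen:
            return set()
        seen.add(id(self))
        result = set(self.own_items)
        for child in self.adopted:
            result |= child.items(seen)
        return result


# Invented example mirroring the Plato and Aristotle case in the text.
plato = Collection("Plato")
plato.add("http://example.org/republic")
aristotle = Collection("Aristotle")
aristotle.add("http://example.org/ethics")

greek = Collection("Greek philosophy")
greek.adopt(plato)
greek.adopt(aristotle)

# The Plato editor updates independently; the adopter sees the change.
plato.add("http://example.org/phaedo")
```

Because `greek` stores a reference to `plato` rather than a snapshot, the later addition shows up in `greek.items()` without any action by the editor of the larger collection.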

The result is that editors of large
collections can make their collections comprehensive and up-to-date without
monitoring the whole field themselves, and can make their collections
authoritative without being experts in every sub-topic. If Noesis 2.0 is about
searching, Noesis 3.0 is about modularity and cooperation in making collections
worth searching.

If the editors of an online journal took
control over the Noesis collection of its articles, then they could decide its
internal structure (for example, sub-collections for research articles, review
articles, letters, sub-collections by year, and so on). They could also put the
Noesis search box for their collection on their journal’s Web page. Other Noesis
users could adopt their collection when building larger disciplinary collections.

Expert Quality Controls

Noesis users can become editors or peers for
the purpose of peer review. When they build a custom collection, they are
endorsing the texts they choose to include. The result can be an online journal,
encyclopedia, "virtual reference shelf" for a course, or full-text bibliography
for an evolving essay.

What’s important to other Noesis users is
not just that you’ve built a collection on a certain topic, but that you’ve done
so with certain standards. If your collections are miscellaneous or heedless of
quality, others will tend not to adopt them. If I trust your judgment about
Descartes or Kant, then I might adopt your Descartes collection into my larger
collection on epistemology. If I decide later that someone else has a better
Descartes collection, I can adopt it too or I can remove yours and add the new
one. Adopting one collection into another (see Figure 1) uses the same
drag-and-drop interface as adding URLs to a collection, shown in Figure 2.

Noesis 3.0 will start with a digital
library of philosophy, inherited from Noesis 2.0 and supplemented by the index
of Hippias (hippias.evansville.edu) and my Guide to Philosophy on the Internet
(www.earlham.edu/~peters/philinks.htm). Tony and I will nurture Noesis libraries
in a few other disciplines, such as ancient history, religion, and law, and
encourage volunteers to use the software to build collections in any discipline.
Noesis collections will be adoptable into other collections regardless of where
they reside on the Internet.

The most scalable way to build large,
long-term, up-to-date, authoritative digital libraries that cover entire
disciplines is to let individual experts build individual Noesis collections on
the topics of their expertise. Other users can yoke these together in any
combination. From this natural Noesis activity will emerge strong collections on
Greek philosophy and epistemology, for example. These can become components for
larger collections that cover all of philosophy. Researchers on a
multidisciplinary topic, such as racism, could build the collections they need,
for example, by dragging together collections on economics, sociology, politics,
and law.

Emergent Peer Review

Finally, Noesis can harness user activity to
create what we call emergent quality control or emergent peer review. If a
certain article has been adopted by three collections, then it has three
endorsements. This is a start, but only a start, because it only counts votes
without weighing them. If one of the collections adopting the article has itself
been adopted by other collections, then the collection editor is not just an
endorser but an endorsed endorser. If the author of works in the Noesis master
collection is also the editor of one or more custom collections, then
endorsements of the author’s works can increase the weight of the author’s
endorsements.

In both of these ways, and in many
others, we can start to weigh votes and find the works most endorsed by the most
endorsed endorsers. There’s no reason not to put some of these parameters under
user control and allow users to turn a "quality knob" to get fewer hits
of higher quality or more hits of mixed quality. Once the structure is in place,
"quality zooming" will be at every researcher’s fingertips, including novice
researchers who need it most.
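
A toy version of weighted endorsement might look like the sketch below. The weighting rule is an assumption chosen for simplicity (a collection's vote counts 1, plus 1 for each collection that has adopted it), and every name here is hypothetical; the article does not specify the actual formula.

```python
def collection_weights(adoptions):
    """Weigh each collection: 1, plus 1 for every collection adopting it.

    `adoptions` maps a collection name to the set of collections it adopts.
    This one-level rule is an illustrative assumption only.
    """
    weights = {name: 1 for name in adoptions}
    for adopted_names in adoptions.values():
        for name in adopted_names:
            weights[name] = weights.get(name, 1) + 1
    return weights


def article_scores(contents, weights):
    """Score each article by summing the weights of collections that include it."""
    scores = {}
    for collection, articles in contents.items():
        for article in articles:
            scores[article] = scores.get(article, 0) + weights.get(collection, 1)
    return scores


def quality_knob(scores, minimum):
    """Return articles scoring at least `minimum`, best first."""
    kept = [article for article, score in scores.items() if score >= minimum]
    return sorted(kept, key=lambda article: (-scores[article], article))


# Invented data: "Greek" adopts the Plato and Aristotle collections.
adoptions = {"Greek": {"Plato", "Aristotle"}, "Plato": set(),
             "Aristotle": set(), "Misc": set()}
contents = {
    "Plato": {"republic-essay"},
    "Aristotle": {"ethics-essay"},
    "Misc": {"republic-essay", "blog-post"},
}
scores = article_scores(contents, collection_weights(adoptions))
strict = quality_knob(scores, 2)   # fewer hits, higher quality
loose = quality_knob(scores, 1)    # more hits, mixed quality
```

Turning the knob up from 1 to 2 drops the article endorsed only by an unadopted collection, which is the "fewer hits of higher quality" behavior described above.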

Noesis can use this information to create
special collections on its own, say, the cream of the crop on Plato, as
determined by collective user activity but not by any individual user. It can
also use it to sort search hits. When you search for Plato, you can sort hits by
relevance, by date, or by Noesis-determined quality.

Noesis collections not only carry several
kinds of built-in quality controls; they also solve the problem of information
overload. When users search Noesis collections on relevant topics rather than
turn to general search engines, they will find only relevant hits and no false
positives. Searching a Greek philosophy collection for "Plato" will return hits
about the philosopher and none about the software or the town in Illinois.

Tony and I are committed to making Noesis
available to ordinary academic users (collection builders and collection
searchers) free of charge. If we ever charge for it, we’ll charge users who want
advanced features or businesses that want a commercial version to manage their
proprietary information. The first purpose of revenue will be to subsidize the
free Noesis services.

Noesis and the Free Online Scholarship Movement

Like most search engines, Noesis can only
link to texts that are freely available on the Web. It can’t see texts behind
passwords accessible only to paying subscribers of a journal or database. Like
most search engines, we don’t see this as a limitation. However, our rationale
differs from that of most other search engines. We don’t aspire to
comprehensive coverage of the general Web. We aspire to be one of the premier
tools for organizing and searching free online content, especially academic
content. We aspire to be such a useful tool that content now in print or now
online behind passwords has one more reason to move into the free online sector
where it can be picked up, organized, and made visible by Noesis.

There is a growing movement to publish
scientific and scholarly literature, especially journal articles and preprints,
on the Internet and to make them available to readers free of charge. The
movement is fueled partly by the exorbitant and rapidly rising costs of print
journals, partly by the unprecedented opportunity for virtually cost-free
worldwide dissemination afforded by the Internet, and partly by the venerable
tradition in which scientists and scholars write journal articles and preprints
without expectation of payment.

Progress occurs in this movement whenever
a journal makes its contents freely available online or a university creates a
free online archive for the research articles by its faculty. Making these
collections free and online solves part of the problem. The rest of the problem
is to find what you need in these separate collections. Each is separately
searchable, of course. But researchers shouldn’t have to run separate searches
at separate archives, let alone learn which ones are likely to contain
literature relevant to their research interests.

Standards and Cross-Archive Searching

One evolving strategy is a cross-archive
search engine. The name explains the technology. As long as the separate
archives conform to a basic standard—in this case, a metadata standard from the
Open Archives Initiative (OAI)—then these special search engines can search all
cooperating archives as if they were a single, grand archive. You needn’t know
which archives exist, where they are, or what they contain. As new ones come
online, they are incorporated seamlessly, scaling up and supporting the division
of labor in maintaining archives with different topical or regional
specializations.
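
The principle can be sketched without any network code. In the sketch below, each archive is assumed to expose records with the same small set of fields (`title`, `creator`, `identifier`, names chosen here for illustration); `harvest` and `cross_search` are hypothetical helpers, not part of any OAI software.

```python
def harvest(archives):
    """Merge records from every archive into one list, tagging each with its source.

    `archives` maps an archive name to a list of records that share
    the same field names (the assumed common metadata standard).
    """
    merged = []
    for archive_name, records in archives.items():
        for record in records:
            merged.append({**record, "archive": archive_name})
    return merged


def cross_search(archives, term):
    """Search all cooperating archives as if they were a single archive."""
    needle = term.lower()
    return [r for r in harvest(archives) if needle in r["title"].lower()]


# Invented archives, for illustration only.
archives = {
    "University X": [
        {"title": "Plato and the Poets", "creator": "A. Author",
         "identifier": "x:1"},
    ],
    "Journal Y": [
        {"title": "Kant on Space", "creator": "B. Writer",
         "identifier": "y:1"},
        {"title": "Reading Plato Today", "creator": "C. Teacher",
         "identifier": "y:2"},
    ],
}
hits = cross_search(archives, "plato")
```

Adding a new archive means adding one more entry to `archives`; the search code does not change, which is the scaling property the shared standard buys.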

Noesis already does this, with Noesis
collections. When a future version of Noesis can read OAI-compliant archives as
if they were Noesis collections, and when OAI-compliant search engines can read
Noesis collections, then Noesis will be a powerful force for accelerating the
free online scholarship movement.

Noesis wants to be a simple but highly
flexible tool for building, maintaining, and searching large collections of
texts. But it wants to serve this function above all for free texts, to support
free online archives and attract new content to them. A print journal may have
many reasons for not migrating to the Web. But if one reason is that it will not
be well indexed or visible to scholars in the field, Noesis will answer that
worry.

If a group of experts wants to create a
new, peer-reviewed journal on the Internet, then making it a Noesis collection
will be by far the fastest and easiest way to do so, and the most convenient for
readers and researchers. In both these ways, we want Noesis to create incentives
to enlarge the body of online scholarship available to readers without charge.