Monday, 23 April 2012

This is the
final post in the series exploring the databases containing Shakespearean texts.
From Stoppard I have learned that “there is an art in delay.” In this
series of posts dealing with digital databases of Shakespearean texts I have constantly
postponed revealing the collection of these databases. I did this by first
introducing the topic and then, over four posts, presenting a list of criteria that
I think help to assess digital databases. Originally I thought it would be
enough to post the sixteen questions I found relevant in thinking about
databases, but then I realized that these criteria, formulated as questions
without explanation, would be less useful, so I added a paragraph-long
explanation to each of the questions. Last week, having finished posting
these questions, I had to admit that the delay was no longer justified. So
this time, on the one hand, I should present the list of databases.

On the
other hand this post is not just a post directing attention to databases that
might come in handy
when doing some research on Shakespeare, but also a contribution to another
project, i.e. the celebration of Shakespeare’s 448th birthday. The Happy
Birthday Shakespeare website can be found here. This is not the
first time that a blog post functions as a gift to the long dead and still
living Bard. Last year I wrote a post
for the same project on the given theme: “How did Shakespeare shape my life,
my intellectual life?” That said, it may be clear that if I intend to
take part in this festive event again this year, I cannot retell the same story. Of
course, hermeneutics would remind me that a year later, having changed (hopefully
for the better), the same story would not, could not be the same, yet I think this
year I should do something else. So this year, as I guess Shakespeare would be
interested in what has happened to his texts, I present to him, and to anybody else interested,
the list of databases that contain Shakespeare’s texts.

So this
time, both as a gift and as a conclusion to my previous posts, I am going to list
databases, not unexpectedly in an indirect way, making the experience
interactive. There is a simple way for whoever is interested in this list:
by following the link to my Delicious stack, “Databases of Shakespearean texts,”
one may go to the list directly and check out the items immediately
without reading the rest of this post. To those, however, who would like to stay
here for longer, I shall give some explanation of how these otherwise different
types of databases can be classified. I am quite sure that a lot
of databases have been left out, but as I promised in the
introductory post, I have only dealt with databases that have either an
institutional basis, or scholarly references, or both.

There are seven
ways the individual databases can be classified. Some of the databases, or at
least their text analysis software, can be downloaded, such as WordHoard or WordCruncher;
the rest of the databases can be used via a web browser. Most of the databases
are dedicated to Shakespeare studies, while two of them are rather text
analysis tools demonstrating their power on the Shakespearean corpus, i.e.
WordCruncher and Wolfram|Alpha. Most of the databases are Open Access, but some
are firmly behind a paywall, such as Gale
Catalog: The Shakespeare Collection and XMAS, and one project, The
Shakespeare Electronic Archive, though not behind a paywall, needs a password,
which may or may not be granted. Most of the databases are dedicated to
Shakespeare, while there are two that include texts by Shakespeare and many
others as well: Project Gutenberg and the Internet Archive. Most of the databases
include a text analysis tool, but there are a few that only contain digital
texts, such as Project Gutenberg, the Internet Archive, the Shakespeare
Quartos Archive, Shakespeare in Quarto, etc. Some of the databases deploy
a corpus that is unreliable, or at least somewhat questionable, from a strictly
philological point of view, while others use either digital versions
of reliable early prints (Shakespeare Quartos, Shakespeare in Quarto) or
modern critical editions (Internet Shakespeare Editions, The Shakespeare
Electronic Archive). Most of the databases are device independent, while there
is at least one that has been built only for the iPad: Shakespeare's The Tempest for iPad.

The lines of this classification
create a rather complicated matrix upon which the individual databases can be
located. This complexity is both an advantage and a disadvantage. It is an
advantage as it demonstrates the interest in Shakespeare in the digital space:
scholars use digital technology in studying and thus representing the
Bard’s texts in the 21st century in a great number of ways and modes.
But this variety also demonstrates that enthusiasm for digital scholarship
is dispersed and that funds are scattered, instead of forces and resources being united
to create a database that would be equally useful for a variety
of scholarly approaches and for several levels of interest, from the scholarly to the
general. Do you like this, Will? Anyway,
I wish you a happy birthday in the heavenly theatre with this multifocal
symphony of textual databases.

PS. The
advantage of checking my Delicious stack is that it may well be improved in the
long run. I can imagine, however, that somebody would like to see the list here
as well, so here it is:

Thursday, 12 April 2012

This post
is number five in the series of posts working out a possible methodology for assessing and accounting for
databases containing Shakespearean texts. After an introductory post, four other
ones have been dedicated to listing, explaining and contextualizing questions
that might come in handy when pondering these databases. So far the areas of
basic facts, transparency and flexibility have been covered in the first three
posts, and now, as promised, I am going to present questions
pertaining to what I would like to term “interdisciplinary openness.”

Most of the
databases reduce texts to their linguistic aspect. Queries focus on words,
strings of words, linguistic units, grammatical units and verbal statistics.
They can also visualize tendencies, create diagrams in a variety of formats
about the linguistic construction of the text. All this is fine, as most of the
time when reading a Shakespearean play the reader will be interested in the
ways a text communicates its layers of meaning through verbal means. There has
been, however, a tendency in scholarly circles claiming in a great number of
ways that a text does not only reveal layers of meaning via its linguistic
construction but that meaning is also a social construct embedded in the
material ways a text functions in the world. So, scholars claim that bibliographical data,
from the date of publication to the publisher, from the typeface to the type of
paper, from decoration to page size, play their part in the process of
constituting meaning. Here, a long list of authors, theoretical and pragmatic,
may be presented, from David Scott Kastan to John N. King, from Woudhuysen to
McGann, from Shillingsburg to Hayles, from Marshall McLuhan to Andrew Murphy, to
mention a few authorities in the field. It is beneficial if a database allows
for research other than that pertaining to the linguistic aspect. The next
three questions, thus, explore ways in which a database may cater for interests
in aspects other than the linguistic one.

Format of the digital text (txt, xml, jpg, tiff etc.)

Interdisciplinary
research presupposes the complexity of possible questions to be asked, and this
complexity can only be provided through presenting the texts in a variety of
formats. Sometimes the best choice is to have a rather unmarked list of
words, e.g. in a txt file; this is sufficient and even more fruitful for some
queries, especially when it is not clear how the file is read by a text
analysis tool. For another set of questions encoding is needed, say for
tokenised or lemmatised queries; at other times it is best if there are images
only, which may be analyzed in ways unimaginable before. It is the format of the
file that enables these differing approaches, so it is helpful if the same text is
accessible in a variety of formats.
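
To make the difference between formats more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sample line, the `<w>` element and its `lemma` attribute are hypothetical and are not taken from any of the databases discussed. It shows that an unmarked txt version only supports counting surface forms, whereas an encoded version also supports a lemmatised query.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the same line as plain text and as encoded text.
# The <w> element and the "lemma" attribute are assumptions for illustration.
PLAIN = "To be, or not to be, that is the question"
ENCODED = """<l n="1">
  <w lemma="to">To</w> <w lemma="be">be</w> <w lemma="or">or</w>
  <w lemma="not">not</w> <w lemma="to">to</w> <w lemma="be">be</w>
  <w lemma="that">that</w> <w lemma="be">is</w> <w lemma="the">the</w>
  <w lemma="question">question</w>
</l>"""


def word_counts(text):
    """Count surface word forms in unmarked text, ignoring case."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def lemma_counts(xml_text):
    """Count the lemmas stored in the lemma attribute of each <w> element."""
    root = ET.fromstring(xml_text)
    return Counter(w.get("lemma") for w in root.iter("w"))


plain = word_counts(PLAIN)
lemmas = lemma_counts(ENCODED)
print("surface form 'be':", plain["be"])
print("lemma 'be':", lemmas["be"])
```

In the plain version "is" and "be" are counted separately, while in the encoded version both fall under the lemma "be"; which of the two counts is the useful one depends on the research question.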

Is it the linguistic, digital or bibliographic aspect that is emphasized?

The
linguistic aspect refers to the language, linguistic elements of the digital
text. The bibliographical aspect refers to the material aspect, but in this
case it defines the digital text not as digital, but as a rendering of
the visual aspect of some original printed material. The digital aspect refers
to the computational coding of a text that enables the visual aspect and also
the searchable quality of these texts. It is clear that builders of databases have to
decide on what they intend to achieve. Unfortunately, there is no database
that lays equal emphasis on every aspect of a digital text.
Databases vary in paying special attention either to the text as a linguistic unit,
or to the text as a deeply encoded entity that allows for complex and
intelligent queries, or to aspects that are relevant for the historian of the
book.

Which aspect of the text is open to queries?

If it is
possible to present the text in a variety of formats, a variety of disciplinary
approaches may be accommodated within the database. If this is so, it is also
relevant which aspect of the text is open to queries, as it is the query that
makes computer-enabled research fruitful. It is the query that makes research
faster and more accurate, so it is great if the image file is there that
enables research related to the history of the book, but if this aspect of the
text is not open to queries, computation is like a disabled giant: it is there
but the scholar cannot make use of the power of computer technology. The Text
Encoding Initiative enables marking up a text for queries about the visual
aspect of a work, and there are even free image mark-up tools, so
technologically it is not impossible to prepare a database in which the
bibliographical code is open to queries.
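
As a rough illustration of what a query on the bibliographical code might look like, here is a small Python sketch. The sample is hypothetical: it borrows the TEI element names `pb` (page break), `fw` (forme work) and `hi` with a `rend` attribute, but it omits the TEI namespace and header, and the page labels and `rend` values are my own assumptions rather than the encoding of any real edition.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-style fragment (no namespace, no header).
SAMPLE = """<text>
  <pb n="A2r"/>
  <fw type="catchword">The</fw>
  <l>Enter <hi rend="italic">Prospero</hi> and <hi rend="italic">Miranda</hi>.</l>
  <pb n="A2v"/>
  <l>If by your Art, my <hi rend="blackletter">dearest</hi> father</l>
</text>"""


def spans_by_rendering(xml_text, rend):
    """Return the text of every <hi> span whose rend attribute equals rend."""
    root = ET.fromstring(xml_text)
    return [hi.text for hi in root.iter("hi") if hi.get("rend") == rend]


def page_labels(xml_text):
    """Return the n attribute of every page break, in document order."""
    root = ET.fromstring(xml_text)
    return [pb.get("n") for pb in root.iter("pb")]


print(spans_by_rendering(SAMPLE, "italic"))
print(page_labels(SAMPLE))
```

The point of the sketch is only that, once typographical features are recorded in the mark-up, asking "which words are set in italics?" becomes as easy a query as asking for a word.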

* * *

This time,
thus, we have seen the remaining three criteria for assessing a database. These
questions covered an area that I have labeled “interdisciplinary
openness.” The interdisciplinarity of a database manifests itself in the
variety of file formats and in the types of queries that a user may conduct.
Naturally, these criteria may or may not apply to each and every database and
can only be used as a means of orientation. So neither these three criteria nor
the other thirteen should be thought of as complete and compelling ones, but
rather as means of discussing a database or databases critically. What
follows from this is that a positive assessment does not necessarily mean that one
can give the highest possible score for each and every criterion, as it can
easily happen that a database can be used fruitfully even though reviewing it
with the help of the above sixteen criteria suggests that the database is
weaker. Assessment at its best relies on criteria relevant to the individual
database. Having thus finished the meditation on the criteria of assessment,
next time I shall start a new series of posts exploring databases one by one.

Monday, 2 April 2012

This is the
last but one post in the series “Digital Shakespeares: Features of a Database.” The previous posts presented and explained the
first eight questions of the list that I used when assessing databases containing
Shakespeare’s texts.
The first eight questions explored some basic facts and the documentation of
the database. This time the focus will be on another aspect that I label as
flexibility. This is an important aspect, as it makes a database more usable
if it can be bent to the researchers’ expectations and interests. Before this
larger area of questions there are two extra ones that pertain to the ease of
use of a database.

9. Is the interface clear and logical?

The
question about whether the interface is clear and logical does not invite an
answer in the form of a subjective aesthetic judgement, but rather reflection on
the pragmatic aspect of the interface. What I am interested in here is whether
one can navigate from one action to another without much thinking and without many mistaken steps.
Nevertheless, I am aware that this feature of a database is
a rather subjective one, as something that seems illogical and complicated to
one user may well be straightforward and simple for another. Yet hopefully the
response to this question will not reflect on the interface in isolation, but will
keep an eye on other databases and even other applications, so that
subjectivity can be reduced via experience and comparison.

Is it possible to create a researcher's room?

A
researcher's “room” is a handy feature if the database is an online one. It is
useful if one can stop working whenever necessary without losing the
findings of the current research, and can continue working when it is
possible again. This feature is also important because this may be the cyber-spatial
“room” where one may share results with colleagues and may expect some
reaction from them to her/his work. A researcher's “room” can be a place that
anybody can customize to her/his expectations, working method and needs, and where one can
leave notes and reflections on where one is in the process of research.

Flexibility

The
theoretical problem addressed by the following questions seems to be this.
A database is most of the time built for one type of research, which
is no problem, for how can one foresee what other researchers would like to do
with a particular database? One may well argue that the virtue of a database is
that it does what it promises in the best way, and I agree with this argument.
An equally powerful claim could be, however, that if a database is tuned for
only one type of research, naturally the one that best suits the builder, then
why and how could it be used by other researchers with either slightly or
completely different purposes? So in this Kantian or Pyrrhonian situation,
where there are two equally powerful claims in opposition, I would like to vote
for some sort of flexibility that provides more opportunities than the ones
envisioned by the builders. I can imagine that a database that can be adapted
to a variety of purposes will be the one that will attract researchers’
attention.

Can the digital text be downloaded?

Sometimes
it seems beneficial to be able to download the text that one works with. This
adds to the usability of a database, as it can easily happen that the analytical
tools of a database do not harmonize completely with the needs of a researcher.
It is then beneficial if the text, or texts can be downloaded and fed into
another search engine. This may well be the case with thoroughly cleaned texts
to be used with independent text analysis tools, or with deeply marked-up
texts, when the mark-up is deeper than what the facilities of the database allow
one to explore. In this latter case it is also possible that queries tuned for
specific aspects can be executed outside the database.

Can the results of the query be saved, downloaded?

It may well
be fruitful if the findings can be saved and downloaded to be deployed
elsewhere than within the application. This may be appropriate if results in
one database are to be compared with the findings in another one, or if they are to be
arranged in another way than what is offered by an application. A third
scenario in which saving and downloading is fruitful may be when one intends to
insert, or copy and paste, the results of the query into an article, paper or
blog post. (Only in parentheses do I dare to add here that, as a
Zotero fan, I would find it nice if a database could be linked to Zotero, and then
referencing would be a matter of clicking here and there. I am aware that this
is only the lazy researcher’s dream…)
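
What "saving the results" could mean in practice can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the way any of the databases actually exports data: the function names, the sample line and the column names are hypothetical. The sketch collects keyword-in-context hits and writes them out as CSV, a format that can be opened in a spreadsheet or compared with results from another tool.

```python
import csv
import io
import re


def keyword_in_context(text, keyword, width=20):
    """Find whole-word matches of keyword and keep `width` characters of context."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        hits.append({
            "left": text[max(0, m.start() - width):m.start()],
            "match": m.group(0),
            "right": text[m.end():m.end() + width],
        })
    return hits


def to_csv(hits):
    """Serialise the hits as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["left", "match", "right"])
    writer.writeheader()
    writer.writerows(hits)
    return buffer.getvalue()


# Hypothetical sample text standing in for a query result.
SAMPLE = "To be, or not to be, that is the question"
hits = keyword_in_context(SAMPLE, "be")
csv_text = to_csv(hits)
print(csv_text)
```

The same `csv_text` could just as well be written to a file and then pasted into a paper or loaded into another application.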

Is the source-code open, i.e. can the search tools be modified?

This attribute
is something that is both beneficial and nice. It is beneficial because the
tools may be tuned for the analysis of texts from another database without
starting the building of the search-tool from nothing. Naturally it can happen
that it is easier to start from nothing, but it can happen as well that coding
means just fine-tuning. Open source code is nice too, as it tells the user
that the builder trusts his/her users, shares everything with them, and admits that
the application can be developed and used elsewhere and in other ways than first
envisioned.

To sum up, this
time I pondered the features of a database that I labelled “flexibility.”
The flexibility of a database lies in whether a researcher can or cannot adapt the
texts included in the database and the analytical tools to her/his needs.
Flexibility is important not only because the database will then be one that may
serve a variety of purposes, but also because this way it will attract more
users. Having thus accounted for this feature of a database, what remains for
the next post is the set of attributes that I classify as “interdisciplinary openness.”