You can't build a car with just one wheel (why duplication may not be such a bad thing), and some limitations of Internet search/retrieval

In this article I survey different approaches to the indexing of time–based media (sound and video recordings) in response to two articles published in December 2010. Issues of overlap and duplication are discussed as a positive boon: they enable real comparison to be made, and different styles may suit different categories of user. The lack of a synoptic overview of different applications approaching the same topic is noted. Suggestions are made as to why search engines are not picking up on this. This failure leaves a continuing role for human agents, such as subject specialists and librarians, to see connections and make complete comparison lists of what is available for end users.

There’s an old joke among commuters that you spend a long time waiting for a bus then two come along together [1]. So it is with discussions about the qualitative indexing of time–based media (sound and video recordings): two articles have been published on the same topic in late 2010.

The articles are:

1. Erin Jessee, Stacey Zembrzycki, and Steven High, 2010. “Stories Matter: Conceptual challenges in the development of oral history database building software” [29 paragraphs]. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, volume 12, number 1, Article 1, at http://nbn-resolving.de/urn:nbn:de:0114-fqs110119.
NB on the Web site this is dated January 2011 but it was made available late 2010, and announced in the ‘[FQS] Newsletter November 2010’, late in November 2010. The FQS Web site section for the article “how to cite this item” gives the date as 2010.

This article was sparked by the coincidence, and it was first drafted in December 2010. Reading them both has provoked two quite separate conclusions which I will discuss in turn. The first is that concerns about reinventing the wheel can be overstated. There are good reasons for welcoming a multiplicity of approaches to the same problem set (even in a time of cuts and financial restraint). The second is a reflection that indexing by search engines, the use of tagging, semantic ontologies, etc. does not seem to have overcome ghetto–isation or silo–isation. People from different disciplinary backgrounds have arrived at the same problem set but are discussing it in different ways with different vocabularies, and proceed, perhaps without knowing, certainly without acknowledging the alternatives. Web searches do not succeed in making the connection.

Reinventing the wheel: Competition and duplication/functions or tasks

For almost as long as humanities and social science computing has existed there have been debates about whether we would be best served by developing our own tools to answer subject–specific problems or by adapting existing tools and using them creatively, in ways not intended by their designers, to support our work. The two articles here illustrate the two poles. Jessee, et al. (2010) describe the history of the development of a piece of software called ‘Stories Matter’. They are oral history researchers. They collaborated with software designers to help build and test the software so that it ‘worked for them’. An uncharitable reviewer could accuse them of reinventing the wheel (again). To do so would miss the importance of having the humanities researchers leading the process, and also risks missing the scale of the deterrent factor in taking software designed for other purposes and using it in new ways. However, Wainwright and Russell’s (2010) short article promotes just that. They describe the use of NVivo to manage collections of interview sound recordings (i.e., exactly the same sorts of material which Jessee, et al. consider). NVivo is one of the best known commercial software packages for indexing qualitative data, and recent releases enable it to index sound and video based data [2]. Its main use is as an indexing tool for text–based material, and its interface and documentation reflect this. It can perhaps be best described as an implementation of grounded theory research, in the sense that researchers are encouraged to develop a coding system iteratively as they work through the results of their research. It is not surprising to me that an oral historian coming to the software from a background of research training in history might find it off–putting and might find ‘Stories Matter’ more congenial (in part because of the statement that it was developed by historians). Leaving such speculation to one side, what of the ‘reinventing the wheel’ jibe?
Having worked as a researcher and funding board assessor (in one form or other) for quite some time my own view can be summarized as follows:

You cannot compare without having comparators (competitors). This is an obvious point but needs to be made against those who use the phrase ‘reinventing the wheel’ as a way of, in effect, stifling innovation. Even a tricycle needs three wheels so unless we are all to travel on unicycles we should accept (and welcome) more than one attempt to tackle the same problem. Relatedly, Kuhn’s (1962) pioneering The structure of scientific revolutions is not a manual for the administration of science (or other types of research). From the point of view of the practitioners and administrators we cannot know what will succeed and so we need robust debate (competition) in order to clarify the possibilities (see Gilbert and Mulkay, 1984, for a worked example). Without necessarily endorsing the ‘anything goes’ provocations of Paul Feyerabend (1975) we must recognise that mavericks serve a purpose and that occasionally they get the last laugh.

Without hindsight or divine inspiration it is not easy (possible?) to predict which attempt to solve a problem will be best. If such prediction were possible the world would be an easier place. However, there may not be a single ‘best solution’.

Truisms of software design 1. We need to recognise that there are many classes or types of researchers and users of software and that a one–size–fits–all approach may not best suit them all. Different users may prefer different interfaces and those with particular working styles may find one piece of software more congenial to use than another, functionally equivalent package.

Truisms of software design 2. It is not just the software per se but the documentation and training materials associated with it which can determine whether a community (or category) of users adopt a particular piece of software. Having historians involved from the outset (e.g., in the development of ‘Stories Matter’) increases the likelihood that other historians will realise that it could help their research rather than dismissing it sight unseen, for example seeing NVivo as a tool for sociologists relevant only to them.

How large is the field?

There are many different approaches to time–based material starting from several disciplinary bases. There is considerable functional overlap but relatively little communication between the practitioners and as far as I can tell no common reviewing or even listings. The following is a summary (almost certainly incomplete). For clarity I have attempted to identify different fields and to note the overlaps between them (they are not exclusive categories). To serve as entry points into each field I give some examples of each category in the notes.

Transcription tools

Captioning tools

Archiving tools

CAQDAS

Indexing vs. retrieval

Strictly, the task of creating an index is separate from the use of such an index. I think this distinction explains in part why the tools themselves are not indexed together. To be clear: I am concentrating on the creation of indices for time–based media on the basis that once an index is created it can be imported into generic database software such as FileMaker Pro (or Lemur’s less well known Bamboo database, http://lemurconsulting.com/Products/Bamboo/Overview.shtml) to enable search functions. Such indices could also be imported into CAQDAS software. Many of these CAQDAS tools perform both indexing and retrieval tasks (and the Digital Replay System (DRS) unusually has concordancing built in). In the terms of this article an index of a time–based recording (sound and/or video) is a set of keywords (which may or may not be controlled) each of which is time–coded to the recording being indexed.
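To make the definition concrete, the following is a minimal sketch in Python of an index in the sense used here: a set of keywords, each time–coded to a span of one recording. It is illustrative only and does not describe how any of the tools discussed in this article store their data. The names (IndexEntry, find, to_csv), the use of seconds as the time unit, the keywords and the timings are all my own assumptions.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One keyword time-coded to a span of a recording (hypothetical structure)."""
    keyword: str
    start_seconds: float
    end_seconds: float


def find(entries, keyword):
    """Retrieval: return the spans tagged with `keyword`, ordered by start time."""
    matches = [e for e in entries if e.keyword.lower() == keyword.lower()]
    return sorted(matches, key=lambda e: e.start_seconds)


def to_csv(entries):
    """Export the index as CSV text, a format most generic databases can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keyword", "start_seconds", "end_seconds"])
    for e in sorted(entries, key=lambda e: e.start_seconds):
        writer.writerow([e.keyword, e.start_seconds, e.end_seconds])
    return buffer.getvalue()


# Invented example: three index entries for a single interview recording.
index = [
    IndexEntry("migration", 95.0, 140.5),
    IndexEntry("childhood", 12.0, 80.0),
    IndexEntry("migration", 610.0, 655.0),
]

for entry in find(index, "migration"):
    print(entry.start_seconds, entry.end_seconds)
print(to_csv(index))
```

The sketch keeps the two tasks apart: building the list of entries is indexing, while find and the CSV export stand for retrieval, which can equally be handed over to other software.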

On this basis consider the categories of software tools I have just identified:

Transcription tools[3]. Both articles mentioned above start with the important observations that transcription can be incredibly time consuming and that once created the transcript can become the focus of analysis leading to neglect of the recording proper. They point out that digital tools can facilitate the indexing and hence access to key parts of the recording. What they do not point out is that software intended to assist transcription can actually also be used to create indices. We can see an index as an extremely sparse or incomplete transcription.

Captioning tools[4]. These have been developed either to assist with multilingual recordings, or as in the case of MAGpie to provide access to video for deaf or hearing–impaired viewers. The same point made about transcription tools applies here: a time–based index (or set of tags) is an incomplete/reduced/parsimonious/summary set of captions.

Archiving tools[5]. This is an extremely heterogeneous category. I am including in it multimedia annotation tools such as ANVIL and ELAN. There is crossover between these and transcription tools. CHILDES/CLAN, now the TalkBank archives (http://talkbank.org/), as well as Stories Matter, the oral history management software discussed by Jessee, et al. (2010), fit as well here as with transcription or CAQDAS tools.

CAQDAS (Computer Assisted Qualitative Data Analysis) tools[6]. This started with tools to index fieldnotes, interviews and focus group transcripts but now many of the tools handle sound and video recordings. Unlike some of the transcription or captioning tools they include database search and retrieval functionality.

Web searches and the absence of an overview

Different Web searches reveal some but not all of the types of software discussed above. Most importantly, in my opinion, the Google Scholar “find related” option did not enable one to move between different categories, nor the “similar” option appended to some straight Google searches.

The Table lists the results on the first page of searches on Google on Monday 13 December 2010. No overlap is apparent.

[PDF] QSR NVivo8 Distinguishing features and functions File Format: PDF/Adobe Acrobat — Quick View by C Silver — 2009 — Related articles which provides a more general commentary of common CAQDAS functionality. ... items into the same file ordered by code. Output of multimedia content ... On the positive side these tools can be seen as ways to step back and view ... caqdas.soc.surrey.ac.uk/.../NVivo8%20- %20distinguishing%20features%20FINAL.pdf

[PDF] ATLAS.ti 6 Distinguishing features and functions. File Format: PDF/Adobe Acrobat which provides a more general commentary of common CAQDAS functionality. ... caqdas.soc.surrey.ac.uk/.../ATLAS%206%20- %20distinguishing%20features%20FINAL.pdf Show more results from surrey.ac.uk

[PDF] Choosing a CAQDAS Package File Format: PDF/Adobe Acrobat by A Lewins — 2009 — Cited by 33 — Related articles the combination of tools within CAQDAS packages varies, ... handle a range of multimedia files. It is important to understand how the chosen software ... eprints.ncrm.ac.uk/791/1/2009ChoosingaCAQDASPackage.pdf

Curiosity is bliss: Video transcription tool 28 Jul 2004 ... Video transcription tool. If you need to type a transcript for a recorded talk or event (video or audio), you might find this tool useful. ... blog.monstuff.com/archives/000195.html

Speech analysis and transcription software It provides a way to view video or play audio recordings, create a transcript, and link places in the transcript to frames in the video. It provides tools ... liceu.uab.cat/~joaquim/phonetics/fon_anal.../herram_anal_acus.html

Linguistic Data Services: Transcription Tools. The following is a snapshot of the free transcription tools that we’ve been using through ... researchers who want to analyze digital video or audio data. ... linguisticdataservices.com/transcription_tools.html

ECS — Multimedia Annotation and Community FOlksonomy Building (MACFOB) The MACFOB project aims to develop a web–based multimedia annotation tool that will meet the important and pervasive user need of making multimedia web ... www.ecs.soton.ac.uk > ECS > Research > Projects

Multimedia Annotation and Community Folksonomy Building (MACFoB ... 27 Jul 2010 ... The MACFOB project aims to develop a web–based multimedia annotation tool that will meet the important and pervasive user need of making ... www.jisc.ac.uk > ... > Programmes > Users and innovation

It is not always easy to visualise exactly what a CAQDAS package offers when exploring it for ... and searching tools for textual and multimedia data. ... cue.berkeley.edu/qdaarticle.pdf

Transcription Tools for Mac Audio/Video 26 Apr 2006 ... The second is Transcriva, which is an audio only transcription tool. Same as above, but no video. I used this to transcribe the Joe Swanberg ... www.selfreliantfilm.com/?p=126

WebAIM: Software for Creating Captions This software tool is designed to make it easy for multimedia content developers to add captions to their audio and video content. ... webaim.org > Articles > Captions

Mukurtu Wumpurrarni-kari Archive :: An Indigenous Archive Tool An Indigenous Archive Tool. The Mukurtu Wumpurrarni-kari Archive is a browser–based digital archive created by the Warumungu community in Tennant Creek, ... www.mukurtuarchive.org/

Three Video Captioning Tools Educational Technology and Change ... 5 Dec 2008 ... Video captioning tools are similar in many aspects: see the screenshot of a captioning window at DotSUB: dotsub_transcribe ... etcjournal.com/2008/12/05/three-video-captioning-tools/

Terrill Thompson: Free Tools for Captioning YouTube Videos 2 Aug 2009 ... We’d rather just generate a prioritized list of YouTube videos and start captioning. Each of the following tools has that ability: ... terrillthompson.blogspot.com/.../free-tools-for-captioning-youtube.html

The Formosan Language Archive: Development of a. Multimedia Tool to Salvage the Languages and Oral. Traditions of the Indigenous Tribes of Taiwan ... 130.102.44.245/journals/oceanic_linguistics/v042/42.1zeitoun.pdf

Of course different searches give different results. However, I am struck by the absence of overlap despite the search terms being deliberately chosen to minimise variation. A user searching for one category is unlikely to stumble upon the others.
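The check for overlap can be made mechanical. The following Python sketch compares the first–page results of several searches by Web site and reports which sites are shared between each pair of searches. It is an illustration under stated assumptions, not the procedure I followed in December 2010: the function names are mine, and the query labels and example.* addresses are invented placeholders rather than the results listed above.

```python
from itertools import combinations
from urllib.parse import urlparse


def host(url):
    """Return the lower-cased host of a URL, tolerating a missing scheme."""
    parsed = urlparse(url if "//" in url else "//" + url)
    return parsed.netloc.lower()


def pairwise_overlap(results):
    """Map each pair of query labels to the set of hosts found under both."""
    hosts = {query: {host(u) for u in urls} for query, urls in results.items()}
    return {
        (a, b): hosts[a] & hosts[b]
        for a, b in combinations(sorted(hosts), 2)
    }


# Placeholder data only: hypothetical query labels and invented addresses.
results = {
    "caqdas": ["http://a.example/x", "http://b.example/y"],
    "transcription": ["c.example/1", "A.example/other"],
    "captioning": ["d.example/z"],
}

for pair, shared in pairwise_overlap(results).items():
    print(pair, sorted(shared))
```

An empty set for every pair corresponds to the situation described above, in which no site appears in the results of more than one search.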

I also note that the Wainwright and Russell paper is dated Autumn 2010 and was sent to me on paper early in December 2010. On 11 December 2010 it was not visible on the Web in any form. At this point it was not on the publisher’s Web site (Social Research Update, at http://sru.soc.surrey.ac.uk/, whose list only went up to SRU 59 whereas Wainwright and Russell is SRU 60), nor visible via Google (and Google Scholar), Bing or Yahoo, nor on the Durham anthropology Web site (the departmental home of Wainwright and Russell). It is salutary to be able to document a paper–based distribution system which continues to function and is more up–to–date than electronic communication!

Conclusions

My conclusions are straightforward. On the one hand, software designed for one purpose may have uses not anticipated by the designers. This has implications for reviewers and for search engine optimisation. On the other hand it is only by encouraging parallel development that true comparison is enabled. We cannot predict which software package will actually be most congenial to most users. It is possible that historians might like to use DRS rather than Stories Matter. The important thing is that they can find both. The failure of search engines to make the connections leads us back to humans. There is a clear need for staff at centres such as University of Surrey’s CAQDAS centre (http://sru.soc.surrey.ac.uk/), the Centre for e–Research (CeRch at http://www.kcl.ac.uk/iss/cerch/) at King’s College London (KCL) or those active in the Alliance of Digital Humanities Organizations (http://digitalhumanities.org/) to continue their work to identify connections, build bridges and produce synoptic overviews to help the rest of us find the full range of tools available to help our research.

My final point was provoked by a reviewer of this paper (to whom I am truly grateful for making me think more about the wider picture). As actors we live in a messy world constrained by practical factors (first and foremost, the number of hours in the day) and socio–political ones, which come down to desire, morality and economics. As this was written in July 2011 there are proposals in the U.S. Congress to remove funding from the National Endowment for the Humanities (http://www.neh.gov/) which, if successful, would radically constrain developments in the U.S. In times of financial constraint, it is likely that arguments about duplicated wheels will be deployed to justify the withholding of support. I hope this article can point out the rhetorical nature of those arguments, and show that debate, discussion and competition all need multiple parties to succeed.

2. In the Full Disclosure Department, I was consulted by the makers of NVivo early in 2007 when they were planning to include time–based media. I don’t think this is a conflict of interest, but I report it nonetheless. I have had no further contact with them.

You can’t build a car with just one wheel (why duplication may not be such a bad thing), and some limitations of Internet search/retrieval by David Zeitlyn. First Monday, Volume 16, Number 9 - 5 September 2011
http://firstmonday.org/ojs/index.php/fm/article/viewArticle/3332/3044