The Metadata Offer New Knowledge (MONK) Project is an attempt to leverage emerging text mining, text analysis, and text visualization technologies for use by humanities scholars. Led by PIs John Unsworth and Martin Mueller, the MONK team consists of over 35 researchers at 7 universities in the United States and Canada. The project is organized around five core research areas, or cells, dedicated respectively to data, analytics, users and use cases, collaboration, and interfaces. In this presentation, we will concentrate on several aspects of the work being carried out by the interface cell.

This will be in some senses a project report, describing the current state of our activities, but we also intend to summarize the insights we have gained that we believe may be of benefit to other projects that involve the development of online tools.

In the first part of the presentation, we summarize what we have learned from our evaluations of the next generation of online development tools, many of which are built on Ajax, which in turn rests on JavaScript. We carried out this survey because one of the most crucial steps for the MONK interface cell was determining an appropriate set of technologies for the user interface. We did not take this choice lightly, as we wanted to avoid developing tools that would need to be re-implemented in a different framework a year later. We had also been hamstrung in an earlier project by choosing OpenLaszlo (OL) as the development platform with comparatively little deliberation.

Although excellent in many ways, OL turned out, at least at that time, to provide limited handling of text formatting, since it compiles into a Flash object.

Some of the recurring priorities for the MONK technologies are as follows:

attractive and slick

cross-platform

works fast

capable of animation

capable of scaling for showing many items

incremental loading of content

no download required

able to host or interact with other technologies (Java, Flash, etc.)

decent control over typography

XML or JSON support for data interchange

ability to simulate state

ability to log (for testing, user undo, and possibly collaboration)

To help us with the evaluation, we attempted to carry out test cases at two levels. First, where it was appropriate to do so (with technologies that include display to the end user, for example), we tried three tests: create a tree structure; display text with non-roman characters; and produce a simple 3-D animated visualization with different colors. These tests demonstrated some basic functionality and allowed us to determine whether things that we knew we needed to do would be harder or easier to do in a particular environment.

We also accepted examples that had been done by others, provided that they could satisfy any of the test cases.

As we developed the idea of a MONK Workbench, we narrowed our choices down to two main options:

RAP, an Eclipse-based server-side framework for building rich web applications

a custom, client-side JavaScript framework, based on EXTJS

Although RAP has much to recommend it, we decided that the technology was 1) premature; 2) too difficult to customize; 3) too demanding to develop (essentially in Java). Our primary concern with that choice has to do with the flexibility of the technology for customization: our interface designs rest on a number of details that are not available out of the box. We therefore also experimented with MooTools, which provides more flexibility, but at the cost of requiring more development. In the end, we decided to pursue the custom, client-side JavaScript framework, using EXTJS, which seemed to provide a compromise position.

The goal of the next part of the presentation is to advocate, albeit cautiously, for the development and use of proxies for large, distributed projects. By “proxy,” we mean a set of calls that the interface developers can use to obtain data in a standardized format.

Whether the data is real or not is a secondary issue, since the proxy exists to isolate the interface developers from issues at the back end. In fact, it can almost be guaranteed that at the beginning of the project, much of the data available to the interface designers will be faked at the proxy, while as the project progresses, iteratively more and more of the fake data will be replaced by access to the real thing.

Although by its nature a proxy layer is technical, its purpose is therefore primarily managerial. By means of this device, we have attempted on the MONK project to isolate the working environments of the different cells, in order to avoid the situation where the critical path for one cell leads through another cell, and the dependent group could potentially be left waiting. From the perspective of project management, a proxy layer seems to be a necessary condition for the success of a distributed team. However, there are a number of factors that complicate the situation.

First, a successful proxy needs to meet a number of technical criteria. It should be in a form where the calls for data are quick and easy to implement, since a typically iterative design process will require an ever-increasing number of different kinds of data. Once implemented, the details of the individual calls should be held stable, since the whole point of the proxy is that the interface developers can work within an environment that doesn’t shift too much on them. A good proxy needs, therefore, to be both flexible and rigid. It should also be easily accessible from the back end, so that swapping in real data for fake data is not problematic. Finally, it should be the consistent one-stop shop for all data needs. This is particularly difficult to ensure in cases where a number of proxy-like calls are available from the native back-end technology. There is always a temptation to make exceptions, and require the interface developers to access data directly from a source. The problem with succumbing to that temptation is that when the data source changes, the interface needs to be modified to accommodate the change.
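The criteria above can be made concrete with a small sketch. The following Python fragment is an illustration only, not MONK code: the class `DataProxy`, the call name `collections`, and the sample records are all hypothetical. It shows interface-side code that depends only on a stable call name and data shape, while the source behind that call is swapped from fake data to a stand-in for a real back end.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Tuple

# A handler takes query parameters and returns an XML string.
Handler = Callable[[Dict[str, str]], str]


class DataProxy:
    """Hypothetical single access point for all interface data needs."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, call: str, handler: Handler) -> None:
        # Registering again under the same name swaps the data source
        # without changing what the interface code calls.
        self._handlers[call] = handler

    def get(self, call: str, **params: str) -> str:
        if call not in self._handlers:
            raise KeyError(f"unknown proxy call: {call}")
        return self._handlers[call](params)


def fake_collections(params: Dict[str, str]) -> str:
    # Invented placeholder data, hard-coded for early interface work.
    return (
        "<collections>"
        '<collection id="c1" title="Sample Collection A"/>'
        '<collection id="c2" title="Sample Collection B"/>'
        "</collections>"
    )


def make_backend_collections(rows: Iterable[Tuple[str, str]]) -> Handler:
    # Stand-in for a real back-end query; `rows` is an assumed
    # iterable of (id, title) pairs.
    def handler(params: Dict[str, str]) -> str:
        root = ET.Element("collections")
        for cid, title in rows:
            ET.SubElement(root, "collection", id=cid, title=title)
        return ET.tostring(root, encoding="unicode")

    return handler


def collection_titles(proxy: DataProxy) -> List[str]:
    # Interface-side code: knows the call name and the XML shape, nothing else.
    root = ET.fromstring(proxy.get("collections"))
    return [c.get("title") for c in root.findall("collection")]


proxy = DataProxy()
proxy.register("collections", fake_collections)
early = collection_titles(proxy)  # served from fake data

proxy.register("collections", make_backend_collections([("c9", "Real Collection")]))
later = collection_titles(proxy)  # same call, different source
```

In this sketch `collection_titles` is never edited when the source changes, which is the property the proxy is meant to guarantee. The temptation described above corresponds to interface code bypassing `proxy.get` and reading from the back end directly.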

The MONK proxy consists of a set of URLs that return XML. We decided to use XML in part to make troubleshooting easier for human readers, although for performance reasons we are converting most of the data to JSON format at the interface. Eventually, it may become useful to provide JSON directly from the proxy.
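As a minimal sketch of the kind of XML-to-JSON conversion described here, the following Python fragment turns a small XML response into JSON. It is not the MONK conversion code: the element and attribute names in `sample` are invented for illustration, and the mapping convention (attributes become keys, child elements become lists keyed by tag name, text goes under `text`) is one assumption among several reasonable ones. The sketch also assumes that attribute names never collide with child tag names or with the `text` key.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into a plain dict of attributes, text, and children."""
    node = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["text"] = text
    for child in elem:
        # Children with the same tag are collected into one list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    # ensure_ascii=False keeps non-roman characters readable in the output.
    return json.dumps({root.tag: element_to_dict(root)}, ensure_ascii=False)


# Hypothetical proxy response; the structure is an assumption.
sample = (
    '<results query="example">'
    '<work id="w1"><title>Sample Title</title></work>'
    '<work id="w2"><title>Другой пример</title></work>'
    "</results>"
)
converted = json.loads(xml_to_json(sample))
```

A fixed convention of this kind would also make it straightforward to serve JSON directly from the proxy later, since the interface would already be written against the converted shape.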

The final part of this talk will describe our work to date on the tools, tool sets, and workbench. The purpose of the workbench design is to provide maximum flexibility in the selection and reuse of components with maximum support and encouragement for the novice user. The main canvas space contains tool sets, each of which consists of all the tools necessary to accomplish a particular task (Figure 1). So, for example, the FeatureLens stack will contain all the components of FeatureLens. The Search by Example stack will contain all the components necessary to do supervised classification. The network graphing stack holds all the tools required to perform a social network analysis of a text and visualize the results. Each tool set has a unique and distinctive visual reference to the primary analytic of that tool set. To launch a tool set, the user would either drag it into the socket named “drag n’drop a tool set to start,” or else double-click the tool set.

We recognize in MONK that a great deal of functionality can be provided simply by being able to search and sort across collections, although we also seek to make some fairly sophisticated processes available, such as the D2K supervised classification. We also hope to be able to create visualizations that will help to provide various forms of interactive prospect for both the processes and the results. Our strategy in designing tools is therefore to develop in some cases more than one tool that can perform similar functions. The user will then be able to choose a basic, enhanced, or experimental tool at stages in the process where those options are available. For example, we currently have on our list half a dozen tools that provide various ways to search and sort items, either within or across collections. The simplest of these is a hierarchical tree browser, much like the folder system on the desktop. The most sophisticated include the MONK implementation of the mandala browser, and also a faceted browser that dynamically arranges text tiles according to the available metadata about the documents.
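One way to picture the strategy of offering several tools for the same function is a registry keyed by function and tier. The Python sketch below is purely illustrative: the `Tool` and `ToolRegistry` names are hypothetical, and the tier assigned to each browser is an assumption made for the example rather than a MONK design decision.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Tiers in order of increasing sophistication, as named in the text.
TIERS = ("basic", "enhanced", "experimental")


@dataclass(frozen=True)
class Tool:
    name: str
    function: str  # the task the tool performs, e.g. "browse"
    tier: str


class ToolRegistry:
    def __init__(self) -> None:
        self._by_function: Dict[str, List[Tool]] = {}

    def add(self, tool: Tool) -> None:
        if tool.tier not in TIERS:
            raise ValueError(f"unknown tier: {tool.tier}")
        self._by_function.setdefault(tool.function, []).append(tool)

    def options(self, function: str) -> List[Tool]:
        # All tools for a function, ordered from basic to experimental.
        tools = self._by_function.get(function, [])
        return sorted(tools, key=lambda t: TIERS.index(t.tier))

    def choose(self, function: str, tier: str = "basic") -> Optional[Tool]:
        # Returns None when no tool exists at the requested tier.
        for tool in self.options(function):
            if tool.tier == tier:
                return tool
        return None


registry = ToolRegistry()
# Tier assignments below are assumptions for illustration only.
registry.add(Tool("faceted browser", "browse", "experimental"))
registry.add(Tool("hierarchical tree browser", "browse", "basic"))
registry.add(Tool("mandala browser", "browse", "enhanced"))

names = [t.name for t in registry.options("browse")]
default = registry.choose("browse")
```

A novice user would receive the `basic` tool by default, while a more experienced user could ask for another tier at any stage where one is registered.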

Figure 1. This MONK workbench sketch shows a variety of tools grouped into toolsets, allowing users to approach the workbench from the perspective of a particular kind of task. Some tasks we hope to support include search by example (also known as supervised classification, using either Naïve Bayes or Support Vector Machines), working with timelines, and studying patterns of relationships.

We also wanted to support expert users who may choose to construct new toolsets, as well as modify existing ones. To make that possible, there is a “new toolset” option that would be launched like any other tool set. There is also a complete list of all tools, grouped by type, across the bottom of the screen. The current list of tools includes the following. Whether or not we will be able to include all of these in the initial MONK release remains to be seen:
