Inexpensive, easy-to-use data collection: Kobo Toolbox as an example.

As promised in my previous post , this post is about using Kobo Toolbox as a data collection tool for archaeology. I also address important issues that have materialized as I design and use forms in Kobo Toolbox.

What is Kobo Toolbox? Kobo Toolbox is an open source suite of tools for the collection and analysis of humanitarian data- especially in remote places or after a disaster. Kobo Toolbox is comprised of three different tools, one for form design, one for data collection and another for data analysis (the latter is rather simple and won’t be discussed here). Kobo Toolbox can be used from their website or installed on a local server. It was designed by a fairly large collective organized by the Harvard Humanitarian Initiative and supported by lots of heavy hitters, including the United Nations, the International Rescue Committee and US government- through USAID. It is very well supported and has a very active and dynamic user and collaborator community.It appears to be sustainable. Similarly, the data collection form is based upon Enketo, another open source, well supported and actively growing tool.

Why Kobo Toolbox? First and most importantly, Kobo Toolbox is device independent (though there is an Android app as well). Instead of operating via an app, it runs as a webform that can be accessed through a browser on any device that can run a browser (browser does need to be updated to handle HTML5; Google Chrome seems to be the best at this point; see this). With the expansion of the use of smartphones throughout the world, this is incredibly significant. Anyone with a device with a browser can collect data. Personally, in trying to devise ways to collect archaeological data digitally, I purchased software and hardware into the $1000s- and I was doing it “on the cheap”. As my two iPads aged substantially between field seasons, I became increasingly frustrated because the expectation was that I was going to need to purchase new iPads in the near future- I was stuck in the technology treadmill. The system was unsustainable. With Kobo Toolbox, an inexpensive smart phone (yes, they do exist) is all one needs. Data collection becomes much less restricted and the opportunities for collaboration with interested communities, especially in remote places, is much, much greater.

Second, web forms can be used OFF-LINE! That’s right, a WEB form that can be used OFF-LINE. This is essential for nearly all situations in archaeology. Even if you have access to the internet (through nearby WiFi or a cellular data connection), it is likely that your connection will be cut at some point- usually the most inconvenient one- leaving you unable to collect data. Not with Kobo Toolbox. You can continue to collect data; it will upload and synchronize once a data connection has been reestablished. How much data is based upon the browser and settings with the browser; here’s details for Google Chrome. Important update: questions are stored in question banks that can now be shared!

Third, Kobo Toolbox includes an intuitive form design tool via the web. Most importantly, this means that the “learning curve” or threshold is very low. I was able to create basic forms in minutes the very first time I tried the tool. However, for more advanced data collection, the web design tool can be nearly as complicated as one desires. Data types are varied and include everything from alphanumeric to GPS location to images and audio. There are some qualifiers here; for example, barcodes can be collected using the Android app, but not the web form. Data can be collected by radio buttons, check boxes, drop down lists and many other ways. Options include skip logic (i.e. show certain questions based upon the response to previous questions) and validation criteria, both of which increase ease of use and the reliability of the data. Form construction can be even more complex if XLSforms (http://xlsform.org/ )(based upon the open standard XForms; https://www.w3.org/MarkUp/Forms/ ) are used. XLSforms can be designed using the ubiquitous Microsoft Excel (LibreOffice Calc (https://www.libreoffice.org/discover/calc/ ) could be used as well and the file saved as .xls). The tool, therefore, is incredibly easy to use from the very beginning, but can be as complex as the user demands.

A portion of the data collection form in Kobo Toolbox

How about a quick example? In the fall of 2015, I taught a course entitled “Historical Ecology of the Lehigh Gap” at Muhlenberg College, which was “clustered” with another course, “Degradation and Restoration” taught by my colleague Kimberly Heiman. As a component of this course, we mapped plant distributions along a transect through the Lehigh Gap Nature Center. The Lehigh Gap is a Superfund site contaminated by heavy metals from a zinc factory. The purpose of the transect was to identify different plants that represented degraded or restored communities. I created the form (see example) in an evening and shared the link with the students.

In the field, students pulled up the link on their smartphones. Although there were some hiccups, the form worked like a charm. We had one phone that could not use the forms- we still do not know why. Some students found that one browser was preferable to others (Chrome seemed to be the best) and we had significant issues collecting photos. Except for a few exceptions, GPS coordinates were within their known error (c. 10 meters). Those points that were not located within the expected 10 meters appear to have been random. That is, there was no apparent pattern that would suggest particular phone brands, individual phones, or users that were less accurate than others. The data was exported (via CSV) and imported into CartoDB, which students used to analyze the spatial distribution of plants. The map below shows one day of collection and only shows one plant, sweet birch, which in this case actually shows the effectiveness of prescribed burns on the northeastern portion of the transect (birch tend to pull heavy metals from the soil and reintroduce them into the ecosystem; controlled burns limit the growth of birch).

A sample of data collected in Kobo Toolbox being displayed in CartoDB.

This was a particularly effective exercise. We were able to collect approximately 75 data points on 10 plants at 10 meters intervals (i.e. a distance of 3/4 kilometers) in approximately 2 hours with 17 students. Data collection required no special tools, but students were able to collect and analyze a rich data set with relatively simple, intuitive tools.

Ok, it may appear that I am trying to “sell” Kobo Toolbox. However, let’s face it, there are some drawbacks to Kobo Toolbox.
First, data entry via a small screen virtually requires that typing be minimized- drop down boxes, radio buttons, etc. are far superior. This means that it is preferable to design these types of data entry into the form, which has the positive effect of standardized data, but also can reduce recording important, narrative data. Really, this is not a drawback of Kobo Toolbox, but of the device. Narrative data would be best collected via audio or text through voice recognition, but neither of these is ideal either.

Second, because Kobo Toolbox is designed around location collected through the device GPS, it is wonderful for survey, but the location aspect is less useful for excavation. If a sub-centimeter RTK GPS was connected to the device (via Bluetooth?) the location aspect of the form would be much more useful for archaeology, but that requires serious expense (or, perhaps not). The device could also be connected via Bluetooth to a total station, etc. for increased locational control. Excavation data is likely better collected via a tablet, rather than a phone.

Third, initially I wanted a tool that could be connected directly to a relational database; at this point, this cannot be easily done with Kobo Toolbox. However, because the format of the Kobo export is a CSV file, the data can be easily synchronized with any database, relational or otherwise with relatively little effort. I am now convinced that this sort of compartmentalization is actually preferable. With such a format, the user can decide how to store, analyze and archive their data. While I now prefer a PostGIS database accessed through a LibreOffice Base, others may prefer to database types or GUIs. Compartmentalization means that no one tool is reliant upon any other, but it does mean that standardized (and preferably open) data formats are required to go between tools. Compartmentalization also means that forms must be designed with the database in mind and vice versa, the database must be designed with the intension that all data will be arriving in CSV files (alternatives include KML and XLS; except images, audio, etc., which must be entered into the database manually).

Fourth, I also initially wanted the ability to modify forms in the field. I was able to do this with my initial Filemaker Pro database, but only because I carried the server into the field. However, I used adjusted forms largely during field testing. Adjusting forms in Kobo Toolbox once they have been deployed is difficult, requires an internet connection and new “projects” must be created with new URLs, etc (side note- you can install Kobo Toolbox on your own server and take it into the field). However, once past the testing phase this may not be important; once data collection has begun in earnest, changing forms is usually a poor idea because it reduces consistency and makes synchronization more difficult.

Although Kobo Toolbox has some important limitations, I now consider these limitations to be beneficial. Open source tools that do one thing extremely well and that use open standards and open file formats are preferable because their output is useful in a wide variety of other tools (and in ways that I cannot even imagine).