Crowdsourcing Our Cultural Heritage


Stuart Dunn reported on the Humanities Crowdsourcing scoping report (PDF) he wrote with Mark Hedges and noted that if we want humanities crowdsourcing to take off, we should move beyond crowdsourcing as a business model and look to form, nurture and connect with communities.

Alice Warley and Andrew Greg presented a useful overview of the design decisions behind the Your Paintings Tagger and sparked some discussion on how many people need to view a painting before it’s ‘completed’, and the differences between structured and unstructured tagging. Interestingly, paintings can be ‘retired’ from the Tagger once enough data has been gathered – I personally think the inherent engagement in tagging is valuable enough to keep paintings taggable forever, even if they’re not prioritised in the tagging interface.

Kate Lindsay brought a depth of experience to her presentation on ‘The Oxford Community Collection Model’ (as seen in Europeana 1914-1918 and RunCoCo’s 2011 report on ‘How to run a community collection online’ (PDF)). Some of the questions brought out the importance of planning for sustainability in technology, licences, etc, and the role of existing networks of volunteers with the expertise to help review objects on the community collection days.

The role of the community in ensuring the quality of crowdsourced contributions was also discussed in Kimberly Kowal’s presentation on the British Library’s Georeferencer project. She reflected on what she’d learnt after the first phase of the project, including that the inherent reward of participating in the activity was a bigger motivator than competitiveness, and on the impact on the British Library itself, which has opened up data for wider digital uses and has more crowdsourcing projects planned.

I gave a paper that built on an earlier version, ‘The gift that gives twice: crowdsourcing as productive engagement with cultural heritage’, but pushed my thinking about crowdsourcing as a tool for deep engagement with museums and other memory organisations even further. I also succumbed to the temptation to play with my own definitions of crowdsourcing in cultural heritage: ‘a form of engagement that contributes towards a shared, significant goal or research question by asking the public to undertake tasks that cannot be done automatically’ or ‘productive public engagement with the mission and work of memory institutions’.

Chris Lintott of Galaxy Zoo fame shared his definition of success for a crowdsourcing/citizen science project: it has to produce results of value to the research community in less time than would have been possible by other means (i.e. it must have achieved something with the crowd that couldn’t have been achieved without them). He discussed how the Ancient Lives project challenged that at first by turning ‘a few thousand papyri they didn’t have time to transcribe into several thousand data points they didn’t have time to read’. While ‘serendipitous discovery is a natural consequence of exposing data to large numbers of users’ (in the words of the Citizen Science Alliance), they wanted a more sophisticated method for recording potential discoveries experts made while engaging with the material, so they built a focused ‘talk’ tool which can programmatically filter out the most interesting unanswered comments and email them to their 30 or 40 expert users. They also have Letters for more structured, journal-style reporting. (I hope I have that right.) He also discussed decisions around full text transcriptions (difficult to automatically reconcile) vs ‘rich metadata’, or more structured indexes of the content of the page, which contain enough information to help historians decide which pages to transcribe in full for themselves.

Some other thoughts that struck me during the day… Humanities crowdsourcing has a lot to learn from the application of maths and logic in citizen science – lots of problems (like validating data) that seem intractable can actually be solved algorithmically, and citizen science’s hypothesis-based approach to testing task and interface design would help humanities projects. Niche projects help solve the problem of putting the right obscure item in front of the right user (which was an issue I wrestled with during my short residency at the Powerhouse Museum last year – in hindsight, building niche projects could have meant a stronger call-to-action and no worries about getting people to navigate to the right range of objects).

The variable role of forums and participants’ relationship to the project owners and each other came up at various points – in some projects, interactions with a central authority are more valued; in others, community interactions are really important. I wonder how much it depends on the length and size of the project?

The potential and dangers of ‘gamification’ and ‘badgeification’, and their potentially negative impact on motivation, were raised. I agree with Lintott that games require a level of polish that could mean you’d invest more in making them than you’d get back in value, but as a form of engagement that can create deeper relationships with cultural heritage and/or validate some procrastination over a cup of tea, I think they potentially have a wider value that balances that.
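To make the point about validating data algorithmically a little more concrete, here is a minimal sketch of one common approach: accept a tag only once enough independent taggers agree on it. This is purely illustrative and not how Your Paintings, Galaxy Zoo or any other project mentioned here actually works; the function name `consensus_tags` and the thresholds `min_taggers` and `min_agreement` are my own assumptions.

```python
from collections import Counter


def consensus_tags(taggings, min_taggers=3, min_agreement=0.6):
    """Return the tags that enough independent taggers agree on.

    taggings: a list with one collection of tag strings per tagger.
    Returns None while fewer than min_taggers people have tagged the item
    (hypothetical threshold), otherwise the set of tags used by at least
    min_agreement of the taggers (also a hypothetical threshold).
    """
    if len(taggings) < min_taggers:
        return None  # not enough data yet to judge the item

    counts = Counter()
    for tags in taggings:
        # Normalise case and whitespace, and count each tag once per tagger
        normalised = {tag.strip().lower() for tag in tags}
        counts.update(normalised)

    total = len(taggings)
    return {tag for tag, n in counts.items() if n / total >= min_agreement}


# Three taggers describe the same painting
taggings = [{"horse", "field"}, {"Horse", "sky"}, {"horse ", "field"}]
agreed = consensus_tags(taggings)
```

With the example above, ‘horse’ (three of three taggers) and ‘field’ (two of three) pass the 60% threshold and ‘sky’ does not. A rule like this is also one way a project could decide when an item has gathered ‘enough’ data, though the right thresholds would need testing against real contributions.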

I was also asked to chair the panel discussion, which featured Kimberly Kowal, Andrew Greg, Alice Warley, Laura Carletti, Stuart Dunn and Tim Causer. Questions during the panel discussion included:

‘what happens if your super-user dies?’ (Super-users or super contributors are the tiny percentage of people who do most of the work, as in this Old Weather post) – discussion included mass media as a numbers game, the idea that someone else will respond to the need/challenge, and asking your community how they’d reach someone like them. (This also helped answer the question ‘how do you find your crowd?’ that came in from Twitter.)

‘have you ever paid anyone?’ Answer: no

‘can you recruit participants through specialist societies?’ From memory, the answer was ‘yes but it does depend’.

something like ‘have you met participants in real life?’ – answer, yes, and it was an opportunity to learn from them, and to align the community, institution, subject and process.

‘badgeification?’. Answer: the quality of the reward matters more than the levels (so badges are probably out).

‘can you tell in advance which communities will make use of a forum?’ – a great question that drew on various discussions of the role of communities of participants in supporting each other and devising new research questions

a question on ‘quality control’ provoked a range of responses, from the manual quality control in Transcribe Bentham to the high number of Taggers initially required for each painting in Your Paintings (which slowed things down), and led into a discussion of shallow vs deep interactions

the final questioner asked about documenting film with crowdsourcing and was answered by someone else in the audience, which seemed a very fitting way to close the day.

James Murray in his Scriptorium with thousands of word references sent in by members of the public for the first Oxford English Dictionary. Early crowdsourcing?

6 thoughts on “Notes from ‘Crowdsourcing in the Arts and Humanities’”

Hi Mia, I’m currently working on my thesis about the use of crowdsourcing in the digitization of an art museum’s collection. Was there an answer to the following question or a suggestion of where to look for an answer? Thank you!

‘what’s happened to tagging in art museums, where’s the new steve.museum or Brooklyn Museum? – is it normalised and not written about as much, or has it declined?’

Lynne
Goucher College MAAA
Baltimore MD USA

Unfortunately we didn’t really come up with an answer, other than assuming that it’s still happening but is no longer considered innovative enough to generate conference papers or social media posts. If you investigate this for your research, I’d love to know what you find out.