Q&A with Katie Mika

October 18, 2017

October 13, 2017

In this Q&A, Connie Rinaldo, Librarian of the Ernst Mayr Library of the Museum of Comparative Zoology, spoke with Katie Mika, the National Digital Stewardship Resident for Foundations to Actions: Extending Innovations in Digital Libraries in Partnership with NDSR Learners.

Katie: I’m investigating tools and programs for BHL to crowdsource enhancements for collections data and metadata. So far, this includes correcting OCR output errors, generating transcriptions for manuscript documents, and donating bibliographic metadata to Wikidata to better expose our collections and support queries and connections across a very large Linked Data knowledge base.

Q: What’s one project you’re working on right now?

Katie: I’m trying to figure out how to improve collections access to support text mining. Part of this involves correcting OCR files and marking up transcribed manuscript documents on an external platform, then determining whether that work improves natural language processing (NLP) and named entity recognition (NER) for extracting occurrence data such as species names, locations, temporal references, and events.

Q: What is an accomplishment that you’re most proud of?

Katie: I’m getting a paper published! [Editor’s note: Katie has also been named by ATG Media at Reed Library as one of their top Up & Comers, nominated as a rising star in the library and information profession.]

Q: What is one big challenge the project is facing?

Katie: Scale. BHL has 53 million pages and is adding more, including potentially hundreds of manuscript collections. Crowdsourcing does not scale up to the degree that is needed, so it’s a little challenging to keep in mind what the data can be used for and to optimize a program accordingly. As machine learning algorithms improve for library processes, our crowdsourced data may prove to be valuable training data in the future.

Q: How do you feel your team/project impacts the Harvard Library community?

Katie: I think BHL brings an interesting perspective to the “Collections as Data” movement in libraries. Many of our users are taxonomists and scientists who want to research and query at a much larger scale than has previously been possible. I hope my work this year can demonstrate how libraries can optimize collections data for these users and be a real partner in open data and open science initiatives.

Q: What’s the most surprising thing you learned during the project?

Katie: I’ve been pleasantly surprised by how important and in-demand librarian skills are becoming for data management practices. It’s been very exciting to contribute to Linked Data discussions and to consider how library collections can interact with other types of information for an even wider range of users.