I’ll be giving a talk at the Semantic Technology Conference in San Francisco, CA, on June 23 from 7:30am to 8:20am (ouch!). The talk title is “Using a Controlled Vocabulary for Managing a Digital Library Platform”. There’s no talk page yet, but the abstract follows. If you’re there, come by and say hello!

(Astute readers will note some similarities between this and my upcoming BibleTech talk. But the audiences are quite different, so the content will be too. This talk will provide “a practical case study on semantically organizing reference material to support search and navigation, using a controlled vocabulary.”)

Abstract

Encyclopedias and other subject-oriented reference books frequently present the same content using different names, and users often look for this information using other names altogether.

The Logos Controlled Vocabulary (LCV) organizes parallel but distinct content in the domain of Biblical studies to integrate reference information and support search, discovery, and knowledge management. The LCV captures:

preferred and alternate terminology

inter-term relationships

term hierarchy

linkage to other semantic information
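To make those four kinds of information concrete, here is a minimal sketch of how one vocabulary entry might be modeled. This is not the actual LCV schema: the class names, field names, sample terms, and the example.org link are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One hypothetical vocabulary entry (field names are assumptions)."""
    preferred: str                                        # preferred terminology
    alternates: set[str] = field(default_factory=set)     # alternate terminology
    related: set[str] = field(default_factory=set)        # inter-term relationships
    broader: Optional[str] = None                         # parent in the term hierarchy
    links: dict[str, str] = field(default_factory=dict)   # linkage to other semantic information


class Vocabulary:
    """Looks up terms by any of their labels, ignoring case."""

    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}
        self._index: dict[str, str] = {}  # case-folded label -> preferred label

    def add(self, term: Term) -> None:
        self._terms[term.preferred] = term
        for label in {term.preferred, *term.alternates}:
            self._index[label.casefold()] = term.preferred

    def resolve(self, label: str) -> Optional[Term]:
        """Return the term for a preferred or alternate label, or None."""
        preferred = self._index.get(label.casefold())
        return self._terms.get(preferred) if preferred is not None else None

    def ancestors(self, label: str) -> list[str]:
        """Walk up the hierarchy from a term, nearest parent first."""
        term = self.resolve(label)
        chain: list[str] = []
        seen: set[str] = set()  # guards against accidental cycles
        while term is not None and term.broader is not None and term.broader not in seen:
            chain.append(term.broader)
            seen.add(term.broader)
            term = self._terms.get(term.broader)
        return chain


# Illustrative data only; these entries are not taken from the LCV.
vocab = Vocabulary()
vocab.add(Term("Family"))
vocab.add(Term(
    "Marriage",
    alternates={"Wedding", "Matrimony"},
    related={"Divorce"},
    broader="Family",
    links={"example": "http://example.org/concept/marriage"},
))

print(vocab.resolve("wedding").preferred)
print(vocab.ancestors("Matrimony"))
```

The point of the sketch is the lookup: a user can search under any alternate name and still land on the single preferred term, which is what lets differently named articles be grouped together.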

The initial version of the LCV (now shipping in the Logos digital library platform) comprises some 11,000 terms, and continues to grow as more reference works are added. It also provides the backbone of http://topics.logos.com, a website for user contributions to terminology and content.

This talk will describe the building of the LCV, how we’re using it now, and how we plan to use and extend it in the future.

I have been interested for a while in some of the ideas behind the LCV (particularly community-sourced lists of passages relating to a topic, and discovering alternative words and phrases used to describe the same content).

However, my main questions are:
1. How much public involvement are you expecting in this project?
2. What terms will the resulting information be licensed under?

If I look at the website now I see “Copyright 2009 Logos Bible Software”. It is unclear whether this applies to the text or just the software, but it seems reasonable to assume it covers the text.

If the majority of the work is to be done by Logos employees, then it would seem reasonable for Logos to retain the copyright. However, if the intention is that it is largely run by the public, I think a Wikipedia-style license allowing everyone to use it and to derive from it makes much more sense. It does seem wasteful to me to have everyone put effort into building a useful dataset just so that Logos can use it. I’m not sure how easily you can get all the contributors to assign copyright for their contributions to you, even if they wanted to (open source projects tend to go to a lot of trouble to keep their copyright clean, and probably scare off potential contributors by doing so).

Jonathan:
These are good questions, and I’m not sure at this early stage what the answers are. I still view the user-contribution part of the LCV as an experiment in progress. Certainly the bulk of the effort to date has been done by Logos employees, and of course we host the data and make it available to others. I don’t think we’ve answered the question of what terms might apply if others wanted to license this data. We’re committed to this project as a key building block of our software, even if others don’t contribute; at the same time, we hope that lots of other folks will want to contribute, since that creates more possibilities.