Take Two:
We demonstrated scalability by using all of the DBLP data and a much larger FOAF dataset (one order of magnitude larger than before).
We improved the COI detection algorithm by using more robust measures of collaboration strength and by considering more relationship types (e.g., same affiliation, co-editorship).
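This page does not spell out which collaboration strength measure is used. As an illustration only, the sketch below uses Newman's co-authorship weighting, where each paper co-authored by two people adds 1/(k-1) for a paper with k authors, and combines that score with the other relationship types mentioned above into a coarse COI level. The class name `CoiSketch`, the `CoiLevel` labels, and the thresholds are hypothetical and are not taken from coicode.zip.

```java
import java.util.List;

// Hypothetical sketch; not the COI code distributed with this page.
class CoiSketch {

    // Newman-style collaboration weight: every paper co-authored by a and b
    // adds 1/(k-1), where k is the number of authors on that paper.
    static double collaborationStrength(String a, String b, List<List<String>> papers) {
        double strength = 0.0;
        for (List<String> authors : papers) {
            int k = authors.size();
            if (k > 1 && authors.contains(a) && authors.contains(b)) {
                strength += 1.0 / (k - 1);
            }
        }
        return strength;
    }

    // Hypothetical COI levels; the 1.0 threshold is chosen only for illustration.
    enum CoiLevel { NONE, POTENTIAL, DEFINITE }

    static CoiLevel classify(double strength, boolean sameAffiliation, boolean coEditors) {
        if (strength >= 1.0 || sameAffiliation) {
            return CoiLevel.DEFINITE;
        }
        if (strength > 0.0 || coEditors) {
            return CoiLevel.POTENTIAL;
        }
        return CoiLevel.NONE;
    }

    public static void main(String[] args) {
        List<List<String>> papers = List.of(
                List.of("Alice", "Bob"),           // contributes 1.0
                List.of("Alice", "Bob", "Carol")); // contributes 0.5
        double s = collaborationStrength("Alice", "Bob", papers);
        System.out.println(s + " " + classify(s, false, false)); // prints "1.5 DEFINITE"
    }
}
```

Dividing by k-1 makes a two-author paper count more than a paper with many authors, which is one way to make the measure less sensitive to large author lists.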

Data Sources:

DBLP data is used through SwetoDblp, an RDF version of DBLP that incorporates additional relationships such as affiliation and publisher.
We used the March 2007 version of SwetoDblp (over 500K person entities and over 800K publication entities).

FOAF data comes from the crawled collection of Swoogle.
We filtered the Swoogle data by incrementally expanding from person names that match those in a subset of DBLP.
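A minimal sketch of such an incremental expansion is shown below. It assumes that the seed set consists of FOAF persons whose foaf:name exactly matches a name in the DBLP subset, and that expansion then follows foaf:knows links breadth-first for a fixed number of hops. The exact-match rule, the hop limit, and the map-based graph representation are all assumptions for illustration; the page does not state how the expansion was bounded.

```java
import java.util.*;

// Hypothetical sketch of name-seeded expansion over foaf:knows links.
class FoafExpansionSketch {

    // foafNames: FOAF person URI -> foaf:name value
    // knows:     FOAF person URI -> URIs it links to via foaf:knows
    // dblpNames: person names taken from the chosen DBLP subset
    static Set<String> expand(Map<String, String> foafNames,
                              Map<String, Set<String>> knows,
                              Set<String> dblpNames,
                              int maxHops) {
        Set<String> selected = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        // Seed: FOAF persons whose name matches a DBLP name exactly (assumed rule).
        for (Map.Entry<String, String> e : foafNames.entrySet()) {
            if (dblpNames.contains(e.getValue())) {
                selected.add(e.getKey());
                frontier.add(e.getKey());
            }
        }
        // Expand one hop at a time along foaf:knows, skipping persons already selected.
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Deque<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                for (String friend : knows.getOrDefault(person, Set.of())) {
                    if (selected.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return selected;
    }
}
```

Bounding the number of hops keeps the crawl-derived subset from growing to the whole Swoogle collection, while still pulling in FOAF persons who are connected to DBLP authors.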
The resulting FOAF dataset we used is available.
However, we removed foaf:mbox values so that plain email addresses are not readily available to spammers.
(In the few cases where an email address was used as a URI, we removed part of its domain name.)
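The cleanup described above can be sketched over simple subject/predicate/object triples. The `Triple` record, the restriction to `mailto:` URIs, and the masking rule (keep the local part and the first domain label, drop the rest of the domain) are assumptions made for illustration; the page says only that part of the domain name was removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the foaf:mbox cleanup.
class MboxScrubSketch {

    // Hypothetical triple representation (subject, predicate, object as strings).
    record Triple(String s, String p, String o) {}

    static final String FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox";

    // Hypothetical masking rule: mailto:jdoe@cs.example.org -> mailto:jdoe@cs
    static String maskEmailUri(String uri) {
        if (!uri.startsWith("mailto:")) {
            return uri;
        }
        int at = uri.indexOf('@');
        if (at < 0) {
            return uri;
        }
        int dot = uri.indexOf('.', at); // first dot after the '@'
        return dot < 0 ? uri : uri.substring(0, dot);
    }

    // Drops every foaf:mbox triple and masks email URIs used as subjects or objects.
    static List<Triple> scrub(List<Triple> triples) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if (t.p().equals(FOAF_MBOX)) {
                continue;
            }
            out.add(new Triple(maskEmailUri(t.s()), t.p(), maskEmailUri(t.o())));
        }
        return out;
    }
}
```

Masking the URI rather than dropping it keeps the person node and its other statements intact, so the FOAF graph stays connected.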

Source Code:
The source code is in Java.
We used the Java bindings of BRAHMS to load all the files (about 1 GB).
We claim that this scale can be handled on an average laptop (and we were probably the first to use BRAHMS on OS X).
Earlier prototyping used the main-memory implementation of the SemDis API. Switching to BRAHMS was easy because its Java bindings implement that API.
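The switch was easy because the application code depended on an API rather than on a particular store. The sketch below illustrates that design with a hypothetical `TripleSource` interface, a main-memory implementation, and a client class `PaperLookup`. These names and method signatures are invented for illustration and are not the SemDis API; a BRAHMS-backed class would implement the same interface through BRAHMS's Java bindings (not shown).

```java
import java.util.*;

// Hypothetical interface standing in for the SemDis API; not its real signatures.
interface TripleSource {
    Set<String> objects(String subject, String predicate);
}

// Main-memory backend, similar in role to the in-memory SemDis implementation.
class InMemorySource implements TripleSource {
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void add(String s, String p, String o) {
        index.computeIfAbsent(s, k -> new HashMap<>())
             .computeIfAbsent(p, k -> new HashSet<>())
             .add(o);
    }

    @Override
    public Set<String> objects(String subject, String predicate) {
        return index.getOrDefault(subject, Map.of()).getOrDefault(predicate, Set.of());
    }
}

// Client code written against the interface only, so swapping in another
// backend (e.g. one wrapping a disk-based store) does not change this class.
class PaperLookup {
    private final TripleSource source;

    PaperLookup(TripleSource source) {
        this.source = source;
    }

    Set<String> papersOf(String person, String authorOfPredicate) {
        return source.objects(person, authorOfPredicate);
    }
}
```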
The source code is available below.

Code for COI detection, zipped and organized as an Ant project:
coicode.zip

Code for entity disambiguation will be posted here shortly.

The main-memory implementation of the SemDis API uses Jena's ARP (RDF parser).
Hence, some jar files are required; obtain them from their respective distributions as indicated in the jars list.

Evaluation Datasets:
Our evaluation datasets consist of the accepted papers in several tracks of WWW2006 together with each track's Program Committee members.
We ran our COI detection over these datasets and manually verified a sample of the results to tune our method.
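One way to organize such a run is sketched below. It assumes each track is given as a list of accepted papers (each with its author list) plus a list of PC members, and that some COI test between two people is available, for instance a non-zero collaboration strength. The `Paper` and `Flag` records and the `BiPredicate`-based COI test are hypothetical.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Hypothetical sketch of checking one track's PC members against its accepted papers.
class TrackEvaluationSketch {

    record Paper(String title, List<String> authors) {}

    record Flag(String pcMember, String paperTitle, String author) {}

    // Checks every (PC member, accepted paper) pair and records the first author
    // that triggers a COI, so that a sample of flags can be verified by hand.
    static List<Flag> detect(List<Paper> acceptedPapers,
                             List<String> pcMembers,
                             BiPredicate<String, String> hasCoi) {
        List<Flag> flags = new ArrayList<>();
        for (String pc : pcMembers) {
            for (Paper paper : acceptedPapers) {
                for (String author : paper.authors()) {
                    if (hasCoi.test(pc, author)) {
                        flags.add(new Flag(pc, paper.title(), author));
                        break; // one flag per (PC member, paper) pair is enough
                    }
                }
            }
        }
        return flags;
    }
}
```

Recording which author triggered each flag keeps the manual verification step simple, since the reviewer only has to confirm one relationship per flagged pair.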
Relatively few relationships passed through the FOAF part of the dataset and then back to DBLP entities.
Hence, to verify that COI detection worked properly with FOAF data, we took a sample of 200 foaf:Person entities that have at least one foaf:knows relationship.
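A sketch of drawing such a sample follows. It assumes foaf:knows links are available as a map from each person URI to the URIs it knows, counts only outgoing links when checking "at least one foaf:knows relationship", and uses a fixed random seed for reproducibility. These choices and the class name `FoafSampleSketch` are assumptions for illustration.

```java
import java.util.*;

// Hypothetical sketch of sampling FOAF persons that have foaf:knows links.
class FoafSampleSketch {

    // Returns up to sampleSize persons chosen uniformly at random among those
    // with at least one outgoing foaf:knows link.
    static List<String> sample(Map<String, Set<String>> knows, int sampleSize, long seed) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : knows.entrySet()) {
            if (!e.getValue().isEmpty()) {
                candidates.add(e.getKey());
            }
        }
        Collections.sort(candidates); // fixed order so the same seed gives the same sample
        Collections.shuffle(candidates, new Random(seed));
        return new ArrayList<>(candidates.subList(0, Math.min(sampleSize, candidates.size())));
    }
}
```

For the evaluation described above, the call would be `sample(knows, 200, seed)` for some chosen seed.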
These datasets are available in this list.

Note: All tracks are from the 2006 World Wide Web Conference (WWW2006), one of the conferences that assign Program Committee (PC) members to separate tracks.

The contact person for details, problems, or questions about this page is
Boanerges Aleman-Meza (balemanuga.edu)

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0325464 titled "SemDis: Discovering Complex Relationships in Semantic Web". Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.