A Real-Life Instance of Plagiarism Detection by SCAM

This note describes how SCAM was used in a real-life plagiarism
case. Our involvement started when we saw a message in
DBWORLD, a large mailing list, about a possible case of plagiarism
by an author. We felt this was a good opportunity to test SCAM
in a real-life scenario.

We downloaded abstracts of other papers written by the same author
(henceforth referred to as X), from INSPEC (a database of conference
and journal abstracts) and registered them into SCAM. Since we
did not know the source of the plagiarism (that is, from which
paper X's paper had been copied), we had to "poll" INSPEC
for abstracts in the same field as each of X's abstracts.

We chose several keywords from each of X's abstracts (for instance,
we chose "routing" and "VLSI" in one of the
VLSI CAD papers), and downloaded all the abstracts returned from
the keyword search (usually between about 1,000 and 10,000 abstracts).
We did not want to rely on a single keyword per abstract, since
that could miss overlapping abstracts that did not happen to
contain the chosen keyword. For instance, X may have added a certain
word like "VLSI" while the original abstract may not
have used that word. We then created a union of all downloaded
abstracts (based on several keywords) and then used SCAM to compare
the union of the downloaded abstracts with the registered abstracts.
SCAM returned between 1 and 20 abstracts (at least one because
the abstract of X was already in INSPEC, and clearly had 100%
overlap) for each of X's abstracts. We then manually examined
the flagged abstracts to see whether any of them was closely
related to one of X's abstracts.
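The procedure above — merge the results of several keyword searches into one pool, then flag pool abstracts that overlap heavily with the registered ones — can be sketched roughly as follows. The function and record names are hypothetical, and a plain word-set overlap stands in for SCAM's actual, more sophisticated similarity measure:

```python
def union_of_searches(*search_results):
    """Merge the abstracts returned by several keyword searches
    into one pool, de-duplicating by record id."""
    pool = {}
    for results in search_results:
        pool.update(results)
    return pool


def flag_overlaps(registered, pool, threshold=0.5):
    """Flag pool abstracts whose word overlap with some registered
    abstract meets the threshold, measured as the fraction of the
    registered abstract's distinct words that also appear in the
    candidate. (SCAM's real measure is more sophisticated.)"""
    flagged = []
    for cand_id, cand_text in pool.items():
        cand_words = set(cand_text.lower().split())
        for reg_id, reg_text in registered.items():
            reg_words = set(reg_text.lower().split())
            if reg_words and len(cand_words & reg_words) / len(reg_words) >= threshold:
                flagged.append((cand_id, reg_id))
                break
    return flagged
```

Note that the registered abstract itself, if present in the pool, is always flagged (100% overlap), which matches the behavior described above for X's own abstracts already indexed in INSPEC.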

Overall we manually downloaded about 35,000 abstracts from INSPEC.
The downloading from INSPEC to our local workstation took a few
hours due to the quirks in INSPEC's mailing system (we could not
mail more than about 50 citations at a time). Once the abstracts
were downloaded, it took SCAM about 5 minutes to produce the list
of possible overlaps. What would normally have taken a human
days or months to perform without SCAM was reduced to 15 to
20 minutes of manually examining all the flagged abstracts
(about 50) for the initial set of X's four abstracts.

We were then in constant touch with Prof. Halatsis (University
of Athens, Greece) who sent us more abstracts of papers that X
had either put on his CV or had submitted. SCAM found more instances
of plagiarism, and identified the original source from which each
of X's abstracts had been copied. We also expanded our search
space to the
CS-TR database, a database
of Computer Science Technical Reports from a few universities,
in addition to our INSPEC searches. The entire incident was summarized
in a message posted on several newsgroups
and mailing lists on the Internet.