Detection of Spam Messages Using High Level Abstraction Techniques

Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Recent work on P2P overlay networks allows for Decentralized Object Location and Routing (DOLR) across networks based on unique IDs. In this paper, the authors propose an extension to DOLR systems that publishes objects using generic feature vectors instead of content-hashed GUIDs, which enables these systems to locate similar objects rather than only exact copies. They discuss the design of a distributed text similarity engine, named Approximate Text Addressing (ATA), built on top of this extension, which locates objects by their text descriptions. They then outline the design and implementation of a motivating application built on ATA: a decentralized spam-filtering service.
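To illustrate why feature vectors can locate similar objects where a content-hashed GUID cannot, the following minimal sketch contrasts the two. It is not the authors' actual scheme: the shingle-based features, the function names (`content_guid`, `feature_vector`, `shared_features`), and the parameters `shingle_len` and `num_features` are all assumptions chosen for illustration.

```python
import hashlib


def content_guid(text):
    """Content-hashed ID: any change to the text yields an unrelated ID."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def feature_vector(text, shingle_len=5, num_features=4):
    """Hypothetical feature vector: the largest hashes of word shingles.

    Assumption for illustration only; the source does not specify how
    features are computed.
    """
    words = text.lower().split()
    if len(words) < shingle_len:
        shingles = [" ".join(words)]
    else:
        shingles = [
            " ".join(words[i:i + shingle_len])
            for i in range(len(words) - shingle_len + 1)
        ]
    # Hash each shingle to a 64-bit integer.
    hashes = {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    }
    # Keep only the num_features largest hash values as the feature set.
    return set(sorted(hashes, reverse=True)[:num_features])


def shared_features(text_a, text_b):
    """Count the features two texts have in common."""
    return len(feature_vector(text_a) & feature_vector(text_b))


original = ("Congratulations you have been selected to receive a free "
            "vacation package click the link now to claim your prize today")
variant = ("Congratulations you have been selected to receive a free "
           "vacation package click the link now to claim your prize tomorrow")

# The content hashes differ, so an exact-match lookup would miss the variant.
print(content_guid(original) == content_guid(variant))
# Most features survive the edit, so a feature-based lookup can still match.
print(shared_features(original, variant))
```

In a design along these lines, each feature could serve as a lookup key in the overlay, and a query would treat an object as a match when it shares at least some threshold number of features with the query text. The threshold itself is a tuning choice that the sketch leaves open.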