SPADE: a social-spam analytics and detection framework

Abstract

Social media such as Facebook, MySpace, and Twitter have become increasingly important, attracting millions of users. Consequently, spammers are increasingly using such networks to propagate spam. Although existing filtering techniques such as collaborative filters and behavioral analysis filters can significantly reduce spam, each social network needs to build its own independent spam filter and support a spam team to keep spam prevention techniques current. To alleviate those problems, we propose a framework for spam analytics and detection that can be used across all social network sites. Specifically, the proposed framework, SPADE, has numerous benefits, including: (1) new spam detected on one social network can quickly be identified across social networks; (2) the accuracy of spam detection is improved through cross-domain classification and associative classification; (3) other techniques (such as blacklists and message shingling) can be integrated and centralized; (4) new social networks can plug into the system easily, preventing spam at an early stage. SPADE uses a uniform schema model to allow integration across social networks; in this paper, we define the user, message, and web page models. Moreover, we provide an experimental study on real datasets from social networks to demonstrate the flexibility and feasibility of our framework. We extensively evaluated two major classification approaches in SPADE: cross-domain classification and associative classification. In cross-domain classification, SPADE achieved an F-measure of over 0.92 and a detection accuracy of over 91% on the web page model using a Naïve Bayes classifier. In associative classification, SPADE achieved an F-measure of 0.89 on the message model and 0.87 on the user profile model. Both detection accuracies exceed 85%. These results demonstrate that SPADE is a competitive spam detection solution for social media.
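As an illustration of the uniform schema idea, the sketch below maps messages from two differently shaped sources into one shared message model. It is a minimal sketch under stated assumptions, not the schema defined in the paper: the class `UniformMessage`, its fields, the source names `network_a` and `network_b`, and their raw record layouts are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Simple URL matcher used only for illustration.
URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class UniformMessage:
    """Hypothetical uniform message model; field names are assumptions."""
    network: str
    sender_id: str
    text: str
    urls: List[str] = field(default_factory=list)


def from_network_a(raw: Dict[str, Any]) -> UniformMessage:
    # Assumed layout for network_a: {"author": ..., "body": ...}
    text = raw["body"]
    return UniformMessage("network_a", str(raw["author"]), text,
                          URL_PATTERN.findall(text))


def from_network_b(raw: Dict[str, Any]) -> UniformMessage:
    # Assumed layout for network_b: {"user": {"id": ...}, "content": ...}
    text = raw["content"]
    return UniformMessage("network_b", str(raw["user"]["id"]), text,
                          URL_PATTERN.findall(text))


# A new network plugs in by registering one adapter function here.
ADAPTERS: Dict[str, Callable[[Dict[str, Any]], UniformMessage]] = {
    "network_a": from_network_a,
    "network_b": from_network_b,
}


def normalize(network: str, raw: Dict[str, Any]) -> UniformMessage:
    """Convert a network-specific record into the shared message model."""
    return ADAPTERS[network](raw)
```

With every source reduced to the same record type, a single classifier or blacklist check can operate on `UniformMessage` objects regardless of which network produced them, which is the property the framework relies on for cross-network detection.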

Notes

Acknowledgments

This research has been partially funded by the National Science Foundation through the CNS/SAVI (1250260), IUCRC/FRP (1127904), CISE/CNS (1138666), RAPID (1138666), CISE/CRI (0855180), and NetSE (0905493) programs, and by gifts, grants, or contracts from DARPA/I2O, the Singapore Government, Fujitsu Labs, and the Georgia Tech Foundation through the John P. Imlay, Jr. Chair endowment. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the other funding agencies and companies mentioned above.
