Abstract:

With advances in technology, most modern communication takes place through email, which has made communication faster and easier. One notable disadvantage of relying on email as a primary mode of communication is unsolicited advertising: many companies repeatedly send emails containing unwanted information, commonly referred to as spam. Although many approaches have been developed for identifying spam emails, none of them achieves 100% accuracy in spam identification and screening. This paper proposes a method for identifying and screening spam emails using the RapidMiner data mining tool. The data are first pre-processed using several data mining pre-processing techniques; the proposed approach places its major emphasis on pre-processing and on its importance in text mining. After pre-processing, different classification algorithms are applied to the sample dataset, and the results are cross-validated on the basis of different parameters. Finally, the best combination of classifier and pre-processing technique for spam email is identified based on accuracy, precision, recall, execution time, and error rate. The resulting model is used to identify spam emails.
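
As an illustration of the evaluation criteria named above, the following minimal Python sketch computes accuracy, precision, recall, and error rate from true and predicted labels. It is not the RapidMiner workflow used in the paper. The function name `evaluate`, the label strings "spam" and "ham", and the toy label lists are assumptions made only for this example.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Compute basic classification metrics, treating `positive` as the spam class.

    Hypothetical helper for illustration; not part of the paper's RapidMiner process.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # ham passed through

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "error_rate": 1.0 - accuracy,
    }


# Toy labels, invented for this example only.
y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "spam", "spam", "ham"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Execution time, the remaining criterion, could be measured separately, for example by wrapping model training and prediction in calls to `time.perf_counter()`.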