Protect Sensitive Sites from Phishing Attacks Using Features Extractable from Inaccessible Phishing URLs

Phishing ranks third among cyber-security threats globally and first in China. There were 61.69 million phishing victims in China alone from June 2011 to June 2012, with total annual monetary losses exceeding 4.64 billion US dollars. These attacks were highly concentrated on a few major websites, and many phishing webpages had a very short lifespan. In this paper, the authors assume that the websites to be protected against phishing attacks are known, and study the effectiveness of machine-learning-based phishing detection using only lexical and domain features, which remain available even when the phishing webpages are inaccessible.
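
To illustrate what "lexical features" of a URL can look like, the following is a minimal sketch that derives a few such features from the URL string alone, without fetching the page. The function name `lexical_features`, the `protected_brand` parameter, and the particular features chosen are assumptions made for illustration; they are not the feature set used by the authors. Domain features (for example, registration data) would need an external lookup and are not shown.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str, protected_brand: str) -> dict:
    """Compute illustrative lexical features from a URL string only.

    `protected_brand` is a hypothetical keyword for a known site to
    protect (e.g. "paypal"); the feature list is an assumption, not
    the paper's actual feature set.
    """
    # urlparse only recognizes the host when a scheme is present.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""   # hostname is lower-cased by urlparse
    path = parsed.path or ""
    brand = protected_brand.lower()

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # Simple dotted-quad check; does not validate octet ranges.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "has_at_symbol": "@" in url,
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in path.split("/") if seg]),
        "brand_in_host": brand in host,
        "brand_in_path": brand in path.lower(),
    }


if __name__ == "__main__":
    example = "http://paypal.example-login.com/secure/paypal/update.php?id=123"
    for name, value in lexical_features(example, "paypal").items():
        print(f"{name}: {value}")
```

Because every value is computed from the URL text, a feature vector like this can still be built after the phishing page has gone offline, and could then be passed to any standard classifier.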