High-Performance Content-Based Phishing Attack Detection

Phishers continually alter the source code of the web pages used in their attacks, both to mimic changes to the legitimate websites of spoofed brands and to evade phishing countermeasures. These manipulations can be as subtle as minor source code changes or as conspicuous as adding or removing significant content. To respond to such changes across phishing campaigns, a set of file matching algorithms is implemented to detect phishing websites based on their content, using a custom data set of 17,684 phishing attacks targeting 159 different brands.
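
To make the idea of content-based file matching concrete, the following is a minimal sketch, not the algorithms evaluated in this work. It assumes two generic techniques chosen purely for illustration: an exact match on a SHA-256 digest, which catches unmodified copies, and a fallback similarity ratio from Python's standard `difflib`, which tolerates small source code edits. The function names, the `known_pages` structure, and the 0.8 threshold are all hypothetical.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def sha256_digest(content: bytes) -> str:
    """Return the hex SHA-256 digest of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def similarity(a: bytes, b: bytes) -> float:
    """Return a similarity ratio in [0, 1] between two pages.

    Illustrative stand-in for a fuzzy file matching algorithm;
    autojunk is disabled so repeated markup is not discarded.
    """
    text_a = a.decode("utf-8", errors="replace")
    text_b = b.decode("utf-8", errors="replace")
    return SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()


def match_brand(
    candidate: bytes,
    known_pages: Dict[str, List[bytes]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the source
) -> Tuple[Optional[str], float]:
    """Return (brand, score) for the best match, or (None, score) if the
    best score falls below the threshold."""
    candidate_digest = sha256_digest(candidate)
    best_brand: Optional[str] = None
    best_score = 0.0
    for brand, pages in known_pages.items():
        for page in pages:
            # Exact copy of a known phishing page: no further work needed.
            if sha256_digest(page) == candidate_digest:
                return brand, 1.0
            score = similarity(candidate, page)
            if score > best_score:
                best_brand, best_score = brand, score
    if best_score >= threshold:
        return best_brand, best_score
    return None, best_score
```

In this sketch, an unmodified copy of a known page is caught by the digest comparison alone, while a copy with a small edit (for example, a renamed form handler) is caught only by the similarity ratio. Comparing a candidate against every known page in this way scales poorly, so it should be read as an illustration of the matching idea rather than of a high-performance design.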