4 Phishing and Anti-phishing (2)
Phishing attackers use both social engineering and technical subterfuge to steal users' identity data and financial account information.
Social-engineering schemes send "spoofed" e-mails that lead users to counterfeit web sites designed to trick recipients into divulging financial data such as credit card numbers, account usernames, passwords and social security numbers. To persuade recipients to respond, phishers often hijack the brand names of banks, e-retailers and credit card companies.
Technical subterfuge schemes often plant crimeware, such as Trojans and keylogger spyware, on victims' machines to steal users' credentials.
Pharming is a special kind of phishing. Pharming crimeware misdirects users to fraudulent sites or proxy servers, typically through DNS hijacking or poisoning. Because pharming web sites have the same visual features and URLs as the genuine ones, it is harder for an ordinary user to distinguish them from legitimate sites.

5 Approaches to Anti-phishing
According to Zhang et al. [2], past anti-phishing work falls into four categories:
studies to understand why people fall for phishing attacks;
methods of training people not to fall for phishing attacks;
user interfaces that help people make better decisions about which e-mails and web sites to trust;
automated tools to detect phishing.

6 The Naïve Bayesian Classifier
The Naïve Bayesian classifier is considered one of the most effective approaches to learning to classify text documents.
Given a set of classified training samples, an application can learn from these samples and then use the Bayesian classifier to predict the class of an unseen sample.
Assumption: the features x1, x2, x3, …, xn are conditionally independent given the class.
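Under the independence assumption, the class posterior factorizes as P(c | x1, …, xn) ∝ P(c) · P(x1 | c) · … · P(xn | c), so training reduces to counting. The following is a minimal sketch of that idea, assuming binary features and Laplace smoothing; neither choice is stated in the slides, and the function names and toy samples are invented for illustration.

```python
import math
from collections import defaultdict

def train_nb(samples, labels):
    """Estimate P(c) and P(x_i = 1 | c) from binary feature vectors."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, c in zip(samples, labels):
        class_counts[c] += 1
        for i, v in enumerate(x):
            feature_counts[c][i] += v
    total = len(samples)
    priors = {c: n / total for c, n in class_counts.items()}
    # Laplace smoothing (an assumption here) avoids zero probabilities.
    likelihoods = {
        c: [(feature_counts[c][i] + 1) / (class_counts[c] + 2)
            for i in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods

def posterior(x, priors, likelihoods):
    """Return P(c | x) for every class, using the independence assumption."""
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for v, p in zip(x, likelihoods[c]):
            s += math.log(p if v else 1.0 - p)
        log_scores[c] = s
    # Normalize in log space for numerical stability.
    m = max(log_scores.values())
    exp_scores = {c: math.exp(s - m) for c, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {c: e / z for c, e in exp_scores.items()}

# Toy data, invented for illustration only.
samples = [(1, 0), (1, 1), (0, 1), (0, 0)]
labels = ["success", "fail", "fail", "success"]
priors, likelihoods = train_nb(samples, labels)
probs = posterior((1, 0), priors, likelihoods)
```

The posterior is returned as a probability per class, which is the form needed when a decision threshold is applied to the "successful login" class.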

7 Global Black-List vs. Individual White-List
Global black-list:
Many approaches use a black-list to detect phishing sites. They tell the user whether a web site is malicious.
The short lifetime and endless emergence of phishing URLs badly affect the efficiency of black-list approaches.
For example: IE 7 (70%, Zhang et al., NDSS '07).
Individual white-list:
An individual white-list only tells whether a site is legitimate.
A user's favorite web sites that require authentication are usually stable.

8 Individual White-List: What is LUI
LUI: Login User Interface, a user interface where a user inputs his username/password.
We use some stable and necessary features to identify the login page.
Definition 1: LUI = (URL, IPs, InputArea, CertHash, ValueHash)
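Definition 1 maps naturally onto a record type. The sketch below is one possible representation, not the AIWL implementation: the field types, the lookup keyed by URL, and the rule that an observed IP must be one of the stored IPs are all assumptions made for illustration, and every value in the sample entry is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

@dataclass(frozen=True)
class LUI:
    """One white-list entry, following Definition 1."""
    url: str
    ips: FrozenSet[str]   # legitimate IP addresses for this URL
    input_area: str       # assumed: serialized description of the input fields
    cert_hash: str        # assumed: hex digest of the site certificate
    value_hash: str       # assumed: hex digest over the remaining stable values

def is_whitelisted(white_list: Dict[str, LUI], url: str, ip: str,
                   input_area: str, cert_hash: str, value_hash: str) -> bool:
    """Return True only if every observed feature agrees with the stored LUI."""
    entry = white_list.get(url)
    if entry is None:
        return False
    return (ip in entry.ips
            and input_area == entry.input_area
            and cert_hash == entry.cert_hash
            and value_hash == entry.value_hash)

# Hypothetical entry; all values are placeholders.
white_list = {
    "https://login.example.com/": LUI(
        url="https://login.example.com/",
        ips=frozenset({"192.0.2.10", "192.0.2.11"}),
        input_area="username,password",
        cert_hash="aa11",
        value_hash="bb22",
    )
}
```

With a representation like this, a pharming page that shows the genuine URL but is served from an unlisted IP address would fail the check, because the URL alone is not enough to match the entry.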

9 Two Problems in Our Method
Problems:
How to set up the white-list?
How efficient is the white-list?
Solutions:
Use a Naïve Bayesian classifier to set up the individual white-list automatically.
Use the stable and necessary features of the user's favorite web pages as an item in the white-list to identify the legitimate page.

10 AUTOMATED INDIVIDUAL WHITE-LIST APPROACH
Our work consists of two phases: a training phase and a practice phase.
Training Phase: In the training phase, we use a number of login processes as samples. Each login process is represented by the features described in the next slide and labeled as either a successful or a failing login process. AIWL learns from these labeled samples so that the classifier can label other login processes correctly and build up a white-list in the practice phase.
Practice Phase: In the practice phase, AIWL maintains the white-list automatically and uses it to detect legitimate sites.
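The practice phase can be pictured as two small steps: consult the white-list before credentials are submitted, and update it once the classifier judges a login to have succeeded. The sketch below is an assumed control flow, not the authors' code. The function names, the "allow"/"warn" return values, and the use of a plain set as the white-list are hypothetical; the 0.70 default mirrors the threshold reported in the evaluation slides.

```python
from typing import Hashable, Set

def check_before_submit(white_list: Set[Hashable], lui: Hashable) -> str:
    """Decide what to do when the user is about to submit credentials."""
    return "allow" if lui in white_list else "warn"

def update_after_login(white_list: Set[Hashable], lui: Hashable,
                       p_success: float, threshold: float = 0.70) -> bool:
    """Add the LUI when the classifier believes the login succeeded.

    p_success is the classifier's probability of a successful login.
    Returns True if a new entry was added to the white-list.
    """
    if p_success > threshold and lui not in white_list:
        white_list.add(lui)
        return True
    return False

# Illustrative run with placeholder LUI keys.
wl: Set[Hashable] = set()
first_visit = check_before_submit(wl, "lui-A")   # not yet known
added = update_after_login(wl, "lui-A", p_success=0.85)
second_visit = check_before_submit(wl, "lui-A")  # now known
rejected = update_after_login(wl, "lui-B", p_success=0.03)
```

Keeping the check and the update separate means a failed or suspicious login never enlarges the white-list, which is the property the approach relies on.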

11 Training Phase (identify a successful login process)
Features Used in Classification:
Inbrowserhistory
HasNopasswordField
Numberoflink
HasNoUsername
Opertime

14 Evaluation
Training a Naïve Bayesian Classifier
Efficiency in Classifying Login Process
Efficiency of the White-List

15 Training a Naïve Bayesian Classifier
We simulated login processes for 34 web sites. Of these, 18 are phishing web sites selected from PhishTank.com [12] on May 13th; the other 16 are legitimate web sites.
For every legitimate web site, both a successful login process and a failing one were simulated. We simulated the failing login process by deliberately using wrong passwords.

18 The result of classification by AIWL
URL               Login process result   Probability of successful login
163.com           Fail                   3%
126.com                                  7%
Blogbus.com       Success                85%
Shineblog.com
Yahoo.com                                1%
Google.com
Crsky.com                                13%
Whsee.com
Bloglines.com                            71%
Fc2.com                                  93%
Phishing Site 1
Phishing Site 2
Phishing Site 3
Phishing Site 4
Phishing Site 5
Phishing Site 6
Phishing Site 7
Phishing Site 8
Phishing Site 9
Phishing Site 10
We set the threshold for login process classification to 70%: if the probability of a successful login is more than 70%, we consider the login process successful.

19 Efficiency of the White-List
AIWL uses a white-list to detect phishing sites. However, if a legitimate web site frequently modifies the LUI stored in the white-list, or if users often log in to web sites whose LUIs are not stored in the white-list, AIWL will often give a wrong warning during the user's login process.
We therefore measure:
Change rate of IP address
Change rate of InputArea and ValueHash
Number of new LUIs per user per day

20 Change Rate of IP address
Problem:
In our monitoring experiment on 15 popular login sites (aol.com; bebo.com; bay.co.uk; ebay.com; google.com; hi5.com; live.com; match.com; msn.com; myspace.com; passport.net; paypal.com; Yahoo.co.jp; Yahoo.com; Youtube.com), some IP addresses changed between 4/8/2008 and 5/18/2008.
Solutions:
One potential solution is to suggest that webmasters fix the IPs of their authentication servers.
Another is to design a secure protocol for updating the legitimate IPs in the white-list.

21 Change Rate of InputArea and ValueHash
We conducted an experiment to observe the change rate of InputArea and ValueHash for the 11 most popular e-bank web sites in China and the 15 most commonly used login sites described in section 4.3.
The 11 most popular e-bank web sites are: spdb.com.cn, cmbchina.com, gdb.com.cn, com.cn, icbc.com.cn, cn, ccb.com.cn, bank-of-china.com, ecitic.com.
The bank experiment began on 4/8/2008 and ended on 5/18/2008. The 11 web sites were checked every day.
No change was detected.
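The slides do not define "change rate" formally. One plausible reading, given that each site was checked once a day, is the fraction of consecutive daily checks in which the observed value differs from the previous day's. The sketch below implements that assumed definition; the function name and sample values are hypothetical.

```python
from typing import Sequence

def change_rate(snapshots: Sequence[str]) -> float:
    """Fraction of consecutive daily checks whose value differs from the previous day.

    snapshots holds one observed value per day (for example a ValueHash),
    in chronological order. This definition is an assumption, not the paper's.
    """
    if len(snapshots) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(snapshots, snapshots[1:]) if prev != cur)
    return changes / (len(snapshots) - 1)

# Placeholder observations: a stable site and a site that changed twice.
stable = change_rate(["h1", "h1", "h1", "h1"])
changing = change_rate(["h1", "h2", "h2", "h3"])
```

Under this reading, a result of "no change detected" corresponds to a change rate of 0 for every monitored site over the observation window.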

22 Number of new LUIs of user per day
We conducted this experiment to measure the number of new LUIs a user encounters per day. Eight students participated in the experiment, which began on 2/27/2008 and ended on 3/9/2008.

23 DISCUSSION
True Positives and False Positives
Comparison with Other Solutions
Limitations of AIWL

24 True Positives and False Positives
In our experiment, the Naïve Bayesian classifier in AIWL achieved a perfect true positive rate and a 0% false positive rate for identifying a successful login process.
The efficiency of the white-list is also very good. Because the content of the white-list is stable, almost all legitimate sites will not trigger an alert (high true-positive rate), and all phishing sites will theoretically trigger an alert (the false-positive rate is 0, because AIWL uses a white-list).

25 Comparison with Other Solutions
We can provide more functions: LUI authentication and anti-pharming.

26 Limitations of AIWL
The white-list itself is the key point of this approach. If the white-list is compromised, the whole application loses its value.
Wrong warnings will reduce users' willingness to use our approach.

27 Conclusion
This paper proposes a practical anti-phishing approach named Automated Individual White-List (AIWL).
AIWL is effective in detecting phishing and pharming attacks with a low false positive rate.
However, for white-list based methods to reduce the rate of wrong warnings, help from the server side is necessary: standardize the LUI design, and design a protocol to update the legitimate LUI features.