Email #16 was from CardMember Services with the subject "Your Online Statement Is Now Available"
Email #17 was from [email_address] with the subject "Reactivate your PayPal Account"

9.
Little Knowledge of Phishing
• Only about half knew meaning of the term “phishing”
“Something to do with the band Phish, I take it.”

10.
Minimal Knowledge of Lock Icon
“I think that it means secured, it symbolizes
some kind of security, somehow.”
• 85% of participants were aware of lock icon
• Only 40% of those knew that it was supposed
to be in the browser chrome
• Only 35% had noticed https, and many of those did
not know what it meant

11.
Little Attention Paid to URLs
• Only 55% of participants said they had ever noticed
an unexpected or strange-looking URL
• Most did not consider them to be suspicious

12.
Some Knowledge of Scams
• 55% of participants reported being cautious when
email asks for sensitive financial info
– But very few reported being suspicious of email asking
for passwords
• Knowledge of financial phish reduced likelihood of
falling for these scams
– But did not transfer to other scams, such as
amazon.com password phish

13.
Naive Evaluation Strategies
• The most frequent strategies don’t help much in
identifying phish
– This email appears to be for me
– It’s normal to hear from companies you do business with
– Reputable companies will send emails
“I will probably give them the information that they asked for.
And I would assume that I had already given them that
information at some point so I will feel comfortable giving it to
them again.”

14.
Other Findings
• Web security pop-ups are confusing
“Yeah, like the certificate has expired. I don’t actually
know what that means.”
• Don’t know what encryption means
• Summary
– People generally not good at identifying scams they
haven’t specifically seen before
– People don’t use good strategies to protect themselves

16.
Web Site Training Study
• Laboratory study of 28 non-expert computer users
• Two conditions, both asked to evaluate 20 web sites
– Control group evaluated 10 web sites, took 15 minute break
to read email or play solitaire, evaluated 10 more web sites
– Experimental group same as above, but spent 15 minute
break reading web-based training materials
• Experimental group performed significantly better
identifying phish after training
– Less reliance on “professional-looking” designs
– Looking at and understanding URLs
– Web site asks for too much information
People can learn from web-based training materials,
if only we could get them to read them!

17.
How Do We Get People Trained?
• Most people don’t proactively look for training
materials on the web
• Many companies send “security notice” emails
to their employees and/or customers
• But these tend to be ignored
– Too much to read
– People don’t consider them relevant
– People think they already know how to protect themselves

18.
Embedded Training
• Can we “train” people during their normal use of
email to avoid phishing attacks?
– Periodically, people get sent a training email
– Training email looks like a phishing attack
– If person falls for it, intervention warns and highlights
what cues to look for in succinct and engaging format
P. Kumaraguru, Y. Rhee, A. Acquisti, L. Cranor, J. Hong, and E.
Nunge. Protecting People from Phishing: The Design and
Evaluation of an Embedded Training Email System. CyLab
Technical Report. CMU-CyLab-06-017, 2006.
http://www.cylab.cmu.edu/default.aspx?id=2253
[to be presented at CHI 2007]

28.
Anti-Phishing Phil
• A game to teach people not to fall for phish
– Embedded training focuses on email
– Game focuses on web browser, URLs
• Goals
– How to parse URLs
– Where to look for URLs
– Use search engines instead
• Available on our web
site soon
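
A minimal sketch of the URL-parsing skill the game targets: the part of a URL that identifies the site is the host, not familiar brand names appearing elsewhere in the string. The function names and example URLs below are hypothetical, and the "last two labels" rule is a deliberate simplification (it ignores suffixes such as .co.uk).

```python
import ipaddress
from urllib.parse import urlsplit


def host_of(url):
    """Return the host the browser would actually contact."""
    return urlsplit(url).hostname or ""


def naive_site(url):
    """Approximate the owning site as the last two host labels.

    Assumption: a simplification for illustration; real suffix rules
    (e.g. .co.uk) need a public suffix list.
    """
    host = host_of(url)
    try:
        ipaddress.ip_address(host)
        return host  # a raw IP address has no domain labels to trim
    except ValueError:
        pass
    return ".".join(host.split(".")[-2:])


# Brand name used as a subdomain of an unrelated site
print(naive_site("http://www.paypal.com.example.net/signin"))
# Brand name placed in the userinfo part, before the "@"
print(host_of("http://www.ebay.com@192.0.2.7/login"))
```

In both examples the familiar name is a decoy: the first URL belongs to example.net, and the second goes to a bare IP address.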

33.
Some Users Rely on Toolbars
• Dozens of anti-phishing toolbars offered
– Built into security software suites
– Offered by ISPs
– Free downloads
– Built into latest version of popular web browsers
• Previous studies demonstrated usability
problems that need further work
• But how well do they detect phish?

35.
Testbed for Anti-Phishing Toolbars
• Manual evaluation was tedious, slow, error-prone
• Created a testbed that could semi-automatically
evaluate these toolbars
– Just give it a set of URLs to check (labeled as phish or not)
– Checks all the toolbars, aggregates statistics
• How to automate this for different toolbars?
– Different APIs (if any), different browsers
– Image-based approach, take screenshots of web browser
and compare relevant portions to known states
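
The aggregation step might look roughly like the sketch below. It is not the testbed's actual code: each toolbar is represented by a hypothetical callable that returns True when the toolbar flags a URL, standing in for the screenshot-comparison step, and the toolbar names and URLs are made up.

```python
def evaluate(toolbars, labeled_urls):
    """Compute catch rate and false positive rate per toolbar.

    toolbars: dict of name -> callable(url) -> bool (hypothetical stand-in
              for reading the toolbar's state from a screenshot)
    labeled_urls: list of (url, is_phish) pairs
    """
    phish_total = sum(1 for _, is_phish in labeled_urls if is_phish)
    legit_total = len(labeled_urls) - phish_total
    results = {}
    for name, flags_as_phish in toolbars.items():
        caught = sum(1 for url, is_phish in labeled_urls
                     if is_phish and flags_as_phish(url))
        false_pos = sum(1 for url, is_phish in labeled_urls
                        if not is_phish and flags_as_phish(url))
        results[name] = {
            "catch_rate": caught / phish_total if phish_total else 0.0,
            "false_positive_rate": false_pos / legit_total if legit_total else 0.0,
        }
    return results


labeled = [
    ("http://phish1.example/", True),
    ("http://phish2.example/", True),
    ("http://legit1.example/", False),
    ("http://legit2.example/", False),
]
toolbars = {
    "blocklist_only": lambda url: url == "http://phish1.example/",
    "flag_everything": lambda url: True,
}
stats = evaluate(toolbars, labeled)
print(stats)
```

Reporting the two rates side by side matters because a toolbar that flags everything catches every phish but is useless in practice.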

37.
Finding Fresh Phish for Test
• Need a source with lots of fresh phishing URLs
– Can’t use toolbar black lists if we are testing their tools
– Sites get taken down within a few days, need phish
less than one day old
• To observe how fast black lists get updated, the fresher
the better
• Experimented with several sources
– APWG - high volume, but many duplicates and legitimate
URLs included
– Phishtank.com - lower volume but easier to extract phish
– Other phish archives - often low volume or not fresh enough
• Choice of feed impacts results

41.
Results
• The only toolbar with >90% accuracy also has a high false positive rate
• Several catch 70-85% of phish with few false positives
– After 15 minutes of training, users seem to do as well
• Few improvements in catch rates seen over 24 hours
– Suggests most toolbars not taking advantage of
available phish feeds to quickly update black lists
• Combination of heuristics and frequently updated black list
(and white list?) seems to be most promising approach
• Plan to repeat study every quarter
• Should only consider this a rough ordering
– Different sources of phishing URLs lead to different results

43.
Robust Hyperlinks
• Developed by Phelps and Wilensky to solve
“404 not found” problem
• Key idea was to add a lexical signature to URLs that
could be fed to a search engine if URL failed
– Ex. http://abc.com/page.html?sig=“word1+word2+...+word5”
• How to generate signature?
– Found that TF-IDF was fairly effective
• Informal evaluation found five words were sufficient
for most web pages
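
A minimal sketch of signature generation, not Phelps and Wilensky's implementation. It assumes a small reference corpus for document frequencies, a simple lowercase-letters tokenizer, and one common smoothed IDF variant; all function names are hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import quote_plus


def tokenize(text):
    """Split text into lowercase alphabetic words (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight."""
    docs = [set(tokenize(d)) for d in corpus]
    n_docs = len(docs)
    tf = Counter(tokenize(page_text))

    def weight(term):
        df = sum(1 for d in docs if term in d)
        # Assumption: smoothed IDF so terms unseen in the corpus stay finite
        return tf[term] * math.log((1 + n_docs) / (1 + df))

    # Highest weight first; ties broken alphabetically for stable output
    return sorted(tf, key=lambda t: (-weight(t), t))[:k]


def robust_url(url, signature):
    """Append the signature as a sig query parameter."""
    return url + "?sig=" + quote_plus(" ".join(signature))


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the house",
]
page = "the zebra zebra crossing near the zebra enclosure"
sig = lexical_signature(page, corpus, k=3)
print(robust_url("http://abc.com/page.html", sig))
```

Words that appear in every corpus document, such as "the", get zero weight, so the signature is made of terms that are frequent on the page but rare elsewhere.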

45.
Adapting TF-IDF for Anti-Phishing
• Rough algorithm
– Given a web page, calculate TF-IDF for each word on page
– Take five terms with highest TF-IDF weights
– Feed these terms into a search engine (Google)
– If domain name of current web page is in top N search
results, consider it legitimate (N=30 worked well)
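
The final check in the rough algorithm could be sketched as follows, assuming the five highest-weighted terms have already been computed. The search argument is a hypothetical callable returning a ranked list of result URLs; no real search engine API is used, and comparing exact hostnames is a simplification.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def looks_legitimate(page_url, signature_terms, search, n=30):
    """True if the page's domain is among the top n search results.

    search: hypothetical callable(query) -> ranked list of result URLs
    """
    query = " ".join(signature_terms)
    page_domain = domain_of(page_url)
    top_domains = {domain_of(u) for u in search(query)[:n]}
    return bool(page_domain) and page_domain in top_domains


def fake_search(query):
    # Stand-in for a real search engine, returning canned results
    return ["https://www.example.com/login", "https://example.org/help"]


terms = ["account", "login", "example", "secure", "member"]
print(looks_legitimate("https://www.example.com/account", terms, fake_search))
print(looks_legitimate("http://www.example.com.evil.example/account", terms, fake_search))
```

The intuition is that a phishing page copies the text of the real site, so its top terms lead a search engine back to the real domain rather than to the page being checked.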

53.
Email Anti-Phishing Filter
• Philosophy: automate where possible, support
where necessary
• Goal: Create an email filter that detects phishing
emails
– Well explored area for spam
– Can we do better for phishing?