Usenix04

On User Choice in Graphical Password Schemes

Darren Davis and Fabian Monrose, Johns Hopkins University, {ddavis,fabian}@cs.jhu.edu
Michael K. Reiter, Carnegie Mellon University, reiter@cmu.edu

Abstract

Graphical password schemes have been proposed as an alternative to text passwords in applications that support graphics and mouse or stylus entry. In this paper we detail what is, to our knowledge, the largest published empirical evaluation of the effects of user choice on the security of graphical password schemes. We show that permitting user selection of passwords in two graphical password schemes, one based directly on an existing commercial product, can yield passwords with entropy far below the theoretical optimum and, in some cases, that are highly correlated with the race or gender of the user. For one scheme, this effect is so dramatic as to render the scheme insecure. A conclusion of our work is that graphical password schemes of the type we study may generally require a different posture toward password selection than text passwords, where selection by the user remains the norm today.

1 Introduction

The ubiquity of graphical interfaces for applications, and input devices such as the mouse, stylus and touch-screen that permit other than typed input, has enabled the emergence of graphical user authentication techniques (e.g., [2, 8, 4, 24, 7, 30]). Graphical authentication techniques are particularly useful when such devices do not permit typewritten input. In addition, they offer the possibility of providing a form of authentication that is strictly stronger than text passwords. History has shown that the distribution of text passwords chosen by human users has entropy far lower than possible [22, 5, 9, 32], and this has remained a significant weakness of user authentication for over thirty years. Given the fact that pictures are generally more easily remembered than words [23, 14], it is conceivable that humans would be able to remember stronger passwords of a graphical nature.

In this paper we study a particular facet of graphical password schemes, namely the strength of graphical passwords chosen by users. We note that not all graphical password schemes prescribe user-chosen passwords (e.g., [24]), though most do (e.g., [2, 8, 3, 4, 7]). However, all of these schemes can be implemented using either system-chosen or user-chosen passwords, just as text passwords can be user-chosen or system-chosen. As with text passwords, there is potentially a tradeoff in graphical passwords between security, which benefits by the system choosing the passwords, and usability and memorability, which benefit by permitting the user to choose the password.

Our evaluation here focuses on one end of this spectrum, namely user-chosen graphical passwords. The graphical password schemes we evaluate are a scheme we call "Face" that is intentionally very closely modeled after the commercial Passfaces™ scheme [3, 24] and one of our own invention (to our knowledge) that we call the "Story" scheme. In the Face scheme, the password is a collection of $k$ faces, each chosen from a distinct set of $n > 1$ faces, yielding $n^k$ possible choices. In the Story scheme, a password is a sequence of $k$ images selected by the user to make a "story", from a single set of $n > k$ images each drawn from a distinct category of image types (cars, landscapes, etc.); this yields $n!/(n-k)!$ choices. Obviously, the password spaces yielded by these schemes are exhaustively searchable by a computer for reasonable values of $k$ and $n$ (we use $k = 4$ and $n = 9$), and so security relies on the authentication server refusing to permit authentication to proceed after sufficiently many incorrect authentication attempts on an account. Nevertheless, an argument given to justify the presumed security of graphical passwords over text passwords in such environments is the lack of a predefined "dictionary" of "likely" choices, as an English dictionary provides for English text passwords, for example (cf. [8, Section 3.3.3]).

For our study we utilize a dataset we collected during the fall semester of 2003, of graphical password usage by three separate computer engineering and computer science classes at two different universities, yielding a total of 154 subjects. Students used graphical passwords (from one of the two schemes above) to access their grades, homework, homework solutions, course reading materials, etc., in a manner that we describe in Section 3.2. At the end of the semester, we asked students to complete an exit survey in which they described why they picked the faces they did (for Face) or their chosen stories (for Story) and provided some demographic information about themselves.

Using this dataset, in this paper we evaluate the Face and Story schemes to estimate the ability of an attacker to guess user-chosen passwords, possibly given knowledge of demographic information about the user. As we will show, our analysis suggests that the faces chosen by users in the Face scheme are highly affected by the race of the user, and that the gender and attractiveness of the faces also bias password choice. As to the latter, both male and female users select female faces far more often than male faces, and then select attractive ones more often than not. In the case of male users, we found this bias so severe that we do not believe it possible to make this scheme secure against an online attack by merely limiting the number of incorrect password guesses permitted. We also quantify the security of the passwords chosen in the Story scheme, which still demonstrates bias though less so, and make recommendations as to the number of incorrect password attempts that can be permitted in this scheme before it becomes insecure. Finally, we benchmark the memorability of Story passwords against those of the Face scheme, and identify a factor of the Story scheme that most likely contributes to its relative security but also impinges on its memorability.

On the whole, we believe that this study brings into question the argument that user-chosen graphical passwords of the type we consider here are likely to offer additional security over text passwords, unless users are somehow trained to choose better passwords, as they must be with text passwords today. Another alternative is to utilize only system-chosen passwords, though we might expect this would sacrifice some degree of memorability; we intend to evaluate this end of the spectrum in future work.

The rest of this paper is structured as follows. We describe related work in Section 2. In Section 3 we describe in more detail the graphical password schemes that we evaluate, and discuss our data sources and experimental setup. In Section 4 we introduce our chosen security measures, and present our results for them. In Section 5 we discuss issues and findings pertinent to the memorability of the two schemes. Finally, we conclude in Section 6.

2 Related Work

This work, and in particular our investigation of the Face scheme, was motivated in part by scientific literature in psychology and perception. Two results documented in the psychological literature that motivated our study are:

• Studies show that people tend to agree about the attractiveness of both adults and children, even across cultures. (Interested readers are referred to [10] for a comprehensive literature review on attractiveness.) In other words, the adage that "beauty is in the eye of the beholder," which suggests that each individual has a different notion of what is attractive, is largely false. For graphical password schemes like Face, this raises the question of what influence general perceptions of beauty (e.g., facial symmetry, youthfulness, averageness) [1, 6] might have on an individual's graphical password choices. In particular, given these a priori perceptions, are users more inclined to choose the most attractive images when constructing their passwords?

• Studies show that individuals are better able to recognize faces of people from their own race than faces of people from other races [31, 20, 11, 29]. The most straightforward account of the own-race effect is that people tend to have more exposure to members of their own racial group relative to other-race contact [31]. As such, they are better able to recognize intra-racial distinctive characteristics, which leads to better recall. This so-called "race-effect" [13, 15] raises the question of whether users would favor members of their own race when selecting images to construct their passwords.
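For reference, the password-space sizes given in the Introduction ($n^k$ for Face and $n!/(n-k)!$ for Story) can be evaluated for the study's parameters $k = 4$ and $n = 9$. The following is a minimal Python sketch; the function names are illustrative and do not come from the paper.

```python
from math import perm


def face_space(n: int, k: int) -> int:
    """Number of Face passwords: one of n faces in each of k rounds."""
    return n ** k


def story_space(n: int, k: int) -> int:
    """Number of Story passwords: ordered choice of k distinct images out of n."""
    return perm(n, k)  # n! / (n - k)!


if __name__ == "__main__":
    n, k = 9, 4  # parameters used in the study
    print(face_space(n, k))   # 6561
    print(story_space(n, k))  # 3024
```

Both spaces are small enough to enumerate instantly, which is why the schemes depend on the server limiting incorrect attempts.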

To the best of our knowledge, there has been no prior study structured to quantify the influence of the various factors that we evaluate here, including those above, on user choice of graphical passwords, particularly with respect to security. However, prior reports on graphical passwords have suggested the possibility of bias, or anecdotally noted apparent bias, in the selection or recognition of passwords. For example, a document [24] published by the corporation that markets Passfaces™ makes reference to the race-effect, though stops short of indicating any effect it might have on password choice. In a study of twenty users of a graphical password system much like the Story scheme, except in which the password is a set of images as opposed to a sequence, several users reported that they did not select photographs of people because they did not feel they could relate personally to the image [4]. The same study also observed two instances in which users selected photographs of people of the same race as themselves, leading to a conjecture that this could play a role in password selection.

The Face scheme we consider here, and minor variants, have been the topic of several user studies focused on evaluating memorability (e.g., [34, 27, 28, 3]). These studies generally support the hypothesis that the Face scheme and variants thereof offer better memorability than text passwords. For instance, in [3], the authors report results of a three-month trial investigation with 34 students that shows that fewer login errors were made when using Passfaces™ (compared to textual passwords), even given significant periods of inactivity between logins.

Other studies, e.g., [34, 4], have explored memorability of other types of graphical passwords. We emphasize, however, that memorability is a secondary consideration for our purposes. Our primary goal is to quantify the effect of user choice on the security of passwords chosen.

3 Graphical Password Schemes

As mentioned earlier, our evaluation is based on two graphical schemes. In the Face scheme, the password is a collection of $k$ faces, each selected from a distinct set of $n > 1$ faces. Each of the $n$ faces is chosen uniformly at random from a set of faces classified as belonging to either a "typical" Asian, black or white male or female, or an Asian, black or white male or female model. This categorization is further discussed in Section 3.1. For our evaluation we choose $k = 4$ and $n = 9$. So, while choosing her password, the user is shown four successive 3 × 3 grids containing randomly chosen images (see Figure 1, for example), and for each, she selects one image from that grid as an element of her password. Images are unique and do not appear more than once for a given user. During the authentication phase, the same sets of images are shown to the user, but with the images randomly permuted.

Figure 1: In the Face scheme, a user's password is a sequence of $k$ faces, each chosen from a distinct set of $n > 1$ faces like the one above. Here, $n = 9$, and images are placed randomly in a 3 × 3 grid.

In the Story scheme, a password is a sequence of $k$ unique images selected by the user to make a "story", from a single set of $n > k$ images, each derived from a distinct category of image types. The images are drawn from categories that depict everyday objects, food, automobiles, animals, children, sports, scenic locations, and male and female models. A sample set of images for the Story scheme is shown in Figure 2.

3.1 Images

As indicated above, the images in each scheme were classified into non-overlapping categories. In Face, there were twelve categories: typical Asian males,
typical Asian females, typical black males, typical black females, typical white males, typical white females, Asian male models, Asian female models, black male models, black female models, white male models and white female models. In the Story scheme, there were nine categories: animals, cars, women, food, children, men, objects, nature, and sports.

Figure 2: In the Story scheme, a user's password is a sequence of $k$ unique images selected from one set of $n$ images, shown above, to depict a "story". Here, $n = 9$, and images are placed randomly in a 3 × 3 grid.

The images used for each category were carefully selected from a number of sources. "Typical male" and "typical female" subjects include faces selected from (i) the Asian face database [26], which contains color frontal face images of 103 people, and (ii) the AR Face database [17], which contains well over 4000 color images corresponding to 126 people. For the AR database we used images in angle 2 only, i.e., frontal images in the smile position. These databases were collected under controlled conditions and are made public primarily for use in evaluating face recognition technologies. For the most part, the subjects in these databases are students, and we believe they provide a good representative population for our study. Additional images for typical male subjects were derived from a random sampling of images from the Sports Illustrated™ NBA gallery.

Images of "female models" were gathered from a myriad of pageant sites including Miss USA™, Miss Universe™, Miss NY Chinese, and fashion modeling sites. Images of "male models" were gathered from various online modeling sources including FordModels.com and StormModels.com.

For the Story scheme, the "men" and "women" categories were the same as the male and female models in our Face experiment. All other images were chosen from PicturesOf.NET and span the previously mentioned categories.

To lessen the effect that an image's intensity, hue, and background color may have on influencing a user's choice, we used the ImageMagick library (see www.imagemagick.org) to set image backgrounds to a light pastel color at reduced intensity. Additionally, images with bright or distracting backgrounds, or of low quality, were deleted. All remaining images were resized to have similar aspect ratios. Of course, it is always possible that differences in such secondary factors influenced the results of our experiment, though we went to significant effort to avoid this and have found little to support a hypothesis of such influence.

3.2 Experiment

For our empirical evaluation we analyze observations collected during the fall semester (roughly the four-month period of late August through early December) of 2003, of graphical password usage by three separate computer engineering and computer science classes at two different universities, yielding a total of 154 subjects. Each student was randomly assigned to one of the two graphical schemes. Each student then used the graphical password scheme for access to published content including his or her grades, homework, homework solutions, course reading materials, etc., via standard Java-enabled browsers. Our system was designed so that instructors would not post documents on the login server, but rather that this server was merely used to encrypt and decrypt documents for posting or retrieval elsewhere. As such, from a student's perspective, the login server provided the means to decrypt documents retrieved from their usual course web pages.

Since there was no requirement for users to change their passwords, most users kept one password for the entire semester. However, a total of 174 passwords were chosen during the semester, implying that a few users changed their password at least once. During the evaluation period there were a total of 2648 login attempts, of which 2271 (85.76%) were successful. Toward the end of the semester, students were asked to complete an exit survey in which they described why they picked the faces they did (for Face) or their chosen stories (for Story) and provided some demographic information about themselves. This information was used to validate some of our findings, which we discuss shortly. Table 1 summarizes the demographic information for our users. A gender or race of any includes those for which the user did not specify their gender or race. Such users account for differences between the sum of numbers of passwords for individual populations and populations permitting a race or gender of any.

Population              Scheme
Gender    Race          Face    Story
any       any            79      95
Male      any            55      77
Female    any            20      13
Male      Asian          24      27
Female    Asian          12       8
Male      Black           3       -
Female    Black           -       -
Male      Hispanic        -       2
Female    Hispanic        -       -
Male      White          27      48
Female    White           8       4

Table 1: Population breakdown (in passwords).

The students participating in this study did so voluntarily and with the knowledge they were participating in a study, as required by the Institutional Review Boards of the participating universities. However, they were not instructed as to the particular factors being studied and, in particular, that the passwords they selected were of primary interest. Nor were they informed of the questions they would be asked at the end of the study. As such, we do not believe that knowledge of our study influenced their password choices. In addition, since personal information such as their individual grades was protected using their passwords, we have reason to believe that they did not choose them intentionally to be easily guessable.

4 Security evaluation

Recall that in both the Face and Story schemes, images are grouped into non-overlapping categories. In our derivations below, we make the simplifying assumption that images in a category are equivalent, that is, the specific images in a category that are available do not significantly influence a user's choice in picking a specific category.

First we introduce some notation. An $\ell$-element tuple $x$ is denoted $x^{(\ell)}$. If $S$ is either the Face or Story scheme, then the expression $x^{(\ell)} \leftarrow S$ denotes the selection of an $\ell$-tuple $x^{(\ell)}$ (a password or password prefix, consisting of image categories) according to $S$, involving both user choices and random algorithm choices.

4.1 Password distribution

In this section we describe how we approximately compute $\Pr[p^{(k)} \leftarrow S]$ for any $p^{(k)}$, i.e., the probability that the scheme yields the password $p^{(k)}$. This probability is taken with respect to both random choices by the password selection algorithm and user choices.

We compute this probability inductively as follows. Suppose $p^{(\ell+1)} = q^{(\ell)} r^{(1)}$. Then

$$\Pr[p^{(\ell+1)} \leftarrow S] = \Pr[q^{(\ell)} \leftarrow S] \cdot \Pr[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S] \tag{1}$$

if $p^{(\ell+1)}$ is valid for $S$ and zero otherwise, where $\Pr[q^{(0)} \leftarrow S] \stackrel{\text{def}}{=} 1$. Here, $p^{(\ell+1)}$ is valid iff $\ell < k$ and, for the Story scheme, $p^{(\ell+1)}$ does not contain any category more than once. The second factor $\Pr[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S]$ should be understood to mean the probability that the user selects $r^{(1)}$ after having already selected $q^{(\ell)}$ according to scheme $S$. If the dataset contains sufficiently many observations, then this can be approximated by

$$\Pr[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S] \approx \frac{\#[q^{(\ell)} r^{(1)} \leftarrow S]}{\#[q^{(\ell)} \leftarrow S]} \tag{2}$$

i.e., by maximum likelihood estimation, where $\#[x^{(\ell)} \leftarrow S]$ denotes the number of occurrences of $x^{(\ell)} \leftarrow S$ in our dataset.
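The chain rule (1) combined with the count ratio (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: passwords are represented as tuples of category names, the helper names `prefix_counts` and `mle_probability` are hypothetical, the empty prefix is counted once per password, and the validity check for Story (no repeated category) is left out.

```python
from collections import Counter
from typing import Sequence, Tuple

Password = Tuple[str, ...]


def prefix_counts(passwords: Sequence[Password]) -> Counter:
    """Count every prefix of every observed password, including the empty prefix."""
    counts: Counter = Counter()
    for pw in passwords:
        for length in range(len(pw) + 1):
            counts[pw[:length]] += 1
    return counts


def mle_probability(p: Password, counts: Counter) -> float:
    """Estimate Pr[p <- S] with the chain rule (1), using the ratio (2) at each step."""
    prob = 1.0  # the probability of the empty prefix is defined to be 1
    for length in range(len(p)):
        denom = counts[p[:length]]
        if denom == 0:
            return 0.0  # prefix never observed, so the ratio (2) is undefined
        prob *= counts[p[:length + 1]] / denom
    return prob


if __name__ == "__main__":
    # Hypothetical toy dataset of Story-style passwords (category sequences).
    data = [
        ("women", "cars", "food", "sports"),
        ("women", "cars", "food", "nature"),
        ("animals", "food", "men", "cars"),
        ("women", "animals", "food", "cars"),
    ]
    counts = prefix_counts(data)
    print(mle_probability(("women", "cars", "food", "sports"), counts))
```

Because the ratios telescope, this estimate is simply the empirical frequency of the full password, and any password absent from the dataset receives probability zero. That is why using (2) directly would need far more samples than a study of this size can supply.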

$\#[x^{(0)} \leftarrow S]$ is defined to be the number of passwords for scheme $S$ in our dataset.

A necessary condition for the denominator of (2) to be nonzero for every possible $q^{(k-1)}$ is that the dataset contain $N^{k-1}$ samples for scheme $S$, where $N \geq n$ denotes the number of image categories for $S$. ($N = 12$ in Face, and $N = 9$ in Story.) $N^{k-1}$ is over 1700 in the Face scheme, for example. And, of course, to use (2) directly to perform a meaningful approximation, significantly more samples would be required. Thus, we introduce a simplifying Markov assumption: a user's next decision is influenced only by her immediately prior decision(s) (e.g., see [16]). In other words, rather than condition on all of the previous choices made in a password ($q^{(\ell)}$), only the last few choices are taken into account. Let $\ldots x^{(\ell)} \leftarrow S$ denote the selection of an $\ell'$-tuple, $\ell' \geq \ell$, for which the most recent $\ell$ selections are $x^{(\ell)}$.

Assumption 4.1 There exists a constant $\hat{\ell} \geq 0$ such that if $\ell \geq \hat{\ell}$ then

$$\Pr[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S] \approx \Pr[\ldots s^{(\hat{\ell})} r^{(1)} \leftarrow S \mid \ldots s^{(\hat{\ell})} \leftarrow S] \tag{3}$$

where $s^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of $q^{(\ell)}$. We denote probabilities under this assumption by $\Pr_{\hat{\ell}}[\cdot]$.

In other words, we assume that if $\ell \geq \hat{\ell}$, then the user's next selection $r^{(1)}$ is influenced only by her last $\hat{\ell}$ choices. This appears to be a reasonable assumption, which is anecdotally supported by certain survey answers, such as the following from a user of the Face scheme.

    "To start, I chose a face that stood out from the group, and then I picked the closest face that seemed to match."

While this user's intention may have been to choose a selection similar to the first image she selected, we conjecture that the most recent image she selected, being most freshly on her mind, influenced her next choice at least as much as the first one did. Assumption 4.1 also seems reasonable for the Story scheme on the whole, since users who selected passwords by choosing a story were presumably trying to continue a story based on what they previously selected.

Assumption 4.1 permits us to replace (2) by

$$\Pr_{\hat{\ell}}[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S] \approx \frac{\#[\ldots s^{(\hat{\ell})} r^{(1)} \leftarrow S]}{\#[\ldots s^{(\hat{\ell})} \leftarrow S]} \tag{4}$$

where $s^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of $q^{(\ell)}$ and we define $\#[\ldots s^{(0)} \leftarrow S]$ to be the total number of category choices ($k$ times the number of passwords) in our dataset for scheme $S$. Here, the necessary condition for the denominator of (4) to be nonzero for each $s^{(\hat{\ell})}$ is that the dataset for $S$ contain $N^{\hat{\ell}}$ samples, e.g., in the Face scheme, twelve for $\hat{\ell} = 1$, and so on.

We further augment the above approach with smoothing in order to compensate for gaps in the data (cf. [16]). Specifically, we replace (4) with

$$\Pr_{\hat{\ell}}[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S] \approx \frac{\#[\ldots s^{(\hat{\ell})} r^{(1)} \leftarrow S] + \lambda_{\hat{\ell}} \cdot \Psi_{\hat{\ell}-1}}{\#[\ldots s^{(\hat{\ell})} \leftarrow S] + \lambda_{\hat{\ell}}} \tag{5}$$

where $s^{(\hat{\ell})}$ is the $\hat{\ell}$-length suffix of $q^{(\ell)}$; $\lambda_{\hat{\ell}} > 0$ is a real-valued parameter; and where if $\hat{\ell} > 0$ then

$$\Psi_{\hat{\ell}-1} = \Pr_{\hat{\ell}-1}[q^{(\ell)} r^{(1)} \leftarrow S \mid q^{(\ell)} \leftarrow S]$$

and $\Psi_{\hat{\ell}-1} = 1/N$ otherwise. Note that as $\lambda_{\hat{\ell}}$ is reduced toward 0, (5) converges toward (4). And, as $\lambda_{\hat{\ell}}$ is increased, (5) converges toward $\Psi_{\hat{\ell}-1}$, i.e., a probability under Assumption 4.1 for $\hat{\ell} - 1$, a stronger assumption. So, with sufficient data, we can use a small $\lambda_{\hat{\ell}}$ and thus a weaker assumption. Otherwise, using a small $\lambda_{\hat{\ell}}$ risks relying too heavily on a small number of occurrences of $\ldots s^{(\hat{\ell})} \leftarrow S$, and so we use a large $\lambda_{\hat{\ell}}$ and thus the stronger assumption.

4.2 Measures

We are primarily concerned with measuring the ability of an attacker to guess the password of a user. Given accurate values for $\Pr[p^{(k)} \leftarrow S]$ for each $p^{(k)}$, a measure that indicates this ability is the "guessing entropy" [18] of passwords. Informally, guessing entropy measures the expected number of guesses an attacker with perfect knowledge of the
probability distribution on passwords would need in order to guess a password chosen from that distribution. If we enumerate passwords $p_1^{(k)}, p_2^{(k)}, \ldots$ in non-increasing order of $\Pr[p_i^{(k)} \leftarrow S]$, then the guessing entropy is simply

$$\sum_{i > 0} i \cdot \Pr[p_i^{(k)} \leftarrow S] \tag{6}$$

Guessing entropy is closely related to Shannon entropy, and relations between the two are known.¹ Since guessing entropy intuitively corresponds more closely to the attacker's task in which we are interested (guessing a password), we will mainly consider measures motivated by the guessing entropy.

The direct use of (6) to compute guessing entropy using the probabilities in (5) is problematic for two reasons. First, an attacker guessing passwords will be offered additional information when performing a guess, such as the set of available categories from which the next image can be chosen. For example, in Face, each image choice is taken from nine images that represent nine categories of images, chosen uniformly at random from the twelve categories. This additional information constrains the set of possible passwords, and the attacker would have this information when performing a guess in many scenarios. Second, we have found that the absolute probabilities yielded by (5) can be somewhat sensitive to the choice of $\lambda_{\hat{\ell}}$, which introduces uncertainty into calculations that utilize these probabilities numerically.

Figure 3: Measures versus $\lambda_0$ for Face. (Plot of $G^{\text{avg}}_S$, $G^{\text{med}}_S$, $G^{25}_S$, $G^{10}_S$ and guessing entropy, with $\lambda_0$ ranging from $2^{-15}$ to $2^{15}$ and a vertical axis from $2^0$ to $2^{12}$.)

Figure 4: Measures versus $\lambda_0$ for Story. (Same series and axis ranges as Figure 3.)

To account for the second of these issues, we use the probabilities computed with (5) only to determine an enumeration $\Pi = (p_1^{(k)}, p_2^{(k)}, \ldots)$ of passwords in non-increasing order of probability (as computed with (5)). This enumeration is far less sensitive to variations in $\lambda_{\hat{\ell}}$ than the numeric probabilities are, and so we believe this to be a more robust use of (5). We use this sequence to conduct tests with our dataset in which we randomly select a small set of "test" passwords from our dataset (20% of the dataset), and use the remainder of the data to compute the enumeration $\Pi$.

We then guess passwords in order of $\Pi$ until each test password is guessed. To account for the first issue identified above, namely the set of available categories during password selection, we first filter from $\Pi$ the passwords that would have been invalid given the available categories when the test password was chosen, and do not guess them. By repeating this test with non-overlapping test sets of passwords, we obtain a number of guesses per test password. We use $G^{\text{avg}}_S$ to denote the average over all test passwords, and $G^{\text{med}}_S$ to denote the median over all test passwords. Finally, we use $G^{x}_S$ for $0 < x \leq 100$ to denote the number of guesses sufficient to guess $x$ percent of the test passwords. For example, if 25% of the test passwords could be guessed in 6 or fewer guesses, then $G^{25}_S = 6$.

We emphasize that by computing our measures in this fashion, they are intrinsically conservative given our dataset. That is, an attacker who was given 80% of our dataset and challenged to guess the remaining 20% would do at least as well as our measures suggest.

4.3 Empirical results

To affirm our methodology of using $G^{\text{avg}}_S$, $G^{\text{med}}_S$, and $G^{x}_S$ as mostly stable measures of password quality, we first plot these measures under various instances
of Assumption 4.1, i.e., for various values of $\hat{\ell}$ and, for each, a range of values for $\lambda_{\hat{\ell}}$. For example, in the case of $\hat{\ell} = 0$, Figures 3 and 4 show measures $G^{\text{avg}}_S$, $G^{\text{med}}_S$, $G^{25}_S$ and $G^{10}_S$, as well as the guessing entropy as computed in (6), for various values of $\lambda_0$. Figure 3 is for the Face scheme, and Figure 4 is for the Story scheme.

The key point to notice is that each of $G^{\text{avg}}_S$, $G^{\text{med}}_S$, $G^{25}_S$ and $G^{10}_S$ is very stable as a function of $\lambda_0$, whereas guessing entropy varies more (particularly for Face). We highlight this fact to reiterate our reasons for adopting $G^{\text{avg}}_S$, $G^{\text{med}}_S$, and $G^{x}_S$ as our measures of security, and to set aside concerns over whether particular choices of $\lambda_0$ have heavily influenced our results. Indeed, even for $\hat{\ell} = 1$ (with some degree of back-off to $\hat{\ell} = 0$ as prescribed by (5)), values of $\lambda_0$ and $\lambda_1$ do not greatly impact our measures. For example, Figures 5 and 6 show $G^{\text{avg}}_S$ and $G^{25}_S$ for Face. While these surfaces may suggest more variation, we draw the reader's attention to the small range on the vertical axis in Figure 5; in fact, the variation is between only 1361 and 1574. This is in contrast to guessing entropy as computed with (6), which varies between 252 and 3191 when $\lambda_0$ and $\lambda_1$ are varied (not shown). Similarly, while $G^{25}_S$ varies between 24 and 72 (Figure 6), the analogous computation using (5) more directly (i.e., computing the smallest $j$ such that $\sum_{i=1}^{j} \Pr[p_i^{(k)} \leftarrow S] \geq 0.25$) varies between 27 and 1531. In the remainder of the paper, the numbers we report for $G^{\text{avg}}_S$, $G^{\text{med}}_S$, and $G^{x}_S$ reflect values of $\lambda_0$ and $\lambda_1$ that simultaneously minimize these values to the extent possible.

Figure 5: $G^{\text{avg}}_S$ versus $\lambda_0$, $\lambda_1$ for Face. (Surface plot with $\lambda_0$ and $\lambda_1$ each ranging from $2^{-10}$ to $2^{10}$ and a vertical axis from $2^{10.35}$ to $2^{10.65}$.)

Figure 6: $G^{25}_S$ versus $\lambda_0$, $\lambda_1$ for Face. (Surface plot with $\lambda_0$ and $\lambda_1$ each ranging from $2^{-10}$ to $2^{10}$ and a vertical axis from $2^{4.0}$ to $2^{6.5}$.)

Tables 2 and 3 present results for the Story scheme and the Face scheme, respectively. Populations with fewer than ten passwords are excluded from these tables. These numbers were computed under Assumption 4.1 for $\hat{\ell} = 0$ in the case of Story and for $\hat{\ell} = 1$ in the case of Face. $\lambda_0$ and $\lambda_1$ were tuned as indicated in the table captions. These choices were dictated by our goal of minimizing the various measures we consider ($G^{\text{avg}}_S$, $G^{\text{med}}_S$, $G^{25}_S$ and $G^{10}_S$), though as already demonstrated, these values are generally not particularly sensitive to choices of $\lambda_0$ and $\lambda_1$.

Population      G^avg_S   G^med_S   G^25_S   G^10_S
Overall           790       428       112      35
Male              826       404        87      53
Female            989       723       125      98
White Male        844       394       146      76
Asian Male        877       589       155      20

Table 2: Results for Story, $\lambda_0 = 2^{-2}$.

Population      G^avg_S   G^med_S   G^25_S   G^10_S
Overall          1374       469        13       2
Male             1234       218         8       2
Female           2051      1454       255      12
Asian Male       1084       257        21       5.5
Asian Female      973       445        19       5.2
White Male       1260        81         8       1.6

Table 3: Results for Face, $\lambda_0 = 2^{-2}$, $\lambda_1 = 2^{2}$.

The numbers in these tables should be considered in light of the number of available passwords. Story
has 9 × 8 × 7 × 6 = 3024 possible passwords, yielding a maximum possible guessing entropy of 1513. Face, on the other hand, has $9^4 = 6561$ possible passwords (for fixed sets of available images), for a maximum guessing entropy of 3281.

Our results show that for Face, if the user is known to be a male, then the worst 10% of passwords can be easily guessed on the first or second attempt. This observation is sufficiently surprising as to warrant restatement: an online dictionary attack of passwords will succeed in merely two guesses for 10% of male users. Similarly, if the user is Asian and his/her gender is known, then the worst 10% of passwords can be guessed within the first six tries.

It is interesting to note that $G^{\text{avg}}_S$ is always higher than $G^{\text{med}}_S$. This implies that for both schemes, there are several good passwords chosen that significantly increase the average number of guesses an attacker would need to perform, but do not affect the median. The most dramatic example of this is for white males using the Face scheme, where $G^{\text{avg}}_S = 1260$ whereas $G^{\text{med}}_S = 81$.

These results raise the question of what different populations tend to choose as their passwords. Insight into this for the Face scheme is shown in Tables 4 and 5, which characterize selections by gender and race, respectively. As can be seen in Table 4, both males and females chose females in Face significantly more often than males (over 68% for females and over 75% for males), and when males chose females, they almost always chose models (roughly 80% of the time). These observations are also widely supported by users' remarks in the exit survey, e.g.:

    "I chose the images of the ladies which appealed the most."

    "I simply picked the best lookin girl on each page."

    "In order to remember all the pictures for my login (after forgetting my 'password' 4 times in a row) I needed to pick pictures I could EASILY remember - kind of the same pitfalls when picking a lettered password. So I chose all pictures of beautiful women. The other option I would have chosen was handsome men, but the women are much more pleasing to look at :)"

    "Best looking person among the choices."

Pop.      Female Model   Male Model   Typical Female   Typical Male
Female       40.0%          20.0%         28.8%           11.3%
Male         63.2%          10.0%         12.7%           14.0%

Table 4: Gender and attractiveness selection in Face.

Moreover, there was also significant correlation among members of the same race. As shown in Table 5, Asian females and white females chose from within their race roughly 50% of the time; white males chose whites over 60% of the time, and black males chose blacks roughly 90% of the time (though the reader should be warned that there were only three black males in the study, thus this number requires greater validation). Again, a number of exit surveys confirmed this correlation, e.g.:

    "I picked her because she was female and Asian and being female and Asian, I thought I could remember that."

    "I started by deciding to choose faces of people in my own race ... specifically, people that looked at least a little like me. The hope was that knowing this general piece of information about all of the images in my password would make the individual faces easier to remember."

    "... Plus he is African-American like me."

Pop.            Asian    Black    White
Asian Female    52.1%    16.7%    31.3%
Asian Male      34.4%    21.9%    43.8%
Black Male       8.3%    91.7%     0.0%
White Female    18.8%    31.3%    50.0%
White Male      17.6%    20.4%    62.0%

Table 5: Race selection in Face.

Insight into what categories of images different genders and races chose in the Story scheme is shown in Tables 6 and 7. The most significant deviations between males and females (Table 6) are that females chose animals twice as often as males did, and males chose women twice as often as females did. Less pronounced differences are that males tended to select nature and sports images somewhat more than females did, while females tended to select food images more often. However, since these differences

were all within four percentage points, it is not clear how significant they are. Little emerges as definitive trends by race in the Story scheme (Table 7), particularly considering that the Hispanic data reflects only two users and so should be discounted.

5 Memorability evaluation

In this section we briefly evaluate the memorability of the schemes we considered. As described in Section 2, there have been many usability studies performed for various graphical password schemes, including for variants of the Face scheme. As such, our goal in this section is not to exhaustively evaluate memorability for Face, but rather to simply benchmark the memorability of the Story scheme against that of Face to provide a qualitative and relative comparison between the two.

Figure 7 shows the percentage of successful logins versus the amount of time since the password was initially established, and Figure 8 shows the percentage of successful logins versus the time since that user’s last login attempt. Each figure includes one plot for Face and one plot for Story. A trend that emerges is that while memorability of both schemes is strong, Story passwords appear to be somewhat harder to remember than Face. We do not find this to be surprising, since previous studies have shown Face to have a high degree of memorability.

[Figure 7: Memorability versus time since password change. Each data point represents the average of 100 login attempts. Axes: Correct Login % versus Time Since Password Change (days); one plot each for Face and Story.]

[Figure 8: Memorability versus time since last login attempt. Each data point represents the average of 90 login attempts. Axes: Correct Login % versus Time Since Last Login Attempt (days); one plot each for Face and Story.]

One potential reason for users’ relative difficulty in remembering their Story passwords is that apparently few of them actually chose stories, despite our suggestion to do so. Nearly 50% of Story users reported choosing no story whatsoever in their exit surveys. Rather, these users employed a variety of alternative strategies, such as picking four pleasing pictures and then trying to memorize the order in which they picked them. Not surprisingly, this contributed very significantly to incorrect password entries due to misordering their selections. For example, of the 236 incorrect password entries in Story, over 75% of them consisted of the correct images selected in an incorrect order. This is also supported anecdotally by several of the exit surveys:

    “I had no problem remembering the four pictures, but I could not remember the original order.”

    “No story, though having one may have helped to remember the order of the pictures better.”

    “... but the third try I found a sequence that I could remember. fish-woman-girl-corn, I would screw up the fish and corn order 50% of the time, but I knew they were the pictures.”

As such, it seems advisable in constructing graphical password schemes to avoid having users remember an ordering of images. For example, we expect that a selection of k images, each from a distinct set of n images (as in the Face scheme, though with image categories not necessarily of only persons), will generally be more memorable than an ordered selection of k images from one set.

If a scheme does rely on users remembering an ordering, then the importance of the story should be reiterated to users, since if the sequence of images has some semantic meaning then it is more likely that the password is memorable (assuming that the sequences are not too long [21]).

Pop.      Animals   Cars     Women    Food     Children   Men     Objects   Nature   Sports
Female    20.8%     14.6%    6.3%     14.6%    8.3%       4.2%    12.5%     14.6%    4.2%
Male      10.4%     17.9%    13.6%    11.0%    6.8%       4.6%    11.0%     17.2%    7.5%

Table 6: Category selection by gender in Story

Pop.        Animals   Cars     Women    Food     Children   Men      Nature   Objects   Sports
Asian       10.7%     18.6%    11.4%    11.4%    8.6%       4.3%     17.1%    11.4%     6.4%
Hispanic    12.5%     12.5%    25.0%    12.5%    0.0%       12.5%    12.5%    12.5%     0.0%
White       12.5%     16.8%    13.0%    11.5%    6.3%       4.3%     16.8%    11.1%     7.7%

Table 7: Category selection by race in Story

6 Conclusion

The graphical password schemes we considered in this study have the property that the space of passwords can be exhaustively searched in short order if an offline search is possible. So, any use of these schemes requires that guesses be mediated and confirmed by a trusted online system. In such scenarios, we believe that our study is the first to quantify factors relevant to the security of user-chosen graphical passwords. In particular, our study advises against the use of a Passfaces™-like system that permits user choice of the password, without some means to mitigate the dramatic effects of attraction and race that our study quantifies. As already demonstrated, for certain populations of users, no imposed limit on the number of incorrect password guesses would suffice to render the system adequately secure since, e.g., 10% of the passwords of males could have been guessed by merely two guesses.

Alternatives for mitigating this threat are to prohibit or limit user choice of passwords, to educate users on better approaches to select passwords, or to select images less prone to these types of biases. The first two are approaches initially attempted in the context of text passwords, and that have appeared in some graphical password schemes, as well. The Story scheme is one example of the third strategy (as is [4]), and our study indicates that password selection in this scheme is sufficiently free from bias to suggest that reasonable limits could be imposed on password guesses to render the scheme secure. For example, the worst 10% of passwords in the Story scheme for the most predictable population (Asian males) still required twenty guesses to break, suggesting a limit of five incorrect password guesses might be reasonable, provided that some user education is also performed.

The relative strength of the Story scheme must be balanced against what appears to be some difficulty of memorability for users who eschew the advice of using a story to guide their image selection. An alternative (besides better user education) is to permit unordered selection of images from a larger set (cf. [4, 7]). However, we believe that further, more sizeable studies must be performed in order to confirm the usability and security of these approaches.

7 Acknowledgments

The authors would like to thank Joanne Houlahan for her support and for encouraging her students to use the graphical login server. We also extend our gratitude to all the students at Carnegie Mellon University and Johns Hopkins University who participated in this study.

Notes