[Update: The main purpose of this post is to present and demonstrate a method of risk estimation and quantification to support practical policy decisions. The email password policy is just a simplistic case to facilitate the debate. I also modified the blog post title and the text below to make it clear that this method is aimed at supporting quantitative risk estimation.]

What is the risk-driven, correct frequency of changing my email password?

<crickets…. silence… more silence>

Yes, we all can quote that “PCI DSS says 90 days” or “whatever regulation says 30 days”, but what does risk say? What actuarial information we need – if we are to define risk through probability of loss? What info about my email usage? Value of information stored there? Frequency of attacks on other similar email accounts? Chances of attack success? My approach to protecting the password? My personal password reuse “policy?” Anything else? On a related note, maybe this is simpler: what is my risk [of having the account compromised] if I change the password every 30 days, 90 days, 300 days?

So, any idea how to go about it?

This little experiment might well show us that “risk-based security” is an awesome thing – but not one achievable in this world today… [emphasis in original]

I wanted to blog about this, but hadn’t collected enough specifics. Now I can: thanks to the blog conversation by David Mortman, Rich Mogull, Chris Pepper, and “Steve”, we have some smart, experienced people providing the needed detail.

Below, I offer a method of reasoning for estimating the relative risk of alternatives. It is compatible with quantitative risk analysis and management, but doesn’t require massive amounts of risk calculation. I use the conversation by Mortman et al. as an example of this method in action (armchair-style).

The Method — Abductive Validation

The following is a fairly generic method to guide decisions when you only have partial information/evidence and a rough estimate of overall risk. It is a form of abductive validation, which is reasoning to the best explanation from available evidence along with evaluation criteria. [Update: it’s also called “Analysis of Competing Hypotheses” in the intelligence community. Cf. book. There is a cool extension using Subjective Logic described in slides and a paper.]

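Step 1. Frame your decision alternatives in terms of hypotheses that could, in principle, be refuted with enough evidence. In this case, I propose the following hypotheses:

1. The password change policy reduces risk.
2. The password change policy increases risk (for example, by increasing total costs).
3. The password change policy makes no difference to risk.
4. The evidence is inconclusive.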

[Update: having these four hypotheses is critical to the method, rather than just a single hypothesis such as #1. The reason is that different pieces of evidence may support or contradict one or more of these hypotheses. For example, just because a given piece of evidence contradicts #1 doesn’t mean it supports #2 or #3.]

Step 2. Define the Risk( ) function to make it operational.

Start with a metric for aggregate risk for the whole business unit. I prefer “Total Cost of Security”, but other aggregate risk assessment methods will do (e.g. NIST 800 series). Next, go through the steps to reach at least a rough estimate of aggregate risk. In the process, you should be able to identify the operational factors that have the biggest effect on aggregate risk. These are called “risk drivers”, and the concept is analogous to “cost drivers” in Activity-based Costing. You should be able to rank order them by degree of impact. Anything that increases the risk drivers also tends to increase the aggregate risk metric.

Then, define your Risk( ) function in relation to these risk drivers. This is the key to making this method tractable yet also sufficient to guide decisions using quantitative risk management principles. In other words, you aren’t seeking a risk function that relates the policy variable directly to aggregate risk. Instead, you relate the policy variable to the risk drivers.

Here’s a simple example using email password policy. Let’s say your organization has only three risk drivers: 1) number of confidentiality breaches of person-to-person communications, 2) number of breaches of end-user email accounts, and 3) number of non-employees who have access to internal information systems. The Risk( ) function would be defined according to the effect that password policy has on each of these risk drivers, whether it increases or decreases them. You can use either quantitative or qualitative functions to evaluate the impact on the risk drivers.
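To make the idea concrete, here is a minimal sketch of a qualitative Risk( ) function defined over the three risk drivers above. The driver weights, the +1/0/-1 effect scale, and the example policy judgments are all hypothetical placeholders chosen for illustration; a real study would derive them from the aggregate risk estimate.

```python
# Hypothetical weights: relative impact of each risk driver on aggregate risk,
# as would come out of the rank ordering in Step 2. The numbers are made up.
DRIVER_WEIGHTS = {
    "p2p_confidentiality_breaches": 0.5,
    "email_account_breaches": 0.3,
    "non_employee_access": 0.2,
}


def risk(effects):
    """Relative risk of a policy, scored against the risk drivers.

    `effects` maps a driver name to the qualitative effect of the policy on
    that driver: +1 (increases it), 0 (no effect), -1 (decreases it).
    Drivers that are not mentioned are treated as unaffected.
    """
    unknown = set(effects) - set(DRIVER_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown risk drivers: {sorted(unknown)}")
    for name, effect in effects.items():
        if effect not in (-1, 0, 1):
            raise ValueError(f"effect on {name} must be -1, 0 or +1")
    return sum(w * effects.get(name, 0) for name, w in DRIVER_WEIGHTS.items())


# Illustrative judgments only, not findings.
change_every_90_days = {"email_account_breaches": -1}
never_change = {}

# True would mean the 90-day policy scores as lower relative risk.
print(risk(change_every_90_days) < risk(never_change))
```

Note that the function only ranks policies relative to each other through their effect on the drivers. It never tries to compute aggregate risk directly, which is the point of this step.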

Step 3. Collect evidence regarding each hypothesis, both pro and con. Evidence can be quantitative, qualitative, or a mix. (You’ll need formal rules for evaluating the quality and strength of the evidence, but to keep this post from getting too long, I won’t go into that.) Stop when you have enough evidence to choose among the hypotheses, or when you run out of time or money. (If you run out, you are left with the inconclusive hypothesis #4, above.)

Step 4. Evaluate the evidence and decide whether to support or refute each hypothesis. If multiple conflicting hypotheses are supported, then either collect more evidence to exclude one of them, or accept a mixed or ambiguous conclusion.
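Steps 3 and 4 can be sketched as a small evidence matrix. This is only an illustration under stated assumptions: the signed tally, the strength values, and the decision threshold are hypothetical stand-ins for the formal evidence rules that this post skips. H1 to H3 stand for hypotheses #1 to #3 (the policy reduces risk, increases risk, or makes no difference), and "inconclusive" plays the role of hypothesis #4.

```python
HYPOTHESES = ("H1_reduces_risk", "H2_increases_risk", "H3_no_difference")


def tally(evidence):
    """Sum signed evidence strength per hypothesis.

    Each evidence item is (description, strength, bearing), where strength is
    a judgment in (0, 1] and bearing maps a hypothesis to +1 (supports it) or
    -1 (contradicts it). Hypotheses not listed are unaffected by the item.
    """
    scores = {h: 0.0 for h in HYPOTHESES}
    for _description, strength, bearing in evidence:
        for hypothesis, direction in bearing.items():
            scores[hypothesis] += strength * direction
    return scores


def conclude(scores, threshold=1.0):
    """Return the supported hypotheses, or 'inconclusive' if there are none.

    The threshold is an assumed cut-off for 'enough evidence'. More than one
    supported hypothesis means a mixed or ambiguous conclusion.
    """
    supported = [h for h, s in scores.items() if s >= threshold]
    return supported if supported else "inconclusive"


# Made-up strengths, loosely modelled on an armchair debate.
evidence = [
    ("offline brute force of stolen hashes", 0.5,
     {"H1_reduces_risk": +1, "H3_no_difference": -1}),
    ("attacker holding hashes already has access", 0.5,
     {"H1_reduces_risk": -1, "H3_no_difference": +1}),
    ("cost of forced rotations for users", 0.5,
     {"H2_increases_risk": +1}),
]

print(conclude(tally(evidence)))
```

Because each item scores every hypothesis separately, evidence against one hypothesis does not automatically count in favor of another.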

Example of Arguments and Evidence — Pro and Con

Mortman’s blog conversation is an excellent example of how this evidence-based argumentation should be done. This is an armchair debate, so they don’t follow the steps exactly and the discussion is very incomplete. But I think it has enough specifics to show how a formal study could proceed. (BTW, every use of the word “evidence” in the commentary below is shorthand for “evidence that he thinks someone could collect…”. No one is actually pointing to solid evidence.)

Mortman starts with a challenge statement that expresses his support for hypothesis #3 (PW change policy makes no difference):

Show me any reasonable evidence that changing all your users’ passwords every 90 days reduces your risk of being exploited.

In the first comment, Steve offers evidence in support of hypothesis #1 (PW change policy reduces risk):

Aside from regular password change intervals, is there a way to mitigate offline brute-force attack? Assuming an attacker uses any of a number of methods to grab a password hash, and that the hash isn’t some sort of weak LM silliness, an attacker is left with a long-running brute force process, depending on the computational power available. For most organizations, a password change policy of 90 or 180 days would likely make the results of the brute force moot.

Given that offline brute-force is a realistic threat, isn’t a password change policy a reasonable control?

Mortman counters this argument in comment 2, not by refuting it, but by offering more evidence in support of hypothesis #3 (no difference):

It sounds like a realistic threat, except for the fact, that if someone has been able to get your password hashes, then they are unlikely to need to brute force passwords. They already have the access they need to get to the data that they want. If you own the authentication system, passwords no longer matter. Even if they need or want passwords, they now have the ability to capture them at will.

Steve counters Mortman in comment 3, again by offering evidence in support of hypothesis #1:

The specific scenario I was thinking of involved cracking an Active Directory domain member, and then dumping the hashes of the last 10 logged-in users (which is used to authenticate users when the domain controller is unavailable). There’s a good chance that a regular user’s PC will have had a Help Desk or other more-privileged account logged in within the last 10, and by cracking that hash, the attacker would gain access to higher privileges.

Chris Pepper chimes in, basically agreeing with Mortman on hypothesis #3, appealing to evidence that such attacks are not frequent in most scenarios:

90-day password change policies are stupid in >90% of scenarios. There are probably some DoD scenarios under active attack where they make sense. The clowns who insist normal business systems need 30 or 90 day password expirations don’t mean *all* users should disbelieve *all* professional security advice.

Then he offers a counter argument to Steve, with evidence in support of hypothesis #3 (i.e. password strength matters much more than change frequency):

Then he offers evidence supporting Steve’s evidence in favor of hypothesis #1:

There are various ways you might get a UNIX /etc/shadow file from backups, so reversing password hashes is a real threat.

Mortman responds to Steve by proposing a sub-problem to evaluate (i.e. strength of the hash relative to time to crack):

Okay so lets take your scenario. The question you have to ask, is how long is the hash going to stand up to attack. With a strong hash, it’s going to be a heck of a lot longer then 90 or even a 180 days, possibly years. In that case, what’s the justification for changing it on a 90-180 day schedule? Realistically though, what this means is that you now have a more complex risk question around how likely is it that someone is going to break in and get the hashes and how long are you willing for them to have use of those passwords?
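The arithmetic behind this sub-problem is easy to sketch once you fix some assumptions. The function names, the alphabet size, the password lengths, and the guess rate below are all hypothetical. The calculation also assumes uniformly random passwords and an exhaustive search, which real passwords and real attackers rarely match, so treat the result as a rough bound and not an estimate.

```python
SECONDS_PER_DAY = 86_400


def expected_days_to_crack(alphabet_size, length, guesses_per_second):
    """Expected days for an exhaustive offline search to hit one password.

    Assumes the password is drawn uniformly at random, so on average the
    search succeeds after trying half of the keyspace.
    """
    keyspace = alphabet_size ** length
    return (keyspace / 2) / guesses_per_second / SECONDS_PER_DAY


def rotation_outruns_cracking(days_to_crack, rotation_days):
    """True if the password changes before the expected crack time."""
    return rotation_days < days_to_crack


# Hypothetical attacker: one billion guesses per second against the hash.
RATE = 1e9

for length in (8, 12):
    days = expected_days_to_crack(alphabet_size=62, length=length,
                                  guesses_per_second=RATE)
    print(length, round(days, 2), rotation_outruns_cracking(days, 90))
```

Under these made-up numbers, an 8-character password falls in about a day, so a 90-day rotation does not help, while a 12-character password lasts so long that the rotation interval barely matters. Either way, the driver is password and hash strength, not change frequency.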

Then Mortman asserts that this sub-problem may be unresolvable due to lack of evidence:

We at this point have little to no data on how likely this sort of attack is to occur so we can’t even take even a bad guess at if 90 days is a good number or a bad number. But until we have some data, we’re just making stuff up so make ourselves feel like we’re doing something.

Based on all the recent breach reports and investigations, it doesn’t look like password cracking is a major vector anymore (I’m not willing to stand behind that statement, but that’s my reading of these reports).

Finally, Mogull hints that there might be evidence in favor of hypothesis #2 (policy increases risk by increasing total costs):

With modern systems (no more NTLANMAN) is it really a risk? Is that risk greater than the cost of password rotations?

-----

And so on…

I hope this example gives you a feeling for how the Abductive Validation method can work in practice. If you really care about the quality of the answer, the process would need to be formalized through investigation, data collection, and experiments. Also, by enumerating hypotheses in the way I described, this method has the great advantage of telling you when the evidence is inadequate to support any hypothesis or alternative.

Glad to hear it, Phil. Have you been using this technique for a while, or just recently? How long did it take before you and your team got comfortable with it? How much extra time and effort did it take?

I ask because a knee-jerk reaction I get is “We don’t have time/resources to do all this analysis”.

Sorry for the delayed reply. I luckily ran back into this post. There is no alert system here. If you want to chat, find me via LinkedIn.

It took many iterations to figure out and it’s still not perfect, but my premise was to first not give up (come up with something useful), set deadlines for each iteration, figure out what numerical values we had, mix these quantitative measures with guessed qualitative numbers to calculate risk scores, stack them up to rank priorities, and go from there. When the teams had time, I asked them to spot check and engage on Medium-to-Low risk requests, and use that as a feedback loop to assess how/if the calculations were off and adapt the numbers and/or the algorithm.

For the overall application/process risk register, we cycled through reviews semi-annually and mainly used qualitative measures to adjust.

Before I left my last post, we were plugging operational risk, brand risk, and security risk into the corporate GRC movement to add to their scoring system and getting out of the one-off risk business.

My teams needed this at the time to stack rank the security requests coming in, and as a way to identify what was most important to security and why we were spending time in certain areas. It also helped demonstrate team effectiveness and resource utilization by risk. I more than doubled my headcount in my time, got people promoted by demonstrating success, and achieved a lot in the short time I was there. This was at a very large, global company, affecting almost 8,000 developers who were not used to taking leadership from the security team or having standards (let alone security standards), and who were used to working in a very siloed manner. The heart of my SDL program was based on these risk assessments and on establishing a risk register that accounted for our application and process inventory, risk rating and stack ranking ALL of this company’s apps. No small feat, but done in under a couple of years, moving a lot of earth, and helping people [understand that they needed to] march in the same direction for the sake of security [and privacy and compliance based on risk].
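The scoring loop described in this reply (measured values mixed with guessed qualitative ratings, then stack ranked) might look roughly like the sketch below. The field names, the 1-to-3 rating scale, and the scoring formula are hypothetical. They are not the algorithm the team actually used, which the reply does not spell out.

```python
# Guessed qualitative ratings mapped onto numbers (assumed scale).
QUALITATIVE = {"low": 1, "medium": 2, "high": 3}


def risk_score(request):
    """Combine guessed ratings with one measured value into a single score."""
    likelihood = QUALITATIVE[request["likelihood"]]
    impact = QUALITATIVE[request["impact"]]
    # Measured input: fraction of the application inventory affected.
    exposure = request["apps_affected"] / request["apps_total"]
    return likelihood * impact * (1 + exposure)


def stack_rank(requests):
    """Highest risk first; spot checks of the lower-ranked requests can feed
    back into the ratings or the formula."""
    return sorted(requests, key=risk_score, reverse=True)


requests = [
    {"name": "A", "likelihood": "high", "impact": "high",
     "apps_affected": 10, "apps_total": 100},
    {"name": "B", "likelihood": "low", "impact": "medium",
     "apps_affected": 50, "apps_total": 100},
    {"name": "C", "likelihood": "medium", "impact": "high",
     "apps_affected": 100, "apps_total": 100},
]

print([r["name"] for r in stack_rank(requests)])
```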

At least in a Windows network, there’s another reason to frequently change passwords. If I pop a machine in your domain (AD), I can pull the cached authentication hashes out of memory and use them to authenticate to other Windows machines in the domain. That’s right, without cracking them. It’s called Pass the Hash; you can find lots of information about it with a quick Google search. Do you ever use remote administration tools with domain administrator credentials? If so, each machine you administer has that hash sitting around for some time.

Changing the passwords invalidates those hashes. So, how often do you want to make that attacker work to get back in?

Yes, someone on that machine already has access to that data. However, in some scenarios, you need to worry about someone establishing a foothold and holding it for later use. That’s the time that you have to worry about cached credentials and password cracking.

On the flip side, the more often you make your users change passwords, the harder time they’ll have remembering complex passwords. At some point they’ll either dial down their complexity (everyone loses here) or start writing them down. Ouch.

The real answer is one-time passwords generated by some token. Best of both worlds.

This is promising. Something I’m not quite understanding is how tangential controls are assessed in this methodology, and if it’s simple to do so, where do you stop?

To understand what I mean by tangential controls, let’s look at Windows platforms as an example where there are several password-related controls: length, history, complexity, and so on. This set of controls is directly related to the conversation presented in the original post.

But, in the conversation, there were at least two other sets of controls hinted at. At one point, the argument in favor of changing passwords required adding a machine into the domain (it could have been implied that this is someone authorized to do so). Furthermore, there is an overall assumption that everything sent between servers is encrypted where it ought to be, which may not be the case. Windows has several SMB-related encryption/signature controls, and encryption won’t necessarily be enabled, which may lead to grabbing hashes on an improperly secured network. Windows provides another set of controls aimed at the prevention of passive network attacks, and there’s another set of controls seeking to ensure that smart switches, VLANs, and other mitigating factors are considered or in place.

This is the long way around to the point that assessing the risk for any single control or set of related controls (i.e. password controls) might require looking at the risk of certain tangential controls. To properly assess the risk involved in password cracking, it’s important to look not just at the controls that relate directly to the task, but also at those that enable the task.

If I have no controls over who could add machines to my domain, then that affects my password lifetime. If I have no controls over who can sniff my network and I do not have encryption enabled for SMB traffic, then that also affects my password lifetime.

But, are there controls affecting passive network attacks? Are there controls affecting the controls governing adding machines to the domain?

You have to start somewhere, and our industry is mature enough (it should be, at least) to know what is generically most important. Compensate for the business that you’re in, work with the business to learn what’s important to them, know the business’s current and future goals, and guide them towards figuring out where the risk is. Then mash this all up to determine which areas flow into one another, and rate them against one another as a whole.

It sounds simple, but it’s not. A lot of people speak of holistic security but rarely do they bring risk into the equation.

To be direct, password strength is a moot point to me in itself. Set your standard and measure compliance to it. Know that AD and two-factor auth have different security strengths, apply each where needed, and account for the risk in the application.

The whole view is more important. If passwords are the only threat vector that exists, fine, but I doubt that addresses the whole picture.