Richard Bejtlich's blog on digital security, strategic thought, and military history.

Sunday, May 31, 2009

Information Security Incident Rating

I've been trying to describe to management how close various individual information assets (primarily computers -- desktops, laptops, etc.) are to the doomsday scenario of sensitive data exfiltrated by unauthorized parties. This isn't the only type of incident that worries me, but it's the one I decided to tackle first. I view this situation as a continuum, rather than a "risk" rating. I'm trying to summarize the state of affairs for an individual asset rather than "model risk."

In the far left column I've listed some terms that may be unfamiliar. The first three rows bear "Vuln" ratings. I list these because some of my businesses consider the discovery of a vulnerability in an asset to be an "incident" by itself. Traditional incident detectors and responders don't think this way, but I wanted to include this aspect of our problem set. For these first three rows, I consider these assets to exist without any discoverable or measurable adversary activity. In other words, assets of various levels of vulnerability are present, but no intruder is taking interest in them (as far as we can tell).

The next four rows (Cat 6, 3, 2, 1) should be familiar to those of you with military CIRT background. About 7 or 8 years ago I wrote this Category Descriptions document for Sguil. You'll remember Cat 6 as Reconnaissance, Cat 3 as Attempted Intrusion, Cat 2 as User Intrusion, and Cat 1 as Root/Admin Intrusion. I've mapped those "true incidents" here. These incidents indicate an intruder is taking interest in a system, to the degree that the intruder gains user or root level control of it. In the event the intruder doesn't need to gain control of the asset in order to steal data, you can simply jump to the appropriate description of the event in the final three rows.

The final three rows (Breach 3, 2, 1) are what you might consider "post exploitation" activities, or direct exploitation activities if no control of the asset is required in order to accomplish the adversary's data exfiltration mission. They loosely map to the reinforcement, consolidation, and pillage phases of compromise I outlined years ago. I've used the term "Breach" here to emphasize the seriousness of this aspect of an intrusion. (Gunter's recent post Botnet C&C Participation is a Corporate Data Breach reinforced my decision to use the term "breach" in situations like this.) Clearly Breach 3 is a severe problem. You might still be able to avoid catastrophe if you can contain the incident at this phase. However, intruders are likely to quickly move to Breach 2 and 1 phases, when it's Game Over.
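The ten rows described above can be sketched as a simple data structure. This is my reconstruction, not the original chart: I'm assuming the rows map in order to impact values 1 through 10, and the short descriptions for the Vuln rows borrow the "substantial/moderate/little effort" language used later in the post.

```python
# A sketch of the ten-row scale described in the post. The impact numbers
# (1-10, assigned in row order) and the Vuln-row wording are assumptions;
# the original chart is not reproduced here.
INCIDENT_SCALE = [
    (1,  "Vuln 1",   "Vulnerable; intruder must apply substantial effort"),
    (2,  "Vuln 2",   "Vulnerable; intruder must apply moderate effort"),
    (3,  "Vuln 3",   "Vulnerable; intruder must apply little effort"),
    (4,  "Cat 6",    "Reconnaissance"),
    (5,  "Cat 3",    "Attempted intrusion"),
    (6,  "Cat 2",    "User intrusion"),
    (7,  "Cat 1",    "Root/Admin intrusion"),
    (8,  "Breach 3", "Reinforcement"),
    (9,  "Breach 2", "Consolidation"),
    (10, "Breach 1", "Pillage -- sensitive data exfiltrated (game over)"),
]
```

The ordering matters more than the exact numbers: the scale is a continuum from "vulnerable but unnoticed" through "intruder interested" to "data gone."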

If there has to be an "impact 0" rating, I would consider that to be the absence of an information asset, i.e., it doesn't exist. Any asset whatsoever has value, so I don't see a 0 value for any existing systems.

At the other end of the spectrum, if we have to "crank it to 11," I would consider an 11 to be publication of incident details in a widely-read public forum like a major newspaper or online news site.

I use the term "impact" in this sense: what is the negative impact of having the individual asset in the state described? In other words, the negative impact of having an asset with impact 1 is very low. We would all like to have assets that require an intruder to apply substantial effort to compromise the asset and exfiltrate sensitive data. At the other end of the spectrum we have the "game over" impact -- the intruder has exfiltrated sensitive data or is suspected of exfiltrating sensitive data based on volume, etc. Even if you can't tell exactly what an intruder exfiltrated, if you see several GBs of data leaving a system that houses or accesses sensitive data, you can be fairly confident the intruder grabbed it.

I listed some sample colors for those who understand the world in those terms.

What do you think of this rating system? I am curious to hear how others explain the seriousness of an incident to management.

Richard Bejtlich is teaching new classes in Las Vegas in 2009. Regular Las Vegas registration ends 1 July.

Update: Since writing this post, I've realized it is more important to think of these events as intrusions. The word "incident" applies to a broader set of events, including DDoS, lost or stolen devices, and the like. My use of the word "intruder" throughout the post indicates my real intention.

21 comments:

I think the descriptions for the 10 impact levels make sense - as too does the order - but I wonder whether it's too many? I'm not sure that having 10 different levels of impact facilitates the necessary corporate response that it should/would govern.

You see a related problem with CVSS and the scores running 1-10. Sure, you've got a great scoring system, but at the end of the day the folks on the ground still translate it High, Medium and Low -- which further translates to the actions "fix it now", "fix it later" and "ignore it for now" (OK, so perhaps I'm being a little cynical).

I'd also perhaps question how an organization would derive the impact value. Some of the level definitions are subjective and would require a human to make the translation. An automated system will struggle with "substantial", "moderate" and "little effort", and sensitive/nonsensitive data can be a political minefield within an organization. Then again, perhaps an automated system would be impossible in this situation because the data sources are too different?

I like the overall rating system, and Vuln/Breach breakdown is good - but does it make sense to include the Vuln 1-3 within an Incident Rating system? Aren't they part of the risk calculation prior to the incident?

I agree with Gunter. I think this scale would be great for security professionals, but if you're trying to relay the severity of an intrusion to executive management, 10 levels may be overkill. Management wants to know 1, 2, 3, and they need to make a decision fast.

I had low-medium-high in mind when I designed the chart, in case we had to reduce to that level. 1-3 are low, 4-6 are medium, and 7-9 are high. 10 is game over so it's beyond high. If I had to put these on a picture it would probably be green, amber, and red, with black saved for game over.
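The reduction described above (1-3 low/green, 4-6 medium/amber, 7-9 high/red, 10 game over/black) is mechanical enough to write down. A minimal sketch; the function name is mine, not from the post:

```python
def impact_band(impact):
    """Collapse the 1-10 impact scale into the bands described above.

    1-3 -> low (green), 4-6 -> medium (amber), 7-9 -> high (red),
    10 -> game over (black).
    """
    if not 1 <= impact <= 10:
        raise ValueError("impact must be between 1 and 10")
    if impact <= 3:
        return ("low", "green")
    if impact <= 6:
        return ("medium", "amber")
    if impact <= 9:
        return ("high", "red")
    return ("game over", "black")
```

Keeping the ten-point scale internally while reporting only the band upward addresses the earlier comment that executives translate everything to high/medium/low anyway.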

If you need to apply more rigor to the first three rows, you can use CVSS scores. Roughly speaking you could have something like "1) No known vulnerabilities, 2) one or more patchable vulnerabilities, and 3) one or more unpatchable vulnerabilities."
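The rough rule quoted above can be expressed directly. A sketch, assuming each scanner finding is a dict with a boolean "patchable" key (a hypothetical data shape, not any real scanner's API):

```python
def vuln_rating(findings):
    """Assign a Vuln rating per the rough rule in the post:

    1 -> no known vulnerabilities
    2 -> one or more patchable vulnerabilities
    3 -> one or more unpatchable vulnerabilities

    `findings` is a list of dicts with a boolean "patchable" key
    (a hypothetical shape chosen for illustration).
    """
    if not findings:
        return 1
    if any(not f["patchable"] for f in findings):
        return 3
    return 2
```

An unpatchable finding dominates, since it leaves the asset exposed regardless of how many patchable issues are also present.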

I would, but CVSS breaks fundamental laws of the universe by using math between ordinal scales. Also, even if it did perform calculations in a rational manner, it's limited in probabilistic information input.

What's even worse is that in 1-3 you're talking about the force applied vs. your ability to resist it. CVSS might be one piece of relevant information, but it cannot give you accurate information regarding your ability to resist that force. You have to move beyond CVSS, beyond threat/vuln pairing even, to discuss an ability to P/D/R within context of variability for various skill sets.

Finally, I just hate the idea that you can simply substitute their opinions & guesses for yours and pretend that the risk of being wrong is "transferred" to their "authority".

Our incident definitions are similar to yours in that they are very much defined by access to sensitive data. However, other events must also be taken into consideration and ranked.

For example, a virus that infects 80% of an organization's computers, rendering them unusable and forcing support staff to redirect weeks of man-hours of effort, should be classified very high. And the severity of the event may vary with the organization's calendar and business processes. But no sensitive data may be compromised.

An incident that puts incorrect or malicious information on a corporate web site can do unlimited damage to reputation and could conceivably result in legal liabilities due to infected visitors. But no sensitive data may be compromised.

Scale must be considered. Number of machines, people, processes affected.

We're getting ready to rewrite our incident scoring to take into account these other factors and scenarios. I'm not sure how we're going to approach it yet but I know we need to do it.

CVSS unfortunately lacks sufficient depth to handle the likelihood of an attack happening. Similarly it doesn't account for compensating controls, so I'm not convinced that it's the solution in this case.

Well, don't get me wrong, I like what you're doing there, I just don't think CVSS is the *only* source of information you would want to consider. And I think CVSS can be informative, just within context.

This looks very much like the ranges of risk that would be generated in a standard "likelihood-vs.-consequence" qualitative risk heat map. In essence, you're defining the bad things that could happen (or in this case, are currently happening). I think this is a very useful way of categorizing the methods by which sensitive data can egress, and I think the ratings are nice and granular (which is good).

This is a great scale for gauging incidents that are in progress, but I think it could also be turned into a more proactive measurement system through qualitative risk. To do that, you could define the level of effort for the intruder to realize each rating/threat (which roughly translates into the "likelihood" of qualitative risk) and the impact to the business if the threat is realized (the "consequence" of qual. risk).

This kind of matrix can really help a security professional show the value of security processes and controls (or the lack thereof); it's also a great way to justify budgets (especially if you enforce signoff of risk acceptance... nothing makes an exec more terrified than the idea of being able to trace a data breach to his/her inaction... :)

I am trying to relate to the primary target/audience, as implied by the statement: "I am curious to hear how others explain the seriousness of an incident to management." What I am usually challenged with, vis-a-vis this very specific issue, is equating levels with $$($...), i.e. the cost of support services for the incident level being addressed vs. the "depth" of impact.

Richard - I know I'm not saying anything you don't already know - just thinking out loud for a moment...

It sounds to me like the next logical step is to relate the potential technical impact to the impact on the business so that it becomes more relevant to the intended audience. Obviously, at the point you've reached #10 and #7 you've bypassed any mitigating controls. Knowing what user/app/data was targeted and/or exfiltrated is critical to understanding the impact to the business and therefore should be the focus of the escalation, right? (I know this assumes a lot of visibility or data categorization/management, but doesn't everything?)

(side note: I hate FUD... but... we should do a red team thought exercise on what we "could" do to the business with information lost as a result of recent breaches - I think we could re-define worst case loss in exponential terms)

In the end, it seems we as security teams really are just another intelligence feed to the exec mgmt layer. Providing the best information available to us based on the data, our understanding of the business, our experience/expertise and the relevant facts as they are known at the time. The Exec team will make both tactical and strategic decisions and hopefully, if we've communicated clearly, they will at least have considered our input. I would love to get to a more influential posture, but it seems to be that we as a profession have to mature a bit more before we get there. Better, clearer and easier Risk Models are a good start but IMHO they need a lot more "real-world" application before they become usable.

Quantifying (qualitative) risk by using the technical impact of an Incident means too much is lost in translation to executives. It is almost as if we're going to need to be business experts and understand how data impacts business in order to communicate the impact and affect change. In the end, I think we have to find a better way to communicate, back to the basics - Clear, Concise, Accurate and Timely.

Let me start by saying it's easy to criticize and hard to create or do. So, I'm not criticizing the effort in any way, but providing some thoughts only.

It sounds like this process is trying to relay battlefield data to a four star general (management) when he's more interested in the overall posture and status of his forces. The general doesn't necessarily want to know a particular asset or force is taking fire, but wants to know when an objective is achieved or that substantial reinforcements need to be committed. For management it makes more sense to provide the posture of assets and their position relative to compromise.

I struggled for years to characterize the state of assets in relation to a compromise event, but finally landed on an abstraction of time-based security. Since trying to operationalize time-based security is extremely difficult, I fell back to substituting distance for time. In other words, I use an index of 0-5 which represents the concept of effective protection (EP) applied to an asset. The amount of effective protection (or distance from a compromise event) increases as the index number increases, resulting in an asset which is more resistant to compromise. The index takes into account the asset's relationships to other assets (level of network connectivity, trusted connections to other systems, etc.), its control state (maturity/density/balance) and the environment in which it operates (public, semi-public and private).

I know many readers of this blog don't believe control-based assessments have value, but I'm not sure how one would characterize the posture of an asset without them. Control state is just one aspect used to construct the index, and many operational aspects are included. Your own perspective calls out the need to minimize, monitor, keep up-to-date, and control access. Those are controls, and if they're not defined then their state and effectiveness can't be measured.

The process I've described above provides management some perspective on where a particular asset is in relation to a compromise event from an overall posture perspective. If they see an asset of significant value with an unacceptably low index number, appropriate steps can be taken to elevate the asset to a higher level of effective protection. The Information Security Incident Rating described in the original post would provide a more real-time view of attacks in progress and would enable an organization to prevent or verify a zero effective protection event.

Effective protection is achieved when controls and operational security processes applied to a resource in aggregate successfully protect the asset from compromise. I use the term effective protection instead of protection since it's possible to have some controls operating effectively and be compromised.

Our threat matrix uses "Exploitability" as the top row and includes "inside job" and "not likely" as probability factors. That turns out to be a big deal when clients need to meet a non-exploitable regulatory standard.

Exploitability columns:

Easy – The vulnerability was able to be exploited during the same onsite visit in which it was discovered.

Hard – The vulnerability was able to be exploited only after a return visit, required more research and possibly a custom application to exploit, or could not be exploited during the allotted time.

Not Likely – The vulnerability required multiple successive dependencies to be in alignment for a successful exploit to occur, or required significant custom application development to exploit.

Inside Job – The vulnerability requires access to the FI infrastructure or the JHA infrastructure for a successful exploit to occur. These findings need to be compared against legal and regulatory standards.

Not Currently Exploitable – Used when a finding may or may not have a security impact in the current environment; however, its real value is understood when the context of the finding is considered against the future development efforts of the client.

Severity/Impact/Payoff rows:

Informational – Successful exploitation has no real impact.

Low Impact – Successful exploitation yields little or no value to criminals.

Medium Impact – Successful exploitation yields a significant payoff to the criminals.

High Payoff – Successful exploitation leads to client identity theft or severe data loss.

In my opinion, security is subjective. I mean that, beforehand, the organization has to define what a security incident is for that organization and which events trigger its security incident response process. For example, for an ISP, port scanning represents typical activity, but not for a banking institution. It's the same with the CVSS formula: maybe it's practical and functional for some, maybe not for the rest. But always keep a guide at hand, like the NIST and SANS documents or known taxonomies, to mention some kinds of documentation, rather than starting from scratch.