Malicious hackers have long collected and used data in a systematic manner. For example, they investigate all of a target company's public-facing servers and document the IP addresses, services, software versions, and back-end relationships. They collect as much publicly accessible information as possible, including harvested credentials, then test potential weak spots.

So-called advanced persistent threat (APT) adversaries have large databases with deep information on each target to identify existing hacking pathways. Typically, a separate database holds their collection of zero-days. When a target is identified, those databases help determine the plan of attack.

According to sources who have spoken publicly, zero-days are used as a last resort. That implies APT hackers keep detailed lists of known vulnerabilities for each target and try those first.

Elite attack squads

I’ve seen many APT teams come back to an existing vulnerable target, move to different servers they had accessed before, and type long, complicated directory path names without a mistake.

They immediately pull up the CIO’s and CISO’s email accounts and type in the right passwords as fast as the legitimate owners. They type in keyword searches that combine terms they’ve searched for earlier with new terms. They will use one company’s partnership to break into the other company’s network. They know the major players, the key databases, and the most valuable file shares -- and they do this at every target company. It’s obvious they are using databases to track relevant information.

The people who are behind the data curve are the defenders.

Most defenders have at most a few good databases to support defense, beginning with a database detailing all the malware detected by their main antivirus product. They probably have a rudimentary aggregated event log and possibly a vulnerability database listing the vulnerabilities found on their own assets. What they lack is a complete picture from end to end.

Your cyber security dream database

I know a few companies working on “dream” cyber security databases. They inventory all their existing security databases, bring them into one or more larger aggregated databases, and normalize them to derive valuable information.

For threat intelligence, they will track not only external, generalized threat intelligence, but also their own local attacks. This is huge because most companies (for reasons I can’t explain) fail to track their own security incidents. They will often know more about how the world or a specific industry is hacked than they do about their own experiences.

Not that creating a consolidated database about attacks on your company is necessarily easy -- this particular data stream usually requires information from several different databases, including antimalware, firewall logs, event logs, Web server logs, file auditing, and application auditing, at a minimum.
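To make the consolidation step concrete, here is a minimal sketch of normalizing two of those feeds into one timeline. The record layouts and field names (`detected_epoch`, `machine`, `dst_host`, and so on) are hypothetical stand-ins for whatever your tools export; real feeds will need their own mapping functions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SecurityEvent:
    """One normalized record in the consolidated incident database."""
    timestamp: datetime
    source: str      # which tool reported it, e.g. "antimalware"
    host: str
    category: str    # coarse label such as "malware" or "blocked_connection"
    detail: str


def from_antimalware(record: dict) -> SecurityEvent:
    # Hypothetical antimalware export: epoch seconds, machine name, threat name.
    return SecurityEvent(
        timestamp=datetime.fromtimestamp(record["detected_epoch"], tz=timezone.utc),
        source="antimalware",
        host=record["machine"].lower(),
        category="malware",
        detail=record["threat_name"],
    )


def from_firewall(record: dict) -> SecurityEvent:
    # Hypothetical firewall export: ISO 8601 time with a UTC offset.
    denied = record["action"] == "deny"
    return SecurityEvent(
        timestamp=datetime.fromisoformat(record["time"]).astimezone(timezone.utc),
        source="firewall",
        host=record["dst_host"].lower(),
        category="blocked_connection" if denied else "allowed_connection",
        detail=f'{record["src_ip"]} -> port {record["dst_port"]}',
    )


def consolidate(antimalware: list, firewall: list) -> list:
    """Merge both feeds into one list of SecurityEvent, ordered by time."""
    events = [from_antimalware(r) for r in antimalware]
    events += [from_firewall(r) for r in firewall]
    return sorted(events, key=lambda e: e.timestamp)
```

The point of the common record is that host names, timestamps, and categories line up across tools, so a single query can follow one incident from the firewall to the endpoint.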

You have to start by trying to accurately identify the past, current, and most likely threats and exploits, then figure out how you can detect them. For example, if you have been attacked successfully by an APT in the past, which of your tools would detect the same (or similar) APT methods in the future? If you were successfully exploited by password-guessing or pass-the-hash attacks, which detection methods would most likely clue you in to them happening again?

The idea is to identify all the ways in which you could detect a particular attack -- which tools, which configurations -- and figure out the gaps. By understanding your threats and how you can detect them, you can start to figure out which detection methods work best and which have too many false positives and false negatives.

Many companies (and vendors) are working to create massive lists along these lines. For example: What are all the ways to detect pass-the-hash attacks? How do you detect buffer overflow attacks? The idea is to take all those methods, then automate attack detection and alerting. You want the computer to figure out whether a string of bad logons is a hacking problem, an errant script, or several people coming back from holiday at the same time.
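As a rough illustration of that last example, here is a sketch of a rule-based classifier for a burst of failed logons within one time window. The thresholds and labels are illustrative assumptions, not tuned values, and a real rule set would draw on far more context.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedLogon:
    account: str
    source_ip: str


def classify_failed_logons(failures, min_events=10):
    """Return a rough label for the failed logons seen in one time window."""
    if len(failures) < min_events:
        return "background noise"

    accounts_by_source = defaultdict(set)
    for f in failures:
        accounts_by_source[f.source_ip].add(f.account)
    failures_per_account = Counter(f.account for f in failures)

    # One source trying many different accounts looks like password guessing.
    if any(len(accts) >= 5 for accts in accounts_by_source.values()):
        return "possible password guessing"

    # One account failing repeatedly from one source looks like a script
    # still running with a stale password.
    if len(failures_per_account) == 1 and len(accounts_by_source) == 1:
        return "possible errant script"

    # Many accounts, each with a few failures from its own source, looks
    # like people mistyping passwords after time away.
    if (len(failures_per_account) > 1
            and max(failures_per_account.values()) <= 3
            and all(len(accts) == 1 for accts in accounts_by_source.values())):
        return "possible returning users"

    return "needs review"
```

Anything the rules cannot place falls through to "needs review" rather than being guessed at, which keeps false negatives visible.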

You’ll often hear this referred to as “machine learning” by vendors trying to sell computer security software, but it’s not. Machine learning is when the computer learns from data how to detect and alert on an event on its own, without being explicitly told which rules to apply.

Your database of mitigation measures

After you’ve collated localized threats and figured out how to detect them, it's time to move on to the last stage: mitigation. You want to match your deployed defenses against the most likely threats facing your high-value assets.

A mitigation database should show how many of your existing, deployed mitigations would work to reduce the risk of a particular threat -- and note the gaps. Most mitigations work against multiple threats, but you also surely have some that are very specific.

You might find you have multiple mitigations intended to minimize the same threat -- maybe too many in some cases. By the same token, you will almost inevitably find gaps where no mitigations have been applied -- or mitigations that don’t seem to be doing the job.
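A mitigation database can start as a simple mapping from each deployed mitigation to the threats it is meant to reduce. The sketch below uses made-up mitigation and threat names and an arbitrary overlap threshold; it shows one way to surface both the gaps and the pile-ups.

```python
# Hypothetical data: the threats each deployed mitigation is meant to reduce.
MITIGATES = {
    "application whitelisting": {"malware"},
    "antimalware": {"malware"},
    "email filtering": {"malware", "phishing"},
    "multifactor authentication": {"password guessing", "phishing"},
}


def coverage_report(threats, mitigates, overlap_threshold=3):
    """Map each threat to its mitigations, then list gaps and heavy overlaps."""
    coverage = {
        threat: sorted(name for name, covered in mitigates.items() if threat in covered)
        for threat in threats
    }
    # A gap is a threat with no deployed mitigation at all.
    gaps = [t for t, names in coverage.items() if not names]
    # An overlap is a threat with at least overlap_threshold mitigations.
    overlaps = [t for t, names in coverage.items() if len(names) >= overlap_threshold]
    return coverage, gaps, overlaps


threats = ["malware", "phishing", "password guessing", "pass-the-hash"]
coverage, gaps, overlaps = coverage_report(threats, MITIGATES)
print("gaps:", gaps)
print("overlaps:", overlaps)
```

With this sample data, pass-the-hash shows up as a gap and malware as a threat covered three times over. The mapping says nothing about whether a mitigation is doing the job; that needs the incident data from the earlier stages.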

Threat intelligence married to detection married to mitigation allows you to account for all the most likely threats and to hold deployed defenses accountable for stopping those threats. Without a "superdatabase" that contains all three, you can’t make such value-based decisions.

Your cyber security database in action

Along with those three food groups, the best security databases should allow mature business intelligence queries to run.

Here’s a great example: Suppose a new Web-based cross-site scripting (XSS) attack starts making the rounds, particularly in your industry. With the appropriate databases and query language, you could ask how many high-value servers in your environment were susceptible to those same cross-site scripting attacks.

You could then query which deployed mitigations would stop the XSS attacks and which were deployed against those servers. In a few minutes you could report to management the threat from the new attack and how big a risk it was in your own environment.
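Here is roughly what such a query could look like, sketched with Python's built-in `sqlite3` module. The three tables, their columns, and the sample rows are assumptions made up for illustration; your own schema will differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE assets (host TEXT PRIMARY KEY, high_value INTEGER);
        CREATE TABLE vulnerabilities (host TEXT, threat TEXT);
        CREATE TABLE mitigations (host TEXT, name TEXT, threat TEXT);

        INSERT INTO assets VALUES ('web01', 1), ('web02', 1), ('test01', 0);
        INSERT INTO vulnerabilities VALUES
            ('web01', 'xss'), ('web02', 'xss'), ('test01', 'xss');
        INSERT INTO mitigations VALUES
            ('web01', 'web application firewall', 'xss');
    """)
    return conn


# High-value hosts susceptible to a threat, with the number of deployed
# mitigations that address that threat on each host.
EXPOSURE_QUERY = """
    SELECT a.host, COUNT(m.name) AS mitigation_count
    FROM assets AS a
    JOIN vulnerabilities AS v
        ON v.host = a.host AND v.threat = :threat
    LEFT JOIN mitigations AS m
        ON m.host = a.host AND m.threat = :threat
    WHERE a.high_value = 1
    GROUP BY a.host
    ORDER BY a.host
"""


def exposure(conn, threat):
    """Return (host, mitigation_count) rows for high-value susceptible hosts."""
    return conn.execute(EXPOSURE_QUERY, {"threat": threat}).fetchall()


if __name__ == "__main__":
    for host, count in exposure(build_demo_db(), "xss"):
        print(host, "mitigations deployed:", count)
```

The `LEFT JOIN` matters here: it keeps susceptible servers with no mitigation in the result, with a count of zero, and those are the servers management most needs to hear about.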

A good computer security defense database not only lets you track statistics, but gives you valuable information during your time of need. Instead of waiting for something to happen or taking guesses, you can assess the threat and the risk with real-time information.

The old cliche that your data is your most valuable asset still holds true, but data hasn't been taken seriously enough in the computer security world. How are your computer security defense databases doing?