What are Heuristics?

It is generally well-understood that antimalware programs—the software which detects computer viruses, worms, trojan horses and other threats to your system—work by scanning files using signatures they already have. A signature could be as simple as a string[i] (like using the "find" command in your word processor to locate a particular piece of text) or as complex as a tiny macro or subroutine which tells the scanning engine what to look for and where to find it.
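The simplest kind of signature described above, a byte string that is either searched for anywhere in a file or expected at a fixed position, can be sketched in a few lines of Python. This is only an illustration: the `Signature` class, the detection names and the byte patterns are all made up for the example and do not come from any real product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    name: str                     # hypothetical detection name
    pattern: bytes                # byte string to look for
    offset: Optional[int] = None  # fixed position, or None to search anywhere


def scan(data: bytes, signatures: list[Signature]) -> list[str]:
    """Return the names of all signatures that match the data."""
    hits = []
    for sig in signatures:
        if sig.offset is None:
            # Like "find" in a word processor: the pattern may be anywhere.
            matched = sig.pattern in data
        else:
            # The signature also says where the pattern must be found.
            matched = data.startswith(sig.pattern, sig.offset)
        if matched:
            hits.append(sig.name)
    return hits


# Made-up signatures, for illustration only.
SIGNATURES = [
    Signature("Example.A", b"\xde\xad\xbe\xef"),
    Signature("Example.B", b"\xeb\xfe", offset=2),
]

sample = b"MZ\xeb\xfe" + b"\x00" * 16 + b"\xde\xad\xbe\xef"
print(scan(sample, SIGNATURES))  # ['Example.A', 'Example.B']
```

A file that contains neither pattern produces an empty list, which is exactly the weakness the rest of this article addresses: a threat with no matching signature goes unnoticed.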

Signature scanning works very well for detecting threats which have already been identified, but how do antimalware programs detect new, previously unseen threats? One of the methods used is heuristics. But what are heuristics, and how do they work? Randy Abrams, ESET's Director of Technical Education, finds the following definitions helpful in explaining heuristics:

Heuristic (from the Greek "Εὑρίσκω" for "find" or "discover") is an adjective for experience-based techniques that help in problem solving, learning and discovery. Source: Wikipedia

And for computer science: Heuristic – In computer science, a heuristic algorithm, or simply a heuristic, is an algorithm that is able to produce an acceptable solution to a problem in many practical scenarios, in the fashion of a general heuristic, but for which there is no formal proof of its correctness. Source: Wikipedia

The science of heuristics studies how information is discovered and learned. It explains how one looks at problems and finds solutions to them by induction (as opposed to deduction). Often, a heuristic is a "rule of thumb" one might have learned.

In computer science, a heuristic is an algorithm which consistently performs quickly and/or provides good results. For antimalware software, however, heuristics can also have a more specialized meaning: a set of rules (as opposed to a specific set of program instructions) used to detect malicious behavior without having to uniquely identify the program responsible for it. Uniquely identifying the specific computer virus or other program is how a classic signature-based "virus scanner" works.

The heuristic engine used by an antimalware program might include rules for the following:

a program which tries to copy itself into other programs (in other words, a classic computer virus)

a program which tries to write directly to the disk

a program which tries to remain resident in memory after it has finished executing

a program which decrypts itself when run (a method often used by malware to avoid signature scanners)

a program which binds to a TCP/IP port and listens for instructions over a network connection (this is pretty much what bots, sometimes also called drones or zombies, do)

a program which attempts to manipulate (copy, delete, modify, rename, replace and so forth) files which are required by the operating system

a program which is similar to programs already known to be malicious

Some heuristic rules may have a heavier weight (and thus, score higher) than others, meaning that a match with one particular rule is more likely to indicate the presence of malicious software, as are multiple matches based on different rules.
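The weighting idea can be sketched as a simple additive score. The rule names below paraphrase the list above, but the weights and the threshold are invented purely for illustration; how a real engine assigns and combines weights is not described here.

```python
# Hypothetical weights: a higher number means the rule is a stronger
# indicator of malicious software. These values are assumptions.
RULE_WEIGHTS = {
    "copies_itself_into_programs": 50,
    "writes_directly_to_disk": 20,
    "stays_resident_in_memory": 15,
    "decrypts_itself": 25,
    "listens_on_network_port": 20,
    "modifies_system_files": 35,
    "resembles_known_malware": 40,
}

THRESHOLD = 60  # assumed cut-off above which an object is flagged


def score(observed_rules: set[str]) -> int:
    """Add up the weights of every rule that matched."""
    return sum(RULE_WEIGHTS.get(rule, 0) for rule in observed_rules)


def is_suspicious(observed_rules: set[str]) -> bool:
    """Flag the object when the combined score reaches the threshold."""
    return score(observed_rules) >= THRESHOLD


# One weak match alone stays below the threshold...
print(is_suspicious({"decrypts_itself"}))  # False
# ...but several matches on different rules add up.
print(is_suspicious({"decrypts_itself",
                     "listens_on_network_port",
                     "stays_resident_in_memory"}))  # True
```

In this sketch a single heavily weighted rule combined with almost any other rule is enough to cross the threshold, while lightly weighted rules need several matches, which mirrors the two cases described in the paragraph above.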

Even more advanced heuristics might trace through the instructions in a program’s code before passing it to the computer’s processor for execution, or allow the program to run in a virtual environment or "sandbox" in order to examine the behavior it performs and the changes it makes there, and so forth. In effect, antimalware software can contain specialized emulators that allow it to "trick" a program into thinking it is actually running on the computer, instead of being examined by the antimalware software for potential threats.
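To make the sandbox idea concrete, here is a deliberately tiny sketch. A real emulator interprets actual processor instructions and imitates an operating system; this toy version assumes a "program" is just a list of `(operation, target, payload)` tuples and that the virtual environment is a dictionary of fake files. All names and the file path are hypothetical.

```python
def emulate(program, max_steps=1000):
    """Run a toy program against a fake file system and log what it does."""
    # The virtual environment: nothing here touches the real computer.
    virtual_files = {"C:/system/kernel.sys": b"original"}
    log = []
    for step, (op, target, payload) in enumerate(program):
        if step >= max_steps:
            # Stop runaway programs instead of emulating forever.
            log.append("step limit reached")
            break
        if op == "write":
            if target in virtual_files:
                log.append(f"modified {target}")
            else:
                log.append(f"created {target}")
            virtual_files[target] = payload
        elif op == "delete":
            if virtual_files.pop(target, None) is not None:
                log.append(f"deleted {target}")
        elif op == "listen":
            log.append(f"listened on port {target}")
    return log


suspect = [
    ("write", "C:/system/kernel.sys", b"patched"),
    ("listen", 6667, None),
]
print(emulate(suspect))
# ['modified C:/system/kernel.sys', 'listened on port 6667']
```

The log of observed behavior is what a heuristic engine would then evaluate against its rules; the program itself never gets the chance to change anything outside the virtual environment.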

Keep in mind that while the term "program" was used above, it does not necessarily mean executable programs such as .COM or .EXE files. A heuristic engine could be examining processes and structures in memory, the data portion (or payload) of packets travelling over a network and so forth.

Likewise, a heuristic engine does not simply scan through files like a classic antivirus program looking for known patterns. It might trace through the instructions in a program before passing the code to the processor for execution, allow the program to run in a virtual environment or "sandbox" and examine the behavior performed in and changes made to the virtual environment and so forth.

The advantage of heuristic analysis of code is that it can detect not just variants (modified forms) of existing malicious programs but new, previously-unknown malicious programs as well. Combined with other ways of looking for malware, such as signature detection, behavioral monitoring and reputation analysis, heuristics can offer impressive accuracy. That is, they can correctly detect a high proportion of real malware while keeping the false positive rate low, which matters because misdiagnosing innocent files as malicious can cause severe problems.
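The two sides of accuracy mentioned above, the proportion of real malware detected and the proportion of innocent files wrongly flagged, can be computed from a list of scan results. The function below is a minimal sketch; the sample data is invented and does not represent any measured product.

```python
def rates(verdicts):
    """Compute (detection rate, false positive rate).

    verdicts is a list of (is_malicious, was_flagged) boolean pairs,
    one pair per scanned file.
    """
    detected = sum(1 for malicious, flagged in verdicts if malicious and flagged)
    missed = sum(1 for malicious, flagged in verdicts if malicious and not flagged)
    false_alarms = sum(1 for malicious, flagged in verdicts if not malicious and flagged)
    correct_clean = sum(1 for malicious, flagged in verdicts if not malicious and not flagged)

    total_malicious = detected + missed
    total_clean = false_alarms + correct_clean
    detection_rate = detected / total_malicious if total_malicious else 0.0
    false_positive_rate = false_alarms / total_clean if total_clean else 0.0
    return detection_rate, false_positive_rate


# Invented results: 4 malicious files (3 flagged), 5 clean files (1 flagged).
results = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False), (False, False),
]
print(rates(results))  # (0.75, 0.2)
```

Both numbers are needed together: a scanner that flags everything would score a perfect detection rate and an unusable false positive rate.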

Understanding how heuristics work can be something of a specialty in the antimalware field. If you are interested and would like to know more about this field, I would suggest the Heuristic Analysis—Detecting Unknown Viruses white paper written by David Harley and Andrew Lee. For a more technical examination of anti-malware technology, Peter Szor’s book The Art of Computer Virus Research and Defense, though several years old, is still worth reading.

Aryeh Goretsky, MVP, ZCSE
Distinguished Researcher

[i] In fact, the process is a lot more sophisticated than that, nowadays, but that is a starting point for understanding the first principles of signature scanning.

Hello.
ESET's virus researcher said that my file (B6D2EB7F009FA46D517E433B9CB5FAF9) is already detected by a generic signature, but NOD32 detects it only by advanced heuristics. Please, explain this.

Randy Abrams

Generic signatures can use a variety of technologies. ESET’s generic signatures are very complex and at times involve the use of advanced heuristics as well. This is part of the way we improve detection while keeping false positives very low.

Yegor

Thanks for the answer.
Another question: how do you fix false positives? Add a checksum to ESET's whitelist or something else?

Aryeh Goretsky

There are numerous ways in which false positive alarms can be fixed, from re-weighting heuristics to checksums and hashes of various types to anti-signatures/whitelisting. The exact method would vary based on the underlying reason(s) that the false positive occurred.

Randy Abrams

I don’t fix false positives, that’s the virus lab’s job :) It really depends upon the false positive. Sometimes it is a modification of a signature that is used to fix a false positive. Sometimes a change in heuristics is required to fix a false positive. Whitelisting is relatively rarely used to fix false positives, but in some cases is warranted.

Matthieu Kaczmarek

Heuristics are also known as unproved methods which seem to work in practice. In other words, heuristics are not silver bullets; they usually yield false positives and false negatives. Those metrics are usually difficult to evaluate; in fact, few people try to evaluate where heuristics are wrong.

We all know that antivirus software cannot be perfect, which is why it is usually updated. Old-fashioned signature detection is a trusted process with 25+ years of experience.

Heuristics are younger. As a security professional, I would need some objective and reproducible metrics on the efficiency of heuristics, so that I can manage risks and decide whether it is worth taking the risk of relying on heuristics or whether I should use only classic signatures on critical hosts.

Aryeh Goretsky

Signature scanning does not pre-date heuristic scanning technology by many years, Matthieu. Around 1990-1991, when I was at McAfee Associates, we began using what we called “fuzzy logic” to detect (and subsequently remove) boot sector and master boot record infecting viruses. If you ever came across a report of a "Generic Boot" or a "Generic MBR" virus with VIRUSCAN and CLEAN-UP, this type of "heuristic" was the underlying technology being used, although we did not use that term then. The first time I actually heard of heuristics was from Fridrik Skulason around 1991. I believe Frans Veldman and Righard Zwienenberg also independently developed heuristics while at ESaSS (ThunderByte).

"Classic signatures" also generate false positive and false negative alarms, as we saw in several high-profile cases last year. Heuristics can be, and in ESET’s case are, used in conjunction with classic signatures. Together, they can actually reduce the chance of a false positive or a false negative alarm.

As far as testing goes, retrospective testing of anti-malware software is one way heuristics can be tested: freeze the signature updates and then run the program against newer malware to see what is detected and what is not. Results will, of course, decay over time, but repeated testing should allow you to determine how well heuristic detection is performing.

Randy Abrams

Heuristics do not usually yield false positives. Signatures usually yield false negatives: because each signature is for a specific threat, it is prone to miss every other threat. AV without heuristics is not viable today. Generic signatures, a type of heuristic approach, are used by virtually every antivirus vendor. It would be foolish to rely on heuristics or signatures alone, or to rely upon any security product alone.

Matthieu Kaczmarek

Hi, thanks for the feedback.

From your post I thought that what you called heuristics was related to program behavior. From my point of view, regular expressions are merely signatures. I agree that signatures yield false positives and false negatives. Nevertheless, the mechanism is known and controlled; i.e., given an instance of a false alarm, it is straightforward to understand the cause of the error. When you're dealing with behaviors, it is considerably more complex.

Concerning ‘Heuristics do not usually yield false positives’: trust is good, control is better. There are hardly any reproducible studies on false positives and false negatives. For example, VB tests are not reproducible since the test cases remain proprietary.

Heuristics are unproved methods inspired by intuition which seem to work in practice. I think that foolishness is about relying on such an obscure technology that is neither controllable nor provable.

Randy Abrams

Heuristics are hardly obscure. Heuristics are used in virtually all antivirus products today. There are ways to test both false positives and false negatives. False positives are easily tested for with an on-demand scan, but just as with signatures, a new program or a change to an existing signature or heuristic may create a false positive at some point. For testing efficacy there is retrospective testing. You freeze a product for a period of time and collect brand new threats that were not known to exist at the time the product was last updated. At the end of the period of time you then scan without updating. Anything detected is a heuristic detection (with very few exceptions) and anything missed is a false negative. When looking at the results of standard testing combined with retrospective testing it becomes pretty clear that the combination of technologies offers significantly better protection.
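The retrospective procedure described in this comment can be outlined roughly as follows. This is an illustrative sketch only: the sample records, the dates and the stand-in `frozen_scan` function are all hypothetical, and a real test would run the actual frozen product against a properly collected sample set.

```python
from datetime import date


def retrospective_rate(samples, freeze_date, frozen_scan):
    """Detection rate of a frozen scanner on threats first seen after the freeze.

    samples      list of dicts with "first_seen" (date) and "data" (bytes)
    freeze_date  date on which updates to the product were stopped
    frozen_scan  callable returning True when the frozen product flags the data
    """
    # Only threats that did not exist at freeze time count as "new".
    new_threats = [s for s in samples if s["first_seen"] > freeze_date]
    if not new_threats:
        return None  # nothing to measure yet
    detected = sum(1 for s in new_threats if frozen_scan(s["data"]))
    return detected / len(new_threats)


def frozen_scan(data):
    """Stand-in for the frozen product; flags a made-up marker."""
    return b"decrypt" in data


samples = [
    {"first_seen": date(2010, 1, 1), "data": b"old threat"},    # known before the freeze
    {"first_seen": date(2010, 3, 1), "data": b"decrypt loop"},  # new and detected
    {"first_seen": date(2010, 3, 5), "data": b"plain"},         # new and missed
]
print(retrospective_rate(samples, date(2010, 2, 1), frozen_scan))  # 0.5
```

Anything counted in `detected` corresponds to a heuristic detection in the sense used above, and the remainder of `new_threats` are the false negatives.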

Roque

Many thanks for all the information... EXCELLENT!

Pedro Torres

Can heuristics trigger a detection when a similarity to a generic signature (wildcard) is found?