9
Algorithm: Aho-Corasick
• Developed in 1975 for bibliographic search
• Based on a finite automaton (graph)
  o Circles are search states
  o Edges are transitions
  o Double circles are final states (output)
• Plus a failure function
  o What to do when there is no suitable transition
  o I.e., where to resume “matching”

10
Algorithm: Aho-Corasick
• When virus scanning, we search for a virus signature, which is a bit string
• For simplicity, illustrate the algorithm using English words
• For our example, scan for any of the following words:
  o hi, hips, hip, hit, chip

11
Algorithm: Aho-Corasick

12
Aho-Corasick Example

13
Algorithm: Aho-Corasick
• How to construct the automaton?
  o And the failure function
• Build the automaton (next slide)
  o A “trie”, also known as a “prefix tree”
• Then determine the failure function
  o Two slides ahead

15
Aho-Corasick: Failure Function
• Depth 1 nodes
  o Fail goes back to the start state
• For other states
  o Go back to the earliest place where the search can resume
  o Pseudo-code is in the book
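The construction and failure function can be sketched in a few lines of Python. This is a minimal illustration, not the book's pseudo-code: the names `build_automaton` and `search` and the list-of-dicts representation are assumptions made for this example. The failure link of a state points to the longest proper suffix of its path that is also a path from the start state, and it is computed breadth-first so that shallower states are finished first.

```python
from collections import deque

def build_automaton(words):
    """Build the trie, then the failure links (names are illustrative)."""
    goto = [{}]        # goto[state][char] -> next state; state 0 is the start
    output = [set()]   # words recognized on reaching each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())   # depth-1 states fail back to the start
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A match ending at the failure state also ends here.
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, automaton):
    """Return (start_index, word) for every match, in one pass over text."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]       # no transition: resume elsewhere
        state = goto[state].get(ch, 0)
        for word in output[state]:
            hits.append((i - len(word) + 1, word))
    return hits

automaton = build_automaton(["hi", "hips", "hip", "hit", "chip"])
print(sorted(search("chips", automaton)))
```

Scanning "chips" reports chip, hi, hip, and hips, even though the input is read only once.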

16
Aho-Corasick
• The bottom line…
• Linear search that can find multiple signatures
  o Like searching in parallel for related signatures
• Efficient representation of the automaton is the challenge
  o Both time and space issues

20
Algorithm: Veldman
• Veldman allows for wildcards and complex signatures
  o Aho-Corasick does not
• But both algorithms analyze every byte of input
• Is it possible to do better?
  o That is, can we skip some of the input?
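To see how skipping input is possible at all, here is a hedged sketch of one well-known skip-ahead technique, the Boyer-Moore-Horspool bad-character shift, for a single signature. The slide above does not name a technique, so this is only an illustration of the idea; the function name `horspool_find` is an assumption made for this example.

```python
def horspool_find(pattern: bytes, data: bytes) -> int:
    """Return the index of the first match of pattern in data, or -1."""
    m, n = len(pattern), len(data)
    if m == 0:
        return 0
    # For each byte in the pattern (except its last position), how far
    # the window may slide when that byte is aligned with the window's end.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if data[pos:pos + m] == pattern:
            return pos
        # A byte that never occurs in the pattern lets us skip m bytes.
        pos += shift.get(data[pos + m - 1], m)
    return -1

print(horspool_find(b"hips", b"a chip ships"))
```

When the byte under the end of the window does not occur in the signature at all, the window jumps a full signature length, so many input bytes are never examined.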

30
• Here, we illustrated the simplest form of the algorithm
• More advanced forms can handle tens of thousands of signatures
• Worst-case performance is terrible
  o Sequential search through every byte of input for every signature…
• But tests show it’s good in practice

31
Testing
• How can we know if a scanner works?
• Test on live viruses?
  o Might not be a good idea
• EICAR standard antivirus test file
  o Not too useful either
• So, what to do?
  o Author doesn’t have any suggestions!

32
Improving Performance
• “Grunt scanning”: scan everything
  o Slow slow slow
• Search only the beginning and end of files
• Scan the code entry point
  o And points reachable from the entry point
• If the position of the virus in the file is known…
  o Make it part of the “signature”
• Limit scans to the size of the virus(es)
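Two of these ideas are easy to sketch: searching only the beginning and end of a file, and treating a known position as part of the signature. The function names and the 4096-byte default window below are assumptions chosen for illustration, not values from the slides.

```python
def scan_head_and_tail(data: bytes, signatures, window: int = 4096) -> set:
    """Return the signatures found in the first or last `window` bytes."""
    head = data[:window]
    tail = data[-window:]
    return {sig for sig in signatures if sig in head or sig in tail}

def match_at_offset(data: bytes, offset: int, signature: bytes) -> bool:
    """Position-aware signature: compare only at one known offset."""
    return offset >= 0 and data[offset:offset + len(signature)] == signature

appended = b"A" * 10000 + b"EVIL"               # code appended to the file
buried = b"A" * 5000 + b"EVIL" + b"A" * 5000    # code in the middle
print(scan_head_and_tail(appended, [b"EVIL"]))
print(scan_head_and_tail(buried, [b"EVIL"]))
```

The second call finds nothing, which shows the trade-off: the scan is fast because it reads little, and it misses anything outside the two windows.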

33
Improving Performance
• Only scan certain types of files
  o Not so viable today
• Only rescan files that have changed
  o How to detect change?
  o Where to store this info? Cache? Database? Tagged to file?
  o Updates to signatures? Must rescan…
  o How to checksum efficiently?

34
Improving Performance
• How to checksum efficiently?
  o Checksumming the entire file might take longer than scanning it
  o Only checksum the parts that are scanned
• How to avoid checksum tampering?
  o Encrypt? Where to store the key?
  o Checksum the checksums?
  o Other?
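One possible shape for such a cache is sketched below, under several assumptions: the class name `ScanCache`, the in-memory dictionary, and the choice of a keyed HMAC-SHA256 tag as the tamper-resistant "checksum" are all illustrative, not the slides' design. Only the regions a head-and-tail scanner would read are checksummed, and a change in the signature database version forces a rescan. The question of where to keep the key safely is left open, as it is on the slide.

```python
import hashlib
import hmac

class ScanCache:
    """Remember which files were clean under which signature version."""

    def __init__(self, key: bytes, window: int = 4096):
        self.key = key          # secret key; safe storage is not solved here
        self.window = window
        self.entries = {}       # path -> (signature_db_version, tag)

    def _tag(self, data: bytes) -> bytes:
        # Checksum only the scanned regions, plus the length so that
        # growth in the unscanned middle is still noticed.
        region = (len(data).to_bytes(8, "big")
                  + data[:self.window] + data[-self.window:])
        return hmac.new(self.key, region, hashlib.sha256).digest()

    def needs_scan(self, path: str, data: bytes, db_version: int) -> bool:
        entry = self.entries.get(path)
        if entry is None:
            return True                 # never scanned
        cached_version, cached_tag = entry
        if cached_version != db_version:
            return True                 # signatures updated: must rescan
        return not hmac.compare_digest(cached_tag, self._tag(data))

    def record_clean(self, path: str, data: bytes, db_version: int) -> None:
        self.entries[path] = (db_version, self._tag(data))

cache = ScanCache(key=b"example-key")
cache.record_clean("a.exe", b"hello", db_version=1)
print(cache.needs_scan("a.exe", b"hello", db_version=1))
print(cache.needs_scan("a.exe", b"hello", db_version=2))
```

Because the tag is keyed, code that alters a file cannot recompute a matching tag without the key, which is the property a plain checksum lacks.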

45
Behavior Monitor/Blocker
• “Care must be taken… because anomalous behavior does not automatically imply viral behavior”
  o That’s an understatement!
• This is the fundamental problem in anomaly detection
  o Potential for lots of false positives

49
Emulation
• Emulation and polymorphic detection
  o Let the virus decrypt itself
  o Then use an ordinary signature scan
• When has decryption occurred?
  o Use some heuristics…
  o Execution of code that was modified (decrypted), or code in such a memory location
  o More than N bytes of modified code, etc.

52
Memory Emulation
• This could be difficult…
  o 32-bit addressing, so 4 GB of “memory”
• Do we need to emulate all of this?
  o No, most apps use only a small amount
• Keep track of memory that’s modified and where it is located
  o Only need to deal with memory that is modified by a specific app/virus
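A sparse store is one way to avoid modeling the whole address space: keep the original program image read-only and record only the bytes the emulated code writes. The sketch below is a toy model; the class name `SparseMemory`, the zero-fill for unmapped reads, and the `looks_decrypted` helper with its threshold are assumptions made for illustration. The helper combines the two decryption heuristics mentioned earlier: execution reaching modified memory, and at least N modified bytes.

```python
class SparseMemory:
    """Original image plus a dictionary of bytes written during emulation."""

    def __init__(self, image: bytes, base: int = 0):
        self.image = image
        self.base = base
        self.modified = {}   # address -> byte value written by emulated code

    def read(self, addr: int) -> int:
        if addr in self.modified:
            return self.modified[addr]
        offset = addr - self.base
        if 0 <= offset < len(self.image):
            return self.image[offset]
        return 0             # unmapped memory reads as zero in this toy model

    def write(self, addr: int, value: int) -> None:
        self.modified[addr] = value & 0xFF

def looks_decrypted(mem: SparseMemory, pc: int, min_modified: int = 16) -> bool:
    """Heuristic: the program counter is in modified memory and enough
    bytes have been modified (the threshold is an arbitrary example)."""
    return pc in mem.modified and len(mem.modified) >= min_modified

mem = SparseMemory(b"\x90\x90\x90", base=0x400000)
mem.write(0x400001, 0xCC)
print(hex(mem.read(0x400000)), hex(mem.read(0x400001)), len(mem.modified))
```

Storage grows with the number of bytes written, not with the 4 GB address space.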

54
Emulation Controller
• When does emulation stop?
  o Can’t expect to run code to completion…
• Use heuristics to decide when to stop
  o Number of instructions?
  o Amount of time?
  o Threshold on percent of instructions that modify memory?
  o “Stoppers”? E.g., assume a virus wouldn’t write output before being malicious
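These stopping rules can be combined into a single check that the emulator calls periodically. The sketch below is hypothetical throughout: the function name, every threshold, and the warm-up period are invented for illustration. It also assumes that a low share of memory-writing instructions suggests no decryption loop is running, which is one possible reading of the threshold bullet.

```python
def should_stop(executed: int, writes: int, elapsed_seconds: float,
                hit_stopper: bool, *, max_instructions: int = 100_000,
                max_seconds: float = 1.0, min_write_ratio: float = 0.05,
                warmup: int = 1_000):
    """Return a reason string if emulation should stop, else None."""
    if hit_stopper:
        return "stopper"              # e.g. the code produced output
    if executed >= max_instructions:
        return "instruction limit"
    if elapsed_seconds >= max_seconds:
        return "time limit"
    # Judge the write ratio only after a warm-up period.
    if executed >= warmup and writes / executed < min_write_ratio:
        return "few memory writes"
    return None

print(should_stop(5_000, 10, 0.1, False))
print(should_stop(5_000, 1_000, 0.1, False))
```

The first call stops because almost no instructions wrote to memory; the second keeps going.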

64
Detection: Bottom Line
• Static analysis is fast
  o Good approach when it works
• Dynamic analysis can “peel away a layer of obfuscation”
  o Dynamic analysis is relatively costly

65
Verification, Quarantine, Disinfect
• What to do after a virus is detected?
  1. Verify that it really is a virus
  2. Quarantine infected code
  3. Disinfect (remove the infection)
• These are done rarely, so they can be slow and costly in comparison to detection

69
Quarantine
• Isolate the detected virus from the system
  o Then ask the user if it’s OK to disinfect
  o Or do further analysis of the virus
• How to quarantine a virus?
  o Copy to a “quarantine” directory?
  o Hide it in an “invisible” location?
  o Encrypt it?
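As a rough sketch of the "copy and encrypt" options, the code below moves a file into a quarantine directory and XORs its bytes with a fixed key. The function names, the `.quarantined` suffix, and the key value are assumptions for this example. A single-byte XOR is obfuscation, not real encryption; the point is only that the stored copy can no longer be executed by accident and can still be restored.

```python
from pathlib import Path

def _xor(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

def quarantine(path, quarantine_dir, key: int = 0xA5) -> Path:
    """Store an obfuscated copy in quarantine_dir and delete the original."""
    src = Path(path)
    dest_dir = Path(quarantine_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.name + ".quarantined")
    dest.write_bytes(_xor(src.read_bytes(), key))
    src.unlink()
    return dest

def restore(quarantined, target, key: int = 0xA5) -> None:
    """Undo quarantine, e.g. after a false positive is confirmed."""
    Path(target).write_bytes(_xor(Path(quarantined).read_bytes(), key))
```

A call such as `quarantine("bad.bin", "quarantine/")` returns the path of the stored copy, which `restore` can later turn back into the original bytes.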

73
Disinfect
• Disinfection methods…
• Use the virus to disinfect
  o Stealth virus may give original code
• Generic disinfection
  o Virus may restore code when executed
  o Might be dangerous to run virus code…
  o …emulation is a better strategy, maybe even disinfect as part of detection

75
Virus Databases
• How to update the database/signatures?
  o Push or pull?
  o Automatic or manual?
  o How often to update?
  o How to distribute updates?
  o Distribute the entire database or deltas?
• Also must be able to update the AV software

81
Macro Viruses
• One redeeming feature…
• They operate in a restricted domain
  o So it is easier to determine “normal”
  o Reduces the number of false positives
• Most/all are not parasitic
  o More like companion viruses
• All the usual detection techniques can be applied

82
Macro Viruses: Disinfection
• Delete all macros in the infected document
• Delete all associated macros
• Delete a macro if in doubt (heuristic)
• Use emulation to find all macros used by the infected macro, and delete them
• Basic idea?
  o Err on the side of caution/deletion
• Macro viruses are not so common today