Every antivirus or security suite product promises to protect you from a horde of security risks and annoyances. But do they work? When evaluating these products for review, we put their claims to the test in many different ways. Each review reports the results of our tests, as well as hands-on experience with the product. This article will dig deeper, explaining just how these tests work.

Of course, not every test is appropriate for every product. Many antivirus utilities include protection against phishing, but some don't. Most suites include spam filtering, but some omit this feature, and some antivirus products add it as a bonus. Whatever features a given product offers, we put them to the test.

Testing Real-Time Antivirus

Every full-powered antivirus tool includes an on-demand scanner to seek out and destroy existing malware infestations and a real-time monitor to fend off new attacks. In the past, we've actually maintained a collection of malware-infested virtual machines to test each product's ability to remove existing malware. Advances in malware coding made testing with live malware too dangerous, but we can still exercise each product's real-time protection.

Each year in early spring, when most security vendors have finished their yearly update cycle, we gather a new collection of malware samples for this test. We start with a feed of the latest malware-hosting URLs, download hundreds of samples, and winnow them down to a manageable number.

We analyze each sample using various hand-coded tools. Some of the samples detect when they're running in a virtual machine and refrain from malicious activity; we simply don't use those. We look for a variety of different types, and for samples that make changes to the file system and Registry. With some effort, we pare the collection down to about 30, and record exactly what system changes each sample makes.

To test a product's malware-blocking abilities, we download a folder of samples from cloud storage. Real-time protection in some products kicks in immediately, wiping out known malware. If necessary to trigger real-time protection, we single-click each sample, or copy the collection to a new folder. We take note of how many samples the antivirus eliminates on sight.

Next, we launch each remaining sample and note whether the antivirus detected it. We record the total percentage detected, regardless of when detection happened.

Detection of a malware attack isn't sufficient; the antivirus must actually prevent the attack. A small in-house program checks the system to determine whether the malware managed to make any Registry changes or install any of its files. In the case of executable files, it also checks whether any of those processes are actually running. And as soon as measurement is complete, we shut down the virtual machine.

If a product prevents installation of all executable traces by a malware sample, it earns 8, 9, or 10 points, depending on how well it prevented cluttering the system with non-executable traces. Detecting malware but failing to prevent installation of executable components gets half-credit, 5 points. Finally, if, despite the antivirus's attempt at protection, one or more malware processes is actually running, that's worth a mere 3 points. The average of all these scores becomes the product's final malware-blocking score.
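As a rough sketch (the actual in-house tooling isn't published, so the function and field names here are illustrative assumptions), the scoring scheme above works out to something like this:

```python
# Sketch of the malware-blocking scoring scheme described above.
# Names, and the 0-2 "clutter" scale for non-executable traces,
# are assumptions for illustration only.

def sample_score(detected, exe_blocked, process_ran, clutter):
    """Score one malware sample on the 10-point scale.

    clutter: 0 (no non-executable traces left) to 2 (many traces);
    it only matters when all executable traces were blocked.
    """
    if not detected:
        return 0              # missed entirely
    if process_ran:
        return 3              # a malware process was actually running
    if not exe_blocked:
        return 5              # detected, but executables got installed
    return 10 - clutter       # 10, 9, or 8 depending on leftover clutter

def malware_blocking_score(results):
    """Average the per-sample scores into the product's final score."""
    scores = [sample_score(**r) for r in results]
    return sum(scores) / len(scores)
```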

Testing Malicious URL Blocking

The best time to annihilate malware is before it ever reaches your computer. Many antivirus products integrate with your browsers and steer them away from known malware-hosting URLs. If protection doesn't kick in at that level, there's always an opportunity to wipe out the malware payload during or immediately after download.

While our basic malware-blocking test uses the same set of samples for a season, the malware-hosting URLs we use to test Web-based protection are different every time. We get a feed of the very newest malicious URLs from London-based MRG-Effitas and typically use URLs that are no more than a day old.

Using a small purpose-built utility, we go down the list, launching each URL in turn. We discard any that don't actually point to a malware download, and any that return error messages. For the rest, we note whether the antivirus prevents access to the URL, wipes out the download, or does nothing. After recording the result, the utility jumps to the next URL in the list that isn't at the same domain. We skip any file larger than 5MB, and any file that has already appeared in the same test. We keep at it until we've accumulated data for at least 100 verified malware-hosting URLs.
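The skip rules and tally can be sketched as a loop over the URL feed. This is a simplified stand-in for the real utility; the record fields and outcome labels are assumptions for illustration:

```python
# Simplified sketch of the malicious-URL test loop described above.
# Feed entries are dicts with hypothetical keys: url, error,
# is_malware, size, sha256, and an observed outcome label.
from urllib.parse import urlparse

MAX_SIZE = 5 * 1024 * 1024  # skip downloads larger than 5MB

def run_url_test(feed, needed=100):
    """Walk the feed, applying the skip rules, and return the
    percentage of verified URLs where the download was blocked."""
    results, seen_hashes, last_domain = [], set(), None
    for entry in feed:
        domain = urlparse(entry["url"]).netloc
        if domain == last_domain:
            continue                      # jump past same-domain repeats
        if entry["error"] or not entry["is_malware"]:
            continue                      # dead link, or not a real malware download
        if entry["size"] > MAX_SIZE or entry["sha256"] in seen_hashes:
            continue                      # too big, or a file we've already seen
        seen_hashes.add(entry["sha256"])
        last_domain = domain
        results.append(entry["outcome"])
        if len(results) >= needed:
            break
    blocked = sum(r in ("blocked_url", "wiped_download") for r in results)
    return 100.0 * blocked / len(results)
```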

The score in this test is simply the percentage of URLs for which the antivirus prevented downloading malware, whether by cutting off access to the URL completely or by wiping out the downloaded file. Scores vary widely, but the very best security tools manage 90 percent or more.

Testing Phishing Detection

Why resort to elaborate data-stealing Trojans, when you can just trick people into giving up their passwords? That's the mindset of malefactors who create and manage phishing websites. These fraudulent sites mimic banks and other sensitive sites. If you enter your login credentials, you've just given away the keys to the kingdom. And phishing is platform-independent; it works on any operating system that supports browsing the Web.

These fake websites typically get blacklisted not long after their creation, so for testing we use only the very newest phishing URLs. We gather these from phishing-oriented websites, favoring those that have been reported as frauds but not yet verified. This forces security programs to use real-time analysis rather than relying on simple-minded blacklists.

Symantec's Norton Security has long been an outstanding detector of such frauds. Since the actual URLs used differ in every test, we report results as the difference between a product's detection rate and Norton's. We also compare the detection rate with that of the phishing protection built into Chrome, Firefox, and Internet Explorer.

We use five computers (most of them virtual machines) for this test, one protected by Norton, one by the product under testing, and one each using the three browsers alone. A small utility program launches each URL in the five browsers. If any of the five returns an error message, we discard that URL. If the resulting page doesn't actively attempt to imitate another site, or doesn't attempt to capture username and password data, we discard it. For the rest, we record whether or not each product detected the fraud.
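The differential reporting can be expressed as a small calculation over the per-URL records. A minimal sketch, assuming hypothetical field names (the real tooling isn't published):

```python
# Sketch of the phishing-test bookkeeping described above.
# Each trial is one URL: a dict with is_phish (did it really try to
# imitate a site and capture credentials?) and per-product results
# of True (fraud detected), False (missed), or 'error'.

def phishing_differentials(trials, baseline="norton"):
    """Return each product's detection rate minus the baseline's,
    in percentage points, over the URLs that survived vetting."""
    valid = [t for t in trials
             if t["is_phish"] and "error" not in t["results"].values()]
    def rate(name):
        return sum(t["results"][name] for t in valid) / len(valid)
    base = rate(baseline)
    return {name: 100.0 * (rate(name) - base)
            for name in valid[0]["results"] if name != baseline}
```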

In many cases, the product under testing can't even do as well as the built-in protection of some browsers. Only a very few products come close to matching Norton's detection rate.

Testing Spam Filtering

These days email accounts for most consumers have the spam vacuumed out of them by the email provider, or by a utility running on the email server. In fact, the need for spam filtering is steadily dwindling. Austrian test lab AV-Comparatives tested antispam functionality a few years ago, finding that even Microsoft Outlook alone blocked almost 90 percent of spam, and most suites did better, some of them much better. The lab does not even promise to continue testing consumer-facing spam filters, noting that "several vendors are thinking of removing the antispam feature from their consumer security products."

In the past, we ran our own antispam tests using a real-world account that gets both spam and valid mail. The process of downloading thousands of messages and manually analyzing the contents of the Inbox and spam folder took more time and effort than any of the other hands-on tests. Expending maximal effort on a feature of minimal importance no longer makes sense.

There are still important points to report about a suite's spam filter. What email clients does it support? Can you use it with an unsupported client? Is it limited to POP3 email accounts, or does it also handle IMAP, Exchange, or even Web-based email? Going forward, we'll carefully consider each suite's antispam capabilities, but we will no longer be downloading and analyzing thousands of emails.

Testing Security Suite Performance

When your security suite is busily watching for malware attacks, defending against network intrusions, preventing your browser from visiting dangerous websites, and so on, it's clearly using some of your system's CPU and other resources to do its job. Some years ago, security suites got the reputation for sucking up so much of your system resources that your own computer use was affected. Things are a lot better these days, but we still run some simple tests to get an insight into each suite's effect on system performance.

Security software needs to load as early in the boot process as possible, lest it find malware already in control. But users don't want to wait around any longer than necessary to start using Windows after a reboot. Our test script runs immediately after boot and starts asking Windows to report the CPU usage level once per second. After 10 seconds in a row with CPU usage no more than 5 percent, it declares the system ready for use. Subtracting the start of the boot process (as reported by Windows), we know how long the boot process took. We run many repetitions of this test and compare the average with that of many repetitions when no suite was present.
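The ready-for-use detection boils down to finding a quiet stretch in the once-per-second readings. A minimal sketch of that logic, separated from the Windows-specific measurement:

```python
# Sketch of the boot-test readiness rule described above: the system
# counts as ready after 10 consecutive one-second CPU readings at or
# below 5 percent. Input is the list of readings, starting at boot.

def ready_time(cpu_samples, threshold=5.0, quiet_seconds=10):
    """Return the second (1-based) at which the quiet run completes,
    or None if the system never settled during the samples given."""
    quiet = 0
    for second, usage in enumerate(cpu_samples, start=1):
        quiet = quiet + 1 if usage <= threshold else 0
        if quiet >= quiet_seconds:
            return second
    return None
```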

In truth, you probably reboot no more than once per day. A security suite that slowed everyday file operations might have a more significant impact on your activities. To check for that kind of slowdown, we time a script that moves and copies a large collection of large-to-huge files between drives. Averaging several runs with no suite and several runs with the security suite active, we can determine just how much the suite slowed these file activities. A similar script measures the suite's effect on a script that zips and unzips the same file collection.
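The slowdown figure itself is a simple comparison of averaged run times. A sketch, with a stand-in for one pass of the file-copy script:

```python
# Sketch of the file-operation timing comparison described above.
import shutil
import statistics
import time

def timed_copy(src, dst):
    """Time one file-copy pass (a stand-in for the full script that
    moves, copies, zips, and unzips the file collection)."""
    start = time.perf_counter()
    shutil.copy(src, dst)
    return time.perf_counter() - start

def slowdown_percent(baseline_runs, suite_runs):
    """Percent slowdown: average time with the suite active versus
    the average with no suite installed."""
    base = statistics.mean(baseline_runs)
    suite = statistics.mean(suite_runs)
    return 100.0 * (suite - base) / base
```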

The average slowdown in these three tests by the suites with the very lightest touch can be as low as 1 percent. At the other end of the spectrum, a very few suites average 25 percent, or even more. You might actually notice the impact of the more heavy-handed suites.

Testing Firewall Protection

It's not as easy to quantify a firewall's success, because different vendors have different ideas about just what a firewall should do. Even so, there are a number of tests we can apply to most of them.

Typically a firewall has two jobs, protecting the computer from outside attack and ensuring that programs don't misuse the network connection. To test protection against attack, we use a physical computer that connects through the router's DMZ port. This gives the effect of a computer connected directly to the Internet. That's important for testing, because a computer that's connected through a router is effectively invisible to the Internet at large. We hit the test system with port scans and other Web-based tests. In most cases we find that the firewall completely hides the test system from these attacks, putting all ports in stealth mode.

The built-in Windows firewall handles stealthing all ports, so this test is just a baseline. But even here, there are different opinions. Kaspersky's designers don't see any value in stealthing ports as long as the ports are closed and the firewall actively prevents attack.

Program control in the earliest personal firewalls was extremely hands-on. Every time an unknown program tried to access the network, the firewall popped up a query asking the user whether or not to allow access. This approach isn't very effective, since the user generally has no idea what action is correct. Most will just allow everything. Others will click Block every time, until they break some important program; after that they allow everything. We perform a hands-on check of this functionality using a tiny browser utility coded in-house, one that will always qualify as an unknown program.

Some malicious programs attempt to get around this kind of simple program control by manipulating or masquerading as trusted programs. When we encounter an old-school firewall, we test its skills using utilities called leak tests. These programs use the same techniques to evade program control, but without any malicious payload. We do find fewer and fewer leak tests that still work under modern Windows versions.

At the other end of the spectrum, the best firewalls automatically configure network permissions for known good programs, eliminate known bad programs, and step up surveillance on unknowns. If an unknown program attempts a suspicious connection, the firewall kicks in at that point to stop it.

Software isn't and can't be perfect, so the bad guys work hard to find security holes in popular operating systems, browsers, and applications. They devise exploits to compromise system security using any vulnerabilities they find. Naturally the maker of the exploited product issues a security patch as soon as possible, but until you actually apply that patch, you're vulnerable.

The smartest firewalls intercept these exploit attacks at the network level, so they never even reach your computer. Even for those that don't scan at the network level, in many cases the antivirus component wipes out the exploit's malware payload. We use the CORE Impact penetration tool to hit each test system with about 30 recent exploits and record how well the security product fended them off.

Finally, we run a sanity check to see whether a malware coder could easily disable security protection. We look for an on/off switch in the Registry and test whether it can be used to turn off protection (though it's been years since we found a product vulnerable to this attack). We attempt to terminate security processes using Task Manager. And we check whether it's possible to stop or disable the product's essential Windows services.

Testing Parental Control

Parental control and monitoring covers a wide variety of programs and features. The typical parental control utility keeps kids away from unsavory sites, monitors their Internet usage, and lets parents determine when and for how long the kids are allowed to use the Internet each day. Other features range from limiting chat contacts to patrolling Facebook posts for risky topics.

We always perform a sanity check to make sure the content filter actually works. As it turns out, finding porn sites for testing is a snap. Just about any URL composed of a size adjective and the name of a normally-covered body part is already a porn site. Very few products fail this test.

We use a tiny in-house browser utility to verify that content filtering is browser independent. We issue a three-word network command (no, I'm not publishing it here) that disables some simple-minded content filters. And we check whether we can evade the filter by using a secure anonymizing proxy website.

Imposing time limits on the children's computer or Internet use is only effective if the kids can't interfere with timekeeping. We verify that the time-scheduling feature works, then try evading it by resetting the system date and time. The best products don't rely on the system clock for their date and time.

After that, it's simply a matter of testing the features that the program claims to have. If it promises the ability to block use of specific programs, we engage that feature and try to break it by moving, copying, or renaming the program. If it says it strips out bad words from email or instant messaging, we add a random word to the block list and verify that it doesn't get sent. If it claims it can limit instant messaging contacts, we set up a conversation between two of our accounts and then ban one of them. Whatever control or monitoring power the program promises, we do our best to put it to the test.

Interpreting Antivirus Lab Tests

We don't have the resources to run the kind of exhaustive antivirus tests performed by independent labs around the world, so we pay close attention to their findings. We follow two labs that issue certifications and four labs that release scored test results on a regular basis, using their results to help inform our reviews.

ICSA Labs and West Coast Labs offer a wide variety of security certification tests. We specifically follow their certifications for malware detection and for malware removal. Security vendors pay to have their products tested, and the process includes help from the labs to fix any problems preventing certification. What we're looking at here is the fact that the lab found the product significant enough to test, and the vendor was willing to pay for testing.

Based in Magdeburg, Germany, the AV-Test Institute continuously puts antivirus programs through a variety of tests. The one we focus on is a three-part test that awards up to 6 points in each of three categories: Protection, Performance, and Usability. To reach certification, a product must earn a total of 10 points with no zeroes. The very best products take home a perfect 18 points in this test.
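The certification rule compresses to one check over the three category scores:

```python
# AV-Test certification rule as described above: up to 6 points per
# category, at least 10 total, and no category may score zero.

def avtest_certified(protection, performance, usability):
    scores = (protection, performance, usability)
    return sum(scores) >= 10 and all(s > 0 for s in scores)
```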

To test protection, the researchers expose each product to AV-Test's reference set of over 100,000 samples, and to several thousand extremely widespread samples. Products get credit for preventing the infestation at any stage, be it blocking access to the malware-hosting URL, detecting the malware using signatures, or preventing the malware from running. The best products often reach 100 percent success in this test.

Performance is important—if the antivirus noticeably puts a drag on system performance, some users will turn it off. AV-Test's researchers measure the difference in time required to perform 13 common system actions with and without the security product present. Among these actions are downloading files from the Internet, copying files both locally and across the network, and running common programs. Averaging multiple runs, they can identify just how much impact each product has.

The Usability test isn't necessarily what you'd think. It has nothing to do with ease of use or user interface design. Rather, it measures the usability problems that occur when an antivirus program erroneously flags a legitimate program or website as malicious, or suspicious. Researchers actively install and run an ever-changing collection of popular programs, noting any odd behavior by the antivirus. A separate scan-only test checks to make sure the antivirus doesn't identify any of over 600,000 legitimate files as malware.

We gather results from four (previously five) of the many tests regularly released by AV-Comparatives, which is based in Austria and works closely with the University of Innsbruck. Security tools that pass a test receive Standard certification; those that fail are designated as merely Tested. If a program goes above and beyond the necessary minimum, it can earn Advanced or Advanced+ certification.

AV-Comparatives's file detection test is a simple, static test that checks each antivirus against about 100,000 malware samples, with a false-positives test to ensure accuracy. And the performance test, much like AV-Test's, measures any impact on system performance. We previously also included the heuristic/behavioral test, but that test has been dropped.

We consider AV-Comparatives's dynamic whole-product test to be the most significant. This test aims to simulate as closely as possible an actual user's experience, allowing all components of the security product to take action against the malware. Finally, the remediation test starts with a collection of malware that all tested products are known to detect and challenges the security products to restore an infested system, completely removing the malware.

Where AV-Test and AV-Comparatives typically include 20 to 24 products in testing, SE Labs generally reports on no more than 10. That's in large part because of the nature of this lab's test. Researchers capture real-world malware-hosting websites and use a replay technique so that each product encounters precisely the same drive-by download or other Web-based attack. It's extremely realistic, but arduous.

A program that totally blocks one of these attacks earns three points. If it took action after the attack began but managed to remove all executable traces, that's worth two points. And if it merely terminated the attack, without full cleanup, it still gets one point. In the unfortunate event that the malware runs free on the test system, the product under testing loses five points. Because of this, some products have actually scored below zero.
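That point scheme, with its negative weighting for a full compromise, can be written down directly (outcome labels here are descriptive, not SE Labs' official terminology):

```python
# SE Labs protection scoring as described above. A compromise costs
# five points, which is why totals can fall below zero.
SE_LABS_POINTS = {
    "blocked": 3,       # attack stopped entirely
    "neutralized": 2,   # attack began, but all executable traces removed
    "terminated": 1,    # attack stopped, without full cleanup
    "compromised": -5,  # malware ran free on the test system
}

def protection_score(outcomes):
    """Sum the per-attack points for one product."""
    return sum(SE_LABS_POINTS[o] for o in outcomes)
```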

In a separate test, the researchers evaluate how well each product refrains from erroneously identifying valid software as malicious, weighting the results based on each valid program's prevalence, and on how much of an impact the false positive identification would have. They combine the results of these two tests and certify products at one of five levels: AAA, AA, A, B, and C.

For some time we've used a feed of samples supplied by MRG-Effitas in our hands-on malicious URL blocking test. This lab also releases quarterly results for two particular tests that we follow. The 360 Assessment & Certification test simulates real-world protection against current malware, similar to the dynamic real-world test used by AV-Comparatives. A product that completely prevents any infestation by the sample set receives Level 1 certification. Level 2 certification means that at least some of the malware samples planted files and other traces on the test system, but these traces were eliminated by the time of the next reboot. The Online Banking Certification very specifically tests for protection against financial malware and botnets.

Coming up with an overall summary of lab results isn't easy, since the labs don't all test the same collection of programs. We've devised a system that normalizes each lab's scores to a value from 0 to 10. Our aggregate lab results chart reports the average of these scores, the number of labs testing, and the number of certifications received. If just one lab includes a product in testing, we consider that to be insufficient information for an aggregate score.
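In outline, the aggregation looks like this; the per-lab normalizing functions here are placeholders, since the actual mappings aren't published:

```python
# Sketch of the aggregate lab score described above: normalize each
# lab's raw result to a 0-10 value, then average. The normalizers
# passed in are hypothetical stand-ins for the real mappings.

def aggregate_lab_score(lab_scores, normalizers):
    """lab_scores: {lab_name: raw_score}
    normalizers: {lab_name: function mapping a raw score to 0-10}
    Returns None when fewer than two labs tested the product."""
    if len(lab_scores) < 2:
        return None   # one lab's result is insufficient for an aggregate
    normalized = [normalizers[lab](raw) for lab, raw in lab_scores.items()]
    return sum(normalized) / len(normalized)
```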

About the Author

Neil Rubenking served as vice president and president of the San Francisco PC User Group for three years when the IBM PC was brand new. He was present at the formation of the Association of Shareware Professionals, and served on its board of directors. In 1986, PC Magazine brought Neil on board to handle the torrent of Turbo Pascal tips.
