Google issued a new study on Wednesday detailing how it is becoming more difficult to identify malicious websites and attacks, with antivirus software proving to be an ineffective defense against new ones.

The company’s engineers analyzed four years’ worth of data, comprising 8 million websites and 160 million web pages, from its Safe Browsing service. The service is an API (application programming interface) that feeds data into Google’s Chrome browser and Firefox and warns users when they hit a website loaded with malware.

Google said it displays 3 million warnings of unsafe websites to 400 million users a day. The company scans the Web, using several methods to figure out if a site is malicious.

“Like other service providers, we are engaged in an arms race with malware distributors,” according to a blog post from Google’s security team.

That detection process is becoming more difficult due to a variety of evasion techniques employed by attackers that are designed to stop their websites from being flagged as bad, according to the report.

The company uses a variety of methods to detect dangerous sites. It can test a site against a “virtual machine honeypot,” which is a virtual machine that visits a website and notes its behavior. For the same purpose it also uses browser emulators, which record an attack sequence. The browser emulator is an HTML parser and a modified open-source JavaScript engine.

Other methods include ranking a website’s reputation based on its hosting infrastructure. Antivirus software is another line of defense.

Many malicious sites are rigged to automatically deliver an exploit and execute an attack if an unpatched software program is found. One of the ways hackers get around VM-based detection is to instead require the victim to perform a mouse click.

Google describes it as a kind of social engineering attack, since the malicious payload appears only after a person interacts with the browser. Google is working around the issue by configuring its virtual machines to do a mouse click.

Browser emulators can be confused by attacks when the malicious code is scrambled, a method known as obfuscation. Since the browser emulator isn’t a real browser, it won’t necessarily execute the obfuscated JavaScript code in the same way as a real browser. The only explanation for the more complex JavaScript is that it is designed to halt emulated browsers and make manual analysis of the code more difficult, the engineers wrote.
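As a simplified illustration of why scrambled code is harder to inspect, the sketch below runs a naive text scan over a harmless script before and after encoding it. This is not Google’s emulator: the token list, the `static_scan` function and the percent-encoding scheme are all assumptions chosen for the example, and real obfuscation is far more elaborate.

```python
from urllib.parse import unquote

# Hypothetical watch list for a naive text-based scan (an assumption for this sketch).
SUSPICIOUS_TOKENS = ("document.write", "eval(")


def percent_encode_all(text: str) -> str:
    """Encode every byte as %xx, a simple stand-in for real obfuscation."""
    return "".join(f"%{b:02x}" for b in text.encode("utf-8"))


def static_scan(source: str) -> bool:
    """Return True if any watched token appears literally in the source text."""
    return any(token in source for token in SUSPICIOUS_TOKENS)


plain = "document.write('hello')"          # harmless sample script
obfuscated = percent_encode_all(plain)     # same script, scrambled

print(static_scan(plain))                  # True: the token is visible
print(static_scan(obfuscated))             # False: the token is hidden
print(static_scan(unquote(obfuscated)))    # True: visible again once decoded
```

The scan only succeeds once the text has been decoded, which is the step an analysis tool has to reproduce faithfully to see what a real browser would run.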

Google is also encountering “IP cloaking,” where a malicious website will refuse to serve harmful content to certain IP ranges, such as those known to be used by security researchers. In August 2009, Google found that some 200,000 sites were using IP cloaking. The technique forces researchers to scan the sites from IP ranges that are “unknown by the adversary,” the report said.
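One way to picture the problem is to fetch the same page from two vantage points and compare what comes back. The sketch below is a toy model under stated assumptions, not the method described in the report: `simulated_cloaking_site` and `looks_cloaked` are hypothetical names, the address ranges are reserved documentation ranges, and the “harmful” content is a placeholder comment.

```python
import hashlib
import ipaddress

# Assumption: the range the simulated site believes belongs to scanners.
# 192.0.2.0/24 is reserved for documentation and is used here as a stand-in.
KNOWN_SCANNER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def simulated_cloaking_site(client_ip: str) -> bytes:
    """Toy server: returns a clean page to addresses in the known scanner ranges."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in KNOWN_SCANNER_RANGES):
        return b"<html>benign page</html>"
    # Placeholder only; no real payload is included.
    return b"<html>benign page<script>/* placeholder */</script></html>"


def looks_cloaked(fetch, vantage_ips) -> bool:
    """Return True if the vantage points did not all receive identical content."""
    digests = {hashlib.sha256(fetch(ip)).hexdigest() for ip in vantage_ips}
    return len(digests) > 1


# One address inside the known range, one outside it.
print(looks_cloaked(simulated_cloaking_site, ["192.0.2.10", "198.51.100.7"]))  # True
# Two addresses inside the known range see the same clean page.
print(looks_cloaked(simulated_cloaking_site, ["192.0.2.10", "192.0.2.11"]))    # False
```

The second call shows why scanning only from known ranges reveals nothing. A real comparison would also have to allow for pages that legitimately vary between requests, which this exact-hash check does not.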

Antivirus software programs rely on signatures as one method to detect attacks. But the engineers wrote that the software often missed code that has been “packed,” or compressed in such a way that it is unrecognizable but will still execute.
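The effect of packing on a byte-pattern signature can be shown with a harmless marker string. The sketch below is a minimal illustration that assumes a signature is a literal byte sequence and uses `zlib` as a stand-in for a packer; commercial antivirus engines and real packers are considerably more sophisticated, and `SIGNATURE` and `signature_match` are hypothetical names.

```python
import zlib

# Assumption: a signature is modeled as a literal byte sequence.
SIGNATURE = b"EXAMPLE-HARMLESS-MARKER"


def signature_match(data: bytes) -> bool:
    """Return True if the signature bytes appear literally in the data."""
    return SIGNATURE in data


# Harmless sample content with the marker embedded among repeated padding.
payload = b"padding " * 20 + SIGNATURE + b" padding" * 20
packed = zlib.compress(payload)            # stand-in for a packer

print(signature_match(payload))            # True: the marker is visible
print(signature_match(packed))             # False: compression hides the bytes
print(zlib.decompress(packed) == payload)  # True: the original is fully recoverable
```

The packed form no longer contains the marker, yet unpacking restores the content byte for byte, which is why packed code can evade a signature and still run once it unpacks itself.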

Since it can take time for AV vendors to refine their signatures and remove ones that cause false positives, the delay allows the malicious content to stay undetected.

“While AV vendors strive to improve detection rates, in real time they cannot adequately detect malicious content,” the Google researchers wrote. “This could be due to the fact that adversaries can use AV products as oracles before deploying malicious code into the wild.”