Determining the Geolocation of Systems Infected with Malware

For me, one of the most interesting new bits of data included in the latest Microsoft Security Intelligence Report (SIRv11) is related to the methods we use to identify the geolocation of systems reporting malware infection data to us.

Malware infection rate data for over a hundred locations is reported using a measure called computers cleaned per mille (CCM). CCM represents the number of computers cleaned of malware for every 1,000 executions of the Microsoft Malicious Software Removal Tool (MSRT). Because the data is normalized in this way, the CCM figure allows us to compare infection rates of different locations around the world without separately compensating for different populations or computer install bases.
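As a minimal sketch of the CCM definition above, the calculation can be written as a single function. The function name and the input figures are hypothetical and chosen only for illustration; they are not taken from the report.

```python
def ccm(computers_cleaned: int, msrt_executions: int) -> float:
    """Computers cleaned per mille: cleaned computers per 1,000 MSRT executions."""
    if msrt_executions <= 0:
        raise ValueError("msrt_executions must be positive")
    return 1000.0 * computers_cleaned / msrt_executions


# Hypothetical figures for a small and a large location (not from the report).
small = ccm(computers_cleaned=500, msrt_executions=50_000)
large = ccm(computers_cleaned=20_000, msrt_executions=2_000_000)
print(small, large)
```

The two made-up locations differ by a factor of 40 in MSRT executions, yet both come out at the same CCM, which is the sense in which the measure is normalized.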

The latest volume of the Security Intelligence Report, volume 11, introduces a significant change in the way location is determined for computers whose administrators have opted into providing telemetry data to Microsoft. In previous volumes of the report, Windows-based computers reporting information were classified by countries and regions according to the administrator-specified setting under the Location tab or menu in Region and Language in Control Panel.

For volume 11 of the Microsoft Security Intelligence Report, location is also determined by geolocation of the IP address used by the computer submitting the telemetry data[1]. In addition to providing what Microsoft believes will be a more accurate gauge of regional infection rates, this change provides an interesting perspective on computer usage habits around the world.
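The report does not describe how the IP geolocation is implemented, so the following is only a rough sketch of the general idea: map an address to a location by finding the most specific matching network prefix. The table, the location names, and the `locate` function are all hypothetical; the prefixes are reserved documentation ranges, not real allocations.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-location table. The networks are reserved
# documentation ranges and the location names are placeholders.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "Location A"),
    (ipaddress.ip_network("198.51.100.0/24"), "Location B"),
    (ipaddress.ip_network("198.51.100.128/25"), "Location C"),
]


def locate(ip: str) -> Optional[str]:
    """Return the location of the longest matching prefix, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, loc) for net, loc in PREFIX_TABLE if addr in net]
    if not matches:
        return None
    # Prefer the most specific (longest) prefix when several networks match.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(locate("198.51.100.200"))
print(locate("203.0.113.9"))
```

Unlike the Location setting in Control Panel, a lookup of this kind depends only on where the traffic originates, not on what the administrator configured.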

Using IP addresses to determine the location of systems sharing telemetry instead of using the administrator-specified Location setting of the computer creates slight differences in the trends observed in most countries/regions reported in the Security Intelligence Report. In a few cases, the reported infection rate has changed significantly. Figures 2 and 3 below show trends for the locations with the largest CCM decreases and increases caused by the switch to IP geolocation.

Figure 2: The five locations with the largest CCM decreases caused by the switch to IP geolocation

Very few locations saw their infection rates fall as a result of the switch to IP geolocation. In fact, among locations with at least 100,000 MSRT executions in the first quarter of 2011 (1Q11), the five shown in Figure 2 were the only locations that underwent a CCM decrease greater than 1.0 point.

Figure 3: The five locations with the largest CCM increases caused by the switch to IP geolocation

By contrast, there were more than 100 locations whose CCMs rose after applying IP geolocation, with 35 of them moving 10 points or more, and four rising more than 20 points, as shown in Figure 3. In general, most of the locations with significant increases have smaller populations and relatively few reporting computers. The 61.5 CCM for Qatar in 1Q11 is the largest CCM figure ever reported in the Microsoft Security Intelligence Report, and is 55.1 points higher than the figure reported for Qatar for 4Q10 using the administrator-configured locale setting to determine location.

Notably, the five locations in which the CCM decreased significantly represent the largest populations using five of the most widely used languages on the Internet: France and French, Spain and Spanish, Russia and Russian, Taiwan and Chinese (Traditional), and the United States and English. This finding suggests that, rather than using the locale settings designated for their country or region, many computer administrators in smaller locations might be using locale settings for larger ones, particularly larger locations in which the dominant language is one spoken by the computer’s user. As a result, the reported infection rates were being skewed for some locations.

For example, if a Spanish-speaking computer administrator outside Spain configured a computer with the locale settings for Spain, any malware detections on the computer would have been reported for Spain using the previous method for determining location. This factor would have the effect of over-reporting malware detections for Spain, and under-reporting malware detections for the country or region in which the computer was actually located. Switching to IP address-based geolocation corrects this anomaly and provides more accurate regional infection statistics. That said, IP address-based geolocation is not a perfect measure either, although it should be more accurate than the locale setting.
Computer security and response professionals in the more affected locations should consider these findings carefully when developing plans for safeguarding their populations’ computers.
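To make the locale skew described above concrete, here is a small worked example that computes CCM twice over the same reports: once grouped by the administrator-specified locale and once grouped by IP-derived location. All figures, field names, and the "Elsewhere" location are made up for illustration and do not come from the report.

```python
from collections import defaultdict


def ccm_by(reports, key):
    """Aggregate reports by the given field and return CCM per location."""
    cleaned = defaultdict(int)
    executions = defaultdict(int)
    for r in reports:
        executions[r[key]] += r["executions"]
        cleaned[r[key]] += r["cleaned"]
    return {loc: 1000.0 * cleaned[loc] / executions[loc] for loc in executions}


# Hypothetical telemetry: the second group is physically outside Spain
# but uses the locale settings for Spain.
reports = [
    {"locale": "Spain", "ip_location": "Spain", "executions": 900_000, "cleaned": 4_500},
    {"locale": "Spain", "ip_location": "Elsewhere", "executions": 100_000, "cleaned": 3_000},
    {"locale": "Elsewhere", "ip_location": "Elsewhere", "executions": 100_000, "cleaned": 1_000},
]

by_locale = ccm_by(reports, "locale")    # Spain 7.5, Elsewhere 10.0
by_ip = ccm_by(reports, "ip_location")   # Spain 5.0, Elsewhere 20.0
print(by_locale)
print(by_ip)
```

With these invented numbers, grouping by locale overstates the rate for Spain and understates it for the smaller location; regrouping by IP location lowers the first and raises the second, which is the same direction of change shown in Figures 2 and 3.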