Detecting Malicious URLs - Part 2. “Where”

In Part 1 we reviewed the current state of malicious URL detection techniques and discussed two general approaches, blacklisting and heuristic detection, along with their strengths and weaknesses. Blacklisting is quite straightforward, whereas heuristic techniques required more explanation.

One of the methods we mentioned last time was host-based feature analysis where we focus on “where”, “by whom” and “when” a domain name was registered. This can help evaluate the reputation of a domain name and its host.

This time we will analyse the geographic location of the host where a webpage is placed and try to determine whether that location depends on the type of URL.

Non-malicious or “whitelisted” URLs can be obtained from Alexa – the popular web metrics provider.

A list of phishing URLs can be downloaded from PhishTank, a phishing aggregator operated by OpenDNS.

Malware URLs can be extracted from malware samples and verified by analysing the traffic they generate, filtering out requests to trusted websites such as google.com, microsoft.com and whatismyip.com.
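The filtering step could look something like the sketch below. It is only an illustration: the allowlist contains just the three domains named above, and the function names and sample URLs are made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only the three example domains named in the text.
TRUSTED_DOMAINS = {"google.com", "microsoft.com", "whatismyip.com"}

def is_trusted(url, trusted=TRUSTED_DOMAINS):
    """Return True if the URL's host is a trusted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)

def candidate_malware_urls(requested_urls):
    """Keep only the URLs that do not point at a trusted domain."""
    return [u for u in requested_urls if not is_trusted(u)]

# Invented sample of URLs requested by a malware sample.
sample = [
    "http://www.google.com/",
    "http://whatismyip.com/",
    "http://update.example.net/payload.bin",
    "http://notgoogle.com/",
]
print(candidate_malware_urls(sample))
```

Matching on the whole host name or a dot-separated suffix, rather than on a plain substring, keeps look-alike hosts such as notgoogle.com from being treated as trusted.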

We now have the following sets of URLs to work with:

URL Type    Count
Trusted     10000
Phishing    100000
Malware     250

Using a GeoIP database, we can determine the country in which each host’s IP address is located.
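Conceptually, a GeoIP lookup maps an IP address to the country of the network prefix that contains it. The sketch below shows the idea with a toy table. The prefixes are reserved documentation ranges, and the country codes assigned to them, as well as the function name, are invented for illustration and do not come from any real GeoIP database.

```python
import ipaddress

# Toy prefix-to-country table. The prefixes are reserved documentation ranges
# and the country codes are invented; a real GeoIP database is far larger.
TOY_GEOIP = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "UA"),
    (ipaddress.ip_network("203.0.113.0/24"), "BE"),
]

def country_for_ip(ip, table=TOY_GEOIP):
    """Return the country code of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, cc) for net, cc in table if addr in net]
    return max(matches)[1] if matches else None

print(country_for_ip("198.51.100.7"))
print(country_for_ip("10.0.0.1"))
```

In practice the host name from each URL would first be resolved to an IP address, and the lookup would go through whichever GeoIP database is available rather than a hand-written table.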

After applying GeoIP information to the analysed URLs, we can plot the results on map charts.

Trusted URLs geographic distribution:

Phishing URLs geographic distribution:

Malware URLs geographic distribution:

On all three charts we can see that the majority of hosts of all URL types are located in the United States. To understand the geographic peculiarities of phishing and malware URLs, let us exclude the top 20 countries from the “trusted” list and consider only the remaining phishing and malware countries.
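The exclusion step can be sketched as follows, assuming each URL has already been reduced to a country code. The function name and the sample data are invented for illustration, and the sample uses top_n=2 only to keep it small.

```python
from collections import Counter

def distinctive_countries(trusted_countries, suspect_countries, top_n=20):
    """Count suspect hosts per country, skipping the top_n trusted countries.

    Returns (counts, share), where share is the fraction of suspect hosts
    located outside the excluded countries.
    """
    excluded = {cc for cc, _ in Counter(trusted_countries).most_common(top_n)}
    counts = Counter(cc for cc in suspect_countries if cc not in excluded)
    total = len(suspect_countries)
    share = sum(counts.values()) / total if total else 0.0
    return counts, share

# Invented sample data: one country code per URL.
trusted = ["US"] * 6 + ["DE"] * 3 + ["FR"]
malware = ["US"] * 5 + ["UA"] * 3 + ["DE"] + ["GE"]

counts, share = distinctive_countries(trusted, malware, top_n=2)
print(counts.most_common(), share)
```

The returned share is the kind of figure quoted for each URL type: the fraction of phishing or malware hosts whose country is not among the most common trusted ones.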

Top phishing countries:

Top malware countries:

Now we can see that phishing domains are, among other places, likely to be found in the Virgin Islands (British), Belgium, Chile, Hong Kong and Thailand. Altogether, however, this is less than 3% of all phishing URLs in the test. So although phishing websites do show some geographic specificity, it covers only a small share of the webpages. The majority of them have the same origin as the ones in the trusted set.

Conversely, the malware table shows that 45% of all malware URLs are hosted in countries outside the top 20 of the trusted set. Ukraine tops the list with 15% of hosted malware. Other former Soviet Union countries also appear in the list: Georgia, Lithuania and Belarus, totalling 7%.

Interestingly, we also have 6 malware URLs hosted in Antarctica (not shown on the map). This could be an error in the GeoIP data, or it could mean that research labs in Antarctica harbour malware and that scientific computers are used to spread it, although the latter is doubtful given the low Internet connection speed there.

The GeoIP analysis revealed some dependencies between a URL’s type and its geographic location, but this information should be used carefully: geographic location alone is not enough to identify a malicious URL.