Pivoting on the Data: Who Tracks the Trackers?

March 7, 2017, Ian Cowger

Also by Jordan Herman

Trackers have become a regular presence on the web, and are a valuable resource for web admins to maintain visibility over their online assets. Code such as associated Google Analytics ID, tracking pixels, cookies, or social network connections tell a story about the traffic flow and composition of an organization’s online assets. However, when a website’s HTML is scraped and reposted for a phishing campaign, these trackers can tell a different kind of story.

Often, threat actors forget to change or remove official tracking code when developing phishing pages and even utilize it themselves to measure the success of their malicious campaigns, just like a legitimate organization would. As a result, this code can be an invaluable clue to analysts investigation their activity, if they’re able to search for it.

Let’s take a look at a fertile target for phishers: online gaming communities. If patrons of these sites fall for phish attacks like the one singled out below and provide their credentials, it is possible their credit card or other payment information used to pay for online games will be compromised.

A Google Analytics ID Tells a Different Kind of Story

Different trackers are used in various locations. The most commonly replicated trackers by attackers are found on login pages or for social media. For this particular example, we’ll use Steam’s Google Analytics Tracking ID as a tracker:

Fig-1 A list of phishing pages using this Google Analytics tracking ID

From PassiveTotal, we can see that there have been 629 hosts observed using this Google Analytics Tracking ID tracker on their page. As you can see above, numerous malicious sites have the Google Analytics Tracking ID on them. Let’s pick one and check it out:

Fig-2 A phishy domain listed in PassiveTotal

Below, we not only see that the domain is spelled incorrectly, but that it’s only been resolving since January:

Fig-3 We pivot on the phishing domain’s resolution data in PassiveTotal

RiskIQ collects these trackers from the Document Object Models (DOMs) retrieved from the RiskIQ crawling platform and the two billion resources we capture on a daily basis. Chances are, if we flag it as a phish in PassiveTotal, it can be traced back to a RiskIQ crawl. After pivoting to the RiskIQ tool, I was able to find the phish quickly. We crawled it in October of last year, and our machine learning model automatically categorized it as phishing:

Fig-4 A RiskIQ Phish List hit inside the RiskIQ tool

From here, you can check out our crawl data of the phish, the full DOM, all resources, and the cause sequence. Highlighted in the DOM, you can see the title for a Steam Community (community is Сообщество in Russian) page:

Fig-5 The DOM confirms: the brand name is present

Going a step further, we can take the stored resources and rebuild a view of what the page looked like at the time of the crawl via our snapshot utility:

Fig-6 The final test, taking a look for ourselves

Yep, that’s a Phish.

Use Trackers to Your Advantage

Every piece of information is necessary for investigating and tracking down threats to your organization. In addition to using your trackers to find phishing pages, once you have identified attacker-owned infrastructure such as the domains and IPs they’re using to serve phishing pages, you can then check to see if they have implemented trackers of their own. After all, these trackers can be just as useful for bad actors as they are for legitimate businesses tracking their campaigns.

These values can provide insights into additional infrastructure that typically goes unnoticed by static data sets. RiskIQ has data about trackers from includes IDs from providers like Google, Yandex, Mixpanel, New Relic, Clicky, and more. Once we have attacker-owned trackers, we can then pivot off of them and discover additional infrastructure.