Over the past month I’ve spent some time looking at intelligent fraud and anomaly detection systems, authoring a journal paper comparing a handful of methods, and more recently focussing my attention on Detica’s systems. Plus I’m working with someone to develop a multi-featured case management system for tracking malware, to save us switching between applications.

But anyway, the technologies for intelligence gathering are actually far beyond what Glenn Greenwald published from the rather outdated Snowden archive, and it’s not simply about the warehousing of intercepted data. Stream mining ‘digests’ the data in real-time, deriving from it information that analytics systems can evaluate and contextualise without human intervention. The analyst works on the end product of this. Another point worth mentioning about the Snowden/NSA coverage is that nobody’s quite sure what’s retained or discarded in the stream mining process.

Analytics was being deployed long before the ‘threat intelligence’ snake oil industry materialised, and it has uses in preventing or mitigating real threats. Take, for example, the protection of bank accounts over the last two decades or so, which is where my interest in this began. An advanced field-tested detection system would also have prevented victims of identity theft from being wrongly associated with Operation Ore between 1999 and 2002. Alert Logic’s own service, again a much-needed system for detecting genuine threats, was built around a core system dating from the late 90s.

I’ve singled out Detica’s NetReveal for reasons that should become obvious, the primary one being that it appears to be the most advanced system I’ve come across after much digging around.

AML Applications
First deployed in 2005 as a proof-of-concept system for the Insurance Fraud Bureau, NetReveal was adopted by AXA, Zurich, Nationwide and several other major financial institutions, so it might well be the very system I was looking for when writing up the review paper.
Anti-Money Laundering (AML) is actually just one of the ‘use cases’ for NetReveal, and one application where the capabilities are truly tested – the whole point of money laundering is to get money from one place to another without the authorities knowing, typically by disguising transactions among legitimate or routine activities. I’ve seen real-world examples of this in the past: one involved ordering surplus stock on behalf of an employer, then selling it and pocketing the money. Another involved non-existent employees on the books, all presumably paid into the same bank account, created by the fraudster. Of course, this probably continued for years after I resigned, because management only saw payrolls and accounts. They didn’t give minimum-wage employees the time of day, and so lacked the situational awareness to spot the discrepancies.

So the problem NetReveal must solve is quite complex. How can it discern fraudulent transactions from legitimate transactions? How can transactions be associated with seemingly unconnected events? How could a system identify a suspicious event and tie it to a sequence of other events? More importantly, how can the system be made to work in real-time?
What I’ve found is there are two broad categories of fraud detection. First there are the rule-based, signature-based and expert systems – these tend to be static, comparing current transactions with signatures of known fraud cases. While these are fast and efficient, they’re less reliable. Second, there are deeper analysis methods such as clustering, pattern recognition and Bayesian systems – these are more adaptive and thorough, but computationally more expensive.
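
To make the contrast concrete, here’s a minimal sketch of the two approaches in Python. It’s purely illustrative and says nothing about how NetReveal implements either – the rules, threshold and data are all invented:

    from statistics import mean, stdev

    # Rule-based: fast, static comparison against predefined signatures
    RULES = [
        lambda tx: tx["amount"] > 10000,                 # large single transfer
        lambda tx: tx["country"] not in tx["home_zone"], # unexpected jurisdiction
    ]

    def rule_flag(tx):
        # True if any predefined rule matches
        return any(rule(tx) for rule in RULES)

    # Statistical: adaptive, but needs history and more computation
    def zscore_flag(tx, history, threshold=3.0):
        # Flag a transaction whose amount deviates from the account's norm
        amounts = [h["amount"] for h in history]
        if len(amounts) < 2:
            return False                  # not enough history to judge
        mu, sigma = mean(amounts), stdev(amounts)
        if sigma == 0:
            return tx["amount"] != mu
        return abs(tx["amount"] - mu) / sigma > threshold

    tx = {"amount": 12500, "country": "LV", "home_zone": {"GB", "IE"}}
    history = [{"amount": a} for a in (40, 55, 60, 35, 80, 52)]
    print(rule_flag(tx), zscore_flag(tx, history))    # True True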

So what NetReveal does is take the raw data from whatever sources, categorise it into entities, perform some analysis, construct a relational map and determine the weighting of each link. The latter two stages are possible with Maltego Casefile and Palantir anyway, as a very simple but highly effective method of revealing patterns an intelligence analyst would otherwise miss. The following screenshot is an example from my malware tracking project that would, with a much larger database, be useful in attributing malware and incidents to known ‘actors’:

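In the same spirit, here’s a toy version of the relational-map stage: entities co-occurring across records, with each link weighted by how often the two entities appear together. The records and entities are invented, and this says nothing about how NetReveal actually builds its maps:

    from collections import defaultdict
    from itertools import combinations

    # Records as they might look after entity extraction: each record
    # lists the entities (people, addresses, phone numbers...) it mentions
    records = [
        {"claimant:J.Smith", "address:12 High St", "phone:07700900001"},
        {"claimant:A.Jones", "address:12 High St", "phone:07700900001"},
        {"claimant:A.Jones", "garage:FixItFast"},
    ]

    # Weight of a link = number of records in which two entities co-occur
    links = defaultdict(int)
    for rec in records:
        for a, b in combinations(sorted(rec), 2):
            links[(a, b)] += 1

    # The address and phone shared by 'different' claimants stand out
    for (a, b), weight in sorted(links.items(), key=lambda kv: -kv[1]):
        print(a, "<->", b, " weight:", weight)
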
On the analytics side it appears to be a hybrid of several intelligent fraud detection systems, the specifics of which I won’t reveal here as they’re also used by electronic payment systems and have fundamental limitations that aren’t easily resolved. What I can reveal is that Detica was rather ahead of the curve, as the research papers I’ve found that proposed hybrid systems were mostly published after 2009. Alert Logic, an entirely different company also dealing with vast amounts of data, also appears to have followed the hybrid model.

Modules
From what I can determine, NetReveal is a modular system that can be remixed for any given customer, with the following three ‘components’:
* Detection Modules
* Analysis Modules
* Investigation Modules

NetReveal also got reworked specifically for ‘cyber threats’, in the form of the CyberReveal product.

Detection modules appear to provide the basic rule-based system that’s highly efficient at handling real-time data. Their function is mainly to flag anything deemed suspicious or matching predefined rules, and they could be used to filter out redundant data from the sources to reduce load on the analytics engine(s).

Analysis modules are essentially a highly advanced form of analytics engine, doing stuff that’s computationally more expensive and time-consuming. They analyse transactions after they have been completed, possibly adapting the detection modules in the process.

Investigation modules appear to provide a glorified search engine and visualisation thing, which is pretty much what you get with Casefile. Whereas information is entered manually for Casefile, Detica’s eye candy presents a much higher volume of information from an advanced back-end.
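
As a guess at how the three roles might fit together – this is my reading of the marketing material, not Detica’s actual architecture – the division of labour could be sketched like this:

    def detection(stream, rules):
        # Cheap real-time pass: flag rule hits, drop the obviously benign
        for event in stream:
            if any(rule(event) for rule in rules):
                yield {"event": event, "flagged": True}

    def analysis(flagged, model):
        # Slower pass: score whatever the detection layer lets through,
        # which in a real system might also feed back into the rules
        for item in flagged:
            item["score"] = model(item["event"])
            yield item

    def investigation(scored, min_score=0.8):
        # Search/visualisation layer: surface the worst cases to an analyst
        return [item for item in scored if item["score"] >= min_score]

    events = [{"amount": 20}, {"amount": 18000}, {"amount": 6000}]
    rules = [lambda e: e["amount"] > 5000]
    model = lambda e: min(e["amount"] / 20000, 1.0)   # stand-in scoring model
    print(investigation(analysis(detection(events, rules), model)))
    # -> the 18000 transaction, scored 0.9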

Since everything went ‘Web 2.0’, there’s been a huge change in the way intelligence is gathered. It’s safe to say the vast majority of today’s espionage and advanced targeted attacks begin with footprinting through Internet-based research, using information that’s published on the Internet or provided to the media, without the target ever being aware of it. Because the information is already public, it’s easier than ever to outsource, distribute and share intelligence.
This means anyone can use the same methods to their advantage – in terms of enhancing security, for gaining more detailed knowledge of the threats, their capabilities, relationships and how they operate. Perhaps most importantly we can, in some cases, use it to make fairly accurate predictions.

Although there’s a fair amount of literature out there on Open Source Intelligence (OSINT) suggesting various structured processes for going about this, such as Treadstone 71’s Cyber Intelligence Lifecycle, a definitive ‘beginner’s guide’ seems hard to find online, if one does indeed exist. The good news is anyone can develop effective OSINT capabilities using tools that are freely available (there are always new ones to discover). Perhaps the only caveats here are the resources and background knowledge needed to do it professionally.

The Techniques and Attributes
Basically OSINT is another term for research, and there’s far more to it than scraping data from whatever sources. The information has to be verified and pieced together, since the majority of it will be biased in some way. Analytical skills, information management and objectivity are essential to get decent results. The researcher must learn the context of each piece of information, and where it fits in a much larger picture.

Over the last couple of years I’ve adopted a couple of techniques that could be applied in practically any investigation. The first of these is building a timeline, which is the quickest and easiest way of putting data into context and gradually forming a reliable hypothesis.
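In its simplest form this is nothing more than merging dated findings from different sources into one chronological view – a minimal sketch, with invented placeholder events:

    from datetime import datetime

    # Dated findings from different sources (invented placeholders)
    findings = [
        ("2013-02-11", "whois", "domain registered"),
        ("2013-06-03", "blog",  "first post mentioning employer"),
        ("2013-04-20", "forum", "username appears on another forum"),
    ]

    # Sorting by parsed date merges the sources into one chronological view
    timeline = sorted((datetime.strptime(d, "%Y-%m-%d"), src, note)
                      for d, src, note in findings)
    for when, src, note in timeline:
        print(when.strftime("%Y-%m-%d"), "[" + src + "]", note)
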
Relationship mapping is the other technique, where an object-oriented structure is developed that describes the environment around the subject, and the relationships between the people and entities the subject interacts with. What this does is enable the researcher to see the bigger picture and make informed assumptions. For example, the subject is likely to have characteristics that are common across the entities in its network, or perhaps the subject is influenced by other entities and events in ways that weren’t previously known. Paterva’s freely available Maltego Community Edition and Casefile, which I’ve briefly played with, have been developed specifically for this, but a conventional mindmapping program could also be used.

Legal and Ethical Stuff
The main thing to remember is that OSINT is about examining information (and data) that’s public, and it should not involve invasions of privacy. A legitimate researcher must know where the line is drawn between OSINT and espionage, the latter including stuff like eliciting information, actual (illegal) network penetration and eavesdropping – in other words, gaining information that hasn’t proactively been made public. If this is being done as part of a penetration test, the researcher should be aware this constraint doesn’t apply to most attackers, and that anything obtainable is fair game to them.

There are situations where gathering intelligence is the equivalent of playing with a hornets’ nest, a couple of examples being Op CARTEL, Op Darknet and basically anything that involves messing with organised crime. This kind of work requires experience and competence in a range of other areas, an understanding of how the players operate, a lot of preparation, and especially the backing of some authority. These are things to consider before doing this on a freelance basis.

Where to Start?
First we need a starting point: an identifier that’s fairly unique, such as a username, email address or even just an IP address, around which a profile can be built. It’s much easier if the subject owns a domain (I’ll come to that).
If we’re researching an organisation, its web site is always the best place to gather all the initial information, such as email addresses, phone numbers, subdomains, job descriptions, information on recent procurements, supply chain, etc. Don’t forget to examine URLs and HTML code.
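
Even a quick automated pass over a site’s HTML can pay off here. A rough sketch – the regexes are deliberately crude, and the target URL is a placeholder:

    import re
    import urllib.request

    def harvest(url):
        # Fetch the page and scrape out email addresses and linked hosts;
        # crude, but good enough for a first pass
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
        emails = set(re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html))
        hosts = set(re.findall(r"https?://([\w.-]+)", html))
        return emails, hosts

    emails, hosts = harvest("http://www.example.com/")   # placeholder target
    print(sorted(emails))
    print(sorted(hosts))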

Search Engine Techniques
Conventional search engines like Bing, Google and IXQuick are always the next best source, and should always turn up more data than we’ve gathered so far. A little more digging is needed to pull information from the ‘Deep Web’ – the 90% of the web that’s not immediately accessible because of the way indexing, ranking and caching work. This is where ‘Google hacking’ and the ‘Advanced Search’ features come in. A few examples:

    "search term"
    "search term" site:domain.com
    "search term 1" AND "search term 2" AND "search term 3"
    cache:domain.com
    filetype:doc
    site:domain.com | filetype:pdf

Then there are other ‘deep web’ and ‘reputation management’ search engines like SiloBreaker, WhosTalkin, Pipl, etc. I’ve found them only marginally more useful than ‘Google hacking’.

If the subject owns a site or blog, it’s worth using the URL as a search term. The results should include various other sites where that URL was posted, which could generate several other leads. The network of associates will reveal potential traits and attributes that may have been omitted from the subject’s profile.

Command Line and Domain Tools
Often the best information is derived using old-fashioned command line tools, especially during the reconnaissance stage of a penetration test. Standard tools such as ping, traceroute, nslookup, whois, dig and even nmap are useful for footprinting, especially when combined with knowledge of how traffic is routed across the Internet. Many of these services are also provided online by Robtex, CentralOps.net and ServerSniff.net. If the subject owns a domain or web site, the domain tools are the place to start – it’s not unknown for a whois entry to reveal the full name and address of an individual site owner. IP address and domain searches may return different results, so it’s important to check both.
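
The same lookups can be scripted – for instance in Python, assuming the standard whois and dig binaries are on the PATH (the domain is a placeholder):

    import subprocess

    def run(*cmd):
        # Run one of the standard tools and capture its output
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    domain = "example.com"                      # placeholder subject
    print(run("whois", domain))                 # registrant details, if not redacted
    addresses = run("dig", "+short", domain).splitlines()
    if addresses:
        ip = addresses[0]
        print(run("dig", "+short", "-x", ip))   # reverse lookup -- may not match!
        print(run("whois", ip))                 # netblock owner, often the hosting ISP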

Other command line tools are wget and strings, which together can be used for pulling all the content off a web site and extracting metadata from stuff like images, documents and executable files. This by itself can reveal a host of other unvetted data the subject never intended to make public.
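
The same idea can be approximated in pure Python: fetch a file, run a strings-like pass over it, then look for the metadata fields that often carry usernames and software versions. The URL is a placeholder – in practice it would be a document turned up by the site: and filetype: operators above:

    import re
    import urllib.request

    def strings(data, min_len=6):
        # Yield runs of printable ASCII, much like the UNIX strings tool
        for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
            yield match.group().decode("ascii")

    data = urllib.request.urlopen("http://www.example.com/file.pdf").read()
    for s in strings(data):
        # PDF metadata fields often carry usernames and software versions
        if any(key in s for key in ("/Author", "/Creator", "/Producer")):
            print(s)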

Physical Locations and Geodata
Coupled with ping, traceroute and whois, InfoSniper will pin down the approximate physical locations of servers. Whether that’s of any use depends on what’s being researched and why. If the subject (person or organisation) is running its own Internet services, perhaps on a Wide Area Network, its physical locations may well be discoverable.
Of course, this could be taken a stage further with Google Maps and Street View, if a penetration test involves visiting any location in person.

Social Networking
A decade ago there were numerous obscure forums and chatrooms, on which people normally communicated under pseudonyms, and any third party wanting to build a profile of someone had to know where to look. Now people register on just a handful of social networks under their full names, and this is where it’s possible to aggregate data from ready-made profiles containing personally identifiable information.

LinkedIn is something of a double-edged sword: on the one hand it’s valuable for professional networking and information exchange, and on the other a perfect tool for finding an entry point into an organisation, or for leveraging the information in order to compromise it. The profiles of employees/members of a given organisation can reveal whether they have common skills in particular software applications, platforms and operating systems, the points of contact for the IT department, key personnel, etc. Status updates can reveal whether the subject has posted using an iPhone or Android device, and whether those devices are issued by the employer. Are there patterns in the timing and geolocation data that reveal a routine of some sort? Do the updates reveal when the subject is most likely to be online? Which employees are open to manipulation?
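
The timing question in particular lends itself to a trivial bit of scripting: bucket the post timestamps by hour of day and a routine often falls straight out. The timestamps here are invented:

    from collections import Counter
    from datetime import datetime

    # Invented timestamps standing in for scraped status updates
    timestamps = [
        "2014-03-01 08:55", "2014-03-02 09:02", "2014-03-03 12:31",
        "2014-03-04 08:47", "2014-03-05 17:20", "2014-03-06 09:11",
    ]
    hours = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in timestamps)
    for hour, count in sorted(hours.items()):
        print("%02d:00 %s" % (hour, "#" * count))   # crude histogram of activity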

Basic Principles of Sock Puppeting
An entire book (or at least another blog post) could be written about the creation, development and deployment of artificial online personas, or ‘sock puppets’, as they’ve recently become known. They are very commonly used for reasons other than OSINT, with varying levels of skill – PR, ‘perception management’, infiltration, industrial espionage, etc.
Because the use of sock puppets is about deception rather than anonymity, it’s a grey area. Again, remember the differences between OSINT, social engineering and espionage, and know where to draw the line.

I purposely avoided the term ‘fake identity’ because sock puppeting involves deploying actual personas that are carefully developed and span multiple online accounts just like any legit identity, so they’re quite real in themselves. For this to work, each persona must have a background, must be established well in advance of any investigation, and should blend in perfectly with everyone else in the ‘online environment’. Anyone who does get curious will hopefully be content with finding whatever profiles were planted as cover. Treadstone 71 recommends building comprehensive profiles with detailed histories, but my advice would be to keep things simple and consistent so there’s less room for error.
Where server logs might be available to the subject/adversary, the researcher must take further measures to mask identifying data, for example using a proxy/Privoxy combination. With everything set up properly, it should be very difficult for anyone to associate that persona with the researcher.
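
One common setup – though by no means the only one – is to chain Privoxy into a local SOCKS5 proxy such as Tor, which takes a single line in Privoxy’s config file:

    # In Privoxy's main config file, forward all traffic to a SOCKS5
    # proxy (e.g. Tor) listening on 127.0.0.1:9050; the trailing dot
    # means 'for all destinations'
    forward-socks5 / 127.0.0.1:9050 .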
