Fraud detection in retail

Fraud detection is all about connecting the dots. We are going to see how to use graph analysis to identify stolen credit cards and fake identities. For this article we worked with Ralf Becher of irregular.bi. Ralf is a Qlik Luminary and provides solutions that integrate the graph approach into Business Intelligence tools like QlikView and Qlik Sense.

Third party fraud in retail

Third party fraud occurs when a criminal uses someone else’s identity to commit fraud. For a typical retail operation this takes the form of individuals or groups of individuals using stolen credit cards to purchase high-value items.

Fighting it is a challenge. It requires the ability to detect potential fraud cases in large datasets and to distinguish real cases from false positives (cases that look suspicious but are legitimate).

Traditional fraud detection systems focus on thresholds related to customer activity. Suspicious activities include, for example, multiple purchases of the same product or a high number of transactions per person or per credit card.
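To make the threshold idea concrete, here is a minimal Python sketch of such a rule. The order records, the field layout and the threshold value are all assumptions made up for illustration; they are not taken from the dataset used in this article.

```python
from collections import Counter

# Hypothetical orders: (order_id, person, credit_card, product).
orders = [
    ("o1", "alice", "card-1", "laptop"),
    ("o2", "alice", "card-1", "laptop"),
    ("o3", "alice", "card-1", "laptop"),
    ("o4", "bob", "card-2", "phone"),
]


def flag_over_threshold(orders, key_index, threshold):
    """Return the values at `key_index` that appear in more than `threshold` orders."""
    counts = Counter(order[key_index] for order in orders)
    return sorted(key for key, count in counts.items() if count > threshold)


# Assumed rule: more than 2 orders on the same credit card is suspicious.
flagged_cards = flag_over_threshold(orders, key_index=2, threshold=2)
print(flagged_cards)
```

A rule like this looks at each person or card in isolation, which is why it cannot see connections between seemingly unrelated accounts.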

For this article, we have prepared a dummy dataset typical of an online retail operation. It includes:

order details: product, amount, order-id, date;

personal details: first name, last name;

contact info: phone, email;

payment: credit card;

shipping: address, zip, city, country;

tracking: IP address.

To analyse the connections in our data, we stored it in Neo4j, a leading graph database. The graph approach consists of modelling data as nodes (persons, orders, emails, credit cards and so on) and edges (the relationships between them). Here is a schema of our data represented as a graph:
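In code, the node-and-edge idea can be sketched roughly as follows. This is a plain Python illustration, not the Neo4j schema used for the article: the record, the node labels and the relationship names (PLACED, HAS_EMAIL and so on) are all assumptions.

```python
# Hypothetical order record; field names and values are made up for illustration.
order = {
    "order_id": "o-1001",
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "jane@example.com",
    "phone": "555-0100",
    "credit_card": "card-0001",
    "address": "1 Main St",
    "ip": "203.0.113.7",
}

# (node label, record field, relationship type); all names are assumptions.
INFO_FIELDS = [
    ("Email", "email", "HAS_EMAIL"),
    ("Phone", "phone", "HAS_PHONE"),
    ("CreditCard", "credit_card", "HAS_CARD"),
    ("Address", "address", "HAS_ADDRESS"),
    ("IP", "ip", "HAS_IP"),
]


def order_to_graph(order):
    """Turn one order record into a set of nodes and a set of edges."""
    person = ("Person", order["first_name"] + " " + order["last_name"])
    order_node = ("Order", order["order_id"])
    nodes = {person, order_node}
    edges = {(person, "PLACED", order_node)}
    for label, field, rel_type in INFO_FIELDS:
        info = (label, order[field])
        nodes.add(info)
        edges.add((person, rel_type, info))
    return nodes, edges


nodes, edges = order_to_graph(order)
print(len(nodes), "nodes,", len(edges), "edges")
```

Because nodes are kept in a set, two orders that use the same email or credit card end up pointing at the same node, and that is what makes shared information visible.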

Finding suspicious transactions

Now that the data is stored in Neo4j, we can analyse it.

First of all we need to set a benchmark for what’s normal. Here is an example of a transaction:

Example of a legitimate account

Now that we have an idea of what not to look for, we can start thinking about patterns specifically associated with fraud. One such pattern is a piece of personal information (IP, email, credit card, address) associated with multiple persons.

Neo4j includes a graph query language called Cypher that allows us to detect such a pattern. Here is how to do it:

What this query does is search for shared pieces of personal information. It returns all groups of at least two persons and two orders connected by a common piece of personal information.
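The same logic can be approximated outside the database. The sketch below is a Python illustration of the pattern, not the Cypher query used for the article; the records, the names and the `min_persons` / `min_orders` parameters are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (person, order_id, info_type, info_value).
records = [
    ("Ann", "o1", "email", "shared@example.com"),
    ("Ben", "o2", "email", "shared@example.com"),
    ("Cal", "o3", "email", "shared@example.com"),
    ("Ann", "o1", "ip", "203.0.113.1"),
    ("Ben", "o2", "ip", "203.0.113.2"),
    ("Cal", "o3", "ip", "203.0.113.3"),
    ("Dee", "o4", "email", "dee@example.com"),
]


def find_shared_info(records, min_persons=2, min_orders=2):
    """Map each shared piece of information to the persons connected to it."""
    persons = defaultdict(set)
    orders = defaultdict(set)
    for person, order_id, info_type, value in records:
        key = (info_type, value)
        persons[key].add(person)
        orders[key].add(order_id)
    return {
        key: sorted(persons[key])
        for key in persons
        if len(persons[key]) >= min_persons and len(orders[key]) >= min_orders
    }


shared = find_shared_info(records)
for (info_type, value), people in shared.items():
    print(info_type, value, "is shared by", people)
```

Only the email used by three different people is reported here; the IP addresses and Dee's email each belong to a single person and are ignored.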

To verify the accuracy of our query, fine-tune it or evaluate how to act on the alerts it returns, we will use graph visualization.

Case#1: multiple people sharing the same email

The address edmund@gmail.com (center) is shared by 3 people (purple nodes)

Here we can see that 3 people share the same email address. Are we looking at potential fraud? If we expand the graph, we can see that these 3 people have distinct addresses, IPs, phones and credit cards.

Data associated with the 3 distinct people using edmund@gmail.com

In isolation, each of these people looks normal. Edmund Cagliostro, for example, seems like a legitimate customer.

Details of Edmund Cagliostro

The fact that these seemingly distinct accounts share a common email address is suspicious. It justifies further investigation of Edmund Cagliostro and his connections.

Case#2: multiple people using the same IP address

Our query also reveals an IP address shared by multiple persons.

An IP address (center) with connections to 5 persons (purple) and orders (orange)

We can see that IP address 0.106.244.75 is shared by 5 people. Once again this is suspicious and should be investigated.

Graph visualization can help us inspect potential fraud cases and quickly evaluate them.

Identifying a ring of fraudsters

Now that we have found a couple of suspicious cases, it’s time to dig deeper. We want to assess the full impact of an individual fraud case so that we can take appropriate action.

Let’s say we noticed in our dummy dataset that a “Leisa Gugliotta” is involved in a fraud. Not only do we want to block any transactions from her, but we also need to identify her potential accomplices. To do that, we need to see who else is using the personal information used by Leisa Gugliotta.
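Finding everyone who shares information with a known fraudster amounts to walking the graph outwards from that person. Here is a minimal Python sketch of that traversal, assuming a simple list of person-to-information links; the identifiers and the `max_hops` parameter are hypothetical and the data is made up.

```python
from collections import defaultdict, deque

# Hypothetical links between persons and pieces of personal information.
links = [
    ("person-a", "email:x@example.com"),
    ("person-b", "email:x@example.com"),
    ("person-b", "card:1111"),
    ("person-c", "card:1111"),
    ("person-d", "email:d@example.com"),
]


def find_accomplices(links, seed, max_hops=1):
    """Return persons reachable from `seed` through shared information.

    One hop means "shares at least one piece of information directly".
    """
    info_by_person = defaultdict(set)
    persons_by_info = defaultdict(set)
    for person, info in links:
        info_by_person[person].add(info)
        persons_by_info[info].add(person)

    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested distance
        for info in info_by_person[person]:
            for other in persons_by_info[info]:
                if other not in seen:
                    seen.add(other)
                    queue.append((other, hops + 1))
    return sorted(seen - {seed})


print(find_accomplices(links, "person-a"))
print(find_accomplices(links, "person-a", max_hops=2))
```

With one hop, only the people sharing information directly with the seed are returned. Raising `max_hops` also pulls in people connected through an intermediary, which widens the net but also raises the risk of false positives.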

We can run the same analysis via Linkurious. The result is the following graph:

The people involved in the fraud ring led by Leisa Gugliotta

This picture makes it easy to see that our retail operation has been targeted by a fraud ring. Leisa Gugliotta shares a credit card with one other person and an email address with 4 people. These fraudsters can all be identified by the connections between them. Now we can freeze their accounts and add their information to our blacklist.

Third party fraud means that pieces of personal information are reused to create fake identities (known as synthetic identities). Graph analysis makes it possible to spot that pattern and prevent fraud. Through graph visualization, we can quickly evaluate potential fraud cases and make informed decisions. Try Linkurious now to learn more!

9 Responses to “Fraud detection in retail”

Hi, I really enjoy these posts. A small point on this one is that the image titled “An IP address (center) with connections to 5 persons (purple) and orders (orange)” is not clickable unlike the other ones. 🙂

Linkurious has documentation, including setup for their commercial product. However, the scripts they provide do not apply to the open source version on GitHub at https://github.com/Linkurious/linkurious.js. The scripts they refer to do not exist among the files in the open source version. Linkurious claims on their web site that their version “is” available on GitHub, but the link is broken. I believe that since they now charge a price for their version, they had to remove the free version they had on GitHub. I think this was just to get people interested in their product. Since there is no documentation for linkurious.js, I will not be using it.

In addition to linkurious.js, we sell Linkurious Enterprise and Linkurious Starter. These are available only through a commercial license. You need to buy licenses to download these products: https://linkurio.us/product/#plans

Why would one need a graph database to find similar items? An RDBMS does it as well. What would be interesting is how “close/near” potential accomplices are to an identified person when they share information that is not unified, like address, house number etc.

Be careful assuming that it is suspicious for people to share IP addresses. What if they are on public WiFi? The same could be true about street addresses and phone numbers in cultures where it is more common to share resources. The people sharing the phone or address could be family or roommates. While it may be a good starting point to look for shared resources, the evidence should be considered in tandem with other fraud indicators.