Abstract [en]

The investigation of fraud in business has been a staple for the digital forensics practitioner since the introduction of computers in business. Much of this fraud takes place in the retail industry. When trying to stop losses from insider retail fraud, triage, i.e. the quick identification of behaviour sufficiently suspicious to warrant further investigation, is crucial, given the large amount of normal or insignificant behaviour. It has previously been demonstrated that simple statistical threshold classification is a very successful way to detect fraud~\cite{Lopez-Rojas2015}. However, for triage to succeed the thresholds have to be set correctly. We therefore present a simulation-based method to aid the user in accomplishing this: relevant fraud scenarios that are foreseen as possible and expected are simulated in order to calculate optimal threshold limits. Compared with arbitrary thresholds, this method reduces the labour spent on false positives and provides additional information, such as the total cost of a specific modelled fraud behaviour, for setting up a proper triage process. We argue that our method contributes to the allocation of resources for further investigations by optimizing the thresholds for triage and estimating the possible total cost of fraud. Using this method we manage to keep the losses below a desired percentage of sales, which managers consider acceptable for keeping the business running properly.
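The threshold-setting idea can be illustrated with a minimal sketch. It is not the authors' simulator: the function names (`simulate`, `evaluate`, `pick_threshold`), the refund-ratio indicator, and all numeric ranges are assumptions chosen only for illustration. The sketch simulates honest and fraudulent sales staff, then picks the largest threshold that still keeps undetected fraud below a target share of sales, since a larger threshold flags fewer honest staff.

```python
import random


def simulate(n_honest=45, n_fraud=5, n_sales=200, seed=7):
    """Generate per-salesman totals (all distributions are illustrative assumptions)."""
    rng = random.Random(seed)
    staff = []
    for i in range(n_honest + n_fraud):
        is_fraud = i >= n_honest
        sales = sum(rng.uniform(10, 200) for _ in range(n_sales))
        refunds = sales * rng.uniform(0.01, 0.05)  # legitimate refunds
        # Fraudulent staff add fake refunds on top of the legitimate ones.
        fraud = sales * rng.uniform(0.02, 0.10) if is_fraud else 0.0
        staff.append({"sales": sales, "refunds": refunds + fraud,
                      "fraud": fraud, "is_fraud": is_fraud})
    return staff


def evaluate(staff, threshold):
    """Return (undetected fraud as a share of sales, number of honest staff flagged)."""
    total_sales = sum(s["sales"] for s in staff)
    missed_loss = 0.0
    false_positives = 0
    for s in staff:
        flagged = s["refunds"] / s["sales"] > threshold
        if not flagged:
            missed_loss += s["fraud"]
        elif not s["is_fraud"]:
            false_positives += 1
    return missed_loss / total_sales, false_positives


def pick_threshold(staff, candidates, max_loss_ratio):
    """Largest candidate threshold whose undetected loss stays within the limit."""
    best = None
    for t in sorted(candidates):
        loss_ratio, fp = evaluate(staff, t)
        if loss_ratio <= max_loss_ratio:
            best = (t, loss_ratio, fp)  # later (larger) thresholds overwrite earlier ones
    return best


staff = simulate()
candidates = [i / 100 for i in range(0, 16)]
print(pick_threshold(staff, candidates, max_loss_ratio=0.002))
```

Because the undetected loss can only grow as the threshold rises, while the number of false positives can only shrink, the largest acceptable threshold is also the one that demands the least investigative labour in this toy setting.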

Abstract [en]

Fraud is an important problem that impacts the whole economy. The ultimate aim of this thesis is to introduce our approach to the use of computer simulation for fraud detection and its applications in financial domains. The problem we address in these domains is different types of fraud, although we limit ourselves to isolated cases of relatively straightforward fraud. Currently, there is a lack of public research into the detection of fraud. One important reason is the lack of transaction data, which is often sensitive. To address this problem we present a mobile money Payment Simulator (PaySim) and a Retail Store Simulator (RetSim), which allow us to generate synthetic transactional data that contains both normal customer behaviour and fraudulent behaviour.

These simulations are Multi Agent-Based Simulations (MABS) and were calibrated using real data from financial transactions. We developed agents that represent the clients and merchants in PaySim and the customers and salesmen in RetSim. The normal behaviour was based on behaviour observed in data from the field, and is codified in the agents as rules of transactions and interaction between clients and merchants, or customers and salesmen. Some of these agents were intentionally designed to act fraudulently, based on observed patterns of real fraud. We introduced known signatures of fraud in our model and simulations to test and evaluate our fraud detection methods. The resulting behaviour of the agents generates a synthetic log of all transactions. This synthetic data can be used to further advance fraud detection research without leaking sensitive information about the underlying data or breaking any non-disclosure agreements.
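As a rough illustration of how agent rules can produce a labelled synthetic transaction log, consider the following minimal sketch. It is not PaySim or RetSim code: the `Transaction` and `Salesman` classes, the `run` function, and every probability and amount range are hypothetical and uncalibrated.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    step: int
    salesman: int
    customer: int
    kind: str       # "sale" or "refund"
    amount: float
    is_fraud: bool  # ground-truth label, available only because the data is synthetic


class Salesman:
    """Hypothetical salesman agent; a fraudulent one sometimes records a fake refund."""

    def __init__(self, sid, fraudulent, rng):
        self.sid = sid
        self.fraudulent = fraudulent
        self.rng = rng

    def serve(self, step, customer_id):
        amount = round(self.rng.uniform(5, 300), 2)
        log = [Transaction(step, self.sid, customer_id, "sale", amount, False)]
        if self.fraudulent and self.rng.random() < 0.1:
            log.append(Transaction(step, self.sid, customer_id, "refund", amount, True))
        return log


def run(n_steps=100, n_salesmen=5, n_customers=20, n_fraud=1, seed=42):
    """Run the toy simulation and return the full transaction log."""
    rng = random.Random(seed)
    salesmen = [Salesman(i, i < n_fraud, rng) for i in range(n_salesmen)]
    log = []
    for step in range(n_steps):
        for customer in range(n_customers):
            if rng.random() < 0.3:  # assumed chance that a customer shops in this step
                log.extend(rng.choice(salesmen).serve(step, customer))
    return log


log = run()
print(len(log), "transactions,", sum(t.is_fraud for t in log), "labelled as fraud")
```

In a calibrated simulator the shopping probability, amounts and fraud rules would be fitted to real data; here they are fixed constants so that the mechanism stays visible.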

Using statistics and social network analysis (SNA) on real data, we calibrated the relations between our agents and generated realistic synthetic data sets that were verified against the domain and validated statistically against the original source.

We then used the simulation tools to model common fraud scenarios in order to ascertain exactly how effective fraud detection techniques such as the simplest form of statistical threshold detection, perhaps the most common in use, actually are. The preliminary results show that threshold detection is effective enough at keeping fraud losses at a set level. This suggests that there is little economic room for improved fraud detection techniques.

We also implemented other applications of the simulator tools, such as setting up a triage model and measuring the cost of fraud. These proved to be an important help for managers who aim to prioritise fraud detection and want to know how much they should invest in it to keep losses below a desired limit under the different fraud scenarios tested and expected.