Overview

Credit card fraud is a costly problem for both businesses and consumers, so detecting fraudulent transactions reliably is important. Using a well-known dataset from Kaggle, I wanted to look at this problem more closely. The dataset contains transactions made by European cardholders over two days in September 2013, with 492 frauds out of 284,807 transactions.

First, let’s take a look at the data:

| | Time | V1 | V2 | V3 | … | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | -1.3598 | -0.0728 | 2.5363 | … | 0.1336 | -0.0211 | 149.62 | 0 |
| 2 | 0 | 1.1919 | 0.2662 | 0.1665 | … | -0.0090 | 0.0147 | 2.69 | 0 |
| 3 | 1 | -1.3584 | -1.3402 | 1.7732 | … | -0.0554 | -0.0598 | 378.66 | 0 |
| 4 | 1 | -0.9663 | -0.1852 | 1.7930 | … | 0.0627 | 0.0615 | 123.50 | 0 |
| 5 | 2 | -1.1582 | 0.8777 | 1.5487 | … | 0.2194 | 0.2152 | 69.99 | 0 |
| 6 | 2 | -0.4260 | 0.9605 | 1.1411 | … | 0.2538 | 0.0811 | 3.67 | 0 |

(Values rounded to four decimal places; columns V4–V26 omitted here for readability.)

The data is anonymised and has been transformed using PCA due to confidentiality issues, so we are left with 28 features labelled V1–V28, the feature ‘Time’ (the number of seconds elapsed since the first transaction), the transaction ‘Amount’, and the feature ‘Class’, which takes the value 1 in case of fraud and 0 otherwise.

The models we’ll be using, particularly the neural networks, tend to perform better with data in the range [0, 1], so we will begin by scaling the data.
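The analysis in this post is done in R, but the scaling step is easy to sketch; here is a minimal min-max version in Python with NumPy (the use of min-max scaling is an assumption; the sample ‘Amount’ values are taken from the table above):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D array linearly into [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo)

# Example: scale the 'Amount' values from the first six transactions
amounts = np.array([149.62, 2.69, 378.66, 123.50, 69.99, 3.67])
scaled = min_max_scale(amounts)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

In practice each column (Time, V1–V28, Amount) would be scaled independently.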

Sampling

One other thing to notice is that the data is highly unbalanced: only 492 of the 284,807 transactions (about 0.17%) are fraudulent.

This will be the biggest hurdle in developing an accurate predictive model. We need to be intelligent about how we sample the data to balance it out before fitting the models. There are many options, but I decided on the SMOTE technique, not only because it tends to work well on highly unbalanced datasets, but also because there’s a simple R package called unbalanced which makes this task a breeze.
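The post relies on the unbalanced R package for this step; as a rough illustration of what SMOTE does under the hood, here is a minimal hand-rolled sketch in Python (the toy data and parameters are made up, and a real implementation handles several details this one skips):

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        # distances from sample i to every minority sample
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

# Toy minority class: 10 points clustered near the origin
frauds = np.random.default_rng(1).normal(0, 0.1, size=(10, 2))
new_points = smote(frauds, n_new=20)
print(new_points.shape)  # (20, 2)
```

Each synthetic point lies on the line segment between two real fraud cases, so the new samples stay inside the region the minority class already occupies.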

We should check that the proportion of fraudulent transactions in each of our split datasets is around the same as in the complete dataset. This is important since there is an exceedingly low number of these transactions. We should also check the dimensions of the split data.
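A stratified split is one way to guarantee this. Here is an illustrative sketch in Python (the 1% fraud rate and the 30% test fraction below are made up for the example, not the post’s actual figures):

```python
import numpy as np

def stratified_split(y, test_frac=0.3, seed=0):
    """Return train/test index arrays preserving the class proportions of y."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        rng.shuffle(idx)
        cut = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:cut])
        train_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels with a 1% positive rate, mimicking the fraud imbalance
y = np.zeros(10_000, dtype=int)
y[:100] = 1
train, test = stratified_split(y, test_frac=0.3)
print(y[train].mean(), y[test].mean())  # both ~0.01
```

Because each class is split separately, both halves keep the same fraud rate as the full dataset.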

You’ll notice that we now have only 6192 rows of training data, but the data is much more balanced.

Neural Network with Backpropagation

Now we can move on to fitting our first model, a neural network with backpropagation. It has two hidden layers of 20 and 15 neurons respectively and a learning rate of 0.1. I chose these parameters because they gave the most accurate results. Deciding on the parameters of a neural network can be a case of trial and error, but in general hidden layers around two-thirds the size of the input layer tend to work well.
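The actual model here is fitted in R, but the same architecture can be sketched from scratch in Python with NumPy. The toy data, sigmoid activations, squared-error loss, and training length below are my assumptions for the sketch; the layer sizes (20 and 15) and learning rate (0.1) mirror the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 2 features, class 1 when their sum exceeds 1
X = rng.random((200, 2))
y = (X.sum(axis=1) > 1.0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, b):
    """Return the activations of every layer for input X."""
    acts = [X]
    for Wi, bi in zip(W, b):
        acts.append(sigmoid(acts[-1] @ Wi + bi))
    return acts

# Layer sizes mirroring the post: two hidden layers of 20 and 15 neurons
sizes = [2, 20, 15, 1]
W = [rng.normal(0, 0.5, (m, n)) for m, n in zip(sizes, sizes[1:])]
b = [np.zeros((1, n)) for n in sizes[1:]]
lr = 0.1  # learning rate from the post

mse_start = np.mean((forward(X, W, b)[-1] - y) ** 2)
for _ in range(2000):
    acts = forward(X, W, b)
    # backpropagate the squared-error gradient (sigmoid' = a * (1 - a))
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for i in reversed(range(len(W))):
        grad_W = acts[i].T @ delta / len(X)
        grad_b = delta.mean(axis=0, keepdims=True)
        if i:  # propagate the error backwards using the pre-update weights
            delta = (delta @ W[i].T) * acts[i] * (1 - acts[i])
        W[i] -= lr * grad_W
        b[i] -= lr * grad_b
mse_end = np.mean((forward(X, W, b)[-1] - y) ** 2)
print(mse_start, "->", mse_end)
```

Gradient descent steadily reduces the training error; the real model would of course be trained on the SMOTE-balanced transaction data rather than this toy set.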

We can now predict values for the test data and compare them with the known output. The best way to visualize this is through a confusion matrix. The neural network outputs decimal values between 0 and 1. Since we require either 0 or 1 for our predictions, a threshold must be applied. I’ve written a short function which applies a threshold and computes the confusion matrix.
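The original R function isn’t reproduced here, but the idea can be sketched in Python (the probabilities and labels below are made up for illustration):

```python
import numpy as np

def confusion_matrix(probs, truth, threshold=0.5):
    """Apply a threshold to raw model outputs and tabulate the results."""
    pred = (np.asarray(probs) >= threshold).astype(int)
    truth = np.asarray(truth)
    tp = int(np.sum((pred == 1) & (truth == 1)))  # frauds caught
    tn = int(np.sum((pred == 0) & (truth == 0)))  # legitimate, left alone
    fp = int(np.sum((pred == 1) & (truth == 0)))  # false alarms
    fn = int(np.sum((pred == 0) & (truth == 1)))  # frauds missed
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical outputs for six transactions, one of them fraudulent
probs = [0.02, 0.91, 0.40, 0.07, 0.66, 0.03]
truth = [0, 1, 0, 0, 0, 0]
cm = confusion_matrix(probs, truth, threshold=0.5)
print(cm)  # {'TP': 1, 'TN': 4, 'FP': 1, 'FN': 0}
```

Raising the threshold trades false positives for false negatives, which is exactly the knob discussed below.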

The balanced accuracy is a more meaningful measure, as it treats the positive and negative cases separately and then takes the average. The overall accuracy is above 0.99, which may sound impressive, but with highly unbalanced data it doesn’t mean much. In this case, modelling becomes a game of minimizing false negatives while keeping false positives at an acceptable level: a credit card company would need to investigate every false positive, and this would add work for its employees.
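To see why, here is a small Python sketch with illustrative numbers (not the post’s actual results): a model that misses most frauds can still score over 99% overall accuracy, while its balanced accuracy exposes the problem.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Average of sensitivity (true positive rate) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Made-up confusion matrix: 10,000 transactions, 90 frauds, 80 of them missed
overall = (10 + 9900) / (10 + 9900 + 10 + 80)
print(round(overall, 3))                              # 0.991
print(round(balanced_accuracy(10, 9900, 10, 80), 3))  # 0.555
```

The overall accuracy is dominated by the huge number of easy true negatives, whereas the balanced accuracy gives the rare fraud class equal weight.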

Bayesian Neural Network

The next model we will fit is a Bayesian neural network, using an R function called brnn. It fits a two-layer feed-forward neural network, using the Nguyen-Widrow algorithm to assign initial weights and the Gauss-Newton algorithm to perform the optimization.

The Bayesian neural network has performed better than the first model. The balanced accuracy is higher and both the number of false negatives and false positives are lower. It has also detected 137 fraudulent transactions.

Support Vector Machine

The last model we will fit is a Support Vector Machine (SVM). It seeks a hyperplane in N dimensions (where N is the number of predictors) that splits the data points into two classes. We will use the svm function from the R package e1071.
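The post uses e1071’s svm in R; to illustrate the hyperplane idea itself, here is a minimal linear SVM trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style sketch in Python, on made-up separable data, rather than the kernelized solver e1071 actually provides):

```python
import numpy as np

def linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM by stochastic sub-gradient descent on the
    hinge loss. Labels must be -1 or +1. Returns weights and bias."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            if y[i] * (X[i] @ w + b) < 1:    # inside the margin: push out
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                            # outside the margin: only shrink
                w = (1 - eta * lam) * w
    return w, b

# Linearly separable toy data: class is the sign of x1 + x2 - 1
rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = np.where(X.sum(axis=1) > 1.0, 1, -1)
w, b = linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
print("training accuracy:", acc)
```

The regularization weight lam controls the trade-off between a wide margin and misclassified training points.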

The model performs quite well, comparable with the first neural network.

The biggest thing to notice is that the number of false positives is much lower than in the previous two models. Optimizing and refining this model could push the balanced accuracy above 0.95, but there are limits to this, as it is a very simple model.

One idea is to take an aggregate of the predicted outputs of all the models and choose the most popular prediction. I’ve written a simple function to do this.

The function threshout adjusts the predicted values based on a threshold, and the voting function outputs the combined predictions.
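The original R functions aren’t reproduced here, but the logic can be sketched in Python (the three models’ outputs below are hypothetical):

```python
import numpy as np

def threshout(probs, threshold=0.5):
    """Convert raw model outputs into hard 0/1 labels."""
    return (np.asarray(probs) >= threshold).astype(int)

def vote(*predictions):
    """Majority vote across the hard predictions of several models."""
    stacked = np.stack(predictions)
    return (stacked.sum(axis=0) > len(predictions) / 2).astype(int)

# Hypothetical outputs from the three models for five transactions
nn   = threshout([0.9, 0.2, 0.6, 0.1, 0.7])
brnn = threshout([0.8, 0.4, 0.3, 0.2, 0.9])
svm  = threshout([0.7, 0.6, 0.2, 0.1, 0.4])
print(vote(nn, brnn, svm))  # [1 0 0 0 1]
```

With three models there are no ties, and a transaction is flagged only when at least two of the models agree, which is what suppresses the lone false positives of any single model.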

The combined models performed quite well, and the balanced accuracy is very close to the Bayesian Neural Network. One thing to notice is that the number of false positives is lower than any of the other models, whilst still keeping the false negatives low. This method of aggregating models shows promise and a variation of this could perform quite well. This would be the direction I’d take moving forward, trying to optimize the existing models while pursuing innovative methods to combine and create a stronger learning algorithm.