For years, I have been trying to get to the bottom of what type of self-learning predictive models and fraud scoring systems the vendors I cover provide. I often got the impression that, in many cases, it was a bit of a Wizard of Oz scenario, with some guys sitting behind a big door or curtain, mining the data and writing rules for each of their customers based on the fraud those customers had experienced and confirmed.

This was the real reason ‘tuning’ was needed and the systems did not work well out of the box: the guys or gals hadn’t yet ‘tweaked the models for the organization,’ meaning they hadn’t yet mined the company’s data and written the rules accordingly. This becomes especially problematic when the company doesn’t have any confirmed fraud. The irony in these situations is that you can’t prevent fraud until you have experienced enough of it!

It was also the main reason the models ‘degrade over time.’ The rules stop working once the bad guys catch on to them, so the cycle of data mining and rule creation by the guys and gals behind the curtain must start once again, sometimes costing the customers tens of thousands of dollars if not more.

I continue to learn that this is pretty much the way many of these ‘predictive models’ work. Most of them are essentially just rules.

The only time models can run ‘out of the box’ is when the customer’s situation is akin to that of their peers and the model is built on consortium data, where the confirmed fraud experiences of others can help pinpoint fraud for each participant in the consortium. Vendors that base their models on consortium data tend to use predictive modeling and scoring techniques, e.g., those based on neural or Bayesian networks, more often than vendors that don’t. But consortium models have their limits, because many companies don’t want to share their fraud data with anyone – not the authorities, not their competitors and not the vendors.

Further, self-learning models aren’t a reality in fraud management, at least from what I have seen. The vendors have to run their own analyses to find the outliers – or the transactions not evaluated by the model – and then figure out what they have in common so they can manually adjust the model to take them into account.

In any event, the next time a vendor’s model seems like a black box, it probably means there are a few geeks behind the curtain mining your data and building rules. If nothing else, they should make it clear that after a set period, those rules will become ineffective so you will have to invite them back — and pay them a considerable amount of money — unless you’ve learned to write your own rules and ‘models.’


Avivah Litan
VP Distinguished Analyst | 12 years at Gartner | 30 years IT industry

Avivah Litan is a Vice President and Distinguished Analyst in Gartner Research. Her areas of expertise include financial fraud, authentication, access management, identity proofing, identity theft, and fraud detection and prevention applications.

Thoughts on “How Smart Are Predictive Models?”

“Ignore that man behind the curtain!” boomed the voice of the Wizard of Oz, causing Dorothy and crew to tremble in fear. In the book by L. Frank Baum, the story is quite a bit different from the movie most of us know. The Wizard appears in a different form to each member of Dorothy’s party when they make their requests of him (they must also wear green-tinted glasses when viewing him). Not unlike the way the black-box predictive model appears to each client asking it to detect their unique form of fraud.

Before discussing how smart predictive models are, let’s first look briefly at how they work. State-of-the-art machine learning approaches to binary classification cover many fields, from credit card fraud detection to disease diagnostics in medicine. A binary classifier model assigns one of two categories to an example presented to it. For fraud, we have ‘fraud’ and ‘not fraud’ as the two classes. This gives rise to four possible outcomes as follows:

1. True Positive (TP) – a correct classification of a fraud as a fraud
2. False Positive (FP) – an incorrect classification of a non-fraud as a fraud
3. False Negative (FN) – an incorrect classification of a fraud as non-fraud
4. True Negative (TN) – a correct classification of a non-fraud as a non-fraud

The ‘True’ outcomes are where our classifier got things right, and the ‘False’ outcomes are where it has made an error. The counts of all these outcomes give us a “contingency table” from which we can figure things like detection rate, false positive rate, alerting rate, and so on. In the fraud detection business, we are generally only interested in the detection rate of the classifier below a false positive rate of at most 5%, and usually much lower.
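These rates fall straight out of the contingency table. A minimal sketch in Python, with counts invented purely for illustration:

```python
# Hypothetical counts from a fraud detector's contingency table
# (the numbers are made up for this sketch).
tp, fp, fn, tn = 80, 400, 20, 9500

detection_rate = tp / (tp + fn)                # share of frauds caught: 0.8
false_positive_rate = fp / (fp + tn)           # share of non-frauds wrongly flagged: ~0.04
alert_rate = (tp + fp) / (tp + fp + fn + tn)   # share of all transactions alerted: 0.048
```

With these toy counts the detector catches 80% of frauds at roughly a 4% false positive rate, which is just at the edge of the 5% ceiling mentioned above.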
For a thorough discussion on performance evaluation and optimization of fraud detectors, have a look at our white paper here: http://www.gcxanalytics.com/papers/GCX%20Fraud%20Detection%20Performance%20Evaluation-GCX.pdf

There are dozens of algorithms for generating binary classifiers from sample data; one need only look at free machine learning applications like Weka and RapidMiner to get an idea. Some work better than others, and much of their success lies in the domain expertise of the people behind The Wizard’s curtain. Usually there is quite a bit of transformation done on the raw data from the client to create the actual training data for the algorithm. So, after all the ingredients are in the pan, it bakes for a bit, and, voila, out pops a predictive model.

So, when the analytics vendors are ‘tuning’ the model for a client, they are usually not creating an entirely new model. The up-front investment in the data transformation (or training schema definition) is where much of the valuable expertise is. But, they still have to do the data transforms, and run the algorithm to get a particular instance of that model from that particular data.

Data mining, machine learning, and predictive analytics are essentially data-driven approaches; i.e. the characteristics of the data determine the model. This is in contrast to ‘judgment-based’ systems, usually a set of rules developed from an analyst’s personal experience with the business domain.

So, it is generally true that ‘one size does not fit all’ in fraud detection models. Even in areas like credit or debit card fraud, where there is plenty of commonality to make a consortium model viable, performance improvements are possible. Here at GCX Analytics we develop tailored ‘re-scoring’ models for credit and debit card fraud detection, and achieve up to 50% improvement in dollar detection rates while maintaining or reducing the false positive rate.

What about “self-learning?” We all learn, hopefully, but the idea that machines learn in the same way people do is a bit of a misnomer. The basic structure of a predictive model does not change over time unless a model is generated from new or augmented data; this process normally requires the supervision of an analytics expert. What is represented as “self-learning” is actually closer to adaptive weighting of predictive features based on prior outcomes. When a fraud investigator assigns an alert the status of ‘false positive,’ an adaptive model will in the future put a little less importance on the factors that created this false alert. It is an error-correction procedure, not really “self-learning.”
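The error-correction idea can be sketched in a few lines. The feature names and learning rate below are invented for illustration, not taken from any vendor’s product:

```python
# Sketch of 'adaptive weighting': when an investigator marks an alert as a
# false positive, shrink the weights of the features that fired on it.
weights = {"foreign_ip": 2.0, "high_amount": 1.5, "new_device": 1.0}
LEARNING_RATE = 0.1

def score(features):
    """Sum the weights of the features present in a transaction."""
    return sum(weights[f] for f in features if f in weights)

def feedback(features, outcome):
    """Error-correction step: adjust the weights of the contributing
    features based on the investigator's label."""
    direction = -1.0 if outcome == "false_positive" else 1.0
    for f in features:
        if f in weights:
            weights[f] += direction * LEARNING_RATE * weights[f]

alert = ["foreign_ip", "high_amount"]
before = score(alert)                # 3.5
feedback(alert, "false_positive")    # investigator dismisses the alert
after = score(alert)                 # 3.15 -- same transaction now scores lower
```

Note that the model’s structure (which features it looks at) never changes; only the emphasis placed on each feature drifts with feedback, which is why calling it “self-learning” overstates the case.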

Rules, rules, rules. So many rules. Rules are popular with people because they can (sort of) understand them. Rules have the unfortunate aspect of being rather blunt instruments, since they can only say “fraud” or “not fraud” and leave out any finer reasoning. Old-school algorithms like ID3 and C4.5 decision trees are equivalent to rules. Advanced classifiers generally establish a hyper-dimensional partitioning surface to put ‘frauds’ on one side and ‘not frauds’ on the other, where the classification error costs determine the shape of the partitioning surface.
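To make the contrast concrete, here is a toy comparison of a hard rule with a graded scoring classifier on a single feature. The cutoff and logistic coefficients are invented for the sketch:

```python
import math

def rule(amount):
    """Old-school rule: a hard cutoff that can only say fraud / not fraud."""
    return "fraud" if amount > 1000 else "not fraud"

def fraud_score(amount):
    """Logistic score: a graded likelihood of fraud on a 0-1000 scale."""
    return round(1000 / (1 + math.exp(-(amount - 1000) / 200)))

# The rule treats a $1,001 charge and a $50,000 charge identically;
# the scoring classifier ranks the second as far riskier.
```

With more than one feature, the scoring classifier’s decision boundary becomes the partitioning surface described above, while a rule set can only carve the space into axis-aligned boxes.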

Another question worth asking is “How valuable are predictive models?” After all, we are in business to make money, or at least not lose it to fraudsters. The four outcomes noted above each have their own benefit or cost, usually called ‘the payoff matrix.’ We can multiply the counts in the contingency table by the costs or benefits, tally them up, and get a net economic benefit of the detector.

Even better, if our detector is a scoring classifier, i.e. it produces a number indicating the likelihood of fraud, say, from 0 to 1000, we will get different net benefits at various score thresholds for sorting transactions or customers into frauds/victims or not. The cost/benefit function almost always has a maximum at some score, and this is where the value of the predictive model to the business is greatest.
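The threshold search can be sketched in a few lines. The scores, labels, and payoff values below are invented for illustration:

```python
# Toy illustration of picking the score threshold that maximizes net benefit.
# (score, is_fraud) pairs for a handful of hypothetical transactions:
transactions = [(950, True), (900, True), (850, False), (700, True),
                (600, False), (400, False), (300, True), (100, False)]

# Payoff matrix: benefit of catching a fraud, costs of errors (illustrative).
PAYOFF = {"TP": 100.0, "FP": -5.0, "FN": -100.0, "TN": 0.0}

def net_benefit(threshold):
    """Tally the payoffs when everything scoring at or above the
    threshold is flagged as fraud."""
    total = 0.0
    for score, is_fraud in transactions:
        flagged = score >= threshold
        if flagged and is_fraud:
            total += PAYOFF["TP"]
        elif flagged and not is_fraud:
            total += PAYOFF["FP"]
        elif not flagged and is_fraud:
            total += PAYOFF["FN"]
        else:
            total += PAYOFF["TN"]
    return total

# Sweep the score range and keep the threshold with the highest net benefit.
best = max(range(0, 1001, 50), key=net_benefit)
```

In practice the sweep is run over the full score distribution of historical data, and the maximizing threshold becomes the operating point for the alerting system.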

Lastly, the decay of models over time is inevitable. It’s not that the model changed; the data did. Fraudsters are quick studies, and engage in active counter-detection discovery. At GCX, we recommend a quarterly model refresh, subject to performance monitoring over time (control charts).
