In this article I hope you learn about the future of predictive analytics in decision management and how tighter integration between rules and learning is being developed to adaptively improve diagnostic capabilities, especially in maximizing profitability and detecting adversarial conduct, such as fraud, money laundering and terrorism.

Business Intelligence

Visualizing business performance is obviously important, but improving business performance is even more important. A good view of operations, such as this nice dashboard[1], helps management see the forest (and, with good drill-down, some interesting trees).

With good visualization, management can gain insights into how to improve business processes, but if the view does not include a focus on outcomes, improvement in operational decision making will be relatively slow in coming.

Whether or not you use business intelligence software to produce your reports or present dashboards, however, you can improve your operational decision management by applying statistics and other predictive analytic techniques to discover hidden correlations between what you know before a decision and what you learn afterwards to improve your decision making over time.

This has become known as decision management, thanks to Fair Isaac Corporation, but not until after they acquired Hecht Nielsen Corporation.

Enterprise Decision Management

HNC pioneered the use of predictive analytics to optimize decision making. Dr. Nielsen formed the company in 1986 to apply neural network technology to predict fraud. The resulting application (perhaps it is more of a tool) is called Falcon. It works.

In 2002, Fair Isaac acquired HNC (for roughly $800,000,000 in stock) to pursue a “common strategic vision for the growth of the analytics and decision management technology market”. But shortly before the merger, HNC had acquired Blaze Software from Brokat for a song following the Dot Bomb of October, 2000 – a month before 9/11. This gave HNC not only great learning technology but, with a business rules management system (BRMS), the opportunity to play in broader business process management (BPM), including underwriting and rating (which is highly regulated), for example.

Of course, the business rules market has since become fairly mainstream and closely related to governance, risk and compliance (GRC), all of which were beyond the point decision making capabilities of either HNC or Fair Isaac before both these transactions.

Once Fair Isaac had predictive and rule technology under one roof, bright employees, such as James Taylor, coined “Enterprise Decision Management”, or EDM for short.

Predictive Analytic Sweet Spots

Before it merged with Fair Isaac, HNC’s machine learning technology was successful (meaning it was saving tons of money, not just an application or two) in each of the following business to consumer (B2C) application areas:

Credit card fraud

Workmen’s compensation fraud

Property and casualty fraud

Medical insurance fraud

Clearly, fraud across insurance and financial services is a sweet spot for decision management. Today, that includes money laundering and, in general, any form of deceit, including adversarial forms, such as those involving terrorism.

Predictive Analytic Challenges

The principal problem with predictive analysis is the care and feeding of the neural network or the business intelligence software. This involves formulating models, running them against example input data given outcomes, and examining the results. For the most part, this is the province of statisticians or artificial intelligence folk.

A secondary challenge involves the gap between the output of a predictive model and the actual decision. A predictive model generally outputs a continuous score rather than a discrete decision. To make a decision, a threshold is generally applied to this score.

Yes or no questions are answered by applying a threshold to a score produced by a formula or neural network to determine “true” or “false”.

Multiple choice questions are answered using a predictive model per choice and choosing the one with the highest score.

More complex decisions are answered as above using a predictive model that combines the scores produced by other predictive models.
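The three patterns above can be sketched in a few lines. This is only an illustration: the model functions, the 0.5 threshold, and the weighted sum used to combine scores are assumptions made for the example, not anything taken from Falcon or another product.

```python
from typing import Callable, Dict

# Hypothetical type: a "model" is any function that maps a case to a score.
Model = Callable[[dict], float]


def decide_yes_no(score: float, threshold: float = 0.5) -> bool:
    """Yes/no question: apply a threshold to a continuous score."""
    return score >= threshold


def decide_multiple_choice(case: dict, models: Dict[str, Model]) -> str:
    """Multiple choice: one model per choice, pick the highest score."""
    return max(models, key=lambda choice: models[choice](case))


def combine_scores(case: dict, models: Dict[str, Model],
                   weights: Dict[str, float]) -> float:
    """Complex decision: a higher-level model (here, an assumed weighted
    sum) combines the scores produced by other models."""
    return sum(weights[name] * model(case) for name, model in models.items())


# Toy models, invented purely for illustration.
offer_models = {
    "gold_card": lambda c: 0.2 + 0.6 * c["income_rank"],
    "basic_card": lambda c: 0.5,
}
case = {"income_rank": 0.9}

best_offer = decide_multiple_choice(case, offer_models)
combined = combine_scores(case, offer_models,
                          {"gold_card": 0.5, "basic_card": 0.5})
is_yes = decide_yes_no(combined)
```

In each case the model still only produces a score; the decision comes from the threshold or the comparison applied afterwards.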

In general, especially where decisions are governed by policy or regulation, predictive models and decision tables are combined with rules using one of the following approaches:

More complex decisions are answered as above using predictive models that are selected by rules in compliance with governing policy or regulations.

More complex decisions are answered using rules that consider the scores produced by predictive models in compliance with governing policy or regulations.

In general, governance, risk and compliance (GRC) requires rules in addition to any predictive models. Rules are also commonly used within or to select predictive models. And special cases and exceptions are common applications of rules in combination with predictive models.
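A minimal sketch of both approaches, with rules selecting a model and rules considering its score, might look like the following. The two models, the field names, the watch-list rule, and the cut-offs are all hypothetical and stand in for whatever policy or regulation actually governs the decision.

```python
def standard_model(applicant: dict) -> float:
    """Hypothetical model for applicants with an established history."""
    return min(1.0, applicant["years_of_history"] / 10)


def thin_file_model(applicant: dict) -> float:
    """Hypothetical model for applicants with little history."""
    return 0.6 if applicant["has_steady_income"] else 0.3


def select_model(applicant: dict):
    """Rule that selects which predictive model applies."""
    if applicant["years_of_history"] < 2:
        return thin_file_model
    return standard_model


def decide(applicant: dict) -> str:
    """Rules that consider the score, plus a policy rule that overrides it."""
    # Compliance rule: certain cases are decided regardless of any score.
    if applicant.get("on_watch_list", False):
        return "decline"
    score = select_model(applicant)(applicant)
    if score >= 0.8:
        return "approve"
    if score >= 0.5:
        return "refer"  # special case: send to a person for review
    return "decline"
```

Note that the watch-list rule is checked before any model runs, which is one way to express a requirement for certainty rather than probability.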

Scorecards

A simple case of defining (or combining) predictive models is a scorecard. The following example shows a scorecard from Fair Isaac’s nice brochure on predictive analytics that could be part of a creditworthiness score:

Fair Isaac is the leader in credit scoring, of course. Their FICO score is the output of a proprietary predictive model.
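For readers who have not seen one, a scorecard is essentially an additive table: each characteristic is binned and each bin contributes points. The sketch below uses invented characteristics, bins, and point values; it does not reproduce the brochure's scorecard or anything about the proprietary FICO model.

```python
# Hypothetical scorecard: each bin is (low inclusive, high exclusive, points).
# A high of None means "no upper bound".
SCORECARD = {
    "years_at_address": [(0, 2, 10), (2, 5, 20), (5, None, 35)],
    "debt_to_income_pct": [(0, 20, 40), (20, 40, 25), (40, None, 5)],
}


def points_for(characteristic: str, value: float) -> int:
    """Look up the points for the bin that contains the value."""
    for low, high, points in SCORECARD[characteristic]:
        if value >= low and (high is None or value < high):
            return points
    raise ValueError(f"no bin for {characteristic}={value}")


def scorecard_total(applicant: dict) -> int:
    """The score is simply the sum of the points across characteristics."""
    return sum(points_for(name, applicant[name]) for name in SCORECARD)
```

For example, an applicant with 3 years at their address and a 45% debt-to-income ratio would total 20 + 5 = 25 points under these made-up values.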

The following example shows how Fair Isaac’s predictive model is combined with other factors in the mortgage industry:

Note all the exceptions and special cases spread throughout this scorecard!

This explains why business rules have been so popular in the mortgage industry. Pre-qualifying and quoting across many lenders clearly requires a business rules approach (which explains why Gallagher Financial embedded my stuff in their software a decade ago). Even a single lender has to deal with its own special cases and the bigger the lender the more there are (which is why Countrywide Financial[2] developed its own rules technology, called Merlin, decades ago).

Decision tables

For anything but the simplest decisions, the results of predictive models are considered along with other data using rules to make decisions. In some cases, these rules are simple enough to fit into a decision table (or a decision tree rendered as a table) such as the following:

Tables like the one on the left can be used during underwriting to determine what variables are appropriate for gauging the risk of death covered by a life insurance policy. This demonstrates that rules (in this case, very simple rules) can be used to determine which predictive model (or inputs) to consider in a decision.

Tables like the one on the right correspond to decision trees and can be used instead of scorecards to set the base premium for auto insurance. Additional rules typically adjust for other factors like driver’s education classes, driving record, student drivers, and other special cases and exceptions. This is similar to the use of notes in the mortgage pricing sheet shown above.
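As a rough sketch of the second kind of table, a base premium can be looked up from a decision table and then adjusted by additional rules. The age bands, vehicle classes, premiums, and adjustment factors below are invented for illustration and are not taken from any real rating plan.

```python
# Hypothetical decision table: (age band, vehicle class) -> base premium.
BASE_PREMIUM = {
    ("under_25", "sedan"): 1200.0,
    ("under_25", "sports"): 2000.0,
    ("25_to_64", "sedan"): 700.0,
    ("25_to_64", "sports"): 1100.0,
    ("65_plus", "sedan"): 800.0,
    ("65_plus", "sports"): 1300.0,
}


def age_band(age: int) -> str:
    """Map an age onto the row labels of the table."""
    if age < 25:
        return "under_25"
    if age < 65:
        return "25_to_64"
    return "65_plus"


def premium(driver: dict) -> float:
    """Table lookup for the base premium, then rules for special cases."""
    base = BASE_PREMIUM[(age_band(driver["age"]), driver["vehicle"])]
    if driver.get("drivers_ed", False):
        base *= 0.95  # assumed discount for driver's education
    if driver.get("violations", 0) > 2:
        base *= 1.25  # assumed surcharge for a poor driving record
    return round(base, 2)
```

The table covers the regular cases compactly, while the adjustments play the role of the notes on a pricing sheet.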

The point is that real decisions are not as simple as a single predictive model, a scorecard, or a decision table. And once these decisions are defined and automated using any combination of these techniques, improving those decisions can seem overwhelmingly complex (just from a technical perspective!).

Predictive analytics is not enough for EDM

Enterprise Decision Management (EDM), discussed above, is all about this multi-dimensional decision technology environment (scorecards, decision tables, and rules) but also about bringing statistical and neural network technology in to improve the decision making process more easily and less manually or subjectively. The Fair Isaac brochure referenced above, for example, has some nice graphics showing statistical techniques (such as clustering) and graphs showing interconnected “nerves”.

There are several aspects of decision making that not even magically successful machine learning will eliminate, however:

The requirement to comply with governing policies or regulations.

Special cases that cannot be learned for various reasons, including:

Limitations on the number of variables used in predictive analytics.

Poorly understood, non-linear relationships in the data

A lack of adequate sampling for special cases

A need for certainty rather than probability

Exceptions that cannot be learned, as with special cases.

Of course, special cases and exceptions are common in both policy and regulation. For example, consider policies that arise from contracts or customer relationships or the evolutionary nature of legislation, as reflected in the article on the earned income tax credit.

Rules are not enough for EDM

On the other extreme, commercial rule technology has not been capable of adaptively improving decision management. In fact, except when they are modified by people, the use of rules in decision management is completely static, as well as entirely black and white. There is no learning with any of the business rules management systems from leaders like Fair Isaac, Ilog, Haley, Corticon, or Pegasystems.

Amazingly, there are no mainstream rule systems today that deal with probability or other kinds of uncertainty. Without such support, every rule in the tools from the vendors previously mentioned is black and white. This makes them very awkward for applications such as diagnosis. And all decision management applications, including profit maximization and all forms of fraud detection, are intrinsically diagnostic. Prediction results in probabilities!

The earliest diagnostic expert systems were developed at Stanford. One used subjective probabilities to diagnose bacterial infections. Another used more rigorous probabilities to find ore deposits (it more than paid for itself when it found a $100,000,000 molybdenum deposit circa 1990!). These applications were called MYCIN and PROSPECTOR, respectively.

This seems shocking really, since the technology of these systems is well-understood and technically almost trivial. The truth is that the Carnegie Mellon approach to business rules has won because it dealt with “the closed-world assumption”, which means that it could handle missing data better. But CMU’s approach was strictly black and white. Stanford was left in the dust commercially after the success of OPS5 at Digital Equipment Corporation, and the commercialization of expert systems at Carnegie Group, Inference Corporation, IntelliCorp and Teknowledge during the mid-eighties left uncertainty behind as well. Neuron Data, which became Blaze, followed the same trail away from the uncertain toward the black and white of tightly governed and regulated decisions.

With nothing but black and white rules, EDM leaves it up to people to adapt the decisions. Sure they can use predictive analytics, but there is no closed loop from predictive analytics involving rules. Any new rules or any changes to rules follow the stand-alone business rules approach.

Innovate for Rewards with bounded Risk

One problem with black and white rules technology is that it forces you to be right. This stifles innovation. Ideally, you could formulate an idea and experiment with it at bounded risk. For example, you could say “what if we offered free checking to anyone who opens a new credit card account with us” and test it out. You don’t want to absorb the cost of thousands accepting your offer only to lose more on checking fees than you gain through credit card fees. So you indicate how often or how many such offers can be made.

Not surprisingly, this approach is tried-and-true. Its most common form is the champion/challenger approach. Fair Isaac has been “championing” this approach for some time (see this from James Taylor).
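A minimal sketch of bounding the risk of a challenger follows. The class, its parameters, and the random routing are assumptions for illustration: the "how often" is a share of traffic and the "how many" is a hard cap on offers.

```python
import random


class ChampionChallenger:
    """Route a bounded share of decisions to an experimental challenger."""

    def __init__(self, challenger_share: float, max_challenger_offers: int,
                 seed=None):
        self.challenger_share = challenger_share            # "how often"
        self.max_challenger_offers = max_challenger_offers  # "how many"
        self.offers_made = 0
        self._rng = random.Random(seed)

    def choose(self) -> str:
        """Pick the challenger at random until the cap is reached."""
        under_cap = self.offers_made < self.max_challenger_offers
        if under_cap and self._rng.random() < self.challenger_share:
            self.offers_made += 1
            return "challenger"
        return "champion"


# Offer free checking to roughly 10% of new card accounts, 1,000 at most.
experiment = ChampionChallenger(challenger_share=0.10,
                                max_challenger_offers=1000, seed=42)
strategy = experiment.choose()
```

Once the cap is reached every decision falls back to the champion, so the worst-case cost of the experiment is known in advance.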

But how do you close the loop? How do the rules learn when this new option should be used to maximize profit? The fact is, they don’t. People do it using predictive analytic techniques and manually refining the rules.

The problem, once again, is that the rules do not learn and that their outcomes are black and white. The rules do not offer a probability that this will be a profitable transaction. And they do not learn whether a transaction will be profitable over time, either. That’s up to the users of predictive technology and managers.

Adaptive Decision Management

Adaptive Decision Management (ADM) is the next step in EDM. In ADM the loop between predictive analytics and rules is closed. At a minimum this involves learning the probabilities or reliability of rules and their conclusions. This learning occurs using statistical or neural network techniques that can be trained, optimized, and tested off-line and – if your circumstances allow – even allowed to continue learning and adapting and optimizing while on-line. For example, advertising, promotional (e.g., pricing) and social applications almost always adapt continuously. Unfortunately, none of them use rules to do this yet, since the major players don’t support it!

Innovation and ADM

The adaptation of rule-based logic brings new flexibility and opportunity to the use of rules in decision management. Adding a black and white business rule requires complete certainty that the rule will result in only appropriate decisions. Of course, such certainty is a high hurdle. Adaptive rules have a relatively low hurdle.

With adaptive rules an innovative idea can be introduced with a low, or even a zero, probability. As experience accumulates, the learning mechanism (again, statistical or neural) determines how reliable the rule is (i.e., how well it would have performed given outcomes). The technology can even learn how to weight and combine the conditions of rules so as to maximize their predictive accuracy. Without learning “inside a rule”, the probability of the rule as a whole may remain too low to be useful. And, unlike a black box neural network, the functions that combine conditions and the probabilities of rules are readily accessible, whether for insight or oversight.
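One simple way to picture a rule that starts at zero probability and earns its reliability from outcomes is a running estimate with pseudo-counts. This is only a sketch of the statistical option under that assumption; the class and field names are hypothetical, and it is not the approach of any vendor mentioned here.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRule:
    """A rule whose reliability is estimated from observed outcomes."""
    name: str
    prior_successes: float = 0.0  # pseudo-counts encode the starting belief
    prior_failures: float = 1.0   # (0, 1) means the rule starts at zero
    successes: int = 0
    failures: int = 0

    def record(self, outcome_was_good: bool) -> None:
        """Update the counts once the outcome of a decision is known."""
        if outcome_was_good:
            self.successes += 1
        else:
            self.failures += 1

    def reliability(self) -> float:
        """Estimated probability that the rule's conclusion holds."""
        good = self.prior_successes + self.successes
        bad = self.prior_failures + self.failures
        total = good + bad
        return good / total if total > 0 else 0.0


rule = AdaptiveRule("offer free checking with a new credit card")
initial = rule.reliability()
for outcome in (True, True, False, True):
    rule.record(outcome)
learned = rule.reliability()
```

Because the counts are plain numbers, the estimate stays inspectable, which is the contrast with a black box drawn above.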

The overall impact of adaptive rules is that you can put an idea into action within a generalized, probabilistic champion/challenger framework. And using techniques such as the certainty factors used in MYCIN or the more rigorous subjective Bayesian method used in PROSPECTOR, more patterns can be considered and leveraged with the continuously improving performance that EDM is all about.
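To give a feel for how technically simple such uncertainty handling can be, here is a sketch of the odds-likelihood form of Bayes' rule, the kind of calculation that subjective Bayesian updating rests on. It assumes the pieces of evidence are conditionally independent, and the function names and numbers are illustrative only; it does not reproduce either system.

```python
from typing import Iterable


def to_odds(probability: float) -> float:
    """Convert a probability (strictly below 1) to odds."""
    return probability / (1.0 - probability)


def to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def update(prior_probability: float,
           likelihood_ratios: Iterable[float]) -> float:
    """Posterior odds = prior odds times each likelihood ratio.

    A ratio above 1 means the evidence supports the hypothesis; a ratio
    below 1 means it counts against it. Independence of the evidence is
    an assumption of this sketch.
    """
    odds = to_odds(prior_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return to_probability(odds)


# A 10% prior of fraud and one rule whose evidence is 9 times more likely
# under fraud than otherwise gives even odds, i.e. about 50%.
posterior = update(0.10, [9.0])
```

The likelihood ratios are exactly the sort of quantity that could be learned from outcomes rather than supplied by hand.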

Although they haven’t told me about it explicitly, I would expect Fair Isaac to move in this direction first among the current leaders given their EDM focus. I would not be surprised to see business intelligence (BI) vendors, perhaps SAS, move in this direction, too. I know it will happen since we are already working with one commercial source of adaptive rules technology. Unfortunately, Automata is under NDA about their approach for now, but stay tuned… In the meantime, if you’re interested in learning more, please drop us a note at info at haleyAI.com. And if you see any issues or good applications, we would love to hear them.

[1] A nice dashboard from Financial Services Technology (http://www.fsteurope.com/) using Corda (http://www.corda.com/)

[2] I recently helped Countrywide upgrade to our software, just as much for usability as performance improvements.

Does your business have logic that is more or less complicated than filing your taxes?

Most business logic is at least as complicated. But most business rule metaphors are not up to expressing tax regulations in a simple manner. Nonetheless, the tax regulations are full of great training material for learning how to analyze and capture business rules.

For example, consider the earned income credit (EIC) for federal income tax purposes in the United States. This tutorial uses the guide for 2003, which is available here. There is also a cheat sheet that attempts to simplify the matter, available here.

What you will see here is typical of what business analysts do to clarify business requirements, policies, and logic. Nothing here is specific to rule-based programming.

For those who are interested in my former company, they are still committed to natural language business rules management technology, as shown in their most recent press release. They have also picked up on the public sector activity, especially eligibility, as discussed here.

In the release, CEO Dominic O’Hanlon said:

“With our natural language rule authoring capabilities and BRMS solutions, we are uniquely positioned to make our customers more competitive and agile in a fast-paced, highly-regulated world.”

“For the government market, Haley is a worldwide leader in using natural language technology to rapidly transform regulations, policies and rules into automated decision-making systems, to determine eligibility for government services, and in the taxation and immigration arenas.”

TIBCO is the CEP vendor most focused on the market for business rules, as reflected in Paul Vincent’s post here. Although I agree with Paul that rule vendors are not currently offering enough in terms of support for long-running processes, the conclusions that he draws in favor of considering a CEP alternative to a BRMS are not compelling yet.

Paul said that rules don’t address the following that are addressed by CEP:

Complex event processing (CEP) software handles many low-level events to recognize a high-level event that triggers a business process. Since many business processes do not consider low-level data events, BPM may not seem to need event processing. On the other hand, event processing would not be relevant at all if it did not occasionally trigger a business process or decision. In other words, it appears that:

CEP requires BPM but

BPM does not require CEP

The first point is market limiting for CEP vendors. Fortunately for CEP vendors, however, most BPM does require event-processing, however complex. In fact, event processing is perhaps the greatest weakness of current BPM systems (BPMS) and business rules management systems (BRMS), as discussed further below.

JBoss Rules (formerly Drools) just described its imminent support for rules expressed in the CLIPS syntax here.

NASA derived CLIPS from the syntax of Inference Corporation’s Automated Reasoning Tool (ART) in the mid-80s. I designed and implemented the ART syntax with Chuck Williams on a team with Brad Allen and Mark Wright.

We have been teaching a computer to answer questions like, “How much did IBM’s earnings change last quarter?” It takes a fair bit of knowledge, including how to understand English, to answer this question. But teaching it what a “quarter” is brought back memories of debates with some former CMU colleagues about what units are and how to model time. Since quite a few people ask me for help with knowledge engineering and ontological matters, I thought some might be interested in parts of those debates. As you will see, a strong upper ontology of common knowledge is required to understand common business knowledge. Leveraging such an ontology is the only way to deliver business rules for under $50.

Sentences like “do something if more than a number of possibly related things have happened within a timeframe of something else happening” or “do something if nothing happens within a timeframe following something happening” are extremely common in business process management (BPM), complex event processing (CEP), and workflow. With a sense of time, a business rules management system (BRMS) can support BPM, CEP, and workflow applications almost trivially. Without a sense of time, most BRMS force users to perform computations.

For example, without a sense of time and an infrastructure that supports it, the sentence “call a customer if no response is received within 30 days of notifying the customer of a delinquency” has to be transformed into something like “if a notice is mailed on a date and the notice is a delinquency and the date of notification has a day number then compute the date for checking by adding 30 to the day number and check for a response to the delinquency notice on the date for checking”. The checking on a date for a response to a notice must also be implemented as a database (or persistent queue) of events to be polled or triggered by application code. Then a second rule is required to implement the check, as in “if checking whether a response has been received to a notice and the notice was given on a date of notice and the notice was given to a customer and there exists no record of communication with the customer since the date of notice then call the customer”. (Note that this is actually how most BRMS products would implement this. The natural language approach I prefer handles the original sentence.)
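To make the contrast concrete, here is roughly what the original sentence amounts to when dates are first-class values. This is a sketch only: the record layouts and field names are assumptions, and it shows the logic of the rule rather than how any BRMS product represents it.

```python
from datetime import date, timedelta


def customers_to_call(notices, responses, today, window_days=30):
    """Call a customer if no response is received within `window_days`
    of notifying the customer of a delinquency."""
    to_call = []
    for notice in notices:
        if notice["kind"] != "delinquency":
            continue
        deadline = notice["sent_on"] + timedelta(days=window_days)
        if today <= deadline:
            continue  # the response window has not fully elapsed yet
        responded = any(
            r["customer"] == notice["customer"]
            and notice["sent_on"] <= r["received_on"] <= deadline
            for r in responses
        )
        if not responded:
            to_call.append(notice["customer"])
    return to_call


# Hypothetical data: three notices, one response.
notices = [
    {"customer": "acme", "kind": "delinquency", "sent_on": date(2024, 1, 1)},
    {"customer": "bolt", "kind": "delinquency", "sent_on": date(2024, 1, 1)},
    {"customer": "cork", "kind": "delinquency", "sent_on": date(2024, 2, 20)},
]
responses = [{"customer": "bolt", "received_on": date(2024, 1, 15)}]

calls = customers_to_call(notices, responses, today=date(2024, 3, 1))
```

Even in this form something still has to run the check on or after the deadline, which is the polling or triggering infrastructure described above.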

The discussion here reflects the general structure and content that a usable ontology for business process management requires. Most users of business rules management tools will find the need to understand and engineer this discussion in their tool of choice. As my Haley Systems customers know, much of this is reflected in Authority’s built-in ontology and English vocabulary, but quite a few of the points discussed here reflect improvements, especially concerning the confusion between units and amounts.

As you will see the discussion takes careful thinking. Some readers may find it onerous. If at any time you have had enough (or if you simply cannot take anymore!), please skip to the end and decide whether to fill in the conclusions by revisiting the body.

Work on acquiring knowledge about science has estimated the cost of encoding knowledge in question answering or problem solving systems at $10,000 per page of relevant textbooks. Regrettably, such estimates are also consistent with the commercial experience of many business rules adopters. The cost of capturing and automating hundreds or thousands of business rules is typically several hundred dollars per rule. The labor costs alone for implementing several hundred rules too often exceed $100,000.

The fact that most rule adopters face costs exceeding $200 per rule is even more discouraging when this cost does not include the cost of eliciting or harvesting functional requirements or policies but is just the cost of translating such content into the more technical expressions understood by business rules management systems (BRMS) or business rule engines (BRE).
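The arithmetic behind these figures is simple enough to sketch. The rule count of 500 is an assumed example of "several hundred rules"; the per-rule costs are the $200 mentioned above and a $100 figure for comparison.

```python
def translation_cost(rule_count: int, cost_per_rule: float) -> float:
    """Cost of translating already-elicited content into rules."""
    return rule_count * cost_per_rule


# Assumed project size of 500 rules.
at_200_per_rule = translation_cost(500, 200)  # 100,000 dollars
at_100_per_rule = translation_cost(500, 100)  # 50,000 dollars
savings = at_200_per_rule - at_100_per_rule
```

At that size, halving the per-rule cost is worth $50,000 before any elicitation or harvesting effort is counted.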

I recommend against adopting any business rule approach that cannot limit the cost of automating elicited or harvested content to less than $100 per rule given a few hundred rules. In fact, Automata provides fixed price services consistent with the following graph using an approach similar to the one I developed at Haley Systems.

A manager of an enterprise architecture group recently asked me how to train business analysts to elicit or harvest rules effectively. We talked for a bit about the similarities in skills between rules and requirements and agreed that analysts will fail to understand rules as they fail to understand requirements.

For example, just substitute rules in the historical distribution of requirements failures:[1]

I am working on some tutorial material for business analysts tasked with eliciting and harvesting rules using some commercial business rules management systems (BRMS). The knowledgeable consumers of this material intuitively agree that capturing business rules should be performed by business analysts who also capture requirements. They understand that the clarity of rules is just as critical to successful application of BRMS as the clarity of requirements is to “whirlpool” development.[1] But they are frustrated by the distinct training for requirements versus rules. They believe, and I agree, that unification of requirements and rules management is needed.

Consider these words from Forrester:

One might argue that Word documents, email, phone calls, and stakeholder meetings alone are adequate for managing rules. In fact, that is the methodology currently used for most projects in a large number of IT shops. However, this informal, ad hoc approach doesn’t ensure rigorous rules definition that is communicated and understood by all parties. More importantly, it doesn’t lend itself to managing the inevitable rules changes that will occur throughout the life of the project. The goal must be to embrace and manage change, not to prevent it. [2]

But note that Forrester used the word “requirements” everywhere I used “rules” above!

Both of the following statements are true, but the first is more informative:

Business Rules Management Systems (BRMS) typically produce forward chaining production rules that are interpreted by[1] a business rules engine (BRE) based on the Rete Algorithm.

BRMS typically generate rules that are interpreted by a BRE.

First, dropping the word “production” before “rules” loses information. BRMS do not typically generate rules that are not production rules. Consider, for example, that the BRMS vendors involved in the OMG effort produced the Production Rule Representation (PRR) standard. The obvious question is:

What is different about production rules?
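As a rough picture of the terms involved, a production rule pairs conditions with a conclusion, and forward chaining keeps applying rules to a working set of facts until nothing new can be added. The sketch below is deliberately naive: it re-checks every rule on every pass, which is the repeated matching work the Rete Algorithm is designed to avoid. The facts and rules are invented for illustration.

```python
def forward_chain(facts, rules):
    """Naive forward chaining over (conditions, conclusion) rules.

    Each rule fires when all of its conditions are in the working set of
    facts; its conclusion is then added and may enable further rules.
    """
    working = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= working and conclusion not in working:
                working.add(conclusion)
                changed = True
    return working


# Hypothetical rules, listed out of order to show that chaining links them.
rules = [
    (frozenset({"delinquent"}), "send_notice"),
    (frozenset({"late_payment"}), "delinquent"),
]
derived = forward_chain({"late_payment"}, rules)
```

Here the second rule's conclusion satisfies the first rule's condition, so two rule applications form one chain.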

Second, dropping the words “based on the Rete Algorithm” loses information. The dominant rules vendors and open-source engines are all based on the Rete Algorithm.

Why does the Rete Algorithm matter?

Third, dropping the word “chaining” before “rules” loses information. Chaining refers to the sequential application of rules, as in a chain where each link is the application of one rule and links are tied together by their interaction. But:

Some strategy folks in an enterprise architecture group recently asked for help making rules more relevant to their organization. Their concerns ranged from when to embed rules in their middle tier versus encapsulate them within services to identifying ideal use cases and reference implementations. They were specifically interested in coupling rules with BPM and BI.

Such questions occur every time a group or enterprise considers adopting rules technology for more than a specific application. They are looking for guidelines, blueprints, or patterns that will help them disseminate understanding about when and how to use rules. They have adopted a BPM vendor which will be integrated with their selected rule vendor, each as enterprise standards, so they are particularly interested in the integration requirements between the two.

Two high level understandings are critical for success in furthering adoption of rules technology.

Some recent correspondence with clients and prospective adopters of business rules technology indicates that the interested mainstream has become increasingly concerned and confused by consolidation in the business rules market.

On the analyst front, they read advice such as the following from Gartner:[1]

As Gartner has stated, the BRE market is a volatile technology sector, and market trends point to increased consolidation. In recent research, we stated that some consolidation will come from rules-to-rules acquisitions. Recent examples of this include Trilogy/Versata buying Gensym and now, RuleBurst purchasing Haley Systems.

Another form this consolidation will take is application vendors or business process management suite vendors buying much-needed rule technology, as seen in SAP’s recently announced intention to purchase Yasu Technologies. In either case, rule technology will persist, but the vendors selling the technology will often be different.

I agree with Gartner: enterprise app and BPM vendors desperately need rules technology. I also agree with the following analysis from Forrester:[2]

SAP’s decision to purchase Hyderabad, India-based Yasu Technologies greatly improves its business rules management capabilities. Other large vendors would be wise to follow SAP’s lead in the business rules market. If you look at the big vendors, they’re all going to need this technology. SAP’s competitors are going to have to step up to these requirements also.

It’s encouraging that SAP bought Business Objects and is now buying Yasu. We’re seeing requirements to link business rules and business intelligence or analytics. SAP has told us they have seen these requests, and we’re encouraged that SAP is now acting.

Unfortunately, Gartner’s concluding advice could have been more constructive:

Prospective BRE customers: Buyer beware – the rule engine market is a volatile sector. Choose your vendors carefully and be prepared to see more BRE acquisitions.

Establish centralized or transferable infrastructure, including architectural aspects, tools and repositories that reflect and support established methodologies, reusable content, and reference implementations.

Establish criteria, best practices and rationale for various administrative matters, especially change management concerning the life cycles of content (e.g., regulations or policies) and applications (e.g., releases and patches).

I was quickly surprised to find myself struggling to write down recommendations for the skill set required to seed the core staff. My recommendations were less technical than the client may have expected. After further consideration, it became clear that any discrepancy in expectations arose from differences in our unvoiced strategic assumptions. Objectives, such as those listed above, are no substitute for a clearly articulated mission and strategy.