Monday, March 14, 2011

All business is an exercise in risk management. All organizations would
benefit from measuring, tracking and computing risk as a core process, much
like insurance companies do.

Predictive analytics does the trick, one customer at a time. This technology is a data-driven means to compute the risk that each customer will defect, fail to respond to an expensive mailer, consume a retention discount even if she were not going to leave in the first place, go untargeted by a telephone solicitation that would have landed a sale, commit fraud, or become a "loss customer" such as a bad debtor or an insurance policyholder with high claims.

Room: Golden Gate B
Diamond Sponsor Presentation
The Hefty Toll of Fraud: How to Leverage Predictive Analytics and Social Network Analysis

As stock and housing prices rise and fall and consumer confidence remains volatile amid today's economic turmoil, one area of continuous growth has, sadly, been crime and fraud. Banks, insurance companies, and government entities are all seeing an increase in both the number and sophistication of fraudulent activities.

To fight fraud effectively, organizations must continually improve the monitoring of customer behavior across multiple accounts, systems, and agencies. They must develop a framework of components that support fraud detection, alert generation and management, and case management. Using a hybrid approach for fraud detection, the SAS Fraud Framework can include industry-specific business rules, anomaly detection, predictive models, and social network analysis. It can offer both top-down and bottom-up functionality for making hidden and risky networks visible to investigators.

This approach provides more actionable fraud detection, greater insight into suspicious activity report (SAR) management responsibilities, and improved operational efficiency, all while decreasing an organization's overall spending on fraud. Examples from both banking and government childcare support programs will be presented.

This case study describes the deployment of a predictive analytics solution to non-statistical field operations users in order to improve their assessment of business risk when making lending decisions to second-hand car dealerships. The solution integrates various ERP and ODS systems to gather data and score inventory financing applications in real time. It helped the company decrease write-offs by 10%.

In a product/services company, analytics generates its greatest value when a certain lineup of best practices is followed, ranging from high-level intelligence to a more detailed understanding. This is achieved with a "three pillar" analytical approach: [Measurement Framework, Portfolio Analysis, and Customer Analysis]. Within each of these components, we move from a simpler "20,000-foot" view to deeper, more comprehensive analytics.

In this talk, Piyanka Jain will cover these components in detail, along with the tools and techniques required and gotchas to look out for. Auxiliary intelligence such as VOC (Voice of the Customer) and Competitor/Industry/Economic landscape analysis, which delivers an [outside-in] view of the business, will also be covered.

What you will walk away with is:
1. An understanding of the [analytics value chain], which sets predictive analytics into an impactful context
2. Analytics your organization needs, to better understand your business
3. Tools and methodologies best suited for the [three pillars] of analysis
4. Challenges to prepare for, as you embark on these analyses
5. Organizational support needed for analytics execution.

Survey analysis often involves hand-tuned analysis requiring weeks of labor to decipher the key relationships in survey responses. Proper coding of responses, collinearity, and missing data plague analysts in their pursuit of clear explanations of responder intent in the surveys. Additionally, while traditional statistical analyses, such as linear and logistic regression, can be used effectively in modeling survey responses, these models do not resonate with the business community in the same way they do with statisticians.

Employees are a key constituency at the Y and previous analysis has shown that their attitudes have a direct bearing on Member Satisfaction. This session will describe a successful approach for the analysis of YMCA employee surveys. Decision trees are built and examined in depth to identify key questions in describing key employee satisfaction metrics, including several interesting groupings of employee attitudes. Our approach will be contrasted with other factor analysis and regression-based approaches to survey analysis that we used initially. The predictive models described are currently in use and resulted in both greater understanding of employee attitudes, and a revised "short-form" survey with fewer key questions identified by the decision trees as the most important predictors.
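The decision-tree approach above can be sketched in a few lines. This is a hedged illustration on synthetic Likert-scale data, not the YMCA's actual survey: the question names, scales, satisfaction target, and the use of scikit-learn are all assumptions for demonstration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# Five hypothetical Likert-scale questions (1-5); here q1 and q3 drive
# the (synthetic) satisfaction outcome.
X = rng.integers(1, 6, size=(n, 5))
y = ((X[:, 0] + X[:, 2]) >= 7).astype(int)  # "satisfied" flag

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Rank questions by importance: candidates for a "short-form" survey.
questions = [f"q{i+1}" for i in range(5)]
ranked = sorted(zip(questions, tree.feature_importances_),
                key=lambda t: -t[1])
print(ranked[:2])  # the two most informative questions
```

Unlike regression coefficients, the fitted tree can be read as a flowchart of if/then splits, which is one reason such models tend to resonate with business audiences.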

In the recounting of analytics projects, my favorite part is "the reveal": where the idea that turned things around is disclosed. Often disarmingly simple (in retrospect), it is virtually always preceded by waves of failure. Yet failure, or at least an environment shockingly tolerant of it, may be essential to the emergence of such breakthroughs.

I will tell tales of some favorite "reveals" that led to technical successes. But, a true win must also be a business success. This requires dealing well with idiosyncratic carbon-based life forms. So we'll also discuss the (painfully acquired) lessons in the parallel universe of business.

As companies and investigators wrestle with the implementation and usage of comprehensive fraud, corruption, or bribery detection platforms and approaches, at least one clear trend is emerging: best practices for comprehensive detection of this kind will focus neither on purely rules-based approaches nor on purely predictive methodologies, but will instead incorporate a hybrid of the two.

At this session, we'll do a quick overview of the evolution of the database and analytics marketplace from the 1980s to the present. In addition, we'll quickly highlight how today's challenges are forcing a fundamental rethink that undermines the status quo: many customers are discovering that ever more powerful hardware can't ultimately solve their most demanding analytic problems. Finally, we'll cover several new technologies and approaches that are emerging in the face of the Big Analytics challenge.

AdWords Quality at Google executes across a rich set of prediction applications in order to meet its business objectives, which include providing a top user experience, achieving a strong ROI for advertisers, and securing revenue for Google. This talk will cover several of these prediction problems, including:

Predicting creative quality and landing page quality

Estimating ad bounce rate

Estimating ad relevance and examination probability

Some of these must apply predictive modeling at a very large scale, involving billions of features and millions of users. This talk will discuss some of the practical lessons learned by the speaker while working on these problems at Google.

The algorithms at the heart of predictive analytics have been around for years - in some cases for decades. But now, as we see predictive analytics move to the mainstream and become a competitive necessity for organizations in all industries, the most crucial challenges are to ensure that results can be delivered to where they can make a direct impact on outcomes and business performance, and that the application of analytics can be scaled to the most demanding enterprise requirements.

This session will look at the obstacles to successfully applying analysis at the enterprise level, and how today's approaches and technologies can enable the true "industrialization" of predictive analytics.

Data prediction competitions facilitate a step change in the evolution of analytics outsourcing. They offer companies a cost-effective way to harness the "cognitive surplus" of researchers and analysts who are hungry for real-world data and motivated to excel whatever the prize. Competitions are particularly effective because any number of techniques can be applied to a modeling problem, and we can't know in advance which will be most effective. By exposing the problem to a wide audience, competitions are a cost-effective way to reach the frontier of what is possible from a given dataset. In just a few months, competitions hosted by Kaggle have helped further the state of the art in HIV research and chess ratings, and have outperformed sports betting markets.

The field of quantitative finance, while it has attracted criticism over the past few years, is an enormously rich yet mine-laden domain for machine learning and statistics - full of unsolved problems that require both new science and innovative engineering. Cerebellum Capital was founded to create a system capable of autonomously finding, testing, refining, launching, improving, and when necessary decommissioning novel trading strategies. This talk will give a tour of the challenges and opportunities we have found as a group of outsiders approaching this domain as a computer science problem.

Understanding online behavior is key to driving customers to your online and offline store. By capturing "surfing" data in log files, and tracing customers' psychographics via surveys, we are able to predict the most profitable customers for acquisition and retention. This session will also discuss targeting to present "the right offer to the right audience at the right moment."

Fraud is a costly problem for many businesses, and the efforts required to protect against it further compound the price. We will discuss the cultural and business hazards of addressing versus ignoring fraud, as well as the enormous ROI possible when adaptable quantitative tools are used to detect ever-changing anomalous behavior. Case studies from some of our consulting engagements will highlight lessons learned about what makes a potential fraud detection project ripe for success.

Why are certain industries and roles being so heavily impacted by the use of advanced analytics and others not? IIA's CEO will first share with attendees the list of industries and functional roles that his firm sees being most impacted this year and why, and then present a specific case study from a leading health insurer.

Industry leaders are swiftly coming to agreement that analytics will be a critical component of competitive strategy in the 21st century. To that end, early adopters and followers alike have started leveraging analytics in different ways. However, to generate true benefit, companies need to institutionalize analytics, while a whole host of companies stop at merely leveraging analytics to generate insights or to solve a given problem. In this presentation we will talk about what it means and what it takes to institutionalize analytics, and what habits organizations need to change to do so.

Attendees Will Leave With:

How to convert an analytics solution into an integral part of business processes

How to drive decision making using analytics

How to generate organizational buy-in to make analytics front and center in business strategy

Using examples from his past companies and projects, Astro will give a tour through four key elements required to make products and services more intelligently automated. The talk will cover each of these four elements (data, patterns, access, and architecture), how each functions separately and how they function together to allow for automated products and services to be more intelligent and for them to be constantly improving in the automated intelligence they offer their users. The talk will also touch on feedback from users and user-interfaces for that feedback and how this plays into the intelligent technology ecosystem and lifecycle.

There is a heavy reliance on logistic regression in many risk applications across industry. Such ubiquity results in part from its ease of interpretability and its implementation parsimony, including fast execution times. However, many other algorithmic options exist, such as ensembles, which produce suites of models that are aggregated into a common decision. We conduct applied research to explore the efficacy of decision tree ensembles for PayPal fraud prediction, taking both computational cost and predictive effectiveness into account. We find that ensembles are a strong contender against other methods, with some advantages.
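As a rough illustration of the kind of comparison the session describes, here is a sketch that pits logistic regression against a random-forest ensemble on synthetic, imbalanced data. The data, features, and model settings are assumptions for demonstration; this is not PayPal's methodology or data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic "fraud" data: ~3% positive class, mimicking the rarity of fraud.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97],
                           flip_y=0.01, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

results = {}
for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("ensemble", RandomForestClassifier(n_estimators=200,
                                                        random_state=0))]:
    model.fit(X_tr, y_tr)
    # AUC handles class imbalance better than raw accuracy here.
    results[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {results[name]:.3f}")
```

In practice the comparison would also weigh scoring latency and model-refresh cost, the "computational cost" dimension the abstract mentions.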

We present an approach that scores a prospect database with derived preference estimates for individual electricity pricing plan features. Customers make purchase decisions within hypothetical purchasing scenarios in a survey. Bayesian choice models predict the customer's purchase probability and estimate the lift in demand (utilities) for the different levels of features tested. Predictive analytics is then used with database variables to score every record in a prospect database with its likely preference for individual features. Marketing decisions are greatly enhanced by knowing which product features are most appealing to individual prospects, and messaging is optimized accordingly. Lift in customer response will be presented.

One of the primary objectives in database marketing is computing the likelihood that a customer will buy or transact within a given time span (e.g., the next six months or year). Various techniques, such as logistic regression, that have been successfully employed in a B2C context have been found less efficient in a B2B context. In this session we describe a methodology that uses a Bayesian analysis framework to assign each customer a probability score that they will purchase within a time span. This provides the basis by which customers may be ranked and targeted more efficiently for marketing purposes.
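The session does not specify its Bayesian framework, but one minimal scheme in the same spirit is a Beta-Binomial score over past purchase windows: the prior encodes the base rate and each customer's history updates it. The prior and the counts below are illustrative assumptions only.

```python
# Prior from a hypothetical overall B2B base rate: ~20% of six-month
# windows contain a purchase. Beta(2, 8) has mean 2/(2+8) = 0.2.
alpha0, beta0 = 2.0, 8.0

def purchase_score(purchase_windows: int, total_windows: int) -> float:
    """Posterior mean probability that the next window contains a purchase."""
    a = alpha0 + purchase_windows
    b = beta0 + (total_windows - purchase_windows)
    return a / (a + b)

# A customer observed over 10 half-year windows, purchasing in 6 of them.
score = purchase_score(6, 10)
print(round(score, 3))  # (2+6)/(2+8+10) = 8/20 = 0.4
```

Because every customer gets a posterior probability on the same scale, the scores can be sorted directly to produce the ranked target list the abstract describes; customers with no history simply fall back to the prior mean.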

David Smith, Vice President of Marketing for Revolution Analytics, relentless R blogger, co-author (with Bill Venables) of the tutorial manual An Introduction to R, and long-time R coder, will provide a brief introduction to the R language.
The featured speaker will be Byron Ellis, Director of Analytics at adBrite, Inc. in San Francisco, CA, presenting "A Few of My Favorite Things: A Brief Tour of an Information Environment". Byron will show how he uses R with some of his favorite tools, node.js and MongoDB, to quickly build analytics systems that range from large-scale interactive dashboards and ambient data displays to predictive models. Within the context of a hypothetical modeling system that has R at its core and integrates into a larger data environment, he will demonstrate how to do exploratory prototyping and share exploratory data analyses.

Speaker: David Smith, Vice President of Marketing for Revolution Analytics

Speaker: Byron Ellis, Director of Analytics, adBrite Inc.

Room: Salon 5 & 6
Bay Area SAS Users Group Meeting

David Bell, State of California, Dept. of Industrial Relations
Real Data, Real Headache? Using Proc Mixed and Maximum Entropy Correlated Equilibria to Longitudinally Analyze Small-Sample Data
This presentation will demonstrate the power of mixed longitudinal hierarchical linear models (i.e., Proc Mixed) to measure the progress of a treatment program of 10 participants over a six-month period. Standardized psychological tests, which leverage established reliability, were used to measure progress; meaningful analysis of individual and group progress was completed despite the limitations of time and the small number of treatment participants.

William Jackman: Proc Glimmix – An Overview
The GLIMMIX procedure is a new procedure in SAS/STAT software. It was an add-on product in SAS 9.1 on the Windows platform, but in SAS 9.2 it is a production procedure.
This presentation is intended for those with a general statistical background who want to learn what PROC GLIMMIX is and what it does. It is not intended for those already using PROC GLIMMIX who want to learn more details about how to use it.

Tuesday, March 15, 2011

The progenitor and connoisseur of "competing on analytics" illustrates
what it takes to create an analytics-driven business. Hint: it's not about the data and it's not about the math. Tom is the authority on building broad capabilities for enterprise-level business intelligence
and he's back by popular demand. In this talk Tom addresses the organizational culture and business leadership required to make the most of the science of analysis, and shares stories of people who have
made this transition and the resulting competitive edge their organizations exploit. Learn how to reap the rewards of business analytics from the man who laid the groundwork and wrote the book.

Predictive analytics has taken off, across industry sectors and across
applications in marketing, fraud detection, credit scoring and beyond. Where
exactly are we in the process of crossing the chasm toward pervasive
deployment, and how can we ensure progress keeps up the pace and stays on
target?

This expert panel will address:

How much of predictive analytics' potential has been fully realized?

Where are the outstanding opportunities with greatest potential?

What are the greatest challenges faced by the industry in achieving wide
scale adoption?

Data competitions come of age: from movie recommendations to life and death. Possibly the biggest news in predictive modeling in 2011 is Heritage Provider Network's $3 million predictive modeling prize, the biggest data mining competition ever. It requires data scientists to build algorithms that predict who will go to the hospital in the next year, so that preventive action can be taken. We will take this opportunity to release new information on the contest's timeline and intermediate progress prizes.

That's right, fleas! Fleas learn habits that place artificial limits on themselves. Unfortunately, many analytic professionals fall into this same trap. While there are options available today that can tremendously improve the efficiency and scalability of an analytic environment, many companies are stuck in the old way of doing things and are failing to cash in on the benefits. Would you like to spend more time on analysis and less time fighting to get your data together? If so, then listen to this discussion on how to make sure your organization has not fallen into a flea-like trap.

By now, the value of marketing analytics is widely recognized among the business community. Widespread success has been reported using predictive models to complete tasks ranging from optimizing placement of online advertisements to imputing people's movie preferences. What hasn't been discussed much is that building predictive models takes valuable resources, that predictive models have only modest accuracy, and that the business value of models is often not fully considered. Without proper attention to these issues, analytics teams run the risk of over-promising and under-delivering.

In this case study, we will outline the key steps of developing predictive models that deliver true business value. This requires understanding not just the predictive performance but also error rates, cost of errors, cost of investment, and return on investment. The context is a real-world example of a predictive model used to target selected customers for more expensive marketing communication.

Attend this session and learn how management consultants at Beyond the Arc helped one of the world's largest banks build an effective Voice of the Customer program, leading to increased customer satisfaction, loyalty and retention.

Their secret? Integrating and then analyzing data from across the enterprise - including unstructured text data. Using IBM SPSS predictive analytic solutions, Beyond the Arc unlocked the value of data sources underutilized by the bank - such as survey comments, Twitter messages and call center notes. Discover how predictive analytics enables you to use diverse feedback channels to attract and retain your most profitable customers, identify fast-moving emerging issues, and improve the customer experience.

Attend this session and learn:

How to integrate new data sources and customer touch points

How to prepare your data to maximize analytical effectiveness

How to collect, analyze and act upon diverse feedback channels in a holistic way

How IBM SPSS modeling tools can help you easily analyze unstructured data

1:30pm-2:15pm
Room: Golden Gate B
Special Plenary Session
The State of the Social Data Revolution

With enterprises acting upon predictive models on a second-by-second basis, the thirst for more powerful and relevant data is only growing. Quenching that thirst, there's no hotter emerging wealth of data than social data. In this session, Dr. Weigend will:

Cover examples from industry verticals where he projects social data will deliver the greatest impact, including insurance (should consumers be priced by their friends' risky behavior?), retail (improving product recommendations) and telecommunications.

Uncover the influence of companies in the very business of collecting social data, including Google and Facebook, as well as other companies that make social data accessible to others.

Deliver the state of the social data revolution, framing the discussion for five additional conference sessions that address this topic, later the same day at Predictive Analytics World.

Now that investments have been made in web analytics infrastructure, how can one extract value from the vast amounts of data being collected and stored? A discussion of successful methods used for converting internet data to value:

Developing an appropriate context for defining value for your business

Ensuring the right data is being collected and connected to other sources

Increasing competition and more ads per keyword result in advertisers facing higher bid prices and an uphill task of maximizing ROI. This drives a need for marketers to look for smarter ways to leverage their data and run their SEM campaigns. Getting smarter with the long tail of keywords is one key lever.

In this presentation, we will see examples of how data mining can be used in SEM campaigns to improve efficiency:

Using Predictive Analytics to derive a value (revenue) per click for each keyword -- leading to informed CPC bidding

Clustering the long tail to segment keywords and inform keyword bidding.
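The clustering idea above can be sketched as follows: group long-tail keywords by performance features so bids are set per segment rather than per individually noisy keyword. The feature names, values, and the choice of k-means are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Per-keyword features: click-through rate, conversion rate, revenue/click.
# Synthetic data: a small "head" of strong keywords and a sparse long tail.
rng = np.random.default_rng(1)
head = rng.normal([0.05, 0.04, 1.50], 0.01, size=(50, 3))
tail = rng.normal([0.01, 0.005, 0.20], 0.005, size=(450, 3))
X = np.vstack([head, tail])

scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Bid toward each cluster's mean revenue per click rather than relying on
# noisy per-keyword estimates.
for k in range(2):
    print(f"cluster {k}: mean $/click = {X[labels == k, 2].mean():.2f}")
```

Pooling tail keywords into segments trades a little per-keyword precision for far more stable CPC bids, which is the "getting smarter with the long tail" lever the abstract names.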

As our online ticketing business grows exponentially, we model and track penetration in key markets at Eventbrite. This session shows how you can accomplish four key goals in understanding your growth: trace historical penetration of the market, forecast future growth, predict market saturation, and determine which campaigns accelerate market penetration.
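One simple way to frame those four goals, offered here as a hedged sketch rather than Eventbrite's actual method, is a logistic (S-curve) growth model: fit it to historical penetration, then the fitted ceiling gives predicted saturation and extrapolation gives the forecast. All numbers below are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """K = saturation level, r = growth rate, t0 = inflection point."""
    return K / (1 + np.exp(-r * (t - t0)))

# Hypothetical 24 months of observed market penetration with a little noise.
months = np.arange(24)
true_curve = logistic(months, K=0.6, r=0.4, t0=12)
observed = true_curve + np.random.default_rng(0).normal(0, 0.01, 24)

(K, r, t0), _ = curve_fit(logistic, months, observed, p0=[0.5, 0.3, 10])
print(f"predicted saturation: {K:.0%} of the market")
print(f"forecast penetration at month 36: {logistic(36, K, r, t0):.0%}")
```

The campaign question can then be posed as whether r (or t0) shifts measurably when the model is refit on post-campaign data.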

Is peer-to-peer lending different from traditional bank-initiated lending? How do the customers/lenders think and act? We used a comprehensive p2p dataset and found some interesting behavioral and analytical results on social interactions.

Speaker: Aaron Lai, Former VP and Senior Quantitative Research Associate, Bank of America

Ingenix, an industry leader in healthcare information technology, has a data repository consisting of diagnosis, procedure, pharmacy, and lab claims covering 80 million lives over the last 17 years. In this presentation we show how this data is used to build statistical models that help identify individuals at risk of a certain disease, based on the similarity of their historical claims to others whose history and outcome are known. The objective is early intervention for people at risk, so as to delay or even prevent the onset of the disease. We present a case study of developing predictive models for type 2 diabetes mellitus. The entire process, from filtering raw claims data through building regression models with diagnostics, solving for coefficients, and computing accuracy measures, is completed in 30 minutes for population sizes of over 2 million people with 2,000 predictors on a Netezza TwinFin 12 platform running the In-Database Analytics library DB Lytix from Fuzzy Logix.

This live demo will give you insight as to how Analytics Software is used to develop predictive models in the insurance industry. Noe Tuason, Customer Research Manager for Insurance at the California State Automobile Association (CSAA) will speak to you about business challenges involving:

Networks are the common data structure that unifies the otherwise diverse range of social media services. In this session, learn how to extract social networks from various social media systems and how to analyze and visualize the structures found in collections of connections. Learn to use the free and open NodeXL add-in for Excel 2007/2010 to analyze email, Twitter, Facebook, YouTube, web, Flickr, and wiki networks.

An analytic model that is not in use has no value to an organization so analytic deployment is critical. And when analytic models must be applied to operational decisions - micro decisions about a single customer, a single claim or a single transaction - deploying analytics becomes more complex. This session will show how a business rules-based infrastructure is ideal for deploying analytics into operational systems. The power of rules to rapidly implement analytic models and to turn those analytic models into decision-making software components will be illustrated with real cases.
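A toy sketch of the rules-plus-model pattern the session describes: a predictive model emits a score, and business rules turn that score into an operational micro-decision. The thresholds, fields, and actions below are hypothetical, not taken from any vendor's rules engine.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    amount: float
    fraud_score: float  # output of a deployed predictive model, 0..1

def decide(claim: Claim) -> str:
    """Business rules layered over the model score for a micro-decision."""
    if claim.fraud_score > 0.9:
        return "deny and refer to investigators"
    if claim.fraud_score > 0.6 and claim.amount > 10_000:
        return "route to manual review"
    return "auto-approve"

print(decide(Claim(amount=15_000, fraud_score=0.7)))  # route to manual review
```

Keeping the thresholds and actions in a rules layer, separate from the model itself, is what lets business owners retune operational policy without retraining or redeploying the model.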

Track 2: Social Data and Telecom
Case Study: Major North American Telecom
Social Networking Data for Churn Analysis

A North American telecom found that it had a window into social contacts: who has been calling whom on its network. This data proved to be predictive of churn. Using SQL and GAMs in R, we explored how to use this data to improve the identification of likely churners. We will present many dimensions of the lessons learned on this engagement.