Predictive Analytics 2.0 has so many new things in it that my original article (Introducing SAP Predictive Analytics 2.0!) was not able to go through details for the SAP Automated Predictive Library (APL) – a major milestone in our efforts to integrate and embed our advanced analytics services everywhere and into everything.

The SAP APL is a native C++ implementation of the automated predictive capabilities of SAP InfiniteInsight running directly in SAP HANA. Now, for the first time, you can run our patented automated predictive algorithms on your data stored in SAP HANA without first requiring an expensive and time consuming data extraction process. This also opens up an entirely new area of use cases – such as on-the-fly, in-database scoring for predictions, classifications, and clustering scenarios.

Augmenting SAP HANA's Predictive Capabilities

When I talk to people about the APL, the first question I usually get is, “I thought HANA already had native predictive capabilities?”. The answer of course is yes, in the form of the SAP Predictive Analytics Library (SAP PAL). The SAP PAL is also a HANA-native implementation of algorithms, but these are more suited to data scientists who have a data mining background and need more explicit modelling of the analytical workflow.

The key differentiator for the SAP APL is the “A” for “automated”. The APL does not take in a complex predictive model as an input – it simply needs to be set up and be told what type of data mining function needs to be applied to the data. From there the APL takes over by composing its own models, creating and selectively eliminating metadata as required, and ultimately coming up with the most optimal model given the data you provided – in a mostly automated way. This means customers, developers, and partners do not need to be data scientists to use the SAP APL – they simply need to feed the APL what they have and tell it what they need. Even better, SAP Predictive Analytics 2.0 in both “Automated” and “Expert” modes can also natively take advantage of the APL on SAP HANA right from the desktop client.

When you combine "automated" with "all calculations done in HANA without requiring data extraction", you end up with a pretty incredible solution that can enable you to do things that simply were not possible before. All other solutions either require data extraction or do "in-database scoring" by using a fixed model. The SAP APL is unique in that it can be self-tuning while still providing "in-database" scoring on the fly.

(By the way, I should also mention that SAP HANA has even more predictive capabilities than I am discussing here, most notably its support for the open-source “R” library. While this does require an off-board “R” Server and involves data transfer outside of HANA, it opens HANA up to the over 5,000 open source algorithms that are out there. Even better, SAP Predictive Analytics 2.0’s “Expert Analytics” mode also uses “R” and seamlessly works with HANA’s use of the “R” server to provide a complete end-to-end advanced analytics solution for the data scientist.)

Overview of SAP Automated Predictive Library (APL)

The SAP APL is installed as a library inside the HANA AFL (Application Function Library). The following diagram gives an overview of where it fits in:

The APL is accessible in several ways: you can call its functions from SQL scripts or from the Application Function Modeler (AFM) within HANA Studio. Desktop users can also access the APL by using SAP Predictive Analytics 2.0 in both the “Automated” and “Expert” modes. Finally, applications built directly on SAP HANA can embed APL functionality without exposing any complexity to the user. Since the APL is new as of February 11th, currently only SAP applications use it, but the APL is open for use by any application sitting on SAP HANA.

In this first release, the APL has five classes of capabilities:

Classification: To predict a binary answer, e.g. is this transaction fraudulent or not?

Regression: To predict or score a non-binary amount, e.g. determining the insurance risk factor for this driver.

Clustering: To find groups in your dataset, e.g. who are all the people likely to buy my product today?

Time Series: To predict future values based on previously observed values, e.g. how likely are flight cancellations in winter vs. summer months?

Key Influencers: To find other attributes that are impacting a particular dimension, e.g. what are the indicators in my data of future equipment failure?

Here is an example of using the APL within SAP Predictive Analytics 2.0’s “Expert Analytics” mode:

As you can see, the APL algorithms are easily recognizable by their “HANA Auto” prefix. It is also interesting to note that within Expert Analytics, you can chain together algorithms of different types. For example, you could start with Auto Clustering and then run separate “R” or PAL algorithms on each of the individual clusters by chaining them to the APL’s output.

If you are using SAP’s Hybris Marketing, SAP Cloud for Customer, or SAP Fraud Management, you are likely already using the SAP APL - that's how seamless and easy it is to get advanced analytical capabilities to the end user.

How To Get Started With SAP APL

You can get started NOW. The SAP Automated Predictive Library (APL) is a HANA-native component and therefore of course you need SAP HANA – currently SP09 is supported. Provided you are properly licensed for it, you can find it at: https://support.sap.com/software/patches/a-z-index.html

Note: If you do not see it at the link above, you likely are not licensed for it - contact your SAP representative who can discuss licensing and trial options for you. Chances are there's a way you can try it - as long as you already have SAP HANA installed.

(If you want access to these automated capabilities but do not have SAP HANA, you still can! SAP Predictive Analytics 2.0 already implements these algorithms and has no HANA dependencies. The current trial of the desktop client is the previous version, but it still has all the automated functionality. We will be announcing a new PA 2.0 Trial Program within the next week or two!)

What's Next for SAP APL

As my previous article stated, this is just the beginning of a journey for SAP Predictive Analytics 2.x, and we have big plans for the SAP APL as well. We are working on creating recommendation services, tighter integration into our other predictive offerings, and even bringing these HANA-native services to the grand-daddy of all HANAs: the SAP HANA Cloud Platform (HCP).

Ensure you are bookmarking and rating articles you like and keep checking the SAP Predictive Analytics SCN community for all the latest. (Hint: Setting email alerts will notify you when something new gets posted).

(p.s. An extra special thanks to Marc DANIAU and team for driving the APL development and providing much of the material for this article.)

On February 11th, we formally released the next major version of our advanced analytics product – SAP Predictive Analytics 2.0. This is a big release for us because it not only includes many customer-driven features, but it also finally brings our two predictive analysis tools together into a single solution:

SAP Predictive Analysis 1.x:

SAP’s advanced analytics solution aimed at advanced business analysts and data scientists to analyze and visualize their data using powerful predictive algorithms, the “R” open-source statistical analysis language, and in-memory data mining capabilities. “SAP PA 1.x” is built upon the SAP Lumira codebase which also gives it excellent advanced visualization and data discovery capabilities as well.

SAP InfiniteInsight 7.x:

SAP’s automated data preparation, predictive modeling, and scoring solution that allows business users to easily and quickly find meaning in their data without requiring the skills of a data scientist. “SAP II 7.x” is at the forefront of automated predictive analysis and includes the product set from SAP’s acquisition of KXEN in 2013.

Why Did We Have Two Products In The First Place?

Recognizing the natural evolution of business intelligence going from answering “what happened?” to the much more interesting “why did it happen?”, SAP created a new advanced analysis product in 2012 – SAP Predictive Analysis. This product was aimed at the data analysts and data scientists who until then had little choice but to develop modelling scripts and algorithms by hand and then apply algorithms to their data manually. SAP Predictive Analysis 1.x enabled users to analyze and visualize their data using pre-built algorithms from the open-source “R” library and graphically “chain” these modules together to perform complex analysis without a technically challenging and tedious manual modelling process.

Then in 2013, SAP acquired KXEN – which made a product called InfiniteInsight that enabled business users to automatically analyze their data without manual modelling or even requiring the skills of a data scientist or statistician. SAP InfiniteInsight 7.x contains its own intelligent and self-tuning algorithms that encapsulate much of the manual preparation and modelling work a data scientist would typically do so business users can focus on answering their business problems instead of deciding which algorithm to use and when.

Predictive Analytics 2.0 Bridges Two Worlds

SAP Predictive Analytics 2.0 brings these two products together into a single installable solution and contains the functionality and experiences of both products. But just to make things interesting, we have also changed the product name slightly – the unified solution is now called “SAP Predictive Analytics” and not “SAP Predictive Analysis”.

When we set out to create a unified Predictive Analytics 2.0, we set out to balance three important factors:

UX Consistency: Existing customers upgrading to PA 2.0 need a high level of UX consistency to mitigate any disruption from the unification process. Therefore in PA 2.0:

We’ve preserved the previous PA 1.x experience and renamed it as “Expert Analytics”.

We’ve preserved the previous II 7.x experience and renamed it as “Automated Analytics”.

Users upgrading to PA 2.0 should feel right at home with whichever mode they have used before but now have a second set of capabilities they may not have had exposure to before.

UX Progression: While we preserved the traditional experience of both products, we are committed to bringing these together into a “next generation” experience that not only melds the “data scientist” and “business analyst” workflows together, but also makes them interoperable. PA 2.0 provides the foundation for this with a unified installer and users will see incremental changes in PA 2.1 and PA 2.2 rather than a wholesale change to a foreign interface.

We believe this strategy properly balances the need for providing highly demanded new functionality in the existing products while creating a non-disruptive roadmap to a new product with a next generation experience that provides even stronger “automated” and “expert” capabilities than the current II 7.x and PA 1.x products offer.

New in SAP Predictive Analytics 2.0

Here’s a high level summary of the more notable advances in PA 2.0:

Expert Analytics (Formerly Predictive Analysis 1.x):

SAP BW Connectivity: Strong feedback from our BW customers drove the requirement to easily access BEx queries and InfoProviders as a datasource for advanced analytics. This is an “offline” connectivity which means PA 2.0 connects to BW natively using BICS and downloads a dataset to the desktop. More information is available in these documents:

Custom SAP HANA Predictive Analysis Library (PAL) support: Custom components created using algorithms from the SAP HANA PAL can now be added and used in Expert Analytics. You can read more about this here:

Like any journey, there’s going to be a few steps along the way. We plan to deliver Predictive Analytics 2.1 in Q2 and you’ll see all the progress we’ve made at this year’s SAPPHIRE conference in May. There’s a lot going on in PA 2.1: improvements in R management for data scientists, predictive model management, and support for ultra-wide datasets are just a few highlights. The convergence theme will also continue in PA 2.1 and beyond as we get closer and closer to unveiling a next generation user experience optimized for predictive analysis workflows across both the data scientist and business analyst personas.

However PA 2.x on the desktop is only part of the story – we will also be offering advanced analytics services on SAP HANA Cloud Platform (HCP) for anyone to make their applications smarter and you’ll see even more integration of our next generation capabilities directly in other products like SAP Lumira, SAP Cloud for Planning, and Hybris Marketing.

As you can see, 2015 is going to be a BIG year for SAP Predictive Analytics. To keep up to date with the latest news, product updates, and discussions, make sure you are visiting the SAP Predictive Analytics SCN Community. If you have not already done so, now is a good time to set up alerts so that you are automatically notified when something gets posted. Simply go to http://scn.sap.com/community/predictive-analysis and once you are logged in, select “Start email notifications” on the right hand “actions” menu.

SAP Predictive Analytics is a tool that offers predictive capabilities to analyze data from various sources and apply predictive algorithms to get insights from the data. The functionality is enriched by its integration with the R language. SAP Predictive Analytics and R together make a very strong combination.

The tool can take data directly from various sources, including SAP HANA and SAP BI/BW systems, and with the HANA PAL (Predictive Analysis Library) functions we can analyze the data better and predict more precisely. The business use cases where these algorithms can be applied keep increasing day by day. One such situation is when we want to export the data (the predicted values) out of the tool.

In this blog, we will take a business use case of exporting content from SAP Predictive Analytics to an external database (in our case, a MySQL database). The blog covers only the system connection parameters; we do not discuss the algorithms or the data here.

SAP Predictive Analytics Version: 2.0.0

Step 1: Check if the database driver is available

To connect to the external system, we need the respective database driver to be installed in your installation of SAP Predictive Analytics.

Follow the steps to see if the respective database driver is installed in your system.

From the home screen: File -> Preferences (you can also use Ctrl+P)

From the preferences menu, select the SQL Drivers menu and scroll down to the ORACLE category, where you can find the MySQL5 - JDBC driver option. The icon will be green if the driver is already available; otherwise, click the "Install Drivers" button in the top right corner of the window.

With an ever-evolving and growing HANA platform, the Predictive Analytics (PA) Expert Analytics team has a tough job keeping up with all the new algorithms being added to the Predictive Analysis Library (PAL). That's not to say that new algorithms won't be added eventually. The current focus has been on including the Automated Predictive Library (APL) algorithms. From PA 2.0 onwards these are available in both HANA and non-HANA modes.

To provide a more complete offering, the PA development team has now added the ability to add your own custom HANA components inside Predictive Analytics. What this really means is that if there's a HANA PAL function you want to take advantage of, and Expert Analytics does not have an out-of-the-box node for it, you can quickly and easily add one without being a developer.

For example, I have been doing some basket analysis with Apriori and now wish to investigate the other association analysis methods in the HANA PAL. I can find the appropriate algorithm, FP-Growth, and include it within PA 2.0.

Let's open up Expert Analytics, as it's now called as part of Predictive Analytics 2.0.

Here I've connected to a HANA SP9 source

You will now see the additional components you can add; select PAL Component.

Enter the Component Name that you want to use

Here you can see all the available PAL Functions that have been surfaced with PA 2.0

I chose FPGROWTH as I'm looking to do some more Basket Analysis. Expert Analytics then knows the parameters that the PAL function requires.

You can then adjust the names and default values as required.

When you press Finish you will see the new Component available in the Algorithms section, with an "N" for new.

You can now use that on the canvas just like all the other components.

You will now see the parameters exactly as you created the component in the previous steps.

That's it: you can now run the workflow, and you can re-use the custom component whenever you need it.

Shiny offers a way to create interactive data analysis applications using R scripts. This blog post will show how to use this interactive analysis capability in SAP Predictive Analytics. In general, we use a custom R component to hold the Shiny app so it can be added to a PA analysis chain. A few steps are required to tweak a normal Shiny app so that it complies with PA's requirements for custom R scripts. The remainder of this blog post will show this in more detail. Before diving deeper into this topic, let's have a sneak peek at the final interactive Shiny app:

Prerequisites

Install shiny app:

install.packages("shiny")

Add ShinyApp as PA Custom R Component:

(1) Create a custom R component and give it a name. Here we use "shinyapp_mtcars".

(2) Add PA-compliant Shiny app code to the script editor. Normally a Shiny app splits its code into two R script files: the data analysis and plotting logic in server.R, and the UI design in ui.R. Here we need to squeeze server.R and ui.R into one primary function as required by PA. Luckily, Shiny provides a way to define both in runApp. The structure of a PA-compatible Shiny app looks like this:

The parameter mydata is the data frame output from the previous component in the analysis chain. Normally the shinyServer function does not require the third parameter session. But in our case, it is important to have it as this parameter makes sure the shiny app correctly exits when the browser window holding the app is closed. So make sure the session$onSessionEnded function is correctly set up. Make sure the other configurations are also correctly set:

(3) Click Next -> Finish to finish the addition of this custom R component. Now you should be able to see it in the side algorithm panel.

Run PA analysis with a Shiny App component

Using shiny app in PA analysis chain is no different from using a normal custom R component in the analysis chain. In the example shown in this blog post, I use a chain consisting of two analysis nodes: an out-of-box R-Kmeans component and a Shiny App component.

Run the chain and a browser window will pop up containing the Shiny app. Change the value of the slider on the left panel and you will see that the plots on the right are updated as well. Closing the browser window returns you to PA.

Now let's take a closer look at this Shiny app. The dataset used here is mtcars. It is loaded into PA as an offline dataset. The analysis chain first uses an R-K-Means component to cluster the dataset into three clusters according to features including cylinders, horsepower, and weight. Then the Shiny app component comes into play for interactive analysis of the clustered dataset. Basically it does three things: (1) it samples the clustered dataset according to the ratio specified in the slider; (2) for the data in the sample set, it draws a histogram of which cluster each car belongs to; (3) for the data in the sample set, it fits a linear model to represent the correlation between weight and horsepower, then plots the training set and the model. This analysis is only a toy application, but I hope it sparks more practical use cases.
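The three steps above are easy to prototype outside of Shiny. The following is only a minimal Python sketch of the analysis logic, not the R code of the component itself; the function names, the column names (wt, hp, cluster), the choice of fitting horsepower as a function of weight, and the four made-up rows are all assumptions for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, float]


def sample_rows(rows: List[Row], ratio: float, seed: int = 0) -> List[Row]:
    """Step 1: keep a random fraction `ratio` of the rows (at least one row)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, round(len(rows) * ratio))
    return random.Random(seed).sample(rows, k)


def cluster_counts(rows: List[Row]) -> Counter:
    """Step 2: number of cars per cluster, i.e. the heights of the histogram bars."""
    return Counter(int(r["cluster"]) for r in rows)


def fit_line(rows: List[Row]) -> Tuple[float, float]:
    """Step 3: least-squares fit hp = intercept + slope * wt (assumed direction)."""
    n = len(rows)
    mean_x = sum(r["wt"] for r in rows) / n
    mean_y = sum(r["hp"] for r in rows) / n
    sxx = sum((r["wt"] - mean_x) ** 2 for r in rows)
    if sxx == 0:
        raise ValueError("need at least two distinct weights to fit a line")
    sxy = sum((r["wt"] - mean_x) * (r["hp"] - mean_y) for r in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up rows (not the real mtcars values), already labelled with a cluster.
ROWS = [
    {"wt": 1.0, "hp": 60.0, "cluster": 1},
    {"wt": 2.0, "hp": 110.0, "cluster": 1},
    {"wt": 3.0, "hp": 160.0, "cluster": 2},
    {"wt": 4.0, "hp": 210.0, "cluster": 3},
]

sampled = sample_rows(ROWS, ratio=1.0)
counts = cluster_counts(sampled)
intercept, slope = fit_line(sampled)
```

In the Shiny app the ratio comes from the slider, so steps 2 and 3 are simply recomputed on the new sample each time the slider moves.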

Note that to re-run the analysis chain one must update components in the analysis chain first. This could be to remove and add back in the Shiny App component. It can also be changing and changing back the configuration of the K-means component.

Please contact me if you are interested in further discussion on this topic. Source code of the Shiny custom R component used in this blog post is available on request.

1- CASE STUDY

Attend this session for insight into how The Irvine Company enhanced its BI strategy to include predictive analytics capabilities. Come away with firsthand lessons from the company to help you determine the business scenarios in which predictive analytics can provide the greatest value, and get lessons on ...

2- ROADMAP

Attend this session to explore the fundamentals of SAP Predictive Analytics, and find out how data scientists, business analysts, and business users can leverage it for greater visibility into trends, risks, and opportunities. You will learn how to ...

3- PREDICTIVE ON SAP HANA

Gain an understanding of the tools that run predictive algorithms within SAP HANA, including SAP Predictive Analysis, the predictive analysis libraries, SAP HANA-R Integration, SAP InfiniteInsight, and the SAP HANA application function modeler. Suited to both administrators and predictive users, in this session you will ...

4- EMBEDDED PREDICTIVE

Find out how embedded advanced analytics with SAP HANA allows you to identify, combine, and manage multiple sources of data and build advanced analytics models within your business applications to drive business impact. Understand how to ...

5- PREDICTIVE FOR THE REST OF US

“Advanced Analytics” doesn’t mean you need to be a data scientist to use complex algorithms to unlock the secrets of your data and improve the way you do BI. This session focuses on techniques for preparing data, choosing the right algorithms and building analytical models from scratch, including ...

6- USE CASE EXAMPLES

With an emphasis on use case examples, this session examines how to exploit SAP's advanced analytics solutions for big data and their associated algorithms to drive business impact. Attend to ...

7- BIG DATA

This session explores how big data shapes technology strategies and drives new business models and revenue streams for companies across all industries. Attend for insight into SAP's big data technology strategy and discuss...

SAP Business Warehouse (BW) and SAP Predictive Analysis

One of the coolest features that customers can get a first look at in SAP Predictive Analysis 1.21 is connectivity to any SAP Business Warehouse (BW) system in a database-agnostic way. This allows users to work with BEx queries, with or without variables, to fetch the train dataset for offline analysis. Hierarchies are automatically flattened to allow algorithms to process the data and generate models.

How to enable SAP BW Data Source in Predictive Analysis 1.21

In Predictive Analysis 1.21 this feature is turned off by default and can be enabled manually for those customers who wish to get a first look at BW connectivity ahead of the upcoming release where it will be made generally available. Please refer to Using SAP Predictive Analysis in Combination with SAP Business Warehouse Queries which outlines the currently supported features, restrictions, and recommendations

If you have SAP Predictive Analysis 1.21 and want to try downloading data from your SAP BW data source, you have to manually enable the feature first using the following steps:

Add the following additional line to the end of the file (including the first dash): -Dpa.bw.download.enable=true

Save the file and start SAP Predictive Analysis 1.21

NOTE: If you are unable to save the file due to insufficient permissions, you may need to run Notepad as Administrator first (By right clicking on the Notepad icon and selecting 'Run as administrator') and then open the file for editing.
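If you need to apply this change on several machines, the edit can be scripted. Here is a minimal Python sketch, assuming you already know which configuration file to edit: the function name, the example.ini file name, and the two existing lines in the demo file are hypothetical placeholders, and only the -Dpa.bw.download.enable=true line comes from the steps above.

```python
import tempfile
from pathlib import Path

BW_FLAG = "-Dpa.bw.download.enable=true"


def enable_bw_download(ini_path: Path) -> bool:
    """Append BW_FLAG as the last line of ini_path unless it is already present.

    Returns True if the file was changed, False if the flag was already there.
    """
    text = ini_path.read_text(encoding="utf-8")
    if BW_FLAG in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the flag starts on its own line
    ini_path.write_text(text + BW_FLAG + "\n", encoding="utf-8")
    return True


# Demo on a throwaway file; the two existing lines are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "example.ini"
    demo.write_text("-vmargs\n-Xmx1024m", encoding="utf-8")
    first = enable_bw_download(demo)   # adds the flag
    second = enable_bw_download(demo)  # no change the second time
    demo_lines = demo.read_text(encoding="utf-8").splitlines()
```

Running it twice is harmless because the flag is only appended when it is missing. As with the manual edit, the script needs write permission on the file.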

BW Data Source in SAP Predictive Analytics 2.0

In the upcoming release of Predictive Analytics 2.0 (Expert Analytics), SAP BW Data Source feature will be generally available by default in Expert Analytics without the need to manually enable it. Watch for announcements and further information regarding BW Data Source in this upcoming release soon.

Downloading data from SAP BW database

You can directly connect to SAP Business Warehouse (BW) systems. This allows you to download train sample sets for model training offline. You can not only connect to SAP BW InfoProviders, but also execute BEx queries and pass values for variables defined in the queries. This eliminates the need to manually download or export data from SAP BW to perform predictions on the data.

Before you can connect to a SAP Business Warehouse (BW) system, it needs to be registered through SAP GUI for Windows on the same computer. It is recommended to test the connectivity of the registered BW system via SAP Logon before using it in Predictive Analytics. Refer to SAP GUI for Windows documentation on SAP Help Portal at http://help.sap.com to configure a new system. To download SAP GUI for Windows, go to SAP Service Marketplace.

1. To create a new dataset, first select a source by choosing File menu -> New.

2. In the dialog that appears, select Download from SAP Business Warehouse option and choose Next.

3. Enter BW connection details such as Client ID, Language, User, and Password and choose Connect. Note: The list of available BW servers is derived from SAP Logon.

4. Select the BEx Query from the list of available queries and choose Create. You can search for your BEx Query or InfoProvider using either the Roles or InfoAreas view. Alternatively, you can search for key words in the Find box.

5. Enter values for mandatory variables of the BEx Query.

6. Where queries have hierarchy variables defined, you need to select a hierarchy.

7. Select values within the hierarchy that have been filtered based on the node selected for the previous variable and choose OK.

8. Finally, select dimensions and measures with the predictive use case in mind. These are downloaded when you choose Create.

The dataset is added in the Predict room and you can apply algorithms to the dataset to generate predictions.

Target your customers with location-based marketing. In this retail banking scenario, we analyze geo-referenced transactions to zoom in on the bank’s customers and deliver even more relevant marketing to each individual.

Use predictive analytics solutions from SAP to determine the bands a user will listen to in a music streaming service. Learn how to build, in a few clicks, a recommendation engine that personalizes content on your website and across a variety of channels for your customers.

See how predictive analytics solutions from SAP, integrated into a CRM application, can help a call center agent determine the best offer to recommend to a customer calling to purchase tickets for the next football game.

How does a major actor in consumer packaged goods operate 40,000 smart vending machines and still offer a personalized customer experience? In this demo, we explore and analyze the data related to the products and transactions, using predictive analytics solutions from SAP.

How does a major actor in consumer packaged goods operate 40,000 smart vending machines and still offer a personalized customer experience? In this demo, we explore and analyze the data to determine the best personalized product to offer each individual.

This demo shows you how to predict which telecom subscribers are likely to churn and why – using an intuitive, wizard-driven interface.

10- Even Marketers Can Use Predictive Analytics! See how you can integrate SAP’s predictive engine into other business applications, such as the SAP Customer Engagement Intelligence platform, making it easy for business users to consume predictive results and for modern marketers to target audiences better.

Hadoop and Predictive Analytics are some of the most exciting technologies for businesses today but are often seen as having a steep learning curve. While both are complex, getting started is simple thanks to the Hortonworks Sandbox providing the database and SAP InfiniteInsight making predictive analytics intuitive for both data scientists and business users. In just 3 easy steps, you can set up your own Hadoop cluster and tackle real predictive use cases!

Once you’ve installed Virtualbox, open up the Hortonworks Sandbox .ova file and it’ll automatically load it into your interface. Hit ‘Start’ and you now have a fully functional Hadoop environment!

2. Connect

Next we simply set up our connection from Hadoop to SAP InfiniteInsight using an ODBC connection. Download and install the driver here: http://hortonworks.com/hdp/addons/.

After installation, open up your ODBC Administrator and under the System DSN tab, “Sample Hortonworks Hive DSN” is now available.

Configure it with the IP address from the startup screen of your Hadoop environment, with the remaining fields shown below.

Test the connection and you have now successfully added Hadoop as a data source for InfiniteInsight.

TIP: Your <ip address>:8888 will be your homepage for Hadoop in your browser for accessing Hive, HDFS, and more

3. Predict

Now that everything is set up, you’re ready to do predictive analytics! Open InfiniteInsight and we’ll ‘Create a Clustering Model’ based on the sample tables in Hadoop. Select the ‘Data Type’ as ‘Database’ and select “default”.sample_07 that shows various job titles with the number of total employees and salaries.

We now have a customer-facing website for suggesting product enhancements which is dedicated to SAP Predictive Analytics (a.k.a. InfiniteInsight and Predictive Analysis): https://ideas.sap.com/PredictiveAnalytics

Some users might have encountered a situation where an R script runs well in the R console or RStudio but throws an error in PA about not being able to find an R package that is already installed in R. For example, the screenshot below shows an error I had. It is thrown by the PA custom R component when loading the rgl package.

When you get this error, one potential problem could be that the R_LIBS environment variable was not successfully set. In general, R_LIBS needs to point to all folders where your R packages are installed. Use the .libPaths() function to check where your R libraries are installed.

On my machine, R packages are installed in two folders as shown above. One potential solution is to append all folders returned by .libPaths() to the system environment variable R_LIBS. Make sure to separate them with a semicolon (;). Create R_LIBS if it does not exist.
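To avoid typos when assembling the value by hand, the merge can be scripted. This is a minimal Python sketch that only builds the string to paste into R_LIBS; the function name and the two example folders are hypothetical, so substitute whatever .libPaths() returns on your machine.

```python
from typing import Iterable, Mapping


def build_r_libs(lib_paths: Iterable[str], env: Mapping[str, str], sep: str = ";") -> str:
    """Return an R_LIBS value: existing entries first, then new folders, no duplicates."""
    merged = [p for p in env.get("R_LIBS", "").split(sep) if p]
    for path in lib_paths:
        if path and path not in merged:
            merged.append(path)
    return sep.join(merged)


# Hypothetical folders standing in for the output of .libPaths().
LIB_PATHS = [
    "C:/Users/me/Documents/R/win-library/3.1",
    "C:/Program Files/R/R-3.1.2/library",
]

# Simulated environment in which R_LIBS already holds one of the two folders.
new_value = build_r_libs(LIB_PATHS, {"R_LIBS": "C:/Program Files/R/R-3.1.2/library"})
```

Passing os.environ as the env argument uses the real environment instead of the simulated one. The sketch does not change any system setting; you still set R_LIBS yourself, and PA may need a restart to pick up the new value.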

Following on from Part 1, I now want to show you how we can easily do some further analysis of the Market Basket Analysis (MBA) output by using SAP HANA Studio, SAP Predictive Analysis (PA) and/or SAP Lumira.

The output of the Basket Analysis, Association Analysis (HANA Apriori algorithm) is very straightforward:

It shows the product purchased (PreRule), the secondary product(s) (PostRule), and then some calculated fields - Lift, Support and Confidence. Wikipedia can give you the details of how each of these is calculated. When performing MBA, we can choose whether we are looking for a match between 2 products or more than 2 by using Apriori Lite or Apriori respectively.

I have found that the lift is the most useful of these columns as it combines the support and confidence to give you an idea of how good the rule is. For example, does it occur frequently, and if one product is found, how likely is it that the other product will be in the same basket?
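To make these three measures concrete, here is a minimal Python sketch using the standard textbook definitions. It is not the HANA PAL implementation, and the function name and the four toy baskets are made up for illustration.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    baskets: List[FrozenSet[str]],
    pre_rule: FrozenSet[str],
    post_rule: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule pre_rule -> post_rule."""
    n = len(baskets)
    both = sum(1 for b in baskets if pre_rule | post_rule <= b)
    pre = sum(1 for b in baskets if pre_rule <= b)
    post = sum(1 for b in baskets if post_rule <= b)
    if n == 0 or pre == 0 or post == 0:
        raise ValueError("rule items never occur in the baskets")
    support = both / n              # how often the whole rule occurs
    confidence = both / pre         # how often PostRule follows PreRule
    lift = confidence / (post / n)  # confidence relative to PostRule's base rate
    return support, confidence, lift


# Made-up toy data: four shopping baskets.
BASKETS = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "jam"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

support, confidence, lift = rule_metrics(
    BASKETS, frozenset({"bread"}), frozenset({"butter"})
)
```

A lift above 1 means the two products appear together more often than you would expect if they were bought independently, which is why it is a handy single column to sort on.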

First in SAP HANA studio we can easily see this output and build a simple Analytic View (AV) for us to consume in PA/Lumira.

The Analytical View allows us to derive a new field and then specify some metadata such as the measure columns.

I have found it useful to combine the PreRule and PostRule into a single field, so we will do this with a calculated column as below.

We now need to define the aggregation for the measures. None of the aggregation types is really appropriate for these values, but because we will also include the Rule (or the PreRule and PostRule), each row stays at rule level and SUM works fine.

We can now use Lumira / Predictive Analysis to create some further visualisations using this Analytical View.

See below for a selection that I have created in just a few clicks.

I have written another blog, The SAP HANA Effect, looking at how this changes the business process and analysis of basket data. This can be found in the SAP HANA and In-Memory forum. I also hope to compare the different HANA algorithms and see how the results and performance compare.

Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever, and if you search online you'll find examples from the 80s and the early 90s. One thing to note: Basket Analysis should really be classed as Advanced Analytics, as on its own it's not really Predictive Analytics. You are not predicting anything, so it is more data mining, but all 3 terms are often used interchangeably. What a Predictive tool or the Predictive Algorithm gives you is easy access to this type of analysis without having to be a SQL Guru, Propeller Head or even a Data Scientist.

Basket Analysis is something that has been possible for a long time, but the ability to do this quickly and easily on a large set of data is where many hit problems. With this in mind I wanted to share a couple of days' work that I have done recently to improve this experience.

HANA Algorithms

With SAP HANA SP9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known.

Apriori

Apriori Lite

FP-Growth

KORD

I'm not going to get into the technical details of the differences between these; the HANA PAL Documentation provides more details on each. I plan to write a follow-up blog that will aim to explore any differences between them.

I have used SAP Predictive Analysis v1.21 with SAP HANA Revision 85, and performed this on an 80+ million record data set; some may call this Big Data, others may not. The data required for this task is straightforward: you just need 2 columns, the item or items purchased and the transaction number.

Normally with SAP HANA you might build an Analytical View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and analysing it by slicing, dicing and drilling into the data set. For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.

You can then feed this into SAP Predictive Analysis (PA). If you want to use the in-built HANA Predictive libraries you need to "Connect to SAP HANA".

Select your source data

Drag in the HANA Apriori node

Configure the node parameters, which are fairly self-explanatory. Depending upon the basket size in your data set you may need to set the support quite low.

I have then chosen to output the resulting rules to a new HANA table so that we can easily analyse the rules generated.

You can then run the analysis and receive the output in just a couple of minutes.

If you switch to the results view you then receive some pre-built analysis showing you the rules that have been generated

Predictive Analysis Association Chart - Tag Cloud

As you can see, the output is fairly readable, but you may still want to further analyse the rules generated to understand what they are telling you.

Predictive Analytics has recently seen a spike of excitement among many different business departments, such as marketing or human resources, who seek to better understand their customers or would like to look at how employees behave in their organization and improve the services offered to their clients. Unfortunately, only very few business departments have access to Data Scientists and therefore often have little experience in developing predictive models. This presents a real challenge since predictive analytics is fundamentally different from traditional reporting, and without Data Science support you might find it hard to get started and feel confident in the results of your analyses. Luckily, SAP InfiniteInsight addresses this challenge directly and can be easily used by analysts since it greatly reduces the complexity of data preparation and model estimation through a very high level of automation. This way you can focus on the business questions that matter and spend less time dealing with complicated IT solutions. This blog is geared towards analysts who want to understand how to get the most out of their data using SAP InfiniteInsight, so here’s how you would get started with your predictive modeling initiative:

Step 0: Understand the predictive analytics process

Before actually getting started, you should familiarize yourself with the general idea behind predictive analytics and how it differs from traditional business intelligence (the folks over at Data Science Central have a nice summary). In short, when using predictive analytics we want to forecast the probability of a future event based on patterns that we find in historical data for said event: For example, to predict turnover (your target) we will need historical data on turnover along with a bunch of attributes that we can use to find relationships and patterns between the attribute and target variables. Once we have derived the historical relationship and built a valid model, we will use this model on new data to forecast turnover. The forecasted results can then be used to make various business decisions. Now, the actual flow may involve a few side steps (e.g. transforming your data so that it can be used) but in essence this is the high-level process that will be described here.

Step 1: Define your business objective

Whether it's wanting to predict which customer will buy your newly launched product or which employee might leave your company – you need to define what your business objective is and clarify how you want to measure it. This sounds trivial but can pose a real challenge: you need historical data for your target outcome that is sufficiently accurate to derive a statistical model in a later step, not to speak of having your target variable available in the first place.

While it’s certainly possible to “just play around” and see what happens (sometimes referred to as exploratory analysis), you will gain better results if you focus your efforts on a single business question from the very beginning. You will also find it easier to gain end-user acceptance if you know what challenge your users are facing and how your analysis can help them solve it.

Step 2: Find & connect to the data

Depending on your business objective, you will now need to find the data to base your model on. You don’t need to have a sophisticated concept in mind but you’ll need a general idea of what kind of data you are looking for – with SAP InfiniteInsight there is one simple rule: the more variables you have, the better, since SAP InfiniteInsight will determine automatically which variables should be removed and which variables add value to the model. Getting the data from an operational system like SuccessFactors Employee Central or SAP CRM can be slightly more difficult than from a Business Warehouse (BW), but the granularity of data available in a BW may not be sufficient for modeling. With operational systems the data usually has the right granularity but is frequently distributed across many different tables, and companies often restrict direct table access to users from IT. Therefore you may face some challenges when trying to get the data from the tables directly. BW, on the other hand, often has a wealth of data, nicely packaged and preprocessed, but while the data may have all the attributes that you’re looking for, it may be too aggregated to be used.

The rule of thumb for data granularity is: You need historical data in the same granularity as the concept you want to predict, i.e. if you want to forecast turnover on employee level you need to have the historical data on employee level as well. The good news is that you can always fall back on using a simple flat file with your data in SAP InfiniteInsight so if push comes to shove you can simply ask your IT department to download some data as CSV in the needed format.

Step 3: Derive & interpret the model

Once you have the data, you want to find the model with the best tradeoff between describing your training data and predicting new, unknown data: SAP InfiniteInsight can automatically test hundreds of different models at the same time and choose the one that works best for your data and purpose. Hidden in the background, SAP InfiniteInsight also automatically performs many tasks that Data Scientists usually do with traditional tools to improve the quality of your data and the model performance, such as missing value handling, binning, data re-encoding, model cross-validation, etc. This way you can simply point SAP InfiniteInsight to your bucket of data, define which variable to predict and ask the tool to work its magic. All you need to do then is interpret the results (see this blog post to see how you can interpret a model based on an example from HR).

Step 4: Apply the model

Great – now you have a working model! Next you want to predict new stuff with your model – usually this “stuff” sits somewhere in a database. SAP InfiniteInsight can either directly apply the model to new data (e.g. data that sits somewhere in a table or a flat file) or it can export the model to a database to allow real-time scoring. The first option is more for ad-hoc scoring or further model validation purposes while the second option can be used to continuously score new data as it comes into the database – this way one could include the scored results in some other application or make the information available to other users. However, in the case of in-database scoring you will probably need some involvement from your IT department.

Step 5: Execute on your insights

One of the most important questions of any statistical analysis is: What do you do with the results? How can you reap the benefits of “knowing the future”? Having an idea about what is likely to happen is not enough – now your organization needs to adapt its behavior to either avoid the unpleasant outcomes or gain the positive ones as predicted by the analysis. How this can be done depends heavily on your organization and the analysis context – possible next steps include

making the results/model available to a larger audience (e.g. HR Business Partners, marketing managers, etc.) by exporting it to a database to enable real-time application of the model,

including the scoring algorithm in a business application (e.g. an SAP system like SAP CRM),

developing a one-time action plan based on the results, or

designing a larger process to use the analysis results in each cycle of the business process to which it belongs.

Remember to include those employees who are crucial for a successful execution (usually your business end-users) early in the process and make sure they understand the results and how to leverage the insights. To be accepted, your analysis must be concise, clear, and trustworthy. Try to understand where your stakeholders (e.g. managers, business users, etc.) are coming from and how to communicate the results of the analysis effectively in their business language. A great analysis with great predictive power is only half the battle – whether your business will be able to profit from this will depend on your organization’s ability to close the loop to its operations.

Conclusion

At this point you may feel slightly overwhelmed at the sight of the different aspects that play a role when setting up a predictive analytics initiative. It is true – these things can get really complex but when using SAP InfiniteInsight they become much simpler compared to traditional tools due to the high level of automation. However, to get started quickly and get a feeling for the technology you don’t need to boil the ocean – you can easily take data that is already available to you and see what kind of relationships you can uncover (a trial for SAP InfiniteInsight is available here). You can use this blog post to see an example of how SAP InfiniteInsight can be used with HR data but the example and the steps described translate well to other business areas as well. Please feel free to leave any questions or comments!

Many HR departments are looking at predictive analytics as a hot new approach to improve their decision making and offer exciting new services to their business. Luckily, with SAP InfiniteInsight you don’t have to be a Data Scientist to find the valuable insights hidden in your data or build powerful predictive models. Combined with this, SuccessFactors Workforce Analytics provides clean, validated information, bringing together disparate data from multiple systems into one place to enable decision making. Let’s look at a concrete example of how you could use this combination to better understand your workforce and make predictions in areas that really matter to your business.

The Scenario

Meet John – he’s an HR analyst working for a large insurance company and responsible for supporting line of business managers with workforce insights. He’s been monitoring a concerning trend over the last year regarding the turnover of sales managers in the company’s regional offices – his turnover reports in Workforce Analytics have shown significant deviations from the tool’s industry benchmarks. Today, he has a call with Amelia, the global head of sales, to talk about headcount planning. John takes the opportunity to inform Amelia about his findings only to learn that Amelia has been made aware of this phenomenon a few weeks ago by a few of her direct reports: “You know, John – I’m fine with people leaving, a bit of turnover is healthy and keeps our business competitive but what I’ve been hearing is that we tend to lose the wrong people, namely mid-level sales managers with a great performance record. If an experienced sales employee leaves we take an immediate hit to our numbers so we naturally try very hard to keep them. Our salary is more than competitive and we offer great benefits so I have trouble imagining what could be the drivers behind this trend. Can you please investigate and let me know what I could do to reverse this development?”

The Data

John discusses his suspicions with some of the other analysts who have observed similar trends in other lines of business. Some of his colleagues hint that a lack of promotion or a general increase in the readiness to change jobs might have an influence on employees’ propensity to leave. So John decides to extend his analysis beyond sales and include other business functions as well. He prepares a dataset with all the employees in his company as of the end of his company’s last fiscal year (09/2013) and flags employees who have left the company voluntarily within the following 12 months (until 09/2014) to have a basis for his analysis. The dataset also contains a range of variables, such as previous roles, demographics or performance, to assess their influence on turnover. The 12-month period for tracking the employees will allow John to identify an employee at risk with sufficient lead time to give a manager the opportunity to react if required. Even though John already has some rough hypotheses about what could drive turnover based on his reports in Workforce Analytics, he wants to keep the analysis broad to capture unexpected relationships as well.

The Analysis

John starts up SAP InfiniteInsight and decides to build a classification model to classify the employees in his dataset into those who would leave within the next 12 months and those who would still be with the company.

John connects to the SuccessFactors Workforce Analytics database and selects his dataset as a data source:

He clicks “Next” and instructs SAP InfiniteInsight to analyze the structure of his dataset by clicking on the “Analyze” button.

John is happy with the suggested structure of the dataset – SAP InfiniteInsight has recognized all the fields in his dataset correctly and John doesn’t need to make any changes. He clicks “Next” to progress to the model definition screen:

John can use all the variables in his dataset except for the Employee ID: as a unique identifier, this field would map perfectly to the outcome in the historical data without telling the model anything about new employees. Therefore he excludes Employee ID from the model definition. As target variable John uses the “Will leave within 12 months” flag from his dataset. This flag contains “Yes” for all employees who leave within 12 months and “No” for those who are still with the company. The analyst clicks “Next” to review the definition before executing the model generation:

Since John is no Data Scientist and doesn’t want to deal with manual optimization of the models, he uses SAP InfiniteInsight’s “Auto-selection” feature: When “Enable Auto-selection” is switched on (the default), SAP InfiniteInsight will generate multiple models with different combinations of the explanatory variables that John has selected in the previous screen. This way the tool optimizes the resulting model with regard to predictive power and model robustness (i.e. generalizability to unknown data). Simply put: When using this feature John will get the best model without having to deal with the details of the statistical estimation process. He now clicks “Generate” to start the model estimation process.

The Results

Eight seconds later, SAP InfiniteInsight presents John the results of the model training:

John reviews the results: His dataset had 19,115 records and 22 dimensions were selected for analysis. 9.02% of all employees inside the historical dataset (snapshot of 09/2013) left the company voluntarily between 10/2013 and 09/2014, i.e. within 12 months of the snapshot (=his target population), while 90.98% of employees were still employed. These descriptive results are in line with his turnover reports from Workforce Analytics.

John now looks at the model performance (highlighted in red) and sees that the best model that SAP InfiniteInsight has chosen has very good Predictive Power (KI = 0.8368, on a scale from 0 to 1 with 1 being a perfect model) as well as extremely high robustness (Prediction Confidence: KR = 0.9870, on a scale from 0 to 1). Also, from the 22 variables John had originally selected, the best model only needs 16 variables: The remaining six variables didn’t offer enough value and have therefore been automatically discarded. Based on the model’s KI and KR values John concludes that not only does the model perform very well on his dataset – it also can be applied to new data without losing its predictive power. He is very happy with the results and clicks “Next” to progress to the detailed model debriefing.

John decides to look at the model’s gain chart to understand how much value his model offers for classifying flight risk employees compared to picking employees at random (i.e. not using any model at all). So he selects “Model Graphs”…

The graph compares the effectiveness of John’s model (blue line) at identifying flight risk employees with picking employees at random (red line) as well as having perfect knowledge of who would be leaving (green line). Since the model’s gain (blue line) is very close to the perfect model (green line) John concludes that there is probably only very little that could be done to further improve the model since it is already very close to perfection (for more information on how to read gain charts see here). The analyst decides it’s worth looking at the individual model components to understand which variables drive employee turnover. He clicks on “Previous” and selects “Contribution by Variables” on the “Using the Model” screen.
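For readers who want to see what sits behind such a gain chart, the following Python sketch computes the points of a gain curve from model scores and known outcomes. The function name `cumulative_gains` and the ten sample employees are assumptions made purely for illustration. SAP InfiniteInsight produces its chart internally, and this is only the generic textbook construction.

```python
def cumulative_gains(scores, actuals):
    """Return a list of (share of population contacted, share of leavers found).

    scores  -- model scores, higher means more likely to leave
    actuals -- 1 if the employee actually left, 0 otherwise
    """
    # Rank employees from highest to lowest score
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total_leavers = sum(actuals)
    n = len(ranked)

    gains = []
    found = 0
    for position, (_, left) in enumerate(ranked, start=1):
        found += left
        gains.append((position / n, found / total_leavers))
    return gains


# Hypothetical sample: ten employees, two of whom actually left
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

gains = cumulative_gains(scores, actuals)
for population_share, leaver_share in gains[:3]:
    print(f"top {population_share:.0%} of employees -> {leaver_share:.0%} of leavers")
```

Picking employees at random would find leavers in proportion to the share of the population examined, which is the diagonal line in the chart. A perfect model would find every leaver before touching a single non-leaver. The closer the model's curve is to the perfect one, the more value the ranking adds.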

John looks at the chart and can see that the top three variables contributing to voluntary turnover are “JobLevelChangeType”, “Current Functional Area” and “Change in Performance Rating”. He decides to look at them in more detail by double-clicking on the bar representing each variable.

The most important variable is “JobLevelChangeType” which describes how an employee got into his or her current position: The higher the bar, the greater the likelihood to leave within the next 12 months. John sees directly that being an external hire or having been demoted contributes significantly to turnover. He isn’t surprised to see “demotion” as a strong driver since his company had only three years before begun using this approach to make the organization more permeable in both directions and this has seen some resistance by employees. Based on the data, it seems that having been demoted drastically reduced employee retention.

Also, external hires seem to leave the company rather than look for better opportunities within it, and John makes a note about this – he wants to discuss this with Amelia since he currently doesn’t see why external hires would behave this way.

Next, John looks at “Current Functional Area”:

John immediately sees his suspicions confirmed: Working in sales contributed significantly to employee turnover – and this by a wide margin! He continues to the third variable “Change in Performance Rating”:

The pattern John had observed in the first two variables continues – seeing one’s performance level decrease drove employees away while improving oneself helped the company retain employees. The company has introduced a stack ranking system where performance levels are always evaluated in relation to an employee’s peers to encourage growth and competition – especially in the sales department. However, as a consequence many employees see their performance decrease (12.8% of employees have experienced this during the period) while there may not necessarily be something wrong with an employee’s absolute performance: A previously high performing employee may see his or her performance rating decrease while delivering the same results simply because he/she is part of a high performing team where some of the other team members had a better year. The results of the model hint at an unintended side-effect of this system – instead of putting up with decreasing performance ratings and training harder, the company’s employees tend to quit their jobs and try their luck elsewhere. John finds this interesting and plans to discuss this with Amelia to understand whether these effects were welcome in her department.

John looks at the remaining 13 variables to understand the other drivers better. He observes a strong influence of tenure on turnover levels (especially among mid-level employees with tenure between 5 and 9 years) and of not having had a promotion within the last three years. There also seem to be differences across countries, regions and demographic variables such as age or gender. The patterns that John sees in the model paint the picture that the company does indeed have a problem keeping experienced employees, especially in the sales department – and the culprit seems to be the new stack ranking performance evaluation scheme John’s company had implemented three years ago in an attempt to foster a more competitive and performance oriented company culture. This is supported by the data from the countries – those few countries where the stack ranking system hadn’t been implemented yet have significantly lower turnover. The story that emerges is one of an experienced, well-performing employee who is confronted with the new performance evaluation scheme, sees his or her performance ratings drop with pressures on the rise and then decides to leave.

John assembles the information into a presentation for his HR top management to address the topic. After a follow-up discussion with Amelia, who confirmed his conclusions, he is convinced that the stack ranking system is not tuned to the volatile sales business and serves as a driver of turnover. In preparation for the meeting John decides to apply his model to current data to identify those employees from the sales department who are currently at risk of leaving.

The Prediction

John refreshes his dataset based on the most current data. Using the model’s confusion matrix John chooses a high sensitivity level to predict potential leavers. The confusion matrix compares the model's performance in classifying employees into leavers and non-leavers (=”predicted yes” / “predicted no”) against the actual, historical data (=”true yes” / “true no”). This way John can understand how well the model performs at classifying individual employees into leavers and non-leavers – every model makes mistakes but good models make fewer mistakes than bad models and the confusion matrix tells John which categories the model confuses with one another compared to the actual outcomes (hence the name “confusion matrix” – more info here).
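The four cells of a confusion matrix are easy to reproduce by hand. The sketch below, with a made-up function name `confusion_counts` and eight invented employees, counts them for a chosen score threshold and derives two common ratios from them. It is a generic illustration under those assumptions, not the calculation SAP InfiniteInsight performs internally.

```python
def confusion_counts(scores, actuals, threshold):
    """Count true/false positives and negatives at a given score threshold."""
    tp = fp = fn = tn = 0
    for score, left in zip(scores, actuals):
        predicted_yes = score >= threshold
        if predicted_yes and left:
            tp += 1   # predicted yes, true yes
        elif predicted_yes:
            fp += 1   # predicted yes, true no
        elif left:
            fn += 1   # predicted no, true yes
        else:
            tn += 1   # predicted no, true no
    return tp, fp, fn, tn


# Hypothetical sample: model scores and whether each employee actually left
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
actuals = [1, 1, 0, 1, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(scores, actuals, threshold=0.35)

sensitivity = tp / (tp + fn)  # share of actual leavers the model flags
precision = tp / (tp + fp)    # share of flagged employees who actually leave

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Lowering the threshold flags more employees, which tends to raise sensitivity at the cost of more false alarms. Raising it does the opposite.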

Using this model on the list of sales reps should give John a list of employees of which statistically 56.72% (the model’s sensitivity score) would actually leave the company within the next 12 months. John applies the model on his new dataset:

After applying the model, John looks at the resulting list: Out of 2,120 employees, his model has identified 473 employees at risk, of whom he knows about 57% will actually leave within the next year (although he doesn’t know who exactly will be leaving). Since some of these employees perform better than others and are therefore more important to retain, John filters the list of flight risk employees to only include experienced, well performing sales reps and ends up with a shortlist of 215 employees. From these employees’ sales data in Workforce Analytics he calculates that losing 57% of them could cost the company up to $60M in lost sales. Also, at estimated recruiting and training costs of $150,000+ for a new sales manager, this analysis could save the company up to 215 x 57% x $150,000 + $60M in lost sales = $78.3M.
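John's back-of-the-envelope calculation can be retraced in a few lines of Python. The variable names are chosen here for illustration, and the sketch assumes the unrounded 56.72% quoted earlier rather than the rounded 57%.

```python
# Figures quoted in the text above
shortlist = 215              # experienced, well performing sales reps flagged as at risk
leaver_share = 0.5672        # unrounded share quoted earlier (about 57%)
replacement_cost = 150_000   # estimated recruiting and training cost per sales manager, in $
lost_sales = 60_000_000      # estimated lost sales, in $

expected_leavers = shortlist * leaver_share
replacement_total = expected_leavers * replacement_cost
total = replacement_total + lost_sales

print(f"Expected leavers:  {expected_leavers:.0f}")
print(f"Replacement costs: ${replacement_total / 1e6:.1f}M")
print(f"Total at stake:    ${total / 1e6:.1f}M")
```

With 56.72% the total comes to roughly $78.3M, matching the figure above. Plugging in the rounded 57% instead gives about $78.4M.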

John discusses the list of 215 employees with Amelia and they decide to go to the HR Leadership Team meeting together to address the urgency of finding appropriate measures to retain these employees. Amelia and the HR Leadership Team are very impressed with John’s work and, faced with the huge impact of doing nothing, decide to free up some budget for appropriate retention measures while at the same time initiating a discussion whether to get rid of the stack ranking evaluation system to reverse the trend…

...and how are YOUR employees?

Employee retention is an important topic with a big impact on a company’s bottom line. Seeing how simple it is to use SAP InfiniteInsight maybe you’d like to try out a similar analysis yourself? A trial version of SAP InfiniteInsight is available here: