As an entrepreneur, product designer and product manager for people management and predictive analytics software, I have seen a number of problems related to creating useful products, and getting things done. I decided to keep track of some common scenarios. All views are mine. Not my employers'.

Sunday, November 30, 2014

This year, I have a lot to be thankful for. We celebrated family dinner with shallots sambar, plantain poriyal and mango yogurt pachadi. Dishes were served with basmati rice. All three dishes are quintessentially South Indian. I cooked all the dishes and made sure no corners were cut. It turned out to be a nice dinner.

Image courtesy blendwithspices.com

I believe that successful cooking has a lot in common with successful project execution. I have been cooking for over 25 years now and have learned that when I follow certain steps, the dishes turn out well. These are the steps.

1. Always follow the recipe of an accomplished chef or cook diligently. Don't improvise on that recipe. Just because you are a good product manager does not mean you know anything about cooking.

2. Plan your dishes when you shop. Not after you enter the kitchen.

3. Do not compromise on the ingredients. Grow them in your back yard if you have to. I do this. I grow curry leaves and lemons in my back yard for cooking and pickling.

4. Don't experiment when you are cooking for others.

5. Focus on a few dishes you are good at and perfect them. It is better to be very good at a few things than to be mediocre at many things.

If you are interested in South Indian cooking you must have the cook book by S. Meenakshi Ammal. Her book is considered the bible of South Indian cooking. I also like the cook book by Chandra Padmanabhan. The dishes I cooked today were all based on her recipes. I have owned both books for several years.

You might be wondering if there is a restaurant in your area where you can sample these dishes. Try the Saravana Bhavan chain of restaurants. They have restaurants in many countries around the world. They may not be the most authentic. But they are close enough.

I am currently doing a course on getting and cleaning data. The R programming language is the tool I use to clean data. I decided to look at the SAP SuccessFactors Add-On downloads data just for fun. These are the steps involved. You could do the same with Excel, but R lets you handle large volumes of data and perform actions that are not easy in Excel. This is a simple exercise. But it demonstrates the logical process any data scientist or analyst would follow.

The Question for which I need an answer
The most important thing in data analysis is the question you want answered. In my case, I want to see which countries have the most interest in the SAP SuccessFactors Add-On. I suspect the US and Germany will be at the top. But I have no idea which other countries show the most interest. Let us find out.

These are the steps and the R code to perform them.

Step 1: Read the data file, which is available to me as a CSV file.

downloads <- read.csv("downloads.csv")
This reads the raw data I got from the database into a dataframe called downloads.

Step 2: Select just the data I need. The new dataframe downloads will have multiple rows with the same country names. I want to find out how many times each country name is listed. I can use the table function to find that out.

countrycount <- as.data.frame(table(downloads$Country.Name))

The above code creates a dataframe with the names of the countries in one column and the number of times they occur in the original table in a second column named Freq.

Step 3: I then want to sort the data with the country with the most downloads on the top. This is the R code to do that. Using order instead of sort keeps each count paired with its country name.

countrycount <- countrycount[order(countrycount$Freq, decreasing = TRUE), ]
Step 4: The last step is to write the data to a CSV file so that I can share the data with other people and systems. This is the code for that.

write.csv(countrycount, "countrycount.csv")

Now let us look at the data.
Customers from the US downloaded the Add-On the most. No surprise there. We have thousands of SAP ERP HCM customers in the US. The second is Germany. No surprise there either. Third is Australia. Fourth is Saudi Arabia. That is good to know. There is a lot of interest in Australia and Saudi Arabia. There is a lot of interest from many European customers for the Talent Hybrid model. This information gives me and my product management colleagues enough insight to make some data-driven decisions. Of course I have a lot more data than this and can find answers to many more such questions.

The next step could be to visualize this data to convey the information quickly. Many tools including R can do that. I will get to it in the future.
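As a minimal sketch of that visualization step, using made-up sample data in place of the real downloads table (the country names and counts here are illustrative only), base R's barplot can already convey the ranking:

```r
# Made-up download records standing in for downloads$Country.Name
countries <- c("US", "US", "Germany", "US", "Australia",
               "Germany", "Saudi Arabia", "US")

# Count the downloads per country, as in the earlier steps
countrycount <- as.data.frame(table(countries))

# Sort with the most-downloaded country on top
countrycount <- countrycount[order(countrycount$Freq, decreasing = TRUE), ]

# A quick bar chart of downloads per country
barplot(countrycount$Freq,
        names.arg = as.character(countrycount$countries),
        ylab = "Downloads")
```

The same counting and sorting code works unchanged when the vector comes from a real downloads file instead of the sample above.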

Monday, November 17, 2014

The book talks about data-driven ways of looking at everyday decisions. For example, kicking a penalty shot straight at the goalkeeper has a higher percentage of success compared to kicking towards the corners of the goal post.

They also encourage you to think like a child and try to think small while solving big problems instead of thinking big. Here is a video of the authors talking about their book.

Friday, November 14, 2014

I made an update to the current and planned integrations between SAP and SuccessFactors Talent Solutions. This is for the Talent Hybrid Deployment model. The overview presentation is now available in internal and partner Jam groups.

Sunday, November 09, 2014

A few days back David Ludlow, the head of cloud solutions marketing at SAP, and I were looking at the content for a particular product area in our Jam group. I noticed David looking at the number of times a solution marketing asset has been viewed rather than focusing on the number of assets produced for a topic. He is right. It is better to have a few assets that are very valuable than numerous assets that are not used much. A collaboration tool such as SAP Jam can help you with that. SAP Jam has changed the way we measure value in the team. Before the days of SAP Jam we used to measure the value we create based on the number of assets produced and word-of-mouth feedback. We now measure value based on evidence provided by Jam.

Saturday, November 08, 2014

Let us say you use the services of a cloud provider who gives you data in XML format via a web site and you want to periodically look at that data and extract some information to make your decisions. The file is big. So loading it in Excel and manipulating the data is cumbersome and error-prone. You can spend a lot of money to build special software for it, or you can write some simple R code to extract the data yourself. This is how you can do it.

Let us assume that the file I am working with is the master data file of 100,000 employees. At any given point in time I want to find out how many employees live in a certain zip code.

These steps use the XML package.

Step 1 is to load the web address of the XML file in a vector.

fileURL <- "http://www.website.com/filename.xml"

Step 2 is to load all the content of the XML file in another vector.

documentcontent <- xmlTreeParse(fileURL, useInternalNodes = TRUE)

Step 3 is to parse the root node of the XML content and store it in another vector.

rootNode <- xmlRoot(documentcontent)

Step 4 is to extract all zip codes into a vector.

allzipcodes <- xpathSApply(rootNode, "//zipcode", xmlValue)

Step 5 is to count the number of people who have the zip code "90210".

sum(allzipcodes == "90210")

In 5 simple steps you have performed meaningful data extraction from XML data, which normally requires very sophisticated and costly tools. To perform data extraction like this, you will need some basic understanding of XML and some logical thinking. If you are a cloud professional services consultant or an SAP ERP HCM functional consultant, I believe you can perform basic data extraction like the one I described above using R, with a little bit of effort.
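Here is the same pipeline as one self-contained sketch, parsing a small inline XML document instead of a real employee file. The structure and values are made up for illustration, and it assumes the XML package is installed.

```r
library(XML)

# A tiny inline XML document standing in for the employee master data file
xmltext <- '<employees>
  <employee><name>A</name><zipcode>90210</zipcode></employee>
  <employee><name>B</name><zipcode>94043</zipcode></employee>
  <employee><name>C</name><zipcode>90210</zipcode></employee>
</employees>'

# Parse the XML text and take the root node
documentcontent <- xmlTreeParse(xmltext, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(documentcontent)

# Extract all zip codes with an XPath expression
allzipcodes <- xpathSApply(rootNode, "//zipcode", xmlValue)

# Count the employees in zip code "90210"
sum(allzipcodes == "90210")  # 2
```

Swapping the inline text for a real file URL, as in the steps above, is the only change needed for live data.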

Friday, November 07, 2014

Most data sharing in organizations is done using Microsoft Excel. So the code to import Excel data into R for manipulation is a good thing to know. The read.xlsx function from the xlsx package is powerful enough to let you read a specific section of an Excel file and load it into a dataframe.

Here is the sample code.

library(xlsx)
mydata <- read.xlsx("file.xlsx", sheetIndex = 1, colIndex = 7:15, rowIndex = 18:23)
The sample code reads an Excel file and imports data from columns 7 to 15 and rows 18 to 23 into a dataframe. An R dataframe is a table of data.

Thursday, November 06, 2014

The most basic steps in getting and cleaning data are as follows. I am using a data file that has US housing data. I want to analyze that data the same way a web site such as Zillow might.

1. First, you have to fetch the data into R. The code for that might look like this. Here you are reading a csv file from your working directory and loading it into a dataframe called housing.

housing <- read.csv("us-housing-data.csv", stringsAsFactors = FALSE)

The sample code above shows how to fetch data from a CSV file in a local directory. Similarly there are functions to fetch data from an XML file, an Excel file, a JSON file, and an HTML web page. Once you understand the fundamentals of fetching data, it is only a matter of knowing the right function.

2. Second, you may want to remove some rows where data is missing for a particular column. So you create another dataframe which only has the rows where column 37 has some valid data.

cleanhousingdata <- housing[complete.cases(housing[,37]),]

3. In the third step you may want to filter that column for a certain condition. In this case, I am looking for homes that are valued at more than 1 million USD. VAL is the name of the column.

costlyhouses <- subset(cleanhousingdata, VAL > 1000000)

Once you do these basic steps, you can start looking for answers to your questions in the data. Coming up with questions is another interesting area.
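The three steps can be run end to end on a hypothetical stand-in for the housing file. The column name VAL comes from the post; everything else here is made-up sample data. Writing a small CSV first keeps the sketch self-contained.

```r
# Made-up sample standing in for us-housing-data.csv, written out first
# so the read step has a real file to fetch
sample <- data.frame(STATE = c("CA", "NY", "CA", "TX"),
                     VAL   = c(2500000, NA, 1200000, 800000))
write.csv(sample, "sample-housing.csv", row.names = FALSE)

# Step 1: fetch the data into a dataframe
housing <- read.csv("sample-housing.csv", stringsAsFactors = FALSE)

# Step 2: keep only rows where the VAL column has valid data
cleanhousingdata <- housing[complete.cases(housing$VAL), ]

# Step 3: filter for homes valued at more than 1 million USD
costlyhouses <- subset(cleanhousingdata, VAL > 1000000)

nrow(costlyhouses)  # 2
```

With the real file, only the file name and the column reference change; the cleaning and filtering steps stay the same.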

Wednesday, November 05, 2014

One of the important steps in getting insights from data is getting the data itself and cleaning the raw data to turn it into processed data. Data resides in public websites, APIs, databases, local files and handwritten documents. Fetching this data requires tools and techniques that are not commonly talked about by enterprise data analysis product teams. Enterprise analytics product experts talk about data analysis, which is just one step in the data science process. There are some tools that exclusively focus on cleaning data. But a vast majority of them focus just on visualization.

I am currently taking a four week course on "Getting and Cleaning Data" taught by professors at the Bloomberg School of Public Health at Johns Hopkins University, delivered via Coursera. This is the third course in the data science track. The tool I use to get and clean data is 'R'. You might wonder why we need to use a programming tool such as 'R' when you can use Microsoft Excel. The main difference is the flexibility and the power of 'R' to handle large volumes of data, clean the data and manipulate the data in an efficient and repeatable manner. There is a good post explaining the differences between Excel and 'R'.

Going through these courses and completing the associated exercises and quizzes has given me a conceptual understanding of the tools and practical experience in getting, cleaning and manipulating data. It is as much craft as it is science. I am still working with sample data provided by the professors. But I plan to start fetching my own data in the coming months. For example, you can analyze housing market data that is available from public sources to find good real estate investment opportunities and get insight that may not be readily available to others.

My colleagues in product management have created a customer community for APIs and Integration. This is apart from the current internal Jam groups and partner Jam groups, which are not open to customers. Registration is required to access the community.

Please note that help.sap.com/cloud4hr is the best place for official help and implementation documentation.

Saturday, November 01, 2014

Recently, I went to the Books Inc book store in downtown Mountain View and picked up a book named "Show Your Work" by Austin Kleon. He gives ten ideas for those who want to share their work and get discovered. For example, he suggests that you share something small every day. His ideas will work both for sharing with the outside world and within your work environment.

He also has another book called "Steal Like An Artist", which is also a wonderful book.