ENGAGE

Improving the Customer Experience Through Open Source Analytics

For years, best practices in customer support centers meant arming representatives with historical data about the person on the other end of the phone, so they could propose the right solutions or offers. Now customers are far more likely to interact in real time over the web or social channels like Twitter and Facebook. In fact, Gartner predicts that by 2020, customers will manage 85 percent of their relationship with the enterprise without interacting with a human.

With employees increasingly removed from direct customer engagement, enterprises need more effective ways to create a personalized, meaningful experience online. The solution lies in applying analytics to the web and mobile applications that are driving real-time interactions with customers.

While this may seem like a daunting task, most organizations already have a vast amount of information about customers and their behaviors at their fingertips. That information may reside in applications, emails, databases, logs, and even social networks. The trick is extracting the right data and acting on it. This means combining insights from the past with data about the present to predict customer behavior in the future.

Significantly, open source software has been at the center of innovation for the three forms of analytics that are instrumental in enabling meaningful online interactions. This has effectively democratized access to state-of-the-art analytics, empowering organizations of all sizes to use these technologies to optimize their online customer experiences. The rest of this article looks at the three categories of analytics (batch, real-time, and predictive), how each supports online interactions, and the open source software increasingly used to power them.

Batch Analytics: Making the Most of Historical Information

Most companies are already familiar with batch analytics, which provide a historical view of the customer and other relevant information. Enterprises typically collect high volumes of transactions over a period of time and then process them together in batches. Periodically, insights are extracted from the results and displayed in a dashboard or report.

A good example is when a hotel manager views a report or dashboard that shows guest complaints have increased 10 percent over the last quarter. Armed with this historical information, the manager may choose to drill down into details about changes in service, staffing, or amenities to understand the drivers behind this change and take corrective action.
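A minimal sketch of the batch computation behind such a report, assuming a hypothetical complaint log in which each record carries a `date` field: the whole batch is scanned once, complaints are counted per calendar quarter, and the quarter-over-quarter change is derived from those totals. The record layout and the sample counts are invented for illustration.

```python
from collections import Counter
from datetime import date


def complaints_by_quarter(records):
    """Count complaint records per (year, quarter) in a single batch pass."""
    counts = Counter()
    for rec in records:
        d = rec["date"]
        quarter = (d.month - 1) // 3 + 1
        counts[(d.year, quarter)] += 1
    return counts


def pct_change(previous, current):
    """Percentage change from previous to current; previous must be non-zero."""
    return (current - previous) * 100.0 / previous


# Hypothetical batch: 10 complaints logged in Q1 2016 and 11 in Q2 2016.
batch = [{"date": date(2016, 2, 1)}] * 10 + [{"date": date(2016, 5, 1)}] * 11
counts = complaints_by_quarter(batch)
change = pct_change(counts[(2016, 1)], counts[(2016, 2)])
print(f"Q2 vs. Q1 complaints: {change:+.1f}%")  # Q2 vs. Q1 complaints: +10.0%
```

In a real deployment the same aggregation would run on a schedule over the full warehouse, and the dashboard would simply read the stored totals.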

Another typical scenario is to analyze the customer base to understand patterns. For instance, a hotel or an airline may identify customers who return on a predictable schedule. By digging into the data on those customers, it can then gauge where it has done well and where it has fallen short of expectations.
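One simple way to find customers "who return on a predictable schedule" is to look at the gaps between consecutive stays and keep guests whose gaps barely vary. The sketch below assumes a hypothetical `(guest_id, check_in_date)` record format; the thresholds `min_visits` and `max_jitter_days` are arbitrary illustrative choices, not an established rule.

```python
from datetime import date
from statistics import pstdev


def regular_returners(stays, min_visits=3, max_jitter_days=7):
    """Return {guest_id: average days between stays} for guests with a steady rhythm.

    stays: iterable of (guest_id, check_in_date) pairs (hypothetical schema).
    A guest qualifies with at least min_visits stays whose inter-stay gaps have
    a population standard deviation of at most max_jitter_days.
    """
    by_guest = {}
    for guest, day in stays:
        by_guest.setdefault(guest, []).append(day)
    result = {}
    for guest, days in by_guest.items():
        if len(days) < min_visits:
            continue
        days.sort()
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        if pstdev(gaps) <= max_jitter_days:
            result[guest] = sum(gaps) / len(gaps)
    return result


stays = [
    ("A", date(2016, 1, 1)), ("A", date(2016, 4, 1)), ("A", date(2016, 7, 1)),
    ("B", date(2016, 1, 5)), ("B", date(2016, 2, 20)), ("B", date(2016, 11, 3)),
    ("C", date(2016, 3, 1)), ("C", date(2016, 6, 1)),
]
result = regular_returners(stays)
print(result)  # {'A': 91.0}
```

Guest B returns, but irregularly, and guest C has too few stays to call a pattern; only guest A, who checks in every 91 days, is flagged for closer study.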

For years, enterprises generally relied on proprietary databases and data warehouses to handle their batch data. In recent years, however, more companies have turned to Apache Hadoop, the open source software framework designed for distributed processing of very large data sets. Because it can handle both structured and unstructured data, it is well suited to the range of online data types that enterprises must capture and analyze. Many products on the market are based on Apache Hadoop, but they vary in scalability. Ideally, the solution an enterprise implements will support multithreading, so it can fully harness the processing power of four-core and eight-core servers when handling large data volumes.
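Hadoop's classic MapReduce model splits a batch job into a map phase, a shuffle that groups intermediate results by key, and a reduce phase. The sketch below imitates that flow in plain Python on a hypothetical complaint log (format `guest_id,category`). It does not use Hadoop's actual Java API; the thread pool merely stands in for the parallel map tasks a cluster would spread across cores and nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(record):
    """Emit (category, 1) for one log line in the hypothetical 'guest_id,category' format."""
    _, category = record.split(",", 1)
    return [(category.strip(), 1)]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def run_job(records, workers=4):
    # Map tasks run concurrently; Hadoop would instead split input files across nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = chain.from_iterable(pool.map(map_phase, records))
        grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


log = ["g1,noise", "g2,wifi", "g3,noise", "g4,housekeeping", "g5,wifi", "g6,noise"]
totals = run_job(log)
print(totals)  # {'noise': 3, 'wifi': 2, 'housekeeping': 1}
```

Because each map call depends only on its own record, the map phase parallelizes naturally; that independence is what lets a well-engineered Hadoop distribution keep every core busy.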

Streaming Analytics for Real-Time Decisions

Advertisers and marketers were pioneers in using the analysis of event streams to make decisions. As more companies employ real-time analytics, they are tasked with correlating vast amounts of data streaming from various sources, such as mobile devices or the Internet of Things. Increasingly, businesses are turning to complex event processing (CEP) technology to provide these correlations. CEP is a form of event processing that combines data from multiple sources to detect patterns signaling either opportunities or threats, so that significant events can be identified and acted on quickly.
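A typical CEP pattern is "event A followed by event B for the same customer within a time window." The sketch below is a minimal, single-process illustration of that idea, not the API of any particular CEP engine; the event sources, event kinds, and the 30-minute window are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    source: str  # originating system, e.g. "web" or "mobile" (hypothetical)
    kind: str    # event type
    key: str     # correlation key, such as a customer ID
    ts: float    # event time in seconds


class FollowedByWithin:
    """Match `first` followed by `second` for the same key within `window` seconds."""

    def __init__(self, first, second, window):
        self.first, self.second, self.window = first, second, window
        self.pending = defaultdict(deque)  # key -> times of unmatched `first` events

    def on_event(self, ev):
        q = self.pending[ev.key]
        # Discard partial matches that have aged out of the time window.
        while q and ev.ts - q[0] > self.window:
            q.popleft()
        if ev.kind == self.first:
            q.append(ev.ts)
        elif ev.kind == self.second and q:
            return (ev.key, q.popleft(), ev.ts)  # complex event detected
        return None


# Hypothetical opportunity pattern: a web cart abandoned, then a store visit
# reported by the mobile app within 30 minutes.
rule = FollowedByWithin("cart_abandoned", "store_entry", window=1800)
stream = [
    Event("web", "cart_abandoned", "c1", 0),
    Event("web", "cart_abandoned", "c2", 0),
    Event("mobile", "store_entry", "c1", 600),
    Event("mobile", "store_entry", "c2", 4000),
]
matches = []
for ev in stream:
    m = rule.on_event(ev)
    if m:
        matches.append(m)
print(matches)  # [('c1', 0, 600)]
```

Customer c1 walks into a store 10 minutes after abandoning a cart online, so the pattern fires; c2 arrives after the window has closed, so no offer is triggered.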

The key concept is to enable the system to understand a user’s situation (i.e., context) and act accordingly. Let’s say, for example, that Ann has lost her luggage while traveling from Chicago to New York. When she goes to the baggage carousel, her mobile app can provide her location to a CEP system. Then, based on data collected from the airline luggage tag and her phone, the system can alert Ann that her luggage will be late and ask her to talk to an agent. The agent, in turn, can file the problem ticket with one click, since the airline already has all the information.
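Ann's alert boils down to correlating two streams: where her phone says she is, and where her bag tag was last scanned. A minimal sketch of that rule follows, assuming hypothetical lookup tables and a hypothetical `"baggage_claim"` zone label reported by the app; a production CEP platform would express the same correlation in its own rule language.

```python
from dataclasses import dataclass


@dataclass
class PhoneLocation:
    passenger_id: str
    airport: str  # airport code reported by the mobile app
    zone: str     # hypothetical zone label, e.g. "baggage_claim"


def check_bags(location, bags_by_passenger, last_scan_airport):
    """Correlate a phone location with bag-tag scans; return an alert or None.

    bags_by_passenger: passenger ID -> list of bag tag IDs
    last_scan_airport: bag tag ID -> airport code of the tag's most recent scan
    """
    if location.zone != "baggage_claim":
        return None
    for tag in bags_by_passenger.get(location.passenger_id, []):
        scanned_at = last_scan_airport.get(tag)
        if scanned_at is not None and scanned_at != location.airport:
            return (f"Your bag {tag} was last scanned at {scanned_at} and will be late. "
                    "Please see an agent at the baggage service office.")
    return None


# Ann's phone reports she is at the JFK carousel; her bag's last scan was at ORD.
ann = PhoneLocation("ann", "JFK", "baggage_claim")
alert = check_bags(ann, {"ann": ["TAG123"]}, {"TAG123": "ORD"})
print(alert)
```

Because the mismatch is detected before Ann starts waiting, the same data that produced the alert can prefill the agent's problem ticket.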

Later, when Ann calls back for an update, the airline can identify her simply by her phone number. A CEP system correlates the number with her account, checks for any open problem tickets, and responds directly with the status of her luggage: “Your luggage is now at JFK and will be delivered tomorrow at 7 a.m. If you have any other questions, please hold.”
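The call-back flow is a chain of lookups: caller ID to account, account to open ticket, ticket to a spoken status. The sketch below assumes hypothetical in-memory tables (a real system would query a CRM and a ticketing service), uses a reserved fictional phone number, and formats the delivery time as "07:00" only to keep the sentence punctuation simple.

```python
def status_for_caller(caller_number, account_by_phone, open_ticket_by_account):
    """Correlate caller ID -> account -> open ticket, and build a spoken status."""
    account = account_by_phone.get(caller_number)
    if account is None:
        return None  # unknown caller: fall back to the standard phone menu
    ticket = open_ticket_by_account.get(account)
    if ticket is None:
        return None  # no open problem ticket to report on
    return (f"Your luggage is now at {ticket['location']} and will be delivered "
            f"{ticket['delivery']}. If you have any other questions, please hold.")


accounts = {"+1-312-555-0142": "ann"}
tickets = {"ann": {"location": "JFK", "delivery": "tomorrow at 07:00"}}
message = status_for_caller("+1-312-555-0142", accounts, tickets)
print(message)
```

Returning `None` at each missing link lets the phone system fall back gracefully to its normal menu instead of guessing.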

The idea is that the system understands customers’ situations and provides prompts without requiring them to explain. More generally, paying attention to the context around customers often yields enough information to improve their experience. For instance, a customer who has an itinerary and calls a few hours before the flight might need to make a change. Alternatively, if the flight has already left, the customer may have missed it and need help getting onto a later one.
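The itinerary example can be reduced to a small rule that compares the call time with the scheduled departure. This is a minimal sketch, assuming the six-hour `change_window` and the intent labels are hypothetical; a real system would also check signals such as whether the customer actually boarded.

```python
from datetime import datetime, timedelta


def likely_intent(call_time, departure, change_window=timedelta(hours=6)):
    """Guess why a customer with an itinerary is calling, from timing alone."""
    if call_time >= departure:
        return "rebook_missed_flight"  # flight already left: help onto a later one
    if departure - call_time <= change_window:
        return "change_itinerary"      # calling shortly before departure
    return "general_inquiry"


dep = datetime(2016, 6, 1, 18, 0)
print(likely_intent(datetime(2016, 6, 1, 15, 0), dep))   # change_itinerary
print(likely_intent(datetime(2016, 6, 1, 19, 0), dep))   # rebook_missed_flight
print(likely_intent(datetime(2016, 5, 30, 9, 0), dep))   # general_inquiry
```

The inferred intent can route the caller straight to the right menu or agent skill group, which is the "prompt without explanation" the paragraph describes.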

To be useful, this context has to be provided in near real time: the system connects diverse information, derives the context, and uses it to trigger the best action for that individual, whether that is correcting an error, providing an offer, or issuing a fraud alert when certain anomalies are detected.
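As one example of spotting anomalies in real time, the sketch below keeps a running mean and standard deviation with Welford's online algorithm and flags values more than three standard deviations from the mean. This simple z-score test is a swapped-in illustration, not a description of any specific fraud product; the threshold, the warm-up length, and the transaction amounts are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Flag values far from the running mean, using Welford's online algorithm."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # how many standard deviations count as anomalous
        self.warmup = warmup        # observations to collect before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the history so far, then fold x into the statistics."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) / std > self.threshold
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


detector = StreamingAnomalyDetector()
amounts = [40, 55, 38, 60, 45, 52, 47, 50, 43, 58, 2500, 49]
flags = [detector.update(a) for a in amounts]
print(flags)  # ten False values, then True for 2500, then False for 49
```

Each update is constant time and constant memory, which suits a stream. One design caveat: this version folds flagged values back into the statistics, so a production detector might exclude them to keep a single outlier from masking the next one.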

A number of open source technologies have emerged to address the demands of streaming and real-time analytics. Apache Storm is perhaps the most widely used streaming analytics engine. Meanwhile, Apache Spark and Apache Flink each offer a single programming model for both batch and streaming analytics, as does the cloud-based Google Cloud Dataflow. Additionally, a range of CEP platforms can evaluate complex temporal queries.

While many CEP platforms support time series capture of streaming events, some enterprises are turning to time series databases (TSDBs), which require a timestamp on every data point and can write data within milliseconds. Examples of open source TSDBs include OpenTSDB, InfluxDB, and KairosDB; they are typically used in conjunction with SQL or NoSQL databases.
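The core contract of a TSDB is that every point carries a timestamp and queries are expressed as time ranges. The toy in-memory store below illustrates only that contract; it is a hypothetical sketch and does not reflect the APIs or storage engines of OpenTSDB, InfluxDB, or KairosDB.

```python
import bisect
from collections import defaultdict


class TinyTimeSeriesStore:
    """Toy store: each metric keeps points sorted by timestamp for range queries."""

    def __init__(self):
        self._ts = defaultdict(list)    # metric -> sorted timestamps
        self._vals = defaultdict(list)  # metric -> values aligned with _ts

    def write(self, metric, ts, value):
        """Insert a timestamped point, keeping the series in time order."""
        i = bisect.bisect_right(self._ts[metric], ts)
        self._ts[metric].insert(i, ts)
        self._vals[metric].insert(i, value)

    def query(self, metric, start, end):
        """Return (ts, value) pairs with start <= ts < end."""
        ts = self._ts[metric]
        lo = bisect.bisect_left(ts, start)
        hi = bisect.bisect_left(ts, end)
        return list(zip(ts[lo:hi], self._vals[metric][lo:hi]))


store = TinyTimeSeriesStore()
store.write("bag_scans", 100, 1)
store.write("bag_scans", 300, 3)
store.write("bag_scans", 200, 2)  # arrives out of order, stored in time order
print(store.query("bag_scans", 150, 301))  # [(200, 2), (300, 3)]
```

Keeping each series sorted by time is what makes range scans cheap; real TSDBs add compression, sharding, and retention policies on top of the same idea.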

Predictive Analytics for Anticipating Future Behaviors

Predictive analytics complement both batch and real-time analytics by going beyond the obvious connections to analyze information and uncover deeper, non-trivial associations among the data. These might suggest a high likelihood of interest in a product, indicate the probability of customer churn, detect an anomaly, or suggest the best resolution to a problem for a given consumer.
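To make "probability of customer churn" concrete, the sketch below trains a logistic-regression model with plain batch gradient descent. Logistic regression is a swapped-in, deliberately simple technique, not the method of any product named in this article; the two features and the eight labeled customers are invented toy data, and a real model would need far more data plus held-out evaluation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Estimated probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # derivative of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical features: [support calls in last 90 days, months since last purchase]
X = [[0, 1], [1, 0], [0, 2], [1, 1], [5, 8], [6, 7], [4, 9], [7, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = customer churned
w, b = train_logistic(X, y)
print(round(predict(w, b, [6, 8]), 2), round(predict(w, b, [0, 1]), 2))
```

The output is a probability rather than a yes/no rule, so a business can decide, for example, to send a retention offer only above a chosen risk level.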

Predictive analytics is already pervasive on the web: nearly everyone has experienced it in the form of those ubiquitous pop-up ads that draw on insights from Google’s search engine to target consumers based on their recent online searches and purchases.

Increasingly, machine-learning algorithms are used to support the automation of predictive analytics. They can handle extremely large volumes of data, and they can automatically learn from the data, unlike rules-only systems that require professionals to watch rules and evaluate their performance. As more customers interact with organizations online, this kind of automation is becoming increasingly popular.

Despite growing adoption, machine learning for predictive analytics has both pluses and minuses. On the one hand, it reduces the cost of predictive analytics and can help keep both customers and organizations happy. On the other hand, no algorithm is perfect, and occasionally one will run into an edge condition; think of the automated phone system that repeatedly misunderstands what you say. The good news is that predictive analytics can also be used to monitor interactions, provide oversight, and escalate responses when customers seem unhappy. In other words, a second system can watch how an automated system interacts with customers and escalate as needed.
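A minimal sketch of such an oversight layer for an automated phone system: it escalates to a human after repeated misunderstandings or as soon as the customer appears frustrated. The per-turn recognition confidence and frustration signal are assumed outputs of hypothetical upstream speech and sentiment models, and both thresholds are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class EscalationMonitor:
    """Watch an automated conversation and decide when to hand off to a person."""

    min_confidence: float = 0.6      # below this, treat the turn as misunderstood
    max_misunderstandings: int = 2   # consecutive misunderstood turns tolerated
    misunderstood_in_a_row: int = 0

    def observe_turn(self, confidence, frustrated):
        """Return True if this turn should be escalated to a human agent."""
        if confidence < self.min_confidence:
            self.misunderstood_in_a_row += 1
        else:
            self.misunderstood_in_a_row = 0  # reset after a turn the system understood
        return frustrated or self.misunderstood_in_a_row >= self.max_misunderstandings


monitor = EscalationMonitor()
turns = [(0.92, False), (0.41, False), (0.35, False)]
decisions = [monitor.observe_turn(c, f) for c, f in turns]
print(decisions)  # [False, False, True]
```

Counting consecutive failures, rather than total failures, keeps one early misrecognition from pushing an otherwise smooth call to an agent.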

Several open source machine-learning frameworks have emerged in recent years, including Apache Spark MLlib and Dato GraphLab Create. Of these, Spark MLlib, the highly scalable machine-learning library for Apache Spark, has the largest community and continues to see rapid adoption. Meanwhile, Dato GraphLab Create, which is written in C++, provides fast performance and scalability. Two other machine-learning frameworks, Facebook Torch and Google TensorFlow, come from a deep-learning heritage, and at least today they seem better suited to web applications.

The market is flooded with products. However, those based on open source standards often provide new architectural approaches and flexibility, backed by community support. Enterprises also need to choose between pure-play and integrated solutions. Pure-play products may offer deeper feature sets that matter to certain users, but analytics do not work in isolation. Enterprises with more mainstream needs may therefore benefit from an integrated solution that speeds time to market and lets them focus on the customer interactions critical to their success.

With a majority of customer interactions taking place online, it is no longer enough to rely on past information to guess what a customer will do. It is becoming critical to augment historical data with real-time and predictive analytics. In doing so, enterprises can arm themselves and their consumer-facing applications with the right business intelligence to provide customers the meaningful experiences they expect.