FoneDoktor, A WibiData Application

This guest blog post is from Alex Loddengaard, creator of FoneDoktor, an Android app that monitors phone usage and recommends performance and battery life improvements. FoneDoktor uses WibiData, a data platform built on Apache HBase from Cloudera's Distribution including Apache Hadoop, to store and analyze Android usage data. In this post, Alex describes FoneDoktor's implementation and explains why WibiData was a good data solution. A version of this post originally appeared on the WibiData blog.

At last months Hadoop World, one of the sessions spotlighted FoneDoktor, an Android app that collects data about device performance and app resource usage to offer personalized battery and performance improvement recommendations directly to users. In this post, Ill talk about how I used WibiData  a system built on Apache HBase from CDH  as FoneDoktors primary data storage, access, and analysis system.

WibiData is an integrated system for managing, analyzing, and serving complex user data in support of investigative and operational analytic workloads. It leverages HBase to combine batch analysis and real-time access within the same system, and integrates with existing BI, reporting, and analysis tools. Having used Hadoop for over four years now, I was insanely impressed with the simplicity that WibiData brings to apps that need to store, access, and analyze massive amounts of user data. Read on for how I used it to build FoneDoktor.

What is FoneDoktor?

FoneDoktor is an Android app that monitors phone usage and recommends changes that improve phone performance and battery life. Usage information such as average screen brightness, average signal strength, Wi-Fi connectivity, power cycles, and more is collected throughout the day and sent to a WibiData cluster. Data is only sent when the phone is connected to a power source, to avoid using battery to send data upstream.

Once FoneDoktor has been running for a few weeks on a phone, it starts analyzing the usage data and makes recommendations in the form of Android push notifications. A notification might suggest turning on auto screen brightness, or switching to Wi-Fi (when it's available) if your cellular signal is low. FoneDoktor has several more notification types, too.

For example, a single record might say that the screen was at full brightness (255) for 477 seconds while the phone was connected to power, with Wi-Fi and 3G on and a signal strength of 7. Each record carries a unique device ID, which is used as the WibiData key.

On any given day, FoneDoktor will collect about 100 records from each phone. These records are stored in WibiData columns, and each column holds a specific type of record. For example, one column exists for Wi-Fi-specific records, another for screen brightness records, and so on. Since WibiData timestamps each record as it’s stored, every record is accessible by its key (device ID), column (record type), and timestamp (when the record was created). WibiData also makes it easy to scan both rows and values (by timestamp) in a particular column.
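The addressing scheme described above (key, column, timestamp) can be pictured with a small in-memory model. This is only an illustrative sketch and not WibiData's API: the class name `TimestampedTable`, its methods, and the record fields are all hypothetical.

```python
class TimestampedTable:
    """Toy stand-in for a table addressed by (row key, column, timestamp).

    Hypothetical illustration only; this is not the WibiData API.
    """

    def __init__(self):
        # row key (device ID) -> column (record type) -> list of (timestamp, value)
        self._rows = {}

    def put(self, device_id, column, timestamp, value):
        cells = self._rows.setdefault(device_id, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0])  # keep cells ordered by timestamp

    def get(self, device_id, column, timestamp):
        """Return the value stored at exactly this timestamp, or None."""
        for ts, value in self._rows.get(device_id, {}).get(column, []):
            if ts == timestamp:
                return value
        return None

    def scan_values(self, device_id, column, start_ts, end_ts):
        """Return (timestamp, value) pairs with start_ts <= timestamp < end_ts."""
        cells = self._rows.get(device_id, {}).get(column, [])
        return [(ts, v) for ts, v in cells if start_ts <= ts < end_ts]

    def scan_rows(self, column):
        """Yield (device_id, cells) for every row that has data in this column."""
        for device_id, columns in self._rows.items():
            if column in columns:
                yield device_id, list(columns[column])
```

Reading one record is then a lookup such as `table.get("device-a", "screen_brightness", 100)`, and a time-range scan over one phone's brightness records is `table.scan_values("device-a", "screen_brightness", 0, 150)`.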

Data Storage and Access with WibiData

The write path (outlined in an architecture diagram in the conclusion section below) for a record starts at the phone. The record is cached on the phone if it's not connected to power. Then, once the phone is connected to power, the record is sent upstream as JSON to a web server implemented in Python/Django. The web server creates an Avro record and sends a Thrift RPC to the WibiData data access server, which writes the record into WibiData.
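The phone-side half of this write path (cache while on battery, send once on power) can be sketched roughly as follows. The real client is an Android app, so this Python version is only a model of the logic: `RecordBuffer`, the `send` callback, and the choice to upload cached records as a single JSON array are assumptions, not details from FoneDoktor.

```python
import json


class RecordBuffer:
    """Caches usage records and uploads them only while on external power.

    Hypothetical sketch; `send` is any callable that accepts a JSON string,
    for example a function that POSTs it to the web server.
    """

    def __init__(self, send):
        self._send = send
        self._pending = []

    def record(self, record, on_power):
        """Cache a record, then flush everything if the phone is on power."""
        self._pending.append(record)
        if on_power:
            self.flush()

    def flush(self):
        """Send all cached records as one JSON array (an assumed batch format)."""
        if not self._pending:
            return
        payload = json.dumps(self._pending)
        self._send(payload)   # if this raises, the records stay cached
        self._pending = []

    def pending_count(self):
        return len(self._pending)
```

Clearing the cache only after `send` returns means a failed upload leaves the records in place for the next time the phone is plugged in.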

FoneDoktors read path is as straightforward as its write path. The phone periodically queries the web server to see if any new notifications or summary data is available. The web server fires off a Thrift RPC to the WibiData data access server, which queries WibiData and returns an Avro record and is serialized into JSON in Python/Django before being sent to FoneDoktor.

WibiData is implemented on top of HBase, which means clients don't need to worry about caches or indexes when reading and writing data. WibiData scales out of the box.

User Analysis in WibiData

Without WibiData, MapReduce would power FoneDoktor's data analysis over a real-time storage system such as HBase. With WibiData, the analysis APIs are very obvious and far simpler than MapReduce. FoneDoktor has two different types of analysis. First, some analysis only looks at a single phone, for example to do battery calculations or to summarize usage. This type of analysis is done in WibiData with producers. All other forms of analysis are done on the entire data set, looking at how all phones are used and creating correlations between usage and performance. This type of analysis is done by gatherers.

Producers

The producer API is dead simple. You specify which tables and columns your data comes from, where output will be written to in WibiData, and a method for processing a single row at a time. WibiData handles buffering, reading from HBase, writing back to HBase, and everything else. No MapReduce. No input/output complexity.

In FoneDoktors case, a producer might look at the set of screen brightness records for a given phone and output an aggregate screen brightness average, which can be used for further analysis later.

Gatherers

The gatherer API is slightly more complex than the producer API, but it's still far simpler than a traditional MapReduce job. Just like the producer API, you specify which data you want to read in, and where output should go. You then write a method for processing individual rows, where the output data of this method is used as input data in a reducer. The reducer is not a traditional MapReduce reducer, but it works very similarly to one. It takes aggregated keys and their respective lists of values and outputs data to a WibiData cell. Again, there's no need for ETL and complex input/output strategies.

In FoneDoktors case, a gatherer might look at which devices perform worse than others, and dig into the usage data to learn why. For example, it may find that two users with the same device have drastically different battery life. It will then look at the usage differences and make suggestions to each respective user for improving their battery life.

Conclusions – Why WibiData

Almost every use of Hadoop requires a sibling real-time data storage and access solution (OLTP, online transaction processing) for serving a website, OLAP dashboard, mobile app, or any other real-time piece of software. In practice, Hadoop works alongside data solutions such as MySQL, Vertica, Oracle, Teradata, HBase, and lots of other alternatives to power these real-time applications. Hadoop does batch, background processing, and these other systems serve and store the real-time data. Without WibiData, FoneDoktor's architecture would look something like this:

Architecture Without WibiData

Hadoop, during the four years I've worked with it, has started to play much more nicely with sibling real-time data storage solutions, especially with Cloudera's partnerships and connectors. With WibiData, the process of using Hadoop alongside a real-time solution is easier than ever because it provides one solution for serving, storing, and analyzing data. Analysis, just like in Hadoop, is done in batch, in the background. Serving and storage, as in HBase, are done in real time, at massive scale.

The difference is that WibiData comes with great abstractions that make it much simpler to use than the alternative. Furthermore, WibiData comes with lots of built-in libraries that do most of the needed analysis work for you in complex machine learning and data mining tasks. With WibiData, FoneDoktor's architecture looks like this:

Architecture With WibiData

Building FoneDoktor was lots of fun: it was my first Android app and my first usage of WibiData. I'm very impressed with WibiData for the reasons I've stated here. Android was great, too.

Use WibiData if you're tired of moving user data from system to system, to either analyze it or store and access it. WibiData just works, it scales, and most importantly, it's simple in an otherwise complex technology stack.