Urbanization, Transit & Data: Can SAP #HANA help?

On October 31st, 2011, the world’s population passed the 7 billion mark. Earlier, in 2010, approximately half of the world’s population lived in urban areas for the first time ever. By 2030, an estimated 60% of humanity will be living in urban areas, a share expected to rise further to 70% by mid-century!

This creates unique challenges (and opportunities).

Globally, look at smart ‘cities’ (replace ‘cities’ with ‘transit’, ‘energy’, ‘travel’, or practically anything else that you want to make better), and you will see that a lot of innovation is in fact happening around the Internet of Things, big data, analytics, and a host of intelligent solutions that address current and future challenges with a smart, data-driven approach.

From a transit and travel perspective, which is at the core of how an urban population functions (a close analogy being the circulatory system of the human body), one major challenge is the congestion and traffic that are part of the urbanite’s daily commute. As congestion increases, so does the amount of wasted time and energy that could have been spent on productive work.

Such is the challenge that a mega city in China posed to SAP: help analyze historic, real-time, and predictive traffic trends to show what to expect in the city. For this challenge, we used spatial (location-related) data updated every 15-30 seconds from more than 8,000 taxis, together with run data (data on the actual fare transactions as they happen).

With the SAP HANA platform, which can apply advanced computing logic to data that resides in memory, the challenge was met, and the solution is still being improved through rapid iteration.

In fact, there were multiple use cases for interpreting such ‘big data’ and using it in an operational context to reduce congestion, from the perspective of the city and the transport operators, and ultimately to provide a better service to the traveler.

A number of unique insights into the city’s real congestion and traffic patterns were inferred, which supported the city’s decision making in serving its citizens.

Three use cases:

The zone:

Traditionally, zones in a city, whether for fare management in public transit or for other municipal purposes, have relied on census data in its various forms.

Here, with data analyzed on the fly, SAP HANA was able to ‘zone’ the city in a unique, data-driven way. Zoning based on the origin-destination data of the constantly updated ‘feed’ has provided unique insights into how a modern city’s travel patterns could lead to a new approach to zoning. We term this the Traffic Analysis Zone (TAZ). On top of the basic building block of the TAZ, we can add other data sets such as commuting and travel behavior, fare prices, road conditions and accessibility, etc. to further classify zones by different criteria.

Below is a comparison of traditional trip-based zoning, i.e. the total aggregate number of origins/destinations (left screen capture), with two forms of algorithmic, ‘pattern’-based zoning. The middle one is a clustering based on the distances of the trips that travelers take from the origin. The one on the right is based on the behavior pattern, i.e. which destinations the travelers mostly head to. This is in much the same vein as a social network analysis (‘which of my connections interact with each other’) or shopping cart behavior in online shopping (‘people who bought these products also bought these other products’).
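
To make the three zoning views above more concrete, here is a minimal Python sketch. It is not the SAP HANA implementation, which is not described in this post. It assumes trips arrive as plain origin/destination coordinates in kilometers, bins them into a hypothetical 1 km grid, and computes per origin cell the trip count (left view), the mean trip distance (middle view), and a destination profile that can be compared between cells with cosine similarity (right view). The names `CELL`, `cell_of`, `zone_profiles`, and `cosine`, as well as the sample trips, are made up for illustration.

```python
from collections import Counter, defaultdict
from math import hypot, sqrt

CELL = 1.0  # grid cell size in km (hypothetical value)

def cell_of(x, y):
    """Map a coordinate (in km) to the id of the grid cell containing it."""
    return (int(x // CELL), int(y // CELL))

def zone_profiles(trips):
    """trips: iterable of (origin_x, origin_y, dest_x, dest_y) in km.
    Returns, per origin cell, the three measures discussed in the text."""
    counts = Counter()
    dist_sum = defaultdict(float)
    dest = defaultdict(Counter)
    for ox, oy, dx, dy in trips:
        o = cell_of(ox, oy)
        counts[o] += 1                            # aggregate trip count
        dist_sum[o] += hypot(dx - ox, dy - oy)    # straight-line trip distance
        dest[o][cell_of(dx, dy)] += 1             # where travelers head to
    return {
        o: {
            "trips": counts[o],
            "mean_distance": dist_sum[o] / counts[o],
            "destinations": dest[o],
        }
        for o in counts
    }

def cosine(a, b):
    """Cosine similarity between two destination profiles (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Made-up sample trips, for illustration only.
trips = [
    (0.2, 0.3, 5.1, 5.2),
    (0.4, 0.1, 5.6, 5.9),
    (1.5, 0.2, 5.3, 5.5),
    (3.2, 3.1, 3.8, 3.3),
]
profiles = zone_profiles(trips)
similar = cosine(profiles[(0, 0)]["destinations"],
                 profiles[(1, 0)]["destinations"])
```

Cells whose destination profiles are highly similar (here cells (0, 0) and (1, 0), which both send their travelers to the same destination cell) would be candidates for the same behavior-based zone. A real system would feed these features into a proper clustering algorithm rather than compare cells pairwise.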

From a visualization perspective, multiple views were developed, either embedded directly on the GIS data or built with SAP UI5, such as origin-destination mapping, chord connections, heat maps, time series visualization, etc. They provide various ways to decipher the big data problem in an easy-to-consume fashion that supports decision making.

With such insight at hand, a Traffic Analysis Zone could help distribute resources efficiently to meet travelers’ daily traffic needs and optimize those resources in a more data-driven and precise manner. The gains in cost and efficiency, as well as the implications for both planning (long and short term) and daily fleet operations, are huge.

One visualization is the real-time view below of occupied taxis (yellow dots) and taxis looking for a fare (gray), captured between 12:03 and 12:05 on a single day.

A ‘real-time view’ for the company

Additional views were also developed for the operators, from an operations, planning, and dispatching perspective, for the vehicles in operation. They combine run data (data related to the actual fares of the taxis in circulation) with the spatial data shown in the video above to provide an aggregate view of the behavior of the fleet and its performance in a 24×7 matrix and a related time series chart, with the ability to drill down to a granular level.
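
As an illustration of the 24×7 matrix mentioned above, the following Python sketch aggregates run data into a weekday-by-hour grid. It assumes each run is simply a pickup time plus a fare amount; the actual run data fields are not listed in this post, and the function name `weekly_matrix` and the sample records are hypothetical.

```python
from datetime import datetime

def weekly_matrix(runs):
    """runs: iterable of (pickup_time: datetime, fare: float).
    Returns (counts, revenue), each a 7x24 list of lists indexed
    [weekday][hour], with Monday = 0."""
    counts = [[0] * 24 for _ in range(7)]
    revenue = [[0.0] * 24 for _ in range(7)]
    for ts, fare in runs:
        day, hour = ts.weekday(), ts.hour
        counts[day][hour] += 1
        revenue[day][hour] += fare
    return counts, revenue

# Made-up sample runs, for illustration only.
runs = [
    (datetime(2014, 3, 3, 8, 15), 23.0),   # a Monday, 08:00-09:00 slot
    (datetime(2014, 3, 3, 8, 40), 17.5),   # same slot
    (datetime(2014, 3, 8, 23, 5), 41.0),   # a Saturday, 23:00-24:00 slot
]
counts, revenue = weekly_matrix(runs)
```

Each cell of the matrix can then be drilled into by filtering the underlying runs for that weekday and hour.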

The operator will have a powerful analytical tool set to learn about the performance of the fleet from a historic and real-time perspective, and to re-allocate resources in real time as the need arises.

Identifiers from a video camera capture:

Here, operational applications record video feeds of traffic and extract raw data on the unique identifiers of the captured vehicles, in this case their license plates.

This raw data, with the related time stamps and various other information, was loaded into SAP HANA, which was then used for pattern recognition based on a machine learning approach. Some patterns were readily attributable and easily correlated to events such as weather, time of day, etc.

More interesting were the patterns that were not readily decipherable. In one example, data captured on the same road, but in different lanes, with cameras of a similar model showed different patterns, which led to further work on identifying the root causes of those differences.

With further analysis, various sources of these discrepancies were identified. Some related to cameras providing faulty or unusable footage, others to the operational state of the cameras, and still others had nothing to do with the cameras themselves but with obstructions, such as trees, blocking the cameras’ line of sight.

Armed with these patterns, one of the solutions that SAP provided was automatic fault discovery: patterns of symptoms that are not easy to decipher by observation, but that are a valuable input for the proactive management of these assets. In essence, they are an input to a planned maintenance regime for the operational cameras in the entire network.
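
The idea of spotting a camera that behaves differently from its neighbors can be sketched very simply. The Python below is not the machine learning approach used in the project. It swaps in a much simpler peer-comparison rule: for cameras on the same road and in the same time window, flag any camera whose number of plate reads deviates from the group median by more than a chosen fraction. The function name `flag_cameras`, the `tolerance` parameter, and the sample counts are all assumptions made for illustration.

```python
from statistics import median

def flag_cameras(counts_by_camera, tolerance=0.5):
    """counts_by_camera: {camera_id: number of plates read} for cameras
    on the same road over the same time window.
    Returns the ids (sorted) of cameras whose count deviates from the
    group median by more than `tolerance` (as a fraction of the median)."""
    med = median(counts_by_camera.values())
    if med == 0:
        return []  # nothing to compare against
    return sorted(
        cam for cam, n in counts_by_camera.items()
        if abs(n - med) / med > tolerance
    )

# Made-up counts for four lanes of one road, for illustration only.
window = {"lane1": 410, "lane2": 395, "lane3": 120, "lane4": 402}
suspects = flag_cameras(window)
```

A flagged camera is only a candidate for inspection. Whether the cause is faulty footage, the camera’s operational state, or an obstruction still has to be established separately.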

Another major use, which was one of the main initial drivers of the project, was the actual identification and analysis of the license plates. With multiple feeds coming in, it was possible to detect and correct errors in the observed plates. Fault discovery on the recorded raw data (whether the license plate was actually the right one) was done by correlating multiple feeds of the same car from different cameras. From the detected and corrected feeds, 99% fault discovery and 94.5% fault correction were recorded. This accuracy was measured on the actual feed data that came into HANA. (There are various limitations on the operational systems’ ability to extract the raw data from live streaming video.)
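
One simple way to picture correlating several reads of the same car is a per-character majority vote. The Python sketch below is an assumption about how such a correction could work, not the method used on SAP HANA. It presumes the reads have already been grouped as belonging to one vehicle (how that grouping is done is outside this sketch), and the function names `consensus_plate` and `check_readings` and the sample plate strings are invented.

```python
from collections import Counter

def consensus_plate(readings):
    """Per-position majority vote over reads of the same vehicle.
    Reads whose length differs from the most common length are ignored."""
    length = Counter(len(r) for r in readings).most_common(1)[0][0]
    usable = [r for r in readings if len(r) == length]
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*usable)
    )

def check_readings(readings):
    """Return (consensus plate, list of reads that disagree with it)."""
    best = consensus_plate(readings)
    faulty = [r for r in readings if r != best]
    return best, faulty

# Made-up reads of one car from four cameras, for illustration only.
readings = ["A12B345", "A12B345", "A128345", "A12B3A5"]
best, faulty = check_readings(readings)
```

Here the two reads that differ from the consensus are the discovered faults, and replacing them with the consensus is the correction. With only one or two reads of a car there is little to vote on, so the quality of the result depends on how many cameras saw the vehicle.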

With more enriched and clean data, the power of data analysis on SAP HANA is limitless.

Now, another angle on this is the passenger transit or public transit company.

Compared to the use case above, such a company would have more structured timetables (schedules), routes, and fleet deployment. Applied to a public transit company, these use cases could provide even more insights, especially in terms of current and future zoning, commuter trends, travel patterns, and fault detection and discovery, among multiple other use cases.

The richer the data set, the more unique answers SAP HANA can provide.

To see this in more detail and gain an in-depth understanding of the innovations in progress, and of how they help manage challenges and turn them into opportunities in passenger travel and transit, do join us on April 2nd and 3rd for The International SAP Conference for Travel & Transportation in Sinsheim, Germany.