This is Part 3 of a 4-part series. Be sure to check out Part 1 and Part 2 first!

According to Gartner, smart cities will be using about 1.39 billion connected cars, IoT sensors, and devices by 2020. The analysis of location and behavior patterns within cities will allow optimization of traffic, better planning decisions, and smarter advertising. Improving cities is one of the 10 major areas in which big data is currently being used to excellent advantage. For example, analyzing GPS car data allows cities to optimize traffic flows based on real-time traffic information. Telecom companies are using mobile phone location data to identify and predict the location activity trends and patterns of a population in a large metropolitan area. Machine learning is being applied to geolocation data in telecom, travel, marketing, and manufacturing to identify patterns and trends for services such as recommendations, anomaly detection, and fraud detection.

This is the third in a series of blog posts discussing the architecture of an end-to-end application that combines streaming data with machine learning to analyze, in real time, where and when Uber cars are clustered, in order to predict and visualize the most popular Uber locations.

Handling huge amounts of real-time data puts high demands on application architecture. Uber and others have moved from a monolithic to an event-driven microservices architecture because they needed to scale. In this post, we will go over the implementation of a real-time web application using Vert.x, a toolkit for building reactive event-driven microservices.

The first part of this series discusses creating a machine learning model using the Apache Spark K-means algorithm to cluster Uber data by location.

Clustering algorithms group items into categories by analyzing similarities between input examples and discovering groupings that occur in collections of data. Clustering algorithms can be used for:

Customer segmentation.

Finding trends and detecting anomalies.

Grouping search results or similar articles.

The K-means algorithm groups observations into K clusters in which each observation belongs to the cluster whose center (the mean of the cluster's members) is nearest. Below, the cluster centers returned by the model from the analysis of the Uber data (with K=10) are displayed on a Google map:
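To make the nearest-center rule concrete, here is a minimal sketch of the assignment step only (not the Spark implementation used in Part 1). The function names, the example centers, and the example trip location are all hypothetical, and distance is computed on raw latitude/longitude values as a simplification.

```javascript
// Squared Euclidean distance between two [lat, lon] pairs. Treating the
// coordinates as a flat plane is a simplification for illustration.
function squaredDistance(a, b) {
  const dLat = a[0] - b[0];
  const dLon = a[1] - b[1];
  return dLat * dLat + dLon * dLon;
}

// Return the index of the cluster center closest to the given point.
function nearestCenter(point, centers) {
  let best = 0;
  for (let i = 1; i < centers.length; i++) {
    if (squaredDistance(point, centers[i]) < squaredDistance(point, centers[best])) {
      best = i;
    }
  }
  return best;
}

// Hypothetical centers and trip location, not taken from the Uber data set.
const exampleCenters = [[40.70, -74.00], [40.75, -73.98], [40.80, -73.95]];
const exampleTrip = [40.76, -73.99];
console.log(nearestCenter(exampleTrip, exampleCenters)); // prints 1
```

A full K-means run alternates this assignment step with recomputing each center as the mean of its assigned points until the centers stop moving.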

The second post discusses using the saved K-means model with streaming data to do real-time analysis of where and when Uber cars are clustered.

This third post discusses building a real-time dashboard to visualize the cluster data on a Google map. The following figure depicts the data pipeline:

Uber trip data is published to a MapR Streams topic using the Kafka API.

A Spark Streaming application, subscribed to the first topic, enriches each event with the cluster location and publishes the results in JSON format to another topic.

A Vert.x web application, subscribed to the second topic, displays the Uber trip clusters in a heat map.

The Vert.x Toolkit and Web Application Architecture

The Vert.x toolkit is event-driven, using an event bus to distribute events to handler components called verticles. Vert.x, similar to Node.js, employs a non-blocking model in which work is handled on event-loop threads; unlike Node.js, Vert.x can run several event loops, but the handlers of a given verticle are always called on the same one. The Vert.x SockJS event bus bridge allows web applications to communicate bi-directionally with the Vert.x event bus using WebSockets, which allows you to build real-time web applications with server push functionality.

Looking in more detail at the Uber dashboard application architecture:

A Vert.x Kafka client verticle consumes messages from the MapR Streams topic and publishes the messages on a Vert.x event bus.

A JavaScript browser client subscribes to the Vert.x event bus using SockJS and displays the Uber trip locations on a Google Heatmap.

The Dashboard Vert.x Service

In the Vert.x service code snippet below, we:

Create a vertx instance, which provides access to the Vert.x core API.

Create a Router object, which routes HTTP request URLs to handlers.

Create a BridgeOptions object and specify that messages with the address “dashboard” should pass through the event bus bridge.

Route paths that match /eventbus/* to a SockJSHandler event bus bridge, which extends the server-side Vert.x event bus into client-side JavaScript.

Create an HttpServer object, an HTTP server implementation.

Tell the server to listen on the configured port for incoming requests.

In the code snippet below, messages are consumed from the MapR Streams Uber topic and published to the Vert.x event bus address “dashboard.” Messages will be delivered to all handlers subscribed to this address.
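The publish semantics described here (every handler registered on an address receives each message) can be illustrated without Vert.x at all. The sketch below is a hypothetical in-memory stand-in, not the Vert.x API: the class name `TinyEventBus` and its methods are invented for illustration and only mimic the publish/subscribe behavior.

```javascript
// A toy publish/subscribe bus. This is NOT the Vert.x event bus; it only
// mimics the "publish to an address, deliver to all subscribers" behavior.
class TinyEventBus {
  constructor() {
    this.handlers = new Map(); // address -> array of handler functions
  }

  // Subscribe a handler to an address.
  registerHandler(address, handler) {
    if (!this.handlers.has(address)) {
      this.handlers.set(address, []);
    }
    this.handlers.get(address).push(handler);
  }

  // Deliver the body to every handler on the address; returns how many
  // handlers were called.
  publish(address, body) {
    const subscribers = this.handlers.get(address) || [];
    subscribers.forEach((handler) => handler({ address: address, body: body }));
    return subscribers.length;
  }
}

const bus = new TinyEventBus();
const received = [];
bus.registerHandler("dashboard", (msg) => received.push("map:" + msg.body));
bus.registerHandler("dashboard", (msg) => received.push("log:" + msg.body));
bus.publish("dashboard", "trip-1");
console.log(received); // both handlers saw the same message
```

In the real application the subscribers are browser clients connected through the SockJS bridge, so each open dashboard page receives every trip message.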

The Dashboard Vert.x HTML5 JavaScript Client

The client uses a Google Maps Heatmap Layer to visually depict the intensity of the Uber trip cluster locations on a Manhattan Google map. With the Google Heatmap, areas of higher intensity will be colored red, and areas of lower intensity will appear green. The dashboard app uses Google Maps markers to mark cluster centers.

For learning purposes, this example is contained entirely in a single index.html file. The necessary JavaScript for Vert.x, SockJS, jQuery, and Google Maps is shown below; note that for Google Maps, you will need your own key.

Creating the Map

For the map to display on the web page, we first reserve a spot for it by creating a div element with id="map". Then, in the initMap function, which is called when the page is loaded, we create a Google Maps instance, passing a reference to the div element obtained via the document.getElementById() method. Next, we create a HeatmapLayer object whose geographic data is initially an empty array. Later, we will update this data with geographic locations from the server.
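The steps above might look roughly like the following sketch. It assumes the Google Maps JavaScript API has been loaded with the visualization library; the helper name `createHeatmap`, the zoom level, and the center coordinates are assumptions chosen for illustration. The Maps namespace is passed in as a parameter so the helper can be exercised outside a browser, and an MVCArray is used for the initially empty data so that points can be pushed into it later.

```javascript
// `maps` is expected to be the google.maps namespace (with the visualization
// library loaded); `container` is the <div id="map"> element.
function createHeatmap(maps, container) {
  const map = new maps.Map(container, {
    zoom: 12,                                // assumed zoom level
    center: { lat: 40.7589, lng: -73.9851 }, // assumed center in Manhattan
  });

  // Start with no points; locations from the server are added later.
  const points = new maps.MVCArray();
  const heatmap = new maps.visualization.HeatmapLayer({ data: points, map: map });

  return { map: map, points: points, heatmap: heatmap };
}

// Browser entry point, intended to be invoked once the Maps script has loaded.
// It is only defined here, never called, so this file also loads outside a browser.
let dashboard = null;
function initMap() {
  dashboard = createHeatmap(google.maps, document.getElementById("map"));
}
```

Keeping a reference to `points` means a new trip location can be added later with `dashboard.points.push(new google.maps.LatLng(lat, lng))`, and the heat map layer redraws from the updated array.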

Creating the Event Bus

Below, we create an instance of the vertx.EventBus object, specifying the URL to connect to. Then, we add an onopen listener, which registers an event bus handler for the address “dashboard.” This handler will receive all messages published to the “dashboard” address.
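A minimal sketch of this subscription logic follows. It assumes the event bus object behaves like the Vert.x 3 SockJS event bus client, that is, it exposes an `onopen` callback property and a `registerHandler(address, callback)` method whose callback receives `(error, message)`; other client versions may differ. The helper name `subscribeToDashboard` and the URL in the comment are placeholders.

```javascript
// `eb` is assumed to look like the Vert.x 3 SockJS event bus client:
// an `onopen` property plus registerHandler(address, callback), where the
// callback receives (error, message). `onBody` is called with each message body.
function subscribeToDashboard(eb, onBody) {
  eb.onopen = function () {
    // Handlers can only be registered once the connection is open.
    eb.registerHandler("dashboard", function (error, message) {
      if (error) {
        console.error("event bus error", error);
        return;
      }
      onBody(message.body);
    });
  };
}

// In the browser this might be wired up as follows (placeholder URL):
//   const eb = new EventBus("http://localhost:8080/eventbus");
//   subscribeToDashboard(eb, (body) => console.log(body));
```

Registering inside `onopen` matters because the connection is established asynchronously; a handler registered before the socket opens would not reach the server.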

The messages received from the server application are in JSON format and contain the following for each trip location: the cluster center ID; the datetime, latitude, longitude, and base for the trip; and the latitude and longitude of the cluster center. An example is shown below:
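Once a message arrives, the client has to turn it into coordinates for the heat map and the cluster marker. The sketch below shows one way that could be done. The field names (`cid`, `dt`, `lat`, `lon`, `base`, `clat`, `clon`) are assumptions about the message layout, the helper name `parseTripMessage` is hypothetical, and the sample values are made up for illustration rather than taken from the data set.

```javascript
// Convert a message body into the pieces the dashboard needs.
// Field names are assumed; adjust them to match the actual messages.
function parseTripMessage(body) {
  // The body may arrive as a JSON string or as an already-parsed object.
  const trip = typeof body === "string" ? JSON.parse(body) : body;
  return {
    clusterId: trip.cid,
    time: trip.dt,
    base: trip.base,
    location: { lat: Number(trip.lat), lng: Number(trip.lon) },
    clusterCenter: { lat: Number(trip.clat), lng: Number(trip.clon) },
  };
}

// Made-up sample values, for illustration only.
const sampleBody =
  '{"cid":1,"dt":"2014-08-01 00:00:00","lat":40.75,"lon":-73.99,' +
  '"base":"B00000","clat":40.76,"clon":-73.98}';
console.log(parseTripMessage(sampleBody).location);
```

The `location` value can then be converted to a Google Maps LatLng and added to the heat map data, while `clusterCenter` gives the position for the cluster marker.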