Smart City Data Analysis: Pushing Data into the Cloud

Cloud computing can help cities apply big data analytics to the data they already collect and turn it into actionable intelligence. In previous smart city posts, we described how cities can acquire data from different sources, such as IoT devices, sensors, mobile applications, and interactions with citizens (see the earlier posts in this series). After this data has fulfilled its primary purpose, it can be given a new life as a source for analytics that informs future projects.

Think about the years (if not centuries) of data stored in city archives. From demographics and schools to transportation and GIS data, these records can be combined with real-time data and machine learning to reveal how a city actually works and where an intervention would make a meaningful impact.

How can cities easily push data into the cloud for analysis? Analyzing massive amounts of quickly changing data is not always easy, but AWS managed services can absorb much of the complexity:

Amazon Kinesis makes it easy to load and analyze streaming data in near real time, supporting use cases such as fraud detection or inventory alerts in a critical care unit or blood bank. It also makes it possible to detect device failures in a traffic system or a medical unit and trigger a corrective response.
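As a minimal sketch of the producer side, the Python below serializes a sensor reading to JSON and sends it with the `put_record` call of the Kinesis client in the AWS SDK for Python (boto3). The `SensorReading` shape, the helper names, and the stream name used in the usage note are hypothetical; the client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass
class SensorReading:
    # Hypothetical record shape for a city device such as a traffic signal.
    device_id: str
    timestamp: str  # ISO 8601, e.g. "2024-05-01T08:00:00Z"
    metric: str
    value: float


def to_kinesis_record(reading: SensorReading) -> dict:
    """Build the Data/PartitionKey arguments for put_record.

    The device ID is used as the partition key, so every reading from one
    device is routed to the same shard."""
    return {
        "Data": json.dumps(asdict(reading)).encode("utf-8"),
        "PartitionKey": reading.device_id,
    }


def send_readings(client, stream_name: str, readings: Iterable[SensorReading]) -> List[str]:
    """Send readings one by one and return the sequence numbers Kinesis assigns."""
    sequence_numbers = []
    for reading in readings:
        response = client.put_record(StreamName=stream_name, **to_kinesis_record(reading))
        sequence_numbers.append(response["SequenceNumber"])
    return sequence_numbers


def default_client():
    # Imported lazily so the helpers above can be used without boto3 installed.
    import boto3

    return boto3.client("kinesis")
```

With credentials configured and a stream already created, a call such as `send_readings(default_client(), "city-sensor-stream", readings)` would push the data. Keying by device ID keeps each device's readings together on one shard, which simplifies downstream checks such as spotting a device that has stopped reporting. For higher volumes, `put_records` can batch up to 500 records per request.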

With Amazon Simple Storage Service (Amazon S3), you can store massive data sets cost-effectively. S3 can also serve as the target storage for scanned incoming documents, which can then be processed to feed an analytics engine or loaded into a relational database.
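A minimal sketch of that landing step, assuming scanned documents arrive as bytes and are written with the `put_object` call of the boto3 S3 client. The bucket name, the `raw/` prefix, and the `source=/year=/month=/day=` key layout are hypothetical choices for illustration, not something S3 requires.

```python
from datetime import date


def document_key(source: str, doc_date: date, filename: str) -> str:
    """Build a date-partitioned S3 key such as
    raw/source=permits/year=2024/month=05/day=01/scan-001.pdf.

    The key=value folder style is a common convention that Hive-compatible
    query engines can use to skip partitions a query does not need."""
    parts = [
        "raw",
        f"source={source}",
        f"year={doc_date.year:04d}",
        f"month={doc_date.month:02d}",
        f"day={doc_date.day:02d}",
        filename,
    ]
    return "/".join(parts)


def upload_document(client, bucket: str, key: str, body: bytes,
                    content_type: str = "application/pdf") -> str:
    """Store one scanned document with put_object and return its s3:// URI."""
    client.put_object(Bucket=bucket, Key=key, Body=body, ContentType=content_type)
    return f"s3://{bucket}/{key}"
```

For example, `upload_document(boto3.client("s3"), "city-docs", document_key("permits", date(2024, 5, 1), "scan-001.pdf"), pdf_bytes)` would store a scan under its date partition. Organizing keys by date makes it straightforward for a downstream job to process only the documents that arrived on a given day.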

Another option is Amazon Elastic Block Store (Amazon EBS), which provides persistent block-level storage volumes with a choice of volume types offering different IOPS performance, or Amazon Elastic File System (Amazon EFS), an NFSv4-compatible shared file system that grows and shrinks automatically as you add and remove files.

With Amazon Elastic MapReduce (Amazon EMR), you can launch Hadoop clusters in minutes, resize them, and terminate them when the analysis is complete. Alternatively, you can keep a cluster running to process data continuously as it arrives.
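The two lifecycles described above map onto a single setting, `KeepJobFlowAliveWhenNoSteps`, when a cluster is created through the `run_job_flow` call of the boto3 EMR client. The minimal sketch below assumes the default EMR IAM roles already exist and builds the request for either a transient or a long-running Hadoop cluster. The cluster names, instance types, release label, and helper function names are illustrative assumptions.

```python
from typing import Iterable, List, Optional


def hadoop_step(name: str, args: Iterable[str]) -> dict:
    """Describe one EMR step run through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
    }


def cluster_request(name: str, keep_alive: bool,
                    steps: Optional[List[dict]] = None,
                    instance_count: int = 3) -> dict:
    """Build run_job_flow arguments for a Hadoop cluster.

    keep_alive=False gives a transient cluster that terminates itself once
    its steps finish; keep_alive=True leaves it running for new work."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: pick a release available to you
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: sizing is illustrative
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,  # total instances, including the master
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "Steps": list(steps or []),
        # Default roles, e.g. created with `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(client, request: dict) -> str:
    """Start the cluster and return its ID (for example "j-...")."""
    return client.run_job_flow(**request)["JobFlowId"]


def terminate_cluster(client, cluster_id: str) -> None:
    """Shut down a long-running cluster once it is no longer needed."""
    client.terminate_job_flows(JobFlowIds=[cluster_id])
```

A nightly batch job might call `launch_cluster(boto3.client("emr"), cluster_request("nightly-docs", keep_alive=False, steps=[...]))` and let the cluster shut itself down, while a continuous pipeline would set `keep_alive=True` and call `terminate_cluster` explicitly. Resizing a running cluster is a separate call (for example `modify_instance_groups`) and is omitted here.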

As an example scenario for analyzing a large volume of documents, consider the document data extraction flow depicted below:

Flexible tools and services are important for data analytics: they let you uncover insights and hidden patterns, and iteratively refine your algorithms until you find the answers you are looking for. This gives cities of all sizes the ability to run big data analysis on real-time streaming data, on historical data, or on a mix of both, without a large up-front investment.

This technology applies to many smart city applications, such as: