Smart Data Analytics for Smart Cities

Smart cities are urban regions that seek to use innovative information and communication technologies (ICT) in architectural planning and design, in creative and cultural industries, and in concepts of social and environmental sustainability, in order to address the economic, spatial, social, and ecological problems facing cities today. Every day, smart cities accumulate large masses of data from a variety of sources, such as transportation, networks, the energy sector, smart homes, tax records, surveys, Light Detection and Ranging (LiDAR) sensors, and mobile phone sensors. The data from these sources is then collected, linked, and connected via the Internet of Things (IoT).

With the development of computers and smartphones, citizens' interactions with firms are becoming more connected, which in turn affects other firms and public bodies involved in urban living. As a result, firms are changing the way they design and configure future products and services. New business models, i.e., how firms do business through their value propositions, value creation, and value capture, will therefore play a key role in shaping the smart cities of the future. Yet very little is known about new business models in smart cities and urban living, or about how urban data can support these models.

How can cities use, understand, and, most importantly, analyse the data generated by citizens, businesses, and municipalities to better fulfil citizens' wants and needs, to build a sustainable and green society, and to ensure economic growth?

Recent advances in Big Data and Connected Data analytics offer useful insights into how city data can be analysed and then used to generate value for municipalities, companies, and individuals. The available tools range from relatively simple data mining and clustering algorithms to more sophisticated analytics that take into account the context in which individuals make decisions in cities.

In my experience, one of the major challenges in analysing urban datasets is data volume. Recent estimates suggest that by the end of 2017, annual global data traffic will reach 7.7 zettabytes. Under these conditions, it is not clear how analytics will be able to cope with such volume. This challenge is of grave importance for the many municipalities and organizations that are often forced to delete large quantities of potentially valuable information. Customer loyalty card records and video from security cameras, among other sources, are barely used for analytics and are often deleted within weeks, if not days, of collection. Clustering models for large datasets, however, offer some hope of solving this problem.

The idea behind clustering urban data is to split it into similar (homogeneous) groups according to specific attributes. Organising large masses of urban data into clusters gives it a compact format: analysts can still represent the information in the entire dataset (without omissions or deletions), but the data is reorganised and becomes manageable.
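A minimal sketch of this "compact format" idea, assuming cluster labels have already been produced by some clustering step: each cluster is reduced to its centroid and its member count, so every point is accounted for in exactly one summary. The function name `summarise_clusters` is hypothetical, and in practice the raw points would be kept alongside their labels; the summaries are simply a smaller view of the same data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def summarise_clusters(
    points: Sequence[Point], labels: Sequence[int]
) -> Dict[int, Tuple[Point, int]]:
    """Hypothetical helper: map each cluster id to (centroid, member count)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for point, label in zip(points, labels):
        # Accumulate coordinate sums per cluster; every point lands in one cluster.
        if label not in sums:
            sums[label] = [0.0] * len(point)
        for d, value in enumerate(point):
            sums[label][d] += value
        counts[label] += 1
    # Centroid = coordinate-wise mean of the cluster's members.
    return {
        label: (tuple(s / counts[label] for s in sums[label]), counts[label])
        for label in sums
    }
```

For example, `summarise_clusters([(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 10.0)], [0, 0, 1, 1])` returns `{0: ((1.0, 0.0), 2), 1: ((11.0, 10.0), 2)}`: four points become two summaries whose counts still add up to four.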

Together with Ahmad Al Shami and Weisi Guo from the Warwick Institute for the Science of Cities, and Waseem Zakir from Nottingham Trent University, we recently tested two clustering techniques suited to large urban datasets: K-Means and Fuzzy c-Means (FCM). K-Means divides the data into clusters by minimising the distance between data points and the centroid of their cluster (usually calculated as the mean of all points in the cluster). FCM is similar to K-Means, but it allows an object (data point) to belong to more than one cluster according to its degree of membership. We used LiDAR sensor data from the University of Warwick area in Coventry (UK) to compare the performance of K-Means and FCM. Clustering was run on a fairly standard machine: an 8-core AMD 8320 processor at 4.1 GHz with 8 GB of RAM, running 64-bit Windows 8.1. In this test, FCM clustering ran faster than K-Means on larger datasets, whereas K-Means required less processing power than FCM.
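To make the difference between the two techniques concrete, here is a minimal pure-Python sketch of both: Lloyd-style K-Means with hard (one-cluster) assignments, and standard fuzzy c-means with fuzzifier `m` (assumed here to be 2, a common default). This is not the implementation used in the test above, which the post does not specify; the function names and parameters (`kmeans`, `fuzzy_cmeans`, `iters`, `tol`, `seed`) are hypothetical, and a real LiDAR workload would call for an optimised library.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd-style K-Means: returns (centroids, one hard label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # k distinct data points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, labels


def fuzzy_cmeans(points: Sequence[Point], c: int, m: float = 2.0,
                 iters: int = 100, tol: float = 1e-6,
                 seed: int = 0) -> Tuple[List[Point], List[List[float]]]:
    """Fuzzy c-means: returns (centroids, u) where u[i][j] is the degree
    to which point i belongs to cluster j (each row sums to 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centroids: List[Point] = []
    for _ in range(iters):
        # Centroid step: mean of all points weighted by membership ** m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)))
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [math.sqrt(sq_dist(points[i], cj)) for cj in centroids]
            zeros = [j for j in range(c) if dist[j] == 0.0]
            if zeros:
                # Point sits exactly on centroid(s): split membership there.
                row = [1.0 / len(zeros) if j in zeros else 0.0
                       for j in range(c)]
            else:
                row = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                 for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(a - b) for old, new in zip(u, new_u)
                    for a, b in zip(old, new))
        u = new_u
        if shift < tol:  # memberships have stopped changing
            break
    return centroids, u
```

On two well-separated groups of points, `kmeans` gives each point exactly one label, while `fuzzy_cmeans` returns a membership row per point that sums to 1; points deep inside a group get a membership close to 1 for that group, and a point lying between groups would be shared between them. The sketch says nothing about the speed and processing-power trade-off reported above, which depends on the implementation, the data size, and the convergence settings.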

These results suggest that clustering techniques can be very helpful in dealing with large masses of urban data. In fact, clustering enables smart cities to conduct smart analytics (analytics that does not require omitting or deleting valuable information) on data generated by citizens, firms, and the cities themselves. The goal is an optimal system in which citizens' wants and needs can be accurately anticipated, and in which firms and cities can develop and deliver the anticipated goods and services to citizens.

Time and further tests will tell which clustering technique is best. Based on these early tests, if your processing power is significantly limited, K-Means clustering is a good solution, but be prepared to wait for the results. If processing power is not a big issue, FCM will work better for large urban datasets and deliver results faster.