Hadoop Ecosystem

Course video 20 of 33

The fourth module is entitled to "Spatial DBMS and Big Data Systems", which covers two disciplines related to spatial data science, and will make learners understand how to use DBMS and Big Data Systems to manage spatial data and spatial big data. This module is composed of six lectures. The first two lectures will cover DBMS and Spatial DBMS, and the rest of the lectures will cover Big Data Systems. The first lecture "Database Management System (DBMS)" will introduce powerful functionalities of DBMS and related features, and limitations of conventional Relational DBMS for spatial data. The second lecture "Spatial DBMS" focuses on the difference of spatial DBMS from conventional DBMS, and new features to manage spatial data. The third lecture will give learners a brief overview of Big Data Systems and the current paradigm - MapReduce. The fourth lecture will cover Hadoop MapReduce, Hadoop Distributed File System (HDFS), Hadoop YARN, as an implementation of MapReduce paradigm, and also will present the first example of spatial big data processing using Hadoop MapReduce. The fifth lecture will introduce Hadoop ecosystem and show how to utilize Hadoop tools such as Hive, Pig, Sqoop, and HBase for spatial big data processing. The last lecture "Spatial Big Data System" will introduce two Hadoop tools for spatial big data - Spatial Hadoop and GIS Tools for Hadoop, and review their pros and cons for spatial big data management and processing.

Spatial (map) is considered as a core infrastructure of modern IT world, which is substantiated by business transactions of major IT companies such as Apple, Google, Microsoft, Amazon, Intel, and Uber, and even motor companies such as Audi, BMW, and Mercedes. Consequently, they are bound to hire more and more spatial data scientists. Based on such business trend, this course is designed to present a firm understanding of spatial data science to the learners, who would have a basic knowledge of data science and data analysis, and eventually to make their expertise differentiated from other nominal data scientists and data analysts. Additionally, this course could make learners realize the value of spatial big data and the power of open source software's to deal with spatial data science problems.
This course will start with defining spatial data science and answering why spatial is special from three different perspectives - business, technology, and data in the first week. In the second week, four disciplines related to spatial data science - GIS, DBMS, Data Analytics, and Big Data Systems, and the related open source software's - QGIS, PostgreSQL, PostGIS, R, and Hadoop tools are introduced together. During the third, fourth, and fifth weeks, you will learn the four disciplines one by one from the principle to applications. In the final week, five real world problems and the corresponding solutions are presented with step-by-step procedures in environment of open source software's.