Spark, YARN, and Hadoop

[Instructor] Once our HDFS is operational, we can start YARN on top of it. First, though, let me correct something in our directory structure. If you've been following along, you currently have two folders under the root directory: /user and /hduser. What we need to do is create another hduser folder under the /user folder. Let's do that now with the command hdfs dfs -mkdir /user/hduser. Next, we need to copy some files from the local directory into a new directory called Input, because later on we'll be running MapReduce, which will need some input files. …
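HDFS resolves a relative path such as Input against the current user's home folder, /user/<username>. That is why hduser needs its own folder under /user before relative paths will work for that user. Here is a minimal Python sketch of the sequence described above. It assumes that the hdfs command is on the PATH and that you run as hduser. The local file names are hypothetical placeholders, because the transcript does not say which files get copied. The sketch builds the commands and only prints them unless dry_run is set to False.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def hdfs_home(user: str) -> str:
    """Home folder HDFS uses for a user: /user/<user>."""
    return f"/user/{user}"


def resolve(path: str, user: str) -> str:
    """Mimic how HDFS turns a relative path into an absolute one."""
    if path.startswith("/"):
        return path  # absolute paths are used as-is
    return str(PurePosixPath(hdfs_home(user)) / path)


def setup_commands(user: str, input_dir: str, local_files: list[str]) -> list[list[str]]:
    """Build the hdfs dfs calls from the walkthrough, in the order they run."""
    return [
        # Create the missing home folder, e.g. /user/hduser.
        ["hdfs", "dfs", "-mkdir", hdfs_home(user)],
        # A relative name like "Input" now resolves inside that home folder.
        ["hdfs", "dfs", "-mkdir", input_dir],
        # Copy local files into it; -put accepts several sources for one directory.
        ["hdfs", "dfs", "-put", *local_files, input_dir],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step (e.g. folder already exists).
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Assumptions: running as hduser; the two XML names are hypothetical placeholders.
    cmds = setup_commands("hduser", "Input", ["core-site.xml", "hdfs-site.xml"])
    run(cmds)  # dry run: prints the three commands without calling hdfs
```

Passing dry_run=False runs the same commands through subprocess. Adding -p to the two -mkdir calls would make those steps safe to rerun when the folders already exist.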

Released: 8/30/2018

The explosion of data in recent years has made the field of data science, in which professionals work to glean insights from this abundant information, increasingly vital. If you're looking to pursue a career in this rapidly growing field, or to work with experts in it, it's crucial that you familiarize yourself with the tools of the trade. In this course, instructor Jungwoo Ryoo acquaints you with some of the best-known data science tools in the areas of cloud computing, distributed file storage, distributed processing, and machine learning. Throughout the course, Jungwoo covers Proxmox, Hadoop, Spark, and Weka, discussing how to install each tool and leverage it in your data science workflow. To wrap up, he explains how Hadoop, Spark, and Weka can work together to produce the best results.