Installing Local Data Lake on Ubuntu Server : Part 1

In previous guides, we covered basic installation and setup of the major, well-known Big Data software. This is Part 1 of Installing Local Data Lake on Ubuntu Server with Hadoop, Spark, Thriftserver, Jupyter and so on, to build a prediction system. We suggest using servers from VPSDime, as they cost very little: $7 per month for 6GB RAM. We have talked before about some limitations of OpenVZ virtualization. VPSDime is great for test setups as long as you are not breaking their rules. The minimum RAM needed is 12GB. Our older guides went towards analysis of data such as log files as one path; a prediction software system is another path. We will use Ubuntu Server, as most users can work with it.

I cannot guarantee that the version numbers in this guide are free of typos. WordPress currently has hundreds of quirky features, and configs can get mangled if the editor is switched by mistake from Text to Visual.

Installing Local Data Lake on Ubuntu Server : What is a Data Lake?

A data lake is a method of storing data within a system that facilitates the colocation of data in various schemata and structural forms for tasks such as reporting, visualization, analytics and machine learning. The Apache Hadoop Distributed File System (HDFS) is itself one example of a data lake.
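
To make the idea concrete, here is a minimal sketch in Python. It is a toy that uses a local directory in place of HDFS, not the setup this series builds. It stores structured, semi-structured and unstructured files side by side in their raw form, and a structure is applied only when a task reads them. All file names, function names and sample records are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_raw_files(lake_dir: Path) -> None:
    """Store files of different shapes side by side, in their raw form."""
    # Structured data: CSV with a fixed column layout (made-up sample rows).
    with open(lake_dir / "sales.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["item", "amount"])
        writer.writerow(["disk", "120"])
        writer.writerow(["ram", "80"])
    # Semi-structured data: JSON lines, records need not share fields.
    with open(lake_dir / "events.jsonl", "w") as f:
        f.write(json.dumps({"user": "a", "action": "login"}) + "\n")
        f.write(json.dumps({"user": "b", "action": "buy", "item": "ram"}) + "\n")
    # Unstructured data: plain log text.
    (lake_dir / "server.log").write_text("2020-01-01 ERROR disk full\n")


def total_sales(lake_dir: Path) -> int:
    """Reporting task: the CSV is interpreted only at read time."""
    with open(lake_dir / "sales.csv", newline="") as f:
        return sum(int(row["amount"]) for row in csv.DictReader(f))


def count_actions(lake_dir: Path, action: str) -> int:
    """Analytics task: count events of one kind in the JSON lines file."""
    with open(lake_dir / "events.jsonl") as f:
        return sum(1 for line in f if json.loads(line).get("action") == action)


def error_lines(lake_dir: Path) -> list:
    """Log analysis task: pick out lines containing the word ERROR."""
    text = (lake_dir / "server.log").read_text()
    return [line for line in text.splitlines() if "ERROR" in line]


if __name__ == "__main__":
    # A temporary directory stands in for the lake storage.
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        write_raw_files(lake)
        print(total_sales(lake))
        print(count_actions(lake, "buy"))
        print(error_lines(lake))
```

The same raw files serve three different tasks without being converted to a common schema first, which is the property the definition above describes.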