Monday, 31 March 2014

There are a lot of nice videos on YouTube, from tops for kids to machine learning. Some of them are interesting enough that I feel like watching them again and again. When you notice this pattern, it's better to download the videos. Not only does this allow offline viewing, it also saves bandwidth, which is even more useful when there is a bandwidth cap.

`youtube-dl` is a very useful command for downloading videos from YouTube on Ubuntu. `youtube-dl` has a lot of nice options; here are some of the ones I use:

youtube-dl -c -t -f 5 --batch-file=files.txt

-c -> Resume a partially downloaded file.
-t -> Use the title of the video in the name of the downloaded file.
-f -> Specify the video format (quality) in which to download the video.
--batch-file -> Specify the name of a file containing the URLs of the videos to download in batch mode. The file must contain one URL per line.
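
As a minimal sketch of the batch file format, the script below writes a `files.txt` with one URL per line and counts the queued entries. The two URLs are placeholders with made-up video IDs, so substitute real links before running the download.

```shell
#!/bin/sh
# Build a batch file with one video URL per line.
# Both URLs are placeholders (hypothetical video IDs), not real videos.
cat > files.txt <<'EOF'
https://www.youtube.com/watch?v=VIDEO_ID_1
https://www.youtube.com/watch?v=VIDEO_ID_2
EOF

# Count the non-empty lines to confirm how many downloads are queued.
grep -c . files.txt

# With real URLs in place, run the download with the options described above:
# youtube-dl -c -t -f 5 --batch-file=files.txt
```

Keeping the URLs in a file means an interrupted run can be restarted with the same command, and `-c` resumes any partially downloaded file.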

Setting up a Hadoop cluster is easy with a bit of familiarity with system and network administration. It's all interesting; the only frustrating part is downloading the patches after installing the OS and then downloading the packages for the software that runs on top of it. The downloads can add up to close to a GB, which can take anywhere from a couple of minutes to hours depending on the internet bandwidth.

Here is where caching tools really help. They cache the downloaded packages on a designated local machine (let's call it the cache server), and the other machines point to the cache server to get the packages. This way the packages are downloaded from the internet only the first time, and from then on they are served from the local cache server. This approach not only saves network bandwidth but also makes the whole installation process faster.

For Debian systems, apt-cacher-ng is designed to cache packages and is really easy to install and configure. Here are the steps involved:
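
A minimal sketch of a typical setup, assuming the cache server is reachable as `cache-server.local` (a hypothetical hostname) and apt-cacher-ng listens on its default port 3142. The file name `01proxy` is only a common convention. The script writes the client-side proxy setting to a local file so it can be inspected before being copied into `/etc/apt/apt.conf.d/` on each client.

```shell
#!/bin/sh
# On the cache server (run once, needs root):
#   sudo apt-get install apt-cacher-ng
# On each client, APT is pointed at the cache server with a proxy setting.

CACHE_HOST="cache-server.local"   # hypothetical hostname, replace with your cache server
CACHE_PORT=3142                   # default apt-cacher-ng port
PROXY_FILE="./01proxy"            # later copied to /etc/apt/apt.conf.d/01proxy on the client

# Write the APT proxy directive for HTTP package downloads.
printf 'Acquire::http::Proxy "http://%s:%s";\n' "$CACHE_HOST" "$CACHE_PORT" > "$PROXY_FILE"

# Show the generated setting for review.
cat "$PROXY_FILE"
```

Once the file is in place on a client, the next `apt-get update` and package installs go through the cache server.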

Friday, 14 March 2014

Step –> 1: Download and Install
Download Hive from the Apache download mirror. I placed it in the /home/bigdata/Installations/ directory.

$ cd /home/bigdata/Installations

$ wget http://redrockdigimark.com/apachemirror/hive/stable/apache-hive-1.2.1-bin.tar.gz

(I preferred Hive 1.2.1, as it is the stable version.)

$ sudo tar xzf apache-hive-1.2.1-bin.tar.gz

Step –> 2: After downloading and extracting Hive, edit the hive-env.sh file to configure it. I installed Hive in the directory above and gave the bigdata user permission on it. In $HIVE_HOME/conf/hive-env.sh:

export JAVA_HOME=/opt/jdk1.80_10
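
The edit to hive-env.sh can also be scripted. The sketch below is one possible approach, not the only one: it uses a scratch directory `./hive-demo` (hypothetical) in place of the real `$HIVE_HOME`, copies `hive-env.sh.template` if it is present, and appends the `JAVA_HOME` line only when it is not already there.

```shell
#!/bin/sh
# Scratch location used for illustration; replace with your real HIVE_HOME.
HIVE_HOME="./hive-demo"
CONF="$HIVE_HOME/conf"
mkdir -p "$CONF"

# Start from the shipped template if it exists and hive-env.sh does not.
if [ ! -f "$CONF/hive-env.sh" ] && [ -f "$CONF/hive-env.sh.template" ]; then
    cp "$CONF/hive-env.sh.template" "$CONF/hive-env.sh"
fi

# Append the JAVA_HOME export only if one is not already present,
# so running the script twice does not duplicate the line.
if ! grep -q '^export JAVA_HOME=' "$CONF/hive-env.sh" 2>/dev/null; then
    echo 'export JAVA_HOME=/opt/jdk1.80_10' >> "$CONF/hive-env.sh"
fi

# Show the resulting setting.
grep '^export JAVA_HOME=' "$CONF/hive-env.sh"
```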