Technology

7 reasons why we should use Hadoop Streaming with Unix

Hadoop Streaming is a utility that comes with Hadoop. It lets you use any executable program as the mapper or reducer of a job, so you can do big data analysis with languages like Java, Python, PHP, Scala, Perl and many more. It also supports Unix commands and shell scripts. I have used Hadoop Streaming with Unix commands and shell scripts extensively and enjoyed it for several reasons, so I would like to share the benefits of Hadoop Streaming with Unix.

1. Availability
If you want to use a tool or technology other than MapReduce, you may reach for Hive or Pig. If you go for Hive or Pig, you have to install and manage them separately (if you use a vendor Hadoop distribution, you get them by default). Otherwise, you can use Hadoop Streaming with Unix, which you do not need to install separately because it comes with Hadoop.

2. Learning
You do not need to learn a new tool or technology like Hive or Pig unless you have a serious requirement for it. You can leverage your existing Unix skills for data analysis on Hadoop.

3. Less development time
To develop a Java MapReduce application, you have to compile your code, unit test it, package it, export a jar file and finally run it. With Hadoop Streaming and Unix, you can develop an application quickly by writing the mapper and reducer commands directly in the -mapper and -reducer options.

4. Quick conversion
Because development time is short, we can quickly convert data from one format to another. I heavily used it to convert data from text to sequence files and from sequence files to text. The -inputformat and -outputformat options of Hadoop Streaming handle this.

5. Testing data
For the same reason, we can quickly test input and output data by using Hadoop Streaming with Unix commands or shell scripts.

6. Simple business requirements
For simple business requirements, such as simple filtering and aggregation operations, we can always use Hadoop Streaming with Unix.

7. Performance
Finally, I read somewhere that Hadoop Streaming with Unix performs better than Java MapReduce, Hive and Pig, though I have not tested this myself.

So try Hadoop Streaming with Unix if you have any of the requirements above. For more details on how to use Hadoop Streaming with Unix, read it. Happy Hadooping, friends.
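The following Python sketch makes points 3, 5 and 6 concrete. It is not the author's code. It reproduces locally what a streaming job built from Unix commands computes, assuming a hypothetical job run with -mapper "cut -f1" and -reducer "uniq -c" over tab-separated log lines. Streaming pipes each input line to the mapper's stdin, sorts the mapper output by key, and then pipes the sorted lines to the reducer. A grep-style filter mapper is included for the simple filtering case. The function names and sample records are illustrative only.

```python
from itertools import groupby

# Local emulation of a hypothetical Hadoop Streaming job such as:
#   -mapper "cut -f1"  -reducer "uniq -c"
# Streaming feeds input lines to the mapper, sorts the mapper output by key,
# then feeds the sorted lines to the reducer.


def cut_f1(lines):
    """Mapper: like `cut -f1`, keep only the first tab-separated field."""
    for line in lines:
        yield line.split("\t", 1)[0]


def grep(pattern):
    """Mapper factory: like `grep -F PATTERN`, keep lines containing pattern."""
    def mapper(lines):
        for line in lines:
            if pattern in line:
                yield line
    return mapper


def uniq_c(sorted_lines):
    """Reducer: like `uniq -c` (without its leading padding),
    count runs of identical adjacent lines."""
    for key, group in groupby(sorted_lines):
        yield f"{sum(1 for _ in group)} {key}"


def run_streaming_job(lines, mapper, reducer):
    """Map, then sort (standing in for the shuffle), then reduce.
    With cut -f1 each mapper output line is its own key."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    records = [
        "ERROR\tdisk full",
        "INFO\tjob started",
        "ERROR\ttimeout",
        "WARN\tslow node",
        "INFO\tjob finished",
    ]
    # Aggregation: count records per log level.
    for out in run_streaming_job(records, cut_f1, uniq_c):
        print(out)
    # Filtering: a map-only job keeping ERROR lines.
    for out in grep("ERROR")(records):
        print(out)
```

Running the script first prints 2 ERROR, 2 INFO and 1 WARN, one per line, and then the two ERROR records. The same map, sort and reduce chain is why a streaming job can often be tried locally with a Unix pipeline such as cat input | mapper | sort | reducer before it is submitted to the cluster.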
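For point 4, the sketch below assembles the kind of map-only streaming command that converts between text and sequence files, using /bin/cat as an identity mapper. The options -input, -output, -mapper, -numReduceTasks, -inputformat and -outputformat are standard Hadoop Streaming options. SequenceFileAsTextInputFormat and SequenceFileOutputFormat are classes in the older org.apache.hadoop.mapred API. The jar name and HDFS paths are hypothetical placeholders. The code only builds the command; it does not run it.

```python
import shlex

# Hypothetical jar location: the real path depends on the Hadoop version
# and distribution (often somewhere under share/hadoop/tools/lib/).
STREAMING_JAR = "hadoop-streaming.jar"

SEQ_AS_TEXT_IN = "org.apache.hadoop.mapred.SequenceFileAsTextInputFormat"
SEQ_OUT = "org.apache.hadoop.mapred.SequenceFileOutputFormat"


def conversion_command(src, dst, to_sequence):
    """Build a map-only streaming job that copies records with /bin/cat,
    changing only the input or output format."""
    cmd = [
        "hadoop", "jar", STREAMING_JAR,
        "-input", src,
        "-output", dst,
        "-mapper", "/bin/cat",      # identity mapper
        "-numReduceTasks", "0",     # map-only: no sort or reduce phase
    ]
    if to_sequence:
        # Text in (default TextInputFormat), SequenceFile out.
        cmd += ["-outputformat", SEQ_OUT]
    else:
        # SequenceFile in, keys and values converted to text; text out.
        cmd += ["-inputformat", SEQ_AS_TEXT_IN]
    return cmd


if __name__ == "__main__":
    # Hypothetical HDFS paths, for illustration only.
    print(shlex.join(conversion_command("/data/logs_txt", "/data/logs_seq", True)))
    print(shlex.join(conversion_command("/data/logs_seq", "/data/logs_txt2", False)))
```

Keeping the job map-only with an identity mapper means the records are copied unchanged. Only the format classes on each side determine how they are read and written.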