Series: Hadoop Streaming

[This entry is part 4 of 4 in the series Hadoop Streaming] MapReduce with Hadoop Streaming in bash – Bonus! To conclude my three-part series on writing MapReduce jobs as shell scripts for use with Hadoop Streaming, I’ve decided to throw together a video tutorial on running the jobs we’ve

[This entry is part 3 of 4 in the series Hadoop Streaming] In our first MapReduce with Hadoop Streaming in bash article, we took a collection of Stephen Crane poems and used a MapReduce job to calculate ‘term frequency’, meaning we counted the number of times each word in the collection appeared

[This entry is part 2 of 4 in the series Hadoop Streaming] In MapReduce with Hadoop Streaming in bash – Part 1 we found the ‘term frequency’ of words within a collection of documents. For the documents I chose 8 Stephen Crane poems, and our bash Map and Reduce jobs tokenized

[This entry is part 1 of 4 in the series Hadoop Streaming] So, to commemorate my recent certification, and because my Java absolutely sucks, I decided to implement a common algorithm using Hadoop Streaming. Hadoop Streaming Hadoop Streaming allows you to write MapReduce code in any language that can process stdin
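
To give a feel for what the series builds, here is a minimal sketch of a term-frequency job in bash. It is not the code from the articles: the function names `mapper` and `reducer`, the tokenizing rules (letters only, lowercased), and the sample input are all assumptions made for illustration. The mapper reads text on stdin and emits one tab-separated `word 1` pair per token; the reducer relies on its input arriving sorted by key and sums the counts for each run of identical words.

```shell
#!/usr/bin/env bash

# Mapper (hypothetical name): read raw text from stdin and emit
# one "word<TAB>1" line per token.
mapper() {
  tr -cs '[:alpha:]' '\n' |        # turn every run of non-letters into a newline
    tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" share a key
    awk 'NF { print $0 "\t1" }'    # skip empty lines, append the count
}

# Reducer (hypothetical name): input must be sorted by key, so equal
# words are adjacent. Sum the counts for each run of the same word.
reducer() {
  awk -F'\t' '
    $1 != key { if (key != "") print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (key != "") print key "\t" sum }
  '
}

# Local dry run: plain sort stands in for the shuffle-and-sort step
# that the framework performs between map and reduce.
printf 'the cat\nThe dog\n' | mapper | LC_ALL=C sort | reducer
```

The dry run on the last line prints `cat 1`, `dog 1`, and `the 2`, one per line with a tab between word and count. Testing with a local pipe like this before submitting to a cluster is a cheap way to catch tokenizing mistakes, since both stages only ever see stdin and stdout.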