Re: Python Streaming

The bottom of the stack trace says: Caused by: java.io.IOException: Cannot run program "/tmp/mapper.py": error=2, No such file or directory. How many nodes are there in this cluster? Have you copied mapper.py to all of them? It also needs to be executable (chmod 755 mapper.py).

I didn't copy the scripts to all nodes, because I expected the -files option to copy them to HDFS, where they are reachable from all nodes by default. As far as I know, that is the normal behavior. I am sure the script is executable; I even set it to 777.
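For anyone hitting the same error=2, here is a minimal sketch of a pre-flight check that can be run on a node against the script path. The helper name check_script is hypothetical and not part of Hadoop. It looks for three common causes: the file is missing, it is not executable, or its first line is not a usable shebang (a Windows line ending on the shebang line can also produce "No such file or directory"). As far as I know, -files places the script in each task's working directory, so it may also be worth trying -mapper mapper.py instead of the absolute /tmp path.

```python
import os


def check_script(path):
    """Return a list of problems that could stop Hadoop from launching `path`.

    Hypothetical helper for illustration; an empty list means no problem found.
    """
    if not os.path.isfile(path):
        return ["%s: no such file on this node" % path]

    problems = []
    if not os.access(path, os.X_OK):
        problems.append("%s: not executable (try chmod 755)" % path)

    # Read only the first line, in binary, so line endings are preserved.
    with open(path, "rb") as handle:
        first_line = handle.readline()

    if not first_line.startswith(b"#!"):
        problems.append("%s: missing shebang line such as #!/usr/bin/env python" % path)
    elif first_line.endswith(b"\r\n"):
        problems.append("%s: shebang line ends with CRLF (Windows line ending)" % path)
    return problems


if __name__ == "__main__":
    import sys

    for script in sys.argv[1:]:
        for problem in check_script(script) or ["%s: looks fine" % script]:
            print(problem)
```

Run it as "python check_script.py /tmp/mapper.py" on each node where the task failed.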

Re: Python Streaming

I renamed my mapper and reducer to jpm.py and jpr.py to make sure my spelling is right. In the local "cat" pipeline test, the reducer step doesn't work unless it's preceded by "python". Then it completes successfully.
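A script that only runs when prefixed with "python" is often missing a shebang line, which Hadoop Streaming also needs if the script is passed directly as the mapper or reducer. A minimal sketch of a mapper with a shebang follows. The word-count logic and the name map_lines are assumptions for illustration, since the real contents of jpm.py are not shown in the thread.

```python
#!/usr/bin/env python
# The shebang above must be the very first line of the file.
import sys


def map_lines(lines):
    """Yield one tab-separated "word<TAB>1" record per word (illustrative logic)."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


if __name__ == "__main__":
    # Hadoop Streaming feeds input on stdin and collects records from stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

With the shebang in place and the file marked executable, "cat input.txt | ./jpm.py" should behave the same as "cat input.txt | python jpm.py" (input.txt is a placeholder name).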

Running Hadoop MapReduce from the command line, I've gotten the job to complete, but it yields no results. I cut the reducer down to just pass on whatever comes from the mapper. The job still completes, but the output is empty (file size = 0). When I remove the reducer completely, I get what I expect from the mapper.
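For comparison, a pass-through reducer can be as small as the sketch below. This is an assumption about what such a reducer might look like, not the jpr.py from this thread. It also writes a record count to stderr, which should show up in the task logs and helps tell "reducer received nothing" apart from "reducer wrote nothing".

```python
#!/usr/bin/env python
import sys


def pass_through(lines, out):
    """Copy every input line to `out` unchanged and return how many were seen."""
    count = 0
    for line in lines:
        # Make sure each record ends with a newline so records do not merge.
        out.write(line if line.endswith("\n") else line + "\n")
        count += 1
    return count


if __name__ == "__main__":
    seen = pass_through(sys.stdin, sys.stdout)
    sys.stdout.flush()
    # stderr is not part of the job output, so this is safe for debugging.
    sys.stderr.write("reducer passed through %d records\n" % seen)
```

If the count in the logs is zero, the problem is upstream of the reducer; if it is non-zero and the output file is still empty, the reducer's stdout is the place to look.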

I'd like to progress to the GUIs and get a taste of Pig and Hive in Cloudera by the end of the month. I think I'm going to try all over again with a fresh VM.

Re: Python Streaming

I have a similar problem: my Python code works fine when I run it locally using the cat command, but it does not work when I run it on Hadoop. Please find below my code, the error, the command used to run the program, and the permissions on my files: