Compression is irrelevant to YARN itself. If you want to store files compressed, you should compress them when they are loaded into HDFS. Files on HDFS are compressed according to the codecs listed in the "io.compression.codecs" property set in core-site.xml. If you want to use a particular compression format, set "STORED AS INPUTFORMAT" to the corresponding input-format class that handles that compression, such as "com.hadoop.mapred.DeprecatedLzoTextInputFormat".

1. You should compress each file in the dir rather than the whole dir.
2. Considering compression ratio, bzip2 > gzip > lzo; however, decompression speed is in the opposite order, so you need to balance the two. gzip is a popular choice as far as I know.
3. No, there is no need.
4. Yes, and the process is transparent to users.

2013/10/16 xeon <[EMAIL PROTECTED]>
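For illustration, compressing each file individually before loading might look like the sketch below (file names and the HDFS target path are just examples):

```shell
# Create a sample input dir (illustrative paths only).
mkdir -p dir1
printf 'hello world\n' > dir1/file1.txt
printf 'foo bar\n'     > dir1/file2.txt

# Compress each file separately; -c writes to stdout,
# so the original files are kept alongside the .gz copies.
for f in dir1/*.txt; do
  gzip -c "$f" > "$f.gz"
done

ls dir1

# Then load only the compressed files, e.g.:
# hdfs dfs -put dir1/*.gz /user/you/dir1/
```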

> Hi,
>
> I want to execute the wordcount in yarn with compression enabled with a dir
> with several files, but for that I must compress the input.
>
> dir1/file1.txt
> dir1/file2.txt
> dir1/file3.txt
> dir1/file4.txt
> dir1/file5.txt
>
> 1 - Should I compress the whole dir or each file in the dir?
>
> 2 - Should I use gzip or bzip2?
>
> 3 - Do I need to setup any yarn configuration file?
>
> 4 - when the job is running, are the files decompressed before running the
> mappers and compressed again after the reducers have executed?
>
> --
> Thanks,
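As a reference for question 3, the codecs a cluster supports are listed in the "io.compression.codecs" property of core-site.xml. A typical entry might look like the sketch below; the first three codec classes ship with Hadoop, while the LZO codec class comes from the separate hadoop-lzo package and is included here only as an example:

```xml
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
```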
