Allow writing to output directories that exist, as long as they are empty

Description

FileOutputFormat.checkOutputSpecs currently fails if the path specified by mapred.output.dir exists at the start of the job. This protects against accidentally overwriting existing data. There seems to be no harm, then, in slightly relaxing the check to allow the output path to exist as long as it is an empty directory.
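The relaxed rule can be sketched as follows. This is only an illustration of the proposed logic, not the actual patch: it uses java.nio.file on a local filesystem instead of Hadoop's FileSystem API so that it runs standalone, and the class name RelaxedOutputCheck and method checkOutputDir are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RelaxedOutputCheck {

    // Hypothetical helper, not part of Hadoop: accepts a missing path or an
    // empty directory and rejects everything else.
    static void checkOutputDir(Path outDir) throws IOException {
        if (!Files.exists(outDir)) {
            return; // same as the current behavior: a non-existent output path is fine
        }
        if (!Files.isDirectory(outDir)) {
            throw new FileAlreadyExistsException(outDir + " exists and is not a directory");
        }
        // Relaxation proposed in this issue: an existing directory is allowed
        // only if it has no entries, so no existing data can be overwritten.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outDir)) {
            if (entries.iterator().hasNext()) {
                throw new FileAlreadyExistsException(outDir + " exists and is not empty");
            }
        }
    }
}
```

A real change would presumably express the same three cases (missing, empty directory, anything else) against the job's configured output path in both the old and new FileOutputFormat classes.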

Ian Nowland
added a comment - 12/Jun/09 00:22 Can do. One question I have is that there exists an org.apache.hadoop.mapred.TestFileOutputFormat but not a corresponding org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat. Should I copy over the existing test to the new location? Also, should I make my source change in both the old mapred and the newer mapreduce files?

Tom White
added a comment - 24/Aug/09 14:11 I don't see why we wouldn't make this change to both old and new APIs.
There is a precedent for having the same test for the old and new APIs (e.g. the one for LazyOutput), so yes, I would create a new org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat.