Activity


Daryn Sharp
added a comment - 27/Mar/12 14:57 Koji, what is the ___ command? A copy, and you omitted the source for brevity? Does one of the variables in the target reference _temporary? I'll probably need to investigate further after your clarification. Thanks.

Koji Noguchi
added a comment - 27/Mar/12 01:26 Todd, Daryn,
When users create their own outputs inside the streaming script, we ask them to use _temporary.
"hadoop dfs -___ ${mapred_work_output_dir}/mycutom_path-${mapred_tip_id}" or something like that.
I do not want to see _temporary left behind when jobs finish successfully.


Todd Lipcon
added a comment - 26/Mar/12 19:56 Just to confirm: you mean a real conf key, not a cmdline flag, right?
Yep - something that could be set system-wide in core-site.xml. When users upgrade, they expect they may have to tweak some confs for the new version, but it's harder to ask them to change all of their shell scripts.
In either case it will be a "change now or change later" scenario
Right. The idea is that they would have some warning (a full major version) before their code stops working. Our general policy is to only make the breaking change after having the deprecated support for a full major version – in which case it would go away in 0.24.0.
Would this bring back the issue of left out _temporary dirs? (MAPREDUCE-1272)
I would think the MR task would be using the new non-deprecated API which doesn't recursively create parents.
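
If such a key were added, it would presumably live in core-site.xml along these lines. This is a sketch only: the property name and wording are hypothetical, not an actual Hadoop key from this patch.

```xml
<!-- Hypothetical example; the key name is illustrative, not a real Hadoop property. -->
<property>
  <name>fs.shell.copy.create.parent</name>
  <value>true</value>
  <description>
    Deprecated. Restores the pre-0.23 FsShell behavior of auto-creating
    missing parent directories on copy/put. A warning is logged when the
    old behavior is used; support is slated for removal in a later release.
  </description>
</property>
```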

Koji Noguchi
added a comment - 26/Mar/12 19:30 I'd be in favor of a deprecated config flag which restores the old behavior
Would this bring back the issue of left out _temporary dirs? (MAPREDUCE-1272)


Daryn Sharp
added a comment - 26/Mar/12 19:30 Just to confirm: you mean a real conf key, not a cmdline flag, right? In either case it will be a "change now or change later" scenario. I'm curious when you would anticipate a good time to remove a new-but-deprecated conf. Would it live all the way through 23? As Eli pointed out, an alpha may be the best time to remove the behavior. Thoughts?


Todd Lipcon
added a comment - 26/Mar/12 19:17 If it's not too huge a pain, I'd be in favor of a deprecated config flag which restores the old behavior (while emitting a warning that it's deprecated and to be removed in a future version). This will help people migrate to 0.23, since I'm sure there are lots of cases where people have shell scripts running as part of production workflows.


Eli Collins
added a comment - 26/Mar/12 19:03 I like the new behavior as well, and I suppose an alpha is the best time to introduce it. How about updating this jira with a release note then? We'll file a Pig jira to update the tests to work against 23.


Daryn Sharp
added a comment - 26/Mar/12 15:33 @Eli: I believe the auto-create was removed with the earlier redesign but crept back in later, so I didn't view this particular jira as incompatible. Since the goal has been to mimic the real shell, this non-POSIX "feature" was intentionally (again?) removed to solve this jira.
It also resolves the inconsistency between single-source and multi-source copies. A single-source copy formerly auto-created non-existent dirs, while a multi-source copy failed if the target wasn't an existing dir. Ergo, copying a glob to a non-existent path was indeterminate depending on the number of glob matches: 1 match auto-created, more than 1 match failed.
Internally, we found the impact to be minimal (a single mkdir), so we favored retaining the POSIX semantics of copy. Some relied on the auto-create because mkdir failed if the directory already existed, so HADOOP-8175 added a -p flag.
If you feel strongly about restoring the former non-POSIX behavior and single/multi-source inconsistency, I can look into doing so.
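
The semantics Daryn describes match plain POSIX tools, so they can be sanity-checked locally with coreutils cp and mkdir (a local-filesystem sketch, no HDFS involved):

```shell
#!/bin/sh
# Check the POSIX copy semantics FsShell now mimics, using a local temp dir.
work=$(mktemp -d)
cd "$work"
echo a > a
echo b > b

# Single source to a non-existent target: treated as a rename, not auto-mkdir.
cp a nosuchdir_file
test -f nosuchdir_file && echo "single source: target created as a plain file"

# Multiple sources require the target to be an existing directory.
if cp a b nosuchdir 2>/dev/null; then
  echo "multi source: unexpectedly succeeded"
else
  echo "multi source: fails when the target dir is missing"
fi

# mkdir fails on an existing dir; -p (cf. HADOOP-8175) makes it idempotent.
mkdir d
mkdir d 2>/dev/null || echo "mkdir: fails on an existing dir"
mkdir -p d && echo "mkdir -p: succeeds on an existing dir"

cd /
rm -rf "$work"
```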

Eli Collins
added a comment - 24/Mar/12 01:36 Hey Daryn,
Should we restore the old behavior for 23? This is going to break people, e.g. a script that
does "hadoop fs -put file1 dir1/file1" without dir1 already created used to work and won't anymore.
Thanks,
Eli
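
Eli's example has an exact local analogue, which makes the breakage (and the one-line script fix) easy to demonstrate with plain cp rather than the Hadoop CLI (a sketch, not the actual FsShell code path):

```shell
#!/bin/sh
# Local analogue of `hadoop fs -put file1 dir1/file1` with dir1 missing.
work=$(mktemp -d)
cd "$work"
echo data > file1

if cp file1 dir1/file1 2>/dev/null; then
  echo "copy into missing dir1 succeeded"
else
  echo "copy into missing dir1 fails (no parent auto-create)"
fi

# The migration fix for scripts: create the parent explicitly first.
mkdir -p dir1 && cp file1 dir1/file1 && echo "works after mkdir -p dir1"

cd /
rm -rf "$work"
```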


Bilung Lee
added a comment - 21/Mar/12 23:43 Marking this as an incompatible change, as the code now throws an exception where it previously did not (when the parent directory doesn't exist). The unit tests and the e2e tests of Pig will be broken when using a Hadoop with this patch.

Hadoop QA
added a comment - 02/Mar/12 17:30 +1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12516856/HADOOP-8131.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/659//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/659//console
This message is automatically generated.

Hadoop QA
added a comment - 02/Mar/12 16:06 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12516848/HADOOP-8131.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.fs.TestFsShellReturnCode
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/658//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/658//console
This message is automatically generated.

Hadoop QA
added a comment - 02/Mar/12 15:47 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12516846/HADOOP-8131.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.fs.viewfs.TestViewFsTrash
org.apache.hadoop.fs.TestFsShellReturnCode
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/657//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/657//console
This message is automatically generated.


Daryn Sharp
added a comment - 02/Mar/12 15:44 Removed unnecessary commented code in the test. FsShell dumps to stdout/stderr so I had to print to stdout to properly interlace the output. That's also what other FsShell tests do.
The equals stuff is checking the basename of the path to catch ., .., dir/, dir/., dir/.. since Path will eat the chars off when it normalizes and/or fully qualifies. That makes it "look" like a non-existent file. The shell tracks the exact path supplied by the user, so it checks that basename for "it's a dir" conditions.
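
The cases listed can be reproduced with POSIX basename, which strips the same trailing components that Path normalization eats (a sketch standing in for the shell's check, not Hadoop code):

```shell
#!/bin/sh
# Trailing "/", ".", and ".." normalize away, so a path that names a
# (possibly non-existent) directory can end up looking like a file path.
# Checking the basename of the exact user-supplied path recovers the intent.
for p in dir/ dir/. dir/.. . ..; do
  b=$(basename "$p")
  case "$b" in
    .|..) echo "$p -> basename '$b': must be treated as a directory" ;;
    *)    echo "$p -> basename '$b': trailing slash already stripped" ;;
  esac
done
```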


Robert Joseph Evans
added a comment - 02/Mar/12 15:33 From a quick look at the code it mostly looks good. I don't like the name.equals in the checking for a parent, but like the comment says there really is no other way I can think of to know if it is a directory or not, especially a directory that does not exist.
I also noticed some commented-out code in the tests, and some calls to System.out. Please remove the commented-out code, and verify that all of the calls to System.out are what you want. It might be better to change them to log statements.
I will look at the logic more closely now.


Daryn Sharp
added a comment - 02/Mar/12 00:45 When using a 1-arg put, the shell defaults to using the pwd. If the target doesn't exist, the shell assumes it's a rename of the src file. So the temp file "$p.COPYING" becomes "..COPYING". The move at the end of the copy tries to rename the temp file to ".", which is interpreted as the pwd, so the rename is a no-op.
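
The collapse of the temp-file name is plain string concatenation on the defaulted target, using the ".COPYING" suffix as written in the comment (a sketch of the naming, not the actual FsShell code):

```shell
#!/bin/sh
# With a 1-arg put the target defaults to the pwd, i.e. ".".
target="."
tmp="$target.COPYING"   # "$p.COPYING" with p="." yields "..COPYING"
echo "temp file name: $tmp"
# Renaming "..COPYING" back to "." resolves to the pwd itself, so the
# final rename step of the copy is effectively a no-op.
```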