After digging more into this with @hashutosh's help, we see the following issues:

1. The hadoop archive command line has changed.
2. There is no way in the current set of commands supported by hive for a user to specify a parent directory for the archive.
3. The api createHadoopArchive in all shims is the same which is counter-intuitive.

The hadoop archive command line changed between version 0.20 and versions 0.20S/1.0/0.23. The later versions require an additional command line parameter, -p. Since Hive currently drives these versions with the same command line as 0.20 (without the -p), the archive command fails on them. This needs to be fixed in the createHadoopArchive api.

The createHadoopArchive api also has the issue that it checks hive.archive.har.parentdir.settable. With the current set of commands, however, the user has no way of setting a parent directory for the creation of the archive. So, in the future when that ability is added, we need to either revisit the createHadoopArchive api itself or derive the parent directory from conf.

The createHadoopArchive api is the same across all the shims, i.e. Hadoop20Shims.java and HadoopShimsSecure.java have the exact same implementation of this api, which is counter-intuitive considering the shims are supposed to be specific to versions of hadoop.

So, I propose that, at this time, we fix createHadoopArchive in HadoopShimsSecure to adhere to the new command line expected by those versions of Hadoop. We should also fix the Hadoop20Shims api to not worry about the -p parameter since it cannot use it.
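
To make the proposal concrete, here is a rough sketch of how the two shims might build different argument lists. This is not the actual shim code: the class name, method names and paths are made up for illustration, and the -p form assumes the usage "archive -archiveName NAME -p <parent> <src>* <dest>" as I understand it for the later versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Hive's shim implementation.
class ArchiveArgsSketch {

    // Hadoop 0.20 style (assumed): -archiveName NAME <src> <dest>
    // No -p here, since 0.20 does not accept it.
    static List<String> legacyArgs(String archiveName, String srcDir, String destDir) {
        List<String> args = new ArrayList<>();
        args.add("-archiveName");
        args.add(archiveName);
        args.add(srcDir);
        args.add(destDir);
        return args;
    }

    // 0.20S/1.0/0.23 style (assumed): -archiveName NAME -p <parent> <src>* <dest>
    // Sources are given relative to the parent; the list may be empty.
    static List<String> parentAwareArgs(String archiveName, String parentDir,
                                        List<String> relativeSrcs, String destDir) {
        List<String> args = new ArrayList<>();
        args.add("-archiveName");
        args.add(archiveName);
        args.add("-p");
        args.add(parentDir);
        args.addAll(relativeSrcs);
        args.add(destDir);
        return args;
    }

    public static void main(String[] unused) {
        // Made-up paths, only to show the shape of each command line.
        System.out.println(String.join(" ",
            legacyArgs("data.har", "/warehouse/tbl/ds=1", "/tmp/har-out")));
        System.out.println(String.join(" ",
            parentAwareArgs("data.har", "/warehouse/tbl", List.of("ds=1"), "/tmp/har-out")));
    }
}
```

With something along these lines, Hadoop20Shims would only ever use the first form and HadoopShimsSecure only the second, so neither shim needs to check whether -p is usable.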

Vikram Dixit K
added a comment - 25/Jul/12 20:36
Please let me know if I am missing something.

Due to this, archive_multi.q (and others) fail for 0.23. The error for archive_multi.q is below. It looks to be because HarFileSystem (potentially through makeQualified) adds an extra : to the authority, and this extra colon trips up HiveFileFormatUtils.getPartitionDescFromPathRecursively. See "cannot find dir = har://pfile-localhost:/" below versus the paths in pathToPartitionInfo "har://pfile-localhost/".