I am excited to announce that the feature development will soon be completed. Please see the jira for the design and the details of the subtasks. This is a heads up about the merge vote mail that will soon be sent.

Next steps, before calling for merge vote, we need to get the following done:
- Add user documentation that describes the feature, and how to use it
- Complete some of the pending tasks
- Continue testing the feature and fix any bugs that might come up
- Update the design document

Thanks to everyone who has participated in design and development of this feature. Please review the work and help in testing the feature.

I'm very excited to see that this project is nearing completion. I've been following the development pretty closely and am very much looking forward to getting this merged to trunk.

One thing that I do think we should address before the merge is moving the programmatic APIs for working with snapshots. I've brought this up before, and was told that it would be done in a separate JIRA, but I don't think that JIRA was ever filed.

As it stands right now, the API for using snapshots is the following:

1. The APIs to create/delete/rename snapshots are in FileSystem.
2. The API to mark directories as snapshottable or not only exists in DistributedFileSystem and DFSAdmin, neither of which are intended to be public APIs.

In my opinion (and I think this was shared by others at the last snapshots design meetup?) we should move #1 out of the FileSystem class since these are primarily administrative APIs, and it is unlikely that any other FileSystem implementation besides HDFS will ever implement these commands. Also, #2 should really be in some public (not necessarily stable, but public) class for use by tools which are used to administer HDFS. In my opinion the most natural place for both of these APIs is in the HdfsAdmin class, which is a public/evolving interface explicitly for these sorts of operations.

> Support for snapshots feature is being worked on in the jira
> https://issues.apache.org/jira/browse/HDFS-2802. This is an important and
> large feature in HDFS. Please see a brief presentation that describes the
> feature at a high level from the Snapshot discussion meetup we had a while
> back -
> https://issues.apache.org/jira/secure/attachment/12552861/Snapshots.pdf.
>
> I am excited to announce that the feature development will soon be
> completed. Please see the jira for the design and the details of the
> subtasks. This is a heads up about the merge vote mail that will soon be
> sent.
>
> Details of development and testing:
> Development has been done in a separate branch -
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2802. The
> design is posted at -
> https://issues.apache.org/jira/secure/attachment/12551474/Snapshots20121030.pdf.
> The feature development has involved close to 100 subtasks and close to 20K
> lines of code.
>
> A lot of unit tests have been added as a part of the feature. We also have
> been testing this in a cluster of 5 nodes with a long running test that
> mimics a real cluster usage with emphasis on use cases related to
> snapshots. Please see the test plan
> https://issues.apache.org/jira/secure/attachment/12575442/snapshot-testplan.pdf
> for the details.
>
> Next steps, before calling for merge vote, we need to get the following
> done:
> - Add user documentation that describes the feature, and how to use it
> - Complete some of the pending tasks
> - Continue testing the feature and fix any bugs that might come up
> - Update the design document
>
> Thanks to everyone who has participated in design and development of this
> feature. Please review the work and help in testing the feature.
>
> Regards,
> Suresh

Currently, allowSnapshot(..) and disallowSnapshot(..) are already in HdfsAdmin. The other operations createSnapshot(..), renameSnapshot(..) and deleteSnapshot(..) are actually user operations and they are declared in FileSystem. Users can take snapshots for their own directories once an admin has allowed snapshots for those directories. Snapshot is not an HDFS-specific operation. Many other file systems do support it. No?
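To make the split described above concrete, here is a minimal toy model of the two-step flow: an admin-side call marks a directory snapshottable, and only then can the owner take a snapshot. This is a sketch, not Hadoop code. The class ToyDirectory, the "superuser" caller name, and the ownership check are assumptions made for illustration, and the returned ".snapshot" path is used only as an example convention.

```java
// Toy in-memory model of the two-step flow. Illustrative stand-in only;
// none of these classes are part of Hadoop.
final class ToyDirectory {
    final String path;
    final String owner;
    private boolean snapshottable; // flipped only by the admin-side call
    private int snapshotCount;

    ToyDirectory(String path, String owner) {
        this.path = path;
        this.owner = owner;
    }

    // Admin-side operation, analogous in spirit to allowSnapshot(..).
    void allowSnapshot(String caller) {
        if (!"superuser".equals(caller)) {
            throw new SecurityException(caller + " may not mark " + path + " snapshottable");
        }
        snapshottable = true;
    }

    // User-side operation, analogous in spirit to createSnapshot(..).
    String createSnapshot(String caller, String snapshotName) {
        if (!snapshottable) {
            throw new IllegalStateException(path + " is not snapshottable");
        }
        if (!owner.equals(caller)) {
            throw new SecurityException(caller + " does not own " + path);
        }
        snapshotCount++;
        // Assumed example convention for where the snapshot becomes visible.
        return path + "/.snapshot/" + snapshotName;
    }

    int snapshotCount() {
        return snapshotCount;
    }
}

final class SnapshotWorkflowDemo {
    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory("/user/alice/data", "alice");
        dir.allowSnapshot("superuser");                        // admin step
        System.out.println(dir.createSnapshot("alice", "s0")); // user step
    }
}
```

In this model the admin call and the user call have different callers and different preconditions, which is the distinction being drawn between the two groups of methods.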

> Currently, allowSnapshot(..) and disallowSnapshot(..) are already in HdfsAdmin.

Ah, my bad. Not sure how I missed those. Good to see. Though, now that I look at them, those methods should really be taking Paths as arguments, not Strings. This is obviously quite minor, though.

> The other operations createSnapshot(..), renameSnapshot(..) and
> deleteSnapshot(..) are actually user operations and they are declared in
> FileSystem. Users can take snapshots for their own directories once admin
> has allowed snapshots for those directories. Snapshot is not a
> HDFS-specific operation. Many other file systems do support it. No?

Certainly other "file systems" support it, e.g. WAFL, ZFS, etc, but do other "FileSystem" (the Hadoop class) implementations, e.g. LocalFileSystem, S3FileSystem, etc? Will they ever? If they do, will they support sub-tree snapshots like HDFS does? Snapshots in general seem like something whose implementation, interface, etc. are highly file system-specific, and thus I don't think it makes a ton of sense to put that API in what is intended to be a broad, stable interface. If we were to move these operations into the HdfsAdmin interface, there's nothing to stop users from using that interface instead of FileSystem. After all, that was the point of adding the HdfsAdmin class in the first place - to have a public API for performing HDFS-specific operations.

--
Aaron T. Myers
Software Engineer, Cloudera

> HdfsAdmin is also for admin operations. However, createSnapshot etc
> methods aren't.

I agree that they're not administrative operations in the sense that they don't strictly require super user privilege, but they are "administrative" in the sense that they will most-often be used by those administering HDFS. The HdfsAdmin class should not be construed to contain only operations which require super user privilege, even though that happens to be the case right now. It's intended as just a public API for HDFS-specific operations.

Regardless, my point is not necessarily that these operations should go into the HdfsAdmin class, but rather that they shouldn't go into the FileSystem class, since the snapshots API doesn't seem to me like it will generalize to other FileSystem implementations.

> On Fri, Apr 19, 2013 at 6:53 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
>
> > HdfsAdmin is also for admin operations. However, createSnapshot etc
> > methods aren't.
>
> I agree that they're not administrative operations in the sense that they
> don't strictly require super user privilege, but they are "administrative"
> in the sense that they will most-often be used by those administering HDFS.
> The HdfsAdmin class should not be construed to contain only operations
> which require super user privilege, even though that happens to be the case
> right now. It's intended as just a public API for HDFS-specific operations.
>
> Regardless, my point is not necessarily that these operations should go
> into the HdfsAdmin class, but rather that they shouldn't go into the
> FileSystem class, since the snapshots API doesn't seem to me like it will
> generalize to other FileSystem implementations.

Agreed. The cases of WAFL/ZFS were brought up -- in those file systems, even if users may take snapshots, they're done using FS-specific APIs rather than any standard Linux interface. So, I'm in favor of either putting the APIs in HdfsAdmin, or alternatively in DistributedFileSystem, forcing a user to down-cast if they want to use the HDFS-specific operation.
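For readers who want to see what the down-cast option would look like from the caller's side, here is a rough sketch. GenericFs, LocalLikeFs and HdfsLikeFs are hypothetical stand-ins invented for this example (they are not the Hadoop FileSystem or DistributedFileSystem classes), and the returned path format is only an assumed example.

```java
// Hypothetical stand-ins for a generic base class and two implementations.
abstract class GenericFs {
    abstract String name();
}

final class LocalLikeFs extends GenericFs {
    @Override
    String name() {
        return "local";
    }
}

final class HdfsLikeFs extends GenericFs {
    @Override
    String name() {
        return "hdfs";
    }

    // The snapshot call exists only on the specific subclass,
    // so the generic base interface stays free of it.
    String createSnapshot(String dir, String snapshotName) {
        return dir + "/.snapshot/" + snapshotName;
    }
}

final class DownCastDemo {
    // Returns the snapshot path, or null when the given file system
    // is not the implementation that offers snapshots.
    static String trySnapshot(GenericFs fs, String dir, String snapshotName) {
        if (fs instanceof HdfsLikeFs) {
            return ((HdfsLikeFs) fs).createSnapshot(dir, snapshotName);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(trySnapshot(new HdfsLikeFs(), "/data", "s1"));
        System.out.println(trySnapshot(new LocalLikeFs(), "/data", "s1"));
    }
}
```

The trade-off shown here is that the caller must know about the concrete subclass and check for it explicitly before using the snapshot operation.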

> On Fri, Apr 19, 2013 at 3:36 AM, Aaron T. Myers <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Apr 19, 2013 at 6:53 AM, Tsz Wo Sze <[EMAIL PROTECTED]> wrote:
> >
> > > HdfsAdmin is also for admin operations. However, createSnapshot etc
> > > methods aren't.
> >
> > I agree that they're not administrative operations in the sense that they
> > don't strictly require super user privilege, but they are "administrative"
> > in the sense that they will most-often be used by those administering HDFS.
> > The HdfsAdmin class should not be construed to contain only operations
> > which require super user privilege, even though that happens to be the case
> > right now. It's intended as just a public API for HDFS-specific operations.

I have to disagree about adding this functionality to HdfsAdmin. The HdfsAdmin class is for admin operations. As Nicholas has said, the snapshot operations are no different from operations such as mkdir or create file.

> > Regardless, my point is not necessarily that these operations should go
> > into the HdfsAdmin class, but rather that they shouldn't go into the
> > FileSystem class, since the snapshots API doesn't seem to me like it will
> > generalize to other FileSystem implementations.
>
> Agreed. The cases of WAFL/ZFS were brought up -- in those file systems,
> even if users may take snapshots, they're done using FS-specific APIs
> rather than any standard Linux interface. So, I'm in favor of either
> putting the APIs in HdfsAdmin, or alternatively in DistributedFileSystem,
> forcing a user to down-cast if they want to use the HDFS-specific
> operation.

I have a hard time understanding the issue with adding these methods to the FileSystem API. I think we already have many operations that, one might argue, do not belong in a generic file system, such as getting block size, file checksum, operations to copy from local or copy to local, getting replication, etc. These are operations that are largely influenced by having HDFS as the dominant implementation.

I also think that some operations which exist only in DistributedFileSystem, such as concat, should be moved into FileSystem. I think it is perfectly okay for the base FileSystem to throw an unsupported-operation exception for such operations. The current way of casting a FileSystem to a non-public DistributedFileSystem is not a good idea.

Other file systems which support snapshots could implement these methods. Implementing these methods does not mean they have to use the same snapshot path convention. They can document and provide their own convention for supporting snapshot paths.
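As a rough illustration of this alternative, the sketch below declares the snapshot operation on the base class with a default that throws UnsupportedOperationException, and lets each implementation that supports snapshots override it with its own path convention. All class names (BaseFs, PlainFs, DotSnapshotFs, AtSuffixFs) and both path formats are assumptions invented for the example, not Hadoop classes or documented conventions of any particular file system.

```java
// Hypothetical base class: the operation is declared once, generically.
abstract class BaseFs {
    // Default behaviour for implementations without snapshot support.
    String createSnapshot(String dir, String snapshotName) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support snapshots");
    }
}

// An implementation that simply inherits the "unsupported" default.
final class PlainFs extends BaseFs {
}

// An implementation exposing snapshots under a hidden sub-directory.
final class DotSnapshotFs extends BaseFs {
    @Override
    String createSnapshot(String dir, String snapshotName) {
        return dir + "/.snapshot/" + snapshotName;
    }
}

// Another implementation with a different, self-documented convention.
final class AtSuffixFs extends BaseFs {
    @Override
    String createSnapshot(String dir, String snapshotName) {
        return dir + "@" + snapshotName;
    }
}

final class SnapshotDefaultDemo {
    public static void main(String[] args) {
        BaseFs[] all = {new DotSnapshotFs(), new AtSuffixFs(), new PlainFs()};
        for (BaseFs fs : all) {
            try {
                System.out.println(fs.createSnapshot("/data", "s1"));
            } catch (UnsupportedOperationException e) {
                System.out.println("unsupported: " + e.getMessage());
            }
        }
    }
}
```

With this shape the caller never casts. It calls the generic method and handles the unsupported case, which is the behaviour argued for above.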