Matteo Bertozzi
added a comment - 14/Aug/14 08:41

The default behavior is to require all regions of a table to coordinate in order to
prevent writing to the table, and to perform a flush at the same time. This causes the
table to be unavailable for writes while the snapshot is being taken.

There is no locking anywhere in snapshots.
Coordination is just the process of asking each region to perform the snapshot operation and then report back once it is done.
The default behavior of a snapshot is flushing (not locking), which means that the data in the memstore at the moment of the snapshot operation on the region will be included in the snapshot, but you can still write or read while the snapshot operation is going on.

However, if your set-up can tolerate the possibility of some data not being captured by the snapshot, you can use the <option>SKIP_FLUSH</option> option of the snapshot command to disable locking and flushing while taking the snapshot.

Sort of OK. Since there is no locking, you don't really know which data is present in the snapshot. "A snapshot is a row-consistent image of what was in the table between the time you asked for the snapshot and its completion". Skip flush avoids flushing the memstore, which means the data not yet flushed will not be included in the snapshot (and you have no way to know which data that is).
Just emphasize that the only difference is that the memstore is not flushed.

hbase> snapshot 'mytable', 'snapshot123', 'SKIP_FLUSH'

That does not work; the correct one is:
snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}

Using this option can result in data lost if you try to restore a snapshot which is missing data that was written while the snapshot was taken.

Can you reword this, avoiding the "can result in data lost"? You don't lose any data on restore.
Since there is no locking between the start and the end of the snapshot operation, you can still insert data while the snapshot is in progress. The snapshot is just a "snapshot at some point in time", so the data inserted on a particular region after the snapshot was taken is not in the snapshot. With skip flush, not only the data inserted after the snapshot operation but all the data in the memstore is missing from the snapshot.
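
To make the "only difference is the flush" point concrete, here is a minimal Python sketch. It is a toy model, not HBase code: `ToyRegion`, `put`, `flush`, and `snapshot` are hypothetical names invented for illustration, and the model assumes a snapshot only references data that has already been flushed.

```python
class ToyRegion:
    """Toy stand-in for a region: flushed data plus an in-memory memstore."""

    def __init__(self, flushed=(), memstore=()):
        self.flushed = list(flushed)
        self.memstore = list(memstore)

    def put(self, row):
        # Writes are never blocked; they always land in the memstore.
        self.memstore.append(row)

    def flush(self):
        # Move everything currently in the memstore to flushed storage.
        self.flushed.extend(self.memstore)
        self.memstore = []

    def snapshot(self, skip_flush=False):
        # The only difference between the two modes is this flush.
        if not skip_flush:
            self.flush()
        # The snapshot references only flushed data.
        return list(self.flushed)


default_region = ToyRegion(flushed=["A"], memstore=["B", "C"])
skip_region = ToyRegion(flushed=["A"], memstore=["B", "C"])

with_flush = default_region.snapshot(skip_flush=False)
without_flush = skip_region.snapshot(skip_flush=True)

# A write after the snapshot is accepted in both modes and is in neither snapshot.
default_region.put("D")
skip_region.put("D")

print(with_flush)     # ['A', 'B', 'C']
print(without_flush)  # ['A']
```

In both modes the table keeps accepting writes and nothing is lost from the table itself; the snapshot taken with the flush skipped simply does not contain what was still in the memstore.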

Matteo Bertozzi
added a comment - 15/Aug/14 08:11

Almost, but we never lock.

The default behavior is to require all regions of a table to coordinate in order to prevent writing to the table, and to perform a flush of data in memory at the same time

We don't lock writes; we just do a flush, and that is basically the only difference between SKIP_FLUSH = false and SKIP_FLUSH = true.

Using this option means that data can be inserted into a table while a snapshot is in progress, and that new data will not be included in the snapshot.

The first part seems to imply that writes are locked in the other case, which is not true.
Maybe just something like "the data present in the memstore will not be included in the snapshot"?

There is no way to determine the newest data that will be included in the snapshot if flushing is disabled.

This is true in both cases, since there is no lock around the snapshot operation.

Let me try to give an example:
Create a table.
Add data to the table (RS-1 memstore has [A, B, C], RS-2 memstore has [M, N, O]).
Assuming that you don't do any writes now, and you take a snapshot:
SKIP_FLUSH = true will result in an empty snapshot, since all the data is in the memstore.
SKIP_FLUSH = false will result in a snapshot with [A, B, C] and [M, N, O], since you flush the memstores.
Now, if you have writes coming in (keep in mind there is no lock):
RS-1 may start flushing/taking the snapshot before any write, so you get [A, B, C], and concurrent writes will be added to the "new memstore".
RS-2 may start flushing/taking the snapshot after some writes, so you get [M, N, O, P, Q], but concurrent writes during the flush will be added to the new memstore.
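
The same example can be played through in a small Python sketch. This is only a toy model under stated assumptions: `take_snapshot`, `fresh_table`, and the `writes_before` parameter are hypothetical names, each region server is reduced to a dict with flushed `files` and a `memstore`, and a snapshot is assumed to see only flushed data.

```python
def take_snapshot(regions, skip_flush, writes_before=None):
    """Toy model of a table snapshot; `regions` maps a server name to its state.

    `writes_before` maps a server name to rows that arrive before that server
    starts its part of the snapshot (there is no lock, so they are accepted).
    """
    writes_before = writes_before or {}
    snapshot = {}
    for name, region in regions.items():
        # Concurrent writes that reach this server first land in its memstore.
        region["memstore"].extend(writes_before.get(name, []))
        if not skip_flush:
            # Default behavior: flush the memstore, then snapshot the files.
            region["files"].extend(region["memstore"])
            region["memstore"] = []
        snapshot[name] = list(region["files"])
    return snapshot


def fresh_table():
    # Starting state from the example: nothing has been flushed yet.
    return {
        "RS-1": {"files": [], "memstore": ["A", "B", "C"]},
        "RS-2": {"files": [], "memstore": ["M", "N", "O"]},
    }


no_writes_skip = take_snapshot(fresh_table(), skip_flush=True)
no_writes_flush = take_snapshot(fresh_table(), skip_flush=False)
with_writes = take_snapshot(
    fresh_table(), skip_flush=False, writes_before={"RS-2": ["P", "Q"]}
)

print(no_writes_skip)   # {'RS-1': [], 'RS-2': []}
print(no_writes_flush)  # {'RS-1': ['A', 'B', 'C'], 'RS-2': ['M', 'N', 'O']}
print(with_writes)      # {'RS-1': ['A', 'B', 'C'], 'RS-2': ['M', 'N', 'O', 'P', 'Q']}
```

The third call mirrors the case where RS-2 receives P and Q before it starts flushing, while RS-1 flushes before any new write arrives.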

Matteo Bertozzi
added a comment - 19/Aug/14 22:51

I'm +1 on it. There are a couple of words that may not be 100% correct, but I think it is correct enough to give the idea. I'll wait another day before committing to see if you or someone else has alternatives.

whether a very recent insert or update

"Very recent" is probably not the right way to describe it; maybe "concurrent" is better.

A snapshot is only a representation of a table at a given point in time

Someone could argue about the meaning of "at a given point in time", since it is more of "a window": reaching every server can take from a few seconds up to a minute, depending on how slow your machines are.
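
A rough way to see why "a window" is the more accurate wording, sketched in Python. Everything here is made up for illustration: the timestamps, the region names `r1` and `r2`, and the helper functions are hypothetical, and the model assumes each region captures exactly the rows written before its own snapshot moment.

```python
# Made-up timeline: (time, region, row) for each write.
writes = [(1, "r1", "a"), (3, "r2", "b"), (5, "r1", "c"), (7, "r2", "d")]

# Hypothetical moment at which each region flushes and takes its part of the
# snapshot; the request reaches r2 later than r1.
region_snapshot_time = {"r1": 4, "r2": 8}


def rows_in_snapshot(writes, region_snapshot_time):
    # A row is captured if it was written before its own region's snapshot.
    return sorted(
        row for t, region, row in writes if t <= region_snapshot_time[region]
    )


def rows_at_instant(writes, instant):
    # What the whole table contained at one single instant.
    return sorted(row for t, _region, row in writes if t <= instant)


snap = rows_in_snapshot(writes, region_snapshot_time)
matching_instants = [t for t in range(0, 11) if rows_at_instant(writes, t) == snap]

print(snap)               # ['a', 'b', 'd']
print(matching_instants)  # []
```

In this toy timeline the snapshot holds `d` but not the earlier `c`, so no single instant of the table matches it; each region's contribution is taken at its own moment inside the window.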