Victor Z. Peng
added a comment - 20/Oct/11 15:17 Just to clarify, does this mean dumping specific column families for specific tables? If so, the 'snapshot' command will take both table names and column family names.

The implementation does not seem very hard: we just add helper functions that can handle a request to snapshot a subset of columns, since org.apache.cassandra.db.Table#snapshot is already snapshotting columns one by one.

One thing worth discussing is how we support specifying both multiple tables and multiple columns simultaneously via NodeTool in the CLI.

What I propose is that we add a '-c' option, which accepts the column names as its argument. Since an option can accept at most one argument, the argument can be formatted as a comma-separated string: "-c col1,col2,col3". When -c is absent, we default to all columns. Does this sound ok?
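
A minimal sketch of how the proposed '-c' argument could be split, assuming the option value arrives as a single string. The class and method names are hypothetical and not part of NodeTool; an empty result stands for "all columns".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an existing NodeTool class.
class ColumnFamilyOption {
    /**
     * Splits the value of the proposed '-c' option on commas.
     * A null value (option absent) yields an empty list, which the
     * caller is assumed to interpret as "snapshot everything".
     */
    static List<String> parse(String optionValue) {
        List<String> names = new ArrayList<>();
        if (optionValue == null) {
            return names;
        }
        for (String part : optionValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name); // skip empty entries such as "a,,b"
            }
        }
        return names;
    }
}
```

With this sketch, `ColumnFamilyOption.parse("col1,col2,col3")` gives a three-element list, and `ColumnFamilyOption.parse(null)` gives an empty list.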

Victor Z. Peng
added a comment - 04/Nov/11 20:02 The implementation does not seem very hard: we just add helper functions that can handle a request to snapshot a subset of columns, since org.apache.cassandra.db.Table#snapshot is already snapshotting columns one by one.
One thing worth discussing is how we support specifying both multiple tables and multiple columns simultaneously via NodeTool in the CLI.
What I propose is that we add a '-c' option, which accepts the column names as its argument. Since an option can accept at most one argument, the argument can be formatted as a comma-separated string: "-c col1,col2,col3". When -c is absent, we default to all columns. Does this sound ok?

Jonathan Ellis
added a comment - 04/Nov/11 20:20 I think it would be more consistent w/ the other options to just add an optional (single) columnfamily argument at the end. If you want to snapshot more than one you can always issue multiple snapshot commands.

Victor Z. Peng
added a comment - 04/Nov/11 20:51 You are right! I want to start with this fix now. My first fix for Cassandra
3 more questions:
Do I have to start after I have been assigned to this bug? Or just write and submit?
No tests for NodeTool?
NodeTool Wiki page not updated.

Victor Z. Peng
added a comment - 26/Nov/11 13:48 I assume we don't need to support specific column family for CLEARSNAPSHOT? It's harder to implement and I think we don't have a use case in practice at the moment?

Dave Brosius
added a comment - 14/Apr/12 04:43 Added a new command
snapshot_columnfamily keyspace columnfamily {-t snapshotname}
because hijacking the existing snapshot command is problematic:
1) you can specify 1-n keyspaces, so disambiguating what is the keyspace and what is the column family is difficult.
2) a column family name could exist in multiple keyspaces.
applied in trunk

Jonathan Ellis
added a comment - 16/Apr/12 21:44 Thanks, Dave!
I think it would be good to split up the method calls at the JMX level as well, since it doesn't really make sense to apply a specific CF name AND multiple keyspaces at the same time. What do you think?
Nit: help in nodecommand adds a second line for "snapshot" instead of "snapshot_columnfamily"

Jonathan Ellis
added a comment - 16/Apr/12 21:58 Just wanted to make sure folks were ok with splitting the command as it is
I guess the main alternative would be to add more -flags... I'm okay breaking backwards compatibility there.

Dave Brosius
added a comment - 16/Apr/12 22:09 The issue with -flags is that you then have the potential situation of n keyspaces with a cf name... which might be confusing... hopefully people don't have the same cf name in multiple keyspaces. -flags is also different than the way other commands handle cfs. But I'm fine with doing it that way as well. If that were the case there would be only one jmx call, I would think.

Dave Brosius
added a comment - 16/Apr/12 22:26 if one did
nodetool snapshot -cf foo
that could potentially take snapshots of multiple 'foo's (one each in multiple keyspaces) which might be something the admin wasn't realizing... right? or am i wrong and cf names are unique across the cluster?

Jonathan Ellis
added a comment - 16/Apr/12 22:32 Ah, I see. Quite right, CF names are not unique. (So what you could do is check the schema nodetool-side and spit back a "which KS did you want to snapshot CF in?" error...)
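
A minimal sketch of the nodetool-side check suggested above, assuming the schema has already been fetched as a map from keyspace name to the set of column family names it contains. The class name, method name, and map shape are hypothetical and only illustrate the idea; they are not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only.
class ColumnFamilyResolver {
    /**
     * Returns the single keyspace that contains cfName.
     * Throws if the name is unknown, or if it appears in more than one
     * keyspace, in which case the user has to say which one was meant.
     */
    static String resolveKeyspace(Map<String, Set<String>> schema, String cfName) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : schema.entrySet()) {
            if (entry.getValue().contains(cfName)) {
                matches.add(entry.getKey());
            }
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("Unknown column family: " + cfName);
        }
        if (matches.size() > 1) {
            Collections.sort(matches); // stable order for the error message
            throw new IllegalArgumentException("Column family " + cfName
                    + " exists in keyspaces " + matches
                    + "; which keyspace did you want to snapshot it in?");
        }
        return matches.get(0);
    }
}
```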

Dave Brosius
added a comment - 17/Apr/12 01:59 Reworked to only use the snapshot command and honor an optional -cf tag for the column family. If the -cf tag is used, insist that one and only one keyspace is specified.
patch against trunk
the jmx call will not be backwards compatible.
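
The "-cf requires exactly one keyspace" rule can be sketched roughly as follows. This is an illustration under assumptions, not the code from the patch: the class and method names are hypothetical, and a null column family stands for "-cf not given".

```java
import java.util.List;

// Hypothetical argument check for illustration only; not the patch itself.
class SnapshotArgs {
    /**
     * Rejects the combination of a column family name with zero or
     * several keyspaces. Without -cf, any number of keyspaces is allowed.
     */
    static void validate(List<String> keyspaces, String columnFamily) {
        if (columnFamily != null && keyspaces.size() != 1) {
            throw new IllegalArgumentException(
                    "When -cf is used, exactly one keyspace must be specified, got "
                    + keyspaces.size());
        }
    }
}
```

Under this sketch, a single keyspace plus a column family passes, while two keyspaces plus a column family fails before any snapshot call is made.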