Improve the current client API by creating new container classes

Details

Description

The current API does not scale very well: for each new feature, we have to add many methods to take care of all the overloads. Also, the need to batch row operations (gets, inserts, deletes) implies that we have to manage "entities," as we already can with BatchUpdate but not with the other operations. The RowLock should be an attribute of such an entity.

The scope of this JIRA is only to replace the current API with a feature-compatible one; other methods will be added in other issues.

Jean-Daniel Cryans
added a comment - 10/Sep/08 16:06 - edited Proposed RowOperation class:
public class RowOperation implements Writable {
  private byte [] row;
  private long timestamp = HConstants.LATEST_TIMESTAMP;
  private long rowLock = -1L;
  // constructors, getters, setters, read, write...
}
Class to be used for gets:
public class RowGet extends RowOperation {
  private byte [][] columns;
}
A RowGet used in a get (as opposed to getRow) would only use the first column in the byte array. Is that confusing?
BatchUpdate would now extend RowOperation (and maybe should be renamed?)
public class BatchUpdate extends RowOperation implements Iterable<BatchOperation> {
  private ArrayList<BatchOperation> operations =
    new ArrayList<BatchOperation>();
}
Maybe we can do the same for scanners?
All methods in the API would use their corresponding class. All others would be deprecated.
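To make the scaling argument concrete, here is a minimal, self-contained sketch (simplified stand-ins for the proposed classes; the `get` helper and its output format are hypothetical, not actual HTable code) of how a single method taking a container object absorbs what used to be a family of overloads:

```java
import java.util.ArrayList;
import java.util.List;

public class RowGetSketch {
    // Simplified stand-ins for the proposed container classes.
    static class RowOperation {
        final byte[] row;
        long timestamp = Long.MAX_VALUE;   // stand-in for HConstants.LATEST_TIMESTAMP
        long rowLock = -1L;
        RowOperation(byte[] row) { this.row = row; }
    }

    static class RowGet extends RowOperation {
        final List<byte[]> columns = new ArrayList<>();
        RowGet(byte[] row) { super(row); }
        void addColumn(byte[] column) { columns.add(column); }
    }

    // One method absorbs what used to be many overloads:
    // (row), (row, column), (row, column, ts), (row, column, ts, lock), ...
    static String get(RowGet get) {
        return "get(" + new String(get.row) + ", " + get.columns.size()
            + " columns, ts=" + get.timestamp + ")";
    }

    public static void main(String[] args) {
        RowGet g = new RowGet("row1".getBytes());
        g.addColumn("info:".getBytes());
        g.timestamp = 42L;   // optional parameters set on the object, not via overloads
        System.out.println(get(g));
    }
}
```

Adding a new optional parameter then touches the container class, not every HTable method signature.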

Jim Kellerman
added a comment - 10/Sep/08 18:10 I'm not sure I follow. Won't RowOperation still require all the overloads?
And isn't RowOperation more like BatchOperation than BatchUpdate?
If you have multiple gets, do you return a RowResult?
I would propose something like deprecating BatchOperation and adding BatchPut, BatchDelete, BatchGet, and BatchGetRow. Deprecate BatchUpdate and add RowOperation.
We'd need a new method to send the RowOperation to the server. Commit doesn't make sense, especially if you do:
RowLock lock = lockRow(row);
RowOperation op = new RowOperation(..., lock);
op.add(new BatchPut(...));
op.add(new BatchGet(...));
...
RowResult result = send(op);
op = new RowOperation(..., lock);
op.add(...);
...
result = send(op);
unlockRow(lock);

stack
added a comment - 10/Sep/08 19:21 Change RowGet to be RowColumnOperation (doesn't seem to be explicitly about 'Get').
Rename BatchUpdate as RowOperations?
Is API wrong?
public class BatchUpdate extends RowOperation implements Iterable<BatchOperation> {
Should that be Iterable<RowOperation>?
Please add illustration of this API used when we have batches of row updates.

Jean-Daniel Cryans
added a comment - 10/Sep/08 19:34 In my design, a RowOperation is to be subclassed into RowGet and BatchUpdate (or a better name would be RowMutation). The rest would be the same, the BatchUpdate still has a bunch of BatchOperation. When we will have batches of row updates, it would look like:
ArrayList<RowOperation> batch = new ArrayList<RowOperation>();
RowGet rowGet = new RowGet(row);
rowGet.addColumn("info:");
batch.add(rowGet);
// add some more gets in batch
RowResult[] results = htable.get(batch); // or maybe a SortedMap?

batch = new ArrayList<RowOperation>();
BatchUpdate update = new BatchUpdate(row);
// add some BatchOperation in it
batch.add(update);
// add some more BatchUpdate in batch
htable.commit(batch);
In Jim's design, a RowOperation aggregates many operations on a single row like BatchPut, BatchDelete, etc.
ArrayList<RowOperation> batch = new ArrayList<RowOperation>();
RowOperation op = new RowOperation(..., lock);
op.add(new BatchPut(...));
op.add(new BatchGet(...));
batch.add(op);
// add some more RowOperation in batch
RowResult[] results = htable.send(batch); // the results would come from the RowOperations that contained a BatchGet/BatchGetRow


stack
added a comment - 10/Sep/08 20:05 Suggest RowUpdate instead of RowMutation or BatchUpdate. Goes better with RowGet than RowMutation, IMO.
Regards the below...
RowResult[] results = htable.get(batch); // or maybe a SortedMap?
... we should make htable.get so it can take a List of RowGets or a single RowGet (should fail on RowOperation). Or maybe we should add a getBatch that takes a List and returns SortedMap whereas get takes single RowGet and returns a RowResult.
RowGet should take columns in its constructor. Should be immutable type.

Jean-Daniel Cryans
added a comment - 11/Sep/08 02:27 @stack
Yeah RowUpdate, had a blank.
I would prefer a get that returns a SortedMap and another that returns a RowResult.
But I think we should settle on a design before further commenting on a particular one. Both are good; I would like others' opinions.

Jim Kellerman
added a comment - 12/Sep/08 19:23 - edited Ok, I'm on board with JD's basic design with some modifications:
RowOperation should be abstract
Don't reuse the name BatchUpdate - it will be harder to deprecate if we do. Use RowMutation instead
Make RowResult implement Comparable
Have two flavors of get like we do with commit:
public RowResult get(RowGet)
public SortedSet<RowResult> get(List<RowGet>)
When getting multiple versions, you can still use a RowResult as Cell can contain multiple values and timestamps.
Constructors for RowGet should have multiple overloads so you can get a single column, multiple columns, multiple versions, a single timestamp or a pair of timestamps indicating a range.
We should probably also have a RowDelete class so we can support deleteAll
I think we should leave Scanners alone.
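Collecting these bullets into code, the hierarchy might look roughly like this (a sketch only; field names and defaults are guesses from the discussion, and the Writable/serialization plumbing is omitted):

```java
public class HierarchySketch {
    // Abstract per the first bullet: callers always work with a concrete subtype.
    static abstract class RowOperation {
        byte[] row;
        long timestamp = Long.MAX_VALUE;  // stand-in for HConstants.LATEST_TIMESTAMP
        long rowLock = -1L;
    }

    // Gets: single column, multiple columns, versions, timestamp range.
    static class RowGet extends RowOperation {
        byte[][] columns;
        int numVersions = 1;
        long startTimestamp;  // with timestamp above, expresses a range
    }

    // Replaces BatchUpdate under a new name so the old one can be deprecated cleanly.
    static class RowMutation extends RowOperation { /* list of puts/deletes */ }

    // Separate class so deleteAll can be supported.
    static class RowDelete extends RowOperation { }

    public static void main(String[] args) {
        // The server can dispatch on the concrete subtype of a RowOperation.
        RowOperation op = new RowGet();
        System.out.println(op instanceof RowGet);
    }
}
```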

Jean-Daniel Cryans
added a comment - 16/Sep/08 19:21 Constructors for RowGet should have multiple overloads so you can get a single column, multiple columns, multiple versions, a single timestamp or a pair of timestamps indicating a range.
I would keep the two methods get and getRow as a way to be more transition-friendly. This means that I would create two subclasses of RowOperation. Looking at the code inside HRS, it also makes more sense.

Jean-Daniel Cryans
added a comment - 16/Sep/08 23:41 First try on this issue. Passes the tests. Please review.
When reviewing, please keep in mind the following:
This first patch is mostly about showing what the new API looks like.
The package-level javadoc is not modified at this point, nor is the rest of the documentation.

stack
added a comment - 19/Sep/08 21:27 I took a look at this patch with J-D as Virgil.
I need to study it more. We have to be real careful making fundamental changes to our API, as this patch does, to make sure we get it right. On my first drive through, it strikes me that it needs to be more elegant than it is at first blush.
Meantime, here a few remarks.
Is below intentional:
Index: src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
===================================================================
--- src/java/org/apache/hadoop/hbase/client/HConnectionManager.java (revision 696049)
+++ src/java/org/apache/hadoop/hbase/client/HConnectionManager.java (working copy)
@@ -860,10 +860,12 @@
}
exceptions.add(t);
if (tries == numRetries - 1) {
+ t.printStackTrace();
throw new RetriesExhaustedException(callable.getServerName(),
callable.getRegionName(), callable.getRow(), tries, exceptions);
}
if (LOG.isDebugEnabled()) {
+ t.printStackTrace();
LOG.debug( "reloading table servers because: " + t.getMessage());
Could add the exception as argument if wanted rather than printStackTrace.
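For illustration, handing the Throwable to the logger as a second argument gets the stack trace into the log without a raw printStackTrace(). This is sketched with java.util.logging so the snippet is self-contained; HBase itself uses commons-logging, whose Log.debug(Object, Throwable) overload works the same way:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogThrowableSketch {
    private static final Logger LOG = Logger.getLogger("HConnectionManager");

    public static void main(String[] args) {
        Throwable t = new RuntimeException("region moved");
        // Instead of t.printStackTrace(), pass the exception to the logger;
        // the logging framework renders the stack trace itself, at the
        // configured level and destination.
        LOG.log(Level.FINE, "reloading table servers because: " + t.getMessage(), t);
        System.out.println("logged: " + t.getMessage());
    }
}
```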
Don't touch what's under v5. That's migration stuff. Move what it needs under v5 if you want to change BatchUpdate, etc., or just leave them in place till actual removal and we can worry about how to migrate then.
Below is synopsis of J-D/Stack back and forth:
[12:33] <st^ack> jdcryans: RowGet doesn't have a constructor to set number of versions?
[12:34] <jdcryans> st^ack: I wish that we don't have to change the API at each new feature
[12:34] <jdcryans> so keeping a small constructor is, I think, the way to go
[12:34] <jdcryans> for the rest, setters
[12:35] <st^ack> Immutables are nice.
[12:35] <st^ack> You have to add setter/getter every time you add a new feature.
[12:36] <st^ack> The worry is that the constructor will grow madly?
[12:36] <jdcryans> st^ack: yeah
[12:36] <jdcryans> and that we have to have every combination for every feature set
[12:43] <st^ack> but we have to add a getter/setter for each new attribute, right?
[12:44] <st^ack> Why is RowDelete different from RowUpdate?
[12:45] <jdcryans> st^ack: yes, new getter/setter, that's what I think is better instead of g/s + all constructors
[12:46] <jdcryans> RowDelete is for deleteAll and deleteFamily, does not have a list of batchoperations
[12:46] <jdcryans> if you think it's bad, I would say that a=b=c so current is bad too
[12:46] <jdcryans> current API*
[12:48] <st^ack> deleteAll breaks things. usually the operation contains the rows to operate on. when deleteAll, the operation doesn't supply columns -- presumption is all columns.
[12:48] <st^ack> Current API doesn't scale, true.
[12:48] <st^ack> How do we do write-if-not-modified using this changed API?
[12:49] <st^ack> you know, the feature where we only update a value if it hasn't been changed
[12:49] <jdcryans> yeah yeah I was in that conversation
[12:49] <st^ack> i.e. take out a lock on row, look at value, then within same row lock do update if not changed.
[12:50] <jdcryans> that would be another operation I guess
[12:51] <st^ack> jdcryans: I don't think I like the fact that our RPC is having primitive types replaced by more compound types
[12:51] <jdcryans> at least that would make another bunch of methods in HTable
[12:51] <st^ack> let me read more
[12:52] <jdcryans> then it will be hard to batch gets
[12:52] <st^ack> sort of.
[12:52] <st^ack> Could be row [][]
[12:53] <st^ack> column and timestamp same for all
[12:53] <st^ack> let me read more
[12:53] <jdcryans> st^ack: yeah but it's limiting
[12:53] <jdcryans> public RowResult getRow(final byte[][] row, final byte[][][] columns, final long[] ts, final RowLock[] rl) would be ugly too
[12:59] <st^ack> On the pattern of setters vs. constructor args, I kinda feel we should do one or the other. Odd spanning both.
[13:00] <jdcryans> st^ack: y
[13:00] <st^ack> Regards expanding constructor, is it not the case that it won't be as bad once the extra args have been moved out of HTable method names and into Operation construction?
[13:01] <jdcryans> st^ack: won't be as bad, true
[13:02] <st^ack> I suppose adding a new feature, you'll have to fix the baseline Operation class and then all of its derivatives. That'll be a bit messy.
[13:02] <jdcryans> st^ack: you mean the constructors?
[13:02] <st^ack> Yeah, constructors
[13:03] <jdcryans> yes that's something I saw too, having to recopy all of them
[13:03] <jdcryans> part of why I prefer to keep g/s (but forgot about it since now :P)
[13:04] <st^ack> Whats a constructor now? column(s), start-timestamp, end-timestamp, versions
[13:04] <st^ack> Does lock belong inside an Operation?
[13:04] <st^ack> Shouldn't lock be outside an operation?
[13:06] <jdcryans> st^ack: same goal, remove overloads
[13:06] <jdcryans> I see that as one new feature, got the same treatment
[13:07] <st^ack> locks span operations though?
[13:07] <st^ack> row locks span row operations
[13:07] <st^ack> It's different than timestamp, etc.
[13:07] <jdcryans> proposed design does not prevent that
[13:09] <jdcryans> would have to bundle lock with every operation, pretty much in the same way as if it was another parameter
[13:12] <st^ack> To lock across a set of Operations, you suggest bundling the lock with each Operation -- setting it into each Operation?
[13:12] <jdcryans> yes
[13:12] <st^ack> What if user included an Operation in a set for a row that did not include the lock.
[13:12] <st^ack> How should that run over on the server.
[13:13] <st^ack> What is difference between RowDelete and RowUpdate (or do you want me to read the code -- smile)
[13:13] <jdcryans> garbage in garbage out for that user
[13:14] <jdcryans> I don't copy that last question
[13:14] <st^ack> If lock was a distinct param on a method in HTable, then we'd have to have two versions of every method -- one with lock and one without?
[13:15] <st^ack> as we do now.
[13:15] <st^ack> I want to know how RowUpdate and RowDelete differ. Why couldn't I just pass a RowUpdate to a deleteAll?
[13:15] <jdcryans> yes
[13:17] <jdcryans> deleteAll wouldn't know which column to take
[13:17] <jdcryans> if RowUpdate has many BatchOperations
[13:18] <jdcryans> RowUpdate == BatchUpdate
[13:20] <st^ack> Sorry, I don't follow. I need to study this patch more.

Jim Kellerman
added a comment - 29/Sep/08 20:01 Overall, I think the patch is moving in the right direction, but I think that the current patch is not much less complicated than the current implementation. I would change it as follows:
RowOperation becomes a concrete class
remove the timestamp as timestamps are associated with columns and not rows (for getRow and deleteAll see below)
change the constructor so it takes a row lock and not a timestamp
add new member that is List<ColumnOperation> (see below).
Add a new base class for columns:
public class ColumnOperation implements Writable, Comparable<ColumnOperation> {
  // An empty column family means all columns in row.
  protected byte [] columnFamily = HConstants.EMPTY_BYTE_ARRAY;
  // An empty member means all members in the column.
  protected byte [] familyMember = HConstants.EMPTY_BYTE_ARRAY;
  protected long timestamp = HConstants.LATEST_TIMESTAMP;
  protected int numVersions = 1;

  public ColumnOperation() {} // for serialization

  protected ColumnOperation(byte [] family, byte [] member, long timestamp, int numVersions) {
    ...
  }
  ...
}
The class above can be used to replace get and getRow.
get(RowOperation) assumes that the List<ColumnOperation> is a list of ColumnOperation.
Deprecate both BatchOperation and BatchUpdate.
Add a new class that replaces BatchOperation:
public class ColumnMutation extends ColumnOperation {
  // An empty value means delete the specified column.
  protected byte [] value = HConstants.EMPTY_BYTE_ARRAY;

  public ColumnMutation() {} // for serialization

  // For updates (put)
  public ColumnMutation(byte [] family, byte [] member, byte [] value, long timestamp) {
    ...
  }

  // For deletes
  public ColumnMutation(byte [] family, byte [] member, long timestamp) {
    ...
  }
}
HTable.commit is used for both updates and deletes. It assumes that the List<ColumnOperation> is a list of ColumnMutation.
deleteAll becomes a commit of a RowOperation with one or more ColumnMutations:
A single ColumnMutation with an empty value, empty family, empty member, and the default timestamp deletes all values for that row. With a timestamp, it deletes all the entries in the row that correspond to that timestamp.
A single ColumnMutation with empty value, non-empty family corresponds to deleteFamily (with or without timestamp as above)
A single ColumnMutation with empty value, non-empty family and non-empty member corresponds to current batchUpdate delete behavior
Multiple ColumnMutations with empty value can delete multiple families or multiple members.
Thus commit replaces current commit, deleteAll and deleteFamily.
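A sketch of how a server might dispatch on those empty-value conventions (the ColumnMutation stand-in and the interpret helper are hypothetical simplifications of the proposal, not patch code):

```java
public class DeleteAsCommitSketch {
    static final byte[] EMPTY = new byte[0]; // stand-in for HConstants.EMPTY_BYTE_ARRAY

    // Simplified ColumnMutation: an empty value means delete.
    static class ColumnMutation {
        final byte[] family, member, value;
        ColumnMutation(byte[] family, byte[] member, byte[] value) {
            this.family = family; this.member = member; this.value = value;
        }
    }

    // Interpret an empty-value mutation per the rules in the comment above.
    static String interpret(ColumnMutation m) {
        if (m.value.length > 0) return "put";
        if (m.family.length == 0) return "deleteAll";     // whole row
        if (m.member.length == 0) return "deleteFamily";  // whole family
        return "deleteColumn";                            // one column, as BatchUpdate does today
    }

    public static void main(String[] args) {
        System.out.println(interpret(new ColumnMutation(EMPTY, EMPTY, EMPTY)));
        System.out.println(interpret(new ColumnMutation("info".getBytes(), EMPTY, EMPTY)));
        System.out.println(interpret(new ColumnMutation("info".getBytes(), "a".getBytes(), EMPTY)));
    }
}
```

One commit method then covers put, deleteColumn, deleteFamily, and deleteAll, at the cost of overloading the meaning of an empty value.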

Jim Kellerman
added a comment - 29/Sep/08 20:54 To clarify my previous comment (and correct a mistake):
The overall idea is to replace all gets and getRows with RowResult get(RowOperation).
All updates, deleteAll and deleteFamily can be replaced with void commit(RowOperation)
ColumnOperation is used for get and getRow. It has no value field so it cannot be used for updates.
ColumnOperation should have a single byte array for family:member, since family names are printable.
'family:member' specifies a specific member of a family
'family:' means the Cell in which the member name is null
'family' (no delimiter) means all members in the family
a completely empty byte array means all members of all families
ColumnMutations are a replacement for BatchOperation; they subclass ColumnOperation and add only the value.
an empty value specifies a delete just as BatchOperation does today
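Those four cases could be parsed along these lines (the interpret helper is purely illustrative; it is not an actual HBase method):

```java
public class ColumnSpecSketch {
    // Interpret a 'family:member' column specification per the four cases above.
    static String interpret(String spec) {
        if (spec.isEmpty()) return "all members of all families";
        int i = spec.indexOf(':');
        if (i < 0) return "all members of family " + spec;       // 'family', no delimiter
        String member = spec.substring(i + 1);
        if (member.isEmpty()) {
            return "cell with null member in family " + spec.substring(0, i);  // 'family:'
        }
        return "member " + member + " of family " + spec.substring(0, i);      // 'family:member'
    }

    public static void main(String[] args) {
        System.out.println(interpret("info:name"));
        System.out.println(interpret("info:"));
        System.out.println(interpret("info"));
        System.out.println(interpret(""));
    }
}
```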

Jim Kellerman
added a comment - 29/Sep/08 22:17 Actually, get(RowOperation) should return Cell[] as get does currently.
There is no need to return a RowResult since we already know the row.
getRow(RowOperation) could return HBaseMapWritable<byte[], Cell> even though it currently returns RowResult.

Jim Kellerman
added a comment - 30/Sep/08 00:19 - edited Ok, scratch my comment above which redesigns RowOperation and adds new classes.
RowOperation is basically fine as it is
both get and update currently support one timestamp, so keeping it in RowOperation is fine.
however, do add another constructor (or two) in which you can specify the RowLock.
I would eliminate RowSingleColumnOperation and RowGetRow and modify RowGet to have
private byte [][] columns;
private int numVersions;
(to support HBASE-847 and its sub-tasks, we could add another member, private long startTimestamp, to support getting values between two timestamps)
public Cell[] get(RowGet) would ensure that columns.length == 1
public RowResult getRow(RowGet) could be changed to return SortedMap<byte[], Cell>
Eliminate RowDelete.
Deprecate BatchOperation.
Rename RowUpdate to RowMutation.
Add a new class similar to BatchOperation and ColumnMutation above so that it can handle both updates and deletes. (It could be a private inner class to RowMutation.) To support HBASE-847, we'd need to add numVersions and startTimestamp to this class. This would eliminate deleteAll and deleteFamily; all changes to a row would be done via commit.
That sound better?
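A sketch of the revised RowGet these bullets describe (field names follow the comment; the columns.length == 1 check in get is an assumption about how that constraint would be enforced):

```java
public class RevisedRowGetSketch {
    static class RowGet {
        byte[] row;
        byte[][] columns;
        int numVersions = 1;
        long startTimestamp;  // proposed for HBASE-847 timestamp ranges
        RowGet(byte[] row, byte[][] columns) { this.row = row; this.columns = columns; }
    }

    // Per the comment, get(RowGet) would ensure exactly one column was requested;
    // getRow(RowGet) would accept many.
    static void get(RowGet g) {
        if (g.columns.length != 1) {
            throw new IllegalArgumentException("get takes exactly one column");
        }
    }

    public static void main(String[] args) {
        RowGet single = new RowGet("r".getBytes(), new byte[][] { "info:".getBytes() });
        get(single); // one column: accepted
        try {
            get(new RowGet("r".getBytes(), new byte[2][0])); // two columns: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```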

Jim Kellerman
added a comment - 30/Sep/08 06:27 Constructor overloads or setters?
For something like RowGet, the number of overloads grows combinatorially. The required parameter is the row name.
The other parameters are:
one or more column names
timestamp
number of versions
lockid
combine that with overloads for String or String[] over byte[] or byte[][] and the total amounts to about 30 overloads.
Compare that with just overloads for
String row, String column
byte[] row, byte[] column
String row, String column[]
byte[] row, byte[][] column
and setters for timestamp, number of versions, and lockid, and you reduce ~30 overloads to 4 constructors and 3 setters.
This seems like the better route to take. The common options are covered, defaults are provided, and if you want the optional parameters, use the setters.
Comments?
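Counted out, the proposed shape is four constructors for the required row/column combinations plus setters with sensible defaults for the rest; a hypothetical sketch (names and defaults assumed, not actual HTable code):

```java
public class RowGetOverloadSketch {
    static final long LATEST = Long.MAX_VALUE; // stand-in for HConstants.LATEST_TIMESTAMP

    static class RowGet {
        final byte[] row;
        final byte[][] columns;
        // Optional parameters get defaults instead of constructor slots.
        long timestamp = LATEST;
        int numVersions = 1;
        long lockId = -1L;

        // 4 constructors cover the required row + column(s) combinations.
        RowGet(String row, String column) { this(row.getBytes(), new byte[][] { column.getBytes() }); }
        RowGet(byte[] row, byte[] column) { this(row, new byte[][] { column }); }
        RowGet(String row, String[] columns) {
            this.row = row.getBytes();
            this.columns = new byte[columns.length][];
            for (int i = 0; i < columns.length; i++) this.columns[i] = columns[i].getBytes();
        }
        RowGet(byte[] row, byte[][] columns) { this.row = row; this.columns = columns; }

        // ...and 3 setters replace the ~30 overloads for the optional parameters.
        void setTimestamp(long ts) { this.timestamp = ts; }
        void setNumVersions(int n) { this.numVersions = n; }
        void setLockId(long id) { this.lockId = id; }
    }

    public static void main(String[] args) {
        RowGet g = new RowGet("row1".getBytes(), "info:".getBytes());
        g.setNumVersions(3);
        System.out.println(g.columns.length + " column(s), " + g.numVersions + " versions");
    }
}
```

The trade-off, as discussed below, is that the setters make the object mutable.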

Jean-Daniel Cryans
added a comment - 30/Sep/08 14:30 and setters for timestamp, number of versions, and lockid, and you reduce ~30 overloads to 4 constructors and 3 setters.
This is my opinion too. +1
Regards the rest of the Sept. 29 comments, is the "Jim Kellerman - 29/Sep/08 04:19 PM" comment the only one that matters?

stack
added a comment - 30/Sep/08 19:13 Looking at this, regards RowOperation, don't you think the timestamp belong rather with the specification of what we're to get or delete or update? If we do this, it facilitates batching a bunch of operations against the one row but each with its own timestamp specification (and +1, the timestamp needs to be specifiable as a range with start and end for all Operations).
Is it true that we cannot do a mix of update/get/deletes on the one row all in the one operation (as was possible with old BatchUpdate). Looks like you'd do a get only, or an update only, or a delete only; you might batch them true but they'd run in series rather than all as part of the one row operation. Is this the case? (Its almost as though an Operation should 'have' Gets, Deletes, and Updates)
IIRC, doing the below was problematic:
public RowResult getRow(RowGet) could be changed to return SortedMap<byte[], Cell>
Sorry, I don't remember the detail.
-1 on 'Rename RowUpdate to RowMutation.' IMO, mutation is C++ speak whereas update is DB speak.
On constructors vs. setters, this is an age-old argument. Let's have one or the other, not both. If lots of arguments, that would seem to favor setters, though invoking all the setters on a newly created object makes for ugly code, and possibly half-initialized objects, and I dislike the fact that setters make our objects mutable.
This stuff is hard. I'm glad we've moved to diagramming. Better for working out ideas.
Other things to consider:
+ Scanner API needs to align.
+ Batching needs to get diagrammed too so we're sure we have it covered.

Jim Kellerman
added a comment - 30/Sep/08 20:36 - edited This diagram essentially is my proposal. In the interest of time, I did not redo scanners yet, nor did I address Stack's latest comments.
Note that I removed the deprecated methods from HTable so we could see what the eventual API looks like without all the other clutter.
I will address Stack's comments in a separate comment.

Jim Kellerman
added a comment - 30/Sep/08 21:22 - edited

@Stack

bq. Looking at this, regards RowOperation, don't you think the timestamp belongs rather with the specification of what we're to get or delete or update? If we do this, it facilitates batching a bunch of operations against the one row, each with its own timestamp specification (and +1, the timestamp needs to be specifiable as a range with start and end for all operations).

Neither the current API nor the 880 patch supports different timestamps for update, get(column) or getRow. Are you proposing we add this? Only gets and deletes have a startingTimestamp; it doesn't make sense for puts.

bq. Is it true that we cannot do a mix of update/get/deletes on the one row all in the one operation (as was possible with the old BatchUpdate)?

Neither trunk nor the 880 patch permits that, although my earlier proposal did. The problem I found with that was in returning results. As you pointed out in an earlier comment, get(column) should return Cell[] and getRow should return a RowResult (or, as I suggested, a SortedMap<byte[], Cell>). IIRC, doing the below was problematic:

public RowResult getRow(RowGet) could be changed to return SortedMap<byte[], Cell>

HBaseMapWritable implements SortedMap<byte[], Cell> and getRow does not need the row back. Is it possible that at one point HBaseMapWritable implemented Map<byte[], Cell> and not SortedMap?

bq. -1 on 'Rename RowUpdate to RowMutation.' IMO, mutation is c++ speak whereas update is db speak.

Jean-Daniel agrees with you and I don't really care what we call it. I used mutation because that is what Bigtable calls it.

bq. On constructors vs. setters, this is an age-old argument. Let's have one or the other, not both. If lots of arguments, that would seem to favor setters, though invoking all the setters on a newly created object makes for ugly code and possibly half-initialized objects - and I dislike the fact that setters make our objects mutable.

I agree. I used setters for a lot of stuff because it reduced the number of overloads by a lot, especially in derived classes, which had to take all the possible arguments for their superclasses. With respect to half-initialized objects, I would imagine that the things that can be set would be initialized to a default value that makes sense. And I agree on the mutability issue, but I think the trade-off is worth it.

stack
added a comment - 30/Sep/08 22:15

bq. Neither the current API nor the 880 patch supports different timestamps for update, get(column) or getRow. Are you proposing we add this?

Doğacan Güney did over in HBASE-899. There you were thinking it would not be too difficult to add? Seems reasonable to me: i.e. the timestamp does not belong with the row but rather with the column specification.

bq. Neither trunk nor 880 patch permit that...

I see now how my comment confused things by grouping get with put and delete. Pardon me. What I meant was that when updating, you should be able to mix put and delete operations on the one row, and I was asking if this redesign allows that. Seems like I can make a RowMutation object and do deletes and puts against the one row. (You can't do gets and updates in the one operation. That's 'normal'.)

Regards the proposal, if I want to get values for many columns - not all columns on a row but some subset - how do I do it? I use a RowGetOperation instead of a GetOperation? We should have better naming, don't you think? Should we drop GetOperation and instead rename the current RowGetOperation as GetOperation and use it everywhere? (If we do this, we won't be able to do HBASE-899 as written?)

bq. Note that I removed the deprecated methods from HTable so we could see what the eventual API looks like without all the other clutter.

+1. Helps with visualization.

Jim Kellerman
added a comment - 30/Sep/08 22:37

@Stack

bq. Neither the current API nor the 880 patch supports different timestamps for update, get(column) or getRow. Are you proposing we add this?
bq. Doğacan Güney did over in HBASE-899. There you were thinking it would not be too difficult to add? Seems reasonable to me: i.e. the timestamp does not belong with the row but rather with the column specification.

It should be easy enough to add. Does it make sense for all of get/put/delete?

bq. What I meant was that when updating, you should be able to mix put and delete operations on the one row, and I was asking if this redesign allows that. Seems like I can make a RowMutation object and do deletes and puts against the one row.

Yes, you can do both puts and deletes in one operation.

bq. Regards the proposal, if I want to get values for many columns - not all columns on a row but some subset - how do I do it? I use a RowGetOperation instead of a GetOperation? We should have better naming, don't you think? Should we drop GetOperation and instead rename the current RowGetOperation as GetOperation and use it everywhere?

I kind of struggled with this one, and it ended up the way it did since you asked that get return Cell[] instead of RowResult. I would prefer that we drop GetOperation and rename RowGetOperation to GetOperation. If you think it's OK for a single get to return SortedMap<byte[], Cell> instead of Cell[], it's an easy change and enables folding AbstractGetOperation into the new GetOperation as well. I can certainly draw it up and we can see which we like better.

Jim Kellerman
added a comment - 01/Oct/08 03:32

In the latest proposal, proposal2.jpg, note that DeleteOperation and GetOperation rely on static methods to properly construct the objects, and that the constructors (except for the default constructor) are private (in the superclasses, they are protected). By forcing clients to use these static methods, we ensure that the objects are constructed correctly. RowUpdate has been greatly simplified by removing a lot of overloads.

In HTable, there are basically two operations: get and commit.

Scanners have still not been addressed yet.

In this proposal, the latest exchanges between Stack and myself have been addressed.
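The static-factory idea could be rendered roughly as below. This is a minimal sketch, not the actual proposal2.jpg classes; the factory names (latest, atTimestamp), String columns (standing in for HBase's byte[]), and the timestamp default are all my assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: constructors are hidden behind static factory methods,
// so clients can never build a half-specified operation object.
public class GetOperation {
    private final String row;
    private final List<String> columns;
    private final long timestamp;

    // Private: clients must go through the factories below.
    private GetOperation(String row, List<String> columns, long timestamp) {
        this.row = row;
        this.columns = columns;
        this.timestamp = timestamp;
    }

    // Factory for "latest version of the named columns" (assumed name).
    public static GetOperation latest(String row, String... columns) {
        return new GetOperation(row, Arrays.asList(columns), Long.MAX_VALUE);
    }

    // Factory for "versions at or before the given timestamp" (assumed name).
    public static GetOperation atTimestamp(String row, long ts, String... columns) {
        return new GetOperation(row, Arrays.asList(columns), ts);
    }

    public String getRow() { return row; }
    public List<String> getColumns() { return columns; }
    public long getTimestamp() { return timestamp; }
}
```

The point is only that every object a client can obtain is fully initialized; whether the real classes took this exact shape is not implied.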

stack
added a comment - 01/Oct/08 19:14

Big improvement I'd say:

I suppose commit should be named 'update' in HTable to go with the names of the new classes?

RowUpdate should be called UpdateOperation to match GetOperation? Then, I'd suggest that 'DeleteOperation' become 'Delete' and 'UpdateOperation' become 'Update', since they inherit from 'AbstractUpdate' rather than from 'RowOperation'. Will keep confusion down.

Would suggest removing all of the column/value methods from RowOperation derivatives. (Should there be a Get to go with the Delete and Update above?)

Unrelated: isTableEnabled should be in HBaseAdmin rather than here in HTable?

These diagrams have set me thinking we can make things even more straightforward if we break out a new class in which we describe the area (or 'sweep'/'reach'/'scope') an operation is to affect. Let me post a little drawing of what I'm thinking.

stack
added a comment - 01/Oct/08 21:38

Sketch of a proposal derived from previous work that adds a new class to encapsulate the specification of the table area or 'range' a Get or Delete operation applies to. It does not include HTable. HTable would have three new methods rather than two: get, delete, and put (the proposal suggests that we treat puts and deletes against a row distinctly, to avoid conflict and in the name of simplification).

Doğacan Güney
added a comment - 02/Oct/08 10:27

Michael Stack's proposal looks excellent (+1 from me), just one minor thing: instead of having a List<Scope> (with Scope including columns), maybe just pass Get a Map of <column, Scope> pairs. I guess it is possible that you build up scopes in different methods, so one may accidentally add two scopes for one column. With a map, it may be easier to keep track of.
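Doğacan's map-keyed Get could look something like the sketch below. All names here are illustrative (the thread never settled them), and String columns stand in for HBase's byte[] to keep the sketch short.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the suggestion: Get carries a map keyed by column, so a second
// Scope added for the same column replaces (rather than duplicates) the first.
public class Get {
    public static class Scope {
        public final long timestamp;
        public final int numVersions;
        public Scope(long timestamp, int numVersions) {
            this.timestamp = timestamp;
            this.numVersions = numVersions;
        }
    }

    private final String row;
    private final SortedMap<String, Scope> scopes = new TreeMap<>();

    public Get(String row) { this.row = row; }

    // Map semantics: putting a scope for an existing column overwrites it,
    // which is exactly the accidental-duplicate problem the map avoids.
    public void addScope(String column, Scope scope) {
        scopes.put(column, scope);
    }

    public SortedMap<String, Scope> getScopes() { return scopes; }
    public String getRow() { return row; }
}
```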

stack
added a comment - 02/Oct/08 19:11

Thinking on it, the 'Doğacan Güney - 02/Oct/08 02:27 AM' suggestion looks like an improvement. We'd just move column/family specification out of Scope and have it supplied instead as the Map key (the Map should probably be Sorted? An HbaseMapWritable?). Here is one reason why we might NOT do this:

Clients might want to specify multiple scopes against a single column. Imagine an application that adds hundreds of updates to a column each day. The client then wants to query for every entry made at 12:00 over the last week, or every hour over the last day. A 'workaround' would be to batch a set of Gets with an entry for every update wanted.

Another objection I was going to make, but have not raised since it verges on the 'silly', relates to the Scanner API. Getting a scanner using a Scope that does not include the column/family name would make it impossible to specify a scanner that took multiple columns, each with its own column Scope. Do we want to support this? If not, the new API would allow a single Scope across all columns.

On the one hand, the Doğacan suggestion makes Get/Delete look like Put in that they too take maps keyed by columns. That's good. On the other hand, we limit the perverse things a client might want to do all in the one row Get/Delete context.

I'm +1 on the Doğacan suggestion because it reduces complexity (I do not want to be in a position where we are debugging a scan across 100 columns, each with 100 Scopes, and a user 'thinks' it's not doing the right thing).

stack
added a comment - 02/Oct/08 19:16

Should RowOperation or Operation be an Interface, since as an abstract class it provides near zero functionality? If so, the Interface would have a getRow and a getRowLock and that'd be it? If, per the Doğacan suggestion, Get and Delete take Maps, maybe Get/Delete/Put could implement a (read-only) SortedMap, delegating invocations through to the passed HbaseMapWritable instance, so folks can interrogate to see what's in them using Map methods?
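The read-only map idea can be sketched without writing a dozen delegating methods by handing out an unmodifiable view of the backing map. This is a simplification I'm assuming for illustration (the real proposal would implement SortedMap directly); the Delete class, its method names, and the String-for-byte[] columns are all stand-ins.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: the operation exposes an unmodifiable view of its backing map, so
// callers can interrogate it with ordinary Map methods but cannot mutate it.
public class Delete {
    private final String row;
    private final SortedMap<String, Long> columnsToTimestamps = new TreeMap<>();

    public Delete(String row) { this.row = row; }

    public void deleteColumn(String column, long timestamp) {
        columnsToTimestamps.put(column, timestamp);
    }

    // Read-only view; put() through this view throws UnsupportedOperationException.
    public SortedMap<String, Long> asMap() {
        return Collections.unmodifiableSortedMap(columnsToTimestamps);
    }

    public String getRow() { return row; }
}
```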

Jim Kellerman
added a comment - 02/Oct/08 19:25

If Operation is an Interface, where are you going to store the row key and row lock? In Get/Put/Delete? I hate duplication of code in general, but getting rid of data members and methods that are accessed virtually in the current model might perform better.

Doğacan Güney
added a comment - 02/Oct/08 20:25

bq. Another objection I was going to make, but have not raised since it verges on the 'silly', relates to the Scanner API. Getting a scanner using a Scope that does not include the column/family name would make it impossible to specify a scanner that took multiple columns, each with its own column Scope. Do we want to support this? If not, the new API would allow a single Scope across all columns.

I am not sure I follow you here. Scanners can just have two constructors: one would take just a Scope (meaning that this scope is valid for all columns) and the other would take (again) a Map of column, scope pairs. Or is there something I am missing here?
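The two-constructor scanner specification could look roughly like this. Class and method names (ScanSpec, scopeFor) are my own placeholders, not anything the thread agreed on, and String columns stand in for byte[].

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: one constructor takes a single Scope applying to every column,
// the other takes a per-column map.
public class ScanSpec {
    public static class Scope {
        public final long timestamp;
        public Scope(long timestamp) { this.timestamp = timestamp; }
    }

    private final Scope allColumns;              // set only by the one-scope form
    private final Map<String, Scope> perColumn;  // set only by the map form

    // One scope valid for all columns.
    public ScanSpec(Scope scope) {
        this.allColumns = scope;
        this.perColumn = null;
    }

    // A distinct scope per column.
    public ScanSpec(Map<String, Scope> scopes) {
        this.allColumns = null;
        this.perColumn = new TreeMap<>(scopes);
    }

    // Resolve the scope that applies to a given column (null if none).
    public Scope scopeFor(String column) {
        return allColumns != null ? allColumns : perColumn.get(column);
    }
}
```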

stack
added a comment - 02/Oct/08 20:47

No. I was thinking about this wrong, fixated on the current HTable getScanner API. If I adopt your suggestion, the HTable API would change so column names would be specified in this new Map of columns to Scopes.

One thought is that we should probably check for the case where the Map includes a Scope for a column and for its enclosing family, and throw an exception if a client attempts such a specification?

Jean-Daniel Cryans
added a comment - 07/Oct/08 19:17

I have some issues with the current design:

My opinion is that when upgrading to 0.19.0, people will find it strange that when using a simple get() for a single cell they would now have to handle a SortedMap.

Also, a SortedMap is not as friendly as something called "RowResult". IMO the 0.1 API was hard to use, 0.2 was way better, and this would be like going back to 0.1. This gets uglier when we will be batching gets (returning a SortedMap of SortedMaps?).

Regards HRS.get(), should we drop it since we merged get and getRow?

How do we get a whole row at the latest timestamp? Stack wrote that we should pass a null list of Scopes, but with the map thing it doesn't make much sense... Or do we fix it server-side when we see no scope?

Jean-Daniel Cryans
added a comment - 07/Oct/08 20:45

I'm really not happy where this is going... What would normally take a single line now requires many object instantiations, and I have to manage the single-Cell get all over the place. The patch I attached is not complete and should be used to see current proposal usage.

Doğacan Güney
added a comment - 07/Oct/08 23:01

bq. I'm really not happy where this is going... What would normally take a single line now requires many object instantiations and I have to manage the single Cell get all over the place.

Well, maybe a way of dealing with it is hiding Scopes and HbaseMapWritable-s from users as much as possible. Something like:

Get get = new Get(row);
get.setTimestamp(col1, timestamp1);
get.setTimestamp(col2, timestamp2);
get.setVersions(col2, numVersions);
...
get.setAllTimestamps(timestamp); // for all columns
...
// maybe also add a constructor overload similar to current API
Get get = new Get(row, cols /* array of byte arrays */, timestamp);
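A sketch of how those setters could sit on top of an internal per-column scope map, so the Scope/HbaseMapWritable machinery stays hidden. The class name, the defaults, and the String-for-byte[] columns are my assumptions; note that setAllTimestamps here only touches columns already added, one possible reading of the "for all columns" comment.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: convenience setters over an internal per-column scope map.
public class FriendlyGet {
    private static class ColumnScope {
        long timestamp = Long.MAX_VALUE; // "latest" by default (assumed)
        int versions = 1;
    }

    private final String row;
    private final Map<String, ColumnScope> scopes = new HashMap<>();

    public FriendlyGet(String row) { this.row = row; }

    private ColumnScope scope(String column) {
        return scopes.computeIfAbsent(column, c -> new ColumnScope());
    }

    public void setTimestamp(String column, long ts) { scope(column).timestamp = ts; }

    public void setVersions(String column, int n) { scope(column).versions = n; }

    // Applies to every column added so far.
    public void setAllTimestamps(long ts) {
        for (ColumnScope s : scopes.values()) s.timestamp = ts;
    }

    public long timestampOf(String column) { return scope(column).timestamp; }
    public int versionsOf(String column) { return scope(column).versions; }
}
```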

Jean-Daniel Cryans
added a comment - 08/Oct/08 14:23

Doğacan, what you propose is in many ways like the first proposals. I'm beginning to think that maybe we should push this issue to 0.20.0 and finish HBASE-748 as it currently is (using BatchUpdate). Is Hadoop's 0.19.0 soon to be released?

Jim Kellerman
added a comment - 08/Oct/08 18:01

@Jean-Daniel

bq. Also, a SortedMap is not as friendly as something called "RowResult". IMO the 0.1 API was hard to use, 0.2 was way better and this would be like getting back to 0.1. This gets uglier when we will be batching gets (returning a SortedMap of SortedMaps?)

RowResult IsA SortedMap. For gets, it is a more appropriate return value if gets are restricted to a single row. For multi-row gets, a List<RowResult> would be more user friendly.

bq. Regards HRS.get(), should we drop it since we merge get and getRow?

We cannot drop this until we remove the old API, one release after we deprecate it in favor of the new API.

stack
added a comment - 08/Oct/08 22:16

I took a look at the new patch (thanks for exercising the proposal in code, J-D). Here are some suggestions. In general, I would suggest that HbaseMapWritable, with all of its generics, not be exposed to the user, as Doğacan suggests, except we don't do getters/setters by the million on Get as he suggests, because IMO that defeats the point of introducing the Scope object. HBW should be wrapped.

Is Specification a better name than Scope? The HbaseMapWritable-making code would be replaced by a new wrapper class called Specifications, where Specifications is a map of column names to Specifications (or Scope and Scopes).

From IRC with Doğacan:

[13:10] <dogacan> st^ack: maybe I am just being thick, but I don't understand why exposing Specification or Scope is useful than just doing get.setTimestamp(col, timestamp) ?
[13:22] <st^ack> I thought point of latest proposal was extraction of the Specification/Scope object.
[13:23] <st^ack> When you add the getters/setters back to Get and Delete (but not to Put because it doesn't have Specification/Scope), then the advantage of the last proposal goes away?
[13:25] <st^ack> If the setting is done against Specification, then Get/Delete/Put can be immutable and all treated the same.
[13:51] <dogacan> hmm
[13:51] <dogacan> I see
[13:52] <dogacan> i forgot about trying to make get/delete/put as similar as possible :)
[13:52] <dogacan> let me think about it, thanks

Erik Holstad
added a comment - 20/Oct/08 19:18 - edited

I think it looks good, and that it is going in the right direction. I do have one request though: it would be nice to have a Put constructor that takes the return format from the get as an argument, so you don't have to convert it yourself.

Erik
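Erik's request, rendered as a sketch: a Put that can be seeded straight from the map a get returns, so read-modify-write loops need no manual copying. Map<String, String> stands in for the SortedMap<byte[], Cell> under discussion, and the class shape is my assumption.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a Put constructible from a get's result map.
public class Put {
    private final String row;
    private final Map<String, String> values = new TreeMap<>();

    public Put(String row) { this.row = row; }

    // Seed the Put with everything a get returned for this row.
    public Put(String row, Map<String, String> getResult) {
        this.row = row;
        this.values.putAll(getResult);
    }

    public void put(String column, String value) { values.put(column, value); }

    public Map<String, String> getValues() { return values; }
    public String getRow() { return row; }
}
```

Usage would be: fetch a row, construct the Put from the result, overwrite only the cells that changed, and commit.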

stack
added a comment - 25/Nov/08 21:26

New proposal. All row operations subclass HbaseMapWritable via new intermediary classes, ColumnCellMap and ColumnCellBoundsMap. A new CellBounds class is used to describe the boundary around the set of Cells affected by a Get or Delete. The lock has been moved out of Get, Delete and Put and instead made a parameter on commit (could move the lock back in if wanted).

Other notables:

Get g = new Get(row);
g.put(column, new CellBounds(timestamp, numVersions));
// or
CellBounds boundary = new CellBounds(ts);
for (byte [] column : columns) {
  g.put(column, boundary);
}
// then commit.
Collection<Cell> cells = getRow(g).values();

// Delete works the same as above.

// For Put:
Put p = new Put(row);
for (byte [] column : columns) {
  p.put(column, new Cell(somevalue));
}
table.commit(p);

RowResult could subclass Put if wanted (or vice-versa). Same for Get and Delete.

Scanners need to be rewired to use CellBounds. Haven't done that yet.
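To make the usage pattern above concrete, here is a toy, in-memory rendering of the map-based proposal: Get maps columns to CellBounds, Put maps columns to Cells, and a commit/getRow pair works against a plain map instead of HBase. Everything below is an illustrative simplification (single row, String columns, bounds ignored on read), not the proposed implementation.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Toy model of "row operations are maps keyed by column".
public class CellBoundsDemo {
    static class Cell {
        final String value;
        Cell(String value) { this.value = value; }
    }
    static class CellBounds {
        final long timestamp;
        final int numVersions;
        CellBounds(long ts, int n) { this.timestamp = ts; this.numVersions = n; }
    }
    // A Put IS a map of column -> Cell.
    static class Put extends TreeMap<String, Cell> {
        final String row;
        Put(String row) { this.row = row; }
    }
    // A Get IS a map of column -> CellBounds.
    static class Get extends TreeMap<String, CellBounds> {
        final String row;
        Get(String row) { this.row = row; }
    }

    // A single-row "table": column -> cell.
    private final Map<String, Cell> store = new TreeMap<>();

    void commit(Put p) { store.putAll(p); }

    // Return the cells whose columns the Get names (bounds ignored in this toy).
    Map<String, Cell> getRow(Get g) {
        Map<String, Cell> result = new TreeMap<>();
        for (String column : g.keySet()) {
            Cell c = store.get(column);
            if (c != null) result.put(column, c);
        }
        return result;
    }
}
```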

stack
added a comment - 25/Nov/08 21:46

Here is a patch (it does not compile) with instances of the diagrammed classes. Some parts of HTable use the new code. I'm posting it to help with evaluation of this proposal.

Jim Kellerman
added a comment - 01/Dec/08 23:09

A couple of comments:

ColumnCellMap and ColumnCellBoundsMap should be abstract so that someone doesn't instantiate one and try to have HBase figure out what the operation is.

A row lock cannot apply to all rows. If it did, when batching you'd have to chase down every row in order to take a lock out on it; row locks apply to a single row only. This means that Get, Put and Delete must either overload the constructors (-1 on that) or implement a setter which allows the client application to take out a lock on a row and then apply it to multiple operations.

Could ColumnCellBoundsMap inherit from ColumnCellMap?

RowResult is not a RowOperation, but it is a HBaseMapWritable.

Otherwise, +1. Once I got my head around what the UML was saying, I think that it will make for an elegant API.
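The row-lock setter idea can be sketched as below: the client takes out one lock and applies it to several operations on the same row, with no constructor overloads. RowLock's shape, the single-row validity check, and the operation class are all illustrative stand-ins.

```java
// Sketch of a lock setter shared across multiple operations on one row.
public class LockDemo {
    static class RowLock {
        final String row;
        final long lockId;
        RowLock(String row, long lockId) { this.row = row; this.lockId = lockId; }
    }

    static class RowOperation {
        final String row;
        private RowLock lock; // unset by default
        RowOperation(String row) { this.row = row; }

        // The setter in question: attach an existing lock after construction.
        // A lock is only valid for the row it was taken out on.
        void setRowLock(RowLock lock) {
            if (!lock.row.equals(row)) {
                throw new IllegalArgumentException("lock is for a different row");
            }
            this.lock = lock;
        }
        RowLock getRowLock() { return lock; }
    }
}
```

Usage: lock the row once, set the same lock on a Put and a Delete against that row, then commit both.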

stack
added a comment - 01/Dec/08 23:29

Thanks J-D and Jim for the feedback.

Will add row locks back in to operations. Will look at doing it as a get/set, but would prefer passing it in the constructor, since then the get/puts are just column-oriented rather than column-oriented AND row locks.

CCBM could inherit from CCM but would gain little that I can see, and could confuse. Will pass on that for now.

Otherwise OK on the other comments.

stack
added a comment - 02/Dec/08 07:04

Some progress. Made RowOperations comparable so we can see if they are for the same row. Can't use RowResult, because we need a map of column to 'Cells' rather than column to 'Cell' (as RowResult is), so we can return multiple versions of a cell in the one fetch. Pruned Iterable from Cell (internally a Cell could hold more than one version of itself, a weird facility we no longer need, it seems).

Jim Kellerman
added a comment - 02/Dec/08 16:12 The reason behind making Cell Iterable was to support multiple versions.
That's why we have:
byte [][] values
long [] timestamps
so we didn't have to have a Map<byte[], Map<byte[], Cell>>
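The parallel-array Cell Jim describes could be sketched like this (a minimal, hypothetical form; the real Cell stores raw bytes and more bookkeeping):

```java
// Multi-version Cell as parallel arrays: version i is the pair
// (values[i], timestamps[i]). This avoids nesting a
// Map<byte[], Map<byte[], Cell>> just to hold versions.
class Cell {
  final byte[][] values;
  final long[] timestamps;

  Cell(byte[][] values, long[] timestamps) {
    if (values.length != timestamps.length) {
      throw new IllegalArgumentException("parallel arrays must be same length");
    }
    this.values = values;
    this.timestamps = timestamps;
  }

  int numVersions() { return values.length; }

  byte[] valueAt(int version) { return values[version]; }
}
```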

Jim Kellerman
added a comment - 02/Dec/08 16:48 Thinking on this more (which I should have done before making the
previous comment), it would be cleaner to remove the Iterable and multi-
value multi-timestamp stuff from Cell (which was done so that the
multi-version stuff could work within the confines of the current API).
Instead, my suggestion would be to make RowResult a Map<byte[], Cell[]>
This means that a RowResult is not a HBaseMapWritable.
Also Get and Delete should HaveA CellBoundsMap instead of inheriting
the HaveA relation from ColumnCellBoundsMap.
Since only Get returns a RowResult, how about having two methods instead
of running them all through commit:
public RowResult get(Get)
public void commit(UpdateOperation)
where an update operation is either a Put or a Delete?
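The split Jim proposes could be sketched as follows (all class names here are illustrative stand-ins, not the shipped API): only Get returns data, while Put and Delete share a single commit() entry point.

```java
import java.util.ArrayList;
import java.util.List;

// Marker interface: an update operation is either a Put or a Delete.
interface UpdateOperation {}
class Put implements UpdateOperation {}
class Delete implements UpdateOperation {}
class Get {}
class RowResult {}

// Toy table showing the two-method shape: get() returns a RowResult,
// commit() accepts any mutation and returns nothing.
class SketchTable {
  final List<UpdateOperation> committed = new ArrayList<UpdateOperation>();

  public RowResult get(Get g) {
    return new RowResult();            // reads return data
  }

  public void commit(UpdateOperation op) {
    committed.add(op);                 // mutations go through one verb
  }
}
```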

stack
added a comment - 02/Dec/08 17:52 Patch already purged Cell of any notion of Cells.
I've made a new Cells class. Can't change RowResult because it'll break current API. Made new Result. It has Cells.
Why should Get and Delete haveA CBM instead of inheriting? It makes things messier.
Let me play with your last suggestion. Looks good.

stack
added a comment - 02/Dec/08 22:45 Moving to 0.20.0. Too much work to complete in time for 0.19.0 release. Estimate 2 weeks of work, 10d.
20:19 < St^Ack> ... new API implies new functionality -- don't know how to have it fail gracefully without it
20:19 < St^Ack> If we're to add in deprecated, then things like shell and MR need to be moved to new stuff
20:20 < St^Ack> Need to handle multiple Cells rather than the presumed one everywhere. Need to do unit tests for new API.
20:20 < St^Ack> Need to retrofit gets, deletes, puts, then their family versions, all of a row... then do same for scanners.
20:20 < St^Ack> Make sure works with filters
...
20:21 < St^Ack> Then there is thrift and REST; could punt on these I suppose leaving them with deprecated API

Erik Holstad
added a comment - 31/Jan/09 02:21 We have a use case where one column family has a lot of columns, like 10000. They are time sorted on the column name,
where the timestamp is one part of the column qualifier, so that you get the latest actions first. So if we are using getRow() to get
a row but are only planning to use, let's say, the first 100 columns in the family, we still need to fetch the whole row from the server
and then take care of the "filtering" client side. It would be useful for us to have something like a getRow() with a filter option so
that the filtering can take place server side instead, to speed things up, maybe something like what exists for scanners at the moment.

Andrew Purtell
added a comment - 31/Jan/09 02:31 Hi Erik,
I think there are two issues with your use case, right? The first is that you need some filters that do not currently exist to provide the topmost-column filtering behavior that you want. The second is the modification to getRow() so it behaves kind of like a column scanner. If there are suitable filters available, would it be possible to use the scanner interface with a combination filter such as:
(and
(allow-n-rows 1)
(and
(match-column "foo:" )
(allow-n-columns 100)
)
)
to do what you want?
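As plain predicate logic (not the actual HBase filter API, which differs), the combination above might look like the following sketch; the prefix check stands in for (match-column "foo:"), the counter for (allow-n-columns 100), and single-row scope plays the (allow-n-rows 1) part:

```java
// Hypothetical top-N-columns filter: accept only columns in one
// family, and only until a budget of N matches has been spent.
class TopColumnsFilter {
  private final String columnPrefix;
  private final int maxColumns;
  private int accepted = 0;

  TopColumnsFilter(String columnPrefix, int maxColumns) {
    this.columnPrefix = columnPrefix;
    this.maxColumns = maxColumns;
  }

  // true = keep this column; false = filter it out server side
  boolean accept(String column) {
    if (!column.startsWith(columnPrefix)) {
      return false;                 // (match-column "foo:")
    }
    return accepted++ < maxColumns; // (allow-n-columns N)
  }
}
```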

Erik Holstad
added a comment - 31/Jan/09 19:11 Hi Andrew!
Yes, there are 2 issues like you said, and we (Jonathan and I) looked at the code for the filters in the scanner,
and it seems like the filter that we need could be constructed by keeping state in that filter, for getRow(). If the same
filter is to be used for the scanner, the last row has to be kept in the filter too, so that it is only checked once.
But it seems like filter issues can be solved on our side as long as we have the getRow() that takes a filter.

Erik Holstad
added a comment - 12/Mar/09 17:11 Made some small changes to make it more clear what we are going for. The biggest change is the regrouping of the getQueries to better reflect the way they are going to be implemented.

Erik Holstad
added a comment - 12/Mar/09 17:18 The changes proposed for 880 might seem big or even radical to some people,
but it is basically 2 big things that we want to change with this new API.
1. The introduction of a new Family class. This will help the user understand
that there is a difference between asking for more columns in the same family and
introducing a new family to search. This is important to help the user
realize that families are stored together and that it is therefore more effective
to ask for columns from the same family than from different ones.
2. Dividing the get calls into different categories depending on how they will
be executed. This change has 2 benefits. One is the same as in the first change:
helping the user understand which queries are treated the same, which
queries are effective, and which ones are not. The second benefit is
for the people that will manage and write the code. Since the queries are split
into groups that are to be implemented in very similar ways, it is going to be
easier to make optimizations and reason about the code for each query group,
which will lead to better and faster code.
Right now we propose 4 different get groups. They are grouped together depending
on their early-out possibilities, so that each group can early out in the same
way. This is very convenient when wanting to add new get calls; they can be put
into any of the existing groups if they match the early-out pattern, or a new
group can be created.
The current groups are:
GetColumns, can early out as soon as all the columns are found.
GetFamilies, can never early out since you don't know how many columns
there are in a family.
GetRange, can early out as soon as the storefile with ts < the one
asked for is finished.
GetTop, can early out as soon as the maxNr is reached.
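The four early-out rules could be sketched as one check per group. The arguments here are hypothetical stand-ins for whatever state the real server-side scan loop would track:

```java
// One early-out rule per get group, as Erik describes them.
enum GetGroup { GET_COLUMNS, GET_FAMILIES, GET_RANGE, GET_TOP }

class EarlyOut {
  static boolean done(GetGroup group, int found, int wanted,
                      long storefileNewestTs, long askedTs) {
    switch (group) {
      case GET_COLUMNS:  return found >= wanted;             // all columns found
      case GET_FAMILIES: return false;                       // family size unknown
      case GET_RANGE:    return storefileNewestTs < askedTs; // older files can't match
      case GET_TOP:      return found >= wanted;             // maxNr reached
      default:           return false;
    }
  }
}
```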

Jonathan Gray
added a comment - 16/Mar/09 03:05 We're trying to build at least some sense of consensus here before moving forward full steam.
ken gave a semi-blessing... stack is cool as long as we explain it, code it and test it... so with jd's semi-blessing, i'd say we're ready to move towards a patch.
A complete rework of HRegion is currently underway as part of HBASE-1234, so the server-side part of this will have to remain pseudo-code until the switch to KeyValue is complete.
We have a good bit of the client side done, will get a first patch together this week.

stack
added a comment - 29/Apr/09 06:53 More comments on https://issues.apache.org/jira/secure/attachment/12406720/HBASE-880_Design_Doc_v3.pdf
In Get, do we need addFamily and addColumn? Should it be just add?
I did an exercise where the verb used for submitting Get, Delete, etc. was the same for all, but I only ever got unsatisfactory results, so let's use method names of get, put, delete, and scan. They each return different things anyways. Might have to have batchPut, batchGet, and batchDelete, too, because returns may vary if for example we are going to return timestamps of when the Put happened.
Should we even have a Get? Just always Scan even if one row only? Just a thought.
Should we have a TimeRange? It's only used in Get. Just have a setTimestamp and then an optional range or depth to the Get? (setTimestampRange?)
Should deleteColumn in Delete be just delete? Should we have a deleteFamily? That'd leave deleteColumns. Do we need this? Answer to below will help:
How do deletes work? (I've seen the PDFs and its not plain to me).
If I do Delete.deleteColumn w/o a timestamp, what does this do? Does it delete the most recent cell on the column only? Or everything behind it in time? (This would be like a deleteColumns.) If the former, how does it work? First it must find the most recent so it can find the timestamp?
If I pass a timestamp, what does it delete? Only the cell at that timestamp if it exists? What if nothing exists there? Nothing happens?
(Above is an old discussion rehashed)

Erik Holstad
added a comment - 29/Apr/09 08:33 @Stack
+ I think we should have a Get, haven't looked too closely at the scanner code, but I assume that there is a scanner setup cost that we don't want to pay for random reads.
+ When it comes to TimeRange I don't really care if we have it or not, we might as well have an extra time in the Get Object.
+ The way I look at deletes is from the starting point that we added these delete types as a way of making deletes as easy as possible. And the reason we have two types is so we can have as much information in one single KeyValue as possible. And since they are a part of a KeyValue, it leaves us with one timestamp for each type. So the way we have reasoned about it is that all the deletes except delete remove all the values that match after the specified timestamp; if no timestamp is specified it is set to now. For the simple delete you specify the timestamp you want to delete, and it applies to that timestamp only.
I think we should keep all the delete types that we currently have since it makes it easy for the client to delete things, even though deleteRow, which turns into a deleteFamily on the server side, and deleteFamily itself need to be taken care of in a special way, since they don't sort like the other entries.

stack
added a comment - 29/Apr/09 17:12 @Holstad
Ok on the Get ... was just an idea.
I'd say lets drop TimeRange.
On Deletes, all is good to me except, seems like a 'delete' at now will have no effect, right? You have to know the timestamp to delete. Maybe we should handle this case special – go find the latest and add the delete with that timestamp?
Other thoughts, we can't have HTable.get that takes a Get object because there already is a get in HTable. I think we need to start up a new class to hold all of the new stuff. What shall we call it? HTable2? We'll deprecate HTable in 0.20.0.

Erik Holstad
added a comment - 29/Apr/09 17:33 @Stack
+1 on dropping TimeRange.
Yes, if you do a delete with a timestamp that doesn't exist in HBase it will have no effect at all.
I don't really like to go look for things to delete, cause that means that you have to do a get to do a delete and to me that kind of defeats the purpose of having the new storage format.
In the new implementation of memcache, if you do a delete with a specified timestamp and that put is in there, I was thinking that you would remove both the put and the delete, since the delete has already been used and doesn't do any good any more. So what I think is better, if we want to support deletion of the latest version without specifying the timestamp, is to lift this logic into the deleteCheck of the server. Maybe that is what you meant?

stack
added a comment - 29/Apr/09 17:40 So you are saying if we do a delete at now, and there is no entry in the memcache, we just don't add it?
Our change in how delete works where we move away from first getting all the entries the delete covers so we can write individual deletes per existing cell will require a migration of all data. I'm not opposed to that. Just noting that it will be required (The way we do deletes currently just doesn't scale – has to change so I'm w/ you Holstad when you say "I don't really like to go look for things to delete...")

stack
added a comment - 29/Apr/09 18:00 bq. Might have to have batchPut, batchGet, and batchDelete, too, because returns may vary if for example we are going to return timestamps of when the Put happened.
I said the above. Turns out my understanding regarding overloading in Java was incorrect – you can have different return types as long as the signature varies. This means we do not need the batch versions I talk of above.
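Stack's correction is easy to demonstrate: Java resolves an overload by its parameter list, so the return types are free to differ (class names here are illustrative only):

```java
class Get {}
class Result {}

// Two overloads of get() with the same name but different parameter
// lists and different return types -- legal Java, which is why no
// separate batchGet name is needed.
class Table {
  Result get(Get g) {
    return new Result();             // single get -> one Result
  }

  Result[] get(Get[] gets) {
    Result[] results = new Result[gets.length];
    for (int i = 0; i < gets.length; i++) {
      results[i] = new Result();     // batch get -> array of Results
    }
    return results;
  }
}
```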

Erik Holstad
added a comment - 29/Apr/09 18:03 No, I'm not saying that. I'm just saying that if the put entry that matches that delete is in memcache, we don't put the delete in there; if the put is not in memcache, we put the delete in there for use in earlier storefiles. But what I'm trying to say is that I think it is ok to add some extra special code for the case of a delete with no timestamp in the server code, since deletes are already handled in a special way.

Erik Holstad
added a comment - 01/May/09 01:49 @Stack
I have a question about the RESULT format: why is it that you want it to implement List<KeyValue>?
The way I see it, it would be easier to just pass it the KeyValue[] that we get from the server and put that
into the constructor for RESULTS, like Result result = new Result(KeyValue[] kv).
What do you think about that?

stack
added a comment - 01/May/09 05:50 +1 on v5 of the API (allowing that its going to change a little during the implementation)
@Holstad
RowResult is a SortedMap. You can do SortedMap kinda operations on it. Java fellas don't even have to think when it comes to using it.
Regarding Result implementing List<KeyValue>: same as above, only more so in this case – in its guts, that is what it is.
How it's made, I don't care. You can pass in List<KV> on construction, or it should probably be a Writable since it's going to make the trip across RPC.
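Both views can be reconciled in a small sketch: Result built straight from the KeyValue[] the server returns, yet exposed as a read-only List<KeyValue>. The KeyValue here is a toy stand-in; the real one wraps the raw row/column/timestamp bytes.

```java
import java.util.AbstractList;

// Toy KeyValue holding only the fields needed for the sketch.
class KeyValue {
  final String column;
  final long timestamp;
  final String value;

  KeyValue(String column, long timestamp, String value) {
    this.column = column;
    this.timestamp = timestamp;
    this.value = value;
  }
}

// Result "is a List<KeyValue> in its guts": a thin read-only view
// over the array received from the server.
class Result extends AbstractList<KeyValue> {
  private final KeyValue[] kvs;

  Result(KeyValue[] kvs) { this.kvs = kvs; }

  @Override public KeyValue get(int i) { return kvs[i]; }
  @Override public int size() { return kvs.length; }
}
```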

Erik Holstad
added a comment - 01/May/09 16:44 @Stack
Sounds really good, so we will have a .rowResult() method that can be called on the Result, and that would be generated when
someone asks for it. Will try to keep the protocol from the server to the client as low level as possible and just instantiate the Result,
on the client side, with that.

Jonathan Gray
added a comment - 06/May/09 07:03 Patch coming at the end of the week.
I have an early-cut of the javadoc available here: http://jgray.la/HBASE-880/javadoc/
And the (strictly client-side) code here: http://jgray.la/HBASE-880/code/

Jim Kellerman
added a comment - 08/May/09 20:50 If you drop time range, how would you get all the versions between two timestamps?
e.g. all versions older than a time; all versions newer than a time; all versions between Monday and Friday

Jonathan Gray
added a comment - 08/May/09 21:31 You got it Jim. Currently I've retained the TimeRange class, but it is hidden from the client. It's very helpful server-side as a helper method container.

Jonathan Gray
added a comment - 26/May/09 17:07 This issue was used for discussion. A new issue will be opened to turn these design documents into user documentation and package javadocs.
Closing this issue as "no longer valid". This issue is being resolved as part of HBASE-1234 and HBASE-1304 .