Details

Increments the RPC protocol version in org.apache.hadoop.ipc.Server from 4 to 5.
Introduces ArrayPrimitiveWritable, a much more efficient wire format for transmitting arrays of primitives over RPC. ObjectWritable uses the new writable for arrays of primitives over RPC and continues to use the existing format for on-disk data.

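To illustrate why the compact format matters, here is a minimal, self-contained sketch (plain JDK only; the class and method names are hypothetical, not the actual Hadoop code) comparing a per-element-labeled encoding, in the spirit of ObjectWritable's old array path, with a single-label, raw-values encoding in the spirit of ArrayPrimitiveWritable:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CompactArrayDemo {
    // Old-style encoding: a type label accompanies every element.
    static byte[] writeLabeled(long[] arr) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(arr.length);
            for (long v : arr) {
                out.writeUTF("java.lang.Long"); // per-element type tag
                out.writeLong(v);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen with a byte stream
        }
    }

    // Compact encoding: one component-type tag, then raw values.
    static byte[] writeCompact(long[] arr) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF("J"); // single tag (JVM type code for long)
            out.writeInt(arr.length);
            for (long v : arr) {
                out.writeLong(v);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        long[] blocks = new long[1000]; // e.g. a block report's worth of ids
        System.out.println("labeled: " + writeLabeled(blocks).length + " bytes");
        System.out.println("compact: " + writeCompact(blocks).length + " bytes");
    }
}
```

The per-element labels dominate the payload for large primitive arrays, which is where the block-report savings come from.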


Matt Foley added a comment - 29/Mar/11 18:51

Hi Konstantin, the vote seems to be pretty strongly in favor, so I went ahead and ran some tests of the patch against v0.22. Confirmed that the patch applies automatically, and hadoop/common compiles and successfully runs core unit tests.


Matt Foley added a comment - 19/Mar/11 00:18

ArrayPrimitiveWritable

> Class javadoc is not very clear - not sure what you mean by boxed type not supported by ObjectWritable and why that should be mentioned in the javadoc. I am not sure also how clients can provide external locking; more appropriate would be to say, pass a copy of the array, if concurrent modifications are expected.

Javadoc comments simplified.

> Comments related to declaredComponentType are not clear. Can you please add a description of what componentType and declaredComponentType are and how they are used, before adding further description.

Provided.

> Is PrimitiveArrayWritable a better name?

By common usage, "PrimitiveArrayWritable" would imply a sub-class of ArrayWritable, which this isn't. So I thought it best to use a different word order.

> Add @InterfaceAudience.Public and @InterfaceStability.Stable

Done.

> Throw HadoopIllegalArgumentException instead of NullPointerException and IllegalArgumentException

Done.

> Optional: The read/write methods could be made static methods in the WritableUtils class, so they can be used by others. Also the PrimitiveType map and isCandidatePrimitive() could move to a helper.

After discussion, we agreed there is no benefit in efficiency or code clarity, so no change in this regard.

> Rename isCandidatePrimitive() to isPrimitive()?

Agreed. Originally, isCandidatePrimitive() provided a layer of indirection so that ArrayPrimitiveWritable didn't have to contract to support ALL primitive types. However, since it ended up supporting them all, the distinction is irrelevant.
Changed to use Class.isPrimitive().

> Is the constructor with componentType as argument used anywhere?

No, it is provided to make ArrayPrimitiveWritable more generally useful.

> Some methods could be private - for example - getDeclaredComponentType(), isDeclaredComponentType()

No, if the user elects to use the typechecking capabilities of declaredComponentType, he should be able to make both of these queries.

> Can you use Text instead of UTF8? This avoids deprecation warnings. You could also use WritableUtils to write and read the string. This can be done in many other places in the patch.

I do not want to change the usage from that of ObjectWritable. Essentially all our Writable classes use UTF8. If we want to change that, please open a different Jira.

> I prefer MalformedInputException instead of ProtocolException (see Text.java for example)

I agree ProtocolException isn't exactly right.
Problem with MalformedInputException: it doesn't take a "message" argument, only an integer, so we can't use that.
Changed to just use IOException, since that's what most of the Writable code uses.

==========

ObjectWritable.java

> Can you call the set() method while setting instance and declaredClass in the #writeObject() method?

My understanding is that ObjectWritable.writeObject() is a static method that does not instantiate an ObjectWritable instance. The variables "instance" and "declaredClass" are both local variables in the #writeObject() context. So calling the non-static set() would not be okay.

> Not sure about the comment: // This case must come after the "isArray()" case

You're right, it's not order-dependent since it uses if/else-if/else rather than a sequence of "if"s.
Removed the comment.

> Not sure why this condition is required in #writeObject() - instance.getClass().getName().equals(declaredClass.getName())

I'm being paranoid. There can be a range of behaviors if declaredClass isn't exactly the same as instance.getClass(), so I'm refusing to allowCompactArrays if the caller isn't declaring the true type.

==========

TestArrayPrimitiveWritable

> In the class javadoc can you add a link to PrimitiveArrayWritable?

Done. Good catch.

> Can you make in, out, bigSet, resultSet and expectedSet final?

Sure.

> Should we prefer reading and writing objects using readFields and writeFields - since they are the methods supporting the Writable contract?

For "normal" i/o I do use readFields() and write(). Then I test to see that the wire format is as expected, by going around those methods and directly invoking lower-level functionality. I think this is the only way to get confidence in the wire format encoders and decoders.
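As a side note on the isCandidatePrimitive() to Class.isPrimitive() change above, the JDK's own reflection already distinguishes primitive component types from their boxed counterparts; a quick self-contained sketch:

```java
public class IsPrimitiveDemo {
    public static void main(String[] args) {
        // Primitive classes answer true; their boxed wrappers do not.
        System.out.println(int.class.isPrimitive());     // true
        System.out.println(Integer.class.isPrimitive()); // false

        // The same test works on an array's component type, which is how a
        // writable can decide whether the compact format applies.
        System.out.println(int[].class.getComponentType().isPrimitive());     // true
        System.out.println(Integer[].class.getComponentType().isPrimitive()); // false
    }
}
```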


Suresh Srinivas added a comment - 14/Mar/11 19:26

This is an important patch. Thanks for doing it in a backward compatible way and also in a way where both the on-disk format and the RPC format can co-exist.

Comments:

ArrayPrimitiveWritable

Class javadoc is not very clear - not sure what you mean by boxed type not supported by ObjectWritable and why that should be mentioned in the javadoc. I am not sure also how clients can provide external locking; more appropriate would be to say, pass a copy of the array, if concurrent modifications are expected.
Comments related to declaredComponentType are not clear. Can you please add a description of what componentType and declaredComponentType are and how they are used, before adding further description.
Is PrimitiveArrayWritable a better name?
Add @InterfaceAudience.Public and @InterfaceStability.Stable
Throw HadoopIllegalArgumentException instead of NullPointerException and IllegalArgumentException
Optional: The read/write methods could be made static methods in the WritableUtils class, so they can be used by others. Also the PrimitiveType map and isCandidatePrimitive() could move to a helper.
Rename isCandidatePrimitive() to isPrimitive()?
Is the constructor with componentType as argument used anywhere?
Some methods could be private - for example - getDeclaredComponentType(), isDeclaredComponentType()
Can you use Text instead of UTF8? This avoids deprecation warnings. You could also use WritableUtils to write and read the string. This can be done in many other places in the patch.
I prefer MalformedInputException instead of ProtocolException (see Text.java for example)

ObjectWritable.java

Can you call the set() method while setting instance and declaredClass in the #writeObject() method?
Not sure about the comment: // This case must come after the "isArray()" case
Not sure why this condition is required in #writeObject() - instance.getClass().getName().equals(declaredClass.getName())

TestArrayPrimitiveWritable

In the class javadoc can you add a link to PrimitiveArrayWritable?
Can you make in, out, bigSet, resultSet and expectedSet final?
Should we prefer reading and writing objects using readFields and writeFields - since they are the methods supporting the Writable contract?
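On the external-locking point above, the "pass a copy of the array" advice can be sketched as follows (plain JDK only; ArrayPrimitiveWritable itself is not needed to show the idea):

```java
import java.util.Arrays;

public class DefensiveCopyDemo {
    public static void main(String[] args) {
        int[] live = {1, 2, 3};

        // Snapshot the array before handing it to a writable, so a
        // concurrent mutation by the caller can't change what gets
        // serialized mid-write.
        int[] snapshot = Arrays.copyOf(live, live.length);

        live[0] = 99; // later mutation; the snapshot is unaffected
        System.out.println(snapshot[0]); // 1
        System.out.println(live[0]);     // 99
    }
}
```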

Matt Foley added a comment - 28/Feb/11 09:48

This patch is the same as v4, plus increments the RPC version number, org.apache.hadoop.ipc.Server#CURRENT_VERSION. Thanks, Hairong, for the clarification.

HADOOP-7158 has been opened for the discussion of similar optimization of the wire format for homogeneous arrays of non-primitive objects. I'll post a prototype patch to that Jira tomorrow.

Unless the auto-tester finds a problem, I think this v5 patch is ready to go. Enabling patch test. Thanks.


Suresh Srinivas added a comment - 25/Feb/11 18:48

> As I said in my comment in HDFS-1583, Java does not support heterogeneous arrays, see ArrayStoreException for the reference. So we don't need to support such generic semantic. We can assume the array is strictly homogeneous.

Assuming you are talking about labeling the array only for the type it carries, the approach you are suggesting will not work for the heartbeat response, where DatanodeCommand[] is the response, with array elements carrying subtypes of DatanodeCommand, which should be labelled once per array element.

Also a protocol could choose to give a response of Object[]. Not sure I understand the reference you posted and how it is relevant to this discussion.

I prefer this issue focusing on primitive types and opening a separate jira for non-primitive types, as I think it needs more discussion.
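Suresh's heartbeat example can be made concrete with hypothetical stand-in classes (Cmd standing in for DatanodeCommand, with two made-up subtypes): an array declared with a supertype component can legally hold mixed subtypes, which is why each element needs its own label.

```java
public class SubtypeArrayDemo {
    static class Cmd {}                      // stand-in for DatanodeCommand
    static class BlockCmd extends Cmd {}     // hypothetical subtype
    static class RegisterCmd extends Cmd {}  // hypothetical subtype

    public static void main(String[] args) {
        // Declared as Cmd[], but the elements are of two different subtypes.
        Cmd[] response = { new BlockCmd(), new RegisterCmd() };

        // A single array-level label ("Cmd") would lose this information;
        // per-element labels are needed to reconstruct the right subtypes.
        for (Cmd c : response) {
            System.out.println(c.getClass().getSimpleName());
        }
    }
}
```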


Konstantin Shvachko added a comment - 25/Feb/11 07:28

I recommend looking for subclasses of VersionedProtocol rather than searching. You can see that most of the protocols you listed implement VersionedProtocol, except for two, AvroRpcEngine and WritableRpcEngine, which implement RpcEngine. Avro has its own serialization object, afaiu, so we don't need to change the version for the latter two.

I think it is fine to change only ipc.Server#CURRENT_VERSION, as Hairong suggested, which is the VersionedProtocol version, rather than all subclasses, as it will provide incompatibility for all of them at once.

As I said in my comment in HDFS-1583, Java does not support heterogeneous arrays, see ArrayStoreException for the reference. So we don't need to support such generic semantics. We can assume the array is strictly homogeneous.

Yes, we need to support only arrays of Writables (and of primitive types).

NULL values should be supported.

My opinion is that this is an important optimization to be included in 0.22. If people have doubts I can start a vote on general. Lmk.
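For readers unfamiliar with the ArrayStoreException reference above, a short demonstration: Java array types are covariant, so the compiler accepts the assignment below, but the runtime enforces the array's actual component type on every store.

```java
public class ArrayStoreDemo {
    public static void main(String[] args) {
        Object[] arr = new String[1]; // legal: String[] is an Object[]
        arr[0] = "ok";                // fine: element type matches

        try {
            arr[0] = Integer.valueOf(42); // compiles, but fails at runtime
        } catch (ArrayStoreException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```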

Hairong Kuang added a comment - 23/Feb/11 18:24

Matt, for the version number, I believe that you just need to bump up the RPC version number: org.apache.hadoop.ipc.Server#CURRENT_VERSION. This should be good enough.


Matt Foley added a comment - 23/Feb/11 18:16

Regarding VersionedProtocols: By searching for all implementations of the VersionedProtocol API "getProtocolVersion(String, long)", and the values they return, I found the following 16 version constants:

hdfs.protocol.ClientDatanodeProtocol.versionID (8)
hdfs.protocol.ClientProtocol.versionID (65)
hdfs.server.protocol.DatanodeProtocol.versionID (27)
hdfs.server.protocol.InterDatanodeProtocol.versionID (6)
hdfs.server.protocol.NamenodeProtocol.versionID (5)
mapred.AdminOperationsProtocol.versionID (3)
mapred.InterTrackerProtocol.versionID (31)
mapred.TaskUmbilicalProtocol.versionID (19)
mapreduce.protocol.ClientProtocol.versionID (36)
hadoop.security.authorize.RefreshAuthorizationPolicyProtocol.versionID (1)
hadoop.security.RefreshUserMappingsProtocol.versionID (1)
hadoop.ipc.AvroRpcEngine.VERSION (0)
TEST:
hadoop.security.TestDoAsEffectiveUser.TestProtocol.versionID (1)
hadoop.ipc.MiniRPCBenchmark.MiniProtocol.versionID (1)
hadoop.ipc.TestRPC.TestProtocol.versionID (1)
mapred.TestTaskCommit.MyUmbilical unnamed constant (0)

The first nine are clearly production version numbers. The next three (two security and one avro) do not seem to have ever been incremented, and I wonder if they need to be now. The last four are test-specific and I think should not be incremented. So please advise:

1. Do all the first nine protocols use WritableRpcEngine and therefore need to have their version numbers incremented?
2. Do the next three need to have their version numbers incremented for this change?
3. Do you agree that the four Test protocol versions should not change?
4. Did I miss any that you are aware of?

Thank you. I will put together the versioning patch when we have consensus on what to change.

Konstantin, regarding your suggestion to extend this enhancement to arrays of non-primitive objects:
There is a simple way to extend this approach to arrays of Strings and Writables. I coded it up and have it available; it adds an ArrayWritable.Internal very similar to ArrayPrimitiveWritable.Internal. Nice and clean. Of course the size improvement isn't as dramatic, since the ratio of label size to object size isn't as bad as with primitives, but the performance improvement is still there (not having to go through the decision loop for every element of a long array).

However, using it would cause a significant change in semantics:
The current ObjectWritable can handle non-homogeneous arrays containing different kinds of Writables, and nulls.
The optimization we are discussing here is removing the type-tagging of every array entry, thereby assuming that the array is in fact strictly homogeneous, including having no null entries. There is also the question of what type declaration the container array has on entry, and what type it should have on exit. In the current code the only restrictions on array type are

(Writable[]).type isAssignableFrom (X[]).type and
X.type isAssignableFrom x[i].getClass() for all elements i

Also in the current code, the array produced on the receiving side is always simply Writable[].

Questions:
1. Is it acceptable to assume/mandate that all arrays of Writables passed in to RPC shall be homogeneous and have no null elements? Note that this is a very strict form of homogeneity, forbidding even subclass instances, because, for example, if you define a class FooWritable and a subclass SubFooWritable, you can put them both in an array of declared type FooWritable[], but the receiving-side deserializer will ONLY produce objects of type FooWritable, and will fail entirely unless the serialized output of FooWritable.write() and SubFooWritable.write() happen to be compatible (which is too complicated an exception to try to explain, IMO).

2. On the receiving side, should we (i) continue producing an array of type Writable[], or (ii) preserve the type of the array during the implicit encoding process, or (iii) produce an array of componentType the same as the actual element type, assuming the array is indeed strictly homogeneous so all elements are the same type? All three are easily done, but have implications for what can be done with the array later.

I would like to get ArrayPrimitiveWritable committed soon, so unless the answers to the above are really obvious to all interested parties, maybe I should open a new Jira for a longer discussion?

Finally, regarding putting it in v22: it should port trivially, but is it acceptable to have an incompatible protocol change in a released version? Thanks.
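The two array-type restrictions quoted above can be checked directly with Class.isAssignableFrom. In the sketch below, W and X are hypothetical stand-ins for Writable and a concrete writable class; the example only illustrates the type relations, not the actual serialization code.

```java
public class AssignableDemo {
    interface W {}                  // stand-in for Writable
    static class X implements W {}  // stand-in for a concrete writable
    static class SubX extends X {}  // a subclass of that writable

    public static void main(String[] args) {
        // Restriction 1: (Writable[]).type isAssignableFrom (X[]).type
        System.out.println(W[].class.isAssignableFrom(X[].class)); // true

        // Restriction 2: X.type isAssignableFrom x[i].getClass()
        X[] arr = { new X(), new SubX() }; // legal in Java
        for (X x : arr) {
            System.out.println(X.class.isAssignableFrom(x.getClass())); // true
        }

        // But strict homogeneity is stronger: the SubX element's exact
        // class differs from the array's component type.
        System.out.println(arr[1].getClass() == X.class); // false
    }
}
```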

Konstantin Shvachko added a comment - 02/Feb/11 20:17

The compatible way of doing this is really nice. A couple of questions.

Can/should we extend this to arrays of non-primitive types? This should benefit return types for calls like listStatus() and getBlockLocations() on a large directory.

Is it enough to increment the rpc version only? This changes serialization of each and every VersionedProtocol. So formally they all should be incremented.

This seems to be a very important optimization. Would like to raise the question of applying it to 0.22.


Matt Foley added a comment - 31/Jan/11 06:48

Doug, thanks for bringing up this important use case. After looking over the code, it appears there is a single line in WritableRpcEngine where all RPC code calls ObjectWritable.writeObject(). If this one call is modified to allow the more compact array format, we get the benefit for all RPC protocols based on WritableRpcEngine, without having to change any of the RPC APIs, and with no danger to the other callers of ObjectWritable.writeObject(). Any other callers of ObjectWritable.writeObject() may choose to use the new call, but for safety's sake we default to the old format.

Please take a look at this new patch. Besides the changes Todd recommended, I broke ObjectWritable.writeObject() into a stub with the old API signature, and a slightly modified implementation with an additional boolean argument controlling whether the compact format is used for arrays of primitives. If false, the behavior is as before; if true, the behavior is the new compact format. Callers of the old stub always get the old ("false") behavior. And finally I changed the one line in WritableRpcEngine to call the new API with the boolean argument "true".

I also re-confirmed that this change gives the improvement in Block Report processing RPC overhead time.
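The stub-plus-flag arrangement described above might look roughly like this; the names and bodies are illustrative, not the actual ObjectWritable code:

```java
public class StubDelegationDemo {
    // Old public API: existing callers keep getting the old wire format.
    static String writeObject(Object instance) {
        return writeObject(instance, false);
    }

    // New implementation: only opted-in callers (e.g. the single call site
    // in WritableRpcEngine) pass true to enable the compact array format.
    static String writeObject(Object instance, boolean allowCompactArrays) {
        return allowCompactArrays ? "compact" : "legacy";
    }

    public static void main(String[] args) {
        System.out.println(writeObject(new int[0]));       // legacy
        System.out.println(writeObject(new int[0], true)); // compact
    }
}
```

Keeping the old signature as a one-line stub means no existing caller changes behavior unless it explicitly opts in.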


Doug Cutting added a comment - 25/Jan/11 00:46

We don't currently support cross-version RPC, so changing the RPC wire format is certainly acceptable. My concern, and the reason I suggested adding a subclass of ObjectWritable used by RPC that writes the new format, is that folks might be using ObjectWritable for file-based data. In this case files written by the new version would not be readable by the old version. Some sites run different versions of Hadoop on different clusters and use distcp/HFTP to sync sets of files between the clusters. If these files contained ObjectWritable then this could make files sync'd from a cluster running a newer version unreadable on a cluster running an older version. So the safest change would be to only change RPC and not ObjectWritable.


Matt Foley added a comment - 25/Jan/11 00:16

Hi Todd, thanks for reading over this. Agree with the first three items, and will include them in the next version of the patch. I'll wait a couple days to see if other suggestions arise.

The fourth item is tempting, and I had considered running long arrays through a zip filter, which achieves much the same goal. However, I'm told by folks who've been around longer than I that the wire format was kept in "clear" on purpose, for simplicity and ease of maintenance. So I think we should leave it that way, unless there is overwhelming support for a compressed format.


Todd Lipcon added a comment - 24/Jan/11 18:50

A few comments:

In readFields/write - add an else branch, e.g.
    throw new IllegalStateException("no branch for type: " + componentType);
or something, just to be safe?

For the functions that loop around the arrays, it might be faster to cast the Object to a foo[] rather than using Array.setFoo/Array.getFoo - might get some extra bounds check elimination going on, and doesn't seem like any lost clarity in doing so.

For the special case of reading and writing byte[] we can use in.readFully() and out.write() with the whole array, which might be a bit faster.

Should we consider using variable length coding for some of these types, since we have the opportunity to change the format? Could even do something fancy like delta-code it and then vint-encode for more space savings? Would be worth a benchmark to see if it helps.
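Todd's second point, shown in code: both element-access styles produce the same result, but the cast-based loop replaces per-element reflective calls through java.lang.reflect.Array with a single cast and a plain indexed loop. The example is illustrative, not the patch's actual code.

```java
import java.lang.reflect.Array;

public class CastVsReflectDemo {
    // Reflective style: every element goes through java.lang.reflect.Array.
    static long sumReflective(Object arr) {
        long sum = 0;
        for (int i = 0; i < Array.getLength(arr); i++) {
            sum += Array.getLong(arr, i); // reflective per-element access
        }
        return sum;
    }

    // Cast style: one cast up front, then a direct loop the JIT can optimize.
    static long sumCast(Object arr) {
        long[] a = (long[]) arr;
        long sum = 0;
        for (long v : a) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] data = {1, 2, 3, 4};
        System.out.println(sumReflective(data)); // 10
        System.out.println(sumCast(data));       // 10
    }
}
```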

Tsz Wo Nicholas Sze
added a comment - 22/Jan/11 21:57 > What is the sense of the community? Is this level of incompatibility with older Clients acceptable for, say, a v23 change?
We allow incompatible changes for major releases, so it is acceptable for 0.23.

Matt Foley
added a comment - 22/Jan/11 21:25 The attached patch is for consideration and discussion, not immediate submission. It is a little more complex than the prior suggestion, but has the following benefits:
1. it is fully backward compatible with any persisted data that may be out there, because it can still successfully receive/read the old format
2. it has minimal changes to the ObjectWritable module
3. it is consistent with current practice regarding *Writable classes
4. It automatically benefits all code that is currently pumping arrays of longs or arrays of bytes through the RPC and ObjectWritable mechanisms, including block reports and journaling.
5. It has fully optimized wire format, with almost no object overhead, unlike solutions based on ArrayWritable.
That said, it may still be considered an incompatible change:
While it is fully backward compatible with old persisted data, it is not backward compatible with older versions of Client software: A NEWER RPC RECEIVER can correctly receive data from an OLDER SENDER, but an OLDER RECEIVER will not correctly receive arrays of primitives sent from a NEWER SENDER.
To achieve that level of backward compatibility requires some sort of version negotiation scheme for the RPC, such as remembering whether the sender used the new or old format when sending to you. I'm not aware of much support for that level of complexity.
What is the sense of the community? Is this level of incompatibility with older Clients acceptable for, say, a v23 change?
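The asymmetry described above (new receiver reads old data, but not vice versa) can be illustrated with a toy model. This is a hypothetical sketch, not the actual Hadoop wire format: assume each array is preceded by a class-name token, so a new reader can branch on both the old and new tokens, while an old reader has no branch for the new one.

```java
import java.io.*;

// Toy model of asymmetric wire compatibility; tokens and payload layout
// are illustrative only.
class CompatSketch {
    static final String OLD_TOKEN = "[J";                      // old long[] format
    static final String NEW_TOKEN = "ArrayPrimitiveWritable";  // new format

    static byte[] send(String token, long[] a) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(token);      // sender tags the format it used
        out.writeInt(a.length);
        for (long v : a) out.writeLong(v);
        return bos.toByteArray();
    }

    // A NEW reader recognizes both tokens, so it accepts old senders.
    static long[] newReader(DataInput in) throws IOException {
        String token = in.readUTF();
        if (token.equals(OLD_TOKEN) || token.equals(NEW_TOKEN)) {
            long[] a = new long[in.readInt()];
            for (int i = 0; i < a.length; i++) a[i] = in.readLong();
            return a;
        }
        throw new IOException("unknown token: " + token);
    }

    // An OLD reader only knows the old token, so a new sender breaks it.
    static long[] oldReader(DataInput in) throws IOException {
        String token = in.readUTF();
        if (token.equals(OLD_TOKEN)) {
            long[] a = new long[in.readInt()];
            for (int i = 0; i < a.length; i++) a[i] = in.readLong();
            return a;
        }
        throw new IOException("unknown token: " + token);
    }
}
```

Full two-way compatibility would require the negotiation scheme mentioned above: the sender would have to learn which tokens the receiver understands before choosing a format.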

Konstantin Shvachko
added a comment - 13/Jan/11 21:05 From comment
> move ObjectWritable's array i/o to protected methods, add a subclass with an optimized
> implementation of these methods, then use that subclass in RPC in place of ObjectWritable.
Sounds like a plan to me.
There is a patch for arrays with primitive types attached to this jira. Can be used as a starting point I guess.
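The quoted plan can be sketched in outline. All names here are hypothetical stand-ins, not code from the patch: the base class exposes its array i/o as an overridable method, and an RPC-only subclass substitutes the optimized encoding while on-disk users keep the base behavior.

```java
import java.io.*;

// Sketch of the "protected array i/o + RPC subclass" plan (hypothetical names).
class ObjectWritableSketch {
    // Existing element-wise format, kept for on-disk compatibility.
    protected void writeArray(DataOutput out, long[] array) throws IOException {
        out.writeInt(array.length);
        for (long v : array) out.writeLong(v);
    }
}

class RpcObjectWritableSketch extends ObjectWritableSketch {
    @Override
    protected void writeArray(DataOutput out, long[] array) throws IOException {
        // The optimized wire encoding would go here; this stub just
        // delegates so the sketch stays runnable.
        super.writeArray(out, array);
    }
}
```

RPC call sites would then instantiate the subclass, leaving every existing ObjectWritable consumer untouched.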

Konstantin Shvachko
added a comment - 13/Jan/11 20:59 This seems to have a higher impact on performance than previously reported. I see that many VersionedProtocol methods use arrays as input or output parameters, which could benefit substantially from this optimization.
See the discussion started in HDFS-1583.

Navis
added a comment - 13/Sep/10 03:34 You're right. It's just a mere aesthetic mod, but a real improvement.
It was the smallest part of an in-house NN-HA implementation. For a DN with a couple of million blocks, it reduced block reporting time from 21 sec to 17 sec.
I'll resolve this issue as 'Not-A-Problem'. Thanks for your comment.

Todd Lipcon
added a comment - 10/Sep/10 15:30 Hi Navis,
Please format your change as a "patch" file that can be applied against the trunk of the hadoop-common repository. This one has a diff of 1.txt and 2.txt which makes it more difficult to apply.
Regarding the change, it seems good except that it's backwards-incompatible for ObjectWritable. Since ObjectWritable is a public-facing class, it's likely that some people have stored instances on disk (e.g. in SequenceFiles), so I don't think we can make a breaking change like this.
Do you have benchmarks that show a significant improvement by making this change? If so, it may be worth creating a new CompactObjectWritable that's used only in the RPC layer, for example, and marked as a Private Evolving interface. If the improvement is not significant, we should skip this change since we'll eventually move to a more compact RPC format (Avro) anyway.