Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations, especially when considering the compound type. However, doing so is directly incompatible with fixed-width representations for numerics. For instance, if we want to have a fixed-width signed long stored in 8 bytes, where do you put null? float and double types can cheat a little by folding negative and positive NaNs into a single representation (this isn't strictly correct!), leaving a place to represent null. In the long example case, the obvious choice is to reduce MAX_VALUE or increase MIN_VALUE by one. This frees up one additional encoding which can be used for null. My experience working with scientific data, however, makes me wince at the idea.

The variable-width encodings have it a little easier. There's already enough going on that it's simpler to make room.

Remember, the final goal is to support order-preserving serialization. This imposes some limitations on our encoding strategies. For instance, it's not enough to simply encode null, it really needs to be encoded as 0x00 so as to sort lexicographically earlier than any other value.
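To make the fixed-width problem concrete, here is a minimal sketch in Java of one common order-preserving encoding for a signed long: write it big-endian with the sign bit flipped. The class and method names are hypothetical, and this is only an assumed encoding for illustration, not a settled design. It shows that Long.MIN_VALUE already occupies the all-0x00 pattern, so nothing is left over for a null that sorts first.

```java
// Illustrative sketch only; names and encoding choice are assumptions.
final class OrderedLongSketch {

    /** Encodes a signed long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        long flipped = value ^ Long.MIN_VALUE; // negatives now sort before positives
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) flipped; // lowest byte goes last (big-endian)
            flipped >>>= 8;
        }
        return out;
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFF);
        }
        return v ^ Long.MIN_VALUE;
    }

    /** Unsigned lexicographic byte comparison, as used for row key ordering. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (int i = 1; i < samples.length; i++) {
            boolean ordered = compareBytes(encode(samples[i - 1]), encode(samples[i])) < 0;
            System.out.println(samples[i - 1] + " sorts before " + samples[i] + ": " + ordered);
        }
        // Long.MIN_VALUE encodes to eight 0x00 bytes, so every 8-byte pattern
        // is taken and none remains to mean "null, sorts before everything".
        System.out.println(java.util.Arrays.toString(encode(Long.MIN_VALUE)));
    }
}
```

With this layout, the only way to reserve the all-zero pattern for null while staying at 8 bytes is to give up one value at the low end of the range, which is the trade-off described above.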

I think that fixed width support is important for a great many rowkey constructs, so I'd rather see something like losing MIN_VALUE and keeping fixed width.

On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:

>Heya,
>
>Thinking about data types and serialization. I think null support is an
>important characteristic for the serialized representations, especially
>when considering the compound type.
>[...]
>
>What do you think? Any ideas, experiences, etc?
>
>Thanks,
>Nick

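I spent some time this weekend extracting bits of our serialization code to a public github repo at http://github.com/hotpads/data-tools. Contributions are welcome - I'm sure we all have this stuff lying around.

You can see I've bumped into the NULL problem in a few places:
* https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
* https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java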

Looking back, I think my latest opinion on the topic is to reject nullability as the rule since it can cause unexpected behavior and confusion. It's cleaner to provide a wrapper class (so both LongArrayList plus NullableLongArrayList) that explicitly defines the behavior, and costs a little more in performance. If the user can't find a pre-made wrapper class, it's not very difficult for each user to provide their own interpretation of null and check for it themselves.

If you reject nullability, the question becomes what to do in situations where you're implementing existing interfaces that accept nullable params. The LongArrayList above implements List<Long> which requires an add(Long) method. In the above implementation I chose to swap nulls with Long.MIN_VALUE, however I'm now thinking it best to force the user to make that swap and then throw IllegalArgumentException if they pass null.

On Mon, Apr 1, 2013 at 11:41 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

> Hmmm... good question.
>
> I think that fixed width support is important for a great many rowkey
> constructs cases, so I'd rather see something like losing MIN_VALUE and
> keeping fixed width.
>
> On 4/1/13 2:00 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
> > [...]

I'm thinking I will press forward with a base implementation that does not support nulls. The idea is to provide an extensible set of interfaces, so I think this will not box us into a corner later. That is, a mirroring package could be implemented that supports null values and accepts the relevant trade-offs.

> I spent some time this weekend extracting bits of our serialization code to
> a public github repo at http://github.com/hotpads/data-tools.
> [...]
>
> You can see I've bumped into the NULL problem in a few places:
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/primitive/lists/LongArrayList.java
> * https://github.com/hotpads/data-tools/blob/master/src/main/java/com/hotpads/data/types/floats/DoubleByteTool.java
> [...]

From the SQL perspective, handling null is important. Phoenix supports null in the following way:
- the absence of a key value
- an empty value in a key value
- an empty value in a multi part row key
  - for variable length types (VARCHAR and DECIMAL) a null byte separator would be used if not the last column
  - for fixed width types only the last column is allowed to be null

As you mentioned, it's important to maintain the lexicographical sort order with nulls being first.
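As a rough illustration of the multi part row key case, the sketch below joins variable length string columns with a 0x00 separator and writes a null column as an empty value. This is not Phoenix's code; the class and method names are hypothetical, and it assumes the column values themselves never contain a 0x00 byte.

```java
// Illustrative sketch only; not taken from Phoenix. Names are hypothetical.
final class CompositeKeySketch {
    static final byte SEPARATOR = 0x00;

    /** Joins variable-length parts with 0x00; a null part is written as empty. */
    static byte[] encode(String... parts) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        for (int i = 0; i < parts.length; i++) {
            if (parts[i] != null) {
                byte[] bytes = parts[i].getBytes(java.nio.charset.StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
            }
            if (i < parts.length - 1) {
                out.write(SEPARATOR); // no separator after the last column
            }
        }
        return out.toByteArray();
    }

    /** Unsigned lexicographic byte comparison, as used for row key ordering. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] nullLeading = encode(null, "x");
        byte[] nonNullLeading = encode("a", "x");
        // The null leading column starts with the separator byte, so it sorts first.
        System.out.println(compareBytes(nullLeading, nonNullLeading) < 0);
    }
}
```

Because the empty value is immediately followed by the separator (or by the end of the key), a null column compares lower than any non-empty value in the same position, which keeps nulls first in the sort order.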

On 04/01/2013 01:32 PM, Nick Dimiduk wrote:
> Thanks for the thoughtful response (and code!).
>
> I'm thinking I will press forward with a base implementation that does not
> support nulls. The idea is to provide an extensible set of interfaces, so I
> think this will not box us into a corner later. That is, a mirroring
> package could be implemented that supports null values and accepts
> the relevant trade-offs.
>
> Thanks,
> Nick
>
> On Mon, Apr 1, 2013 at 12:26 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>> [...]

> From the SQL perspective, handling null is important.

From your perspective, it is critical to support NULLs, even at the expense of fixed-width encodings altogether, or of representing the full range of values? That is, you'd rather be able to represent NULL than -2^31?


Furthermore, is it more important to support null values than to squeeze all representations into minimum size (4 bytes for int32, &c.)?

On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>
>> From the SQL perspective, handling null is important.
>
> From your perspective, it is critical to support NULLs, even at the
> expense of fixed-width encodings at all or supporting representation of a
> full range of values. That is, you'd rather be able to represent NULL than
> -2^31?
> [...]

Ah, I didn't even realize sql allowed null key parts. Maybe a goal of the interfaces should be to provide first-class support for custom user types in addition to the standard ones included. Part of the power of hbase's plain byte[] keys is that users can concoct the perfect key for their data type. For example, I have a lot of geographic data where I interleave latitude/longitude bits into a sortable 64 bit value that would probably never be included in a standard library.

On Mon, Apr 1, 2013 at 8:38 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:

Silly question... Null support. In a system where a column may or may not exist, how do you support null?

;-)

In terms of a key, it's a primary key and can't be null. So what am I missing?

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 1, 2013, at 10:26 PM, Nick Dimiduk <[EMAIL PROTECTED]> wrote:

> Furthermore, is it more important to support null values than squeeze all
> representations into minimum size (4-bytes for int32, &c.)?
> On Apr 1, 2013 4:41 PM, "Nick Dimiduk" <[EMAIL PROTECTED]> wrote:
>> [...]

On 04/01/2013 04:41 PM, Nick Dimiduk wrote:
> On Mon, Apr 1, 2013 at 4:31 PM, James Taylor <[EMAIL PROTECTED]> wrote:
>> From the SQL perspective, handling null is important.
>
> From your perspective, it is critical to support NULLs, even at the expense
> of fixed-width encodings at all or supporting representation of a full
> range of values. That is, you'd rather be able to represent NULL than -2^31?

We've been able to get away with supporting NULL through the absence of the value rather than restricting the data range. We haven't had any pushback on not allowing a fixed width nullable leading row key column. Since our variable length DECIMAL supports null and is a superset of the fixed width numeric types, users have a reasonable alternative.

I'd rather not restrict the range of values, since it doesn't seem like this would be necessary.

I generally don't allow nulls in my composite row keys. Does SQL allow nulls in the PK? In the rare case I wanted to do that I might create a separate format called NullableCInt32 with 5 bytes where the first one determined null. It's important to keep the pure types pure.
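A 5-byte nullable format along those lines might look like the following sketch. Only the idea of a leading byte that determines null comes from the description above; the marker values, the zero-filled payload for null, the sign-bit flip for ordering, and all class and method names are assumptions made for illustration.

```java
// Illustrative sketch only; byte layout and names are assumptions.
final class NullableInt32Sketch {
    static final byte NULL_MARKER = 0x00;
    static final byte PRESENT_MARKER = 0x01;

    /** Encodes to exactly 5 bytes: 1 marker byte plus 4 payload bytes. */
    static byte[] encode(Integer value) {
        byte[] out = new byte[5]; // zero-filled, so null is 00 00 00 00 00
        if (value == null) {
            return out;
        }
        out[0] = PRESENT_MARKER;
        int flipped = value ^ Integer.MIN_VALUE; // sign flip keeps numeric order
        out[1] = (byte) (flipped >>> 24);
        out[2] = (byte) (flipped >>> 16);
        out[3] = (byte) (flipped >>> 8);
        out[4] = (byte) flipped;
        return out;
    }

    /** Reverses encode(); returns null when the marker byte says so. */
    static Integer decode(byte[] bytes) {
        if (bytes[0] == NULL_MARKER) {
            return null;
        }
        int flipped = ((bytes[1] & 0xFF) << 24)
                | ((bytes[2] & 0xFF) << 16)
                | ((bytes[3] & 0xFF) << 8)
                | (bytes[4] & 0xFF);
        return flipped ^ Integer.MIN_VALUE;
    }

    /** Unsigned lexicographic byte comparison, as used for row key ordering. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // null sorts before even the smallest int, and the full int range survives.
        System.out.println(compareBytes(encode(null), encode(Integer.MIN_VALUE)) < 0);
        System.out.println(decode(encode(Integer.MIN_VALUE)) == Integer.MIN_VALUE);
    }
}
```

The cost is one extra byte per value, in exchange for keeping the full int32 range and a fixed width, while the plain 4-byte type stays untouched.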

I have lots of null *values*, however, and they're represented by the lack of a qualifier in the Put. If a row has all null values, I create a dummy qualifier with a dummy value to make sure the row key gets inserted as it would in sql.

On Mon, Apr 1, 2013 at 4:49 PM, James Taylor <[EMAIL PROTECTED]> wrote:

Precisely how this will be exposed via the hbase client is TBD. We won't be deprecating the existing Bytes utility from the client view, so a new API for supporting these types will be provided. I'll be able to provide support and/or a patch for Pig (et al) once the implementation is a bit further along.

My question for you as a Pig representative is more about how Pig users expect Pig to handle NULLs. Are NULL values within a tuple a common occurrence in Pig? In comparison, I'm thinking about the prevalence of NULL in SQL.