Date: Mon, 4 Jul 2011 12:08:30 +0300
Subject: Re: Multi-type column values in single CF
From: osishkin osishkin
To: Silvère Lestang
Cc: user@cassandra.apache.org
I appreciate both your answers.
I'll use them soon.
Thanks!
On Mon, Jul 4, 2011 at 11:48 AM, Silvère Lestang wrote:
> We do pretty much the same thing here: dynamic columns with a timestamp for
> the column name and a different value type for each row. We use the
> serialization/deserialization classes provided with Hector and store the
> type of the value in the key of the row. Example row keys:
> "b6c8a1e7281761e62230ea76daa3d841#INT" => every value is an Integer
> "7f30a6a2bbb1b921afc8216d8c5d9257#DOUBLE" => every value is a Double
> ....
> If I had to do it again, I would try to use (Dynamic)CompositeType for the
> value, or an equivalent mechanism as suggested by Roland.
>
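The row-key convention described above can be sketched roughly as follows. This is only an illustration, not Hector's API: the function names, the `DECODERS` table, and the assumed byte layouts (4-byte big-endian integers, 8-byte big-endian doubles) are all assumptions made for the example.

```python
import struct

# Hypothetical decoders keyed by the type tag stored after '#' in the row key.
# The byte layouts are assumptions for this sketch.
DECODERS = {
    "INT": lambda raw: struct.unpack(">i", raw)[0],
    "DOUBLE": lambda raw: struct.unpack(">d", raw)[0],
}


def split_row_key(row_key):
    """Split 'base#TYPE' into (base, TYPE) using the last '#'."""
    base, sep, type_tag = row_key.rpartition("#")
    if not sep:
        raise ValueError("row key carries no type tag: %r" % row_key)
    return base, type_tag


def decode_row(row_key, columns):
    """Decode every column value of a row with the type named in its key."""
    _, type_tag = split_row_key(row_key)
    decode = DECODERS[type_tag]
    return {name: decode(raw) for name, raw in columns.items()}
```

Since the type lives in the row key, every value in a given row has to share that type, which is the limitation the message above points at.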
> On 3 July 2011 15:07, Roland Gude wrote:
>>
>> You could do the serialization for all your supported datatypes yourself
>> (many serialization libraries are available, and a pretty thorough
>> benchmark of them can be found here:
>> https://github.com/eishay/jvm-serializers/wiki) and prepend the serialized
>> bytes with an identifier for your datatype.
>> This would not avoid casting, but it would still perform better
>> than serializing to strings as is done in your example.
>> Prepending the values with the id seems better to me, because you
>> can be sure that a new insertion to some field overwrites the correct column
>> even if its type changed.
>>
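A minimal sketch of the "prepend a type identifier to the serialized bytes" idea, using only Python's standard `struct` module. The tag values, the function names, and the chosen encodings (8-byte big-endian integers and doubles, UTF-8 strings) are assumptions for illustration; any serialization library could take their place.

```python
import struct

# Hypothetical one-byte type identifiers; the actual values are arbitrary.
TYPE_INT = 0x01
TYPE_STR = 0x02
TYPE_DOUBLE = 0x03


def pack_value(value):
    """Serialize a value and prepend a one-byte type identifier."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("bool is not supported in this sketch")
    if isinstance(value, int):
        return bytes([TYPE_INT]) + struct.pack(">q", value)
    if isinstance(value, float):
        return bytes([TYPE_DOUBLE]) + struct.pack(">d", value)
    if isinstance(value, str):
        return bytes([TYPE_STR]) + value.encode("utf-8")
    raise TypeError("unsupported type: %s" % type(value).__name__)


def unpack_value(data):
    """Read the leading type identifier and decode the remaining bytes."""
    tag, payload = data[0], data[1:]
    if tag == TYPE_INT:
        return struct.unpack(">q", payload)[0]
    if tag == TYPE_DOUBLE:
        return struct.unpack(">d", payload)[0]
    if tag == TYPE_STR:
        return payload.decode("utf-8")
    raise ValueError("unknown type identifier: %d" % tag)
```

Because the identifier travels with the value, the column name is the same whatever the type, so rewriting `attr2` as a string replaces the old integer column instead of creating a second one.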
>> -----Original Message-----
>> From: osishkin osishkin [mailto:osishkin@gmail.com]
>> Sent: Sunday, 3 July 2011 13:52
>> To: user@cassandra.apache.org
>> Subject: Multi-type column values in single CF
>>
>> Hi all,
>>
>> I need to store column values that are of various data types in a
>> single column family, i.e. I have column values that are integers,
>> others that are strings, and maybe more later. All column names are
>> strings (no comparator problem for me).
>> The thing is I need to store unstructured data - I do not have fixed
>> and known-in-advance column names, so I cannot use a fixed static map
>> for casting the values back to their original type on retrieval from
>> Cassandra.
>>
>> My immediate naive thought is to simply prefix every column name with
>> the type the value needs to be cast back to.
>> For example, I'll do the following conversion to the columns of some key:
>> {'attr1': 'val1', 'attr2': 100} ~> {'str_attr1': 'val1', 'int_attr2': '100'}
>> and only then send it to Cassandra. This way I know what I should
>> cast it back to.
>>
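The column-name prefix conversion described above might look roughly like this. The function names and the `TAGS`/`CASTS` tables are hypothetical; only the `str_`/`int_` prefixes and the sample data come from the message.

```python
# Hypothetical mapping between Python types and column-name prefixes.
TAGS = {str: "str", int: "int"}
CASTS = {"str": str, "int": int}


def encode_columns(attrs):
    """Prefix each column name with its value's type and stringify the value."""
    encoded = {}
    for name, value in attrs.items():
        try:
            tag = TAGS[type(value)]
        except KeyError:
            raise TypeError("unsupported type: %s" % type(value).__name__)
        encoded["%s_%s" % (tag, name)] = str(value)
    return encoded


def decode_columns(columns):
    """Strip the type prefix and cast each value back to its original type."""
    decoded = {}
    for column, value in columns.items():
        # Split on the first '_' only, so names may themselves contain '_'.
        tag, _, name = column.partition("_")
        decoded[name] = CASTS[tag](value)
    return decoded
```

One consequence of this layout: if `attr2` is later written as a string, it lands in `str_attr2` and the stale `int_attr2` column remains unless it is deleted explicitly.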
>> But all this casting back and forth on the client side seems to me to
>> be very bad for performance.
>> Another option is to split the columns across dedicated column families
>> with matching validation types - a column family for integer values,
>> one for strings, one for timestamps, etc.
>> But that does not seem very efficient either (and worse for any
>> rollback mechanism), since now I have to perform several get calls on
>> multiple CFs where once I had only one.
>>
>> I thought perhaps someone has encountered a similar situation in the
>> past, and can offer some advice on the best course of action.
>>
>> Thank you,
>> Osi
>>
>>
>
>