References:
<4C938CF3.2060907@digg.com>
Date: Mon, 4 Oct 2010 14:35:54 +0100
Subject: Re: Dazed and confused with Cassandra on EC2 ...
From: Jedd Rashbrooke
To: user@cassandra.apache.org
Hi Peter,
Thanks again for your time and thoughts on this problem.
We think we've got a bit ahead of the problem by just
scaling back (quite savagely) on the rate at which we
hit the cluster. Previously, with a surplus of optimism,
we were throwing very big Hadoop jobs at Cassandra,
including what I understand to be a worst-case usage
pattern (random reads).
Now we're throttling right back on the number of parallel
jobs that we fire from Hadoop, and we're seeing better
behaviour: the boxes generally stay up and responsive as
far as nodetool and other interactive sessions are
concerned.
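One way to throttle at the submission end is a small wrapper that never lets more than a fixed number of jobs run at once. This is a minimal bash sketch, not how we actually do it: it assumes bash 4.3 or later (for `wait -n`), and `MAX_PARALLEL` and `run_job` are hypothetical names, with `run_job` just echoing where a real script would submit a Hadoop job.

```bash
#!/usr/bin/env bash
# Sketch only: cap how many jobs hit the cluster at the same time.
# MAX_PARALLEL and run_job are hypothetical names; a real run_job would
# wrap a Hadoop job submission rather than echo.
MAX_PARALLEL=${MAX_PARALLEL:-2}

run_job() {
  # Placeholder for the real job (e.g. a `hadoop jar ...` invocation).
  echo "finished $1"
}

run_throttled() {
  local running=0 job
  for job in "$@"; do
    # Once MAX_PARALLEL jobs are in flight, wait for any one of them to
    # exit before launching the next (wait -n needs bash 4.3 or later).
    if [ "$running" -ge "$MAX_PARALLEL" ]; then
      wait -n
      running=$((running - 1))
    fi
    run_job "$job" &
    running=$((running + 1))
  done
  wait  # let the remaining jobs drain
}
```

For example, `MAX_PARALLEL=1 run_throttled a b c` runs the three jobs strictly one after another; raising the cap trades cluster load for throughput.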
As discussed, we've tried quite a number of different
GC approaches; at the moment we've returned to:
JVM_OPTS=" \
-ea \
-Xms2G \
-Xmx3G \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC \
-XX:+CMSParallelRemarkEnabled \
-XX:SurvivorRatio=8 \
-XX:MaxTenuringThreshold=1 \
-XX:+HeapDumpOnOutOfMemoryError \
-Dcom.sun.management.jmxremote.port=8080 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false"
... which is much closer to the default as shipped. The
notable change is the heap size, which out of the box is 1G.
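When juggling several variants of JVM_OPTS it is easy to lose track of which heap size is actually in force. The helper below is a hypothetical sketch (not part of Cassandra or the JVM) that pulls `-Xms` or `-Xmx` out of a JVM_OPTS-style string and prints the value in megabytes, assuming the usual `G`/`M`/`K` suffixes or a bare byte count.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra): pull a heap flag such as
# -Xms or -Xmx out of a JVM_OPTS-style string and print it in megabytes.
heap_flag_mb() {
  local flag=$1 opts=$2 tok val
  local -a toks
  read -r -a toks <<< "$opts"   # split on whitespace without globbing
  for tok in "${toks[@]}"; do
    case $tok in
      "$flag"*)
        val=${tok#"$flag"}
        case $val in
          *[gG]) echo $(( ${val%?} * 1024 )) ;;
          *[mM]) echo "${val%?}" ;;
          *[kK]) echo $(( ${val%?} / 1024 )) ;;
          *)     echo $(( val / 1024 / 1024 )) ;;  # bare byte count
        esac
        return 0
        ;;
    esac
  done
  return 1  # flag not present
}
```

With the JVM_OPTS above, `echo $(( $(heap_flag_mb -Xmx "$JVM_OPTS") * 100 / 7168 ))` prints 42, i.e. the 3G heap is roughly 42% of a 7GB (taken as 7168MB) box.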
There's some advice on the 'Net (the recent pages on
Riptano's site, in fact) that strongly encourages scaling
out across more nodes rather than beefing up the boxes,
and we're certainly seeing far less bother from GC with
a much smaller heap. Previously we'd been going up to
16GB, or even higher, based on my earlier positive
experiences of getting better performance from
memory-hungry apps (e.g. Java) by giving them more
memory. In any case, it seems that using large amounts
of memory on EC2 is just asking for trouble.
And because it's Amazon, more, smaller machines generally
work out to the same CPU grunt per dollar, of course,
although the management costs go up.
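Under the simplifying assumption that instance price scales linearly with core count (a hypothetical per-core hourly price p, not a quoted Amazon rate), running n boxes of c cores each gives:

```latex
\[
\frac{\text{cost per hour}}{\text{cores}}
  = \frac{n \cdot c \cdot p}{n \cdot c}
  = p
\]
```

So four 2-core boxes and one 8-core box both buy 8 cores for 8p per hour; the per-core cost is unchanged, and what grows is the number of nodes to manage.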
To answer your last question: we'd been using some
pretty beefy EC2 boxes, but now I think we'll head back
to the 2-core, 7GB, medium-ish sized machines.
All IO still runs like a dog no matter how much money you
spend, sadly.
cheers,
Jedd.