One metric to watch is pending compactions (via nodetool compactionstats). This count will
give you some idea of whether you are falling behind with compactions. The other measure
is how long you are compacting after your inserts have stopped.
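To watch that count over time rather than eyeballing it, something like the sketch below could work. It assumes the output of nodetool compactionstats contains a line of the form "pending tasks: N" (the exact format may differ between Cassandra versions), and both function names are made up for illustration.

```python
import re

def pending_compactions(compactionstats_output):
    """Return the pending-task count from compactionstats text, or None.

    Assumes a line like 'pending tasks: 17'; the format is an assumption
    and may vary by Cassandra version.
    """
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None

def is_growing(samples):
    """Hypothetical heuristic: the backlog is growing if the newest
    sample is larger than the oldest one."""
    return len(samples) >= 2 and samples[-1] > samples[0]

# Example with hand-written sample text (not real nodetool output).
sample = "pending tasks: 17\n"
print(pending_compactions(sample))
print(is_growing([3, 9, 17]))
```

Sampling every few minutes during the load and feeding the counts to is_growing gives a rough answer to "am I falling behind?".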
If I understand correctly, since you never update the data, that would explain why the compaction
logging shows 100% of orig. With size-tiered, you are flushing small files, compacting when
you get 4 of like size, etc. Since you have no updates, the compaction will not shrink the
data.
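A toy model of the behaviour described above, in case it helps. This is only a sketch: it assumes every flush is exactly the same size, merges exactly 4 like-sized sstables at a time, and ignores the size bucketing that real size-tiered compaction does. The function name and the numbers are illustrative.

```python
def simulate_size_tiered(num_flushes, flush_mb, min_threshold=4):
    """Toy size-tiered compaction for insert-only data.

    Returns (remaining sstables as {size_mb: count}, total MB written
    by compaction). Simplified model, not Cassandra's actual algorithm.
    """
    tiers = {}          # sstable size in MB -> number of sstables of that size
    compacted_mb = 0    # bytes rewritten by compaction, in MB
    for _ in range(num_flushes):
        size = flush_mb
        tiers[size] = tiers.get(size, 0) + 1
        # Merge whenever a tier holds min_threshold like-sized sstables.
        while tiers[size] >= min_threshold:
            tiers[size] -= min_threshold
            merged = size * min_threshold   # no updates, so output == input
            compacted_mb += merged
            tiers[merged] = tiers.get(merged, 0) + 1
            size = merged
    remaining = {s: n for s, n in tiers.items() if n}
    return remaining, compacted_mb

# 16 flushes of 10 MB: four 40 MB merges, then one 160 MB merge.
print(simulate_size_tiered(16, 10))  # ({160: 1}, 320)
```

Every merge writes out exactly as much as it read (100% of the original), and 160 MB of inserts cost 320 MB of compaction writes in this model.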
As Aaron said, use iostat -x (or dstat) to see if you are taxing the disks. If so, then
leveled compaction may be your option (for reasons already stated). If not taxing the disks,
then you might want to increase your compaction throughput, as you suggested.
Depending on what version you are using, another thing to possibly tune is the size of sstables
when flushed to disk. In your case of insert only, the smaller the flush size, the more times
that row is going to be rewritten during compaction (hence increased I/O).
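To put rough numbers on the flush-size effect: under the simplifying assumption that 4 like-sized sstables are always merged into one, the oldest rows get rewritten once per tier they pass through. The sketch below counts those tiers; the function name and the sizes are hypothetical, not taken from any real cluster.

```python
def rewrites_per_row(total_mb, flush_mb, min_threshold=4):
    """Approximate number of compaction passes the oldest rows go through.

    Assumes insert-only data and that min_threshold like-sized sstables
    are always merged into one (a simplification of size-tiered).
    """
    sstables = total_mb // flush_mb   # how many flushes the load produces
    passes = 0
    while sstables >= min_threshold:
        sstables //= min_threshold    # each pass collapses 4 sstables into 1
        passes += 1
    return passes

# Same 10 GB load, two illustrative flush sizes.
print(rewrites_per_row(10240, 10))    # 5
print(rewrites_per_row(10240, 160))   # 3
```

So in this model a 16x larger flush saves two full rewrites of the data.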
jc
From: Edward Capriolo <edlinuxguru@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 2:33 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction....
There is some point where you simply need more machines.
On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman <mkjellman@barracuda.com>
wrote:
Right, I guess I'm saying that you should try loading your data with leveled compaction and
see how your compaction load is.
Your work load sounds like leveled will fit much better than size tiered.
From: Brian Tarbox <tarbox@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 1:58 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction....
The problem I see is that it already takes me more than 24 hours just to load my data...during
which time the logs say I'm spending tons of time doing compaction. For example in the last
72 hours I've consumed 20 hours per machine on compaction.
Can I conclude from that that I should be (perhaps drastically) increasing my compaction_throughput_mb_per_sec
on the theory that I'm getting behind?
The fact that it takes me 3 days or more to run a test means it's hard to just play with values
and see what works best, so I'm trying to understand the behavior in detail.
Thanks.
Brian
On Mon, Jan 7, 2013 at 4:13 PM, Michael Kjellman <mkjellman@barracuda.com>
wrote:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
"If you perform at least twice as many reads as you do writes, leveled compaction may actually
save you disk I/O, despite consuming more I/O for compaction. This is especially true if your
reads are fairly random and don’t focus on a single, hot dataset."
From: Brian Tarbox <tarbox@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 12:56 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: help turning compaction..hours of run to get 0% compaction....
I have not specified leveled compaction so I guess I'm defaulting to size tiered? My data
(in the column family causing the trouble) is insert-once, read-many, update-never.
Brian
On Mon, Jan 7, 2013 at 3:13 PM, Michael Kjellman <mkjellman@barracuda.com>
wrote:
Size tiered or leveled compaction?
From: Brian Tarbox <tarbox@cabotresearch.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 7, 2013 12:03 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: help turning compaction..hours of run to get 0% compaction....
I have a column family where I'm doing 500 inserts/sec for 12 hours or so at a time. At some
point my performance falls off a cliff due to time spent doing compactions.
I'm seeing row after row of logs saying that after 1 or 2 hours of compacting it reduced
to 100% or 99% of the original.
I'm trying to understand what direction this data points me to in terms of configuration change.
a) increase my compaction_throughput_mb_per_sec because I'm falling behind (am I falling
behind?)
b) enable multi-threaded compaction?
Any help is appreciated.
Brian
----------------------------------
Join Barracuda Networks in the fight against hunger.
To learn how you can help in your community, please visit: http://on.fb.me/UAdL4f
­­