For key cache files (which you have below – note ‘cassandra_saved_caches’ and
-KeyCache- ), the safest thing to do is to simply remove them (move them aside
or delete them). They’re simple caches, and they’ll be recreated shortly after
starting.
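If it helps, a minimal sketch (the path below is the package default – check
saved_caches_directory in your cassandra.yaml, as yours may differ):

    # stop the node, move the key cache files aside, then restart
    sudo service cassandra stop
    mkdir -p /tmp/saved_caches_backup
    mv /var/lib/cassandra/saved_caches/*-KeyCache-* /tmp/saved_caches_backup/
    sudo service cassandra start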
- Jeff
From: David Paulsen
Reply-To:

A few months back, a user in #cassandra on freenode mentioned that when they
transitioned from thrift to cql, their overall performance decreased
significantly. They had 66 columns per table, so I ran some benchmarks with
various versions of Cassandra and thrift/cql combinations.
It shouldn’t

@cassandra.apache.org
Subject: Re: Practical limitations of too many columns/cells ?
Ah.. yes. Great benchmarks. If I’m interpreting them correctly it was ~15x
slower for 22 columns vs 2 columns?
Guess we have to refactor again :-P
Not the end of the world of course.
On Sun, Aug 23, 2015 at 1:53 PM, Jeff

What consistency level are you using with your query?
What replication factor are you using on your keyspace?
Have you run repair?
The most likely explanation is that you wrote with low consistency (ANY, ONE,
etc), and that one or more replicas do not have the cell. You’re then reading
with
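As a quick sanity check, you can re-read at a higher consistency level and see
whether the cell appears (a sketch – ks.tbl and the key are placeholders):

    # CONSISTENCY is a cqlsh command; ALL requires every replica to respond
    cqlsh -e "CONSISTENCY ALL; SELECT * FROM ks.tbl WHERE id = 1;"
    # if the data shows up at ALL but not at ONE, replicas are out of sync
    nodetool repair ks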

You can copy all of the sstables into any given data directory without issue
(keep them within the keyspace/table directories, but the mnt/mnt2/mnt3
location is irrelevant).
You can also stream them in via sstableloader if your ring topology has changed
(especially if tokens have moved)
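For example (hosts and path are placeholders – point it at the keyspace/table
directory):

    # -d takes a comma-separated list of contact points in the target ring
    sstableloader -d 10.0.0.1,10.0.0.2 /path/to/backup/my_keyspace/my_table/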

This is not currently possible, though it has been proposed in the past and may
be implemented in the future:
https://issues.apache.org/jira/browse/CASSANDRA-9110
- Jeff
From: yuankui
Reply-To: user@cassandra.apache.org
Date: Thursday, August 13, 2015 at 6:24 PM
To:

The timestamp is arbitrary precision, selected by the client. If you’re seeing
milliseconds on some data and microseconds on others, then you have one client
that’s using microseconds and another on milliseconds – adjust your clients.
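If you want to see the behavior directly, set the timestamp yourself (ks.tbl is
a placeholder; by convention the value is microseconds since epoch):

    cqlsh -e "INSERT INTO ks.tbl (id, val) VALUES (1, 'x') USING TIMESTAMP 1440000000000000;"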
From: yhq...@sina.com
Reply-To:

You’re trying to force your view onto an established ecosystem. It’s not “wrong
only because it’s currently bootstrapping”, it’s not bootstrapping at all, you
told it not to bootstrap.
‘auto_bootstrap’ is the knob that tells cassandra whether or not you want to
stream data from other replicas
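For reference, in cassandra.yaml (true is the default if the line is absent):

    auto_bootstrap: true   # stream data from existing replicas when joining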

Seeds are used in two different ways:
1) When joining the ring, the joining node knows NOTHING about the cluster, so
it uses a seed list to discover the cluster. Once discovered, it saves the
peers to disk, so on subsequent starts it will find/reconnect to other nodes
beyond just the explicitly
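For reference, the seed list lives in cassandra.yaml (addresses are
placeholders):

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.0.1,10.0.0.2"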

You can check for progress using `nodetool compactionstats` (which will show
Cleanup tasks), or check for ‘Cleaned up’ messages in the log
(/var/log/cassandra/system.log).
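For example:

    nodetool compactionstats                                 # look for Cleanup tasks
    grep 'Cleaned up' /var/log/cassandra/system.log | tail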
However, `nodetool cleanup` has a very specific and limited task - it deletes
data no longer owned by the node, typically

At this point, it is only/automatically managed by cassandra, but if you’re
clever with mount points you can probably work around the limitation.
From: Ahmed Eljami
Reply-To: user@cassandra.apache.org
Date: Tuesday, August 25, 2015 at 2:09 AM
To: user@cassandra.apache.org
Subject: How can

Because the data format has changed, you’ll need to read it out and write it
back in again.
This means using either a driver (java, python, c++, etc), or something like
spark.
In either case, split up the token range so you can parallelize it for
significant speed improvements.
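A minimal sketch of the idea (ks.tbl and pk are placeholders; with
Murmur3Partitioner the full token range is -2^63 to 2^63-1, and in practice
you’d want many more, smaller splits):

    # export each half of the token range in parallel
    cqlsh -e "SELECT * FROM ks.tbl WHERE token(pk) >= -9223372036854775808 AND token(pk) < 0" > part1.csv &
    cqlsh -e "SELECT * FROM ks.tbl WHERE token(pk) >= 0" > part2.csv &
    wait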
From:

The cassandra system.log would be more useful
When Cassandra starts rejecting or dropping tcp connections, try to connect
using cqlsh, and check the logs for indication that it’s failing.
From: Eduardo Alfaia
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, October 28, 2015 at 5:09 PM

It’s possible that it could be different depending on your consistency level
(on write and on read).
It’s also possible it’s a bug, but you didn’t give us much information – here
are some questions to help us help you:
What version?
What results are you seeing?
What’s the “right” result?

What Kai Wang is hinting at: one of the common tuning problems people face is
they follow the advice in cassandra-env.sh, which says 100M of young gen space
(Xmn) per core. Many people find that insufficient – raising that to be 30, 40,
or 50% of heap size (Xmx) MAY help keep short-lived objects
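The knobs involved, as a sketch (sizes are illustrative, not a recommendation):

    # cassandra-env.sh
    MAX_HEAP_SIZE="8G"      # Xmx
    HEAP_NEWSIZE="3200M"    # Xmn – ~40% of heap, vs the 100M-per-core default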

After 15 hours, you have “112000 SSTables over all nodes for all CF’s” -
assuming I’m parsing that correctly, that’s about 30 per table (100 KS * 10
tables * 4 nodes), which is actually not unreasonable with LCS.
Your symptom is heavy GC and you’re running 2.1.6 and don’t want to upgrade

I’m going to disagree with Carlos on two points.
You do have a lot of columns, so many that it’s likely to impact performance.
Rather than using collections, serializing those into a single JSON field is
far more performant. Since you write each record exactly once, this should be
easily

r.
I'm running a nodetool repair now to hopefully fix this.
On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
auto_bootstrap=false tells it to join the cluster without running bootstrap –
the node assumes it has all of the necessary data, and won’t stream an

auto_bootstrap=false tells it to join the cluster without running bootstrap –
the node assumes it has all of the necessary data, and won’t stream any missing
data.
This generally violates consistency guarantees, but if done on a single node,
is typically correctable with `nodetool repair`.
If

Worth noting that repair may not work, as it’s possible that NONE of the nodes
with data (for some given row) are still valid replicas according to the
DHT/Tokens, so repair will not find any of the replicas with the data.
From: Robert Coli
Reply-To: "user@cassandra.apache.org"
Date:

As long as your hyper-v/vss snapshots include both the data directory and the
commit log directory, then they’re exactly as good as tolerating a single power
outage – you should be able to load the sstables and replay commit log and be
fine.
Assuming you’re moving the hyper-v/vss snapshot to

If anyone can read the above scenario and confirm whether or not this can
occur, and if it is possible, how can current Cassandra solve it?
Regards,
Ibrahim
On Sun, Sep 6, 2015 at 5:57 PM,

Yes, it can occur, if you allow it to occur.
Clients should send their own timestamps. Clocks should be synchronized.
Failure to do so while relying on ‘last write wins’ timestamp resolution will
cause undesirable results.
This is unrelated to strong/weak/eventual consistency discussions or

With a 5s collection, the problem is almost certainly GC.
GC pressure can be caused by a number of things, including normal read/write
loads, but ALSO compaction calculation (pre-2.1.9 / #9882) and very large
partitions (trying to load a very large partition with something like row cache
in

2.2.1 has a pretty significant bug in compaction:
https://issues.apache.org/jira/browse/CASSANDRA-10270
That prevents it from compacting files after 60 minutes. It may or may not be
the cause of the problem you’re seeing, but it seems like it may be related,
and you can try the

2.1.4 is getting pretty old. There’s a DTCS deletion tweak in 2.1.5 (
https://issues.apache.org/jira/browse/CASSANDRA-8359 ) that may help you.
2.1.5 and 2.1.6 have some memory leak issues in DTCS, so go to 2.1.7 or newer
(probably 2.1.9 unless you have a compelling reason not to go to 2.1.9)

I’m here. Will be speaking Wednesday on DTCS for time series workloads:
http://cassandrasummit-datastax.com/agenda/real-world-dtcs-for-operators/
Here’s a picture – if you recognize me, say hi:
https://events.mfactormeetings.com/accounts/register123/mfactor/datastax/events/dstaxsummit2015/jirsa.jpg

When you run unsafeAssassinateEndpoint, to which host are you connected, and
what argument are you passing?
Are there other nodes in the ring that you’re not including in the ‘nodetool
status’ output?
From: Dikang Gu
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, September 22, 2015

s of being refactored (here's at least one of the
issues: https://issues.apache.org/jira/browse/CASSANDRA-9667), but it would be
worth opening an issue with as much information as you can provide to, at the
very least, have information available for others.
On Fri, Sep 25, 2015 at 7:08 AM, Jeff Jirsa <

Thanks for your reply, Jeff!
I will switch to Cassandra 2.1.9.
Quick follow up question: Does the schema, settings I have setup look alright?
My timestamp column's type is blob - I was wondering if this could confuse DTCS?
On Sun, Sep 20, 2015 at 3:37 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>

To: cassandra
Subject: Re: Unable to remove dead node from cluster.
@Jeff, I just use jmx connect to one node, run the unsafeAssassinateEndpoint, and
pass in the "10.210.165.55" ip address.
Yes, we have hundreds of other nodes in the nodetool status output as well.
On Tue, Sep 22, 201

https://issues.apache.org/jira/browse/CASSANDRA-7953
https://issues.apache.org/jira/browse/CASSANDRA-10505
There are buggy versions of cassandra that will multiply tombstones during
compaction. 2.1.12 SHOULD correct that, if you’re on 2.1.
From: Kai Wang
Reply-To:

Streaming with vnodes is not always pleasant – rebuild uses streaming (as does
bootstrap, repair, and decommission). The rebuild delay you see may or may not
be related to that. It could also be that the streams timed out, and you don’t
have a stream timeout set. Are you seeing data move? Are

8G is probably too small for a G1 heap. Raise your heap or try CMS instead.
71% of your heap is collections – may be a weird data model quirk, but try CMS
first and see if that behaves better.
From: Mikhail Strebkov
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, December 9, 2015 at

There were a few buggy versions in 2.1 (2.1.7, 2.1.8, I believe) that showed
this behavior. The number of pending compactions was artificially high, and not
meaningful. As long as the number of -Data.db sstables remains normal,
compaction is keeping up and you’re fine.
- Jeff
From:

2015 at 3:18 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
There is research into causal consistency and cassandra (
http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html
, for example), though you’ll note that it uses a fork (
https://github.com/wllo

https://issues.apache.org/jira/browse/CASSANDRA-10745
GPFS seems like it SHOULD be easier to package and distribute in most use cases…
From: "sean_r_dur...@homedepot.com"
Reply-To: "user@cassandra.apache.org"
Date: Monday, December 14, 2015 at 11:56 AM
To: "user@cassandra.apache.org"

There is research into causal consistency and cassandra (
http://da-data.blogspot.com/2013/02/caring-about-causality-now-in-cassandra.html
, for example), though you’ll note that it uses a fork (
https://github.com/wlloyd/eiger ) which is unlikely something you’d ever want
to consider in

Why do you think it’s cluster wide? That param is per-node, and you can change
it at runtime with nodetool (or via the JMX interface using jconsole to ip:7199)
From: Ken Hancock
Reply-To: "user@cassandra.apache.org"
Date: Monday, January 4, 2016 at 12:59 PM
To:

Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our experience
was anywhere from 20-60%, depending on workload).
However, it’s not necessarily true that everything behaves exactly the same –
in particular, memtables are different, commitlog segment handling is
different, and

You chose a specific point in time that is especially painful. Had you chosen
most of 2014, you would have had a long period of 2.0.x that was stable.
Yes, if you were deploying in April 2015, you had an unpleasant choice between
an about-to-EOL 2.0 and an omg-memory-leak 2.1 – if you deploy

1) It comes online in its former state. The operator is responsible for
consistency beyond that point. Common solutions would be `nodetool repair` (and
if you get really smart, you can start the daemon with the thrift/native
listeners disabled, run repair, and then enable listeners, so that

You’ll see better performance using a slice (which is effectively what will
happen if you put them into the same table and use query-1table-b), as each
node will only need to merge cells/results once. It may not be twice as fast,
but it’ll be fast enough to make it worthwhile.
On 1/8/16,

“It takes as long as necessary to rewrite any sstable that needs to be
upgraded”.
From 2.2.4 to 2.2.6, the sstable format did not change, so there’s nothing to
upgrade.
If you want to force the matter (and you probably don’t), ‘nodetool
upgradesstables -a’ will rewrite them again, but you

k into updating soon). Just to clarify, in the current version of
Cassandra when do fully expired SSTables get dropped? Is it when a minor
compaction runs or is it separate from minor compactions? Also thanks for the
link to the slides, great stuff!
Jerome
From: Jeff Jirsa <jeff.ji...@crowdstrike.co

First, DTCS in 2.0.15 has some weird behaviors -
https://issues.apache.org/jira/browse/CASSANDRA-9572 .
That said, some other general notes:
Data deleted by TTL isn’t the same as issuing a delete – each expiring cell
internally has a ttl/timestamp at which it will be converted into a

Correcting myself - https://issues.apache.org/jira/browse/CASSANDRA-9882 made
it check for fully expired sstables no more than once every 10 minutes (still
happens on flush as described, just not EVERY flush). Went in 2.0.17 / 2.1.9.
- Jeff
From: Jeff Jirsa <jeff

Make sure streaming throughput isn’t throttled on the destination cluster.
Stream from more machines (divide sstables between a bunch of machines, run in
parallel).
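For example, on the destination nodes:

    nodetool setstreamthroughput 0   # 0 disables the streaming throttle entirely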
On 1/11/16, 5:21 AM, "Noorul Islam K M" wrote:
>
>I have a need to stream data to new cluster using

Very large partitions create a lot of garbage during reads:
https://issues.apache.org/jira/browse/CASSANDRA-9754 - you will see significant
GC pauses trying to read from large enough partitions.
I suspect GC, though it’s odd that you’re unable to see it.
From: Bryan Cheng
Reply-To:

When you change compaction strategy, nothing happens until the next flush. On
the next flush, the new compaction strategy will decide what to do – if you
change from STCS to DTCS, it will look at various timestamps of files, and
attempt to group them by time windows based on the sstable’s

With SSDs, the typical recommendation is up to 0.8-1 compactor per core
(depending on other load). How many CPU cores do you have?
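The corresponding knob, for reference (the value is illustrative – e.g. 8 cores
on SSD):

    # cassandra.yaml
    concurrent_compactors: 6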
From: Kai Wang
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org"
Subject: compaction throughput

disk_access_mode which should be mmap. We also used LZ4Compressor when
created table.
We will let you know if this property had any effect. We were testing with
2.1.11 and this was only fixed in 2.1.12 so we need to play with latest version.
Praveen
From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
R

Is this during streaming plan setup (is your 10-20 second time of impact
approximately 30 seconds from the time you start the node that’s joining the
ring), or does it happen for the entire time you’re joining the node to the
ring?
If so, there’s a chance it’s GC related – the streaming plan

> For instance, the way AAA (authentication, authorization, audit) is done doesn't
> allow for centralized account and access control management, which in reality
> translates into shared accounts and no hierarchy.
Authentication and Authorization are both pluggable. Any organization can write

a, such as during the full processing of flushing memtables, but for the
fsync at the end a solid guarantee is needed.
-- Jack Krupansky
On Mon, Feb 1, 2016 at 12:56 AM, Eric Plowe <eric.pl...@gmail.com> wrote:
Jeff,
If EBS goes down, then EBS Gp2 will go down as well, no? I'm no

the "small to medium databases" use case.
Do older instances with local HDD still exist on AWS (m1, m2, etc.)? Is the doc
simply for any newly started instances?
See:
https://aws.amazon.com/ec2/instance-types/
http://aws.amazon.com/ebs/details/
-- Jack Krupansky
On Mon, Feb 1, 20

Also in that video - it's long but worth watching
We tested up to 1M reads/second as well, blowing out page cache to ensure we
weren't "just" reading from memory
--
Jeff Jirsa
> On Jan 31, 2016, at 9:52 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>

Yes, but getting at why you think EBS is going down is the real point. New GM
in 2011. Very different product. 35:40 in the video
--
Jeff Jirsa
> On Jan 31, 2016, at 9:57 PM, Eric Plowe <eric.pl...@gmail.com> wrote:
>
> Jeff,
>
> If EBS goes down, then EBS Gp2 will go

Free to choose what you'd like, but EBS outages were also addressed in that
video (second half, discussion by Dennis Opacki). 2016 EBS isn't the same as
2011 EBS.
--
Jeff Jirsa
> On Jan 31, 2016, at 8:27 PM, Eric Plowe <eric.pl...@gmail.com> wrote:
>
> Thank you all for

of m1.large.
-- Jack Krupansky
On Mon, Feb 1, 2016 at 5:12 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
A lot of people use the old gen instances (m1 in particular) because they came
with a ton of effectively free ephemeral storage (up to 1.6TB). Whether or not
they’re viable is a d

If you have to ask that question, I strongly recommend m4 or c4 instances with
GP2 EBS. When you don’t care about replacing a node because of an instance
failure, go with i2+ephemerals. Until then, GP2 EBS is capable of amazing
things, and greatly simplifies life.
We gave a talk on this topic

Upgrade from 2.1.9+ directly to 3.0 is supported:
https://github.com/apache/cassandra/blob/cassandra-3.0/NEWS.txt#L83-L85
- Upgrade to 3.0 is supported from Cassandra 2.1 versions greater or equal to
2.1.9, or Cassandra 2.2 versions greater or equal to 2.2.2. Upgrade from
Cassandra 2.0 and

disk is rarely the
bottleneck. YMMV, of course.
On Fri, Jan 29, 2016 at 7:32 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
If you have to ask that question, I strongly recommend m4 or c4 instances with
GP2 EBS. When you don’t care about replacing a node because of an instance
failu

like i have to force a major compaction to delete a lot of data? are
there any other solutions?
thanks
anishek
On Mon, Feb 22, 2016 at 11:21 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
1) getFullyExpiredSSTables in 2.0 isn’t as thorough as many expect, so it’s
very likely

Cassandra is streaming it at a near constant rate (if you had metrics for
network interface, you’d probably see that), but it doesn’t register in
nodetool status until it completes all of the sstables for a column family. At
that point, the -tmp-Data.db files get renamed to drop the -tmp, and

1) getFullyExpiredSSTables in 2.0 isn’t as thorough as many expect, so it’s
very likely that some sstables stick around longer than you expect.
2) max_sstable_age_days tells cassandra when to stop compacting that file, not
when to delete it.
3) You can change the window size using both the

The value of cassandra is in its replication – as a single node solution, it’s
slower and less flexible than alternatives
From: John Lammers
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 22, 2016 at 12:57 PM
To: Cassandra Mailing List
Subject: Fwd: Production with Single Node

"As I understand TTL, if there is a compaction of a cell (or row) with a TTL
that has been reached, a tombstone will be written."
The expiring cell is treated as a tombstone once it reaches its end of life;
it does not write an additional tombstone to disk.
From:

If you don’t overwrite or delete data, it’s not a concern.
If the clocks show a time in the past instead of in the future, it’s not a
concern.
If the clock has drifted significantly into the future, when you start NTP you
may be writing data with timestamps lower than timestamps on data that

A bit of Splunk-fu probably works for this – you’ll have different line entries
for memtable flushes vs compaction output. Comparing the two will give you a
general idea of compaction amplification.
From: Dikang Gu
Reply-To: "user@cassandra.apache.org"
Date: Thursday, March 10, 2016 at

Drain should not run for days – if it were me, I’d be:
Checking for ‘DRAINED’ in the server logs
Running ‘nodetool flush’ just to explicitly flush the commitlog/memtables
(generally useful before doing drain, too, it can be somewhat race-y)
Explicitly killing cassandra following the flush – drain
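Roughly the sequence I’d use (service name is a placeholder):

    grep -i DRAINED /var/log/cassandra/system.log   # did drain actually finish?
    nodetool flush                                  # flush memtables explicitly
    nodetool drain && sudo service cassandra stop   # then stop the process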

SELECT COUNT(*) probably works (with internal paging) on many datasets with
enough time and assuming you don’t have any partitions that will kill you.
No, it doesn’t count extra replicas / duplicates.
The old way to do this (before paging / fetch size) was to use manual paging
based on
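That manual approach looked roughly like this (a sketch – ks.tbl, pk, and the
starting token are placeholders, assuming token-based paging):

    cqlsh -e "SELECT pk, token(pk) FROM ks.tbl WHERE token(pk) > -3074457345618258603 LIMIT 1000;"
    # repeat, substituting the largest token returned, until no rows come back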

It is possible to use OpsCenter for open source / community versions up to
2.2.x. It will not be possible in 3.0+
From: Anuj Wadehra
Reply-To: "user@cassandra.apache.org"
Date: Sunday, April 10, 2016 at 9:28 AM
To: User
Subject: DataStax OpsCenter with Apache Cassandra
Hi,
Is it

> am trying to concretely understand how DTCS makes buckets and I am looking
> at the DateTieredCompactionStrategyTest.testGetBuckets method and played with
> some of the parameters to GetBuckets method call (Cassandra 2.1.12). I don’t
> think I fully understand something there.
Don’t feel

> We added a bunch of new nodes to a cluster (2.1.13) and everything went fine,
> except for the number of pending compactions that is staying quite high on a
> subset of the new nodes. Over the past 3 days, the pending compactions have
> never been less than ~130 on such nodes, with peaks of

0 bytes on all nodes.
We have replication factor 3 but the problem is only on two nodes.
the only other thing that stands out in cfstats is the read time and write time
on the nodes with high GC is 5-7 times higher than on the other 5 nodes, but i
think that's expected.
thanks
anishek

Compaction falling behind will likely cause additional work on reads (more
sstables to merge), but I’d be surprised if it manifested in super long GC.
When you say twice as many sstables, how many is that?
In cfstats, does anything stand out? Is max row size on those nodes larger than
on

100% ownership on all nodes isn’t wrong with 3 nodes in each of 2 DCs with RF=3
in both of those DCs. That’s exactly what you’d expect it to be, and a
perfectly viable production config for many workloads.
From: Anuj Wadehra
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, April 13,

The keyspace with RF=1 may lose data, but isn’t blocking the replacement.
The most likely cause of the delay is hung streaming. Run `nodetool netstats`
on the joining (replacement) node. Do the byte counters change? If not,
streaming is hung, and you’ll likely need to restart the process. If
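A simple way to watch it (sketch):

    # sample the byte counters a minute apart; unchanged numbers = hung streaming
    nodetool netstats | grep -i receiving
    sleep 60
    nodetool netstats | grep -i receiving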

Even with the same data, bloom filter is based on sstables. If your compaction
behaves differently on 2 nodes than the third, your bloom filter RAM usage may
be different.
From: Kai Wang
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, May 17, 2016 at 8:02 PM
To:

If you remove a node at a time, you’ll eventually end up with a single node in
the DC you’re decommissioning which will own all of the data, and you’ll likely
overwhelm that node.
It’s typically recommended that you ALTER the keyspace, remove the replication
settings for that DC, and then you

Cassandra isn’t a traditional DB – it doesn’t “replicate” in the same way that
a relational DB replicates.
Cassandra clients send mutations (via native protocol or thrift). Those
mutations include a minimum consistency level for the server to return a
successful write.
If a write says

“removenode” instead of “decommission” to make it even faster. Will
that have any side-effect (I think it shouldn’t) ?
From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Monday, May 23, 2016 4:43 PM
To: user@cassandra.apache.org
Subject: Re: Removing a datacenter
If you remove a node at a tim

Fastest way? Stop cassandra, use sstablemetadata to find any files whose
maxTimestamp is more than 2 days old, and remove them. Start cassandra. Works
better with some compaction
strategies than others (probably find a few droppable sstables with either DTCS
/ STCS, but not perfect).
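A sketch of the identification step (paths are placeholders; the timestamps
printed are microseconds since epoch):

    for f in /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db; do
        echo "$f: $(sstablemetadata "$f" | grep 'Maximum timestamp')"
    done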
Cleanest way? One by one (starting with

You can’t stream between versions, so in order to grow the cluster, you’ll need
to be entirely on 2.0 or entirely on 2.1.
If you go to 2.1 first, be sure you run upgradesstables before you try to
extend the cluster.
On 5/18/16, 11:17 AM, "Erik Forsberg" wrote:
>Hi!
>