Solr? Are you on DSE, or am I missing something (huge) about Cassandra? (wouldn't be the first time :-)
Or do you mean the JSON manifest? It's there and it looks OK; in fact it has been corrupted twice due to storage problems and I hit https://issues.apache.org/jira/browse/CASSANDRA-5041
TBH I think this was a repair without -pr.
thanks,
Andras
Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84
On 18 Dec 2012, at 22:09, B. Todd Burruss wrote:
In your data directory, for each keyspace there is a solr.json. Cassandra stores the SSTables it knows about there when using leveled compaction. Take a look at that file and see if it looks accurate. If not, this is a bug with Cassandra that we are checking into as well.
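If you want to eyeball the manifest, something like this works (path and file names below are placeholders; the exact layout depends on your install):

ls /var/lib/cassandra/data/MyKeyspace/
python -m json.tool /var/lib/cassandra/data/MyKeyspace/MyColumnFamily.json | head -40

python -m json.tool will also fail loudly if the JSON is corrupt, which is a quick sanity check.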
On Thu, Dec 6, 2012 at 7:38 PM, aaron morton wrote:
The log message matches what I would expect to see for nodetool repair -pr.
Not using -pr means repairing all the ranges the node is a replica for. If you have RF == number of nodes, then it will repair all the data.
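For reference, a rough sketch of the two invocations (keyspace/CF names are placeholders):

# repair only this node's primary range (what a per-node scheduled job should run)
nodetool -h localhost repair -pr MyKeyspace MyColumnFamily

# repair every range this node is a replica for; with RF == node count this touches all the data
nodetool -h localhost repair MyKeyspace MyColumnFamily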
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 6/12/2012, at 9:42 PM, Andras Szerdahelyi wrote:
Thanks!
I'm also thinking a repair run without -pr could have caused this, maybe?
Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84
On 06 Dec 2012, at 04:05, aaron morton wrote:
- how do I stop repair before I run out of storage? (can't let this finish)
To stop the validation part of the repair…
nodetool -h localhost stop VALIDATION
The only way I know to stop streaming is to restart the node; there may be a better way though.
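To see what is actually in flight before deciding to restart, these help (standard nodetool commands, assuming localhost):

# active and pending streams to and from this node
nodetool -h localhost netstats

# running compactions, including the validation compactions that "stop VALIDATION" cancels
nodetool -h localhost compactionstats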
INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
I'm assuming this was run on the first node in DC west with -pr, as you said.
The log message is saying this is going to repair the primary range for the node. The repair is then actually performed one CF at a time.
You should also see log messages ending with "range(s) out of sync" which will say how out of sync the data is.
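A quick way to pull those out, assuming the default Debian log location (adjust the path if yours differs):

grep "out of sync" /var/log/cassandra/system.log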
- how do I clean up my sstables (grew from 6k to 20k since this started, while I shut writes off completely)
Sounds like repair is streaming a lot of differences.
If you have the space I would give Levelled compaction time to take care of it.
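If you want to watch it make progress, the SSTable count in cfstats should trend back down over time (one way to check; CF names will vary):

nodetool -h localhost cfstats | grep -E "Column Family|SSTable count"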
Hope that helps.
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 6/12/2012, at 1:32 AM, Andras Szerdahelyi wrote:
hi list,
AntiEntropyService started syncing ranges of entire nodes (?!) across my data centres and I'd like to understand why.
I see log lines like this on all my nodes in my two (east/west) data centres...
INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java (line 666) [repair #7c7665c0-3eab-11e2-0000-dae6667065ff] new session: will sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] for ( .. )
( this is around 80-100 GB of data for a single node. )
- i did not observe any network failures or nodes falling off the ring
- good distribution of data ( load is equal on all nodes )
- hinted handoff is on
- read repair chance is 0.1 on the CF
- 2 replicas in each data centre ( which is also the number of nodes in each ) with NetworkTopologyStrategy
- repair -pr is scheduled to run off-peak hours, daily
- leveled compaction with sstable max size 256MB (I have found this to trigger compaction at acceptable intervals while still keeping the sstable count down)
- i am on 1.1.6
- java heap 10G
- max memtables 2G
- 1G row cache
- 256M key cache
my nodes' ranges are:
DC west: 0, 85070591730234615865843651857942052864
DC east: 100, 85070591730234615865843651857942052964
symptoms are:
- logs show sstables being streamed over to other nodes
- 140k files in data dir of CF on all nodes
- cfstats reports 20k sstables, up from 6k, on all nodes
- compaction continuously running with no results whatsoever (number of sstables growing)
i tried the following:
- offline scrub (has gone OOM; I noticed the script in the Debian package specifies a 256MB heap? see the sketch after this list)
- online scrub ( no effect )
- repair ( no effect )
- cleanup ( no effect )
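On the offline scrub OOM mentioned above, a sketch of one workaround (exact script path and heap variable vary by package): bump the heap value the wrapper hard-codes (256MB per the note above), then rerun the standalone scrubber against the CF.

# keyspace/CF names are placeholders
sstablescrub MyKeyspace MyColumnFamily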
my questions are:
- how do I stop repair before I run out of storage? (can't let this finish)
- how do I clean up my sstables (grew from 6k to 20k since this started, while I shut writes off completely)
thanks,
Andras
Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84