Description

Looks like a cluster I left creating replica indexes overnight has crashed. At the time of the crash an empty MnesiaCore file was created, and attempts to restart the couchbase service create an empty erl_crash.dump. Excerpt from error log below with diags attached:

Tommie McAfee
added a comment - 29/Feb/12 12:27 PM Spoke with Filipe about this. He explained that compaction doesn't start until indexing finishes.
So what happens is: my couch data disk size is 3.5 GB, and Couchbase is going to try to create main and replica index files for 20 views (40 index files in total). In the worst case (if I re-emit the entire db in every view), my cluster would have to reserve an extra 140 GB (3.5 GB * 40) for queries.
Filipe says it's possible to implement some sort of incremental compaction, or possibly to give compaction threads priority when necessary.
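
The worst-case estimate above can be reproduced with a few lines of Python. This is only a rough sketch of the arithmetic in this comment, not anything Couchbase itself computes: the function names and the `has_headroom` check are hypothetical, and it assumes every view re-emits the whole data set and that no compaction runs until indexing finishes.

```python
import shutil

GB = 10**9  # decimal gigabytes, matching the rough figures in this ticket


def worst_case_index_bytes(data_bytes, num_views, replica_index=True):
    """Upper bound on index disk usage if every view re-emits all the data.

    Assumption: one main index file per view, plus one replica index file
    per view when replica indexes are enabled.
    """
    files_per_view = 2 if replica_index else 1
    return data_bytes * num_views * files_per_view


def has_headroom(path, needed_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    needed = worst_case_index_bytes(int(3.5 * GB), num_views=20)
    print(needed / GB)  # 140.0
    print(has_headroom(".", needed))
```

Running a check like this against the index directory before kicking off an overnight build would at least flag that the disk cannot hold the worst case.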

Unfortunately, once we run out of disk space, we can't query views with ?stale=ok or ?stale=update_after (the default).
We get a file_error from within ns_server (somewhere in the HTTP handlers / ALE logger):

Technically, the view engine is capable of serving queries with stale=ok or stale=update_after when there's no space left on disk, as long as the logger doesn't crash in that situation.
Queries with ?stale=false will always get an error mentioning the POSIX error code 'enospc'.
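
For reference, the three stale modes discussed here differ only in the query string. A minimal sketch of building such a query URL follows; the host, port, bucket, design document and view names are placeholders, and the `/<bucket>/_design/<ddoc>/_view/<view>` path layout is assumed rather than taken from this ticket.

```python
from urllib.parse import urlencode

# The three values discussed in this ticket; update_after is the default.
VALID_STALE = ("ok", "update_after", "false")


def view_query_url(base_url, bucket, design_doc, view, stale="update_after"):
    """Build a view query URL with an explicit stale parameter.

    All arguments are caller-supplied placeholders; nothing is sent
    over the network here.
    """
    if stale not in VALID_STALE:
        raise ValueError("stale must be one of %s" % (VALID_STALE,))
    query = urlencode({"stale": stale})
    return "%s/%s/_design/%s/_view/%s?%s" % (
        base_url, bucket, design_doc, view, query)


if __name__ == "__main__":
    # Hypothetical node address and names, for illustration only.
    print(view_query_url("http://localhost:8092", "default",
                         "dev_test", "by_id", stale="ok"))
```

On a full disk, a client that can tolerate stale results would use stale=ok rather than stale=false, since the latter has to update the index and so hits 'enospc'.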

damien
added a comment - 17/Apr/12 3:27 PM Running out of disk space shouldn't cause corruption, but beyond that there is nothing we can do. A possible future feature is an admin function to purge all indexes so that they are rebuilt from scratch, but that would take a lot of disk IO and potentially cause application downtime, for a problem the administrator might easily resolve in another way.