I'm still stuck with a severely under-performing cluster, and the
symptoms seem to change with every version change.
I'm curious to know if anyone has devised a systematic approach to
finding issues in their configurations.
A simple method would be to strip a configuration down to its most basic
form, then add options back until things break. Of course, when the most
basic configuration doesn't work either, this isn't enough.
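
To make the strip-down-and-add-back idea concrete, here is a minimal
sketch in Python. It is only an illustration: the function name, the
is_healthy callback and the example values are all made up for this
post, and is_healthy stands in for whatever you actually do to judge a
configuration (restart the cluster, run a benchmark, compare against a
threshold). The fake check in the demo simply pretends that a large
handler count is the culprit.

```python
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, str]


def find_first_breaking_option(
    baseline: Config,
    overrides: Config,
    is_healthy: Callable[[Config], bool],
) -> Optional[Tuple[str, str]]:
    """Add overrides to the baseline one at a time, in order.

    Returns the first (key, value) after which is_healthy reports a
    failure, or None if every option could be added without trouble.
    """
    # If the stripped-down configuration is already bad, adding options
    # back cannot tell us anything, so fail loudly instead.
    if not is_healthy(baseline):
        raise RuntimeError("baseline configuration is already unhealthy")

    current = dict(baseline)  # copy so the caller's dict is untouched
    for key, value in overrides.items():
        current[key] = value
        if not is_healthy(current):
            return key, value
    return None


# Hypothetical demo: the values and the health rule are invented.
def fake_health_check(config: Config) -> bool:
    handlers = int(config.get("hbase.regionserver.handler.count", "10"))
    return handlers <= 100


baseline = {"hbase.rootdir": "hdfs://namenode/hbase"}
overrides = {
    "hbase.hregion.memstore.flush.size": "134217728",
    "hbase.regionserver.handler.count": "200",
}
culprit = find_first_breaking_option(baseline, overrides, fake_health_check)
print(culprit)
```

Note that this only finds a single option that breaks things when added
on top of the ones before it. It says nothing useful when the baseline
itself is unhealthy, which is exactly the case where the method runs
out.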
What tools do people reach for first, and what are they most suspicious
of initially? I'm concerned I may be overlooking something major. I have
gone through hardware saturation checks and service logs, returned
everything to the simplest configuration possible, and tried various
other configurations. I'm now mainly digging through the source code and
stack dumps from the master, region servers, datanodes and namenode, and
making no progress. I don't feel this is the right approach, but I can't
see any alternative.
http://hstack.org/hbase-performance-testing/ has been one source of some
pretty good advice, as have the O'Reilly books.