cassandra-user mailing list archives

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

Date

Thu, 28 Jul 2011 21:57:18 GMT

Benchmarks were done with up to 96GB of memory, much more caching than most people will ever have.
The point, anyway, is that you are talking I/O in the tens, or at best a few hundred, MB/sec before
cassandra will eat all your CPU (with dual 6-core CPUs in our case).
The memcpy involved here, deep inside the kernel, will not be very high on the list of expensive
operations.
The assumption also seems to be that mmap is "free" CPU-wise.
It clearly isn't. There is definitely work involved for the CPU when doing mmap as well. You
just move it from context switching and small I/O-buffer copying to memory management.
Terje
On Jul 29, 2011, at 5:16 AM, Jonathan Ellis wrote:
> If you're actually hitting disk for most or even many of your reads, then mmap doesn't
matter, since the extra copy to a Java buffer is negligible compared to the I/O itself (even
on SSDs).
> On Jul 28, 2011 9:04 AM, "Terje Marthinussen" <tmarthinussen@gmail.com> wrote:
> >
> > On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:
> >
> >> This is not advisable in general, since non-mmap'd I/O is substantially slower.
> >
> > I see this claim again and again here, but it is actually close to 10 years
since I last saw mmap'd I/O deliver any substantial performance benefit in any real-life use I have
needed.
> >
> > We have done a lot of testing of this with cassandra too, and I don't see anything
conclusive. We have done just as many tests where normal I/O was faster than mmap, and the differences
may very well be within statistical variance given the complexity and number of factors involved
in something like a distributed cassandra cluster working at quorum.
> >
> > mmap made a difference in 2000, when memory throughput was still measured in hundreds
of megabytes/sec and CPU caches were a few kilobytes, but today you get megabytes of CPU cache
with 100GB/sec bandwidth, and even memory bandwidths are in the tens of GB/sec.
> >
> > However, I/O buffers are generally quite small, and copying an I/O buffer from kernel
to user space inside a cache with 100GB/sec of bandwidth is really a non-issue given the I/O
throughput cassandra generates.
> >
> > By 2005 or so, CPUs had already reached the point where I saw mmap perform
worse than regular I/O in a large number of use cases.
> >
> > Hard to say exactly why, but I saw a FreeBSD core developer speculate
back then that the extra MMU work involved in some I/O loads may actually be slower than the
in-cache memcpy of tiny I/O buffers (they are pretty small, after all).
> >
> > I don't have a personal theory here. I just know that, especially with large numbers
of smaller I/O operations, regular I/O was typically faster than mmap, which would back up
that theory.
> >
> > So I wonder how people came to this conclusion, as I am, under no real-life use
case with cassandra, able to reproduce anything resembling a significant difference, and we
have been benchmarking on nodes with SSD setups which can churn out 1GB/sec+ read speeds.
> >
> > That is way more I/O throughput than most people have at hand, and still I cannot get mmap
to give me better performance.
> >
> > I do, although subjectively, feel that things just seem to work better with regular
I/O for us. We currently have very nice and stable heap sizes regardless of I/O load,
and we have an easier system to operate, as we can actually monitor how much memory the darned
thing uses.
> >
> > My recommendation? Stay away from mmap.
> >
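[Editor's note: for readers following along, the setting this recommendation maps to is disk_access_mode in cassandra.yaml. A sketch of the 0.7/0.8-era fragment is below; the value names are from memory of that era's config, so double-check against your version's bundled yaml.]

```yaml
# cassandra.yaml -- how SSTable files are read:
#   auto             mmap data and index files on 64-bit JVMs (default)
#   mmap             always mmap
#   mmap_index_only  mmap only the index files
#   standard         plain buffered I/O, no mmap (what Terje recommends here)
disk_access_mode: standard
```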
> > I would love to understand how people got to this conclusion, however, and to try to
find out why we seem to see differences!
> >
> >> The OP is correct that it is best to disable swap entirely, and
> >> second-best to enable JNA for mlockall.
> >
> > Be a bit careful with removing swap completely. Linux is not always happy when it
gets short on memory.
> >
> > Terje
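
[Editor's note: an operational sketch of the two swap options discussed above, for reference. Treat the exact commands as assumptions about a typical Linux box of that era, not instructions from the thread.]

```shell
# Option 1: disable swap entirely (the list's usual recommendation).
sudo swapoff -a              # takes effect now; also remove swap lines from /etc/fstab

# Option 2: keep swap as a safety net but tell the kernel to avoid it,
# which addresses the "Linux is not always happy when short on memory" concern.
sudo sysctl vm.swappiness=1  # persist via /etc/sysctl.conf

# With JNA on the classpath, Cassandra calls mlockall() at startup to pin
# its memory; locked memory shows up in the process status:
grep VmLck /proc/$(pgrep -f CassandraDaemon)/status
```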