History

Personally, I prefer M_ARENA_MAX=1 (via the MALLOC_ARENA_MAX
environment variable), but there is currently a performance
penalty for that.

gc.c (Init_GC): set M_ARENA_MAX=2 for glibc malloc

This is not desirable in the longer term.

CRuby will likely get true concurrency in the future via ko1's Guild proposal. Reducing arenas will create new contention and serialisation at the memory-allocator level, thus negating the full benefits of Guilds.

Debate is currently occurring in feature #14718 about using jemalloc to solve Ruby's memory fragmentation issue on Linux. Resolution of that (one way or the other) should inform what to do here.

Yusuke, your script doesn't create any memory fragmentation: it throws away everything after 1600 and reads the exact same amount of data each time. I don't believe this is how Rails apps behave; they fragment over time.

My script creates random-sized data and holds onto 10% of it to create "holes" in the heap and fragment memory quickly. I believe this better represents normal app conditions. I've edited your script slightly to randomly keep some data; it better matches the results I posted earlier. I think changing the IO to read random sizes would also exhibit worse memory usage:
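For readers following along, the allocation pattern described above can be sketched roughly like this (a hedged illustration, not Mike's actual script; the iteration count, size range, and seed are made up):

```ruby
# Sketch of the fragmentation pattern: allocate random-sized strings,
# keep ~10% of them alive so the heap develops long-lived "holes",
# and let the rest be garbage-collected.
srand(42)                        # fixed seed for repeatable runs
kept = []
100_000.times do
  s = 'x' * rand(16..4096)       # random-sized allocation
  kept << s if rand < 0.10       # hold onto roughly 10% of the data
end
puts "retained #{kept.size} objects"
```

Because the retained strings are interleaved with freed ones of random sizes, glibc's heap cannot easily return pages to the kernel, which is the fragmentation effect under discussion.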

I tried to change Mike's script to use I/O, and I've created a
script that works best with glibc with no MALLOC_ARENA_MAX
specified.

Interesting, you found a corner case of some fixed sizes where
the glibc default appears the best.

I tested 16K instead of 64K (since my computer is too slow,
and 16K is the default buffer size for IO.copy_stream,
net/protocol, etc.), and the default was still best
in that case.

So, I wonder if there is a trim threshold where this happens,
and whether multiple arenas tickle the threshold more frequently.

However, I believe Mike's script of random sizes is more
representative of realistic memory use. Unfortunately,
srand+rand alone is not enough to give consistently reproducible
results for benchmarking with threads...

Maybe a single thread needs to generate all the random numbers
and feed them round-robin to per-thread SizedQueue for
deterministic results.
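That idea might look something like the following (a hedged sketch; the worker count, queue depth, and seed are arbitrary, and the string allocation stands in for real benchmark work):

```ruby
# A single seeded generator feeds sizes round-robin into per-worker
# SizedQueues, so each worker consumes the same sequence on every run
# regardless of thread scheduling.
NWORKERS = 4
queues = Array.new(NWORKERS) { SizedQueue.new(64) }

producer = Thread.new do
  rng = Random.new(42)                          # the only source of randomness
  1_000.times { |i| queues[i % NWORKERS] << rng.rand(16..4096) }
  queues.each { |q| q << nil }                  # sentinel: no more work
end

sums = queues.map do |q|
  Thread.new do
    bytes = 0
    while (size = q.pop)                        # nil sentinel ends the loop
      bytes += ('x' * size).bytesize            # stand-in for real work
    end
    bytes
  end
end.map(&:value)

producer.join
puts "allocated #{sums.sum} bytes across #{NWORKERS} workers"
```

Since all randomness comes from one seeded Random instance, the total allocation volume is identical across runs, even though thread interleaving still varies.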

Btw, has anybody tested this patch with various allocators
to ensure it doesn't trigger a conflict (resulting in a
segfault) when using the other allocator via LD_PRELOAD?
jemalloc seems fine, but I'm not sure about the others.

It will be a hard choice between slow allocations when there are many Guilds,
or removing this behaviour and regressing on existing programs that don't use Guilds.
What does ko1 (Koichi Sasada) think about this?
And any idea when guilds could land?

We will probably match arenas to Guild count dynamically;
depending on whether the program uses Guilds or not.

glibc checks arena_max whenever a new thread allocates memory
for the first time, so arena_max doesn't need to be frozen at
Ruby startup.

I don't know about ko1's timeline, but glibc releases every
6 months (Aug and Feb); and Carlos's willingness to accept
URCU use in glibc is increasing my interest in contributing
to glibc to fix malloc problems for all GNU users :>

Ah, great to know! I have no objection then.

One question: is it possible to cancel the effect of M_ARENA_MAX ? Given mame (Yusuke Endoh)'s corner case, it might be desirable for a user (or sysadmin) to be able to choose the behaviour between the proposed one and the status quo.


An environment variable (MALLOC_ARENA_MAX) at startup (only),
or they can use fiddle or a C extension to call mallopt
at any time once a program is running.
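The fiddle route might look like this on a glibc system (a hedged sketch; M_ARENA_MAX is -8 in glibc's <malloc.h>, and mallopt(3) returns 1 on success, but none of this is portable to other allocators):

```ruby
require 'fiddle'

M_ARENA_MAX = -8   # from glibc's <malloc.h>; not portable

# Resolve mallopt from the already-running process image.
mallopt = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['mallopt'],
  [Fiddle::TYPE_INT, Fiddle::TYPE_INT],
  Fiddle::TYPE_INT
)

ret = mallopt.call(M_ARENA_MAX, 2)   # cap glibc malloc at two arenas
puts "mallopt returned #{ret}"
```

Note that this only caps arenas created after the call; threads that have already been assigned an arena keep using it.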


Yes, the question is: what exactly is the MALLOC_ARENA_MAX value a user should specify to make malloc behave as it does in 2.5 now?

Ruby use generally falls into one of two categories: short-lived or very long-lived.

For short-lived Ruby scripts MALLOC_ARENA_MAX could, and maybe should, be left as is?

For long-lived Ruby processes MALLOC_ARENA_MAX absolutely should be reduced from the glibc default as noted in this request and #14718. Hopefully, in time glibc can be improved, but that may be many years away.

I wonder if a new runtime flag, --long-lived, should be added to the Ruby executable? That flag could tweak internals to favour long runtimes. For the moment it could tweak MALLOC_ARENA_MAX; in the future it could tweak other aspects to favour low memory fragmentation.

I suspect many folks would not like that, since it would not be the default behaviour. But it would avoid the weird regression noted above that concerns @shyouhei. Ideally we don't want to penalise short-lived Ruby users for the benefit of long-lived Ruby users.