
ELLCC uses the very cool musl standard C library as a replacement for the normal Linux standard library, glibc. The latest version of ELLCC was not able to compile itself on an x86_64 Fedora 20 Linux system, and I was stumped for a while trying to track down the problem. It was weird: self-hosting worked on a 32-bit Linux system (Fedora 19) but failed on a 64-bit one. Furthermore, self-hosting failed only when ELLCC was compiled with itself and linked with musl, not when it was compiled with itself and linked with glibc.

04:34:00 PM - rdp: OK. malloc fails on my x86_64 linux after about 65527 4K allocations with musl malloc(). glibc malloc() doesn't, probably because it reverts to mmap() if brk fails. Yet I don't see any resource limits set. The glibc brk() also fails after about 64K allocations.
04:37:52 PM - dalias: rdp, oh, we've seen this before
04:37:57 PM - dalias: it's a kernel bug with some optional kernel feature
04:38:21 PM - dalias: it keeps the kernel from merging adjacent vma's, so you end up with 64k pages each as their own tiny vma
04:38:36 PM - rdp: Excellent.
04:38:38 PM - dalias: it would happen if we used mmap too
04:38:50 PM - dalias: the reason it doesn't affect glibc is that they allocate huge amounts at a time
04:38:58 PM - rdp: Ah.
04:38:59 PM - dalias: and thereby waste memory if the program doesn't actually need much
04:39:17 PM - dalias: i'll try to find the option
04:39:21 PM - rdp: Any work around?
04:39:25 PM - rdp: OK. Thanks.
04:40:20 PM - dalias: CONFIG_MEM_SOFT_DIRTY
04:40:23 PM - dalias: turn it off
04:40:28 PM - dalias: there might be a way to do it at runtime
04:40:42 PM - dalias: or you could increase the limit on # of vma's
04:40:50 PM - dalias: but basically this option wastes MASSIVE amounts of ram
04:40:57 PM - dalias: by refusing to merge vma's
04:41:47 PM - dalias: it's a hack to make process checkpointing (save and restore running processes) more efficient
04:42:01 PM - dalias: by better tracking what has changed
04:42:50 PM - dalias: i don't see a way to turn it off
04:42:55 PM - dalias: check /proc/$pid/maps tho
04:43:08 PM - dalias: you should see a separate line for each page (i.e. 64k lines)
04:43:16 PM - dalias: if this is the issue that's affecting you
04:43:39 PM - rdp: I do.
04:43:51 PM - dalias: ok then this is the issue
04:43:57 PM - dalias: you can just up the limit if you want
04:44:04 PM - dalias: /proc/sys/vm/max_map_count
04:44:09 PM - dalias: but again this is expensive
04:44:16 PM - dalias: you want to disable CONFIG_MEM_SOFT_DIRTY
04:44:21 PM - dalias: and we really need to report this bug to the kernel folks
04:44:25 PM - dalias: i don't think they're aware of it
...
04:45:11 PM - rdp: dalias: Thanks.
...
04:46:47 PM - rdp: dalias: is it x86_64 specific? Not on i386?
...
04:49:13 PM - dalias: rdp, i think it may be
04:50:26 PM - dalias: http://stackoverflow.com/questions/20997809/analyzing-cause-of-performance-regression-with-different-kernel-version
04:50:28 PM - feepbot: Analyzing cause of performance regression with different kernel version - Stack Overflow
04:51:38 PM - dalias: the accepted answer tracked down the cause of the soft_dirty bug and seems to cover how to fix it
...
04:53:02 PM - rdp: gotta love stackoverflow
...
05:47:19 PM - dalias: rdp, haha with regard to that SO answer:
05:47:26 PM - dalias: Finally fixed in Linux 3.13.3 and Linux 3.12.11, released 2014-02-13. – osgx 21 hours ago
05:57:32 PM - rdp: dalias: :-)
...
07:00:00 PM - dalias: rdp, i think it would be worth adding the issue you had to the faq on the wiki
07:00:51 PM - dalias: with a link to the stack overflow question/answer and information that it's fixed in 3.13.3, and that you can work around it by turning off CONFIG_MEM_SOFT_DIRTY (good fix) or increasing max_map_count (expensive fix)
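
The checks Rich describes in the chat (one line per page in /proc/$pid/maps, the limit in /proc/sys/vm/max_map_count, and whether CONFIG_MEM_SOFT_DIRTY is enabled) can be sketched in shell. This is only a rough sketch: the function names are hypothetical helpers of my own, the optional file arguments exist so the functions can be pointed at something other than the live files, and the /boot/config-<release> location is an assumption that holds on Fedora but not on every distribution. The default limit on most kernels is 65530, which would line up with the roughly 64K allocations reported above.

```shell
#!/bin/sh
# Hypothetical helpers; the optional file arguments default to the live files.

# Count the mappings (VMAs) of process $1 (default: this shell).
# The maps file has one line per mapping; $2 optionally overrides the file.
count_maps() {
    wc -l < "${2:-/proc/${1:-$$}/maps}" | tr -d ' '
}

# Print the per-process limit on the number of mappings.
map_limit() {
    cat "${1:-/proc/sys/vm/max_map_count}"
}

# Succeed if the kernel config file lists CONFIG_MEM_SOFT_DIRTY=y.
# Assumes the config is installed as /boot/config-<release>;
# pass another path as $1 otherwise.
soft_dirty_enabled() {
    grep -q '^CONFIG_MEM_SOFT_DIRTY=y' "${1:-/boot/config-$(uname -r)}"
}
```

For a process with PID 1234 (a made-up PID), a `count_maps 1234` result close to the value printed by `map_limit`, together with `soft_dirty_enabled` succeeding, would point at this issue.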

For now, I got around the problem by using Rich’s (dalias’s) expensive fix, run as superuser:

echo 128000 > /proc/sys/vm/max_map_count
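
A slightly more careful variant of that command, as a sketch: it raises the limit only when the current value is lower, so it never shrinks a limit that is already larger. The function name `raise_map_count` is hypothetical, and the optional second argument is there only so the function can be tried against an ordinary file instead of the live /proc entry.

```shell
#!/bin/sh
# Raise the mapping limit to $1, but never lower an already larger value.
# $2 optionally overrides the file to write; the default is the live /proc
# entry, which requires root.
raise_map_count() {
    target="$1"
    file="${2:-/proc/sys/vm/max_map_count}"
    current=$(cat "$file")
    if [ "$current" -lt "$target" ]; then
        echo "$target" > "$file"
    fi
    # Print the value now in effect.
    cat "$file"
}
```

Run as root, `raise_map_count 128000` has the same effect as the echo above when the limit is lower than 128000. Either way the change lasts only until the next reboot; `sysctl -w vm.max_map_count=128000` is the equivalent sysctl form, and a `vm.max_map_count = 128000` line in /etc/sysctl.conf is the usual way to make it persistent.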

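Why didn’t ELLCC linked with glibc fail? My guess, as in the chat, was that glibc’s malloc() falls back to mmap() when brk() fails. Somebody considered that behavior a bug at one point, but the glibc maintainers disagreed, I guess.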
[Update]
Rich pointed out that my guess about why the glibc malloc() doesn’t fail is probably wrong; as he said in the chat, glibc is unaffected because it allocates huge amounts at a time. Either way, it is still a kernel bug.