Introduction

This page describes techniques for, and issues with, measuring Linux system memory accurately. This matters for embedded systems, which usually have limited memory and no swap space. It is currently (as of 2.4 and 2.6 kernels) very difficult to get an accurate count of used and free memory for the system. An accurate count could enable better error handling for out-of-memory conditions, or error avoidance for low-memory conditions, in CE products.

This page currently lists 3 systems which aid in getting an accurate memory measurement for the Linux kernel:

/proc/meminfo: underestimates the available room because it excludes pages that can be reclaimed by shrinking caches.

Therefore, we implemented a memory usage API to estimate the currently available memory more accurately.
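
To make the gap concrete, here is a minimal user-space sketch in C that reads a few fields from /proc/meminfo. It is not the memory usage API described on this page. The struct and function names are hypothetical, and treating MemFree + Buffers + Cached as an optimistic upper bound is an assumption, since not all cached pages can actually be reclaimed.

```c
#include <stdio.h>

/* Selected /proc/meminfo values in kB; -1 means the field was not found. */
struct meminfo_kb {
    long mem_total;
    long mem_free;
    long buffers;
    long cached;
};

/* Parse the fields of interest from a meminfo-style stream.
 * Returns 0 if at least MemFree was found, -1 otherwise. */
int parse_meminfo(FILE *fp, struct meminfo_kb *out)
{
    char line[256];
    long value;

    out->mem_total = out->mem_free = out->buffers = out->cached = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "MemTotal: %ld kB", &value) == 1)
            out->mem_total = value;
        else if (sscanf(line, "MemFree: %ld kB", &value) == 1)
            out->mem_free = value;
        else if (sscanf(line, "Buffers: %ld kB", &value) == 1)
            out->buffers = value;
        else if (sscanf(line, "Cached: %ld kB", &value) == 1)
            out->cached = value;
    }
    return (out->mem_free >= 0) ? 0 : -1;
}

/* Optimistic upper bound (assumption): free pages plus all buffers and
 * page cache, as if every cached page could be reclaimed. */
long optimistic_free_kb(const struct meminfo_kb *m)
{
    long total = m->mem_free;

    if (m->buffers > 0)
        total += m->buffers;
    if (m->cached > 0)
        total += m->cached;
    return total;
}
```

MemFree alone is the pessimistic figure and optimistic_free_kb() the optimistic one; the real amount of usable memory lies somewhere in between, which is the range a more exact estimate tries to narrow.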

Page 17 - Memory Usage API 3/4

Memory Usage API:

Estimates amount of page cache and slabs to be reclaimed by shrink in addition to free pages.

Execution time < 1 msec

Remaining issues:

Excludes i-node cache and directory entry cache which could be reclaimed

omitted for complexity and time consumption

Race condition with shrink_caches() may cause inaccurate result

Page 18 - Memory Usage API 4/4

Memory Usage API gives a fairly good estimate of memory remaining.

Description:

A process was run to constantly allocate memory, eventually exhausting the memory of the machine. While this was running, the memory usage API was called to determine the amount of free memory remaining in the machine. The machine had no other activity on it. The amount of memory used by the process and the amount of memory remaining should add up to the total memory on the machine. The diagram shows a pink line (B) indicating the amount of memory used by the test program, a blue line (A) indicating the return value from the memory usage API, and a yellow line (A+B) showing the addition of the two values. The yellow line fluctuates slightly due to some inaccuracies (a race condition with shrink_caches), but overall stays fairly constant.
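
A bounded version of such a test program could look like the following C sketch. It is not the original test program. Here sysconf(_SC_AVPHYS_PAGES) stands in for the memory usage API (an assumption; it reports free pages only), the names are hypothetical, and a byte limit is added so the sketch does not actually exhaust the machine.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One sample: bytes held by the test program (B) and estimated free bytes (A). */
struct sample {
    size_t used_bytes;
    long long free_bytes;
};

/* Stand-in for the memory usage API (assumption): free physical memory
 * as reported by sysconf(). Returns -1 on failure. */
long long estimate_free_bytes(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages < 0 || page_size <= 0)
        return -1;
    return (long long)pages * page_size;
}

/* Allocate and touch memory in `chunk`-byte steps, up to `limit` bytes,
 * recording one sample per step. All memory is freed before returning.
 * Returns the number of samples written to `out`. */
size_t run_alloc_test(size_t chunk, size_t limit,
                      struct sample *out, size_t max_samples)
{
    void **blocks = calloc(max_samples, sizeof *blocks);
    size_t n = 0, used = 0;

    if (blocks == NULL)
        return 0;
    while (n < max_samples && used + chunk <= limit) {
        void *p = malloc(chunk);
        if (p == NULL)
            break;
        memset(p, 0xA5, chunk);   /* touch every page so it is really backed */
        blocks[n] = p;
        used += chunk;
        out[n].used_bytes = used;
        out[n].free_bytes = estimate_free_bytes();
        n++;
    }
    for (size_t i = 0; i < n; i++)
        free(blocks[i]);
    free(blocks);
    return n;
}
```

Plotting used_bytes, free_bytes and their sum for each sample gives the same kind of three-line chart as described above; with a good estimator the sum should stay roughly constant.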

Description of algorithm

When the API is invoked:

Get the number of free pages using nr_free_pages()

Get the number of shrinkable page cache pages by inspecting the active and inactive page cache lists, and counting the pages that can be freed. The inspection logic is basically the same as shrink_cache(). The difference is whether the pages are actually freed or not.

Get the number of pages in slab free list.

Get the number of i-node cache and directory entry cache. To save time, we do not inspect the status of those caches in detail.

I think this implementation is not mature enough. For example, a race condition between kswapd and this API can introduce some error in the free page count.
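
The arithmetic behind these steps can be sketched in plain C as follows. This is only an illustration of how the counts combine, not the patch itself. The struct and function names are hypothetical, and because the i-node and directory entry caches are listed as a step above but as excluded under "Remaining issues", the sketch makes them optional.

```c
/* Page counts gathered by the steps above (all names are hypothetical). */
struct room_counts {
    unsigned long free_pages;               /* from nr_free_pages() */
    unsigned long reclaimable_cache_pages;  /* shrinkable page cache */
    unsigned long slab_free_pages;          /* pages on the slab free list */
    unsigned long inode_dentry_pages;       /* i-node and dentry caches */
};

/* Sum the components into an estimate of available pages. The i-node and
 * dentry caches are only added when include_inode_dentry is non-zero. */
unsigned long estimate_room_pages(const struct room_counts *c,
                                  int include_inode_dentry)
{
    unsigned long total = c->free_pages
                        + c->reclaimable_cache_pages
                        + c->slab_free_pages;

    if (include_inode_dentry)
        total += c->inode_dentry_pages;
    return total;
}

/* Convert a page count to kB; assumes the page size is a multiple of 1024. */
unsigned long pages_to_kb(unsigned long pages, unsigned long page_size_bytes)
{
    return pages * (page_size_bytes / 1024);
}
```

For example, 1000 free pages, 500 reclaimable page cache pages and 200 slab pages give an estimate of 1700 pages, or 6800 kB with 4 kB pages.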

Patch

Here's a patch which adds a new function to determine the "shrinkable" size of memory. This is against a 2.4.x kernel.

Kernel 2.6 status

Sony has ported this feature to 2.6.11; see the next section.

Sony detailed memory accounting

Watching user space program memory usage

The Linux kernel provides the ability to view certain pieces of information about system and per-process memory usage. However, the information currently provided is not detailed enough. The feature described here adds some extra memory instrumentation to the kernel, and reports more detailed information about process memory usage, via some new entries in the /proc filesystem.

The feature is described in detail in the specification below. In summary, however, the feature adds some global and some per-process entries in the /proc filesystem to provide detailed memory usage information. The following system-wide entries are added:

This function utilizes Memory Typed Allocation to handle different types of memory with NUMA-based technology. If you want to port this function to a vanilla 2.4/2.6 kernel, you should remove this dependency.

for Kernel 2.6

Show detailed page stat info, like PG_* flags; pages could be categorized as follows (need to check this categorization):

other type of shared page (need to show how many processes/threads share this)

non-shared page

active/inactive

dirty/clean

reserved/not

locked/not

pageout (not in-core)

cached/not cached

How about "/proc/<process id>/smaps"? It shows the categorized memory usage of each section of a process.
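
As an illustration of what smaps offers, the following C sketch sums a few per-mapping fields from an smaps-style stream. The field names (Rss, Shared_Clean, Shared_Dirty, Private_Clean, Private_Dirty) are the ones mainline kernels print; the struct and function names are hypothetical, and the exact set of fields depends on the kernel version.

```c
#include <stdio.h>

/* Totals over all mappings, in kB. */
struct smaps_totals_kb {
    long rss;
    long shared_clean;
    long shared_dirty;
    long private_clean;
    long private_dirty;
};

/* Sum selected fields over every mapping in an smaps-style stream.
 * Returns 0 if at least one known field was seen, -1 otherwise. */
int sum_smaps(FILE *fp, struct smaps_totals_kb *t)
{
    char line[512];
    long v;
    int matched = 0;

    t->rss = t->shared_clean = t->shared_dirty = 0;
    t->private_clean = t->private_dirty = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "Rss: %ld kB", &v) == 1)
            t->rss += v;
        else if (sscanf(line, "Shared_Clean: %ld kB", &v) == 1)
            t->shared_clean += v;
        else if (sscanf(line, "Shared_Dirty: %ld kB", &v) == 1)
            t->shared_dirty += v;
        else if (sscanf(line, "Private_Clean: %ld kB", &v) == 1)
            t->private_clean += v;
        else if (sscanf(line, "Private_Dirty: %ld kB", &v) == 1)
            t->private_dirty += v;
        else
            continue;   /* mapping header or a field we do not track */
        matched = 1;
    }
    return matched ? 0 : -1;
}
```

Passing a stream opened on /proc/self/smaps (or /proc/<process id>/smaps) gives per-process totals that already cover the shared/non-shared and dirty/clean categories from the list above.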

Kernel 2.6 status

Sony has ported the above features and Panasonic's "accurate memory counting API" mentioned above to kernel 2.6.11. We replaced the new system call introduced by the original 2.4 patch from Panasonic with a new /proc interface, "/proc/freemem", for better acceptance.

Nokia out-of-memory notifier module

Description

The issue of low memory notification prior to OOM killing was raised at a previous AG meeting. Nokia pointed out that they had an LSM module for this and would see about getting the source available for it. This module was part of the kernel source for their 770 internet tablet. The code is implemented as an LSM module. Below is security/lowmem.c from the 770 kernel source tree (2.6.12.3):

(Code was originally obtained from here. There is a .deb file, which I de-archived with 'ar -x'; I then un-tarred data.tar.gz, un-tarred kernel-source-2.6.12.3.tar.bz2, and copied the file security/lowmem.c.)

The heart of the measurement feature of this module is in the low_vm_enough_memory() routine, about midway through the source:

lowmem patch

kpagemap

Matt Mackall mainlined a new "kpagemap" system in kernel version 2.6.25.

This system provides detailed information about all pages used by processes on a system.

See the file Documentation/vm/pagemap.txt in the kernel source tree to learn about the /proc interfaces used to obtain information from this system.

Matt gave a presentation on this system (before it was merged?) at Embedded Linux Conference 2007. See Matt's presentation for details.
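
As a small illustration of the per-process interface, the C sketch below reads one entry from /proc/self/pagemap and decodes two of its flag bits. It assumes the layout given in Documentation/vm/pagemap.txt: one 64-bit entry per virtual page, with bit 63 meaning "page present" and bit 62 meaning "page swapped". The function names are hypothetical, and whether the file is readable depends on the kernel version and the process's privileges.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Decode flag bits of a pagemap entry (layout assumed from pagemap.txt). */
int pagemap_entry_present(uint64_t entry)
{
    return (int)((entry >> 63) & 1);   /* bit 63: page present in RAM */
}

int pagemap_entry_swapped(uint64_t entry)
{
    return (int)((entry >> 62) & 1);   /* bit 62: page is in swap */
}

/* Read the pagemap entry covering `addr` in the current process.
 * Returns 0 on success, -1 on any failure. */
int read_pagemap_entry(const void *addr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t got;
    int fd;

    if (page_size <= 0)
        return -1;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size)
             * (off_t)sizeof(uint64_t);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    got = read(fd, entry, sizeof *entry);
    close(fd);
    return (got == (ssize_t)sizeof *entry) ? 0 : -1;
}
```

Walking every page of every mapping this way, and counting the present ones, gives a resident-page count for a process without relying on the summary fields in other /proc files.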

Kernelnewbies question about measuring memory

Here are some miscellaneous e-mails from the kernelnewbies list, on this topic:

>I know that some part of memory is free, but they are used in caches
>> to optimise the performance when the system needs to allocate more
>> memory. And, dentry caches and disk buffer_head are used to minimise
>> disk access. So, given the current mem info from "cat /proc/meminfo",
>> how should I calculate how much memory is really free currently in the
>> system?
>>
>
>>> > cat /proc/meminfo
>
>> MemTotal: 1017848 kB
>> MemFree: 10380 kB
>> Buffers: 37480 kB
>> Cached: 149868 kB
>>
>> Can I just assume that 70% of un-used memory (un-used==mem_total -
>> buffers - cached) is free, without actually causing the system to
>> swapping?
is this what you are looking for ?
you may use the _SC_AVPHYS_PAGES name with sysconf
#include <unistd.h>
eg : long ret = sysconf(_SC_AVPHYS_PAGES);
alternatively
#include <sys/sysinfo.h>
long get_avphys_pages(void);
man sysconf for further reading
also, check /proc/slabinfo
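
The two calls suggested in this reply can be tried with a sketch like the one below. The wrapper names are hypothetical. Both calls are glibc interfaces and, as far as I know, both count only pages that are currently free, so they are closer to MemFree than to an estimate that includes reclaimable caches.

```c
#include <sys/sysinfo.h>
#include <unistd.h>

/* Free physical memory in kB via sysconf(); returns -1 on failure. */
long long avphys_kb_sysconf(void)
{
    long pages = sysconf(_SC_AVPHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages < 0 || page_size <= 0)
        return -1;
    return (long long)pages * page_size / 1024;
}

/* Same quantity via the glibc helper get_avphys_pages(). */
long long avphys_kb_helper(void)
{
    long pages = get_avphys_pages();
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages < 0 || page_size <= 0)
        return -1;
    return (long long)pages * page_size / 1024;
}
```

Comparing either value against MemFree, Buffers and Cached from /proc/meminfo on the same machine shows how much of the "used" memory is cache that the kernel could give back under pressure.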