Summary

Find tools to profile memory use in Python programs, document them, and if necessary, tweak them.

Release Note

This spec has no direct impact on end users.

Rationale

A relatively large part of the software in an Ubuntu system is written in Python, and the memory requirements of Ubuntu are growing, so tools to profile memory use in Python programs are needed. Since many of the programs that use the most memory have a graphical user interface, the tools need to work with programs that use PyGTK.

Use Cases

Crabtree is a Python hacker, and wants to know why deskbar-applet takes up so much memory.

Overview

Memory is used in many ways on a Linux system. The kernel allocates memory by page, and pages are collected into areas. Pages can be filled with data loaded from files, and such pages may be read-only or read-write. Writeable pages can be "clean", i.e., identical to the data in the file. Pages may also be in RAM or in swap. Read-only pages may be shared between processes. There are other complications as well. Because of all this, it is not enough to just look at how much memory is allocated to a process to determine its memory cost.

Memory profiling tools need to look at each area to see whether it is read-only or writeable, and if writeable, whether it is clean (same as on disk) or dirty (modified). A clean page can be freed by the kernel immediately, whereas a dirty writeable page must first be written out (to swap or to its backing file). The dirty page therefore has a bigger memory cost. Indeed, when minimizing memory requirements, it is not a bad idea to concentrate on minimizing the number of dirty pages in a process. Clean pages can be freed and demand-paged back in as necessary, and shared with the disk block cache. (This is highly simplistic, but good enough for a first approximation, at least.)
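As a rough illustration of the clean/dirty distinction, the sketch below sums the per-area counters that the Linux kernel exposes in /proc/<pid>/smaps. It assumes a kernel that provides that file with the Shared_Clean, Shared_Dirty, Private_Clean and Private_Dirty fields; the function names are made up for this example and are not taken from any tool named in this spec.

```python
import os

# Per-area counters reported (in kB) by /proc/<pid>/smaps on Linux.
FIELDS = ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def summarize_smaps(text):
    """Sum the clean/dirty counters (in kB) over all areas in smaps text."""
    totals = {name: 0 for name in FIELDS}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        # Area header lines and other counters do not match FIELDS and are skipped.
        if sep and key in totals:
            totals[key] += int(rest.split()[0])
    return totals


def summarize_process(pid="self"):
    """Read and summarize the smaps file of one process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return summarize_smaps(f.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/smaps"):
        for name, kb in summarize_process().items():
            print(f"{name:14s} {kb:8d} kB")
```

The Private_Dirty total is the closest of the four to "memory this process alone is responsible for", which is why it is a reasonable first number to watch.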

Python memory use

Looking at the number of dirty pages used by a Python program (or rather, the Python interpreter while running a Python program) does not help much when reducing memory requirements. There need to be tools specific to Python to profile how the memory is used: what objects exist, how much memory they use, how many there are, which part of the code created them, and so on.
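To make the questions above concrete, here is a minimal sketch that uses only the standard library. It is an illustration, not one of the tools this spec evaluates, and it assumes a Python 3 interpreter that ships the tracemalloc module. The helper names are hypothetical. Note that gc.get_objects() only lists objects tracked by the garbage collector (containers and class instances, not plain integers or strings), and sys.getsizeof() reports shallow sizes only.

```python
import gc
import sys
import tracemalloc
from collections import Counter


def count_objects_by_type():
    """Count gc-tracked objects per type name and sum their shallow sizes."""
    counts = Counter()
    sizes = Counter()
    for obj in gc.get_objects():
        name = type(obj).__name__
        counts[name] += 1
        sizes[name] += sys.getsizeof(obj)
    return counts, sizes


def top_allocation_sites(workload, limit=5):
    """Run workload() under tracemalloc and return the largest allocation sites."""
    tracemalloc.start()
    try:
        result = workload()  # keep the result alive until the snapshot is taken
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    # Statistics are grouped by source line and sorted with the biggest first.
    return snapshot.statistics("lineno")[:limit]


if __name__ == "__main__":
    counts, sizes = count_objects_by_type()
    for name, n in counts.most_common(5):
        print(f"{name:20s} {n:8d} objects {sizes[name]:10d} bytes")
    for stat in top_allocation_sites(lambda: [bytearray(1000) for _ in range(100)]):
        print(stat)
```

The first function answers "what objects exist and how many"; the second answers "which part of the code created them".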

Because Python manages memory and has its own garbage collector, the memory profiling tool should also be able to tell how well that works: if there is a lot of garbage in Python's memory heap, and the garbage collector is not called to free it, then things are bad.
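The effect described above can be observed with the standard gc module. The sketch below is a hypothetical experiment, not a profiling tool: it disables the cyclic collector, creates reference cycles that reference counting alone cannot free, and then asks gc.collect() how many unreachable objects it found. The Node class and the function names are invented for the example.

```python
import gc


class Node:
    """Tiny object that can point at another Node to form a cycle."""

    def __init__(self):
        self.other = None


def make_cycles(n):
    """Create n two-object reference cycles and drop all outside references."""
    for _ in range(n):
        a, b = Node(), Node()
        a.other, b.other = b, a


def measure_uncollected_garbage(n=1000):
    """Return how many unreachable objects pile up while the collector is off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        gc.collect()  # start from a clean state
        make_cycles(n)
        # gc.collect() returns the number of unreachable objects it found.
        unreachable = gc.collect()
    finally:
        if was_enabled:
            gc.enable()
    return unreachable


if __name__ == "__main__":
    print("unreachable objects found:", measure_uncollected_garbage())
```

The exact count depends on the interpreter version, since some versions also track each instance's attribute dictionary as a separate object, but it is at least two objects per cycle.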

System-level tools

top, htop: list processes according to CPU or memory usage, or other criteria