Issue #3329 proposes an API to replace the memory allocator functions. But Python calls malloc(), realloc() and free() directly in some functions, so custom allocators would not be used there.
Examples of functions calling malloc/realloc/free directly: _PySequence_BytesToCharpArray(), block_new() (of pyarena.c), find_key() (of thread.c), PyInterpreterState_New(), win32_wchdir(), posix_getcwd(), Py_Main(), etc.
We have to be careful with the GIL: PyMem_*() functions can only be called when holding the GIL.

> Be aware about external code which allocate memory itself (i.e. expat).
Well, I know that it will be hard to cover 100% of the stdlib. I just want to replace the most obvious calls.
Some libraries can be configured to use a custom memory allocator:
- zlib: zalloc and zfree, http://www.zlib.net/manual.html#Usage
- bz2
- lzma: LzmaEnc_Create parameter
- OpenSSL: CRYPTO_set_mem_functions
- etc.
We should probably use these functions to reuse Python allocators (PyMem_Malloc()), or maybe only if PyMem_SetAllocators() has been called (that is, if Python does not use the system allocator, malloc)?
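For zlib, the idea could look roughly like the sketch below. It is only an illustration: demo_malloc() and demo_free() are hypothetical stand-ins for PyMem_Malloc() and PyMem_Free() so that the code compiles without Python or zlib headers, and the callbacks use plain C types where zlib uses its voidpf and uInt typedefs. In a real module the two adapters would be assigned to the zalloc and zfree fields of the z_stream before initializing it, and whether they may run without the GIL would still have to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyMem_Malloc()/PyMem_Free(). They count live
   blocks so that routing through the adapters can be observed. */
static long demo_live_blocks = 0;

static void *demo_malloc(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p != NULL)
        demo_live_blocks++;
    return p;
}

static void demo_free(void *p)
{
    if (p != NULL) {
        assert(demo_live_blocks > 0);
        demo_live_blocks--;
        free(p);
    }
}

/* Same shape as zlib's alloc_func: allocate items * size bytes. */
static void *adapter_zalloc(void *opaque, unsigned int items, unsigned int size)
{
    (void)opaque;  /* unused context pointer */
    /* Refuse requests whose byte count would overflow size_t. */
    if (size != 0 && items > SIZE_MAX / size)
        return NULL;
    return demo_malloc((size_t)items * (size_t)size);
}

/* Same shape as zlib's free_func. */
static void adapter_zfree(void *opaque, void *ptr)
{
    (void)opaque;
    demo_free(ptr);
}
```

The opaque pointer is ignored here; it could carry per-stream state if the allocator needed any.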
See also #18178 for libffi, it may be related.
The _decimal module configures libmpdec to use PyMem_Malloc:
#define _Py_DEC_MINALLOC 4
mpd_mallocfunc = PyMem_Malloc;
mpd_reallocfunc = PyMem_Realloc;
mpd_callocfunc = mpd_callocfunc_em;
mpd_free = PyMem_Free;
mpd_setminalloc(_Py_DEC_MINALLOC);
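As far as I can tell, mpd_callocfunc_em is libmpdec's calloc emulation built on top of the configured malloc function, which is needed here because there is no PyMem calloc variant to plug in. The sketch below shows the general technique only, not libmpdec's source: alloc_fn and calloc_emulated() are hypothetical names, and alloc_fn points to malloc() where _decimal would use PyMem_Malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook: plain malloc here, PyMem_Malloc in the real setup. */
static void *(*alloc_fn)(size_t) = malloc;

/* Emulate calloc(nmemb, size) on top of a malloc-only allocator. */
static void *calloc_emulated(size_t nmemb, size_t size)
{
    void *p;
    size_t total;

    assert(alloc_fn != NULL);

    /* Reject element counts whose total byte size overflows size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    total = nmemb * size;

    /* Ask for at least one byte so a zero-sized request still succeeds. */
    p = alloc_fn(total ? total : 1);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}
```

The overflow check is the part that is easy to forget: without it, a huge nmemb could wrap around and return a block that is much smaller than the caller expects.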

> We have to be careful with the GIL: PyMem_*() functions can only be
> called when holding the GIL.
> Some libraries can be configured to use a custom memory allocators:
> [...]
> We should probably uses these functions to reuse Python allocators
> (PyMem_Malloc())
I think there's a potential problem here :)

>> We have to be careful with the GIL: PyMem_*() functions can only be
>> called when holding the GIL.
> (...)
> I think there's a potential problem here :)
I didn't understand the motivation for requiring the GIL to be held for PyMem_Malloc(). I searched in the source code history and on the Internet (archives of python-dev). In my opinion, the restriction is motivated by a bug: PyMem_Malloc() calls (indirectly) PyObject_Malloc() in debug mode, and PyObject_Malloc() is not thread-safe.
I opened a thread on python-dev to discuss this point.

Keeping the GIL requirement is _very_ useful for PyMem_MALLOC et al. It allows applications to hook in their own monitoring code, accessible from python itself, without having to worry about conflicts with python.
Even if it were not for the GIL itself, PyMem_Malloc() may have all sorts of side effects.
Because of this, and to allow ourselves the flexibility to do all sorts of things inside PyMem_Malloc(), at CCP we added a parallel API, PyMem_MALLOC_RAW() etc.
This API is guaranteed to delegate directly to the external allocator (malloc by default, or an embedding application's supplied allocator).
We have patched pythoncore in 2.7 in all places that were using malloc directly, using the file attached to the defect. Notice how it can patch "malloc" in two different ways: with the regular malloc (in non-sensitive areas) or with the raw malloc (in sensitive areas).
e.g. thread.c contains the following lines in our branch:
#include "Python.h"
/* patch malloc/free with threadsafe python versions */
#define CCPMEM_PATCH_RAW
#include "ccpmem_patch.h"

I committed my new API to customize memory allocators:
New changeset 6661a8154eb3 by Victor Stinner in branch 'default':
Issue #3329: Add new APIs to customize memory allocators
http://hg.python.org/cpython/rev/6661a8154eb3
I added PyMem_RawMalloc(), PyMem_RawRealloc() and PyMem_RawFree() in the same commit. These functions are wrappers around malloc/realloc/free which can be called without the GIL held. Using these new functions instead of malloc/realloc/free is useful because the underlying functions can be replaced with PyMem_SetRawAllocators(), and many checks are added in debug mode (e.g. checks for buffer under- and overflow).
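To illustrate the two ideas (replaceable raw functions and debug checks for under- and overflow), here is a minimal standalone sketch. It is not the code from the commit: raw_allocator, set_raw_allocator(), debug_malloc(), debug_check() and debug_free() are hypothetical names, and the guard layout is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of replaceable raw allocator functions. */
typedef struct {
    void *(*malloc_fn)(size_t size);
    void (*free_fn)(void *ptr);
} raw_allocator;

static raw_allocator current_raw = { malloc, free };

/* Replace the functions used by raw_malloc() and raw_free(). */
static void set_raw_allocator(const raw_allocator *alloc)
{
    assert(alloc != NULL && alloc->malloc_fn != NULL && alloc->free_fn != NULL);
    current_raw = *alloc;
}

static void *raw_malloc(size_t size)
{
    return current_raw.malloc_fn(size ? size : 1);
}

static void raw_free(void *ptr)
{
    if (ptr != NULL)
        current_raw.free_fn(ptr);
}

/* Debug layout: [size field + leading guard bytes][user data][trailing guard] */
#define GUARD_BYTE  0xFB
#define HEADER_LEN  16  /* size field followed by leading guard bytes */
#define TRAILER_LEN 8

static void *debug_malloc(size_t size)
{
    unsigned char *base;

    if (size > SIZE_MAX - HEADER_LEN - TRAILER_LEN)
        return NULL;
    base = raw_malloc(HEADER_LEN + size + TRAILER_LEN);
    if (base == NULL)
        return NULL;
    memcpy(base, &size, sizeof(size));
    memset(base + sizeof(size), GUARD_BYTE, HEADER_LEN - sizeof(size));
    memset(base + HEADER_LEN + size, GUARD_BYTE, TRAILER_LEN);
    return base + HEADER_LEN;
}

/* Return 1 if both guard areas are intact, 0 after an under- or overflow. */
static int debug_check(const void *ptr)
{
    const unsigned char *base = (const unsigned char *)ptr - HEADER_LEN;
    size_t size, i;

    memcpy(&size, base, sizeof(size));
    for (i = sizeof(size); i < HEADER_LEN; i++) {
        if (base[i] != GUARD_BYTE)
            return 0;
    }
    for (i = 0; i < TRAILER_LEN; i++) {
        if (base[HEADER_LEN + size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

static void debug_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(debug_check(ptr) && "buffer under- or overflow detected");
    raw_free((unsigned char *)ptr - HEADER_LEN);
}
```

Because the debug layer only calls raw_malloc() and raw_free(), it keeps working unchanged when the underlying functions are swapped with set_raw_allocator().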

py_finalize.patch: modify Py_Finalize() to destroy the GIL after the last call to PyMem_Malloc() or PyObject_Malloc(). For example,
PyCFunction_Fini() calls PyObject_GC_Del() which calls PyObject_FREE().

malloc_init.patch: Patch for functions called at Python initialization.
This patch is not complete: to parse the "-X" option (PySys_AddXOption) and "-W" (PySys_AddWarnOption), PyMem_Malloc() and PyObject_Malloc() are still called indirectly. Fixing this issue may require completely reorganizing how Python is initialized, because we need basic Python types (e.g. Unicode) to be ready in order to parse the command line. But we cannot initialize too much, because the Python initialization also depends on options from the command line...

Ok, initial patches are attached. Let me describe them a little bit.
- pymem_debugcheckgil-2.patch: I don't think that this patch can be committed before the "bootstrap" issue is solved (Python doesn't start in debug mode with this patch when the -W or -X command line option is used). I wrote it to check that the other patches are correct (it checks that the GIL is held when PyMem_Malloc is called).
- py_finalize.patch: should be safe
- malloc_init.patch: changes the _Py_char2wchar() API: the result must now be freed with PyMem_RawFree() instead of PyMem_Free(). The change does not hurt in release mode (both functions are just wrappers around free()), but it may break extension modules using _Py_char2wchar() in debug mode (because Python checks that PyMem_RawFree() is called on a buffer allocated by PyMem_RawMalloc()). I would like to make the _Py_char2wchar() API public (not to solve this issue, just because it is very useful for applications embedding Python), so changing its name would avoid a crash (applications would get a compilation or link error instead). Other change in the patch: free() and PyMem_Free() are replaced with PyMem_RawFree(), which should be safe.
- malloc_modules.patch: I replaced many malloc() with PyMem_Malloc(), which is not safe.
All these patches must be reviewed carefully to check whether the GIL is held or not, and tested with pymem_debugcheckgil-2.patch (on Windows too! Some patched functions of posixmodule.c are only compiled on Windows).

Note that CPython's main function accesses the C API before calling Py_Initialize(). This is insane, but fixing it is hard (and one of the main goals of PEP 432).
I suggest using Py_IsInitialized() to exclude those from your debug checks for now.
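A minimal sketch of that suggestion, with stand-ins so that it compiles without Python.h: demo_is_initialized() plays the role of Py_IsInitialized(), and demo_gil_is_held() is a hypothetical "does this thread hold the GIL?" query. Both names and the two flags are assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical state standing in for the interpreter. */
static int demo_initialized = 0;
static int demo_gil_held = 0;

static int demo_is_initialized(void) { return demo_initialized; }
static int demo_gil_is_held(void) { return demo_gil_held; }

/* Return 1 if an allocation call is acceptable to the debug check. */
static int debug_gil_check_ok(void)
{
    /* Before initialization there is no GIL to hold: skip the check. */
    if (!demo_is_initialized())
        return 1;
    return demo_gil_is_held();
}

/* What a debug allocator hook could call on entry. */
static void debug_hook_enter(void)
{
    assert(debug_gil_check_ok() && "allocator called without the GIL");
}
```

With this gate, calls made by the main function before initialization pass the check, while calls made afterwards without the GIL are still caught.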