On 23.05.2011 06:59, "Martin v. Löwis" wrote:
> My expectation is that your approach would likely make the issues
> worse in a multi-CPU setting. If you put multiple reference counters
> into a contiguous block of memory, unrelated reference counters will
> live in the same cache line. Consequentially, changing one reference
> counter on one CPU will invalidate the cached reference counters of
> that cache line on other CPU, making your problem a) actually worse.
In a multi-threaded setting with concurrent threads accessing reference
counts, this would certainly worsen the situation.
In a single-threaded setting, this will likely be an improvement.
CPython, however, has a GIL. Thus only one thread at a time is active
with access to reference counts. On a thread switch in the
interpreter, I think the performance result will depend on the nature of
the Python code: If threads share a lot of objects, it could help to
reduce the number of dirty cache lines. If threads mainly work on
private objects, it would likely have the effect you predict. Which will
dominate is hard to tell.
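To make the cache-line argument concrete, here is a small sketch of which
packed reference counters would end up in the same cache line. The 64-byte
line size and 8-byte counter size are assumptions for illustration only, and
the names cache_line_of and shares_line are hypothetical helpers, not part of
CPython.

```python
# Assumed sizes for illustration; real hardware and builds may differ.
CACHE_LINE_BYTES = 64   # assumed cache line size
REFCOUNT_BYTES = 8      # assumed size of one reference counter


def cache_line_of(counter_index, base_address=0):
    """Return the cache line number holding a counter in a packed array."""
    address = base_address + counter_index * REFCOUNT_BYTES
    return address // CACHE_LINE_BYTES


def shares_line(i, j):
    """True if counters i and j live in the same cache line."""
    return cache_line_of(i) == cache_line_of(j)


# Number of unrelated counters that one write can invalidate together.
counters_per_line = CACHE_LINE_BYTES // REFCOUNT_BYTES
```

Under these assumptions, a write to one counter touches the line that holds
several neighbouring counters, whether or not the objects are related.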
Instead, we could use multiple heaps:
Each Python thread could manage its own heap for malloc and free (cf.
HeapAlloc and HeapFree in Windows). Objects local to one thread only
reside in the locally managed heap.
When an object becomes shared by several Python threads, it is moved
from a local heap to the global heap of the process. Some objects, such
as modules, would be stored directly on the global heap.
This way, objects used by only one thread would never dirty cache
lines used by other threads.
This would also be a way to reduce the CPython dependency on the GIL.
Only the global heap would need to be protected by the GIL, whereas the
local heaps would not need any global synchronization.
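The bookkeeping for the idea above could be sketched roughly as follows. This
is a toy model that only tracks which heap an object belongs to; it does not
allocate memory. The class HeapModel and its methods are hypothetical, a plain
lock stands in for the GIL, and the sketch ignores how the owning thread would
be told that one of its objects has moved.

```python
import threading


class HeapModel:
    """Toy model of per-thread heaps plus one global heap (hypothetical)."""

    def __init__(self):
        self.global_heap = set()             # ids of shared objects
        self.local_heaps = {}                # thread id -> set of object ids
        self.global_lock = threading.Lock()  # stands in for the GIL

    def allocate(self, obj_id, thread_id, shared=False):
        """Place a new object on the caller's local heap or the global heap."""
        if shared:
            # Objects such as modules go directly on the global heap.
            with self.global_lock:
                self.global_heap.add(obj_id)
        else:
            self.local_heaps.setdefault(thread_id, set()).add(obj_id)

    def access(self, obj_id, thread_id):
        """Report where the object lives, promoting it if it becomes shared."""
        if obj_id in self.local_heaps.get(thread_id, ()):
            return "local"   # private object, no global lock taken
        with self.global_lock:
            if obj_id in self.global_heap:
                return "global"
            # Owned by another thread: move it to the global heap.
            for heap in self.local_heaps.values():
                if obj_id in heap:
                    heap.remove(obj_id)
                    self.global_heap.add(obj_id)
                    return "promoted"
        raise KeyError(obj_id)
```

In this model only the second thread's first access pays for the promotion;
every later access from any thread goes through the global heap and its lock.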
(I am setting follow-up to the Python Ideas list, as this does not belong
on Python dev.)
Sturla Molden