I understand that creating too many threads in an application isn't what you might call being a "good neighbour" to other running processes, since CPU and memory resources are consumed even when those threads are in an efficient sleeping state.

What I'm interested in is this: how much memory (on the Win32 platform) does a sleeping thread consume?

Theoretically, I'd assume somewhere in the region of 1 MB (since this is the default stack size), but I'm pretty sure it's less than that; I'm just not sure why.

Any help on this will be appreciated.

(The reason I'm asking is that I'm considering introducing a thread pool, and I'd like to understand how much memory I could save by creating a pool of 5 threads rather than 20 manually created threads.)

Quick potential correction: as I understand it, 1 MB is the default reserved stack size. Any single stack frame is likely to be much, much smaller than that.
–
Jon Skeet Nov 1 '08 at 22:21

Thanks for that - not sure what I was thinking when I wrote that!
–
Alan Nov 1 '08 at 23:13

Your premise is false. Creating too many threads is considered being a bad neighbor because it creates lots of extra context switches and cache pollution that make the whole system slow. Unless you're talking about ridiculous numbers of threads (tens of thousands) the memory consumption is negligible on modern systems.
–
David Schwartz Apr 30 '12 at 4:34

6 Answers

I have a server application that is heavy in thread usage. It uses a configurable thread pool that is set up by the customer, and at least one site runs it with 1000+ threads; at startup it uses only about 50 MB. The reason is that Windows reserves 1 MB of address space for each thread's stack, but that memory is not necessarily committed to physical memory; only a smaller part of it is. If the stack grows beyond that, a page fault is generated and more physical memory is committed.

I don't know what the initial commit is, but I would assume it's equal to the allocation granularity of the system (usually 64 KB). Of course, a thread also uses a little more memory for other things when created (TLS, TSS, etc.), but my guess for the total would be about 200 KB. And bear in mind that any memory that is not frequently used will be paged out by the virtual memory manager.
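To see why 1000+ threads can run in ~50 MB, here's a back-of-envelope sketch of reserved address space versus initially committed memory. The per-thread figures (1 MB reserved, ~64 KB initially committed) are the assumptions from this answer, not measured values:

```python
# Back-of-envelope: reserved address space vs. initially committed memory
# for N threads. Per-thread figures are assumptions, not measurements.
RESERVED_KB_PER_THREAD = 1024   # default Win32 stack reservation (1 MB)
COMMITTED_KB_PER_THREAD = 64    # assumed initial commit per thread

def stack_footprint_mb(threads):
    """Return (reserved_mb, committed_mb) for the given thread count."""
    reserved = threads * RESERVED_KB_PER_THREAD / 1024
    committed = threads * COMMITTED_KB_PER_THREAD / 1024
    return reserved, committed

reserved, committed = stack_footprint_mb(1000)
print(f"1000 threads: {reserved:.0f} MB reserved, ~{committed:.1f} MB committed")
```

Note that the reserved figure is address space, not physical memory, which is why the observed working set (~50 MB) is so much smaller than 1000 × 1 MB.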

The page size for x86 and x64 is 4 KB; for IA-64 it is typically 8 KB, but it is configurable.
–
Rob Walker Nov 1 '08 at 23:51

The allocation granularity (as returned from GetSystemInfo()) is 64 KB on x86 and x64. The VirtualAlloc() documentation seems to say that reservations are restricted by the allocation granularity, but pages in a block of reserved memory can be individually committed.
–
bk1e Nov 2 '08 at 17:06

Memory is your second concern, not your first. The purpose of a thread pool is usually to constrain the number of threads that want to run concurrently, ideally to the number of CPU cores available, and thereby limit context-switching overhead.

A context switch is very expensive, often quoted at a few thousand to 10,000+ CPU cycles.
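As a rough illustration of that cost, here's a quick estimate of CPU time lost to switching. All of the numbers (switch rate, cycles per switch, clock speed) are illustrative assumptions, not measurements:

```python
# Rough overhead estimate: if a context switch costs ~10,000 cycles and a
# core performs 1,000 switches per second, what fraction of CPU time is
# lost? All inputs here are illustrative assumptions.
def switch_overhead_pct(switches_per_sec, cycles_per_switch, cpu_hz):
    """Percentage of one core's cycles spent on context switches."""
    return 100.0 * switches_per_sec * cycles_per_switch / cpu_hz

pct = switch_overhead_pct(1_000, 10_000, 3_000_000_000)
print(f"~{pct:.2f}% of one 3 GHz core")
```

The overhead scales linearly with the switch rate, which is why many more runnable threads than cores hurts throughput long before memory becomes an issue.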

A little test on WinXP (32-bit) clocks in at about 15 KB of private bytes per thread (999 threads created). This is the initial committed stack size, plus any other per-thread data managed by the OS.
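Taking that ~15 KB per-thread figure at face value, here's what the asker's 20-thread-to-5-thread change would actually save in committed memory (the 15 KB value is the measured figure quoted above; the rest is simple arithmetic):

```python
# Estimate committed-memory savings from shrinking a set of threads,
# using the ~15 KB-per-thread figure measured in the WinXP test above.
PRIVATE_KB_PER_THREAD = 15  # measured private bytes per sleeping thread

def saving_kb(old_threads, new_threads):
    """Committed memory saved by dropping from old_threads to new_threads."""
    return (old_threads - new_threads) * PRIVATE_KB_PER_THREAD

print(f"20 -> 5 threads saves about {saving_kb(20, 5)} KB of committed memory")
```

In other words, the committed-memory saving is in the hundreds of kilobytes, which supports the point that memory is the second-order concern here.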

If you're using Vista or Win2k8, just use the native Win32 thread pool API and let it figure out the sizing. I'd also consider partitioning different types of workload (e.g. CPU-intensive vs. disk I/O) into different pools.

I think you'd have a hard time detecting any impact of making this kind of change to working code (20 threads down to 5), and then you'd be adding the complexity (and overhead) of managing the thread pool. Maybe worth considering on an embedded system, but on Win32?

But usually each process is independent. The system scheduler makes sure that each process gets equal access to the available processors; thus a multi-threaded application's time is multiplexed between its threads.

Memory allocated to a thread will affect the memory available to its own process, but not the memory available to other processes. A good OS will page out unused stack space so it doesn't occupy physical memory. However, if your threads allocate enough memory while live, you could cause thrashing as each process's memory is paged to and from the backing device.