Description:
------------
We had to start our FPM service during peak time, when the website was already under heavy traffic. The FPM service is configured with pm = dynamic.
To take the initial spike in load off the FPM service, we tried to raise pm.start_servers, only to discover that it cannot be set higher than the value of pm.max_spare_servers.
The pm.max_spare_servers setting essentially defines the maximum number of idle children in the pool. When that number is exceeded, FPM kills off idle children, oldest first, at a rate of one per second.
It is unclear why pm.start_servers should be constrained by such an "idle" pool size in the first place.
Pressing on, we also discovered issues with the pm.min_spare_servers setting, which essentially defines the number of additional spare children to spawn when the number of idle children in the pool drops below this value. This value cannot be set higher than pm.max_spare_servers. Again, the reasoning behind this constraint is unclear.
If pm.min_spare_servers is set to the same value as pm.max_spare_servers then this means that the pool size can only grow in multiples of pm.max_spare_servers or fewer. Or so we thought...
It turns out that FPM is hard-coded to spawn no more than 32 children per second, a cap that only takes effect when pm.min_spare_servers is set to a value higher than 32.
As an extreme case, assume for the moment that pm.max_children is 256 and that there are 0 idle children at any point in time. With pm.min_spare_servers set to a value higher than 32 and pm.start_servers and pm.max_spare_servers both set to 32, FPM would take an additional 7 seconds after starting to spin up to maximum capacity: (256 - 32) / 32 = 7.
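The arithmetic behind that figure can be sketched as follows. This is a simplified model and not FPM's actual scheduler: it assumes a flat spawn cap of 32 children per second and zero idle children throughout, as described above. The function name and parameters are hypothetical.

```python
import math


def seconds_to_full_capacity(max_children, start_servers, spawn_per_second=32):
    """Seconds needed to grow the pool from start_servers to max_children.

    Assumption: every second FPM spawns exactly spawn_per_second children
    (the flat cap described in this report) until max_children is reached.
    """
    missing = max(0, max_children - start_servers)
    return math.ceil(missing / spawn_per_second)


# The extreme case from the report: 256 max children, 32 started up front.
print(seconds_to_full_capacity(256, 32))   # 7
# Starting with 128 children instead shortens the ramp-up.
print(seconds_to_full_capacity(256, 128))  # 4
```

If the real spawn rate ramps up gradually rather than starting at the cap, the actual spin-up time would be longer than this estimate.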
Raising pm.max_spare_servers, and hence pm.start_servers, helps to reduce the spin-up time, but it requires another reconfiguration and restart during off-peak time to restore the original configuration.
To avoid such a costly spin-up time under high load, followed by an unnecessary reconfiguration later, it is beneficial to start with a high number of children and slowly kill off excess idle children when demand recedes, rather than slowly ramping up children to meet the demand.
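The cost of the other side of this trade-off, the slow kill-off, can be estimated the same way. This is a rough sketch assuming the kill rate of one idle child per second described above; the function name and parameters are hypothetical.

```python
import math


def seconds_to_trim_idle(idle_children, max_spare_servers, kills_per_second=1):
    """Seconds until the idle pool shrinks back to max_spare_servers.

    Assumption: FPM kills kills_per_second idle children each second while
    the idle count exceeds max_spare_servers, and no new requests arrive.
    """
    excess = max(0, idle_children - max_spare_servers)
    return math.ceil(excess / kills_per_second)


# 256 children started up front, all idle once demand recedes, 32 spares kept.
print(seconds_to_trim_idle(256, 32))  # 224
```

In this model the excess children linger for a few minutes at most, which only costs memory while they sit idle, whereas a slow ramp-up costs request latency while the site is under load.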
I propose the following changes be made:
1. The configuration constraint for pm.start_servers be replaced with it having a value between 1 and pm.max_children inclusive and not a value between pm.min_spare_servers and pm.max_spare_servers inclusive (the notion of pm.min_spare_servers as a lower constraint for pm.start_servers is nonsensical).
2. The configuration constraint for pm.min_spare_servers be replaced with it having a value between 1 and (pm.max_children - pm.max_spare_servers) inclusive. If this value is 0, then there will be no growth in pool size (cf. pm = static).
3. The default value for pm.start_servers be changed from (pm.min_spare_servers + ((pm.max_spare_servers - pm.min_spare_servers)/2)) to (pm.max_spare_servers + pm.min_spare_servers).
4. The configuration documentation and the manual for FPM be updated to take note of the above changes.
5. The manual be updated to state that FPM will spawn additional spare children at a rate of no more than 32 children/second and kill off idle children at a rate of 1 child/second.
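To make proposals 1 to 3 concrete, here is a minimal sketch of the proposed checks and default next to the current default. It is illustrative only and is not FPM's configuration code: all function names are hypothetical, and the use of integer division for the current default is an assumption.

```python
def current_default_start_servers(min_spare, max_spare):
    # Current default as quoted in proposal 3 (integer division assumed).
    return min_spare + (max_spare - min_spare) // 2


def proposed_default_start_servers(min_spare, max_spare):
    # Proposed default from proposal 3.
    return max_spare + min_spare


def validate_proposed(max_children, start_servers, min_spare, max_spare):
    """Return a list of violations of the proposed constraints (empty if valid)."""
    errors = []
    # Proposal 1: start_servers is bounded only by 1 and max_children.
    if not 1 <= start_servers <= max_children:
        errors.append("pm.start_servers must be between 1 and pm.max_children")
    # Proposal 2: min_spare_servers is bounded by the headroom above max_spare.
    if not 1 <= min_spare <= max_children - max_spare:
        errors.append(
            "pm.min_spare_servers must be between 1 and "
            "(pm.max_children - pm.max_spare_servers)"
        )
    return errors


print(current_default_start_servers(5, 35))    # 20
print(proposed_default_start_servers(5, 35))   # 40
print(validate_proposed(256, 128, 32, 32))     # []
print(len(validate_proposed(256, 300, 240, 32)))  # 2
```

Note that under proposal 2, pm.min_spare_servers + pm.max_spare_servers can never exceed pm.max_children, so the proposed default for pm.start_servers always satisfies proposal 1.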