
I recently came across a write-up on different approaches used to scale our webapp, one of which was scaling using local request counters. Below that, they listed the drawbacks of this approach, noting that

Each instance would reach the threshold almost at the same time and hence each would demand a new instance, leading to a large number of instances even though the number should have increased by one only

I was curious to know if there's a solution to this problem or any workaround?

Yes, don't use local counters; use a central system to handle the overall load and scale accordingly. I don't get what more you are after; your quote sounds absolutely clear to me. Could you elaborate on what you are missing?
– Tensibai♦ Apr 10 '17 at 11:56

What if you named the instance you are creating the number that is being requested, and thus when you go to try to create "instance_5" the name is already in use and can't be created?
– avi Apr 10 '17 at 13:56

2 Answers

On the instances serving your webapp, continue to monitor the number of incoming requests – and anything else you see fit.

Publish the number of incoming requests to a monitoring system. If one is not yet in place, this step will improve your monitoring capabilities and will help you track the load on each host as well as how evenly it is balanced across hosts.
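
As an illustration, here is a minimal Python sketch of publishing such a counter to a StatsD-compatible daemon over UDP; the endpoint address and metric name are assumptions, not part of the original setup.

    import socket

    # Hypothetical StatsD-compatible endpoint; replace with your monitoring system.
    STATSD_ADDR = ("monitoring.internal", 8125)
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def record_incoming_request():
        # StatsD counter format: "<metric>:<value>|c"
        _sock.sendto(b"webapp.incoming_requests:1|c", STATSD_ADDR)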

From the incoming requests, deduce an estimated number of webapp servers needed to serve that workload, as well as the difference between the actual number and the estimated number. In the case you describe, the estimate seems to be just a scalar function of the total number of incoming requests over a recent period of time. On other systems, or after some time, more subtle strategies can be implemented. Monitoring these quantities and their difference eases the traceability of the auto-scaling strategy and lets you track its responsiveness.
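
As a rough sketch of such a scalar function – the per-instance capacity figure is a made-up number for illustration:

    import math

    REQUESTS_PER_INSTANCE = 500  # assumed sustainable requests/s per webapp server

    def estimate_needed_instances(requests_per_second, current_count):
        # Scalar estimate from the recent request rate, plus the difference
        # from the actual fleet size so both can be monitored.
        needed = max(1, math.ceil(requests_per_second / REQUESTS_PER_INSTANCE))
        return needed, needed - current_count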

Last, implement the auto-scaling itself. At this point, it really is just a matter of reading a number from your monitoring system and writing it to your scaling system.
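
A minimal sketch of that glue, reusing estimate_needed_instances from the sketch above; the three callables are placeholders for whatever monitoring and scaling APIs are actually in use:

    import time

    def autoscaling_loop(read_request_rate, get_instance_count, set_desired_capacity,
                         poll_interval=60):
        # read_request_rate, get_instance_count and set_desired_capacity stand in
        # for your monitoring and scaling systems.
        while True:
            rate = read_request_rate()
            current = get_instance_count()
            needed, _diff = estimate_needed_instances(rate, current)
            if needed != current:
                set_desired_capacity(needed)
            time.sleep(poll_interval)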

One possible approach is to allow such instances to make demands for new instances based on local request counters, but instead of directly reacting to those demands, you would funnel them to a central instance-creation logic.

That logic would immediately react to the first demand, but also start a "cool off" countdown timer. Any subsequent demand received while the timer is still active would be considered to be caused by the same traffic spike that triggered the first demand and would thus be ignored.
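
A minimal sketch of such a gate, assuming a hypothetical launch_new_instance() callable and an arbitrary cooldown length:

    import time

    COOLDOWN_SECONDS = 300  # arbitrary cool-off period for illustration

    class ScaleUpGate:
        def __init__(self):
            self._last_scale_up = float("-inf")

        def handle_demand(self, launch_new_instance):
            now = time.monotonic()
            if now - self._last_scale_up < COOLDOWN_SECONDS:
                return False  # still cooling off: treat as the same spike, ignore
            self._last_scale_up = now
            launch_new_instance()  # react only to the first demand
            return True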

A similar logic could be used to gradually shut down idle instances, if the local counters remain below a minimum level.

Note: the local counters would need to operate in a leaky-bucket manner or be periodically reset for such an approach to be possible, so that the demands are repeated if the traffic remains high.
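
One way to picture such a leaky-bucket counter, as a rough sketch with made-up drain rate and threshold:

    import time

    class LeakyRequestCounter:
        def __init__(self, leak_per_second=100.0, threshold=1000.0):
            self.level = 0.0
            self.leak_per_second = leak_per_second
            self.threshold = threshold
            self._last = time.monotonic()

        def _drain(self):
            now = time.monotonic()
            self.level = max(0.0, self.level - (now - self._last) * self.leak_per_second)
            self._last = now

        def record_request(self):
            self._drain()
            self.level += 1.0

        def should_demand_instance(self):
            # Stays true only while traffic keeps the level above the threshold,
            # so the demand is naturally repeated if the load persists.
            self._drain()
            return self.level > self.threshold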

Another possible approach is to just publish the local counters. A central piece of logic would periodically collect and aggregate these local counter values in order to decide on launching new instances or shutting down existing ones.

The advantage of this method is that there is no single global counter requiring write-access locking (to prevent corruption), which would be a scalability limitation.
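
As a rough sketch of the central aggregation step, assuming each instance publishes the number of requests it saw in the last window; the per-instance capacity is again a made-up figure:

    import math

    REQUESTS_PER_INSTANCE = 500  # assumed capacity, for illustration only

    def decide_instance_count(published_counters):
        # published_counters maps instance id -> requests seen in the last window.
        total = sum(published_counters.values())
        return max(1, math.ceil(total / REQUESTS_PER_INSTANCE))

    # Three instances all slightly over their local threshold lead to one
    # aggregated decision (4 instances) rather than three separate demands.
    print(decide_instance_count({"i-1": 520, "i-2": 510, "i-3": 530}))  # -> 4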