Problem: A single client of our system can flood all available resources with a massive workload. Assume we have only one queue, and anyone can schedule any amount of work in it. Any other client that subsequently submits a small amount of work will have to wait until the first client's tasks have been fully processed. It is an asynchronous system, so there is no risk of DoS. The problem we'd like to solve is to give every client a fair share of the processing at any time, no matter how much work a few clients have submitted. It is a distributed system with a good number of workers, and all work is chunked into small pieces, so tasks flow smoothly through the system.

This seems like a very common problem to me, and I'm a bit alarmed that I can't find a simple solution. It is similar to process scheduling in an operating system: processes are given processing slots in a round-robin fashion, so no single process can pre-schedule a large amount of work.

One solution would be to use a particular queuing topology: one queue per user, each feeding a small bounded queue. Because the bounded queue only ever holds a small amount of work, no single user can monopolize the workers for an extended period.
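A minimal in-process sketch of that topology, assuming Python and its standard queue module (all names and sizes here are illustrative, not taken from the original design):

```python
import queue
import time

BOUNDED_SIZE = 4   # deliberately small: no user can get far ahead of the others

user_queues = {}                              # one unbounded queue per user
bounded = queue.Queue(maxsize=BOUNDED_SIZE)   # the only queue workers consume

def submit(user, task):
    """Clients enqueue work into their own per-user queue."""
    user_queues.setdefault(user, queue.Queue()).put(task)

def round_robin_forwarder():
    """Cycle over the user queues, moving at most one task per user per
    pass into the bounded queue. put() blocks while the bounded queue is
    full -- that blocking is the backpressure."""
    while True:
        moved = False
        for q in list(user_queues.values()):
            try:
                task = q.get_nowait()
            except queue.Empty:
                continue
            bounded.put(task)    # blocks if the workers are behind
            moved = True
        if not moved:
            time.sleep(0.01)     # every user queue is empty; idle briefly

def worker():
    """Workers only ever see the bounded queue."""
    while True:
        task = bounded.get()
        task()                   # run one small chunk of work
        bounded.task_done()
```

Started with something like threading.Thread(target=round_robin_forwarder) plus a few worker threads, the small maxsize caps how much of the pipeline any single user can occupy at once, regardless of how deep their private queue gets.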

I expected this to be easy to implement in RabbitMQ or possibly ZeroMQ, but there are several challenges. First, I would need to manually create a new queue whenever a new user submits work. Second, and more importantly, it seems I'd have to implement the round-robin forwarder myself, listening to all user queues in a non-blocking fashion in order to repost their messages to the bounded queue.
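For RabbitMQ specifically, a rough sketch of that forwarder using the pika client might look like the following. The queue names, MAX_DEPTH, and the assumption that the per-user queues already exist and are known up front are all mine; a real version would also need to discover newly created user queues (e.g. via a registry or the management API):

```python
import time
import pika

BOUNDED_QUEUE = "work.bounded"   # hypothetical name for the small shared queue
MAX_DEPTH = 10                   # keep the bounded queue short

def run_reposter(user_queue_names):
    """Poll each user queue in turn with basic_get (non-blocking) and
    repost at most one message per user per pass into the bounded queue."""
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue=BOUNDED_QUEUE, durable=True)

    while True:
        # A passive declare reports the queue's current depth without changing it.
        depth = channel.queue_declare(
            queue=BOUNDED_QUEUE, passive=True).method.message_count
        if depth >= MAX_DEPTH:
            time.sleep(0.05)     # backpressure: let the workers drain it
            continue

        moved = False
        for name in user_queue_names:
            method, properties, body = channel.basic_get(queue=name)
            if method is None:   # this user's queue is empty right now
                continue
            channel.basic_publish(exchange="", routing_key=BOUNDED_QUEUE,
                                  body=body, properties=properties)
            channel.basic_ack(delivery_tag=method.delivery_tag)
            moved = True
        if not moved:
            time.sleep(0.05)     # all user queues empty; idle briefly
```

Note the depth check happens once per pass, so the bound is approximate (a pass can overshoot by up to one message per user); checking before each repost would tighten it at the cost of extra round trips.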

My concern is that I'm working with very low-level abstractions here; all I want is fair task scheduling over limited capacity. Essentially, I want to create some backpressure so that scheduling happens just before the actual work is executed, which prevents any single user from monopolizing the system.

You'll probably want to very carefully define the term 'fair' as used in this context. No matter what definition you pick, there will be some combination of events that a client will claim is 'unfair' to them.
– Dan Pichelman, Jun 1 '18 at 21:58

You're right. I expect our perception of fairness will change over time as we observe what happens in this new topology. Nevertheless, I can see how to extend this, for instance by introducing a queue per organisation between the user queues and the round-robin reader, to ensure that each org gets its share of computing power, etc. What is important at this stage is to make sure I don't reinvent the wheel with this solution (I really have that impression) AND that I don't miss other solutions that would take a very different approach.
– Johan Martinsson, Jun 4 '18 at 8:23

I'm not sure your question makes a lot of sense - "allow scheduling to happen just before the actual work arrives" is a bit of a paradox, isn't it? If you know the work is coming, you can design the system to accommodate it. If you don't know the work is coming, how long do you wait to figure out it isn't coming?
– rmayer06, Sep 21 '18 at 16:53

I have a state-management problem in that I have to start listening to queues, and probably stop listening (to save memory) when they're inactive. My feeling is that if I distribute that state, the problem becomes harder to handle, whereas as it is I can simply reboot the one or the few "round robin" reposters I have. Also, for information, I have a roughly fixed number of workers.
– Johan Martinsson, Jun 4 '18 at 8:17

I'm not sure I agree with the premise of the question - and here's why.

Mathematical Theory

According to queuing theory, in any first-in-first-out stochastic process you have an arrival rate for requests and an average time the process takes to service a request. The process then behaves as a Poisson process, and we can determine analytically how long the queue is likely to be at any point, as well as the throughput of the process.

So, from an analytical/design standpoint, we have to make sure that a few assumptions are valid:

The inter-arrival times of requests are random variables with an exponential distribution (equivalently, arrivals form a Poisson process)

The service time for an individual request is a random variable with an exponential distribution

In practice, these assumptions are difficult to meet exactly, but they provide good enough approximations to do some basic design and analysis. If they are grossly violated (e.g. your system handles batches of requests, or some requests take significantly longer than others by their nature), then your design work should be focused on bringing the system into a state where the basic assumptions are no longer violated.

The Situation

So, given the above, I will assume you have a stochastic process with request arrival rate λ, per-worker service rate μ (the reciprocal of the average service time), and c processors. If c·μ is less than your arrival rate λ, then your process is unstable and the queue will grow without bound.

So, the goal of your system design is to ensure that at all reasonable times there are enough processors (c) available to service the requests expected to arrive (at rate λ), given the typical request service time (1/μ), subject to a reasonable maximum time spent in the system, including time in the queue.
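To make that check concrete, here is a small M/M/c calculator in Python using the standard Erlang C formula; the numbers in the example call are invented:

```python
from math import factorial

def mmc_check(arrival_rate, service_rate, servers):
    """Stability, per-server utilization, and (via Erlang C) the
    probability an arriving request must queue, plus its mean wait."""
    a = arrival_rate / service_rate        # offered load, lambda / mu
    rho = a / servers                      # utilization per server
    if rho >= 1.0:
        return {"stable": False, "utilization": rho}
    top = (a ** servers) / factorial(servers) / (1.0 - rho)
    p_wait = top / (sum(a ** k / factorial(k) for k in range(servers)) + top)
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return {"stable": True, "utilization": rho,
            "p_wait": p_wait, "mean_queue_wait": mean_wait}

# e.g. 40 requests/s arriving, each of 10 workers serving 5 requests/s:
print(mmc_check(arrival_rate=40.0, service_rate=5.0, servers=10))
```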

If you have done this, then you have adequately designed the system to service the requests it expects to handle. In practice, this approach works for so many situations that I'm not aware of any other queuing approaches that are commonly used in these types of systems. You could set up priority queuing without preemption with relative ease (similar to how the first-class lanes work at the airport check-in), but how would that be any more "fair"?