Comments

This patch turns QFQ into QFQ+, a variant of QFQ that provides the
following two benefits: 1) QFQ+ is faster than QFQ; 2) unlike QFQ,
QFQ+ correctly schedules non-leaf classes in a hierarchical setting
as well. How QFQ+ achieves these goals is discussed briefly below,
together with some performance measurements. A detailed description
of QFQ+ and of these results can be found in [1].
QFQ+ achieves a higher speed than QFQ by grouping classes into
aggregates, and by using the original QFQ scheduling algorithm to
schedule aggregates instead of individual classes. An aggregate is
made of at most M classes, all with the same weight and maximum
packet size. M is equal to the minimum between tx_queue_len+1 and 8
(a value chosen to get a good trade-off between execution time and
service guarantees). QFQ+ associates each aggregate with a budget
equal to the maximum packet size for the classes in the aggregate,
multiplied by the number of classes in the aggregate. Once an
aggregate has been selected for service, QFQ+ dequeues only packets
of its classes, until the aggregate exhausts its budget. Finally,
within an aggregate, classes are scheduled with DRR. In my
measurements, described below, the execution time of QFQ+ without
TSO/GSO and with M=8 was from 16% to 31% lower than that of QFQ, and
close to that of DRR.
QFQ+ computes aggregate timestamps from budgets rather than from
packet lengths. Hence it does not need to modify any timestamp when
the head packet of a class changes. As a consequence, unlike QFQ,
which uses head-packet lengths to compute class timestamps, QFQ+
needs no further modifications to correctly schedule non-leaf classes
and classes with non-FIFO qdiscs.
As for service guarantees, thanks to the way M is computed, the
service of QFQ+ is close to that of QFQ. For example, as proved in
[1], under QFQ+ every packet of a given class is guaranteed the same
worst-case completion time as under QFQ, plus an additional delay
equal to the transmission time, at the rate reserved to the class, of
three maximum-size packets. See [1, Section 7.1] for a numerical
comparison of the packet delays guaranteed by QFQ+, QFQ and DRR.
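In symbols, the guarantee stated above can be written as follows,
where L_i is the maximum packet size and R_i the rate reserved to
class i (notation mine, not taken from [1]):

```latex
D_i^{\mathrm{QFQ+}} \;\le\; D_i^{\mathrm{QFQ}} + \frac{3\,L_i}{R_i}
```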
I measured the execution time of QFQ+, DRR and QFQ using the testing
environment [2]. In particular, for each scheduler I measured the
average total execution time of a packet enqueue plus a packet
dequeue, without TSO/GSO. What would happen with TSO/GSO is discussed
at the end of this description. For practical reasons, in the testing
environment each enqueue&dequeue is also charged for the cost of
generating and discarding an empty, fixed-size packet (using a free
list). The following table reports the results with an i7-2760QM,
against four different class sets. Time is measured in nanoseconds,
while each set or subset of classes is denoted as
<num_classes>-w<weight>, where <num_classes> and <weight> are,
respectively, the number of classes and the weight of every class in
the set/subset (for example, 250-w1 stands for 250 classes with weight
1). For QFQ+, the table shows the results for the two extremes for M:
1 and 8 (see [1, Section 7.2] for results with other values of M and
for more information).
-----------------------------------------------
| Set of | QFQ+ (M) | DRR QFQ |
| classes | 1 8 | |
|-----------------------------------------------|
| 1k-w1 | 89 63 | 56 81 |
|-----------------------------------------------|
| 500-w1, | | |
| 250-w2, | 102 71 | 87 103 |
| 250-w4 | | |
|-----------------------------------------------|
| 32k-w1 | 267 225 | 173 257 |
|-----------------------------------------------|
| 16k-w1, | | |
| 8k-w2, | 253 187 | 252 257 |
| 8k-w4 | | |
-----------------------------------------------
As for DRR, it achieves its best performance when all classes have
the same weight. This is fortunate, because in such scenarios it is
actually pointless to use a fair-queueing scheduler, as the latter
would provide the same quality of service as DRR. In contrast, when
classes have differentiated weights and the better service properties
of QFQ+ make a difference, QFQ+ outperforms DRR. This happens mainly
because QFQ+ dequeues packets in an order that causes about 8% fewer
cache misses than DRR. As for the number of instructions, QFQ+
executes about 7% more instructions than DRR, whereas QFQ executes
from 25% to 34% more instructions than DRR.
With TSO/GSO, the number of instructions per packet enqueue/dequeue
executed by QFQ and QFQ+ is the same as without TSO/GSO. In contrast,
for each dequeue, the number of iterations executed by the main loop
of DRR is 64K/quantum times as high as without TSO/GSO.
Paolo
[1] P. Valente, "Reducing the Execution Time of Fair-Queueing Schedulers"
http://algo.ing.unimo.it/people/paolo/agg-sched/agg-sched.pdf
[2] http://algo.ing.unimo.it/people/paolo/agg-sched/test-env.tgz
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
---
net/sched/sch_qfq.c | 834 +++++++++++++++++++++++++++++++++++----------------
1 file changed, 571 insertions(+), 263 deletions(-)

From: Paolo Valente <paolo.valente@unimore.it>
Date: Tue, 20 Nov 2012 18:45:13 +0100
> -	struct sk_buff *skb;
> +	struct sk_buff *skb = NULL;
This is not really an improvement, now the compiler can think
that NULL is passed eventually into qdisc_bstats_update().
Please make the logic easier for the compiler to digest.
For example, restructure the top-level logic into something like:
	skb = NULL;
	if (!list_empty(&in_serv_agg->active))
		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
	else
		len = 0; /* no more active classes in the in-service agg */

	if (len == 0 || in_serv_agg->budget < len) {
		...
		/*
		 * If we get here, there are other aggregates queued:
		 * choose the new aggregate to serve.
		 */
		in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q);
		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
	}
	if (!skb)
		return NULL;
That way it is clearer, to both humans and the compiler, what is
going on here.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 20/11/2012 19:54, David Miller wrote:
> From: Paolo Valente <paolo.valente@unimore.it>
> Date: Tue, 20 Nov 2012 18:45:13 +0100
>
>> -	struct sk_buff *skb;
>> +	struct sk_buff *skb = NULL;
>
> This is not really an improvement,

Sorry for trying this silly short cut.

> now the compiler can think
> that NULL is passed eventually into qdisc_bstats_update().
>
> Please make the logic easier for the compiler to digest.
>
> For example, restructure the top-level logic into something like:
>
> 	skb = NULL;
> 	if (!list_empty(&in_serv_agg->active))
> 		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
> 	else
> 		len = 0; /* no more active classes in the in-service agg */
>
> 	if (len == 0 || in_serv_agg->budget < len) {
> 		...
> 		/*
> 		 * If we get here, there are other aggregates queued:
> 		 * choose the new aggregate to serve.
> 		 */
> 		in_serv_agg = q->in_serv_agg = qfq_choose_next_agg(q);
> 		skb = qfq_peek_skb(in_serv_agg, &cl, &len);
> 	}
> 	if (!skb)
> 		return NULL;
>
> That way it is clearer, to both humans and the compiler, what is
> going on here.
Got it. Actually, if the first qfq_peek_skb returns NULL, the example
version that you are proposing may behave differently from the
original one: in your proposal the scheduler tries to switch to a new
aggregate and may return a non-NULL value, whereas the original
version would immediately return NULL. I guess that this slightly
different behavior is fine as well, and I am preparing a new patch
that integrates these changes.
> Thanks.

From: Paolo Valente <paolo.valente@unimore.it>
Date: Wed, 21 Nov 2012 10:45:14 +0100
> Got it. Actually, if the first qfq_peek_skb returns NULL, then the
> example version that you are proposing apparently may behave in a
> different way than the original one: in your proposal the scheduler
> tries to switch to a new aggregate and may return a non-NULL value,
> whereas the original version would immediately return NULL. I guess
> that this slightly different behavior is fine as well, and I am
> preparing a new patch that integrates these changes.
Thanks.