I placed a small note in my original post that all settings are default/unchanged. Everything is running stock 18.06.1, r7258-5eb055306f. Please check out the section "NEED MORE SPEED!! Enabling hardware flow control:" and let me know if I need to make more refinements.

I have benchmarked my DIR-860L by hooking one computer up to WAN and another to LAN. iperf was giving me 550-600 Mbit/s with SQM on, and it could even be pushed to 700 Mbit/s with qos-simplest; cake was roughly 550 Mbit/s. His numbers seem to match mine.
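For anyone who wants to reproduce this kind of test, a minimal sketch of the iperf setup (assumes iperf3 on both machines; the IP address is a placeholder for whatever the LAN host actually gets):

```shell
# On the LAN-side computer: start an iperf3 server
iperf3 -s

# On the WAN-side computer: run a 30-second TCP throughput test
# through the router (192.168.1.10 is a placeholder LAN address)
iperf3 -c 192.168.1.10 -t 30
```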

CPU performance and I/O performance are sadly not the same, nor is the level of optimization or hardware offloading between different archs or even SOCs.

MIPS is basically a traditional workstation architecture (SGI), 'just' scaled down (for power consumption, costs, etc.) to router needs (and then forgotten for far too long); it inherited good I/O performance from that heritage.

ARM, on the other hand, comes from the other end of the spectrum: low-power and low-performance (more firmware than operating system). It was actively improved to gain performance, mostly with a focus on mobile (phone) usage so far - a task where I/O performance doesn't matter that much and where features tend to get offloaded into dedicated IP blocks (which don't necessarily have FOSS driver support).

Awesome, that was extremely thorough information, thanks, and it confirms my gut instinct that cake was going to top out at something like 200 Mbps, which it did. Here's the clicky-link to the piece_of_cake result: http://www.dslreports.com/speedtest/44441633

Since it was set to 950 Mbps, cake is being throttled by its own ability to do calculations on the packets. If you set it to something like 200 Mbps, it will have even better bufferbloat performance. For someone doing VoIP or games, the variation between 20 and 70 ms ping times is going to be noticeable in terms of variable hitreg or garbled, glitchy audio.
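Some back-of-the-envelope math supports the "throttled by its own calculations" point: at a 950 Mbit/s shaper rate with full-size 1500-byte packets, the CPU only gets about 12-13 microseconds per packet for all of cake's per-packet work. A quick sketch:

```shell
# Per-packet CPU budget at a 950 Mbit/s shaper rate, assuming 1500-byte packets
awk 'BEGIN {
    rate = 950e6            # shaped rate in bits/s
    pkt  = 1500 * 8         # packet size in bits
    pps  = rate / pkt       # packets per second
    printf "%.0f pps, %.2f us/packet\n", pps, 1e6 / pps
}'
# → 79167 pps, 12.63 us/packet
```

At 200 Mbit/s the budget is almost five times larger, which is consistent with the shaper keeping up comfortably at that rate.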

These numbers make a lot more sense. Thanks for running through all of these permutations. I would update the original post so there is no confusion, though.

I'm not sure your final conclusion about the stock configuration is correct though. My thoughts:

Seems like if your connection is 200 Mbps or less, then SQM is probably still a good idea (less variability in latency), since the CPU can handle it.

If your connection is in the 200-600 Mbps range, I'm not sure your test covers those users. You are not saturating your connection, so the buffers/queues on the router are not full. This may keep bloat low in this scenario, but users with speeds in this range should perform their own tests.
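One way for users in that range to run their own test is flent's RRUL test, which saturates the link in both directions while measuring latency under load. A sketch, assuming flent and netperf are installed on a LAN machine (the server hostname is just an example; pick one close to you):

```shell
# Saturate upload and download simultaneously for 60 seconds
# and record latency under load, with and without SQM enabled
flent rrul -H netperf-eu.bufferbloat.net -l 60 -t "sqm-enabled" -o sqm-enabled.png
```

Comparing the latency plots from an SQM-on run against an SQM-off run shows directly where bloat appears at your actual line rate.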

For the higher speeds, I think this is not an issue with HW acceleration per se - you are simply getting closer to saturating your connection, which inherently results in more saturated queues (and therefore bloat). Since HW acceleration is not currently compatible with SQM, that can't save you.

I'm curious what CPUs are fast enough to handle near gigabit WAN connections with SQM (cake or fq_codel).

Pretty sure @jeff uses the APU2 boards, and yes, they'll do it, or at least get close to 1 Gbps. If I were looking for x86 boards I'd look for AES-NI; anything with it is going to be fast enough for 1 Gbps shaping.

Hmm, a little strange that all those tests had similar speeds. Are you sure you're disabling hardware offload and saving/reloading the SQM instance? Because you're getting something like 500 Mbps even when you set 100 Mbps speeds?
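For reference, here's a sketch of how I'd toggle it from the command line on 18.06 (option names taken from the stock firewall config and sqm-scripts; double-check them against your own /etc/config files):

```shell
# Turn off software and hardware flow offloading in the firewall defaults
uci set firewall.@defaults[0].flow_offloading='0'
uci set firewall.@defaults[0].flow_offloading_hw='0'
uci commit firewall
/etc/init.d/firewall restart

# Then restart SQM so its qdiscs are re-applied cleanly
/etc/init.d/sqm restart
```

After that, `tc qdisc show` should list the cake/fq_codel qdiscs on the shaped interface, which confirms SQM is actually in the path.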

Going to toss out a hypothesis: "hardware flow offloading" (HFO) is better than SQM, in all cases, on the ER-X.

My logic: SQM maxes out at 200 Mbps. If we compare HFO at 200 Mbps, I would argue bufferbloat is as good as, or better than, what SQM can offer.

Problem: I don't know how to test that one. I can hard-limit my connection down to 100 Mbps, but I would like to test this at all levels and see where bufferbloat starts to become an issue. Can anyone test this, or give me a "how to" on rate-limiting my incoming WAN connection?
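One way to do the rate limiting is through sqm-scripts itself: its download/upload options are the ingress/egress shaper rates in kbit/s, so you can step them up between test runs. A sketch using UCI (the interface name and rates are examples; adjust to your WAN):

```shell
# Limit the first SQM queue to 100 Mbit/s in each direction (values in kbit/s)
uci set sqm.@queue[0].interface='eth0.2'   # example WAN interface
uci set sqm.@queue[0].download='100000'    # ingress (download) limit
uci set sqm.@queue[0].upload='100000'      # egress (upload) limit
uci set sqm.@queue[0].enabled='1'
uci commit sqm
/etc/init.d/sqm restart
```

Rerun your bufferbloat test, bump the numbers (200000, 400000, ...), and repeat until latency under load starts to climb; that's roughly where the shaper runs out of CPU.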