This way doesn't work properly either. The classful qdiscs don't look
at their own q.qlen; they just try to dequeue, and if that succeeds
they decrement q.qlen. When an inner qdisc duplicates packets, the
top-level qdisc's q.qlen is decremented more often than it was
incremented and eventually goes negative (it's unsigned, but it is
returned as an int from qdisc_restart()), which causes qdisc_run() to
spin forever.

Other approaches, such as re-injecting packets at the top of the queue
from a thread or tasklet, seem rather gross.

Injecting at the top is also problematic because the duplicate needs
to follow the same classification path as the original packet, which
can only be guaranteed with stateless classification.