hfsc_destroy_qdisc takes O(n) time wrt. the number of classes,
but 5-6 seconds is still long. If all these classes contain inner
qdiscs other than the default, I guess removing the classes from
dev->qdisc_list in qdisc_destroy takes up most of the time, with
n O(n) operations, i.e. O(n^2) overall. The __qdisc_destroy RCU callback
also calls reset before destroy; I don't know of any qdisc where this is
really necessary. Without inner qdiscs, I need to see the script first to
judge what's going wrong. Tomasz?
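
To make the "n O(n) operations" point concrete, here is a minimal
user-space sketch in C. It is a toy model, not kernel code: struct
toy_qdisc, toy_unlink and toy_destroy_all are hypothetical names, and it
assumes the per-device list is singly linked and searched from the head
on every removal. It only counts how many nodes get inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an entry on a per-device qdisc list (assumption:
 * singly linked, no back pointer). Not the kernel's struct Qdisc. */
struct toy_qdisc {
    struct toy_qdisc *next;
};

/* Unlink `victim` from the list at *head by walking from the front.
 * Returns the number of nodes inspected: the O(n) part. */
static unsigned long toy_unlink(struct toy_qdisc **head,
                                struct toy_qdisc *victim)
{
    unsigned long steps = 0;
    struct toy_qdisc **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        steps++;
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
    return steps;
}

/* Build a list of n nodes, then remove them one by one in the worst
 * order for a front-to-back search (always the current tail).
 * Returns the total number of nodes inspected, or 0 if n is 0 or the
 * allocation fails. */
unsigned long toy_destroy_all(size_t n)
{
    struct toy_qdisc *nodes, *head = NULL;
    unsigned long total = 0;
    size_t i;

    if (n == 0)
        return 0;
    nodes = calloc(n, sizeof(*nodes));
    if (nodes == NULL)
        return 0;

    /* Push to the front: the list ends up nodes[n-1] -> ... -> nodes[0]. */
    for (i = 0; i < n; i++) {
        nodes[i].next = head;
        head = &nodes[i];
    }

    /* nodes[0] is the tail; once it is gone nodes[1] is the tail, etc. */
    for (i = 0; i < n; i++)
        total += toy_unlink(&head, &nodes[i]);

    assert(head == NULL);   /* every node was found and unlinked */
    free(nodes);
    return total;
}
```

In this model toy_destroy_all(4) inspects 4 + 3 + 2 + 1 = 10 nodes, and
in general n(n+1)/2, so ten times as many classes cost roughly a hundred
times as many steps. Whether the real list behaves this badly depends on
the order in which the qdiscs are destroyed.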

http://www.e-wro.pl/~acid/tc.batch.gz. In my opinion it's not a case
of expensive algorithms, but of the number of classes. With this rule set
loaded (tc -b tc.batch), the command:
for i in e1.903 e0.930 e0.931 e0.932 ; do
tc qdisc del dev ${i} root
done
completely freezes the machine for about 5-6 seconds.

I've run some profiles with your script (on an old kernel without
the lockless loopback patch): qdisc_destroy takes up 89% of the time
when destroying the qdiscs.
These are the exact results:
- execute the script on unpatched kernel:
time:
real 2m28.822s
user 0m2.347s
sys 2m25.395s
top 5 in profile: