Hi!
In the debloat.sh script from dtaht,
(https://github.com/dtaht/deBloat/blob/master/src/debloat.sh)
he shows how to use the flow classifier to distribute flows across multiple hardware queues.
But I can't get this working.
First I set up mq + fq_codel:
tc filter add dev eth0 root protocol ip prio 10 flow hash keys src,dst,proto,proto-src,proto-dst divisor 8 baseclass 1:1
Results in:
RTNETLINK answers: Invalid argument
We have an error talking to the kernel, -1
Switching out root for parent 1: also does not work.
(root is wrong there, I guess, because tc -g class dev eth0 root gives no output,
but tc -g class dev eth0 parent 1: does)
The flow classifier is built into the kernel.
Oy that code is ancient. Don't do that. If you want to use hw
mq just make your default qdisc be fq_codel and mq should pick it up automatically.
(this is not an sqm question, if you want to run at line rate)
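For what it's worth, the default-qdisc route looks roughly like this (a sketch; assumes eth0 and a kernel with fq_codel available):

```shell
# Make newly created qdiscs default to fq_codel, then re-create the
# root so mq attaches a fresh fq_codel instance to every hw tx queue.
sysctl -w net.core.default_qdisc=fq_codel
tc qdisc replace dev eth0 root mq
```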
A mildly better way to configure fq_codel on mq is to set the flows variable to 1024/x where x is the number of hw queues. But that would involve figuring out the filter command again, which I'm not up for today.
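A sketch of what that per-queue setup might look like, assuming eth0 with 8 hw tx queues (so flows = 1024/8 = 128; the 1:1..1:8 parents are the classes mq creates):

```shell
NQUEUES=8
FLOWS=$((1024 / NQUEUES))   # 128 flow buckets per hw queue

tc qdisc replace dev eth0 root handle 1: mq
# mq exposes one class per hw tx queue; hang a smaller fq_codel off each
for i in $(seq 1 "$NQUEUES"); do
    tc qdisc replace dev eth0 parent 1:"$i" fq_codel flows "$FLOWS"
done
```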
It does. But the tc qdisc show output is almost the same.
mq as main qdisc and eight fq_codel sub qdiscs.
But only one fq_codel queue is used out of the eight. (tc stats output)
So the flow classifier is still needed, I guess...
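For anyone following along, the per-queue usage can be read from the per-qdisc counters, e.g.:

```shell
# each fq_codel instance under mq maps to one hw tx queue;
# compare the "Sent" bytes/pkt counters across the eight instances
tc -s qdisc show dev eth0
```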
//edit
Kernel 4.19.74
net: sched: fix reordering issues
[ Upstream commit b88dd52c62bb5c5d58f0963287f41fd084352c57 ]
Whenever MQ is not used on a multiqueue device, we experience
serious reordering problems. Bisection found the cited
commit.
The issue can be described this way :
- A single qdisc hierarchy is shared by all transmit queues.
(eg : tc qdisc replace dev eth0 root fq_codel)
- When/if try_bulk_dequeue_skb_slow() dequeues a packet targetting
a different transmit queue than the one used to build a packet train,
we stop building the current list and save the 'bad' skb (P1) in a
special queue. (bad_txq)
- When dequeue_skb() calls qdisc_dequeue_skb_bad_txq() and finds this
skb (P1), it checks if the associated transmit queues is still in frozen
state. If the queue is still blocked (by BQL or NIC tx ring full),
we leave the skb in bad_txq and return NULL.
- dequeue_skb() calls q->dequeue() to get another packet (P2)
The other packet can target the problematic queue (that we found
in frozen state for the bad_txq packet), but another cpu just ran
TX completion and made room in the txq that is now ready to accept
new packets.
- Packet P2 is sent while P1 is still held in bad_txq, P1 might be sent
at next round. In practice P2 is the lead of a big packet train
(P2,P3,P4 ...) filling the BQL budget and delaying P1 by many packets :/
To solve this problem, we have to block the dequeue process as long
as the first packet in bad_txq can not be sent. Reordering issues
disappear and no side effects have been seen.
Interesting.
This implies that fq_codel is multi-queue aware.
So replacing mq with plain fq_codel should be fine, then.
Well,
On WRT* devices there seems to be not much difference when using mq + fq_codel.
Looking at the tc stats output, only one hardware queue seems to get used most of the time.
(Removing the tx queue workaround patch doesn't make a big difference)
I haven't found a way to make mq use more queues.
Maybe someone else has an idea?
So the best bet is XPS/RPS, which take effect later in processing (?) and do the queue assignment in software?
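A sketch of steering via XPS, assuming eth0 with four tx queues and four CPUs (the masks are hex CPU bitmaps):

```shell
# pin one CPU to each tx queue, so the stack selects the
# tx queue based on the CPU that transmits the packet
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus  # CPU0
echo 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus  # CPU1
echo 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus  # CPU2
echo 8 > /sys/class/net/eth0/queues/tx-3/xps_cpus  # CPU3
```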
Heh. I remember trying this back in the day. But I don't remember the result. Either it crashed... or it was a strict priority queue implemented in hw with no way to turn it off.
This gave rise to this patch that has been with mvebu devices using mvneta since that year (I think it was kernel 4.4 at the time?), surviving to kernel 5.4: