BracketQos - Rust, EBPF, XDP, & cake, oh my!

@Borromini Switch doesn't matter for this, MVPP2 is the ethernet controller.
I know it has some fancy stuff, but no idea if it has BQL


@Zoxc There is a lot of "we shipped OpenWrt 22.03.0, now what?" speculation going on.

In my case I'm very frustrated that vendors like mikrotik have only just added cake, and are 6-7 years behind openwrt in many respects. They make good hardware, but....

One of the biggest problems Linux has had in the last few years is that it bottlenecks on reads, not writes. Some bleeding-edge stuff like VPP and DPDK appeared that could use 1/16th the cores to blast more packets... but lacked any method at all for doing good queue control. XDP and eBPF are responses to that while still retaining good egress characteristics.
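For anyone who hasn't looked at XDP: the programs themselves are tiny C objects compiled for the BPF target and attached in the driver's receive path, before the kernel even allocates an skb, which is where the read-side speedup comes from. As a hedged sketch (build details and section conventions follow common libbpf usage, nothing specific from this thread), a do-nothing program looks like this:

```c
/* Minimal XDP program sketch. Needs clang with -target bpf and the
 * libbpf headers to build, and an XDP-capable driver to attach to. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass_all(struct xdp_md *ctx)
{
	/* A real program would parse the packet between ctx->data and
	 * ctx->data_end and return XDP_DROP / XDP_TX / XDP_REDIRECT
	 * as appropriate. This one just passes everything up the stack. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

The catch, as noted above, is that XDP gives you the fast read path but none of the egress queue management that fq_codel/cake provide; those still live on the normal qdisc layer.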

"Something power efficient for a tower" & a well-queued & encrypted backhaul would be great! I am allergic to python, but Rust seems fast and performant, and easier to get running on openwrt/routerOS/vyatta/linux/etc.

PS if you (or anyone) can figure out a way to poll tcp_info in Rust in crusader, it would be GREAT. For those here that haven't played with it yet, it's here: https://github.com/Zoxc/crusader

@dtaht Does this help any?

root@RB5009UG+S+IN:/sys/class/net# ls
bonding_masters  br-lan.1         fiber            p1               p2.20            p4               p6               p8               sfp
br-lan           eth0             lo               p2               p3               p5               p7               pppoe-wan        wg0
root@RB5009UG+S+IN:/sys/class/net# cat */queues/tx-0/byte_queue_limits/limit
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
root@turris:~# cat /sys/class/net/eth2/queues/tx-0/byte_queue_limits/*
1000
0
7590
1879048192
0

Turris Omnia, BQL seems to work... the 3rd number changes during load. (The glob lists the files alphabetically: hold_time, inflight, limit, limit_max, limit_min — so the 3rd number is the current limit.)

@Borromini doesn't look like BQL is there. Sigh. It's only 8 lines of code, and makes a huge difference, especially for bidirectional traffic. There's a mini-BQL tutorial over here: http://www.taht.net/~d/broadcom_aug9_2018.pdf
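For anyone curious what those "8 lines" look like: BQL is wired into a driver with three kernel calls. Below is a hypothetical driver skeleton (the mydrv_* names and structure are invented for illustration; only netdev_tx_sent_queue(), netdev_tx_completed_queue(), netdev_tx_reset_queue(), and netdev_get_tx_queue() are the real kernel APIs). This is kernel-side C and won't build outside a kernel tree; it is a sketch, not an mvpp2 patch.

```c
/* Hypothetical NIC driver skeleton showing where the BQL hooks go. */

static netdev_tx_t mydrv_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* ...map the skb and post it to the hardware tx ring... */

	/* Tell BQL how many bytes we just queued to the hardware. */
	netdev_tx_sent_queue(netdev_get_tx_queue(dev, 0), skb->len);
	return NETDEV_TX_OK;
}

static void mydrv_tx_complete(struct net_device *dev,
			      unsigned int pkts, unsigned int bytes)
{
	/* Called from the tx-completion interrupt/NAPI path: tell BQL
	 * how much the hardware actually drained, so it can adapt the
	 * limit to keep the ring occupancy minimal. */
	netdev_tx_completed_queue(netdev_get_tx_queue(dev, 0), pkts, bytes);
}

static void mydrv_ring_reset(struct net_device *dev)
{
	/* Every path that flushes or reinitializes the tx ring must also
	 * reset BQL's accounting. Finding all of these reset paths is
	 * exactly the part that is hard to get right by code inspection
	 * alone, which is why testing on real hardware matters. */
	netdev_tx_reset_queue(netdev_get_tx_queue(dev, 0));
}
```

The adaptive limit is what shows up as the changing `limit` value in the sysfs output quoted earlier in the thread.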

@moeller0 That's on OpenWrt or on Turris OS?

@dtaht I'm getting an 'SSL_ERROR_RX_RECORD_TOO_LONG' from your webserver here?

That is an http, not an https, address.

Sorry. Overly zealous Firefox.

That is on OpenWrt 19-based TurrisOS 5.4.3. I bought this thing as I wanted automatic updates (from a source I trust) and it has been delivering; however, it currently straggles behind upstream OpenWrt*. But I assume that BQL, if available in the old 4.14 kernel series, will also be available for more modern mvneta drivers.

*) This is not a complaint, just a statement of current fact; upgrades to a more recent OpenWrt base are in the works and hopefully will be rolled out soon. If I wanted/dared I could already test the upcoming TOS6.


Wow. That is a highly capable driver, with XDP and TSO support, but no BQL. Doesn't look hard to add, though. https://elixir.bootlin.com/linux/v6.0-rc6/source/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c#L4417

I'd be happy to help (and I read the PDF you linked to) but my C skills are extremely limited. I'll gladly test any patches you'd have though, ideally for 5.10 (device is on 22.03).

Don't have the hardware, and doing BQL really requires having it, due to how hard it is to find all the sources of resets from code inspection. But I'll put it on the todo list. 1024 tx descriptors is a bit much! At a gbit, 30k (22 big packets) is all BQL will put on the ring. Also, the flow control to the switch was "interesting"; I don't know how much buffering is in the switch itself.


I tend to agree, but in C it's often too easy to do this:
[image: C code snippet illustrating a memory-safety mistake]

While a bit alien to me as well, I really do like Rust's memory safety features a lot.
Although admittedly I curse at the borrow checker sometimes :grinning_face_with_smiling_eyes:

Half relatedly I came across this: https://github.com/carbon-language/carbon-lang
Another Google-backed language that aims to be a successor to C++. It's not ready for use yet, and likely won't be for a while, but I'll be keeping an eye on it.


What I wish for is a language that works well running inside an interpreter but that can also be compiled... for development and troubleshooting, an interpreted autorate implementation is so much easier to work with. (I guess if the compiler and development environment were small enough to include on a router that would also work, but I somehow doubt that is a viable way forward for a low-storage all-in-one router.)

Seemed more appropriate to continue in the autorate thread, so I've posted my reply there instead.


I toyed around with getting tcp_info from a socket, seems to work, result here: https://github.com/Lochnair/tcp_info_test

I likely won't have time to add this into Crusader, so I'll leave that part for someone else :slight_smile:

@Zoxc - even incremental progress like this is very, very helpful. Being able to see TCP marks and drops would be so great to also have in crusader, in addition to the measurement loss metric. Also RTT, retransmits... there's a ton of useful info in tcp_info! My big hope was to be able to collect data from the socket every 10ms or so, but even just at the end of the run really helps.

thx @lochnair for taking a stab at it!

https://github.com/cloudflare/quiche looks easier to instrument than tcp!

@Borromini - I spoke with the mvpp2 maintainer about adding BQL. Are you in a position to test a backport?

Sure enough! I backported 5.15 to 22.03, so I can test 5.15; I assume that would make things slightly easier?

Like the MSS! Just keep in mind that there will/should be one info record per TCP flow used in a test, and these can be quite different (for example, with the go-responsiveness networkQuality tool I occasionally see a test run use both IPv4 and IPv6 flows*; the reported TCP info, however, tries to report only an aggregate**). flent does this right and reports results for each test flow, but that can turn into quite a long list...

*) No clue why though, but I can clearly see IPv4 and IPv6 measurement flows in parallel.
**) With the tool easily ramping up to 20 or more parallel flows, I can see the need for some aggregation, but e.g. for the MSS, aggregating IPv4 and IPv6 flows is not all that helpful.