Add support for MikroTik RB5009UG

The graphing tool is called netdata (which I think is excellent), and the flent test was rrul. The second test used fq_codel + htb. irqbalance was installed, and the IRQs seem balanced judging from cat /proc/interrupts.
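For anyone who wants a number rather than eyeballing the table, here is a minimal sketch that sums the per-CPU columns of /proc/interrupts-style text. The function name `per_cpu_totals` is made up for this post, and the parsing assumes the usual layout (a header row of CPU names, then one row per interrupt source).

```python
def per_cpu_totals(text):
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("24:" or "Err:"); per-CPU counts follow.
        # Rows such as "Err:" carry fewer counts than there are CPUs.
        for i, value in enumerate(fields[1:1 + len(cpus)]):
            if value.isdigit():
                totals[i] += int(value)
    return dict(zip(cpus, totals))
```

On the router itself you would pass in `open("/proc/interrupts").read()` and compare the totals across cores.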

I'll try cake unshaped with sch_mq later today.


Is this with the BQL patch? I can share a patch to apply to your OpenWrt tree if you'd like, but I suppose you know how to integrate it.

You needn't post the rrul graphs, but was cake shaped better than htb+fq_codel in terms of latency?

Thx. So many great graphing tools have shown up in the past couple of years that I can't keep track.

I sometimes get an urge to resurrect "fq_codel_fast". I'd ripped out the extra stats, put in two needed features (gso-splitting and a shaper), and it was faster than htb + fq_codel by more than a bit. When we started cake, it too was just fq_codel + a shaper, and then, as it added feature after feature, it got slower and slower.

Is this with the BQL patch? I can share a patch to apply to your OpenWrt tree if you'd like, but I suppose you know how to integrate it.

No BQL patch, but I'm nowhere close to link speed.

Very similar, DOCSIS flakiness is the bigger factor in both graphs.

My hope is that BQL makes things more stable at all speeds.

@dtaht @mwojtas I experienced frequent connection drops in the time I ran my build with the BQL patch against 5.15 (which was a little more than a day). Both WAN disconnects (PPPoE authentication issues, which has otherwise been very stable) and weird LAN issues (router being unreachable from within the remainder of the network). I haven't been able to pinpoint the issue, but after dropping the patch the RB5009 has been stable for almost two days now, like it used to be before. This is 22.03 with 5.15 backported (I cannot afford to run master on those devices), so a bad base to start digging. I feel like in this context (with a 'contaminated' testbed) my Tested-by (or 'bug report') is of little value.

I'll be deploying my second RB5009 onto location come Monday, and I cannot afford to use my own for testing either, between homeworking and my wife needing her Netflix fix. Sorry I cannot be of any more help.

No issues here with the BQL patch applied (on master): both WAN connections (one with PPPoE) and the LAN have been stable for the 24h I've been running with the BQL patch.


Great! @dtaht will surely appreciate your Tested-by so the patch can be upstreamed.

I had the same experience, with lots of network disconnects. It felt like a layer-two problem, something like an STP loop. I finally rolled the BQL patch back and the interfaces have become stable. I am currently on SNAPSHOT and so far things are behaving well. I will be working on setting up 10Gb clients to do some iperf3 tests on the SFP+ interface.

I only have two issues at the moment: 1) mwan3 is hanging on startup, and 2) unbound seems to have a slow memory leak.


What kind of disconnects have you experienced? It would be great if I could reproduce any of them locally.

After some testing, I also have a consistent issue with the BQL patch. I installed the exact same OpenWrt build without the BQL patch, and it does not have this issue.

In my case the router locks up during a speedtest, flent rrul run or fast.com speedtest.

Steps to reproduce:

  1. Run one of those speedtests
  2. During the download phase, the router CPU usage is much higher than expected, fully maxing out one of the cores. The router becomes very sluggish, and SSH stops responding and no longer sends any output. The router does not seem to recover and no longer responds to pings.

Takes about 1 or 2 speedtest.net runs to trigger the issue for me.
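To confirm which core gets pinned before the router stops responding, one option is to take two snapshots of /proc/stat a second or so apart and compare them. This is only a sketch: the function names are made up for this post, and it counts everything except idle and iowait as busy.

```python
def cpu_times(text):
    """Map 'cpuN' -> (busy, total) jiffies from /proc/stat-style text."""
    out = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core rows: cpuN user nice system idle iowait irq softirq steal ...
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Only the first 8 columns; guest time is already part of user/nice.
            vals = [int(v) for v in fields[1:9]]
            idle = vals[3] + (vals[4] if len(vals) > 4 else 0)
            total = sum(vals)
            out[fields[0]] = (total - idle, total)
    return out


def busy_fraction(before, after):
    """Per-core busy fraction between two cpu_times() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result
```

A core that stays near 1.0 during the download phase, with most of the growth in the softirq column, would point at packet processing rather than a userspace process.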

Interesting. Are you using a single router port both for speedtest and other connections such as SSH?

No, I'm using 3 ports in total on the router.

p0, 2.5Gbps port connected to a cable modem, link at 1Gbps due to the cable modem not supporting higher.
p1, 1Gbps port connected to a DSL modem.
p2, 1Gbps port connected to a switch.

FYI @dtaht
I also did some testing, however, on top of the net-next main branch (i.e. v6.1-rc1). Some observations:

  1. Rootfs over NFS (1G RGMII): stable all the time.
  2. Another 1G RGMII port: SSH sessions while the port was under iperf3 -P4 (@10/100/1000 speeds, both directions) showed not a single disconnect. With iperf3 + ping I haven't noticed an improvement in response (with and without BQL), but regardless of speed and direction, the intervals were very stable. I probably need to run a more sophisticated BQL examination.
  3. 10G SFP+ ports bridged and a bidirectional RFC2544 latency/jitter test. Latency improved by 1-2% with BQL.

The question is, why don't I observe issues similar to what you see on Mikrotik and OpenWrt? I am wondering if this could be related to the kernel baseline or to some other configuration. Can you provide me with the zcat /proc/config.gz output?
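Once two config dumps are available, comparing them by hand is painful, so here is a rough sketch of a diff helper. The function names are made up for this post, and the option names in the usage below are only sample data, not anyone's actual config.

```python
def parse_kconfig(text):
    """Parse kernel .config-style text into {option: value}."""
    opts = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" is recorded as disabled ("n")
            opts[line[2:-len(suffix)]] = "n"
    return opts


def diff_kconfig(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

On a running system the input could come from `gzip.open("/proc/config.gz", "rt").read()`, which is only present when the kernel was built with CONFIG_IKCONFIG_PROC.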

All I have are the scars on my back from 3 weeks of hacking on the beaglebone BQL driver, where in some flash of insight and a lot of printks, I saw it was rounding up the size of the smallest tx packets by a few on the rx return, and that was why it was hanging.
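To illustrate why a few bytes of disagreement matter, here is a toy ledger of bytes reported at transmit time versus bytes reported at completion time. It is not the kernel's queue-limit code; the class, the limit, and the 60/64-byte sizes are all invented for the illustration.

```python
class ByteLedger:
    """Toy byte accounting between transmit and completion (hypothetical)."""

    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0

    def can_send(self):
        # The queue is only allowed to transmit while under the limit.
        return self.in_flight < self.limit

    def sent(self, nbytes):
        self.in_flight += nbytes

    def completed(self, nbytes):
        self.in_flight -= nbytes


def drift_after(n_packets, sent_size, completed_size, limit=3000):
    """Send and complete n packets; return the ledger for inspection."""
    ledger = ByteLedger(limit)
    for _ in range(n_packets):
        ledger.sent(sent_size)
        ledger.completed(completed_size)
    return ledger
```

With matching sizes the ledger returns to zero after every packet. If the two sides disagree by 4 bytes per packet, the error accumulates: in one direction the ledger ends up above the limit with nothing actually outstanding, so `can_send()` stays false forever, and in the other direction it goes negative, meaning more bytes were completed than were ever queued.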

I was disproportionately happy at finally getting this right as finally my audio and LED blinkenlights were reliably in sync, no matter the rest of the load on the audio subsystem.

I know this is not necessarily adding insight but... in the other tester's setups are there encapsulations in play?


As someone was asking about PoE:

I started working on this yesterday, and I can successfully communicate with the PoE controller using mtpoe_ctrl. However, the PoE controller on the RB5009 uses the Mikrotik PoE protocol v4, while mtpoe_ctrl only implements up to version 3, so only some commands work (firmware version, temperature and input voltage readout).
I'm currently working on reversing their kernel driver, but it's pretty tedious work... (maybe I'll whip out the logic analyzer and stock firmware later; it almost seems like the easier option...).
For details see Mikrotik RB750 r2 series POE - #68 by TheChaosJack


@adron @robimarko Hi guys, I am wondering if you ever got JTAG working on the Marvell Armada 7040 (88f7040). I get the same errors as you did, and it just does not seem to work. Have you had any luck recently, or do you even have a working config file for OpenOCD? Thank you!

Unfortunately not; even on a different 7040 board I cannot get OpenOCD to work.

That is a pity. Thank you for the reply though!