How can we make the Lantiq xrx200 devices faster?

I don't think anyone knows the U-Boot hex values that correspond to CPU clock frequencies on Lantiq devices, so no overclocking is available for these devices yet.

I just don't get why an SoC with an integrated gigabit switch manages only the equivalent of 100 Mbit/s on its internal port. That's just a DMA FIFO to 250 MHz RAM (on a 16-bit bus). Surely that must be a problem with the driver.

@Plonk34 from a 7490

`cat /proc/interrupts`
           CPU0       CPU1       
  0:       7026      17845       IPI  IPI_resched
  1:       1792       2515       IPI  IPI_call
 11:          0         74   IFX_ICU  YIELD_TO_LINUX_IPI
 57:        198          0   IFX_ICU  mei_vr9
 64:          0          0   IFX_ICU  dma-core-17
 66:          0          0   IFX_ICU  dma-core-0
 67:          0          0   IFX_ICU  dma-core-1
 68:          0          0   IFX_ICU  dma-core-2
 69:          0          0   IFX_ICU  dma-core-3
 70:          0          0   IFX_ICU  dma-core-4
 71:          0          0   IFX_ICU  dma-core-5
 72:          0          0   IFX_ICU  dma-core-6
 73:          0          0   IFX_ICU  dma-core-7
 74:          0          0   IFX_ICU  dma-core-8
 75:          0          0   IFX_ICU  dma-core-9
 76:          0          0   IFX_ICU  dma-core-10
 77:          0          0   IFX_ICU  dma-core-11
 82:          0          0   IFX_ICU  dma-core-18
 87:          0          0   IFX_ICU  dma-core-19
 89:          0          0   IFX_ICU  e5_mailbox0_isr
 90:          0          0   IFX_ICU  e5_mailbox_isr
 91:          0          0   IFX_ICU  dma-core-12
 92:          0          0   IFX_ICU  dma-core-13
 93:          0          0   IFX_ICU  dma-core-14
 94:          0          0   IFX_ICU  dma-core-15
 95:          0          0   IFX_ICU  dma-core-16
105:       1783          0   IFX_ICU  asc1_tx
107:         43          0   IFX_ICU  asc1_rx
108:          0          0   IFX_ICU  asc1_err
120:          0          0   IFX_ICU  gptu
121:          0          0   IFX_ICU  gptu
122:          0          0   IFX_ICU  gptu
123:          0          0   IFX_ICU  gptu
124:          0          0   IFX_ICU  gptu
125:          0          0   IFX_ICU  gptu
129:          0          0   IFX_EIC  gpio pushbuttons
130:        267          0   IFX_ICU  dma-core-20
131:      11668          0   IFX_ICU  dma-core-21
132:          0          0   IFX_ICU  dma-core-22
133:          0          0   IFX_ICU  dma-core-23
134:          0          0   IFX_ICU  dma-core-24
135:          0          0   IFX_ICU  dma-core-25
136:      35726          0   IFX_ICU  dma-core-26
137:        206          0   IFX_ICU  dma-core-27
138:          0          0   IFX_ICU  
160:          0          0   IFX_EIC  gpio pushbuttons
161:          0          0   IFX_ICU  perf_ctr
178:      20882          0       IPI  timer
179:          0      22023       IPI  timer
ERR:          0

And

`cat /proc/cpuinfo`
system type             : VR9
machine                 : Unknown
processor               : 0
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 331.77
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

mips-options: 0x006d638b icache.flags 0x00000000 dcache.flags 0x00000004 isa_level 0x00000063 ases 00000031
processor               : 1
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 250.67
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VPE                     : 1
VCED exceptions         : not available
VCEI exceptions         : not available

mips-options: 0x006d638b icache.flags 0x00000000 dcache.flags 0x00000004 isa_level 0x00000063 ases 00000031

Hi guys, I think I found some speedup tips; they require kernel changes though.

First you can try my irq balancing from here. That should decrease irq load.
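To judge whether IRQ balancing changes anything, it can help to total the counters per VPE before and after. This is only a rough sketch: the function name `irq_totals` is made up here, and it assumes exactly two CPU columns as in the 7490 dump above.

```shell
#!/bin/sh
# Sum the per-CPU columns of /proc/interrupts-style input read from stdin.
# Assumption: exactly two CPU columns (CPU0, CPU1), as on the xrx200.
irq_totals() {
    awk '
        NR == 1 { next }                       # skip the "CPU0 CPU1" header line
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {   # only rows with two numeric counters
            cpu0 += $2; cpu1 += $3
        }
        END { printf "CPU0=%d CPU1=%d\n", cpu0, cpu1 }
    '
}

# Usage on the router:
#   irq_totals < /proc/interrupts
```

Rows such as `ERR:` that carry only one counter are skipped, so the totals cover only the per-CPU interrupt sources.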

Changing the file arch/mips/lantiq/xway/dma.c, function ltq_dma_init_port, case DMA_PORT_ETOP, to this:

ltq_dma_w32_mask(0, DMA_ETOP_ENDIANNESS | DMA_PDEN,
                    LTQ_DMA_PCTRL);

/* burst: clear bits 2..5, then set both 2-bit burst fields to 2 */
ltq_dma_w32_mask(0x3c, (2<<4) | (2<<2), LTQ_DMA_PCTRL);

seems to increase DMA speed by about 10% (ethernet netcat pipe). It is the same burst setting as in the case below (DMA_2W_BURST). More words per burst -> speedup. There is also a value 3, but with it the packets were damaged (a comment in the vendor's kernel notes that burst mode is broken).
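For anyone unsure what the mask call does to the register, here is a small sketch of the arithmetic on a plain integer. It assumes the helper clears the bits of its first argument and then ORs in the second, which is how I read the kernel macro; `w32_mask` and the starting value 0x1d7 are made up for illustration and are not a real PCTRL readout.

```shell
#!/bin/sh
# Illustrative only: mimic the read-modify-write of
# ltq_dma_w32_mask(clear, set, reg) on a plain integer.
# $1 = current register value, $2 = bits to clear, $3 = bits to set
w32_mask() {
    printf '0x%08x\n' "$(( ($1 & ~$2) | $3 ))"
}

burst=$(( (2 << 4) | (2 << 2) ))   # 0x28: value 2 in both 2-bit fields
w32_mask 0x1d7 0x3c "$burst"       # hypothetical starting value
```

With the hypothetical start value this prints 0x000001eb: bits 2..5 are replaced by 0x28 and every other bit is left alone.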

Finally, it seems the openwrt snapshot from about 2-3 weeks ago, with kernel 4.14.93, has a somewhat broken ethernet driver: throughput stays below 100 Mbit/s. When I use the code (RX, TX, irq, napi...) from vanilla kernel v5 I get the following raw speeds: 100 MiBytes/s into the xrx200 (so basically a gigabit) and 17 MiBytes/s from the xrx200 (roughly 150 Mbit/s). The TX fifo (from the xrx200) was what caused the slowdowns in the original openwrt driver.

If there is no openwrt kernel/patches maintainer in the forum, I will send reports to the mailing list tomorrow. But I was able to speed up the network just by changing the file drivers/net/ethernet/lantiq_xrx200.c, function xrx200_poll_rx, to this:
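Since pv reports MiB/s while link speeds are quoted in Mbit/s, a quick conversion avoids confusion. A minimal sketch, with the function name `mib_to_mbit` made up here; it takes 1 MiB as 1048576 bytes and 1 Mbit as 1000000 bits.

```shell
#!/bin/sh
# Convert a rate in MiB/s (as reported by pv) to decimal Mbit/s.
# 1 MiB = 1048576 bytes, 1 Mbit = 1000000 bits.
mib_to_mbit() {
    LC_ALL=C awk -v mib="$1" 'BEGIN { printf "%.1f\n", mib * 1048576 * 8 / 1000000 }'
}

mib_to_mbit 100    # inbound rate quoted above
mib_to_mbit 17     # outbound rate quoted above
```

This gives 838.9 and 142.6 Mbit/s, so "basically a gigabit" and "roughly 150 Mbit/s" are fair roundings of the raw payload rate.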

if (complete || !rx) {
        if (napi_complete(&ch->napi))
                ltq_dma_enable_irq(&ch->dma);
}

Just to be clear, you are not using OpenWrt's build for this? Also, it seems you are talking about 100 MiBytes/s for LAN and 17 MiBytes/s for Wi-Fi.

If this is possible through the above code changes, then flow offloading could, theoretically speaking, increase it further (for wifi)? Or do we just not need it anymore?

In any case, if this gets implemented, it would be a big improvement, because right now my TD-W8980 can only handle 3 MiBytes/s on wifi with a load average around 1.7, and going above that crashes it after a few minutes.

It is possible to change these settings in U-Boot for the Easybox 904xDSL, but I don't think that is the problem: commercial products like Speedports or AVM boxes get speeds up to 100 Mb/s, with VMMC, without it.

Yes, the values can be changed; the ar71xx target is pretty well known for it. But the problem is that no one knows yet which values need to be changed for xrx200 devices. Overclocking is always a fun thing, and you can also squeeze a little bit more out of your router.

Yes, those devices have proper driver support to enable those kinds of speeds. If @pc2005 can achieve the rates mentioned above, then I think there may be no need to enable flow offloading for these devices anymore, because they can reach their full capacity without it. It is just a matter of time to see how things go from here.

OK, but how can I see or test it when I enable more speed in U-Boot?
Or better, how can I see that it is working?

Right now I don't have the overclocked TP-Link MR3420 v2 with me, but maybe you can search this forum for it. The overclocked speed shows up in U-Boot when you turn on the device. It always says that the CPU is xxx MHz and the RAM is xxx, so if you implement the overclock correctly it should show an increased CPU clock. My MR3420 showed 700 MHz while in U-Boot.

Edit: As for testing, you can always use iperf to see what transfer speeds you get with the overclocked CPU, plus any stress test to check that the system doesn't crash. This trial and error will help you arrive at a safe overclock that works well. I have two xrx200 devices myself, a W8980 and a HomeHub 5A, but I can't use them for testing U-Boot because there is always a risk of bricking the device, and I have neither a flash programmer nor the skills to revive them.

I'm using an openwrt git snapshot. I changed the kernel source code (the openwrt build system downloads a copy) and recompiled.

No, I was talking about upload/download speed over gigabit ethernet. I'm going to try wifi once I have a solid base (for example, USB doesn't work in the snapshot; I was thinking about putting a devel rootfs there).

It is just a raw speedup, so I think the wifi will work faster. I don't think there is much offloading in the driver, just DMA to the address of the packet.

BTW you can measure the ethernet speed yourself: compile netcat with UDP "server" support, then run a server on one computer and a client on the other, pushing data from /dev/zero:

nc -l -p port | pv > /dev/null       #computer
cat /dev/zero | nc host port         #xrx200

If you use UDP (-u) then there won't be handshake packets in the opposite direction.


I am trying your speedup tweaks on v18.06.1 snapshots. These snapshots are different from the actual master snapshot images and are still in the stable category. First I'll look into IRQ balancing and after that the tweaks above. It may take a couple of days to test these tweaks and compile everything, so hopefully it will be worth the wait.

I understand the DMA technique: the CPU is not directly tied up by the data transfer, and even where it is, the actual load will be far less than in the current situation, so the speeds should improve.

These changes should be fine, just check that the file isn't too different.

Yeah, it did improve the broken driver's speed a little. But the main problem is the driver itself, not a borderline optimization of the DMA transfers.

@arnysch has developed a method for the Easybox 904xDSL to load another U-Boot into RAM and start it from there.
see:


I do not know whether it works (with changes) on other Lantiq devices too, but it may be possible.

So I started my self-built U-Boot with 600 MHz CPU speed and 300 MHz RAM speed using this method,
and it shows me those speeds in the boot log.
But I do not know whether the boot log proves that it really works. How can I check?

You can compile a dhrystone benchmark. There should be one in busybox. I think even running a long shell loop should be OK for benchmarking (measure the script with the time command).
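The "long shell loop" idea could look something like this. It is a crude sketch, not a real benchmark: the function name `busy_loop` and the `LOOPS` variable are made up here, and the running sum is kept modulo 65536 only so it stays small on shells with narrow integers.

```shell
#!/bin/sh
# Crude CPU-bound loop in plain POSIX sh, meant to be timed from outside.
# $1 = number of iterations; prints a small checksum so the work is not skipped.
busy_loop() {
    i=0
    sum=0
    while [ "$i" -lt "$1" ]; do
        sum=$(( (sum + i) % 65536 ))
        i=$(( i + 1 ))
    done
    echo "$sum"
}

# LOOPS is a hypothetical knob; raise it until the run takes several seconds.
busy_loop "${LOOPS:-100000}"
```

Saved as, say, bench.sh, it can be run with `time sh bench.sh` under the stock and the overclocked U-Boot. If the wall time does not drop roughly in proportion to the clock increase, the new clock is probably not in effect.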


I've managed to backport the ethernet driver from vanilla kernel v5 (it seems to be better than the current openwrt one) and I've added the switch and phy code from the old one. This seems to increase the ethernet throughput (server 10.0.0.1, tplink xrx200 10.0.0.80):

test upload:
        server
                nc -u -l -p 4321 | pv > /dev/null
        tplink
                cat /dev/zero | nc -u 10.0.0.1 4321
        speed = 18.5 MiB/s
test download:
        tplink
                nc -u -l -p 4321 > /dev/null
        server
                cat /dev/zero | pv | nc -u 10.0.0.80 4321
        speed = ~100 MiB/s (varies)

If you test the old ethernet driver with the same commands, you will see an increase to about 200%.

My second patch is for the DMA burst size. Both patches were tested in openwrt git snapshot 6e104c63d678518f93425e5e34f6caf75228024c, but they have been updated for a3ccac6b1d693527befa73532a6cf5abda7134c0, where they have not been tested yet. I don't know the exact way to add patches to openwrt, but it should be fine to put them into target/linux/lantiq/patches-4.14 or to patch the kernel directly at build_dir/target-mips_24kc_musl/linux-lantiq_xrx200/linux-4.14.96 .

0904-backport-vanilla-eth-driver.patch
0905-increase-dma-burst-size.patch

Other speedup patches are in this thread.

edit: fixed the patches, there was a surplus "./" in the path; they now work from target/linux/lantiq/patches-4.14


I've managed to fix the bugs in the ethernet driver where the phy didn't start, and I've implemented basic skb frags (DMA scatter-gather) functionality. Please test these patches: [0904-backport-vanilla-eth-driver.patch] [0905-increase-dma-descriptors.patch]. The speedup over the original openwrt driver seems to be pretty high. A script for testing is here; change the IP, run iperf3 -s on the lantiq, and run the script on the host. Don't forget to remove the old patches.


I put all of your patches in openwrt/target/linux/lantiq/patches-4.14/ (also the ones from the other thread), but I don't think they have been applied at all. I don't see any change in throughput, although I have a 100 Mbit port on my laptop, not a 1 Gbit one. I am seeing 91 Mbit/s for upload and 44.5 Mbit/s for download through iperf. I don't see any changes in cat /proc/interrupts either. I think for some reason your patches are not being applied. How can I see whether they are being applied?

Edit: I can see the patches being applied, but it also says patch unexpectedly ends in the middle of the line for all of your patches, and after that it says Hunk # applied successfully. and continues.

Edit 2: I carefully copied the patches and set 0666 permissions, as on the other ones, then compiled again, and there you go, it works. Since my laptop can only handle 100 Mbit/s, I can download and upload at 92 Mbit/s with my TD-W8980. IRQ balancing also seems to work and increases the wifi throughput by about 10%, as you suggested before. With the OpenWrt ethernet driver the upload to the router was 92 Mbit/s and the download was 44 Mbit/s; with this driver both directions get 92 Mbit/s.

Hi there same result for patches 901 / 902 / 903
But this works with the files of post #1 of this thread.
See not working: Xrx200 IRQ balancing between VPEs post 19
See working: Xrx200 IRQ balancing between VPEs post 22

Patch 904 / 905: I cannot test it on the Easybox 904xDSL because 904 is incompatible with 4027-NET-MIPS-lantiq-support-fixed-link.patch
I will test it on the O2-Box 6431, but the question is: in combination with patches 901 / 902 / 903, with the raw files from post #1, or without any of those changes?

I think pastebin deletes the last empty lines in a post. But the patch applies OK
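A missing newline at the very end of a copied patch is one common cause of the "unexpectedly ends in middle of line" warning, so it is easy to check for before building. A small sketch: the function name `ends_with_newline` is made up here, and the 09*.patch glob simply assumes the patches were saved under the names used in this thread.

```shell
#!/bin/sh
# Succeeds when the file is non-empty and its last byte is a newline.
# tail -c 1 prints the last byte; command substitution strips a trailing
# newline, so an empty result means the file ended with one.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Assumed location and names; adjust to where the patches were saved.
for p in target/linux/lantiq/patches-4.14/09*.patch; do
    [ -e "$p" ] || continue
    ends_with_newline "$p" || echo "no trailing newline: $p"
done
```

Run from the top of the openwrt tree; it prints nothing when every matching patch ends cleanly.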

So an upgrade yay, congrat :smiley:
Do all ports work?

These are just normal patches, the same as 4027-NET-MIPS-lantiq-support-fixed-link.patch; the openwrt build will use them automatically. So the only problem is the compatibility with 4027. You should be able to fix this merge problem manually, it is just a few lines in a noncritical section (you can see the differences if you make two patched versions, 4027 only and 904 only).

The DMA burst patch is not required. The ICU patches (dts, irq.c and smp-mt) should be useful, but the system should build without them. I used all of them.