How can we make the Lantiq xrx200 devices faster?


#1

The Lantiq VRX200 chipset, i.e. the combination of a VRX268 or VRX288 with a VRX208 (also called xrx200 or VR9), dates from around 2013 and is normally used in xDSL modem routers, with or without telephone support.
I read here and here that the chipset has a MIPS 34Kc architecture with 2 VPEs (Virtual Processing Elements) and 9 Thread Contexts (TCs) per VPE. The two VPEs can either be used for SMP, or each VPE can run a separate operating system. On modem routers with telephony, the VoIP firmware (VMMC) can, or rather must, run on one of them.

In theory the modem can do 200 Mbit/s, but as modem routers with NAT these devices only reach 50-80 Mbit/s, especially if the LEDE image has no SMP support (which is not possible when telephone support is needed). SMP support is enabled in the newest LEDE snapshot.
At this point I want to say thank you to Stefan Koch, who added it to LEDE, made the German O2-Box 6431 usable and wrote the Asterisk channel-lantiq.

I know that the AVM Fritzboxes 7360 and 7490, which use the same chipset combo, reach more than 100 Mbit/s: the datasheet says 100 Mbit/s, and they were tested at 102-103 Mbit/s.

I have a 100 Mbit/s VDSL plan plus an FB 7360 from my provider that I can compare against, but I only have an old laptop with Fast Ethernet, so realistically I can test up to about 90 Mbit/s. I use an Arcadyan VGV7510KW22 and a VGV952CJW33-E-IR, and my aim is to reach at least 90 Mbit/s with a complex setup: many LANs and VLANs plus OpenVPN, fastd and Asterisk. That is measured without the web interface open and without an ssh connection (opening the web interface costs 15 Mbit/s).

I want to start a discussion here about how to make Lantiq VR9 devices faster, and to collect experience, speed tests and knowledge about the devices that exist with the VR9 chipset. Best of all would be if someone had an idea how we can use VMMC support and SMP together.

When I use the newest LEDE snapshot with SMP and with VMMC disabled, I reach 90 Mbit/s easily. But that contradicts this thread: my result looks the same, yet it is faster. What does that mean?

SMP is no problem on devices without telephone FXS ports, but with VoIP you have to decide between VMMC and SMP.

I know that a little more speed is possible by changing the compiler option from "-Os" to "-O3" to build speed-optimized code, but the gain is minimal: 1-2 Mbit/s, which may be measurement error.

Using the right architecture is another option: LEDE uses mips_24Kc but the CPU is a mips_34Kc. I tested that too, and it also gains only a little: 5-10 Mbit/s, or maybe measurement error.

My questions are:
Does anybody have a Fritzbox with the VR9 chipset and ssh or telnet access?
If yes, what is the output of cat /proc/interrupts and cat /proc/cpuinfo?
If I must reserve one VPE for VMMC, is it useful to give it more resources, like vpe1_mem=16M maxtcs=9 (only on devices with 128 MB RAM or more)?
Can I make a hard-float build, or rather, is it useful to make one?
Are there any ideas for not dedicating a VPE to the VoIP firmware, like running the firmware on the same VPE as the OS?
Any other ideas?


#2

To the best of my knowledge, proprietary firmware has access to further hardware acceleration options (similar to the dreaded hardware NAT on Atheros hardware) that are not available to LEDE. The choice between SMP and VMMC should be the same there.


#3

Btw., in the particular case of the VGV7510KW22, the 100 MBit/s switch in that device will always prevent full 100 MBit/s line speed. The inherent framing overhead comes at the expense of your theoretical wire speed (that might not be that much, but you'll see it while benchmarking).


#4

I looked a little into the image and source of the AVM Fritzbox 7490. I think AVM has its own solution.
I cannot find the tapi or the vmmc driver, but there is a 1.1 MB kernel module called isdn_fbox_fon5.ko.
Inside the directory /GPL-release_kernel/linux-3.10/drivers/isdn/isdn_fon5, where the source should be, there is no C source.
There are some other big kernel modules inside the image. What could be the reason for their size?
I found the source here: http://osp.avm.de/fritzbox/fritzbox-7490/

Another VR9 device for which kernel source exists is the ALLNET ALL-BM100VDSL2V.
Source: ftp://212.18.29.48/ftp/pub/allnet/vdsl/all-bm100vdsl2v/SourceCode_Allnet_GPL_ALL-BM100VDSL2V_C48a.zip
It is based on OpenWrt 10.03.1_LTQ. tapi.ko and vmmc.ko exist, but I have no idea whether it has SMP and telephone support.
If anybody knows links to source for other VR9 devices such as Speedport or TP-Link, please post them here!

Another question: does LEDE support the "PPE" (Protocol Processor Engine) and other features of the Lantiq smart CPU architecture?
It is listed in the VRX268 product brief (PSB-80910) under Main Features.
It is not listed in the VRX288 product brief, but that sheet does not list Main Features at all.

I found PPE in the OEM boot log of the TP-Link W8970 (VRX268):

Please press Enter to activate this console. .....pid 228: wait the running hotplug to end itself....... 
[ dm_readFile ] 2042: can not open xml file /var/tmp/pc/reduced_data_model.xml!, about to open file /etc/reduced_data_model.xml 
Loading A5 (MII0/1 + ATM) driver ...... 
MAC-0: a0-f3-c1-xx-xx-xx MAC-1: a0-f3-c1-xx-xx-xx 
Succeeded! 
PPE datapath driver info: 
  Version ID: 64.3.7.1.0.1.4 
  Family : VR9 
  DR Type : Normal Data Path | Indirect-Fast Path 
  Interface : MII0 | MII1 | ATM 
  Mode : Routing 
  Release : 0.1.4 
PPE firmware info: 
  Version ID: 7.2.4.6.2.0 
  Family : VR9 
  FW Type : Acceleration 
  Interface : MII0/1 + ATM 
  Mode : Bridging + IPv4 Routing 
  Release : 2.0 
IFXOS, Version 1.5.14 (c) Copyright 2009, Lantiq Deutschland GmbH 

Lantiq CPE API Driver version: DSL CPE API V4.11.4

and I found the same in the OEM boot log of the ZyXEL P-2812HNU-F1.

But not in LEDE.


#5

Any news on this topic?
Are you still looking for the source code of the other xrx200-based devices?


#6

On a 3370:

cat /proc/interrupts:

          CPU0       CPU1       
  0:      75747      87999      MIPS   0  IPI_resched
  1:      30063     496541      MIPS   1  IPI_call
  7:     410964     409969      MIPS   7  timer
  8:          0          0      MIPS   0  IPI call
  9:          0          0      MIPS   1  IPI resched
 22:          3          0       icu  22  spi_rx
 23:          6          0       icu  23  spi_tx
 24:          0          0       icu  24  spi_err
 62:          0          0       icu  62  1e101000.usb, dwc2_hsotg:usb1
 63:      86106          0       icu  63  mei_cpe
 72:     861008          0       icu  72  vrx200_rx
 73:    1149460          0       icu  73  vrx200_tx
 75:          0          0       icu  75  vrx200_tx_2
 91:          0          0       icu  91  1e106000.usb, dwc2_hsotg:usb2
 96:    1607186          0       icu  96  ptm_mailbox_isr
112:        197          0       icu 112  asc_tx
113:          0          0       icu 113  asc_rx
114:          0          0       icu 114  asc_err
126:          0          0       icu 126  gptu
127:          0          0       icu 127  gptu
128:          0          0       icu 128  gptu
129:          0          0       icu 129  gptu
130:          0          0       icu 130  gptu
131:          0          0       icu 131  gptu
144:          0          0       icu 144  ath9k
161:          0          0       icu 161  ifx_pcie_rc0
ERR:          0

cat /proc/cpuinfo:

system type             : xRX200 rev 1.2
machine                 : AVM Fritz!Box WLAN 3370 Rev. 2 (Micron NAND)
processor               : 0
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 332.54
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

processor               : 1
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 333.82
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
package                 : 0
core                    : 0
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

How can I do this?


#7

I think this is usually set on the kernel command line...


#8

As far as I can see, if you have VDSL2 it is useful to run PPPoE on another device and use your xrx200 as a modem only (use a VLAN and put the modem and a LAN port together in a bridge), at least until someone works on the hardware acceleration for PPPoE that is available on some devices...

And how exactly can it be done?


#9

Take a look at the node named chosen in your device's dts.
It has a property named bootargs.

Eg.:


#10

My Lantiq device has the same issue: with over 100 Mbit/s sync, only 50-60 Mbit/s routing speed is possible. Could someone help fix this?


#11

This helped: How can flow-offloading be enabled on lantiq xrx200 devices?


#12

Still interested? ZyXEL P-2812-HNU-F1 3.10TUE4 here.


#13

Thanks.
Flow offloading raised the download speed on the Easybox 904 xDSL from 65 to 85 Mbit/s.


#14

There was a thread about a netifd update degrading WAN-to-LAN throughput on the xrx200-based BT Hub 5A (without flow offloading enabled), here: https://openwrt.ebilan.co.uk/viewtopic.php?f=7&t=1105&start=10#p3150


#15

Has anyone looked into the u-boot code to see whether we can use upstream u-boot and overclock the CPU? I don't see any reports or requests in the forum about the overclocking capabilities of the xrx200 devices. It would be nice to know how far they can go without increasing the voltage.


#16

I don't think anyone knows the u-boot hex values that correspond to CPU clock frequencies on Lantiq devices. Hence no overclocking is available for these devices yet.


#17

I just don't get why an SoC with an integrated gigabit switch gets only the equivalent of 100 Mbit/s on its internal port. That's just a DMA FIFO to 250 MHz RAM (on a 16-bit bus). Surely that must be a driver problem.


#18

@Plonk34 from a 7490

`cat /proc/interrupts`
           CPU0       CPU1       
  0:       7026      17845       IPI  IPI_resched
  1:       1792       2515       IPI  IPI_call
 11:          0         74   IFX_ICU  YIELD_TO_LINUX_IPI
 57:        198          0   IFX_ICU  mei_vr9
 64:          0          0   IFX_ICU  dma-core-17
 66:          0          0   IFX_ICU  dma-core-0
 67:          0          0   IFX_ICU  dma-core-1
 68:          0          0   IFX_ICU  dma-core-2
 69:          0          0   IFX_ICU  dma-core-3
 70:          0          0   IFX_ICU  dma-core-4
 71:          0          0   IFX_ICU  dma-core-5
 72:          0          0   IFX_ICU  dma-core-6
 73:          0          0   IFX_ICU  dma-core-7
 74:          0          0   IFX_ICU  dma-core-8
 75:          0          0   IFX_ICU  dma-core-9
 76:          0          0   IFX_ICU  dma-core-10
 77:          0          0   IFX_ICU  dma-core-11
 82:          0          0   IFX_ICU  dma-core-18
 87:          0          0   IFX_ICU  dma-core-19
 89:          0          0   IFX_ICU  e5_mailbox0_isr
 90:          0          0   IFX_ICU  e5_mailbox_isr
 91:          0          0   IFX_ICU  dma-core-12
 92:          0          0   IFX_ICU  dma-core-13
 93:          0          0   IFX_ICU  dma-core-14
 94:          0          0   IFX_ICU  dma-core-15
 95:          0          0   IFX_ICU  dma-core-16
105:       1783          0   IFX_ICU  asc1_tx
107:         43          0   IFX_ICU  asc1_rx
108:          0          0   IFX_ICU  asc1_err
120:          0          0   IFX_ICU  gptu
121:          0          0   IFX_ICU  gptu
122:          0          0   IFX_ICU  gptu
123:          0          0   IFX_ICU  gptu
124:          0          0   IFX_ICU  gptu
125:          0          0   IFX_ICU  gptu
129:          0          0   IFX_EIC  gpio pushbuttons
130:        267          0   IFX_ICU  dma-core-20
131:      11668          0   IFX_ICU  dma-core-21
132:          0          0   IFX_ICU  dma-core-22
133:          0          0   IFX_ICU  dma-core-23
134:          0          0   IFX_ICU  dma-core-24
135:          0          0   IFX_ICU  dma-core-25
136:      35726          0   IFX_ICU  dma-core-26
137:        206          0   IFX_ICU  dma-core-27
138:          0          0   IFX_ICU  
160:          0          0   IFX_EIC  gpio pushbuttons
161:          0          0   IFX_ICU  perf_ctr
178:      20882          0       IPI  timer
179:          0      22023       IPI  timer
ERR:          0

And

cat /proc/cpuinfo
system type             : VR9
machine                 : Unknown
processor               : 0
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 331.77
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VPE                     : 0
VCED exceptions         : not available
VCEI exceptions         : not available

mips-options: 0x006d638b icache.flags 0x00000000 dcache.flags 0x00000004 isa_level 0x00000063 ases 00000031
processor               : 1
cpu model               : MIPS 34Kc V5.6
BogoMIPS                : 250.67
wait instruction        : yes
microsecond timers      : yes
tlb_entries             : 16
extra interrupt vector  : yes
hardware watchpoint     : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa                     : mips1 mips2 mips32r1 mips32r2
ASEs implemented        : mips16 dsp mt
shadow register sets    : 1
kscratch registers      : 0
core                    : 0
VPE                     : 1
VCED exceptions         : not available
VCEI exceptions         : not available

mips-options: 0x006d638b icache.flags 0x00000000 dcache.flags 0x00000004 isa_level 0x00000063 ases 00000031

#19

Hi guys, I think I found some speedup tips. They require kernel changes though.

First you can try my irq balancing from here. That should decrease irq load.

Changing case DMA_PORT_ETOP in function ltq_dma_init_port, file arch/mips/lantiq/xway/dma.c, to this:

ltq_dma_w32_mask(0, DMA_ETOP_ENDIANNESS | DMA_PDEN,
                    LTQ_DMA_PCTRL);

//burst
ltq_dma_w32_mask(0x3c, (2<<4) | (2<<2), LTQ_DMA_PCTRL);

This seems to increase DMA speed by about 10% (measured with a netcat pipe over Ethernet). It is the same burst setting as in the case below it (DMA_2W_BURST). More words per burst means more speed. There is also a value 3, but with it the packets were damaged (a comment in the vendor's kernel notes that burst mode is broken).

Finally, it seems that the OpenWrt snapshot from about 2-3 weeks ago, with kernel 4.14.93, has a somewhat broken Ethernet driver: it seems to manage less than 100 Mbit/s. When I use the code (RX, TX, irq, napi...) from the vanilla v5 kernel, I get the following raw speeds: 100 MiBytes/s into the xrx200 (so basically a gigabit) and 17 MiBytes/s from the xrx200 (roughly 150 Mbit/s). The TX FIFO (from the xrx200) was what caused the slowdowns in the original OpenWrt driver.

If there is no OpenWrt kernel/patches maintainer in the forum, I will send reports to the mailing list tomorrow. But I was able to speed up the network just by changing function xrx200_poll_rx in the file drivers/net/ethernet/lantiq_xrx200.c to this:

if (complete || !rx) {
        if (napi_complete(&ch->napi))
                ltq_dma_enable_irq(&ch->dma);
}

#20

Just to be clear, you are not using OpenWrt's build for this? Also, it seems you are talking about 100 MiBytes/s for LAN and 17 MiBytes/s for Wi-Fi.

If this is possible through the above code changes, then could flow offloading, theoretically speaking, increase it further (for Wi-Fi)? Or do we just not need it anymore?

In any case, if this gets implemented it looks like a big improvement, because right now my TD-W8980 can only handle 3 MiBytes/s over Wi-Fi at a load average of around 1.7, and it crashes after a few minutes when going above that.