[PR] ipq806x: kernel 5.10 bump code proposal

So this is what I did. Initially I was running openwrt-21.02-snapshot-r15870-e4d061cd1a from downloads.openwrt.org on my R7800.

  1. First I tried upgrading to R7800-master-r16173-3d1ea0d77f-20210312-2210-sysupgrade.bin using LuCI with "retain settings" enabled. It silently failed.
  2. Then I copied R7800-master-r16173-3d1ea0d77f-20210312-2210-sysupgrade.bin to /tmp using SFTP and ran "sysupgrade -v /tmp/*sysupgrade.bin" over SSH. I got the error message "Config cannot be migrated from swconfig to DSA" in the SSH terminal window.
  3. Finally, after a few minutes, I tried to upgrade to R7800-master-r16173-3d1ea0d77f-20210312-2210-sysupgrade.bin using LuCI with "retain settings" unchecked. After that it went into a boot-loop.

After it went into a boot-loop, I used the TFTP method to flash openwrt-21.02-snapshot-r15879-3b6c93298c-ipq806x-generic-netgear_r7800-squashfs-factory.img and get my R7800 working again.

You should have used the "-F -n" options, meaning "force" and "do not keep settings".
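For reference, the command-line version of that advice looks roughly like this. It is a minimal sketch: the `flash_fresh` helper and its `DRY_RUN` switch are hypothetical conveniences, and only `sysupgrade` with its `-v`, `-F` and `-n` flags comes from this thread. Since `-F` skips the image checks, it is worth double-checking that the file really is the sysupgrade image for this device first.

```shell
#!/bin/sh
# Hypothetical helper: flash a sysupgrade image without keeping settings.
# Set DRY_RUN=1 to print the command instead of running it.
flash_fresh() {
    image="$1"
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    # -v: verbose, -F: force (skip image checks), -n: do not save config
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sysupgrade -v -F -n $image"
    else
        sysupgrade -v -F -n "$image"
    fi
}
```

Usage on the router would be something like `flash_fresh /tmp/R7800-master-r16173-3d1ea0d77f-20210312-2210-sysupgrade.bin`, ideally with `DRY_RUN=1` first to see the exact command.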

OK, maybe upgrading using LuCI was what messed it up. I will try "-F -n". Should I use factory.img or sysupgrade.bin? Also, why is the factory image .img but the sysupgrade image .bin?

Sysupgrade image, as you are using sysupgrade...

The file extension has no special meaning.

It still looks very strange that the image caused a bootloop.

I tried "-F -n" just now. It still boot-looped. I went back to the 21.02-SNAPSHOT build using TFTP.

@Ansuel: I've been tinkering with your code for the last few days and here are my findings so far:

  • C2600 with kernel 5.10 + DSA: still seeing port drops, and my family is complaining a lot about losing connectivity (probably as a result of those port drops).

  • C2600 with kernel 5.10 + swconfig: very stable (with ath10k-ct firmware).

  • R7800 with kernel 5.10 + DSA: port drops too, on both wan and lan. The wan port recovers fast and users hardly notice it, but the drops are present there as well as on lan.

  • R7800 with kernel 5.10 + swconfig: stable, but the mainline ath10k firmware crashes occasionally (it seems that the firmware integration with kernel 5.10 is not good). With the ct version there have been no crashes so far and it is working well.

Is the C2600 connected to the same device as the R7800?

Can you test these commands on both routers?

ethtool --set-eee lan1 eee off
ethtool --set-eee lan2 eee off
ethtool --set-eee lan3 eee off
ethtool --set-eee lan4 eee off
ethtool --set-eee wan eee off

(I think you need to compile ethtool into the image.)
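The same commands can be applied to every port in one go. This is a minimal sketch, assuming ethtool is in the image and the DSA port names are lan1-lan4 and wan as above; `disable_eee_all` is a hypothetical helper. ethtool may exit non-zero when EEE is already off (it prints `eee unmodified, ignoring` in that case, as reported later in this thread), so a non-zero line does not necessarily mean a real error.

```shell
#!/bin/sh
# Hypothetical helper: request EEE off on each DSA user port.
# Port names match the R7800/C2600 DSA layout discussed in this thread.
PORTS="lan1 lan2 lan3 lan4 wan"

disable_eee_all() {
    for port in $PORTS; do
        if ethtool --set-eee "$port" eee off; then
            echo "$port: eee off requested"
        else
            # Non-zero may also just mean "already off"; ethtool then
            # prints its own "eee unmodified, ignoring" message.
            echo "$port: ethtool returned non-zero" >&2
        fi
    done
}
```

After defining it, run `disable_eee_all` and then `ethtool --show-eee lan1` (and so on) to confirm the result per port.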

Did you change the driver as well?

Yes, C2600 is a dumb AP connected to the R7800 as a main router.

I can, but please confirm that you want me to test these commands on kernel 5.10+DSA. Both routers are now on kernel 5.10+swconfig, so I'd need to change the firmware.

EDIT: I just now saw that you made a few more changes to the code. I'll upgrade to latest master + your new commits and will let you know.

Yes, I always change both.

This is the result I get on the C2600 for ethtool --set-eee lan1 eee off, and the same on all other ports. I haven't tested on the R7800 yet.

eee unmodified, ignoring

And this is ethtool --show-eee lan1:

EEE Settings for lan1:
        EEE status: disabled
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  Not reported
        Link partner advertised EEE link modes:  Not reported

So, EEE is off by default. If I do ethtool --set-eee lan1 eee on, then I get:

EEE Settings for lan1:
        EEE status: enabled - inactive
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  Not reported
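To compare ports at a glance, the `EEE status` line can be pulled out of `ethtool --show-eee` output like the two dumps above. A minimal sketch; `eee_status` and `show_eee_summary` are hypothetical helpers that assume the output format shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "EEE status" line
# from `ethtool --show-eee <port>` output read on stdin.
eee_status() {
    sed -n 's/^[[:space:]]*EEE status:[[:space:]]*//p'
}

# Print "<port>: <status>" for each port name given as an argument.
show_eee_summary() {
    for port in "$@"; do
        echo "$port: $(ethtool --show-eee "$port" 2>/dev/null | eee_status)"
    done
}
```

For example, `show_eee_summary lan1 lan2 lan3 lan4 wan` would print one line per port, such as `lan1: disabled` for the first dump above.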

EDIT:

Scrap the above. I now upgraded both the C2600 (dumb AP) and the R7800 (main router) at my home as well as my R7800 at my office (main router) with your latest code (kernel 5.10+DSA) on top of latest master. EEE is on by default on all routers.

My office R7800 is behaving well and the logs are clean. Speed and latency are good. I have a few ethernet connections plugged into it (file servers and a switch) and there are no port drops.

At my house, the C2600 is working OK and its logs are clean. On the R7800, however, there were lots of these messages every 3-5 seconds, right after the router booted:

[ 194.496394] qca8k 37000000.mdio-mii:10 lan3: Link is Down
[ 194.496498] br-lan: port 3(lan3) entered disabled state
[ 196.577023] qca8k 37000000.mdio-mii:10 lan3: Link is Up - 100Mbps/Full - flow control rx/tx
[ 196.577061] br-lan: port 3(lan3) entered blocking state
[ 196.577076] br-lan: port 3(lan3) entered forwarding state

I'm not sure what is connected to port 3 (I still have to sort out the cabling...). After I saw these port drops, I decided to turn off EEE on port 3 of the R7800, and then the port drops stopped. Maybe there is some device connected to port 3 that cannot handle EEE?
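To spot which port is flapping without watching the log scroll by, the `Link is Down` messages can be counted per port. A minimal sketch; `count_link_drops` is a hypothetical helper that assumes the qca8k message format shown above (port name followed by `: Link is Down`).

```shell
#!/bin/sh
# Hypothetical helper: count "Link is Down" events per port in a kernel
# log read from stdin (e.g. `dmesg` or `logread` output).
# Prints "<port> <count>" lines, sorted by port name.
count_link_drops() {
    awk '/: Link is Down/ {
        for (i = 1; i < NF; i++) {
            # The port name is the field ending in ":" right before "Link".
            if ($(i + 1) == "Link" && $i ~ /:$/) {
                drops[substr($i, 1, length($i) - 1)]++
            }
        }
    }
    END { for (p in drops) print p, drops[p] }' | sort
}
```

On the router, `logread | count_link_drops` or `dmesg | count_link_drops` would list the counts; a port whose count keeps climbing is a candidate for `ethtool --set-eee <port> eee off`.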

So, other than the port 3 drops above, everything seems to be working well. I'll check which device is connected to port 3 when I get back home later today and will report any other stability issues.

Thank you for the nice work!

EDIT 2: Port disconnects seem to have been solved by turning off EEE on the problematic ports. But I'm seeing problems with some devices failing to reconnect to wifi after roaming between routers. I did some googling and it seems that other platforms had similar problems with DSA.

Nicely done. I know mvebu is a totally different target, but kernel 5.10 is running great on the WRT3200ACM too.

@hnyman I tried your latest build R7800-master-r16186-bf4aa0c6a2-20210314-1025-factory.img. I flashed the factory.img directly using the TFTP method instead of going through sysupgrade. My R7800 still boot-loops. Even with Energy Efficient Ethernet disabled on my Windows 10 laptop (connected to the LAN4 port) and no devices connected to any of the other ports, it still boot-looped.

I tried the MASTER SNAPSHOT (not 21.02) r16218-662ceebc4c factory.img via TFTP, and it works fine. I don't know if there is something related to DSA or kernel 5.10 that doesn't work on my R7800. While the DSA build is boot-looping, I cannot connect to the R7800 through Ethernet or WiFi.

I don't think my existing config would have been retained when flashing through TFTP so it should be whatever default config the flash image had or generated on first boot.

If you want any logs/info from my R7800, please let me know. I have access to SSH, SFTP and Luci, but no JTAG or Serial access. I am currently running 21.02-SNAPSHOT r15893-f82e7e96a0.

My current R7800 swconfig setup is:

WAN port: Arris S33 Cable Modem
LAN1 - Raspberry Pi 4 Model B - Main Router running Openwrt 21.02-SNAPSHOT
LAN2 - Nothing
LAN3 - Windows 10 Laptop (lan_2 aka guest network)
LAN4 - Windows 10 / Linux Laptop (lan_1 aka main local network)
2.4 GHz WiFi - Disabled
5 GHz WiFi - lan_1 local network

I am using VLAN tagging on LAN1 to separate the "internet" (eth0.1001), "lan_1" (eth1.2001) and "lan_2" (eth1.2002) networks and send them to my RPi4B. I am using my R7800 only as an Ethernet switch and WiFi access point.

EDIT: Re-formatted post to avoid confusion.

Please differentiate between two things:

  • does your R7800 boot-loop with the default settings?
  • or does it boot-loop with your personal config, with added VLAN settings?

Serial access would be crucial, as otherwise you cannot see the kernel log at crash time, so it is pure guesswork what causes it. It is pretty easy to enable on the R7800. See Netgear R7800 exploration (IPQ8065, QCA9984) - #2 by hnyman

And @Ansuel is actually the person doing the hard work here with kernel 5.10.

Personally I do not use VLANs, so no comments regarding that part. The build has worked for me.

Default config. A TFTP recovery flash erases all config on the device, so it's whatever default config the flashed build has or generates during firstboot. It boot-loops immediately after flashing. I have no Ethernet or WiFi connection to the device and therefore cannot log in via SSH or LuCI to change the config in the first place.

I have edited my original post to clarify that my setup (that I mentioned) is what I am currently using in swconfig.

Yeah, I got confused by your kernel logs (that had VLANs), but they are from 21.02 and have little relevance to debugging your problems with 5.10.

I currently do not have a USB-TTL serial cable. I will try to obtain boot logs for the DSA build during the weekend.

@Ansuel and @hnyman Test-DSA-kernel510-master-r16293-310b7f76e8-20210321 Boot Log

The power adapter is:
Netgear P/N 332-10762-01;
Model No. MU42-3120350-A1 (U.S.A. pin);
Input: AC 100-240V, 50/60Hz, 1.5A;
Output: DC 12V, 3.5A

As I thought, it's a voltage problem... Now what I need to understand is whether your CPU is too good or too bad.