Octeon, 24.10.0-rc7 periodic LAN connectivity loss

Is this something I should be worried about? If so, how do I diagnose and fix it?

I didn't notice this behavior before upgrading to rc7. I have been using OpenWrt since 23.05.5.

I have not touched the hardware or cables since OpenWrt was installed last September. These are the same cables that used to be perfectly OK.

The router is connected to a Netgear switch with an 18-inch Ethernet cable. No Wi-Fi whatsoever. The router is an ERL3.

root@OpenWrt:~# logread|grep link|grep connecti
Tue Jan 28 21:52:45 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Tue Jan 28 21:52:46 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Tue Jan 28 21:52:49 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Tue Jan 28 21:52:51 2025 daemon.notice netifd: Interface 'wan6' has link connectivity
Wed Jan 29 17:11:20 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Wed Jan 29 17:11:22 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Wed Jan 29 19:14:17 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Wed Jan 29 19:14:18 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Thu Jan 30 10:10:50 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Thu Jan 30 10:10:52 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 01:07:49 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 01:07:51 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 06:09:34 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 06:09:36 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 08:19:50 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 08:19:51 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 08:53:20 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 08:53:21 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 09:18:32 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 09:18:34 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 14:32:29 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 14:32:31 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Jan 31 21:10:37 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Jan 31 21:10:38 2025 daemon.notice netifd: Interface 'lan' has link connectivity
(...)
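
To see whether the drops cluster on particular days, the log can be summarized per interface and per day. This is a minimal sketch that assumes the netifd message format shown above; the function name `count_losses_per_day` is made up for illustration.

```shell
# Count "link connectivity loss" events per interface and per day.
# Reads logread-style lines on stdin, for example:
#   Wed Jan 29 17:11:20 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
count_losses_per_day() {
    awk '/has link connectivity loss/ {
             # Fields: 1=weekday 2=month 3=day 4=time 5=year 9=interface name
             key = $9 " " $2 " " $3 " " $5
             if (!(key in n)) order[++k] = key   # remember first-seen order
             n[key]++
         }
         END { for (i = 1; i <= k; i++) print n[order[i]], order[i] }'
}

# On the router (hypothetical usage):
#   logread | count_losses_per_day
```

Each output line is a count followed by the interface name and the date, in the order the days first appear in the log.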

I got some hints from this issue https://github.com/openwrt/openwrt/issues/17351 but on the ERL3, EEE isn't supported either:

root@OpenWrt:~# ethtool --show-eee eth0
Cannot get EEE settings: Not supported

root@OpenWrt:~# ethtool  eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                1000baseX/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                1000baseX/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 7
        Transceiver: external
        Auto-negotiation: on
        MDI-X: Unknown
        Link detected: yes
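
Independently of the netifd log, the kernel keeps a per-interface counter of carrier (link) state changes in sysfs. Below is a small sketch for reading it. The function name `carrier_changes_of` and the `SYSFS_NET` override are names chosen here for illustration, and it assumes the Octeon Ethernet driver reports carrier changes to the kernel in the usual way.

```shell
# Print the number of carrier (link up/down) transitions the kernel has
# recorded for an interface.
# SYSFS_NET is a hypothetical override so the function can be tried against
# a copy of the sysfs tree; by default it reads /sys/class/net.
carrier_changes_of() {
    iface=$1
    cat "${SYSFS_NET:-/sys/class/net}/$iface/carrier_changes"
}

# On the router (hypothetical usage):
#   carrier_changes_of eth0
```

If this number keeps growing while the cables are untouched, the physical link itself is flapping, which would point at the PHY, the cable, or the switch port rather than at netifd.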

Okay, so we have a baseline: 23.05.5 worked.
So we can opkg install owut and run:

owut check --version-to

This should list every available 24.10.0 version you could install for testing. You could then run owut download --version-to 24.10.0-rc6 followed by owut install to install rc6 and see whether the issue is still there.
If it is still there, you can go to rc5, rc4, rc2, rc1 and so on.

It's a very good method. Thanks for your reply and the suggestion.

I'm back on rc6. I'm not using owut as I have all my previous build images. I will report when I find a negative match, which could take several days.


user@computer:~$ scp -O ./mkimage/openwrt-imagebuilder-24.10.0-rc6-octeon-generic.Linux-x86_64/build_dir/target-mips64_octeonplus_64_musl/linux-octeon_generic/tmp/openwrt-24.10.0-rc6-octeon-generic-ubnt_edgerouter-lite-squashfs-sysupgrade.tar root@192.168.172.121:/tmp/openwrt-24.10.0-rc6-octeon-generic-ubnt_edgerouter-lite-squashfs-sysupgrade.tar^C

root@OpenWrt:~# sysupgrade /tmp/openwrt-24.10.0-rc6-octeon-generic-ubnt_edgerouter-lite-squashfs-sysupgrade.tar

 OpenWrt 24.10.0-rc6, r28388-58d0057481
 -----------------------------------------------------
root@OpenWrt:~# logread|grep link|grep connecti
Wed Feb  5 15:55:37 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Wed Feb  5 15:55:39 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Wed Feb  5 15:55:41 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Wed Feb  5 15:55:42 2025 daemon.notice netifd: Interface 'wan6' has link connectivity

Hi,

Absolutely no connectivity loss in 3 days with rc6, whereas I had several per day, every day between Jan 28 and Feb 5, while I was on rc7.

Looks like a commit within rc7 is the culprit. I can't determine which one though, nothing obvious. Ideas?
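
One way to shortlist candidates is to list only the commits between two release tags that touch the relevant directories. This is a sketch, assuming a local clone of the OpenWrt source and that the releases are tagged `v24.10.0-rc6` and `v24.10.0-rc7` (worth verifying with `git tag`). The helper name `commits_between` is hypothetical.

```shell
# List commits reachable from NEW but not from OLD that touch the given paths.
# Usage: commits_between <repo-dir> <old-ref> <new-ref> <path>...
commits_between() {
    repo=$1
    old=$2
    new=$3
    shift 3
    git -C "$repo" log --oneline "$old..$new" -- "$@"
}

# Hypothetical usage inside an OpenWrt checkout (tag names are assumptions):
#   commits_between . v24.10.0-rc6 v24.10.0-rc7 target/linux/octeon target/linux/generic
```

Restricting the log to paths keeps the list short, but it can miss a culprit that lives elsewhere in the tree, so an empty or harmless-looking result does not clear the release.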

Okay, if we start with
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=4892ea9a74da8cf934dc534665c3689d2139507b

That's the AR8035 driver being enabled for the Octeon boards, tested on the Edgerouter Lite.

After this change, the only Octeon change is the update to the ubnt-usg.

https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=7165937c3b6ebe5ac692fcde1cc69ba1c7a3d9c9;hp=3295f6f1c254cd7e5e5285a05581bf6abbde8999

Now, the USG-3P and the ERL are basically identical.

https://openwrt.org/toh/ubiquiti/edgerouter_lite
https://openwrt.org/toh/ubiquiti/unifi_security_gateway_3p

So the question is whether the ERL and the USG rely on each other in the Octeon Makefile: does one reference the other, and did the system board name change?

https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/octeon/image/Makefile;h=dec04809150aeef56b9fa39f3041eb49fdffd4ec;hb=7165937c3b6ebe5ac692fcde1cc69ba1c7a3d9c9

Looks like they might. @efahl, any idea whether your USG commit has affected the EdgeRouter Lite? The Makefile suggests both devices reference each other when being built.

Oops, I might have spoken too soon, sorry. Downgrading to rc5.

 OpenWrt 24.10.0-rc6, r28388-58d0057481
 -----------------------------------------------------
root@OpenWrt:~# logread|grep link|grep connecti
Wed Feb  5 15:55:37 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Wed Feb  5 15:55:39 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Wed Feb  5 15:55:41 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Wed Feb  5 15:55:42 2025 daemon.notice netifd: Interface 'wan6' has link connectivity
Fri Feb  7 15:01:16 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Feb  7 15:01:18 2025 daemon.notice netifd: Interface 'lan' has link connectivity

 -----------------------------------------------------
 OpenWrt 24.10.0-rc5, r28304-6dacba30a7
 -----------------------------------------------------
root@OpenWrt:~# logread|grep "link connecti"
Fri Feb  7 15:58:40 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Fri Feb  7 15:58:42 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Feb  7 15:58:43 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Fri Feb  7 15:58:54 2025 daemon.notice netifd: Interface 'wan6' has link connectivity

Let us know how it goes. The AR8035 driver came in with rc4.

Downgrading to rc4

 OpenWrt 24.10.0-rc5, r28304-6dacba30a7
 -----------------------------------------------------
root@OpenWrt:~# logread|grep "link connecti"
Fri Feb  7 15:58:40 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Fri Feb  7 15:58:42 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Feb  7 15:58:43 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Fri Feb  7 15:58:54 2025 daemon.notice netifd: Interface 'wan6' has link connectivity
Fri Feb  7 18:44:52 2025 daemon.notice netifd: Interface 'lan' has link connectivity loss
Fri Feb  7 18:44:53 2025 daemon.notice netifd: Interface 'lan' has link connectivity
 
 -----------------------------------------------------
 OpenWrt 24.10.0-rc4, r28211-d55754ce0d
 -----------------------------------------------------

root@OpenWrt:~# logread|grep "link connecti"
Fri Feb  7 19:05:03 2025 daemon.notice netifd: Interface 'loopback' has link connectivity
Fri Feb  7 19:05:05 2025 daemon.notice netifd: Interface 'lan' has link connectivity
Fri Feb  7 19:05:06 2025 daemon.notice netifd: Interface 'wan' has link connectivity
Fri Feb  7 19:05:10 2025 daemon.notice netifd: Interface 'wan6' has link connectivity

It should not have changed anything, except for adding a new entry to the metadata in the img and profiles.json files (all the old names remain as-is). As far as I could tell from the code, the USG and ERL are identical hardware in different cases.

Have you tried rc3?

The issue might be caused by a kernel update; every RC saw one or more of those, AFAIK. Unless you have already ruled those out?

Hi,

Entirely possible.

In fact, without debug info, I'm a mechanic blindly replacing auto parts until the problem goes away.

Thanks for the suggestion.

I'm still testing rc4.

RC7 was extremely prone to failure; rc6 saw its first failure after 48h; rc5 after 3h.

IIRC, there was no rc3 and rc1 was rapidly discarded. Next would be rc2 then all the way back to 23.05.5 I'm afraid.

The best way to go about this is to build from source and use 'git bisect', which should narrow it down to the offending commit. I'd jump to RC1 immediately instead of trying all release candidates; if that's broken as well, then it's time to bisect, with 23.05 (.5?) as your last working commit.

Thanks, but I'm a very basic git user. Although I learned CVS and SVN when they were the norm, I kind of gave up on Mercurial and particularly on git and gh.

No deep git knowledge is required; all you need are commit hashes. Git bisect asks you to mark one commit as good and another as bad. It then performs a binary search: it checks out a commit roughly midway between the two, which you again mark as bad (functionality broken) or good (working), and repeats until a single commit is left.

RC4 presented the same connectivity issue. Now running rc2.

RC1 is not an option for testing. Kernel module packages are not found:


Collected errors:
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-core for nftables-json
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-core for nftables-nojson
 * pkg_hash_fetch_best_installation_candidate: Packages for nftables found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package nftables.
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-core for firewall4
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-fib for firewall4
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-offload for firewall4
 * pkg_hash_check_unresolved: cannot find dependency kmod-nft-nat for firewall4
 * pkg_hash_check_unresolved: cannot find dependency kmod-ipt-core for firewall
 * pkg_hash_check_unresolved: cannot find dependency kmod-ipt-conntrack for firewall
 * pkg_hash_check_unresolved: cannot find dependency kmod-nf-conntrack6 for firewall
 * pkg_hash_check_unresolved: cannot find dependency kmod-ipt-nat for firewall
 * pkg_hash_fetch_best_installation_candidate: Packages for uci-firewall found, but incompatible with the architectures configured
 * satisfy_dependencies_for: Cannot satisfy the following dependencies for luci:
 *      kmod-nft-core
 *      kmod-nft-fib
 *      kmod-nft-offload
 *      kmod-nft-nat
 *      kmod-nft-core
 * opkg_install_cmd: Cannot install package luci.
 * opkg_install_cmd: Cannot install package kmod-fs-vfat.
 * pkg_hash_check_unresolved: cannot find dependency libuci20130104 for block-mount
 * pkg_hash_fetch_best_installation_candidate: Packages for block-mount found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package block-mount.
 * pkg_hash_check_unresolved: cannot find dependency kmod-nf-conntrack-netlink for libnetfilter-conntrack3
 * pkg_hash_fetch_best_installation_candidate: Packages for libnetfilter-conntrack3 found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package kmod-nft-offload.
 * pkg_hash_check_unresolved: cannot find dependency kmod-ppp for ppp
 * pkg_hash_fetch_best_installation_candidate: Packages for ppp found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package ppp.
 * pkg_hash_check_unresolved: cannot find dependency kmod-pppoe for ppp-mod-pppoe
 * pkg_hash_fetch_best_installation_candidate: Packages for ppp-mod-pppoe found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package ppp-mod-pppoe.

I have both an ERL and a USG. If you can share your setup, I can leave mine running with it. I'd hate for it to be your USB drive needing reflashing.

Here we go,

I don't think it's the USB drive. It's only a few months old and has been flashed manually from a dd image to recover from a mistake.

Could be eth0 though. When I was with Bell, eth0 was the WAN and I experienced quite a lot of 'waiting for PADO' timeouts. With my current ISP, the WAN moved to eth2 and I had no timeouts with the Ubiquiti firmware.

Now with OpenWrt firmware, eth0 is back in business, serving the LAN this time.

It could very well still be the config. It's an old router model that I purchased second hand on eBay years ago. There are no real examples of a working config available, and I did everything from scratch using a combination of manual edits and LuCI.

Config has been uploaded.

BTW, rc2 failed too, so I'm back on 23.05.5 to test eth0 again. If I still have issues, I might try to configure eth2 (currently unused) instead.

Well, well,

23.05.5 failed as well.

I'm contemplating the possibility that it's a hardware issue. I installed 24.10.0 and connected the LAN to eth2; eth0 is disconnected and unused. This required edits to /etc/board.json and /etc/board.d/* in addition to /etc/config/network to stop LuCI from displaying the default eth0 on the page https://router/cgi-bin/luci/admin/status/overview. Currently under test.
