Ath79 builds with all kmod packages through opkg [flow offloading]

I flashed the latest build available here for the 1043ND v2, based on kernel 4.14.51.

After flashing, I changed LAN IP from 192.168.1.1 to 192.168.10.2, then got the "30s to apply" message, then a message that states settings will be rolled back. It wouldn't go away. Waited a few minutes.

This is when the device becomes unresponsive and unreachable. It won't provide a DHCP IP, and won't be reachable if manually configuring the network adapter to 192.168.1.2 or 192.168.10.3.

Same behavior as reported by @dacarrs. A reset done through the reset button then makes the device reachable at 192.168.1.1 as expected. Doing the exact same steps after this reset results in the same behavior.

I wanted to go back to an AR71xx snapshot I was using until then. Luci wouldn't accept a stripped stock firmware in this test ath79 build, so I recovery+TFTP'd back to stock.


No, @juppin, I don't think I missed something. I've been using openwrt/LEDE for years since DDWRT stopped satisfying my needs and never had an issue changing the LAN IP through luci. I've had to use snapshots for a while since I also have an Archer C7 v4 as my main router and WDS AP, and there weren't any issues installing luci through opkg and then configuring the LAN IP to 192.168.10.1. My 1043ND V2 is a WDS client and provides wired+wireless connectivity to the rest of the house.

If this isn't a luci issue, given that luci has been going through some changes lately, I don't really know what it could be. I just reported my experience so that you and the more experienced people here would know about it.

I didn't try changing the LAN IP through ssh and then rebooting, though.

This sounds like a platform specific bug to me then. LuCI relies on netifd/procd to revert the effective network config to its previous defaults. If that does not work it might be something ath79 specific. Due to a lack of ath79 compatible hardware I cannot really test that though.

What's your client operating system? Windows? OS X? Linux? Can you watch the status of your local network interface while apply/rollback is running? Is it renegotiating DHCP? Does manually disabling and re-enabling the network interface help, or does it keep its old 192.168.1.x IP all the time?

W10 Pro 1803 (17134.137, latest patch applied), same behavior on both Chrome 67.0.3396.99 and Firefox 61.0, both 64 bit builds

I remember the network interface disabling itself as if losing connectivity and then reenabling itself as usual when you do a LAN IP change, but then would remain with the dreaded yellow exclamation sign. Checking the network interface status I would then get one of those generic 169.254.xxx.xxx/16 addresses you get when there is no DHCP server running to hand out a proper address.

Disabled then enabled the network interface, no difference. Manually set it to 192.168.10.3/24, web interface unreachable. Tried to SSH in, no response. Trying 192.168.1.2 resulted in the same behavior, no web interface nor SSH response.

ipconfig /release, ipconfig /renew in an admin cmd wouldn't do anything. No DHCP renegotiation. Dead.

This is when I reset and tried these steps again, resulting in the same behavior.


Just to be clear, I didn't do any weird ar71xx -> ath79 upgrade that would screw things up (went back to stock and flashed the provided ath79 factory image), and after resetting on the ath79 build, I had the same behavior.

Thanks for the details, this indeed sounds like some underlying issue with ath79 to me then. This is something that most likely requires a serial console to debug...

... what I mean is that eventually the device should be reachable with either the old or the new IP; a state where the network interface on the router side is misconfigured or not up should not happen.

What should I do with the serial console to get more information on this issue?
If I change the IP address of a network interface other than the one the current luci session is served from, it works without problems. It seems to occur only on the interface that serves the luci session.

Should there be two IPs on the interface until the new one is verified to work?

First of all check if you end up in this stuck state:

... or if you eventually end up with that:

In the latter case, everything works as expected (you could force changing the LAN ip with "Apply unchecked" and reconnect/relogin manually afterwards).

In the former case, something went wrong with the network reconfiguration after rolling back the uci configuration.

While you see ...


... ifconfig br-lan on the router should show the new, changed IP address.

Once the countdown is over and you see ...


... ifconfig br-lan on the router should revert back to the previous, original IP address.

This eventually causes the browser to be able to "ping" http://[old-ip-address]/cgi-bin/luci/... again, which will then trigger this dialog:

If that last dialog never appears, this means that for some reason the br-lan interface is not reverting back to its old IP address, or if it did, some lower level issue is preventing it from communicating with the outside.

If it is stuck in such a state, I'd further try via serial whether things like ifup lan, ifup -a, echo f > /proc/net/nf_conntrack fix it. It would also be interesting to observe both logread -f and the output of dmesg while such an apply/rollback session is running to see if there are any carrier events, protocol handlers executed and the like.
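The observation step above could be scripted roughly as follows. This is only a sketch to run on the router over serial: the function names and the IFACE, INTERVAL and COUNT defaults are made up for illustration, and it assumes the busybox ifconfig output format ("inet addr:x.x.x.x").

```shell
#!/bin/sh
# Sketch: log the interface address and the newest kernel messages every few
# seconds while an apply/rollback is running.
# IFACE, INTERVAL and COUNT are illustrative defaults, not values from this thread.
IFACE="${IFACE:-br-lan}"
INTERVAL="${INTERVAL:-5}"
COUNT="${COUNT:-12}"

# Extract the IPv4 address from busybox ifconfig output ("inet addr:x.x.x.x").
extract_ipv4() {
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p'
}

# Print a timestamp, the current address and the last kernel messages COUNT times.
monitor() {
    i=0
    while [ "$i" -lt "$COUNT" ]; do
        echo "=== $(date) ==="
        ifconfig "$IFACE" | extract_ipv4
        dmesg | tail -n 5
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}
```

Calling monitor right before pressing "Save & Apply" gives a timeline of the address change and rollback; logread -f can run alongside it in a second session.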

@jow Here are my test results.

I'm ending up in the stuck state with "failed to confirm apply within..."

Periodical (every 5 sec) output of ifconfig br-lan and dmesg while changing default ip to 192.168.2.1: https://pastebin.com/xxDAwwGd

The output of logread -f for the same action: https://pastebin.com/gnXCtpMa

After the revert I could not ping the device on 192.168.1.1, and luci never showed the dialog to revert or apply unchecked.
I've tried ifup lan, ifup -a and echo f > /proc/net/nf_conntrack, but the device doesn't respond to pings afterwards anyway.

Then I tried ifdown lan && ifup lan, and that does not work either...
If I do ifdown lan && sleep 1 && ifup lan, the device does respond to pings and the dialog to revert or apply unchecked shows up in luci.

I don't have a clue why it needs a delay between ifdown and ifup :thinking:
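To narrow down how long that delay has to be, something like the following could be run over serial. It is a rough sketch: find_min_delay, SETTLE and the list of delays are invented for illustration, and it assumes a LAN client with a known address that answers pings.

```shell
#!/bin/sh
# Sketch: find the shortest pause between ifdown and ifup after which a known
# LAN client answers pings again.
# find_min_delay, SETTLE and the delay list are illustrative, not from this thread.
SETTLE="${SETTLE:-3}"   # seconds to let the link come up before pinging

find_min_delay() {
    iface="$1"
    client="$2"
    for delay in 0 1 2 5; do
        ifdown "$iface"
        sleep "$delay"
        ifup "$iface"
        sleep "$SETTLE"
        # One ping with a one second timeout is enough to tell "dead" from "alive".
        if ping -c 1 -W 1 "$client" >/dev/null 2>&1; then
            echo "$delay"
            return 0
        fi
    done
    return 1
}
```

For example, find_min_delay lan 192.168.1.2 prints the first delay that worked, or returns non-zero if none of them did.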

Does it also work normally if you do an "/etc/init.d/network reload", "ubus call network reload" or "ifup -a" after changing the ip?

If I change the IP in /etc/config/network and do a /etc/init.d/network restart, it works normally, but with /etc/init.d/network reload it does not work.
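For reference, the working path (edit the config, then do a full restart) could be wrapped like this. It is a sketch under assumptions: set_lan_ip and NETWORK_INIT are illustrative names, and network.lan.ipaddr assumes the default "lan" section in /etc/config/network.

```shell
#!/bin/sh
# Sketch: change the LAN address with uci and apply it with a full restart,
# the path reported to work above.
# set_lan_ip and NETWORK_INIT are illustrative names, not part of OpenWrt.
NETWORK_INIT="${NETWORK_INIT:-/etc/init.d/network}"

set_lan_ip() {
    new_ip="$1"
    [ -n "$new_ip" ] || { echo "usage: set_lan_ip <ipv4>" >&2; return 1; }
    # Assumes the default "lan" interface section exists.
    uci set network.lan.ipaddr="$new_ip" || return 1
    uci commit network || return 1
    # "restart" was the working case in this thread; "reload" was the failing one.
    "$NETWORK_INIT" restart
}
```

Swapping restart for reload in the last line should reproduce the failing case, which makes the two easy to compare over serial.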

I see. That's not entirely unexpected. A restart will completely tear down the network, stop netifd, start netifd again and reinitialize everything from scratch. A reload, however, is supposed to incrementally apply only changed settings.

Can you check if flushing the neighbour cache (ip neigh flush dev br-lan) makes any difference after reload? Can you also try to ping6 the link-local IPv6 address? (ping6 fe80::...%eth0 from a connected client)
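To find the address to ping6, the link-local address can be read off the router itself. A small sketch, with link_local_of and extract_link_local as made-up helper names:

```shell
#!/bin/sh
# Sketch: print the link-local IPv6 address of an interface on the router so a
# client can ping6 it. Both helper names are illustrative.

# Pull the fe80:: address out of "ip -6 addr show" output, dropping the prefix length.
extract_link_local() {
    sed -n 's|.*inet6 \(fe80:[0-9a-fA-F:]*\)/[0-9]*.*|\1|p'
}

link_local_of() {
    ip -6 addr show dev "$1" scope link | extract_link_local
}
```

Running link_local_of br-lan on the router prints the address; on the client it is then used with the client's own interface after the % sign, as in the ping6 example above.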

With a non working interface:

root@OpenWrt:/# ip neigh flush dev br-lan
Nothing to flush

With a working iface:

root@OpenWrt:/# ip neigh flush dev br-lan
192.168.1.2 lladdr 28:d2:44:87:59:b1 ref 1 used 0/0/0 probes 4 REACHABLE

*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round(s) ***

ping6 also does not work after an IP address change and a reload; after a restart it does work.

I guess it boils down to low level ag71xx ethernet driver things and/or interaction with the switch. Will try to gather some more opinions in IRC.

Can you please also compare brctl show output before and after?

The same output before and after...

root@OpenWrt:/# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.ec086b8ac472       no              eth1.1
root@OpenWrt:/# nano /etc/config/network # change ip of lan to 192.168.2.1
root@OpenWrt:/# /etc/init.d/network reload
root@OpenWrt:/# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.ec086b8ac472       no              eth1.1
root@OpenWrt:/# /etc/init.d/network restart
root@OpenWrt:/# brctl show
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.ec086b8ac472       no              eth1.1

Changing lan to non-bridged with only eth1.1 does work with a reload... But if I change the IP on the non-bridged interface, I get the same behavior as with a bridged lan.

@blogic has a specific idea on what might be wrong and plans to look into it tomorrow. Could be that some required patches to adjust_link() didn't get ported over to ath79.

@jow Thanks for your effort and your time.

@jrambo99 and @dacarrs Thanks for reporting this issue.

When the devs deem the target mature the buildbots will pick it up. Support for lots of devices is being brought up just now, and those devices need to be tested. Kinks are being worked out. You don't want people flashing that stuff only to find out they need serial to recover their device (or, worse, need to buy a new router).

I'm not at all familiar with DTS, but I wonder how different the Archer C7 v4 is from the C7 v2? Any chance you can create (and send a PR for) the DTS for C7 v4 and include it in your builds?

PS. I've also sent an e-mail to the contributor of the DTS for the C7 v2.

Show us a dmesg, and if you are willing to test it, we will help you, but come to the dedicated thread or open a new one.

I've added a new build!

Changelog:

  • new kernel version 4.14.52
  • fixed sysupgrade on TP-Link WR1043V4
  • fixed wan mac address on TP-Link WR1043V4

Download 4.14.52

Greets
