Prevent physical device/port attached to pppoe-wan from going down? (22.03, switchless x86)

I am running 22.03 on an x86-64 device (Sophos XG 86w) and have a curious behaviour that I wish to correct, but I am not familiar enough with the internals to know how.

At present, my WAN is running PPPoE, directly connected to eth0 (the router does not have a switch), which is directly connected to an NTT FTTH ONT:

config interface 'wan'
        option device 'eth0'
        option proto 'pppoe'
        (credentials skipped)

The issue I'm having is that if the wan (PPPoE) interface is disconnected (either through ifup wan or if the other side of the connection temporarily buggers up, which occasionally happens), it causes the eth0 device, the actual physical port, to go down. The resulting song and dance that repeats until the PPPoE connection becomes available again is as follows:

daemon.warn pppd[16211]: Timeout waiting for PADO packets
daemon.err pppd[16211]: Unable to complete PPPoE Discovery
daemon.info pppd[16211]: Exit.
daemon.notice netifd: Interface 'wan' is now down
kern.info kernel: [ 3310.791658] r8169 0000:02:00.0 eth0: Link is Down
daemon.notice netifd: Interface 'wan' is disabled
kern.info kernel: [ 3310.842747] Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-0-200:00, irq=IGNORE)
daemon.notice netifd: Interface 'wan' is enabled
daemon.notice netifd: Interface 'wan' is setting up now

It's port goes up, port goes down until the PPPoE connection is reestablished. I can actually see the ports physically going up and down on both sides of the ethernet cable.

My question would now be: Is there any way to "decouple" the pppoe interface from the physical eth0 port, so it doesn't constantly re-negotiate the physical connection?

I'm also not quite sure if I just didn't observe it with my other X86 routers which have Intel ethernet ports, or it actually does not happen there ... is this a Realtek thing?

Edit/addendum: it feels related that when wan goes down in this manner following a PPPoE failure, netifd throws an error/warning in the logfile:

daemon.notice netifd: wan (16118): Command failed: ubus call network.interface notify_proto { "action": 1, "command": [ "\/usr\/sbin\/pppd", "nodetach", "ipparam", "wan", "ifname", "pppoe-wan", "lcp-echo-interval", "1", "lcp-echo-failure", "5", "lcp-echo-adaptive", "nodefaultroute", "usepeerdns", "maxfail", "1", "user", "(removed)", "password", "(removed)", "ip-up-script", "\/lib\/netifd\/ppp-up", "ip-down-script", "\/lib\/netifd\/ppp-down", "mtu", "1492", "mru", "1492", "plugin", "pppoe.so", "nic-eth0" ], "interface": "wan" } (Permission denied)

From your description, it isn't actually clear if the bouncing of the physical link is happening on the OpenWrt side or the modem side, but there is an easy way to try to figure that out...

-> insert a basic unmanaged switch between the router and the modem. (don't connect anything else to this switch)

This way, the switch will manage the physical link between each of the devices and you will be able to see which side is responsible for the bounces since you'll be able to see the LEDs on one side change state (if it still happens) while the other will remain connected.

That said, it is possible that a physical bounce of the port is needed to initiate an auto-redial attempt (I'm not really very knowledgeable about PPPoE, though, and I don't know if this is true -- I could be making this up).

Another possibility here is that you have a marginal cable that is causing the port to flap like that and may even be responsible for some of the connectivity issues you're experiencing.

It is by design. OpenWrt will restart the port if the connection fails. You may be able to avoid this by creating a bridge instead of connecting ppp directly to the eth port.

That was my first instinct, but it didn't change anything about the behaviour. I created a br-wan bridge with a single port eth0 and connected the wan interface to br-wan. Still the port restarts when the PPPoE connection is lost:

daemon.warn pppd[4384]: Timeout waiting for PADO packets
daemon.err pppd[4384]: Unable to complete PPPoE Discovery
daemon.info pppd[4384]: Exit.
daemon.notice netifd: Interface 'wan' is now down
kern.info kernel: [   70.376464] device eth0 left promiscuous mode
kern.info kernel: [   70.381178] br-wan: port 1(eth0) entered disabled state
daemon.notice netifd: Interface 'wan' is disabled
kern.info kernel: [   70.613884] Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver [Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-0-200:00, irq=IGNORE)
kern.info kernel: [   70.933054] r8169 0000:02:00.0 eth0: Link is Down
kern.info kernel: [   70.938289] br-wan: port 1(eth0) entered blocking state
kern.info kernel: [   70.943908] br-wan: port 1(eth0) entered disabled state
kern.info kernel: [   70.949672] device eth0 entered promiscuous mode
daemon.notice netifd: Interface 'wan' is enabled
kern.info kernel: [   73.828183] r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control off
kern.info kernel: [   73.836046] br-wan: port 1(eth0) entered blocking state
kern.info kernel: [   73.841625] br-wan: port 1(eth0) entered forwarding state
daemon.notice netifd: Network device 'eth0' link is up
kern.info kernel: [   73.847868] IPv6: ADDRCONF(NETDEV_CHANGE): br-wan: link becomes ready
daemon.notice netifd: bridge 'br-wan' link is up
daemon.notice netifd: Interface 'wan' has link connectivity
daemon.notice netifd: Interface 'wan' is setting up now

I wonder, is there something that I missed when setting up the bridge?
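
For reference, a single-port bridge of the kind described above would look roughly like this in the 22.03 /etc/config/network syntax. This is only a sketch reconstructed from the description (bridge br-wan with eth0 as its only port, wan attached to the bridge), not the actual file that was used, and the credentials are omitted as in the first post:

```
config device
        option name 'br-wan'
        option type 'bridge'
        list ports 'eth0'

config interface 'wan'
        option device 'br-wan'
        option proto 'pppoe'
        # username/password omitted, as in the first post
```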

At the moment, this is mostly cosmetic, but what I'm mostly worried about is that I will soon be moving toward a dual PPPoE + IPoE setup for my WAN, technically with two different providers on the same line (yes, that's possible in Japan), and that a failure of the PPPoE connection will tear down the port, taking the other connection with it. I hope that will not happen if the port is in promiscuous mode, recognizing there's still another interface connected to it, but I would like to know for sure if there's a problem looming.

(Also, thanks @psherman, but it is definitely the router's port that cycles, and of course I eliminated a faulty cable as the culprit.)

Maybe try adding:

	option force_link '1'

to the wan section in /etc/config/network? On an OpenWrt 19-based Turris 5.4 that solved that very issue for me... (my real issue was/is a severe hotplug problem which somehow managed to not run the iface hotplug scripts at all). Not sure whether 22.03 still offers that option or whether it will do the right thing for you.
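
In context, the suggestion would amount to something like the following. A sketch only: the other options are copied from the wan section in the first post, and whether force_link changes anything for proto pppoe on 22.03 is not something this confirms:

```
config interface 'wan'
        option device 'eth0'
        option proto 'pppoe'
        option force_link '1'
        # username/password omitted, as in the first post
```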

Unfortunately that doesn't change anything either, tried both with wan directly attached to eth0 and to a bridge containing eth0.

It feels like I just have to wait and see how it behaves once I do my IPoE setup.

This is only a pppoe "feature." IPoE (proto dhcp) will not down the port.

Alright, but the other way around, if PPPoE takes down the port it would take down the IPoE connection with it. [Edit: it doesn't, see below.]

Come to think of it, this definitely does not happen on my other router. Even if PPPoE fails, the port stays up, maybe because I also have a second static interface defined on it to access the modem and PPPoE realizes it's not the only interface on the port? Need to try that later, right now I can't fiddle with the network.

That did not suffice on my Turris Omnia; I had to set the force_link option, which unfortunately does not do anything useful for you.

Success! I defined a proforma interface on device eth0.254 (to keep it off the regular untagged port):

config interface 'wanport_keepalive'
        option device 'eth0.254'
        option proto 'none'

When I now issue ifup wan, the port stays up. Evidently, whatever handles the restart on the PPPoE interface (proto_block_restart?) is smart enough to see that it is not the only interface on the port and doesn't physically tear it down anymore. That gives me confidence that, once I define a real secondary interface on the port, it will not go down with a PPPoE connection failure.
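
Once a real second interface shares the port, the expectation is that the pro forma interface is no longer needed. A hypothetical sketch of that later setup, assuming the IPoE side uses proto 'dhcp' on the untagged port as mentioned earlier in the thread; the interface name wan_ipoe is made up for illustration, and the protocol actually required depends on the provider:

```
config interface 'wan'
        option device 'eth0'
        option proto 'pppoe'
        # username/password omitted, as in the first post

# hypothetical second interface on the same physical port
config interface 'wan_ipoe'
        option device 'eth0'
        option proto 'dhcp'
```

This has not been tested here; if the port still cycles with such a setup, the eth0.254 placeholder above can simply be kept alongside it.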

Thanks to everyone for chiming in!
