Netifd interface restart timing issue

I noticed a timing issue while testing a PPPoE disconnect/restart scenario.
When netifd tears down the interface, it releases the device and then reclaims it faster than the device driver can notify us of the link down caused by the release.
As a result, the link-down notification arrives while the new setup is already running and interrupts it, which leads to another teardown, and the cycle repeats.

Some log highlights to illustrate the issue (some debugging added so line numbers may be off):

Wed Feb 10 09:38:00 2021 daemon.info pppd[14503]: No response to 5 echo-requests
Wed Feb 10 09:38:00 2021 daemon.notice pppd[14503]: Serial link appears to be disconnected.
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_proto_event_cb(777): IFPEV_LINK_LOST
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: proto_shell_handler(233): run teardown for interface 'wan'
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_proto_event_cb(763): IFPEV_DOWN
**Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: device_release(452): Release Network device eth0, new active count: 0**   *<--- here dev->set_state(dev, false); will eventually trigger Link DOWN from driver*
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_set_up(1115): Claiming device   *<--- Since iface->autostart is set, we attempt to set the interface up again*
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: device_claim(416): Claim Network device eth0, new active count: 1
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: proto_shell_handler(233): run setup for interface 'wan'
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: netifd_start_process(177): starting process 15034: /usr/sbin/pppd nodetach ipparam
**Wed Feb 10 09:38:08 2021 kern.crit kernel: [57690.144000] eth0 (Int switch port: 0) (Logical Port: 0) (phyId: c) Link DOWN.**   *<--- Now the driver alerts us to the Link DOWN, notice the delay*
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: cb_rtnl_event(604): set link down
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: proto_shell_handler(202): Kill setup   *<--- Since we already started the setup, it is interrupted*
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: proto_shell_handler(233): run teardown for interface 'wan'

So how do I get around this issue?

FYI, in case someone else stumbles upon this issue in the future: it is easily solved by setting option force_link '1' on the wan interface.
This tells netifd to ignore netlink messages about link state from the underlying device.
As a result, the setup is no longer interrupted by a delayed link-down notification.
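
For reference, a minimal sketch of what this could look like in /etc/config/network, assuming a PPPoE wan interface on eth0 as in the log above. The username and password values are placeholders, and older releases use `option ifname` where newer ones use `option device`, so adjust to whatever your existing section already has:

```
config interface 'wan'
	option proto 'pppoe'
	option device 'eth0'             # 'option ifname' on older releases
	option username 'your-username'  # placeholder
	option password 'your-password'  # placeholder
	option force_link '1'            # the fix described above
```

Only the force_link line is the actual change; the rest just stands in for an existing wan section. The network configuration has to be reloaded (or the router rebooted) before the option takes effect.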
