I noticed a timing issue while testing a PPPoE disconnect/restart scenario.
When we tear down the interface, we release the device and then reclaim it faster than the device driver can notify us of the link down that the release itself causes.
As a result, the link down notification from the driver arrives while the new setup is already running. The setup gets interrupted, which leads to a new teardown, and the cycle repeats.
Some log highlights to illustrate the issue (I added some debugging, so line numbers may be off):
Wed Feb 10 09:38:00 2021 daemon.info pppd[14503]: No response to 5 echo-requests
Wed Feb 10 09:38:00 2021 daemon.notice pppd[14503]: Serial link appears to be disconnected.
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_proto_event_cb(777): IFPEV_LINK_LOST
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: proto_shell_handler(233): run teardown for interface 'wan'
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_proto_event_cb(763): IFPEV_DOWN
**Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: device_release(452): Release Network device eth0, new active count: 0** *<--- here dev->set_state(dev, false); will eventually trigger Link DOWN from driver*
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: interface_set_up(1115): Claiming device *<--- Since iface->autostart is set, we attempt to set the interface up again*
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: device_claim(416): Claim Network device eth0, new active count: 1
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: proto_shell_handler(233): run setup for interface 'wan'
Wed Feb 10 09:38:06 2021 daemon.err netifd[4599]: netifd_start_process(177): starting process 15034: /usr/sbin/pppd nodetach ipparam
**Wed Feb 10 09:38:08 2021 kern.crit kernel: [57690.144000] eth0 (Int switch port: 0) (Logical Port: 0) (phyId: c) Link DOWN.** *<--- Now the driver alerts us to the Link DOWN, notice the delay*
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: cb_rtnl_event(604): set link down
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: proto_shell_handler(202): Kill setup *<--- Since we already started the setup, it is interrupted*
Wed Feb 10 09:38:08 2021 daemon.err netifd[4599]: proto_shell_handler(233): run teardown for interface 'wan'
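
To make the sequence in the log easier to follow, here is a toy model of the race in C. It is not netifd code: the names `simulate_restart_race`, `race_result` and `iface_state` are made up for illustration, time is counted in abstract ticks, and the setup duration is an assumed value. It only models what the log shows: releasing the device schedules a delayed link down from the driver, autostart reclaims the device and starts setup immediately, and the late link down then hits the setup that is already running.

```c
#include <assert.h>

/* Hypothetical states for the toy model; these are not netifd's states. */
enum iface_state { IFS_SETUP, IFS_UP };

struct race_result {
    int interrupted_setups; /* setups killed by a late link down */
    int completed_setups;   /* setups that ran to completion */
};

/*
 * notify_delay: ticks between releasing the device and the driver
 *               reporting link down (about 2 s in the log above)
 * setup_time:   ticks a setup needs to finish (assumed value)
 * ticks:        how long to run the model
 */
struct race_result simulate_restart_race(int notify_delay, int setup_time, int ticks)
{
    assert(notify_delay >= 0 && setup_time > 0 && ticks >= 0);

    struct race_result res = { 0, 0 };
    enum iface_state state;
    int pending_link_down_at; /* -1 means no notification outstanding */
    int setup_done_at;
    int restart = 1;          /* tick 0: link lost, teardown, then autostart */

    for (int t = 0; t < ticks || restart; t++) {
        if (restart) {
            /* Release: the driver will report link down later.
             * A delay of 0 models the notification arriving before the reclaim. */
            pending_link_down_at = notify_delay > 0 ? t + notify_delay : -1;
            /* Autostart: claim the device and start setup right away. */
            state = IFS_SETUP;
            setup_done_at = t + setup_time;
            restart = 0;
            if (t >= ticks)
                break;
        }

        if (pending_link_down_at == t) {
            /* Stale link down from the previous release arrives now. */
            if (state == IFS_SETUP)
                res.interrupted_setups++;
            restart = 1;      /* teardown and restart on the next pass */
            t--;              /* handle the restart within the same tick */
        } else if (state == IFS_SETUP && setup_done_at == t) {
            state = IFS_UP;
            res.completed_setups++;
        }
    }
    return res;
}
```

With a notification delay of 2 ticks and an assumed setup time of 5 ticks, every setup in this model is killed before it can finish, which matches the loop seen in the log. With a delay of 0 (the link down is seen before the device is reclaimed), the first setup completes.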
So how do I get around this issue?