Disappearing DHCP Offers & bridge ageing_time

Hi,

I previously reported a problem with my clients roaming between APs. See here.

At first I thought I had solved it, but I was wrong. I have now stumbled upon the following topic, and its description fits my issue perfectly; however, I think the root cause wasn't identified there.

Here is my setup:

router (Archer C6v3) => PoE-Switch (TL-SG105PE) |=> ap-1 (EAP235-WALL)
OpenWrt 22.03.3         Stock FW (latest)       |   OpenWrt 22.03.3
10.0.10.1               10.0.10.2               |   10.0.10.3
DHCP-Server                                     |   Dumb AP (No FW OR DHCP)
                                                |
                                                |=> ap-2 (EAP235-WALL)
                                                    OpenWrt 22.03.3
                                                    10.0.10.5
                                                    Dumb AP (No FW OR DHCP)

There are also multiple VLANs, but I don't believe these are the root cause, since the DHCP issue also reproduces on the primary VLAN.

The SSIDs do indeed show +FT and roaming works fine. However, whenever a mobile device switches to another AP, the DHCPDISCOVER & DHCPOFFER loop starts as previously stated:

C6 | Wed Mar 15 09:50:03 2023 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(br-lan.10) de:ad:be:ef:d3:3c
C6 | Wed Mar 15 09:50:03 2023 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(br-lan.10) 10.0.10.149 de:ad:be:ef:d3:3c
[...] 14 more repetitions
C6 | Wed Mar 15 09:52:35 2023 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(br-lan.10) de:ad:be:ef:d3:3c
C6 | Wed Mar 15 09:52:35 2023 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(br-lan.10) 10.0.10.149 de:ad:be:ef:d3:3c

After some minutes the device then magically receives the DHCPOFFER, DHCPREQUEST & DHCPACK are exchanged, and the client is back online.
I also tried to tcpdump on the different devices to narrow the issue down, but as stated by @overmyhead in his post, the packets just do not seem to arrive at the "latest" AP of the device until some time has passed.
Is there a common solution for this scenario?
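
To put a number on the loop described above, the dnsmasq log can be summarised per client. The following is only a minimal sketch: the function name dhcp_offer_loops is made up for illustration, and it assumes the log lines look like the excerpt above (the MAC is recognised as the only 17-character field consisting of hex digits and colons). It prints, for each MAC, the longest run of DHCPOFFERs that was not interrupted by a DHCPACK.

```shell
# Sketch: report, per client MAC, the longest run of DHCPOFFERs that was
# not interrupted by a DHCPACK. Reads dnsmasq log lines on stdin.
dhcp_offer_loops() {
    awk '
        {
            # Find the MAC: a 17-character field made of hex digits and colons.
            mac = ""
            for (i = 1; i <= NF; i++)
                if (length($i) == 17 && $i ~ /^[0-9A-Fa-f:]+$/) mac = $i
            if (mac == "") next
        }
        /DHCPOFFER\(/ { run[mac]++; if (run[mac] > max[mac]) max[mac] = run[mac] }
        /DHCPACK\(/   { run[mac] = 0 }
        END { for (m in max) printf "%s %d\n", m, max[m] }
    ' | sort
}
```

On the router this could be fed from the system log, for example with `logread -e dnsmasq-dhcp | dhcp_offer_loops`. A client that roams cleanly should show a run of 1; long runs point at offers that never reached the client.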

I also read about the cron script of @Brain2000, but I'm not sure which device this script is supposed to be running on. On the router or on all of the devices?

BR Daniel

You should probably check how the TP-Link switch behaves when a MAC address learned on one port appears on another port.

Check if the MAC address table is updated immediately (as it should be) or if it takes some time (for example, waiting for the aging time to expire).

You mean the MAC address table on the TL-SG105PE switch? Unfortunately I don't know how to check that. Can you explain how? The switch doesn't allow SSH access or anything similar.

Yes, but keep in mind that this is just a guess and I could be totally wrong.

When a wireless client moves from one AP to another, the host MAC address learned on one switch port appears on another switch port (where the other AP is connected). A good quality switch should immediately update its L2 forwarding table by replacing the existing entry with the same MAC address but with the more current port number.

I think the TP-Link switch continues to forward the frames over the old interface (where the previous AP is connected) for a while longer.
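
The TL-SG105PE itself cannot be queried this way, but the same learning behaviour can at least be observed on the OpenWrt bridges of the router and the APs. This is a minimal sketch, assuming the iproute2 `bridge` utility is installed (on OpenWrt it is packaged separately) and that its `fdb show` output has the usual `<mac> dev <port> ... master <bridge>` shape. The helper name fdb_ports_for_mac is invented for this example.

```shell
# Sketch: print the bridge port(s) on which a MAC is currently learned.
# Reads `bridge fdb show` output on stdin; $1 is the MAC to look up.
fdb_ports_for_mac() {
    awk -v mac="$1" '
        tolower($1) == tolower(mac) {
            # The port name is the word following "dev".
            for (i = 2; i < NF; i++)
                if ($i == "dev") print $(i + 1)
        }
    ' | sort -u
}

# Usage (run on the router or an AP right after the client has roamed):
#   bridge fdb show br br-lan | fdb_ports_for_mac de:ad:be:ef:d3:3c
```

If the router still lists the client on the port it was on before roaming while the DHCPOFFERs are being lost, the stale entry is on the OpenWrt side rather than (or in addition to) the switch.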

Sorry, I'm not familiar with your device.

I would assign a static IP to a test wifi device, start a ping session from the router and check the packet losses when switching between APs and the router.
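
The ping test suggested above could be scripted roughly like this. It is only a sketch: the helper name ping_loss_percent and the address 10.0.10.200 are assumptions for illustration, and the helper merely pulls the loss percentage out of the summary line that ping prints at the end.

```shell
# Sketch: extract the packet-loss percentage from ping's summary line.
# Reads the output of ping on stdin.
ping_loss_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage on the router while carrying the test device from one AP to the
# other (10.0.10.200 is a hypothetical static address for that device):
#   ping -c 120 10.0.10.200 | ping_loss_percent
```

Running it once while the device stays on one AP and once while roaming gives a baseline to compare against.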

@trendy, my apologies, I didn't read the linked topic where you explain the same thing.

Based on the suggestion by @Brain2000, I also applied the bridge ageing adjustment, and it seems to solve the issue for now, at least in most situations.
However, I want to further analyze the switching behavior of the TL-SG105PE.

This series of switches is really poorly designed (I know, I have used them) and they have a number of major flaws in the implementation. My general advice is to avoid them entirely (as well as the entry level Netgear equivalents), and to move up towards the mid-range TP-Link managed switches and/or other brands.

That said, it may not be worth your time to do additional analysis. But, if you do go to the trouble, please do share your findings - it may be interesting and useful for future readers.

Thanks for the advice. Can you recommend a managed, PoE-capable switch (maybe even one capable of running OpenWrt)?

If I come to any useful conclusions in the meantime, I will share them with the community.

This all depends on your required number of ports (in general, and how many with PoE), features needed, management methods, etc. In the 8-port and higher range, you can look at the TP-Link JetStream/Omada series, Zyxel 1900 series, UniFi, and some of the Netgear models (again, stay away from the entry level). I haven't used any switches with OpenWrt yet, but search the forums for recommendations on that front.

This might be related to https://github.com/openwrt/openwrt/issues/11650

Thanks for pointing me to that issue. It seems to exactly describe my issue.

However, I can't downgrade from DSA, as one of the devices is too recent to run an older firmware.

I'm going to try to reproduce the fix you used for MAC address ageing. Can you tell me what you did, please (in particular, the timeout length you set)?

I'm not using a switch, as I believe the RPi4 router is adequate. I may need to source one if I cannot fix this because the RPi4 DSA has a defect. I'm assuming you used a brctl command to set the ageing?

If I do reproduce the same defect, can you recommend a managed switch, please? I don't need many ports.

On the switch I didn't alter anything and as of now I don't believe the switch is misbehaving.

On all OpenWrt systems in my network I created the following "Scheduled Task":
*/5 * * * * sleep 10 && brctl setageing br-lan 10

Since applying this, the issue is no longer noticeable.
However, I'm not sure whether it is entirely gone.
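
To check whether the scheduled task actually took effect on a given device, the current ageing time can be read back from sysfs. This is a minimal sketch: the function name bridge_ageing_seconds is made up, and the second argument exists only so the sysfs root can be overridden for testing. The kernel reports the value in hundredths of a second, so the 10 seconds set by `brctl setageing br-lan 10` should show up as 1000 in the raw file.

```shell
# Sketch: print a bridge's FDB ageing time in seconds.
# $1 = bridge name (default: br-lan)
# $2 = sysfs root (default: /sys; overridable for testing)
bridge_ageing_seconds() {
    br="${1:-br-lan}"
    root="${2:-/sys}"
    # sysfs reports the ageing time in 1/100 s units.
    raw=$(cat "$root/class/net/$br/bridge/ageing_time") || return 1
    echo $((raw / 100))
}

# Usage:
#   bridge_ageing_seconds br-lan
```

If this prints the kernel default of 300 again some time after boot or a network reload, the setting has been reset, which would explain why the cron entry re-applies it every five minutes.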