DHCP Offer packets blocked

I have a Zbtlink ZBT-WR8305RT running OpenWrt 18.06.4 that I am trying to use as a WiFi to Ethernet bridge that does not need rebooted periodically. On the LAN I have a CentOS server providing the DNS and DHCP services. On the OpenWRT LAN settings I have ticked the "Ignore interface" option for the "DHCP Server" settings.

It works as expected when the OpenWRT router is started but after it has been running a while the WiFi connected devices can not get or renew their DHCP leases. With Wireshark on the server I can see the "DHCP Discover" message from the client and the "DHCP Offer" reply from the server. Running Wireshark on the client I can see it send the "DHCP Discover" but the reply is not seen. Rebooting the OpenWRT router resolves the problem for a while but then occurs again. Using the restarting the firewall does not help. I have also tried enabling the OpenWRT router as effectively a caching DHCP/DNS but that did not appear to change the behavior.

To me it appears that after some time operating it stops forwarding port 67 to port 68 messages from Ethernet to WiFi. Does anyone have any suggestions of what the cause may be?

Just to confirm, the wireless segment is bridged to the wired segment, correct? brctl show should show the Ethernet interface (or sub-interface) and wireless virtual interface in the same bridge.

1 Like

What is the output of uci show network; uci show dhcp; uci show wireless ?
Use preformated text when you paste the output (Ctrl-Shift-c)

1 Like

Yes, the default LAN-WiFi bridge:

root@ap1:~# brctl show
bridge name	bridge id		STP enabled	interfaces
br-lan		7fff.6460f55298da	no		eth0.1
							wlan0

Hi trendy, here is the requested output:

network.loopback=interface
network.loopback.ifname='lo'
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.globals=globals
network.globals.ula_prefix='fde6:6dbf:9613::/48'
network.lan=interface
network.lan.type='bridge'
network.lan.ifname='eth0.1'
network.lan.proto='static'
network.lan.netmask='255.255.255.0'
network.lan.ip6assign='60'
network.lan.ipaddr='10.11.1.21'
network.lan.gateway='10.11.1.1'
network.lan.dns='10.11.1.1'
network.lan_dev=device
network.lan_dev.name='eth0.1'
network.lan_dev.macaddr='64:60:f5:52:98:da'
network.wan=interface
network.wan.ifname='eth0.2'
network.wan.proto='dhcp'
network.wan_dev=device
network.wan_dev.name='eth0.2'
network.wan_dev.macaddr='64:60:f5:52:98:db'
network.wan6=interface
network.wan6.ifname='eth0.2'
network.wan6.proto='dhcpv6'
network.@switch[0]=switch
network.@switch[0].name='switch0'
network.@switch[0].reset='1'
network.@switch[0].enable_vlan='1'
network.@switch_vlan[0]=switch_vlan
network.@switch_vlan[0].device='switch0'
network.@switch_vlan[0].vlan='1'
network.@switch_vlan[0].ports='0 1 2 3 6t'
network.@switch_vlan[1]=switch_vlan
network.@switch_vlan[1].device='switch0'
network.@switch_vlan[1].vlan='2'
network.@switch_vlan[1].ports='4 6t'
dhcp.@dnsmasq[0]=dnsmasq
dhcp.@dnsmasq[0].domainneeded='1'
dhcp.@dnsmasq[0].localise_queries='1'
dhcp.@dnsmasq[0].rebind_protection='1'
dhcp.@dnsmasq[0].rebind_localhost='1'
dhcp.@dnsmasq[0].local='/lan/'
dhcp.@dnsmasq[0].domain='lan'
dhcp.@dnsmasq[0].expandhosts='1'
dhcp.@dnsmasq[0].readethers='1'
dhcp.@dnsmasq[0].leasefile='/tmp/dhcp.leases'
dhcp.@dnsmasq[0].resolvfile='/tmp/resolv.conf.auto'
dhcp.@dnsmasq[0].nonwildcard='1'
dhcp.@dnsmasq[0].localservice='1'
dhcp.@dnsmasq[0].server='10.11.1.1'
dhcp.lan=dhcp
dhcp.lan.interface='lan'
dhcp.lan.dhcpv6='server'
dhcp.lan.ra='server'
dhcp.lan.ra_management='1'
dhcp.lan.ignore='1'
dhcp.wan=dhcp
dhcp.wan.interface='wan'
dhcp.wan.ignore='1'
dhcp.odhcpd=odhcpd
dhcp.odhcpd.maindhcp='0'
dhcp.odhcpd.leasefile='/tmp/hosts/odhcpd'
dhcp.odhcpd.leasetrigger='/usr/sbin/odhcpd-update'
dhcp.odhcpd.loglevel='4'
wireless.radio0=wifi-device
wireless.radio0.type='mac80211'
wireless.radio0.hwmode='11g'
wireless.radio0.path='platform/10180000.wmac'
wireless.radio0.htmode='HT20'
wireless.radio0.country='00'
wireless.radio0.legacy_rates='1'
wireless.radio0.channel='1'
wireless.default_radio0=wifi-iface
wireless.default_radio0.device='radio0'
wireless.default_radio0.network='lan'
wireless.default_radio0.mode='ap'
wireless.default_radio0.ssid='ACCL'
wireless.default_radio0.encryption='psk2'
wireless.default_radio0.key='xxx'

As a precaution enable STP.
One more thing to try, when this problem occurs, is to have a host with static IP and start pinging the DHCP server and the gateway.
Does it occur only to wifi connected hosts or wired ones too?

1 Like

Ok, I have turned on STP:

root@ap1:~# brctl show
bridge name	bridge id		STP enabled	interfaces
br-lan		7fff.6460f55298da	yes		eth0.1
							wlan0

then restarted the LAN interface from the LuCI interface but it did not help.

Wired devices are unaffected.

I tried setting a static IP address on a WiFi connected laptop and it did not work. I can see traffic from it on my server so assume there is actual an issue with all traffic going back after a while and I was being mislead by my server logs which are only reporting dhcpd issues.

Thanks for suggesting that as I now realise I am probably chasing something more general issue and my post subject probably needs updated.

When the issue occurs can you ping the bridge IP 10.11.1.21 ?
Does the bridge have valid arp entries to know where to find each host?
Is there anything suspicious in the logs?

I can ping it ok from the Ethernet LAN network but not the WiFi network. The arp table on the router shows entries for the Ethernet devices that have been connected to it but only one of the WiFi devices and I am assuming that is a cached entry that has not timed out since the problem occurred (I'm rebooting often to keep WiFi users happy). On the WiFi client there is no arp entries, I assume they have all expired.

The OpenWRT system log looks pretty much the same all the way through and I think are ok. Here is that tail of it:

Wed Jul 24 22:11:06 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 38:f7:3d:6a:54:f9
Wed Jul 24 22:11:06 2019 daemon.info hostapd: wlan0: STA 38:f7:3d:6a:54:f9 WPA: pairwise key handshake completed (RSN)
Wed Jul 24 22:11:13 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED e0:4f:43:db:a5:ef
Wed Jul 24 22:11:13 2019 daemon.info hostapd: wlan0: STA e0:4f:43:db:a5:ef IEEE 802.11: disassociated
Wed Jul 24 22:11:14 2019 daemon.info hostapd: wlan0: STA e0:4f:43:db:a5:ef IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Wed Jul 24 22:11:17 2019 daemon.info hostapd: wlan0: STA e0:4f:43:db:a5:ef IEEE 802.11: authenticated
Wed Jul 24 22:11:17 2019 daemon.info hostapd: wlan0: STA e0:4f:43:db:a5:ef IEEE 802.11: associated (aid 5)
Wed Jul 24 22:11:17 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED e0:4f:43:db:a5:ef
Wed Jul 24 22:11:17 2019 daemon.info hostapd: wlan0: STA e0:4f:43:db:a5:ef WPA: pairwise key handshake completed (RSN)
Wed Jul 24 22:11:22 2019 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 68:37:e9:84:00:57
Wed Jul 24 22:11:23 2019 daemon.info hostapd: wlan0: STA 68:37:e9:84:00:57 IEEE 802.11: authenticated
Wed Jul 24 22:11:23 2019 daemon.info hostapd: wlan0: STA 68:37:e9:84:00:57 IEEE 802.11: associated (aid 6)
Wed Jul 24 22:11:24 2019 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 68:37:e9:84:00:57
Wed Jul 24 22:11:24 2019 daemon.info hostapd: wlan0: STA 68:37:e9:84:00:57 WPA: pairwise key handshake completed (RSN)

The kernel log is showing an error which might be relevant, here is the tail of starting from near the end of the system boot:

[   27.057904] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   28.028694] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   28.110823] br-lan: port 2(wlan0) entered blocking state
[   28.121500] br-lan: port 2(wlan0) entered disabled state
[   28.132562] device wlan0 entered promiscuous mode
[   30.818313] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   30.831267] br-lan: port 2(wlan0) entered blocking state
[   30.841891] br-lan: port 2(wlan0) entered listening state
[   32.847994] br-lan: port 2(wlan0) entered learning state
[   34.221060] br-lan: port 2(wlan0) entered forwarding state
[   34.232008] br-lan: topology change detected, propagating
[   37.818565] random: crng init done
[ 1782.202924] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.221369] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.239783] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.258187] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.276589] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.294985] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.313385] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.331787] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.350185] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.368586] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.386982] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.405378] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.423775] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.442172] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.460569] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.478965] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.497362] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.515759] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.534164] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.552575] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.570977] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.589373] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.607770] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.626167] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1782.644564] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 1910.772251] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 2005.996851] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 2006.015306] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 2006.033726] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 2429.443725] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
[ 2429.462192] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2

These rt2x00queue errors have been a long standing issue with rt2x00 (so I wouldn't hold my breath for ever getting this fixed), but there has been activity on this particular issue recently - please try snapshots instead of 18.06.x.

3 Likes

Thanks. Searching on that kernel error message does back up that it is not actually a simple dropped frame and explains the things I have seen. I saw early mention of disabling WMM so I am trying that and the latest snapshot as per slh's suggestion. I post an update on how it goes after it has been running a while.

Disabling WMM also means disabling 802.11n and restricting yourself to 54 MBit/s, I'd recommend against that.

2 Likes

I suspected that was the case but a reliable 54Mb connection is more useful to me than a faster one that takes everything off line periodically until rebooted.

Overnight it has run reliably so I have just turned back on WMM and will see if the problem returns.

A similar issue here and a bug report here:

It has been running now for a while without this issue. I am seeing other issues but they are minor by comparison.

For anyone reading this thread in future the summary is:

  • The issue was not related to DHCP, it just that the LAN's server was generating log events about that since they were affected.
  • The real issue was seen as "ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2" messages in the kernel log and is know issue with the current stable builds (18.06.4) for the MT7620 SOC.
  • The best fix is to flash the system with the current snapshot build. Hopefully the fixes in the snapshot build will find their way in to a release build soon.

My thanks to all the people you took the time to reply and point me in the right direction :slight_smile:

4 Likes

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.