Issues with multiple networks / VLANs with a router and a dumb AP

I've been using OpenWrt for a couple of years now and overall everything is running extremely smoothly. For the past couple of days I've been trying to add a new network, similar to the default lan, that will house specific devices that need to be separate and invisible from the rest of the network. Those devices are both wired and wireless. Here's a picture of my current setup:

[Image: My Awesome Network]

As you can see, the network consists of 3 devices: an ISP-provided modem/router in bridge mode and two OpenWrt-flashed ASUS RT-AC58U routers, which have been upgraded to OpenWrt 21.02.0-rc2 as part of this test. The first of them connects through its WAN port to the ISP modem/router's LAN1 port and does PPPoE. The second one connects to the first one, again through its WAN port to the first's LAN1 port, and gets a DHCP-provided IP address.

The second one has everything router-related disabled and its WAN port bridged to the LAN through LuCI. It gets a static DHCP lease (192.168.1.2 for the record) from the first router. Both devices broadcast the same WiFi networks on both 2.4GHz and 5GHz for roaming. This setup is proven to work and has served multiple devices for the past months.

Jumping to the past couple of days and my new requirement. I have created a new firewall zone, a new interface and new WiFi access points on the first router and I can confirm that everything works. I'm getting an IP from a different subnet (which is what I want), and devices from lan can't reach devices in my new network. I have also set up VLANs on both sides and I get connectivity on both ends (e.g. I can get an IP from my new network on the second router's switch). I can connect to the new SSIDs or to a specific LAN port and be inside my new network.

Note that the RT-AC58U has an IPQ4018, which has some issues with VLANs 1 and 2; this is why I'm using 10 and 20.

There are two issues that I'm facing (which could be related):

I'm getting a lot of the following message in the first device's system log: Sat Jun 12 16:09:17 2021 kern.warn kernel: [ 309.102915] br-lan: received packet on eth0.10 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0). I understand that the two bridges have the same MAC address, but I'm not sure why that happens since they are on different VLANs. I'm also getting the same message for eth0.20, my other network.

Some wired devices on the first router seem unable to get an IP address. I can see in the logs that some of them get stuck in a loop between DHCPDISCOVER and DHCPOFFER and never actually get an IP (the router seems to offer one though).
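To keep track of how often each bridge logs the warning from the first issue, the syslog can be summarised per bridge and port. This is only a minimal sketch: the helper name own_addr_summary is made up for illustration, and it assumes the kernel message format quoted in this thread. On the router the input would come from logread; the sample lines embedded here are copied from log excerpts posted in this thread.

```shell
#!/bin/sh
# Summarise "own address as source address" kernel warnings per bridge/port.
# Reads syslog text on stdin; on the router this would be:
#   logread | own_addr_summary
# (own_addr_summary is a hypothetical helper, not an OpenWrt tool.)
own_addr_summary() {
    sed -n 's/.*kernel: \[[^]]*\] \([^:]*\): received packet on \([^ ]*\) with own address as source address.*/\1 \2/p' |
        sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Sample input: log lines in the format quoted in this thread.
sample='Sat Jun 12 16:09:17 2021 kern.warn kernel: [ 309.102915] br-lan: received packet on eth0.10 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:48 2021 kern.warn kernel: [101040.670085] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:49 2021 kern.warn kernel: [101041.731366] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)'

# Prints one line per bridge/port pair: "<bridge> <port> <count>"
printf '%s\n' "$sample" | own_addr_summary
```

Comparing the counts with a VLAN enabled and disabled on a given switch port should show whether the warnings follow that port.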


Here's how I have set up the networks on my two devices:

First device:

[snipped]
config interface 'lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option device 'br-lan'

config interface 'san'
	option proto 'static'
	option netmask '255.255.255.0'
	option ipaddr '192.168.2.1'
	option ip6assign '60'
	option device 'br-san'

config switch_vlan
	option device 'switch0'
	option vlan '20'
	option ports '0t 4t'
	option vid '20'

config switch_vlan
	option device 'switch0'
	option vlan '10'
	option ports '0t 4t 3 2 1'
	option vid '10'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0.10'
	option stp '1'

config device
	option name 'br-san'
	option type 'bridge'
	list ports 'eth0.20'
	option stp '1'
[snipped]

Note: Port 4 is LAN1 which connects to the second router's WAN port.

Second device:

[snipped]
config interface 'lan'
	option proto 'dhcp'
	option device 'br-lan'

config interface 'san'
	option proto 'dhcp'
	option device 'br-san'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0.10'
	list ports 'eth1.10'

config device
	option name 'br-san'
	option type 'bridge'
	list ports 'eth0.20'
	list ports 'eth1.20'

config switch_vlan
	option device 'switch0'
	option vlan '10'
	option ports '0t 3 2 1 5t'
	option vid '10'

config switch_vlan
	option device 'switch0'
	option vlan '20'
	option ports '0t 4 5t'
	option vid '20'
[snipped]

Note: I set LAN1 to be in VLAN 20 so I could test that it works (it did, I got an IP in the 192.168.2.x range).


After investigation it seems that one solution would be using macvlan for the bridge interfaces, but I'm not totally sure this would fix the issue. I don't really understand why br-lan: received packet on eth0.10 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0) shows up in both bridges and VLANs.


I've introduced a new device into the setup, a good old TP-LINK TL-WR1043ND. I enabled VLAN 20 on it and there weren't any issues (no br-lan: received packet on eth0.10 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0) messages in the first router's system log).

I suspect this is an issue with the IPQ40xx's switch as described here:

  1. IPQ40xx Switch Config "Strangeness"
  2. Asus RT-AC58U switch configuration in LuCi (/etc/board.json) vs swconfig command line
  3. Cant setup VLANS correctly; What am I missing? - #13 by ouifi
  4. Installation on ASUS lyra fails - ERROR in instruction! - #5 by slh

EDIT (2h later):

Since this is getting no responses I'm turning this into a captain's log to keep track of my tests and experiments.

First of all, I noticed that both bridges have the same IPv6 link-local address. Not sure if that's an issue, but somehow I managed to change the MAC address (and then their link-local addresses changed too after a reboot) but the issue persisted.
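For reference, identical link-local addresses are what you would expect when two interfaces share a MAC address, assuming the kernel's default EUI-64 address generation is in use (I have not verified the address generation mode on these devices). A minimal sketch of the derivation follows; the function name and the last three MAC octets (aa:bb:cc) are placeholders, since the real ones are masked in this thread.

```shell
#!/bin/sh
# Derive the EUI-64 based IPv6 link-local address from a MAC address:
# flip the universal/local bit of the first octet and insert ff:fe
# between the third and fourth octets.
# Runs in a subshell so the IFS change does not leak to the caller.
mac_to_link_local() (
    IFS=:
    set -- $1        # split the MAC into $1..$6
    printf 'fe80::%x:%x:%x:%x\n' \
        "$(( ((0x$1 ^ 0x02) << 8) | 0x$2 ))" \
        "$(( (0x$3 << 8) | 0xff ))" \
        "$(( (0xfe << 8) | 0x$4 ))" \
        "$(( (0x$5 << 8) | 0x$6 ))"
)

# Placeholder MAC: only the 04:92:26 prefix comes from the logs.
mac_to_link_local 04:92:26:aa:bb:cc    # prints fe80::692:26ff:feaa:bbcc
```

Two bridges with the same MAC therefore end up with the same link-local address, and changing the MAC changes the address as well.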

Secondly, I forwarded VLAN 20 to my second router and errors started appearing again. I upgraded my third one to 21.02.0-rc3 (no specific reason, just to keep them all on the same version) and noticed that its second interface couldn't get an IP. The first router's system log was full of the following:

Sat Jul  3 17:54:49 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:52 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:52 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:52 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:52 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPREQUEST(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPACK(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx third

If you remember, I reported the same issue in my original message, with wired devices not being able to get a DHCP-provided IP, but I believed it was an unrelated issue that I would fix last. Check the last two syslog messages: this is when I disabled VLAN 20 on the first router's switch port that the second router is connected to, effectively cutting the second router off from that VLAN. This is when the third router got an IP in the second network and all error messages stopped.

I believe this is a loop happening somewhere in the second router, but I'm really stuck and don't know how to proceed.
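A quick way to spot clients stuck in this DISCOVER/OFFER loop is to count the DHCP message types per client MAC in the dnsmasq log. The following is a rough sketch only: dhcp_handshake_summary is a made-up helper name, and it assumes the dnsmasq-dhcp log format shown above. The sample input is the tail of the log excerpt above.

```shell
#!/bin/sh
# Count DHCP message types per client MAC from dnsmasq log lines on stdin.
# On the router this would be: logread | dhcp_handshake_summary
# (dhcp_handshake_summary is a hypothetical helper, not part of OpenWrt.)
dhcp_handshake_summary() {
    awk '
        {
            type = ""; mac = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^DHCP(DISCOVER|OFFER|REQUEST|ACK)\(/) {
                    type = $i
                    sub(/\(.*/, "", type)   # "DHCPOFFER(br-san)" -> "DHCPOFFER"
                }
                if ($i ~ /^..:..:..:..:..:..$/) mac = $i
            }
            if (type != "" && mac != "") { n[mac " " type]++; seen[mac] = 1 }
        }
        END {
            for (m in seen)
                printf "%s discover=%d offer=%d request=%d ack=%d\n", m,
                    n[m " DHCPDISCOVER"], n[m " DHCPOFFER"],
                    n[m " DHCPREQUEST"], n[m " DHCPACK"]
        }
    ' | sort
}

# Sample input: the last six lines of the log excerpt above.
sample='Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:54 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPDISCOVER(br-san) a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPOFFER(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPREQUEST(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx
Sat Jul  3 17:54:57 2021 daemon.info dnsmasq-dhcp[2481]: DHCPACK(br-san) 192.168.2.3 a0:f3:c1:xx:xx:xx third'

printf '%s\n' "$sample" | dhcp_handshake_summary
```

A client with many more discovers and offers than acks is most likely not receiving (or not accepting) the offers, which matches what I see while the VLAN is enabled towards the second router.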

EDIT (5h later):

I just discovered that the same error message exists on the third (the newly added) router! This is a snippet from the same time window as the one above (which was taken from the first router). Notice that it got a DHCP lease at exactly the time I disabled VLAN 20 for the second router.

Sat Jul  3 17:54:34 2021 kern.warn kernel: [  523.959617] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:37 2021 kern.warn kernel: [  526.971192] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:40 2021 kern.warn kernel: [  529.983159] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:43 2021 kern.warn kernel: [  532.995531] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:46 2021 kern.warn kernel: [  536.007247] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:49 2021 kern.warn kernel: [  539.019138] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:52 2021 kern.warn kernel: [  542.031555] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:54 2021 kern.warn kernel: [  544.043126] br-san: received packet on eth0.20 with own address as source address (addr:a0:f3:c1:xx:xx:xx, vlan:0)
Sat Jul  3 17:54:57 2021 daemon.notice netifd: san (2090): udhcpc: sending select for 192.168.2.3
Sat Jul  3 17:54:57 2021 daemon.notice netifd: san (2090): udhcpc: lease of 192.168.2.3 obtained, lease time 43200

The dumbAP devices don't route, therefore there is no need to have an IP address on the secondary vlan20.

uci set network.san.proto='none'
uci commit network
ifup san

Start with that and then we can further troubleshoot the error in the logs.

Yes, it doesn't route and in this case it doesn't really matter.

This is what I did:

root@second:~# uci set network.san.proto='none'
root@second:~# uci commit network
root@second:~# service network reload

Now I can see the interface is unmanaged.

I'm still getting the same errors on the first router's system log (note that I had the VLAN disabled, did the change above and then enabled the VLAN):

Sun Jul  4 21:23:48 2021 kern.warn kernel: [101040.670085] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:49 2021 kern.warn kernel: [101041.731366] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:50 2021 kern.warn kernel: [101042.772140] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:51 2021 kern.warn kernel: [101044.029585] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:52 2021 kern.warn kernel: [101045.091159] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:53 2021 kern.warn kernel: [101046.130727] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 21:23:56 2021 kern.warn kernel: [101048.700434] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)

I'm testing by having an SSID in san, in which I connect only a laptop and try opening a webpage. Could it be IPv6 related?

EDIT (6m later):
I disabled IPv6 on the laptop, checked that it doesn't get an IPv6 address, and I'm still getting the issues with the setup above.

Can you verify what packets you receive? Something like this would show them:
tcpdump -i br-san -evn -Q in ether host 04:92:26:xx:xx:xx

Would it make sense to filter by source and destination having the same mac?

Your example command outputs a lot of traffic when I'm using the laptop and none when it's disconnected (the error messages stop too). I tried tcpdump -i br-san -evn -Q in ether host 04:92:26:xx:xx:xx and ether dst 04:92:26:xx:xx:xx but I'm still getting a lot of packets (normal traffic, it seems). Not sure what the correct command is.

Here's some example output, while downloading a random app from MS. Seems totally normal to me:

22:49:39.077683 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 128, id 25367, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.2.100.53090 > 204.79.197.200.443: Flags [.], cksum 0x0014 (correct), ack 50733, win 1022, length 0
22:49:39.082875 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 128, id 25368, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.2.100.53089 > 204.79.197.200.443: Flags [.], cksum 0xb947 (correct), ack 7347, win 1023, length 0
22:49:39.083103 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 128, id 25369, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.2.100.53086 > 204.79.197.200.443: Flags [.], cksum 0x0281 (correct), ack 7347, win 1022, length 0
22:49:39.085263 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 128, id 25370, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.2.100.53085 > 204.79.197.200.443: Flags [.], cksum 0xf98b (correct), ack 7347, win 1022, length 0
22:49:39.085568 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56: (tos 0x0, ttl 128, id 25371, offset 0, flags [DF], proto TCP (6), length 40)

What's really suspicious is that I've only got 2 of the error messages while testing this whole time. I have disabled IPv6 since my last message. While typing this sentence and while the traffic was stopped I noticed some weird packets and voila, 4 new errors arrived (I kept the rebind warning just in case it's DNS related):

Sun Jul  4 22:47:25 2021 kern.warn kernel: [106057.799447] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 22:47:26 2021 kern.warn kernel: [106058.388209] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 22:49:55 2021 daemon.warn dnsmasq[2481]: possible DNS-rebind attack detected: canarywesteu2.westeurope.cloudapp.azure.com
Sun Jul  4 22:50:42 2021 kern.warn kernel: [106254.517113] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Sun Jul  4 22:50:42 2021 kern.warn kernel: [106254.668165] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)

These are the packets in the bottom of my tcpdump session (2 of them, I would expect 4 -- I tried to remove any identifying info, sorry):

22:50:42.142231 04:92:26:xx:xx:xx > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 190: (hlim 1, next-header Options (0) payload length: 136) fe80::xxx:xxxx:xxxx:xxxx > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 6 group record(s) [gaddr ff02::1:ff00:1 to_ex, 0 source(s)] [gaddr ff05::1:3 to_ex, 0 source(s)] [gaddr ff02::1:2 to_ex, 0 source(s)] [gaddr ff02::1:ff00:0 to_ex, 0 source(s)] [gaddr ff02::1:xxxx:xxxx to_ex, 0 source(s)] [gaddr ff02::2 to_ex, 0 source(s)]
22:50:42.293113 04:92:26:xx:xx:xx > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 190: (hlim 1, next-header Options (0) payload length: 136) fe80::xxx:xxxx:xxxx:xxxx > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 6 group record(s) [gaddr ff02::1:ff00:1 to_ex, 0 source(s)] [gaddr ff05::1:3 to_ex, 0 source(s)] [gaddr ff02::1:2 to_ex, 0 source(s)] [gaddr ff02::1:ff00:0 to_ex, 0 source(s)] [gaddr ff02::1:xxxx:xxxx to_ex, 0 source(s)] [gaddr ff02::2 to_ex, 0 source(s)]

Multicast / IPv6 related issue?

Apologies, I forgot the src
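To spell that out: the capture filter should match on the Ethernet source address only (the pcap primitive is ether src), since the kernel warning is about incoming frames that carry the router's own MAC as source. If you already have a text capture taken with the broader ether host filter, it can also be narrowed down afterwards. The helper below is just a sketch with a made-up name, and it assumes the "SRC > DST," layout that tcpdump -e prints; the sample lines are shortened from the output posted above.

```shell
#!/bin/sh
# Capture-side filter (run on the router, MAC masked as in this thread):
#   tcpdump -i br-san -evn -Q in ether src 04:92:26:xx:xx:xx
#
# Post-filter for an existing "tcpdump -e" text capture on stdin: keep only
# the header lines whose Ethernet source equals the given MAC.
# (frames_from_mac is a hypothetical helper name.)
frames_from_mac() {
    awk -v mac="$1" '$3 == ">" && tolower($2) == tolower(mac)'
}

# Sample input: two header lines from this thread, shortened.
sample='22:49:39.077683 00:c2:c6:xx:xx:xx > 04:92:26:xx:xx:xx, ethertype IPv4 (0x0800), length 56:
22:50:42.142231 04:92:26:xx:xx:xx > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 190:'

# Only the second line has the router's MAC as source.
printf '%s\n' "$sample" | frames_from_mac 04:92:26:xx:xx:xx
```

Note that the post-filter only keeps the per-frame header line, not the indented detail lines that tcpdump -v prints below it.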

OK this time I got only relevant messages I believe.

There were 4 total error messages:

Mon Jul  5 21:47:12 2021 kern.warn kernel: [188844.598275] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Mon Jul  5 21:47:12 2021 kern.warn kernel: [188844.600121] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
[...snipped...]
Mon Jul  5 21:49:49 2021 kern.warn kernel: [189001.491411] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)
Mon Jul  5 21:49:50 2021 kern.warn kernel: [189001.931368] br-san: received packet on eth0.20 with own address as source address (addr:04:92:26:xx:xx:xx, vlan:0)

Meanwhile, this is what I got on tcpdump. I booted the laptop after enabling the VLAN (first 2 messages) and then navigated to ipv6-test.com (two last messages occurred while the page was testing IPv6):

root@second:~# tcpdump -i br-san -evn -Q in ether src host 04:92:26:xx:xx:xx
tcpdump: listening on br-san, link-type EN10MB (Ethernet), capture size 262144 bytes
21:47:12.706814 04:92:26:xx:xx:xx > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 363: (tos 0xc0, ttl 64, id 43398, offset 0, flags [none], proto UDP (17), length 349)
    192.168.2.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 321, xid 0x3c6ea960, Flags [Broadcast]
          Your-IP 192.168.2.100
          Server-IP 192.168.2.1
          Client-Ethernet-Address 00:c2:c6:xx:xx:xx
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: ACK
            Server-ID Option 54, length 4: 192.168.2.1
            Lease-Time Option 51, length 4: 43200
            RN Option 58, length 4: 21600
            RB Option 59, length 4: 37800
            Subnet-Mask Option 1, length 4: 255.255.255.0
            BR Option 28, length 4: 192.168.2.255
            Default-Gateway Option 3, length 4: 192.168.2.1
            Domain-Name-Server Option 6, length 4: 192.168.2.1
            Domain-Name Option 15, length 3: "lan"
            FQDN Option 81, length 22: [SO] 255/255 "DESKTOP-XXXXXXX.lan"
21:47:12.708664 04:92:26:xx:xx:xx > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 363: (tos 0xc0, ttl 64, id 43399, offset 0, flags [none], proto UDP (17), length 349)
    192.168.2.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 321, xid 0x3c6ea960, Flags [Broadcast]
          Your-IP 192.168.2.100
          Server-IP 192.168.2.1
          Client-Ethernet-Address 00:c2:c6:xx:xx:xx
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: ACK
            Server-ID Option 54, length 4: 192.168.2.1
            Lease-Time Option 51, length 4: 43200
            RN Option 58, length 4: 21600
            RB Option 59, length 4: 37800
            Subnet-Mask Option 1, length 4: 255.255.255.0
            BR Option 28, length 4: 192.168.2.255
            Default-Gateway Option 3, length 4: 192.168.2.1
            Domain-Name-Server Option 6, length 4: 192.168.2.1
            Domain-Name Option 15, length 3: "lan"
            FQDN Option 81, length 22: [SO] 255/255 "DESKTOP-XXXXXXX.lan"
21:49:49.601029 04:92:26:xx:xx:xx > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 190: (hlim 1, next-header Options (0) payload length: 136) fe80::xxx:xxxx:xxxx:xxxx > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 6 group record(s) [gaddr ff02::1:ff00:0 to_ex, 0 source(s)] [gaddr ff05::1:3 to_ex, 0 source(s)] [gaddr ff02::1:2 to_ex, 0 source(s)] [gaddr ff02::1:ff00:1 to_ex, 0 source(s)] [gaddr ff02::1:xxxx:xxxx to_ex, 0 source(s)] [gaddr ff02::2 to_ex, 0 source(s)]
21:49:50.040988 04:92:26:xx:xx:xx > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 190: (hlim 1, next-header Options (0) payload length: 136) fe80::xxx:xxxx:xxxx:xxxx > ff02::16: HBH (rtalert: 0x0000) (padn) [icmp6 sum ok] ICMP6, multicast listener report v2, 6 group record(s) [gaddr ff02::1:ff00:0 to_ex, 0 source(s)] [gaddr ff05::1:3 to_ex, 0 source(s)] [gaddr ff02::1:2 to_ex, 0 source(s)] [gaddr ff02::1:ff00:1 to_ex, 0 source(s)] [gaddr ff02::1:xxxx:xxxx to_ex, 0 source(s)] [gaddr ff02::2 to_ex, 0 source(s)]

Was the laptop connected to lan or san? It seems to me that it is connected to lan, judging by the options.

No, it's connected to san, via this SSID on the second router:

config wifi-iface 'wifinet3'
        option device 'radio0'
        option mode 'ap'
        option ssid 'OpenWrtSan'
        option encryption 'psk2'
        option key 'password123'
        option network 'san'

I can also confirm it's connected to san since it's getting an IP in the 192.168.2.0 network (this can also be seen in the DHCP packets above).

Looks like some device is retransmitting the DHCP ACK back to the original router, so the error message is a true positive.
Try to run tcpdump on the dumb APs, capturing the DHCP packets on the uplink. If you see a couple of them, you've found the culprit.
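Something along these lines could work as a starting point. It is a sketch under assumptions: the uplink interface name has to be replaced with whatever the uplink is on each dumb AP, dhcp_duplicates is a made-up helper name, and the parsing assumes the verbose tcpdump output format seen earlier in this thread. The sample input is shortened from the capture posted above.

```shell
#!/bin/sh
# Capture DHCP traffic on the uplink of a dumb AP (interface name is an
# assumption, substitute the real uplink interface):
#   tcpdump -i IFACE -evn udp port 67 or udp port 68
#
# Then look for the same DHCP message type appearing more than once for the
# same transaction id (xid). Reads "tcpdump -v" text on stdin.
# (dhcp_duplicates is a hypothetical helper name.)
dhcp_duplicates() {
    awk '
        /BOOTP\/DHCP/ {
            xid = ""
            for (i = 1; i <= NF; i++)
                if ($i == "xid") { xid = $(i + 1); sub(/,$/, "", xid) }
        }
        /DHCP-Message/ && xid != "" { n[xid " " $NF]++; xid = "" }
        END { for (k in n) if (n[k] > 1) print k, n[k] }
    ' | sort
}

# Sample input: the relevant lines of the two ACKs captured earlier.
sample='    192.168.2.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 321, xid 0x3c6ea960, Flags [Broadcast]
            DHCP-Message Option 53, length 1: ACK
    192.168.2.1.67 > 255.255.255.255.68: BOOTP/DHCP, Reply, length 321, xid 0x3c6ea960, Flags [Broadcast]
            DHCP-Message Option 53, length 1: ACK'

# Prints "<xid> <message type> <count>" for every repeated message.
printf '%s\n' "$sample" | dhcp_duplicates
```

Keep in mind that clients legitimately retransmit when they get no answer, so a repeated message is a hint rather than proof; repeats arriving within a few milliseconds of each other are the suspicious ones.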


I'm not really sure how to do that :sweat_smile:. There are no other devices in san (or VLAN 20 if that helps) apart from the first router and the third one (which I added after my original message). Everything works fine with the first and the third router, the issue appears right after I enable VLAN 20 to the second router, even though there are no devices connected to its san interface.

Also, it's not only DHCP packets; I can see some IPv6 multicasts too...

EDIT (12m later):

Just for sanity's sake, I retested this. On my third router which is not an IPQ40xx-based one, I created an SSID in the san network, VLAN 20 and it works perfectly. Laptop takes an IP in the 192.168.2.0 network, speed is awesome, no error messages in the log :frowning:

Same tcpdump command. You can narrow down the interfaces by using a specific one instead of any.
Normally you should see the packet going one-way only. But if on the same interface of e.g the third router you see the packet coming in and then going out again, it means that the router is broadcasting it mistakenly on all interfaces, although it should not.

Could be a bug of that particular architecture. Apart from the errors in the logs it doesn't seem to cripple the performance.

Thanks for the explanation. I tried this on the second and third routers and I could see the same packets arrive while error messages were being logged... I'm sure it's the second router that creates issues, since it happens right after I enable the VLAN and stops exactly when I disable it. How can I check why that happens? Also, it occurs when no devices exist in this VLAN / network.

Unfortunately, in my case it blocks DHCP (I've shown it happening in earlier messages).

I am not sure about that, maybe someone else has a clue. However the switch should not send a broadcast out from the same interface it received it.

Maybe it is not directly connected. Do the devices receive the DHCP offers from the router? Do they reply? Does the issue disappear as soon as you leave only 1st router on? I've had a similar issue with banIP some time ago, so give it a try after disabling the firewall first.

All issues disappear when I turn off the second router, or I remove the cable, or I turn off the VLAN. It has been working fine with a third router (Atheros AR9132) for almost a month now. When I enable (option ports '0t 5t') VLAN 20 on the second ipq40xx router, errors flood the syslog and some DHCP devices fail to obtain an address (DHCPDISCOVER and DHCPOFFER flood). As soon as I disable (option ports '0t') the VLAN, everything goes back to normal.

I'm pretty sure the second ipq40xx router is replaying some multicast messages back for some really weird reason (bug, I guess) since no other devices are connected to it. I tried disabling the first VLAN and left only VLAN 20 enabled, but the issue still persisted.

I also tend to believe that it is a bug of this particular architecture.
Unfortunately I cannot help any further as I don't have any such device.

I had a 200 IQ moment yesterday after my message and moved the cable that comes from the first router. Instead of using the second router's WAN port, I plugged it into LAN1. After all, I want it to behave like a switch. After enabling VLAN 20 on port 4 (they are swapped in the config) instead of 5 (WAN), it worked fine. I have had it running like this since yesterday and everything seems to be going as expected!

I lost an ethernet port, but I can live with that. Should I report this bug somewhere?