Forwarding traffic between two bridges won't work

I took the liberty of correcting 'ls' to 'li' in the last command :wink:

# ip -4 addr; ip -4 ru; ip -4 ro; ip -4 ro li tab all
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
7: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    inet 10.6.67.1/24 brd 10.6.67.255 scope global br-lan
       valid_lft forever preferred_lft forever
9: br-untrusted: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    inet 10.6.65.1/24 brd 10.6.65.255 scope global br-untrusted
       valid_lft forever preferred_lft forever
11: eth0.2@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    inet 10.6.66.5/24 brd 10.6.66.255 scope global eth0.2
       valid_lft forever preferred_lft forever
622: openvpn-server: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN qlen 100
    inet 10.1.0.1 peer 10.1.0.2/32 scope global openvpn-server
       valid_lft forever preferred_lft forever
0:	from all lookup local 
32765:	from 10.6.67.0/24 lookup 200 
32766:	from all lookup main 
32767:	from all lookup default 
default via 10.6.66.254 dev eth0.2  src 10.6.66.5 
10.1.0.2 dev openvpn-server scope link  src 10.1.0.1 
10.6.65.0/24 dev br-untrusted scope link  src 10.6.65.1 
10.6.66.0/24 dev eth0.2 scope link  src 10.6.66.5 
10.6.67.0/24 dev br-lan scope link  src 10.6.67.1 
default via 10.6.66.254 dev eth0.2  src 10.6.66.5 
10.1.0.2 dev openvpn-server scope link  src 10.1.0.1 
10.6.65.0/24 dev br-untrusted scope link  src 10.6.65.1 
10.6.66.0/24 dev eth0.2 scope link  src 10.6.66.5 
10.6.67.0/24 dev br-lan scope link  src 10.6.67.1 
local 10.1.0.1 dev openvpn-server table local scope host  src 10.1.0.1 
broadcast 10.6.65.0 dev br-untrusted table local scope link  src 10.6.65.1 
local 10.6.65.1 dev br-untrusted table local scope host  src 10.6.65.1 
broadcast 10.6.65.255 dev br-untrusted table local scope link  src 10.6.65.1 
broadcast 10.6.66.0 dev eth0.2 table local scope link  src 10.6.66.5 
local 10.6.66.5 dev eth0.2 table local scope host  src 10.6.66.5 
broadcast 10.6.66.255 dev eth0.2 table local scope link  src 10.6.66.5 
broadcast 10.6.67.0 dev br-lan table local scope link  src 10.6.67.1 
local 10.6.67.1 dev br-lan table local scope host  src 10.6.67.1 
broadcast 10.6.67.255 dev br-lan table local scope link  src 10.6.67.1 
broadcast 127.0.0.0 dev lo table local scope link  src 127.0.0.1 
local 127.0.0.0/8 dev lo table local scope host  src 127.0.0.1 
local 127.0.0.1 dev lo table local scope host  src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local scope link  src 127.0.0.1 

'li' and 'ls' are both abbreviations of 'list'. Anyway, you have a rule that uses table 200 for traffic from lan, but I don't see any table 200.
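
The check made in that reply (a rule points at a table that holds no routes) can be scripted. A minimal sketch, assuming the plain-text output of 'ip -4 rule' and 'ip -4 route list table all' has been saved to files; 'empty_rule_tables' is a hypothetical helper, not an iproute2 command:

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): print the routing tables that
# are named after "lookup" in saved `ip -4 rule` output but have no routes
# in saved `ip -4 route list table all` output.
#   $1 = file with the rule dump, $2 = file with the route dump
# Usage on the router (paths are assumptions):
#   ip -4 rule > /tmp/rules.txt
#   ip -4 route list table all > /tmp/routes.txt
#   empty_rule_tables /tmp/rules.txt /tmp/routes.txt
empty_rule_tables() {
    awk '
        # Rule dump: remember every table named after "lookup".
        FILENAME == ARGV[1] {
            for (i = 1; i < NF; i++) if ($i == "lookup") wanted[$(i + 1)] = 1
            next
        }
        # Skip blank lines in the route dump.
        NF == 0 { next }
        # Route dump: a route without a "table" keyword belongs to main.
        {
            t = "main"
            for (i = 1; i < NF; i++) if ($i == "table") t = $(i + 1)
            present[t] = 1
        }
        END { for (t in wanted) if (!(t in present)) print t }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Fed the dump posted above, this would print 200 and default. The default table is normally empty on a stock system, so only 200 stands out.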

Somehow my ip didn't like 'ls', but anyway, the table 200 rule was my fault. I added that rule in my feeble attempts to remedy the original problem, but it didn't seem to make a difference. I've since rebooted to get rid of any other non-persistent changes I might have made, and that rule is gone now, but there is still no forwarding between the two bridges.

OK, one last thing before resetting to defaults: capture the packets on these interfaces to see what comes in and what goes out. Stop the firewall just in case.
Start a ping from a host in lan to a host in untrusted.
tcpdump -i any -evn "icmp and (host 10.6.67.X or host 10.6.65.Y)"

# tcpdump -i any -evn "icmp and (host 10.6.67.197 or host 10.6.65.216)"
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
22:14:54.889531   P d4:6d:6d:ed:de:fb ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 24286, offset 0, flags [DF], proto ICMP (1), length 84)
    10.6.67.197 > 10.6.65.216: ICMP echo request, id 18771, seq 1, length 64
22:14:54.889573 Out d4:6d:6d:ed:de:fb ethertype IPv4 (0x0800), length 100: (tos 0x0, ttl 64, id 24286, offset 0, flags [DF], proto ICMP (1), length 84)
    10.6.67.197 > 10.6.65.216: ICMP echo request, id 18771, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Is 10.6.65.216 a Windows host?

No, it's an Android phone. I verified that it has the correct gateway set and I can ping it from the router.

Can you try with another OS that can do packet capturing? What I can see in the tcpdump output is that the packet is received and sent, but no response comes back.
Windows is known to block traffic from other networks by default. Maybe Android does the same.
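
With '-i any', this tcpdump uses the LINUX_SLL link type ("Linux cooked v1", as the capture header shows), and the field right after the timestamp is a packet-type marker. In the capture above, the echo request arrives marked P and leaves marked Out. A minimal sketch for decoding those markers, with meanings taken from the PACKET_* packet types in the packet(7) man page; 'sll_marker_meaning' and 'annotate_capture' are hypothetical helpers, and the field layout is assumed from the v1 output shown (newer tcpdump builds that capture in the cooked v2 format may also print the interface name, which would shift the field):

```shell
#!/bin/sh
# Translate the packet-type marker that `tcpdump -i any` prints after the
# timestamp on Linux cooked (LINUX_SLL v1) captures. Meanings follow the
# PACKET_* types described in packet(7).
sll_marker_meaning() {
    case "$1" in
        In)  echo "incoming, addressed to this host" ;;
        B)   echo "incoming broadcast" ;;
        M)   echo "incoming multicast" ;;
        P)   echo "incoming, addressed to another host (seen in promiscuous mode)" ;;
        Out) echo "outgoing, transmitted from this host" ;;
        *)   echo "unknown" ;;
    esac
}

# Annotate each packet header line of a saved `tcpdump -i any -evn` text
# capture. Header lines start with an HH:MM:SS timestamp; the marker is
# assumed to be the second whitespace-separated field.
annotate_capture() {
    awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ { print $1, $2 }' "$1" |
    while read -r ts marker; do
        printf '%s %s: %s\n' "$ts" "$marker" "$(sll_marker_meaning "$marker")"
    done
}
```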

I tried with a regular Linux host: same result. Nothing gets there. Wherever those packets are going, they are not coming out of that bridge.

Try it one more time, capturing on the egress interface, to clear any doubt. If a packet shows up there, it is on the wire.

tcpdump -i br-untrusted -evn icmp
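
The egress capture answers a yes/no question: did an echo request for the target host reach br-untrusted at all? A minimal sketch, assuming the capture is saved as text; 'saw_echo_request_to' and the /tmp path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a saved `tcpdump -evn` text capture contains
# an ICMP echo request addressed to the given host.
#   $1 = destination address, $2 = capture file
# Usage on the router (path is an assumption); stop the capture with Ctrl-C
# after pinging from the lan host:
#   tcpdump -i br-untrusted -l -evn icmp | tee /tmp/untrusted.txt
#   saw_echo_request_to 10.6.65.216 /tmp/untrusted.txt && echo "on the wire"
saw_echo_request_to() {
    # -F: match the text literally, so the dots in the address are not regex wildcards.
    grep -qF "> $1: ICMP echo request" "$2"
}
```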

I've tried that many times before. Still nothing there. I verified that the firewall was off, not that it made any difference.

You can either troubleshoot it further or take a backup and start from scratch. This is not normal behaviour anyway.

I backed up my settings, did a factory reset, and then manually restored the settings in question one by one. I tried both the raw wlan interface and the bridge with the VLAN Ethernet device. No OpenVPN or any other "advanced" features yet. Still the same result: absolutely nothing that enters the router on br-lan leaves the router on the untrusted side (either the wlan interface or the bridge).

I'm close to giving the term 'hacking' a more literal meaning and getting the axe from the shed...

Which device is this?

Model: Netgear Nighthawk X4S R7800
Architecture: ARMv7 Processor rev 0 (v7l)
Firmware Version: OpenWrt 19.07.2 r10947-65030d81f3 / LuCI openwrt-19.07 branch git-20.057.55219-13dd17f
Kernel Version: 4.14.171

I didn't notice this at the beginning, but which port is the CPU port?
Do you have a managed switch connected there, and have you enabled tagging in vlan1 but not in vlan3?
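
Whether the CPU port is tagged in one VLAN but untagged in another can be read from the 'ports' option of each switch_vlan section. A minimal sketch, assuming the swconfig-style switch_vlan sections used on OpenWrt 19.07 and the key='value' lines printed by 'uci show network'; 'cpu_port_state' and 'report_cpu_tagging' are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helper: report how a CPU port appears in a switch_vlan
# "ports" list such as "1 2 3 4 6t" ("t" suffix = tagged, bare number = untagged).
#   $1 = CPU port number, remaining args = the port list
cpu_port_state() {
    cpu="$1"; shift
    for p in "$@"; do
        [ "$p" = "${cpu}t" ] && { echo tagged; return 0; }
        [ "$p" = "$cpu" ] && { echo untagged; return 0; }
    done
    echo absent
}

# Read `uci show network` output on stdin and report the CPU port state for
# every option ending in ".ports", e.g. network.@switch_vlan[0].ports='1 2 6t'.
# Usage on the router:  uci show network | report_cpu_tagging 6
# The body runs in a subshell so "set -f" does not leak into the caller.
report_cpu_tagging() (
    set -f  # disable globbing; the unquoted $val below should only word-split
    cpu_port="$1"
    grep '\.ports=' | while IFS='=' read -r key val; do
        val=$(printf '%s' "$val" | tr -d "'")
        printf '%s: %s\n' "$key" "$(cpu_port_state "$cpu_port" $val)"
    done
)
```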

That would be 6.

Nope, no managed switch in my net.

Anyway, none of this matters, because the problem exists even without my custom VLAN configuration, when I just try to forward between br-lan and wlan1-1.

I am not sure if it is related, but some devices don't work well with low VLAN IDs.
Can you try using higher VIDs, like this one here?

I changed VLAN IDs 2 and 3 to 200 and 300, respectively (VLAN ID 1 wasn't changed in the example). No visible effect.

As I said, the problem was already there before I bridged wlan1-1 with ethX.Y, when I was using only wlan1-1 with no changes to the switch config at all.

I am out of ideas unfortunately :frowning: