But I noticed one difference: I had removed the bridge for LAN (I thought it was not required). I have now re-enabled this bridge for LAN with just one interface, eth0, and done the same for EXTERN with wlan0-1. So time for another test round...
Thanks jeff and tomtom. Could one of you please post your switch and switch_vlan interface/network settings for reference?
I don't know if netifd will auto-associate a wireless interface with something that isn't a bridge (EXTERN). Relying on the name being wlan0-1 is, in my experience, not robust. The output of
should show that the wireless is associated with the bridge. Working with the "raw" wireless interface is likely possible with manual configuration, but may not be straightforward with LuCI, both as far as managing the interface goes, as well as associating it with firewall zones.
(As a matter of personal style, I use lower-case interface names. It is LuCI that, I believe, capitalizes them.)
Checking that the firewall zones associated with WIFI and EXTERN allow forwarding to WAN would be a good step to take.
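A rough way to script that check, as a sketch only: the snippet below parses text in the /etc/config/firewall (UCI) syntax and lists zones that have no forwarding rule to wan. The sample config and the zone names in it are made up for illustration; they are not the poster's actual settings.

```python
# Hypothetical sample in UCI syntax; replace with the contents of
# /etc/config/firewall from the router under test.
SAMPLE_FIREWALL = """\
config zone
	option name 'wan'
	option masq '1'

config zone
	option name 'WIFI'

config zone
	option name 'EXTERN'

config forwarding
	option src 'WIFI'
	option dest 'wan'
"""


def parse_sections(text):
    """Parse UCI text into a list of dicts, one per 'config' section."""
    sections = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "config" and len(parts) >= 2:
            sections.append({"_type": parts[1]})
        elif parts[0] == "option" and sections and len(parts) >= 3:
            # Join the remainder so quoted values with spaces survive.
            sections[-1][parts[1]] = " ".join(parts[2:]).strip("'\"")
    return sections


def zones_missing_forward(text, dest="wan"):
    """Return zone names that have no 'config forwarding' entry to dest."""
    sections = parse_sections(text)
    zones = [s["name"] for s in sections
             if s["_type"] == "zone" and "name" in s]
    allowed = {s.get("src") for s in sections
               if s["_type"] == "forwarding" and s.get("dest") == dest}
    return [z for z in zones if z != dest and z not in allowed]


print(zones_missing_forward(SAMPLE_FIREWALL))
```

With this sample, EXTERN is reported because only WIFI has a forwarding entry to wan. The sketch only looks at forwarding sections; it does not evaluate per-zone forward policies or individual rules.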
(My config isn't going to be very helpful as I run a different IPQ4019-based device as a "dumb AP" -- no forwarding, only bridging).
I did some tests, but adding a bridge to each of the interfaces lan and EXTERN didn't help. Playing YouTube took my router down as before after some time (after 20 min, then a reboot, and then after 3 min).
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.f0b0147c80a1       no              eth0
br-WIFI         7fff.f0b0147c80a3       no              wlan0
br-EXTERN       7fff.f2b0147c80a3       no              wlan0-1
Looks ok to me, firewall zones are correctly assigned and seem to work fine too for my wireless networks (at least until the drop out).
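For anyone who wants to repeat this check after each config change, here is a minimal sketch that parses bridge listings in the format shown above and reports which bridge each interface belongs to. The sample text is the output from this post; the function names are just illustrative.

```python
# Bridge listing copied from the post above. On a live system this text
# would be the output of the bridge listing command run on the router.
BRIDGE_OUTPUT = """\
bridge name     bridge id               STP enabled     interfaces
br-lan          7fff.f0b0147c80a1       no              eth0
br-WIFI         7fff.f0b0147c80a3       no              wlan0
br-EXTERN       7fff.f2b0147c80a3       no              wlan0-1
"""


def parse_bridges(text):
    """Return {bridge_name: [member interfaces]} from the listing."""
    bridges = {}
    current = None
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        if len(fields) >= 3:
            # New bridge row: name, id, STP flag, optional first member.
            current = fields[0]
            bridges[current] = fields[3:]
        elif current is not None:
            # Continuation row: further members of the previous bridge.
            bridges[current].extend(fields)
    return bridges


def bridge_of(bridges, iface):
    """Return the bridge that contains iface, or None if it is unbridged."""
    for name, members in bridges.items():
        if iface in members:
            return name
    return None


bridges = parse_bridges(BRIDGE_OUTPUT)
for iface in ("eth0", "wlan0", "wlan0-1"):
    print(iface, "->", bridge_of(bridges, iface))
```

An interface that comes back as None is not a member of any bridge, which is the case jeff was warning about for the raw wireless interface.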
I assume I could try to follow your naming convention and name my firewall zones identically to the interfaces, but I do not believe this will change anything.
As I currently have two routers around, I observed that my router only drops out when multiple devices are connected and I play YouTube videos via LAN. With no device attached except the machine playing videos via LAN, I couldn't reproduce the issue.
It’s sounding like it may be an ARP-table problem or something route-related. If you have multiple, interconnected switches, I’d enable STP on all of them. Past that, I would start looking at ARP tables and packets and their flow at the various nodes.
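As a starting point for the ARP side, here is a small sketch that parses a table in the /proc/net/arp format and flags incomplete entries as well as MAC addresses that answer for more than one IP. The sample table and all addresses in it are invented for illustration.

```python
from collections import defaultdict

# Invented sample in the /proc/net/arp layout. On a Linux node the real
# table can be read with open("/proc/net/arp").read().
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.10     0x1         0x2         aa:bb:cc:00:00:10     *        br-lan
192.168.1.11     0x1         0x0         00:00:00:00:00:00     *        br-lan
192.168.1.12     0x1         0x2         aa:bb:cc:00:00:10     *        br-lan
"""

ATF_COM = 0x02  # kernel flag: entry is complete (MAC address resolved)


def parse_arp(text):
    """Return a list of dicts with ip, flags, mac and dev per table row."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        f = line.split()
        if len(f) != 6:
            continue
        entries.append({"ip": f[0], "flags": int(f[2], 16),
                        "mac": f[3], "dev": f[5]})
    return entries


def incomplete(entries):
    """IPs whose MAC address was never resolved."""
    return [e["ip"] for e in entries if not e["flags"] & ATF_COM]


def shared_macs(entries):
    """MAC addresses that are listed for more than one IP."""
    by_mac = defaultdict(list)
    for e in entries:
        if e["flags"] & ATF_COM:
            by_mac[e["mac"]].append(e["ip"])
    return {mac: ips for mac, ips in by_mac.items() if len(ips) > 1}


entries = parse_arp(SAMPLE_ARP)
print("incomplete:", incomplete(entries))
print("shared MACs:", shared_macs(entries))
```

A MAC shared by several IPs is not necessarily a fault (a router or a host with aliases looks like that too), so treat the result as a hint on where to look, not as a diagnosis.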
I now have two Fritz!Box units, both connected directly to the DS-Lite cable router. My complete home network is attached to one of them, and there I can reproduce the internet dropout when playing YouTube videos via LAN. The second one has just a single machine attached, and there I cannot reproduce the issue.
As I do not have full control over the cable router, I might end up with double NAT for IPv4 connections, but that didn't cause an issue in the past. I will look at the config anyway.
If you're a customer of "Unitymedia" you can check your connection type
after logging in to their customer portal. Mine is IPv4, which allows me to
operate the "Connect Box" in bridge mode (modem only, no routing).
If your connection type is DS-Lite then you surely do double NAT.
I do have a DS-Lite connection from Unitymedia with an extremely limited Connect Box router (firewall is off). I've delegated an IPv6 prefix to my OpenWrt router. This worked fine for years with the Buffalo router. After a lot of testing I still see no root cause why the Fritz!Box 4040 drops the internet connection just from playing some YouTube videos via LAN. I've now removed the interfaces for WIFI and EXTERN, so there is just one bridge with lan and the two wireless networks (both disabled for now); maybe that reduces the risk of an IPv6 routing issue. At the moment I believe it is not related to IPv4, as my second Fritz!Box doesn't get an IPv6 prefix delegated (PD), and that might be the reason why it seems to be more stable. But it might be just coincidence; I will run more tests.
Short update. I've now tried an OpenWrt development snapshot, and it also dropped the internet connection after 40 min of playing a YouTube music video. Even after testing multiple different configurations I was not able to find any cause for this issue. I assume that this might be a driver issue, but I have no clue how to verify this.
When downloading a bigger file with wget I get these internet dropouts as well. In the last few days I had to download an Arch Linux ISO image several times, and during many of these downloads the internet and the connection to the router stopped completely again.
Last week the Unitymedia Connect Box was replaced with the new DOCSIS 3.1 Vodafone Station. This thing is a total disaster: it doesn't allow prefix delegation, and the firewall can only be disabled for 24 hours at a time. So my complete network was not usable anymore and I had to replace it with my own Fritz!Box 6591. The setup was a little complicated for IPv6, prefix delegation and an exposed host, but at least my complete network works again now.
But sadly, even with this new Fritz!Box 6591 the situation is still the same... OpenWrt on the Fritz!Box 4040 is still creating the same internet drop outs!
Out of curiosity, I have now bought a Linksys EA8300, which has a Qualcomm IPQ4019 instead of the IPQ4018. After flashing OpenWrt I only made sure that the new router also got the IPv6 prefix delegated (PD) on the WAN6 interface and that the LAN interface got its IPv6 address too.
But when playing the mentioned YouTube video, the Linksys OpenWrt router sadly went down very fast, no difference! The internet connection via a PC connected with a LAN cable and IPv6 was broken as before. After doing many tests, I assume YouTube (or googlevideo) must open the socket via IPv6; only then does the internet connection go down.
I assume this is an IPQ40xx driver issue, or OpenWrt doesn't do IPv6 reliably.
I do not have to delete the WAN6 interface, it is enough to make sure that the desktop with a lan connection is using "IPv4 only" to avoid this issue. I do see this issue only when IPv6 is enabled or with "IPv6 only", but as I use DS-lite disabling IPv4 is not an option.
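To compare the two paths from the LAN desktop without changing the system-wide IP settings, something like the following sketch could be used. It resolves a host for one address family at a time and tries a plain TCP connect. The host name and port in the usage note are only examples, and a successful connect does not prove that a long transfer over that family stays up.

```python
import socket


def resolve(host, port, family):
    """Return the socket addresses host resolves to for one address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return [info[4] for info in infos]


def can_connect(host, port, family, timeout=5.0):
    """Try a TCP connect over the given family; True on first success."""
    for addr in resolve(host, port, family):
        try:
            with socket.socket(family, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                s.connect(addr)
                return True
        except OSError:
            continue
    return False


def compare_families(host, port=443):
    """Return {'IPv4': bool, 'IPv6': bool} for a single host."""
    return {
        "IPv4": can_connect(host, port, socket.AF_INET),
        "IPv6": can_connect(host, port, socket.AF_INET6),
    }
```

Calling compare_families("www.youtube.com") while the router is healthy and again right after a dropout would show whether only the IPv6 path is affected or both.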
But I understand most users will use IPv4 without manually making sure IPv6 works fine and therefore are mostly not affected by this bug.
I can basically confirm this, though some details differ.
I have DS-lite as well. Luckily, my OpenWrt/4040/19.07.02 seems to be a bit more stable than Rainer's most of the time. I usually experience arbitrary hangups every one or two days. On one day, several hangups occurred shortly after each other, but this happened only once. The hangups are not related to high load, they happen "out of the blue".
Except for once, though, I cannot access the box using WiFi. (I could once, but this didn't provide any insights.) I've tried the "standard" switch configuration and others. Especially, I've tried a configuration without WAN-Port/eth1, i.e. I've put the upstream link in a VLAN of its own. Doesn't help.
I haven't tried turning off IPv6 yet, though. I have just found this thread, maybe I'll give it a try later.
I don't have any diagnostics to contribute. I've kept a "logread -f" running in a terminal, and it doesn't show anything when the box goes down. I've had collectd gather some values (okay, that's far from "real-time" with one sample every 10s) and there isn't anything that "announces" the hangup beforehand.
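Since neither the log nor 10-second samples catch the moment of the hangup, a finer-grained probe from a LAN machine might at least pin down when it happens and how long it lasts. Below is a minimal sketch: it polls with a TCP connect once per interval and then condenses the samples into outage intervals. The address 192.168.1.1 and port 80 in the usage note are assumptions (the OpenWrt default LAN address and the LuCI web port), not values from this thread.

```python
import socket
import time


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(probe, interval=1.0, count=60):
    """Call probe() count times and return [(timestamp, ok), ...]."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), bool(probe())))
        time.sleep(interval)
    return samples


def find_outages(samples):
    """Condense time-ordered (timestamp, ok) samples into outage intervals.

    Each interval is (first_failed_ts, first_ok_ts_after); the end is None
    if the target was still unreachable at the last sample.
    """
    outages = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages
```

Example use: find_outages(monitor(lambda: tcp_probe("192.168.1.1", 80), interval=1.0, count=3600)) watches the router for roughly an hour. The timestamps can then be lined up with the collectd graphs and with whatever traffic was running at the time.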
As I've been sick and tired of the instabilities of OpenWrt on two Fritz!Box 4040 units and the Linksys EA8300, I thought I'd give the ZyXEL NBG6817 a try, and wow, this thing just works as reliably as I was used to from the old Buffalo WZR-HP-AG300H. I can only assume that this issue might be related to the amount of RAM. The unstable devices had just 256 MB and I could easily bring them down by just watching a YouTube video or downloading something with wget, git or Steam. The ZyXEL has 512 MB and I think this is the reason why it is so stable (but as it has another chipset, that might play a role as well). The old Buffalo router was stable, from my point of view, because its weak CPU could not download more than 25 MB/s.
I wasted a lot of time, money and nerves. In the end the ZyXEL is the solution for me, and I can say after 2 weeks of uptime that it is worth every penny.
We are running dual-stack OpenWrt 21.02 and have similar issues, so the described commit does not fix the situation. Maybe we should turn off segmentation completely (that is, for IPv4 as well)? Any idea how to debug?