(alternate title: I know it sounds ridiculous, but bear with me)
Summary:
- Since switching ISPs, whenever one particular device on my home network accesses one particular website, everything downstream of the OpenWrt router struggles with internet access.
- This happens even if the device is upstream of the OpenWrt router.
- This is very repeatable and has happened 100% of the time, although the symptoms and severity of the issue do vary.
In detail:
My control setup, i.e. before I switched ISP, when everything worked and life was good:
devices ----- external switch ----- router ----- docsis modem
My instance of OpenWrt acts only as a router, with wifi disabled. An AP is connected to the switch for wifi needs, so the router only has one LAN port connected downstream and one WAN port connected upstream. The router is a GL.iNet GL-B1300 running OpenWrt 22.03.3. The ISP provided a 130/20 service.
After I switched ISP:
devices ----- external switch ----- router ----- fibre ONT
My new ISP offers fibre and I'm on a 150/150 package behind CGNAT. As I'm a naive soul, as soon as the installers left I disconnected the provided Linksys WHW03 V2 and reconnected the B1300 and the rest of my LAN. Things were fine... until the complaints started. The internet became slow, or rather laggy, at seemingly random points in the day, with browsers stalling on requests and video streaming becoming choppy. My healthcheck.io pings from the B1300 started failing too.
Customer services suggested I put the Linksys back, but since I needed the B1300 on the LAN, I went for a double NAT setup:
devices ----- external switch ----- B1300 ----- Linksys (Velop) ----- fibre ONT
This worked... until it didn't. I finally realised that the problems only occur when a user accesses a particular website to watch their Turkish dramas: https://osmanonline.co.uk/v11.
But this is where things get weird. When accessing the website from other devices on the LAN (wifi or otherwise), everything works fine. The only device that takes out the LAN is a Mi Max 2. I've suffered watching an episode for an hour on my Pixel 6a and everything remains stable.
So having the Linksys router in the chain actually makes no difference, but as an experiment, I connected the problematic device directly to the Linksys, and so upstream of the B1300. This resulted in the following quite frankly crazy results:
- The Mi Max 2 can now watch streams via the given website with no interruptions.
- Anything downstream of the B1300 still gets taken out in the same way.
Symptoms:
Whenever problems arise, I've raced to my Ethernet-connected PC to run tests directly on the B1300. I have a curl running every 15 seconds or so:
curl -s http://ipv4.wtfismyip.com/text
With my test just now, as of writing, I saw the following logged (a timestamp with no address after it is a failed request), which suggests the issue lasts for around 1-2 minutes.
Tue 28 Mar 17:35:42 BST 2023 xxx.xxx.xxx.xxx
Thu 30 Mar 00:13:09 BST 2023
Thu 30 Mar 00:14:51 BST 2023 xxx.xxx.xxx.xxx
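For anyone wanting to reproduce this kind of log, here is a minimal sketch of the probe loop, not the exact script I run. The function names, the RUN_PROBE switch and the 10 second curl timeout are all made up for illustration; only the URL and the 15 second interval come from above.

```shell
#!/bin/sh
# Hypothetical helper: build one log line from a timestamp and a curl result.
# An empty result leaves the line with just the timestamp, like the failed
# entry in the log above.
format_probe_line() {
    ts=$1
    result=$2
    if [ -n "$result" ]; then
        printf '%s %s\n' "$ts" "$result"
    else
        printf '%s\n' "$ts"
    fi
}

# Run a single probe. --max-time (an assumed value of 10 seconds) stops a
# stalled request from holding up the next one.
probe_once() {
    result=$(curl -s --max-time 10 http://ipv4.wtfismyip.com/text 2>/dev/null | tr -d '\n')
    format_probe_line "$(date)" "$result"
}

# Only loop when explicitly asked to, so the file can be sourced safely.
if [ "${RUN_PROBE:-0}" = "1" ]; then
    while true; do
        probe_once
        sleep 15
    done
fi
```

Saved as, say, probe.sh, it could be started with RUN_PROBE=1 sh probe.sh and the output redirected to a file on the router.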
At first I thought pings just didn't work, but the actual problem seems to be DNS resolution. Resolution appears slow, and at the worst times I see this:
root@router:~# nslookup euronews.com 1.1.1.1
nslookup: read: Host is unreachable
Server: 1.1.1.1
Address: 1.1.1.1:53
I can't remember the exact behaviour of pings, but I think generally those using IP addresses succeed, while those using names don't even start:
root@router:~# ping euronews.com
ping: bad address 'euronews.com'
root@router:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=54 time=3.361 ms
64 bytes from 1.1.1.1: seq=1 ttl=54 time=3.978 ms
Applications that use IP addresses, or perhaps have a cached resolution, work fine (for example, pings that were already running continue to run fine). Video that was already playing does get choppy though, which I would have thought had nothing to do with DNS once a stream is running.
Internal name resolution also works fine.
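To catch the "IP works, names don't" pattern without racing to the PC, something like the following could be left running on the router. It is only a sketch: classify and check_once are hypothetical names, and it assumes the router's nslookup exits non-zero when a lookup fails, which I haven't verified on this build.

```shell
#!/bin/sh
# Hypothetical helper: turn two exit codes into a one-word diagnosis.
# $1 = exit code of a ping to an IP address, $2 = exit code of a name lookup.
classify() {
    ip_rc=$1
    dns_rc=$2
    if [ "$ip_rc" -eq 0 ] && [ "$dns_rc" -eq 0 ]; then
        echo "ok"
    elif [ "$ip_rc" -eq 0 ]; then
        echo "dns-only failure"
    elif [ "$dns_rc" -eq 0 ]; then
        echo "ping-only failure"
    else
        echo "no connectivity"
    fi
}

# One round of checks using the same targets as the commands above.
# Assumption: nslookup returns a non-zero exit code on failure.
check_once() {
    ping -c 1 -W 2 1.1.1.1 >/dev/null 2>&1
    ip_rc=$?
    nslookup euronews.com 1.1.1.1 >/dev/null 2>&1
    dns_rc=$?
    echo "$(date) $(classify "$ip_rc" "$dns_rc")"
}
```

Calling check_once in a loop with a sleep would give a timeline showing whether the outages are really DNS-only or whether raw IP traffic drops out as well.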
I have logread -f running and don't see anything awry in the logs. Neither do I see any heavy memory or CPU usage. The times I've caught my ISP's technical support on the phone, they don't see any issues on their side - and indeed there aren't any for devices upstream of the B1300. If any wifi devices connect to the Linksys during these problematic windows, they regain fast and responsive internet access.
I can't help but feel the issue is upstream. It's not the Linksys, as the issue occurs even when it's not present (I've kept it there since it serves as an escape route for Turkish dramas). None of this makes sense to me and my flawed and easily dismissible pokes in the dark include:
- The Mi Max 2 is somehow broadcasting bad packets or UDP thingies that are making their way downstream.
- The website is somehow getting my IP throttled from DNS servers.
- Something about the device and website is messing with the CGNAT.
- This is all a crazy dream and not really happening.
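One cheap way to poke at the DNS throttling guess would be to query several resolvers during an outage and see whether they all fail or only some. A rough sketch follows; compare_resolvers and the LOOKUP_CMD override are hypothetical names, and it again assumes nslookup exits non-zero on failure.

```shell
#!/bin/sh
# LOOKUP_CMD is a hypothetical hook so the lookup command can be swapped out;
# it defaults to nslookup, called as: nslookup NAME RESOLVER.
LOOKUP_CMD=${LOOKUP_CMD:-nslookup}

# compare_resolvers NAME RESOLVER...
# Prints one line per resolver: the resolver address followed by OK or FAIL.
compare_resolvers() {
    name=$1
    shift
    for resolver in "$@"; do
        if "$LOOKUP_CMD" "$name" "$resolver" >/dev/null 2>&1; then
            echo "$resolver OK"
        else
            echo "$resolver FAIL"
        fi
    done
}
```

For example, compare_resolvers euronews.com 1.1.1.1 8.8.8.8 9.9.9.9 (the last two are public resolvers I haven't tested here, picked only for illustration). If every resolver fails at once, throttling by a single DNS provider looks unlikely.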
To be honest, I'm not expecting any immediate answers here (apart from banning Turkish dramas or replacing the Mi Max 2), but if this really is as weird as I think it is, I would like to know what can be done to diagnose it. The Linksys router can be flashed with OpenWrt (snapshot), and as a next step I'm tempted to try that to see if it introduces the same issues for devices connected to it.
It's definitely a puzzle, and I'll be updating this post with any further information as I find it.