Help! Turkish Dramas Kill My Network

(alternate title: I know it sounds ridiculous, but bear with me)

Summary:

  • Since switching ISPs, when a single particular device on my home network accesses a single particular website, anything downstream of the openwrt router struggles with internet access.
  • This happens even if the device is upstream of the openwrt router.
  • This is very repeatable and has happened 100% of the time, although the symptoms and severity of the issue do vary.

In detail:

My control setup, i.e. before I switched ISP, when everything worked and life was good:

devices ----- external switch ----- router ----- docsis modem

My instance of openwrt acts only as a router with wifi disabled. An AP is connected to the switch for wifi needs, so the router only has one lan downstream and one wan upstream connected. The model of the router is a GL.iNet GL-B1300 running OpenWrt 22.03.3. The ISP provided a 130/20 service.

After I switched ISP:

devices ----- external switch ----- router ----- fibre ONT

My new ISP offers fibre and I'm on a 150/150 package behind CGNAT. As I'm a naive soul, as soon as the installers left I disconnected the provided Linksys WHW03 V2 and reconnected the B1300 and the rest of my LAN. Things were fine... until the complaints started. The internet became slow, or rather laggy, with browsers stalling on request and video streaming becoming choppy, at seemingly random points in the day. My healthcheck.io pings from the B1300 started failing too.

Customer services suggested I put the Linksys back but since I needed the B1300 on the LAN I went for a double NAT setup:

devices ----- external switch ----- B1300 ----- velop ----- fibre ONT

This worked... until it didn't. I finally realised that the problems only occur when a user accesses a particular website to watch their Turkish dramas: https://osmanonline.co.uk/v11.

But this is where things get weird. When accessing the website from other devices on the LAN (wifi or otherwise), everything works fine. The only device that takes out the LAN is a Mi Max 2. I've suffered watching an episode for an hour on my Pixel 6a and everything remains stable.

So having the Linksys router in the chain actually makes no difference, but as an experiment, I connected the problematic device directly to the Linksys, and so upstream of the B1300. This resulted in the following quite frankly crazy results:

  • The Mi Max 2 can now watch streams via the given website with no interruptions.
  • Anything downstream of the B1300 still gets taken out in the same way.

Symptoms:

I've raced to my ethernet connected PC to run tests directly on the B1300 when problems arise. I have a curl running every 15 seconds or so:

curl -s http://ipv4.wtfismyip.com/text

In the test I ran just now while writing this, I saw the following logged (a line with no IP address being a failed request), which indicates that the issue lasts for around 1-2 minutes.

Tue 28 Mar 17:35:42 BST 2023 xxx.xxx.xxx.xxx
Thu 30 Mar 00:13:09 BST 2023
Thu 30 Mar 00:14:51 BST 2023 xxx.xxx.xxx.xxx
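
The exact polling script isn't shown in the post, but a minimal sketch of that kind of loop could look like the following. The function names, the PROBE_CMD variable and the 10 second timeout are assumptions made for illustration; only the curl URL comes from the post.

```shell
#!/bin/sh
# Hypothetical probe helper. PROBE_CMD is an assumed knob so the fetch
# command can be swapped out; the default mirrors the curl call above.
PROBE_CMD="${PROBE_CMD:-curl -s --max-time 10 http://ipv4.wtfismyip.com/text}"

probe_once() {
    # Run the probe once; an empty result means the request failed or timed out.
    result="$($PROBE_CMD 2>/dev/null)" || result=""
    printf '%s %s\n' "$(date)" "$result"
}

probe_loop() {
    # $1 = seconds between probes (defaults to 15)
    while true; do
        probe_once
        sleep "${1:-15}"
    done
}
```

Running something like "probe_loop 15 >> /tmp/probe.log" would leave a timestamped record in which lines with nothing after the date mark the failed requests.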

At first I thought pings just didn't work, but the actual problem seems to be DNS resolution. Resolution appears slow, and at the worst times I see this:

root@router:~# nslookup euronews.com 1.1.1.1
nslookup: read: Host is unreachable
Server:		1.1.1.1
Address:	1.1.1.1:53

I can't remember the exact behaviour of pings, but I think generally those using IP addresses succeed, while those using names don't even start:

root@router:~# ping euronews.com
ping: bad address 'euronews.com'
root@router:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=54 time=3.361 ms
64 bytes from 1.1.1.1: seq=1 ttl=54 time=3.978 ms

Applications that use IP addresses, or perhaps have a cached resolution, work fine (for example, pings that were already running continue to run fine). However, video that was already playing gets choppy, which I would have thought had nothing to do with DNS once the stream is running.

Internal name resolution also works fine.

I have logread -f running and don't see anything awry in the logs. Neither do I see any heavy memory or CPU usage. The times I've caught my ISP's technical support on the phone, they don't see any issues on their side - and indeed there aren't any for devices upstream of the B1300. If any wifi devices connect to the Linksys during these problematic windows, they regain fast and responsive internet access.

I can't help but feel the issue is upstream. It's not the Linksys, as the issue occurs even when it's not present (I've kept it there since it serves as an escape route for Turkish dramas). None of this makes sense to me and my flawed and easily dismissible pokes in the dark include:

  • The Mi Max 2 is somehow broadcasting bad packets or UDP thingies that are making their way downstream.
  • The website is somehow getting my IP throttled from DNS servers.
  • Something about the device and website is messing with the CGNAT.
  • This is all a crazy dream and not really happening.

To be honest I'm not expecting any immediate answers here (apart from banning Turkish dramas or replacing the Mi Max 2), and if this really is as weird as I think it is I would like to know what can be done to diagnose this. The Linksys router can be flashed with OpenWrt (snapshot), and as a next step I'm tempted to try that to see if it introduces the same issues for devices connected to it.

It's definitely a puzzle, and I'll be updating this post with any further information as I find it.

This doesn't sound like something OpenWrt related... but there is a key piece of information that is missing here (or if it was included, I missed it in your writeup)...

  • What happens if you remove the OpenWrt device entirely? Do you still experience the same issues?

It seems to me that the issue is strictly related to the Mi Max 2 device, which means that whatever it is doing, it's causing the problems. Maybe it is opening too many connections and overwhelming the NAT tables (possibly at the CG-NAT layer)? In the double-NAT situation, the OpenWrt lan would be firewalled relative to the upstream, so the Mi device cannot broadcast anything that would make it to the lan itself, although it could produce a bunch of unnecessary traffic at the OpenWrt WAN.
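
One way to check the "too many connections" theory on the OpenWrt side is to count connection-tracking entries per source address. This is only a sketch: it assumes the kernel exposes the table at /proc/net/nf_conntrack, the function name is made up for illustration, and it only sees traffic routed through the OpenWrt device, so it says nothing about the CG-NAT layer or about a device plugged in upstream of it.

```shell
#!/bin/sh
conntrack_by_source() {
    # $1 = conntrack table file (assumed default: /proc/net/nf_conntrack)
    # Count entries per original source address, i.e. the first src= field
    # on each line, and print "count address" sorted with the busiest first.
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) { count[substr($i, 5)]++; break }
        }
    } END { for (ip in count) print count[ip], ip }' \
        "${1:-/proc/net/nf_conntrack}" | sort -rn
}
```

If one client holds a large share of the entries while streaming, and the total approaches the value in /proc/sys/net/netfilter/nf_conntrack_max, that would support the theory for the local router at least.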

It's tricky (read: inconvenient) to take openwrt out of the equation since a lot of my LAN relies on static DHCP leases and local name resolution. I might stop it from routing as a next step, although the fact that it was the b1300 that couldn't resolve names indicated to me that the issue was there - not necessarily with openwrt, but with its configuration.

After writing up and marinating on it, I realised that:

  1. The website/device combo is somehow messing something up DNS related upstream, regardless of how it connects.
  2. The Linksys doesn't suffer from this fallout as it's using different nameservers.
  3. Given that it's unlikely I'm crashing cloudflare or quad9, the implication is that the CGNAT is hijacking or routing all DNS requests strangely.

So my next step is to change the DNS servers on the b1300 to the same as the Linksys. Actually one other thing to try is name resolution outside of the router.

Is there a way to determine that 1.1.1.1:53 is actually taking me to cloudflare?
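
One hedged way to probe this is to ask the resolver to identify itself. The sketch below assumes dig is installed (it is not part of a default OpenWrt image) and that Cloudflare's resolver still answers the CHAOS-class TXT query for id.server with a short location code; the function name is invented for illustration. A plausible answer is a hint rather than proof, since an intercepting box could forward the query to Cloudflare anyway.

```shell
#!/bin/sh
resolver_identity() {
    # $1 = resolver IP (defaults to 1.1.1.1)
    # Ask the resolver for its own identifier and strip the TXT quotes.
    dig +short +time=3 +tries=1 "@${1:-1.1.1.1}" id.server CH TXT | tr -d '"'
}
```

An unexpected answer, or none at all while normal lookups still succeed, would suggest that something other than Cloudflare is answering on 1.1.1.1:53.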

Because you already identified that, I would put the misbehaving device under control of the OpenWrt router, and enable/configure SQM (QoS).

If you want to know where that DNS is pointing you, what about configuring it as your only DNS server on Windows and running nslookup?

And if you want to know the intermediate steps, there is traceroute (or tracert).

I would start by putting the mimax2 behind the openwrt router and get a packet capture during "drama streaming" and start poking in there to see whether anything odd is happening... I would also try pinging 1.1.1.1 from a different machine while streaming.
Finally, is the streaming site the mimax2 uses in any way related to your ISP?

dnsleaktest.com is a nice place to check which dns-servers you're using.

A DNS leak test can give you some degree of certainty, but cannot prove that DNS requests have not been modified or subjected to content filtering.
They may preserve the destination but inspect or alter the content, unless you utilize DNS encryption.

I did actually try with SQM on and off, and it didn't make a difference - the Turkish dramas are not saturating my line either.

So some progress:

  1. During the problematic times, running nslookup against the Linksys router works perfectly.
  2. Hence, after removing the custom DNS servers from the B1300, the rest of the downstream LAN also became immune to the problem.
  3. At this point there's no need for the Linksys, so that's been removed - and the DNS server I now get from the ONT has an IP address that belongs to my ISP.
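
To confirm which upstream servers the router has actually picked up after a change like this, the resolv file that dnsmasq reads can be listed. This is a sketch under an assumption: on recent OpenWrt releases that file is expected at /tmp/resolv.conf.d/resolv.conf.auto, but the path may differ on other versions, and the function name is hypothetical.

```shell
#!/bin/sh
upstream_nameservers() {
    # $1 = resolv file (the default is an assumed OpenWrt path; adjust if needed)
    # Print only the addresses from "nameserver" lines, one per line.
    awk '$1 == "nameserver" { print $2 }' \
        "${1:-/tmp/resolv.conf.d/resolv.conf.auto}"
}
```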

So a much happier state of affairs, provided I don't want to use alternative DNS servers - nslookup against cloudflare still fails. At this point then the exercise becomes academic, but I'd still like to crack it.

Reverting back to 1.1.1.1 as my WAN DNS and running dnsleaktest.com:

Query round	Progress...	Servers found
1		......		5
2		......		5
3		......		5
4		......		4
5		......		4
6		......		4

IP 	Hostname 	ISP 	Country
141.101.70.168 	None 	Cloudflare 	London, United Kingdom
141.101.70.170 	None 	Cloudflare 	London, United Kingdom
172.70.161.116 	None 	Cloudflare 	London, United Kingdom
172.70.161.117 	None 	Cloudflare 	London, United Kingdom

etc

Which looks okay.

DNS fails in the same way everywhere (i.e. when not using the ISP's server).

I think this is the next real step - how would I do this in a useful way? I presume the video would make packet capture noisy?

I suppose the Turkish video stream is encrypted, so if you capture packets involving the mimax2's IP address during playback, you will have to filter out the many video packets and focus on what other traffic could be affecting your entire network. For instance, look at which other sites the other DNS server was pointing the mimax2 at, and try to understand why this affected the whole network.

So if I understand correctly, your ISP seems to intercept DNS to 1.1.1.1 while streaming from osmanonline.co.uk? Do you see the same issue with any other non-ISP DNS (8.8.8.8, 9.9.9.9)?
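
A quick way to answer that during a problematic window is to run the same lookup against several resolvers in a row. The sketch below uses invented function names and assumes that the installed nslookup exits non-zero when a lookup fails, which may vary between implementations.

```shell
#!/bin/sh
check_resolver() {
    # $1 = name to resolve, $2 = resolver IP
    if nslookup "$1" "$2" >/dev/null 2>&1; then
        echo "$2 ok"
    else
        echo "$2 FAIL"
    fi
}

check_resolvers() {
    # $1 = name to resolve, remaining arguments = resolver IPs to try in order
    name="$1"
    shift
    for resolver in "$@"; do
        check_resolver "$name" "$resolver"
    done
}
```

For example, "check_resolvers euronews.com 1.1.1.1 8.8.8.8 9.9.9.9" prints one ok or FAIL line per resolver; adding the ISP's own server to the list gives a direct comparison.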

However since you only see this with the mimax2 as client, it might be that the mimax2 is actually doing something nefarious here? Now, if you stream anything else on that device, or stream http://osmanonline.co.uk from other devices you do not see the issue, correct?

Or that it "poisons" your DNS server's cache somehow (grasping at straws here obviously; that does not seem like a normal or expected thing, unless that mimax2 has been hacked and is no longer faithfully obeying your commands).

If that were the case, would we not expect that using the upstream Linksys would result in the same problem? This same argument also implies that it is not your ISP playing DNS games here...

I would certainly have a close look at that device, something uncouth is happening here which might be a sign of deeper problems...
Side question: do you always access osmanonline.co.uk via a browser, and not via an app on mobiles? If the latter, are all mobiles using the same version of that app, and can you confirm the app is "legit"?

Thanks for testing, this was a long-shot anyway, as SQM really does not interact with DNS in any significant way....

OK.

When you say "any" which did you try?

Good that you found a work-around, but I would not accept that as a solution, there is something quite fishy going on here.

The video should go to a single IP address or a small set of them and hence be easy to ignore (it will also transport a lot of data, so such flows should be easy to identify).

I would look at DNS packets, DHCP packets, and potentially ARP packets as well, just to be on the safe side.
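
A capture limited to those three protocols leaves the video traffic out entirely. The sketch below only builds the pcap filter string; the function name is hypothetical, and tcpdump itself would need to be installed on the router. Matching on the device's MAC address rather than its IP is a deliberate choice here, because DHCP and ARP frames do not always carry the client's IP address.

```shell
#!/bin/sh
capture_filter() {
    # $1 = optional MAC address of the device of interest
    # Build a pcap filter for DNS (port 53), DHCP (ports 67/68) and ARP.
    base='port 53 or port 67 or port 68 or arp'
    if [ -n "$1" ]; then
        printf '(%s) and ether host %s\n' "$base" "$1"
    else
        printf '%s\n' "$base"
    fi
}
```

It could then be used along the lines of tcpdump -i br-lan -n -w /tmp/drama.pcap "$(capture_filter)", where br-lan and the output path are assumptions about the local setup.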

Try setting up DoT as it does not depend on UDP.
All UDP traffic can be affected by protocol-based traffic shaping.
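
Before setting up DoT, a rough check of the "UDP is being singled out" idea is to run the same query over UDP and over TCP. This is not DoT itself (which uses TLS on port 853), only a sketch that compares plain DNS on both transports. It assumes dig is installed and relies on dig exiting non-zero when no reply arrives; the function name is made up.

```shell
#!/bin/sh
dns_transport_check() {
    # $1 = name to resolve, $2 = resolver IP
    # Query once over UDP (+notcp) and once over TCP (+tcp), reporting each.
    for flag in +notcp +tcp; do
        if dig +short +time=3 +tries=1 "$flag" "$1" "@$2" >/dev/null 2>&1; then
            status=ok
        else
            status=FAIL
        fi
        case "$flag" in
            +tcp) echo "tcp $status" ;;
            *)    echo "udp $status" ;;
        esac
    done
}
```

If "dns_transport_check euronews.com 1.1.1.1" reports udp FAIL but tcp ok during a bad spell, that would point towards UDP-specific handling somewhere upstream.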

@moeller0

Thanks for the comprehensive analysis. I had 1.1.1.1 (Cloudflare), 9.9.9.9 and 149.112.112.112 (Quad9) set up as my DNS servers. The website is via chrome, not an app. The other clients I tested with use script blocking and the like, which might explain why the mimax was the only device that causes the issue.

I don't know if my ISP is intercepting DNS requests or not. My current suspicion is that the website causes some issue that stops me from making DNS requests (maybe it's a throttling measure?):

(website makes spurious DNS requests to ISP) + (ISP DNS forwarding requests to upstream DNS) + CGNAT = some kind of temporary blacklisting of my shared IP

As this happens even if the mimax isn't using 1.1.1.1, the issue seems upstream of the ISP's DNS server, so it could be pretty serious, especially if it's causing issues for other users behind CGNAT. However, since the ISP's DNS server remains unaffected, most users (who would have been assigned a server via DHCP) won't even notice.

Since posting I have been taken out from behind CGNAT and am no longer able to recreate the issue (which is evidence in itself!), which means that unfortunately this investigation ends here... but thanks to everyone who had a look!

Yepp, that looks like negative interference with CG-NAT... I just wish that IPv6 deployment will speed up so we can relegate CG-NAT to a few legacy use-cases, one day...