Linksys wrt3200 latency spikes & request timed out

yeah, I'm not sure about the frequency scaling either. I found only a read-only file:

~# find /sys -name "*gov*"
/sys/devices/system/cpu/cpuidle/current_governor_ro

~# cat `find /sys -name "*governor*"`
ladder

So I'm guessing you can't change it. But I'm also guessing that if this were the cause, more people would be complaining about wired routing, and usually they only complain about the wifi on these devices.
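
One note on the file found above: `cpuidle/current_governor_ro` is the idle-state governor, which is separate from frequency scaling. cpufreq governors, when a cpufreq driver is loaded, show up as `cpu*/cpufreq/scaling_governor`. Here is a minimal sketch that looks for both. It assumes Python is available on the box you run it on (a stock OpenWrt image usually doesn't ship it), and the helper name `read_governors` is made up for illustration.

```python
from pathlib import Path


def read_governors(sys_root="/sys"):
    """Return {path: value} for the cpuidle and cpufreq governor files that exist."""
    cpu_dir = Path(sys_root) / "devices/system/cpu"
    candidates = [
        cpu_dir / "cpuidle/current_governor_ro",  # idle-state governor (the file found above)
        cpu_dir / "cpuidle/current_governor",     # writable variant on some kernels
    ]
    # Frequency-scaling governors, only present when a cpufreq driver is loaded.
    candidates += sorted(cpu_dir.glob("cpu[0-9]*/cpufreq/scaling_governor"))

    found = {}
    for path in candidates:
        if path.is_file():
            found[str(path)] = path.read_text().strip()
    return found


if __name__ == "__main__":
    for path, value in read_governors().items():
        print(f"{path}: {value}")
```

If no `scaling_governor` line is printed, there is probably no frequency scaling to tune on that kernel, which would fit the `find` output above.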

EDIT: perhaps there's something that was introduced in recent versions. try downgrading to something like 18.06.5 https://downloads.openwrt.org/releases/18.06.5/targets/mvebu/cortexa9/ and then if that doesn't work, try something like 18.06.3

if that doesn't work... :frowning:

@dana44 That's really interesting, do you run all your devices over wifi or experience it with cable as well?

@dlakelan I do my best not to speculate as I feel that I'm not knowledgeable enough to do so, but I can swear that it worked flawlessly until some time between 18.04 and 18.06. It's really subtle and for the longest time I thought it was my computer (even reinstalled from win7 to 10 thinking it could be related), then started reading about sqm and bufferbloat (which on the positive side has taught me a lot thanks to you, @moeller0 and many others on this forum), and lastly I was sure it was due to running wireguard at the time when the problems started. It's all so very confusing

Edit: Oh it's a great idea actually about the downgrade, I just wonder at what version to start. I see now that I incorrectly wrote 18.04, I must be thinking of 18.06.4. Somewhere between 18.06.4 and 18.06.6 I'm pretty confident that the problem started to occur

The releases were numbered 17.01 and 18.06; there was no 18.04, though there was an 18.06.4.

Try backing off to 18.06.3 which was released Jun 29 2019 and see if it works ok. if it does this is useful info for developers.

I do not believe this is a bufferbloat issue. It's a processor freeze, missed interrupts, a switch disconnect, or something like that.

Hmm, already around 10 minutes of running 18.06.3 :thinking:

:cry: :thinking:

The thing I think you could try, if you have the energy for it, is to search for the version where this first occurs... For example you might try 18.06.1 and 17.01.5.

If you have to go back before 17.01.4 (which was from 2017) then it's probably not worth it.
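
The version search can be done as a bisection, so you flash about three builds instead of all of them. A rough sketch, assuming the problem stays once it appears (every release after the first bad one is also bad). The list holds only the releases named in this thread, and `first_bad_release` and `is_bad` are illustrative names, not an existing tool.

```python
def first_bad_release(releases, is_bad):
    """Binary-search an oldest-to-newest list for the first release where is_bad() is True.

    Assumes all releases before the regression are good and all after it are bad.
    Returns None if even the newest release is good.
    """
    lo, hi = 0, len(releases)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return releases[lo] if lo < len(releases) else None


# Releases mentioned in this thread, oldest first.
RELEASES = ["17.01.4", "17.01.5", "18.06.1", "18.06.2",
            "18.06.3", "18.06.4", "18.06.5", "18.06.6"]

# Example with a pretend answer: spikes show up from 18.06.4 onward.
pretend_bad = {"18.06.4", "18.06.5", "18.06.6"}
print(first_bad_release(RELEASES, lambda r: r in pretend_bad))  # prints 18.06.4
```

In practice `is_bad` would be you flashing that release and watching mtr. Since the spikes are intermittent, a "good" verdict needs a long enough run, otherwise the search goes the wrong way.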

I don't know much about what has changed on this device at what time points / versions. It's becoming more of a hardware debugging problem than a latency / QoS type problem, which isn't my specialty at all :wink:

If you decide you want to replace this device you might be interested in the testing I did on the RPi4. You could potentially keep this device as an AP and put one of those in as a router. You could even configure the switch on this WRT device to provide the necessary VLAN tagging stuff.

The WRT32x/3200 was a device with a lot of promise ruined by bad manufacturer support, and now Marvell has sold off this division... It might be time to give up on it. I plan to scrap mine (operating as an AP only) once enterprise APs with WiFi 6 are available.

Maybe my router has simply become broken? :thinking:

That wouldn't explain why you get hours of good results from the manufacturer firmware.

I have a sneaking suspicion that it's a switch bug when operating in 100Mbps mode that only affects a smallish number of people because most people have 1Gig ports these days. If you had a switch we could test it... I'd suggest the tp-link sg108e as a cheap choice with features you could use if you decide to replace your router with something like a Pi4 or an x86... for people with gaming demands who want QoS features I recommend wired routers + separate wifi AP anyway.

I have a WRT32x operating as an AP connected to a gigabit switch, and I'm running mtr on it now. I'll let it run for 1000 pings or so and see if I get any similar results. It's running 18.06.2. I tried upgrading to the 19 series but had some issues, so I'm waiting for the next point release... Since you experience this even down in the 17 series, if there's a problem under my own conditions, I should see it.

So far at 300+ pings no weird outliers like that.

1035 pings and no outliers.... so that explains why the forum isn't filled with people with pitchforks... It'd be really interesting if putting a switch between your "wall" plug and the WRT device fixed it.

Thank you so much for taking the time to test with your own unit of the same hardware. Right after posting, I reverted from the 17.01 build back to the Linksys stock firmware, and so far I'm at 3300 pings with "worst" registered as 373ms at one of the later switches/nodes down the line (I don't know if this is normal?). In any case I haven't noticed any freeze/spike in game.

Edit: Never mind, it just jumped to 2000ms with the stock firmware now also :open_mouth: I'll try to disconnect the router completely again and run directly to the wall. It sounds more and more like it's the router's built-in switch that you talked about. Or if the problem continues now with just an ethernet cable, I guess it's flaps from my ISP?

Yeah, your ISP could have a bad cable / faulty hardware at that hop! That would explain all of the observed behavior. When it was only in OpenWrt not in stock firmware, that wasn't an option, but if you see the behavior in stock firmware, and it always happens at that particular hop, it's looking like your ISP should go swap out that cable and / or do some investigation of other hardware issues in that server cabinet.

Increased RTT on intermediary hops along a network path is not important and is most likely caused by ICMP rate limiting and/or ICMP prioritization (see https://archive.nanog.org/sites/default/files/10_Roisman_Traceroute.pdf for information on how to interpret traceroute/mtr results).

Yes, both decent hypotheses worth exploring.

I think it's almost safe to rule out the ISP flaps hypothesis. The stats below are with the computer directly connected to the "internet-socket" on the wall.

> 
> Ping statistics for 216.58.207.238:
>     Packets: Sent = 9757, Received = 9757, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
>     Minimum = 10ms, Maximum = 24ms, Average = 10ms

WinMTR statistics

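| Host | Loss % | Sent | Recv | Best (ms) | Avg (ms) | Worst (ms) | Last (ms) |
|---|---|---|---|---|---|---|---|
| Request timed out. | 100 | 1960 | 0 | 0 | 0 | 0 | 0 |
| Request timed out. | 100 | 1960 | 0 | 0 | 0 | 0 | 0 |
| a258-gw.bahnhof.net | 1 | 9778 | 9777 | 0 | 0 | 112 | 0 |
| ume-ftp-dr1.svl-cr1.bahnhof.net | 0 | 9782 | 9782 | 4 | 4 | 16 | 4 |
| svl-cr1.sto-cr1.bahnhof.net | 0 | 9783 | 9783 | 10 | 10 | 17 | 10 |
| sto-cr1.sto-ixa-er1.bahnhof.net | 0 | 9782 | 9782 | 10 | 10 | 26 | 10 |
| 72.14.211.124 | 0 | 9783 | 9783 | 10 | 10 | 24 | 10 |
| Request timed out. | 100 | 1960 | 0 | 0 | 0 | 0 | 0 |
| arn09s19-in-f14.1e100.net | 0 | 9783 | 9783 | 10 | 10 | 17 | 10 |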
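
For anyone who wants to compare several of these runs, the space-separated text that WinMTR exports can be pulled apart with a few lines of Python. This is only a sketch: it assumes each data row ends in exactly seven integer columns (loss %, sent, recv, best, avg, worst, last) and that anything before them is the host name, which may itself contain spaces. `Hop` and `parse_winmtr` are names invented for this example.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    host: str
    loss_pct: int
    sent: int
    recv: int
    best: int
    avg: int
    worst: int
    last: int


def parse_winmtr(text):
    """Parse WinMTR text rows; the last 7 fields are integers, the rest is the host."""
    hops = []
    for line in text.strip().splitlines():
        parts = line.rsplit(None, 7)
        if len(parts) != 8 or not all(p.isdigit() for p in parts[1:]):
            continue  # header row or anything else that is not a data row
        hops.append(Hop(parts[0], *map(int, parts[1:])))
    return hops


# The direct-to-wall run from this post.
SAMPLE = """\
Host % Sent Recv Best Avrg Wrst Last
Request timed out. 100 1960 0 0 0 0 0
Request timed out. 100 1960 0 0 0 0 0
a258-gw.bahnhof.net 1 9778 9777 0 0 112 0
ume-ftp-dr1.svl-cr1.bahnhof.net 0 9782 9782 4 4 16 4
svl-cr1.sto-cr1.bahnhof.net 0 9783 9783 10 10 17 10
sto-cr1.sto-ixa-er1.bahnhof.net 0 9782 9782 10 10 26 10
72.14.211.124 0 9783 9783 10 10 24 10
Request timed out. 100 1960 0 0 0 0 0
arn09s19-in-f14.1e100.net 0 9783 9783 10 10 17 10
"""

dest = parse_winmtr(SAMPLE)[-1]  # the final destination is the row that matters most
print(f"{dest.host}: {dest.loss_pct}% loss, worst {dest.worst} ms")
```

For this run the last row shows 0% loss and a worst RTT of 17 ms at the destination.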

As a pure guess, how likely is it that EEE is the problem, so that I'd only need to buy a switch, versus the router's own switch being the issue, so that I should start looking at a Pi 4 and a switch?

Oh and @dlakelan it wasn't at a particular hop when it spiked to 2000ms, it showed up at all hops including in my "cmd ping google maximum stats" at the desktop computer

well, it starts at that 3rd hop and then all hops thereafter, suggesting that the packets don't get past that hop...

if that one hop stopped responding, while all the rest continued, it could be just because it doesn't like to respond to ping... but it still routes your packets... but in your case, it blocked all further progress of all packets... meaning either your ISP has a problem at that hop, or your router itself does.

Aha yes I see! Does that somehow relate to the fact that at times I don't notice the super high spikes (but have them recorded in the tests), while at other times there's very clear freezing for a second or two during games, and in yet other cases it freezes for 4-8 seconds (which seems to correlate more with the "destination host unreachable" and "request timed out" messages, from my very amateur observation at least)?

Regarding the few router units that had the faulty "switch board", the store actually sold wrt3200 at a pretty hefty discount for a short period when I bought it :joy: But still I bought it in February 2018 and I think the model had been out for quite a while then already? I'm thinking again if EEE could possibly be the problem or not?

Given that the first two hops do not respond at all (100% packet loss) I think it is safe to assume that the source of the delays is the router's uplink...

Well, that still leaves interference between the switch ports of your router and the ISP's switch...

Well, that is the pattern you see if one hop truly introduces latency: an increased delay to all hops after that one.

The "request timed out" at the first 2 hops can't be related to my apartment being connected to an "open net/city network"? That is, I can choose among many different ISPs and the "apartment switch" isn't owned by bahnhof, but by a third-party company. So I'm guessing that "a258-gw.bahnhof.net" is further away in the link, but now I'm just speculating out of the blue.

Edit: But yea I'm guessing it shouldn't show 100% loss anyway?

Makes sense, also it makes sense for the city net to have their gear not respond to ICMP echo requests so that you keep your support questions with the ISP; in your case though that might not be ideal, as the issues seem to be between you and your ISP so the "city network" components might be involved.

I interpret that as an indicator that the administrator of those boxes configured them to not respond to ICMP requests, and in that case a 100% loss is to be expected.
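
That reading can be written down as a small rule of thumb: a hop that never answers, while hops behind it do answer, is just silent and not a sign of loss. A sketch using a subset of rows from the direct-to-wall run posted earlier. The function name and the verdict strings are made up here, and a one-packet difference between sent and received may simply be a probe still in flight when the run was stopped.

```python
def classify_hops(hops):
    """hops: list of (host, sent, recv) in path order. Returns a list of (host, verdict)."""
    verdicts = []
    for i, (host, sent, recv) in enumerate(hops):
        later_reply = any(r > 0 for _, _, r in hops[i + 1:])
        if recv == 0 and later_reply:
            verdict = "silent: does not answer probes but still forwards"
        elif recv == 0:
            verdict = "no replies from here on"
        elif recv < sent:
            verdict = f"lost {sent - recv} of {sent}"
        else:
            verdict = "ok"
        verdicts.append((host, verdict))
    return verdicts


# (host, sent, recv) taken from the direct-to-wall WinMTR run.
DIRECT_RUN = [
    ("Request timed out.", 1960, 0),
    ("Request timed out.", 1960, 0),
    ("a258-gw.bahnhof.net", 9778, 9777),
    ("72.14.211.124", 9783, 9783),
    ("Request timed out.", 1960, 0),
    ("arn09s19-in-f14.1e100.net", 9783, 9783),
]

for host, verdict in classify_hops(DIRECT_RUN):
    print(f"{host}: {verdict}")
```

All three "Request timed out." rows come out as silent, because the destination behind them answers every probe.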

Okay, so just so I understand, the last WinMTR + ping in CMD that I posted was with the desktop computer connected directly to the "internet-socket" on the wall. Is it the "Worst 112" spike that makes you think this is all an ISP issue? Also, in the "cmd ping" this spike isn't shown at all, as the maximum is registered as "24ms".

Compared to all other tests I've run, this and the previous one without the router look flawless to me when put against the runs with the router connected. What I'm trying to get at is: with almost 10,000 packets and one spike at 112ms, would ISPs even label that as flaps? I still can't shake the feeling that it can still be related to the router, but if you say otherwise I'll of course listen to you and contact my ISP. Also I hope this doesn't go off-topic even though the problem isn't directly related to OpenWrt firmware, I'm really outside my knowledge-base for how to diagnose this issue alone :pensive:

No, that MTR and ping result looks just fine. None of the intermediary hops is guaranteed to send ICMP responses at all, so the first thing to look at is the response pattern of the final destination:

and that is excellent. But this is with your PC hooked up directly... So when you connect via your router the spikes appear, indicating that either the router is broken or that the router's switch ports and your city net's ports do not play nice with each other...

Aha, yes now I get it and understand what you meant with

And I'm guessing the only way to find out the answer to the quotes below is to either
a) buy a switch (and hope it's a problem with energy efficient Ethernet as dlakelan said)
b) buy a switch and a Pi 4 and only use the wrt3200 as an AP
c) buy a new router

I really do like OpenWrt and its interface, plus it's super easy to set up the wireguard client in the GUI, so I'm guessing buying a new router may be the way to go?

Edit: plugged in the router running the stock firmware again just for fun (and tried moving the router's power supply from a power strip to a power socket on the wall) and within a few minutes it shows this

WinMTR statistics

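| Host | Loss % | Sent | Recv | Best (ms) | Avg (ms) | Worst (ms) | Last (ms) |
|---|---|---|---|---|---|---|---|
| Linksys04580 | 0 | 1083 | 1083 | 0 | 0 | 2 | 0 |
| Request timed out. | 100 | 220 | 0 | 0 | 0 | 0 | 0 |
| Request timed out. | 100 | 220 | 0 | 0 | 0 | 0 | 0 |
| a258-gw.bahnhof.net | 1 | 1071 | 1065 | 0 | 1 | 1006 | 1 |
| ume-ftp-dr1.svl-cr1.bahnhof.net | 1 | 1070 | 1064 | 4 | 5 | 1011 | 4 |
| svl-cr1.sto-cr1.bahnhof.net | 1 | 1070 | 1064 | 10 | 11 | 1017 | 10 |
| sto-cr1.sto-ixa-er1.bahnhof.net | 1 | 1070 | 1064 | 10 | 11 | 1017 | 10 |
| 72.14.211.124 | 1 | 1070 | 1064 | 10 | 11 | 1017 | 10 |
| Request timed out. | 100 | 220 | 0 | 0 | 0 | 0 | 0 |
| 72.14.238.12 | 1 | 1070 | 1064 | 11 | 12 | 1018 | 11 |
| 108.170.254.54 | 1 | 1070 | 1064 | 10 | 12 | 1017 | 12 |
| 108.170.253.177 | 1 | 1070 | 1064 | 11 | 12 | 1018 | 11 |
| 209.85.242.99 | 1 | 1069 | 1063 | 11 | 12 | 1018 | 11 |
| arn09s19-in-f14.1e100.net | 1 | 1070 | 1064 | 10 | 11 | 1017 | 10 |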
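
Applying the rule discussed earlier in the thread (a real problem at one hop shows up at that hop and at every hop after it) to this run could look like the sketch below. It walks back from the destination for as long as the worst RTT stays above a threshold. The 500 ms threshold and the function name are assumptions for illustration, and the "worst" values of different hops are not guaranteed to come from the same moment, so treat the result as a hint only.

```python
def first_persistent_spike(hops, threshold_ms):
    """hops: list of (host, recv, worst_ms) in path order.

    Returns the first replying hop from which every later replying hop,
    including the destination, has worst RTT above threshold_ms; else None.
    """
    replying = [(host, worst) for host, recv, worst in hops if recv > 0]
    result = None
    for host, worst in reversed(replying):
        if worst > threshold_ms:
            result = host   # spike still visible this far back
        else:
            break           # first hop (from the end) without the spike
    return result


# (host, recv, worst ms) from the run with the router on stock firmware.
ROUTER_RUN = [
    ("Linksys04580", 1083, 2),
    ("Request timed out.", 0, 0),
    ("Request timed out.", 0, 0),
    ("a258-gw.bahnhof.net", 1065, 1006),
    ("ume-ftp-dr1.svl-cr1.bahnhof.net", 1064, 1011),
    ("svl-cr1.sto-cr1.bahnhof.net", 1064, 1017),
    ("sto-cr1.sto-ixa-er1.bahnhof.net", 1064, 1017),
    ("72.14.211.124", 1064, 1017),
    ("Request timed out.", 0, 0),
    ("72.14.238.12", 1064, 1018),
    ("108.170.254.54", 1064, 1017),
    ("108.170.253.177", 1064, 1018),
    ("209.85.242.99", 1063, 1018),
    ("arn09s19-in-f14.1e100.net", 1064, 1017),
]

print(first_persistent_spike(ROUTER_RUN, threshold_ms=500))  # prints a258-gw.bahnhof.net
```

The router's own LAN side (worst 2 ms) is clean and the spike is present from the first replying hop behind it, so the trouble sits somewhere between the router and that gateway, which includes the router's uplink port and the silent city-net gear.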

The Pi runs OpenWrt, and that's a super popular download at the moment so lots of people seem to be doing that... not that you should definitely get a Pi but don't choose something else because you think the Pi can't do OpenWrt.

I suspect you have a hardware fault somewhere... since it doesn't happen with your laptop plugged direct to the wall... I'd say it's the WRT device. Only problem with using it as an AP is that it probably would still have the problem and now WiFi clients would be subject to the problem, while wired wouldn't...

I think if I were you I'd be ditching the WRT, buying a switch and a Pi4 and shopping around for access points.

Another option if you have relatively low speed internet (say below 150Mbps) is something from gl-inet with lightly customized OpenWrt installed out of the box.
