Linksys wrt3200 latency spikes & request timed out

dmesg

[   15.163458] nf_conntrack version 0.5.0 (8192 buckets, 32768 max)
[   15.170903] mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
[   15.177901] mwifiex_sdio mmc0:0001:1: WLAN FW is active
[   15.189436] Bluetooth: RFCOMM TTY layer initialized
[   15.194372] Bluetooth: RFCOMM socket layer initialized
[   15.199549] Bluetooth: RFCOMM ver 1.11
[   15.203722] mwifiex_sdio mmc0:0001:1: Unknown api_id: 4
[   15.206439] xt_time: kernel timezone is -0000
[   15.215398] PPP generic driver version 2.4.2
[   15.220085] NET: Registered protocol family 24
[   15.225273] kmodloader: done loading kernel modules from /etc/modules.d/*
[   15.264991] mwifiex_sdio mmc0:0001:1: info: MWIFIEX VERSION: mwifiex 1.0 (15.68.7.p189)
[   15.273148] mwifiex_sdio mmc0:0001:1: driver_version = mwifiex 1.0 (15.68.7.p189)
[   16.517331] mvneta f1034000.ethernet eth0: configuring for fixed/sgmii link mode
[   16.524862] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   16.530779] mvneta f1034000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[   16.538656] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   16.547346] br-lan: port 1(eth0.1) entered blocking state
[   16.552799] br-lan: port 1(eth0.1) entered disabled state
[   16.558341] device eth0.1 entered promiscuous mode
[   16.563169] device eth0 entered promiscuous mode
[   16.569828] br-lan: port 1(eth0.1) entered blocking state
[   16.575270] br-lan: port 1(eth0.1) entered forwarding state
[   16.580937] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[   16.593585] mvneta f1070000.ethernet eth1: configuring for fixed/rgmii-id link mode
[   16.601733] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[   16.607747] mvneta f1070000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
[   16.620115] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   17.609963] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready

EDIT: Also I can't post any more message for the next 8 hours apparently due to my new account :slight_smile:

EDIT2: Oh and @dlakelan I'm not ignore your proposal/messages, it's in fact when I tried to answer you that the post restriction kicked in. What I wrote quoting your message about reassigning the Wan to a different port was "Sorry if it's a stupid question but how do I do that? Is it under Switch and change the vlan tags somehow?"

Also I've ran my desktop computer for an hour with an ethernet cable directly to the ethernet wall socket and haven't noticed a single spike. Here's the WinMTR output from that session (with the router disconnected that is)

WinMTR Statistics

WinMTR statistics

Host % Sent Recv Best Avrg Wrst Last
Request timed out. 100 619 0 0 0 0 0
Request timed out. 100 619 0 0 0 0 0
a258-gw.bahnhof.net 1 3088 3087 0 0 94 1
ume-ftp-dr1.svl-cr1.bahnhof.net 1 3088 3087 4 4 9 5
svl-cr1.sto-cr1.bahnhof.net 1 3088 3087 10 10 15 10
sto-cr1.sto-ixa-er1.bahnhof.net 1 3088 3087 10 10 18 10
72.14.211.124 1 3088 3087 10 10 16 11
Request timed out. 100 619 0 0 0 0 0
108.170.232.34 1 3088 3087 11 12 105 12
209.85.246.57 1 3088 3087 10 11 28 11
arn11s04-in-f14.1e100.net 1 3087 3086 10 10 14 10

Yes, please, if there is something related to the freeze it should be close to the end, and it is a pain ot fully redact especially logread which might even contain plain text passwords, dmesg is less problematic IIRC.

You might be able to still add things to an existing post, but as far as i can tell neither dmesg nor logread contain clear indications of hardware issues...

Could there be a run-away process using all CPU cycles? What is the output of top -d 1 when the latency increases?

@tmomas is it possible to help him out here manually? Obviously this is not a bot/spammer etc :wink:

1 Like

I'm afraid, I don't see any obvious way in the forum admin section to remove that no-new-postings restriction.

1 Like

Thanks for checking at least. :+1:

1 Like

In the switch config, make a different port belong to the same vlan as the WAN, which is probably VLAN 2 (but you can check under your WAN interface definition it probably has something like eth1.2 as its physical interface, which means it's in VLAN 2 (after the period), make this port untagged, make it not belong to any other VLAN, then move the WAN cable to that port.

I've tried this now, but if I've set it up correct (see picture) it didn't make a difference sadly.

And here is some output from mtr -z -e -b -G 10 google.com

Host                                                                                                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. (waiting for reply)
 2. (waiting for reply)
 3. AS8473   a258-gw.bahnhof.net (46.59.117.181)                                                                               0.0%  3107    0.8   1.6   0.7 1607.  34.3
 4. AS8473   ume-ftp-dr1.svl-cr1.bahnhof.net (176.10.180.222)                                                                  0.0%  3107    4.7   5.3   4.5 1520.  31.3
 5. AS8473   svl-cr1.sto-cr1.bahnhof.net (46.59.112.48)                                                                        0.0%  3107   10.7  10.9  10.2 1433.  28.5
    [MPLS: Lbl 28235 TC 0 S 1 TTL 1]
 6. AS8473   sto-cr1.sto-ixa-er1.bahnhof.net (46.59.112.81)                                                                    0.0%  3107   10.5  10.9  10.2 1341.  25.8
 7. AS15169  72.14.211.124 (72.14.211.124)                                                                                     0.0%  3107   10.4  10.7  10.3 1249.  23.3
 8. (waiting for reply)
 9. AS15169  74.125.252.62 (74.125.252.62)                                                                                     0.1%  3106   13.4  13.7  12.1 2079.  45.6
10. AS15169  209.85.246.27 (209.85.246.27)                                                                                     0.0%  3106   11.5  12.7  11.1 1986.  46.3
11. AS15169  arn11s04-in-f14.1e100.net (216.58.207.206)                                                                        0.0%  3106   10.7  11.7  10.6 1893.  43.2

Ooooh, did you connect LAN2 with the internet socket? LAN2 is operating at 100Mbps according to the switch GUI above, given your 100/100 tarif? that might be okay, but my hunch is the ISP's side wants to use 1000 Mbps and due to a hick-up link speed is not negotiated correctly, or and that would match your intermittent issues might flip between 100 and 1000 Mbps with traffic stalling during the speed change. Normally such a change should show up in dmesg or logread, but due to the fact that even the normal WAN port is connected via a switch the kernel does not know (the switch will know, but will not tell the kernel about this, so unless the kernel asks nicely it will never know)

To test this hypothesis do:
swconfig list
to get the device name of your switch, here I assume rtl8366s then query the switch
`swconfig dev rtl8366s show'
and look what it reports for link speed for the wan port (or temporary wan port like LAN2) repeatedly.

I tried to follow @dlakelan earlier advice about changing the WAN-port. Up until my last message I've had it connected to the physical WAN-port as the usual/standard setup and complete fresh install (not changing anything in regards of setting in Openwrt). I also tried yesterday evening to once again try new ethernet cables, to exclude any damage inside them even as they're newly bought, but again without any success. I also once again tried to disconnect the router completely and plug my desktop computer directly to the internet socket on the wall, and could confirm that the latency spikes/disconnects disappeared (ran this setup for nearly 3 hours).

root@OpenWrt:~# swconfig list
Found: switch0 - 10.mvsw61xx

root@OpenWrt:~# swconfig dev rtl8366s show
Failed to connect to the switch. Use the "list" command to see which switches are available.

root@OpenWrt:~# swconfig dev 10.mvsw61xx show
Global attributes:
	enable_vlan: 1
	enable_mirror_rx: 0
	enable_mirror_tx: 0
	mirror_monitor_port: 0
	mirror_source_port: 0
Port 0:
	mask: 0x0000: (0) 
	qmode: 3
	pvid: 1
	link: port:0 link:down
Port 1:
	mask: 0x0000: (1) 
	qmode: 3
	pvid: 1
	link: port:1 link:down
Port 2:
	mask: 0x0000: (2) 
	qmode: 3
	pvid: 2
	link: port:2 link:up speed:100baseT full-duplex 
Port 3:
	mask: 0x0000: (3) 
	qmode: 3
	pvid: 1
	link: port:3 link:up speed:1000baseT full-duplex 
Port 4:
	mask: 0x0000: (4) 
	qmode: 0
	pvid: 0
	link: port:4 link:down
Port 5:
	mask: 0x0000: (5) 
	qmode: 3
	pvid: 0
	link: port:5 link:up speed:1000baseT full-duplex 
Port 6:
	mask: 0x0000: (6) 
	qmode: 3
	pvid: 0
	link: port:6 link:up speed:1000baseT full-duplex 
VLAN 1:
	port_based: 0
	vid: 1
	ports: 0 1 3 5t 
VLAN 2:
	port_based: 0
	vid: 2
	ports: 2 6t

Yes, I see, that is why it comes up at

But note link: port:2 link:up speed:100baseT full-duplex, I am still wondering whether that is what your ISP expects (well possible 100Mbps fast ethernet uses supposedly less energy and might be the ideal configuration for a 100/100 plan), would be interesting to look at the link speed when you directly hook up your PC.

Good idea, but since the same cables seem to work when connecting your PC that should be not be the root cause, no?

Interesting, could you repeat this and check the interface speed on the PC's NIC that connects to the "internet-socket", if this is also 100Mbps fast ethernet then my pet hypothesis from above would be very unlikely, but if the PC connects with 1000 Mbps gigabit ethernet...

Here's with the PC directly connected to the "internet-socket", I used two different ethernet cables (one for each picture).

I also for the last two hours reinstalled Linksys stock firmware and ran two different games for 1 hour, here are the results

> Ping statistics for 172.217.22.174:
>     Packets: Sent = 4399, Received = 4398, Lost = 1 (0% loss),
> Approximate round trip times in milli-seconds:
>     Minimum = 10ms, Maximum = 15ms, Average = 10ms
WinMTR Statistics

WinMTR statistics

Host % Sent Recv Best Avrg Wrst Last
Linksys04580 0 4385 4385 0 0 5 0
Request timed out. 100 879 0 0 0 0 0
Request timed out. 100 879 0 0 0 0 0
a258-gw.bahnhof.net 1 4380 4379 0 0 88 1
ume-ftp-dr1.svl-cr1.bahnhof.net 1 4380 4379 4 4 14 4
svl-cr1.sto-cr1.bahnhof.net 1 4380 4379 10 10 23 10
sto-cr1.sto-ixa-er1.bahnhof.net 1 4380 4379 10 10 22 10
72.14.211.124 1 4380 4379 10 10 15 10
Request timed out. 100 879 0 0 0 0 0
108.170.232.34 2 4196 4148 12 12 91 12
108.170.254.50 1 4373 4370 11 11 250 11
108.170.253.161 1 4380 4379 11 11 61 11
216.239.58.147 1 4381 4380 10 11 15 11
arn09s11-in-f14.1e100.net 1 4380 4379 10 10 14 10

Then I reinstalled/did a clean install of Openwrt 19.07.1 and ran the same two games for 1 hour (with the exact same cables, connected the same way - in the physical WAN-port on the router, the only thing I installed was mtr). Here are the results

> Ping statistics for 216.58.207.206:
>     Packets: Sent = 4007, Received = 4007, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
>     Minimum = 10ms, Maximum = 282ms, Average = 10ms
WinMTR Statistics

WinMTR statistics

Host % Sent Recv Best Avrg Wrst Last
OpenWrt.lan 0 4039 4039 0 0 5 0
Request timed out. 100 809 0 0 0 0 0
Request timed out. 100 809 0 0 0 0 0
a258-gw.bahnhof.net 1 4034 4033 0 0 811 1
ume-ftp-dr1.svl-cr1.bahnhof.net 0 4038 4038 4 4 644 5
svl-cr1.sto-cr1.bahnhof.net 0 4038 4038 10 10 543 10
sto-cr1.sto-ixa-er1.bahnhof.net 0 4038 4038 10 10 441 10
72.14.211.124 0 4038 4038 10 10 444 10
Request timed out. 100 809 0 0 0 0 0
74.125.252.62 1 4035 4034 12 12 589 12
209.85.246.27 0 4039 4039 10 11 896 11
arn11s04-in-f14.1e100.net 0 4038 4038 10 10 463 10
> OpenWrt (81.170.xxx.xxx)                                                                       2020-02-03T12:34:29+0000
> Keys:  Help   Display mode   Restart statistics   Order of fields   quit
>                                                                                Packets               Pings
>  Host                                                                        Loss%   Snt   Last   Avg  Best  Wrst StDev
>  1. (waiting for reply)
>  2. (waiting for reply)
>  3. AS8473   a258-gw.bahnhof.net (46.59.117.181)                              0.0%  4090    0.8   0.7   0.7 962.0  15.6
>  4. AS8473   ume-ftp-dr1.svl-cr1.bahnhof.net (176.10.180.222)                 0.0%  4090    4.6   4.5   4.5 874.1  13.6
>  5. AS8473   svl-cr1.sto-cr1.bahnhof.net (46.59.112.48)                       0.0%  4090   10.4  10.3  10.2 788.0  12.2
>     [MPLS: Lbl 28235 TC 0 S 1 TTL 1]
>  6. AS8473   sto-cr1.sto-ixa-er1.bahnhof.net (46.59.112.81)                   0.0%  4090   10.7  10.3  10.2 696.1  10.7
>  7. AS15169  72.14.211.124 (72.14.211.124)                                    0.0%  4090   10.5  10.4  10.4 604.4   9.3
>  8. (waiting for reply)
>  9. AS15169  74.125.252.62 (74.125.252.62)                                    0.0%  4090   12.1  12.0  11.7 422.5   6.8
> 10. AS15169  209.85.246.27 (209.85.246.27)                                    0.0%  4090   11.1  10.9  10.7 329.5   5.2
> 11. AS15169  arn11s04-in-f14.1e100.net (216.58.207.206)                       0.0%  4090   10.3  10.3  10.2 236.9   3.5

Hmmm... well I think we can rule out hardware issues because stock firmware works well.. and we can rule out the 100/1000 Mbps Ethernet thing because your computer negotiates 100Mbps as well.

This leaves us with driver issues, which is strange because others don't report the same thing...

I'm confused

edit: perhaps OpenWrt doesn't negotiate with the onboard switch to handle the 100 Mbps properly and most people never have this issue because they have gigabit ports.

Okay, so Bahnhof seems to run the 100/100 links as fast ethernet, good for them, but unfortunately this also seems to deflate my hypothesis.
Next stop would be powersaving/frequency scaling options, if your router actually supports those.

good guess! I think it does support that.

also what about EEE?

@ixdinfinity do you have a switch you could throw between the wall and the wrt device. if it's a problem with energy efficient Ethernet, perhaps something between the two would smooth over that problem

1 Like

Unfortunately I don't have a switch at the moment to try with (I'd have to buy one). I took screenshots of the video I recorded just before I wrote this thread in Saturday. I noticed in the video that it's like a freeze/hiccup between the first and second picture, I don't know if it helps in any way?

is there anything in the logs when this happens?

Logs in this case does it refer to logread and dmesg? In that case I would say no, I didn't check the logs during the time of the video. But I did copy and posted logs as quick as I could (around 5-10 seconds) after the same type of latency spikes further up in the thread (logread and dmesg)

Edit: Oh and is there any way to try changing the powersaving/frequency scaling options? I read about it for the wrt3200 but didn't come to a conclusion if it's possible or not and not some clear instructions for how to do it

Following as I've seen the same thing. I just wrote it off as the device having bad firmware that makes it useless for gaming. I assumed it was related to having dozens of devices connected via wifi.

I hope you can find the culprit and this device can be fixed.