Random packet loss with OpenWrt

Update as of September 7th: currently it looks like OpenWrt is not the main issue in this case. Things are still quite unclear and I will keep this thread updated. Original post below.

For a few months now I have been experiencing seemingly random packet loss with OpenWrt. Unfortunately I only started digging today and cannot tell exactly when the issues started to appear. I am quite certain, though, that the latest LEDE builds worked, and possibly earlier 18.x.x builds, too.

  • setup: ISP-supplied cable modem <=> my own router with OpenWrt <=> other devices
  • router: Archer C7 v2
  • OpenWrt versions tested: 18.06.8, 19.07.x, snapshot r14394-252197f014

The issue:

SSHing into the router via ethernet and pinging google.com, the results frequently look like this:

64 bytes from seq=3 ttl=112 time=21.784 ms
64 bytes from seq=4 ttl=112 time=21.556 ms
64 bytes from seq=12 ttl=112 time=798.323 ms
64 bytes from seq=13 ttl=112 time=20.139 ms
64 bytes from seq=14 ttl=112 time=22.528 ms
64 bytes from seq=15 ttl=112 time=21.777 ms

--- google.com ping statistics ---
51 packets transmitted, 44 packets received, 13% packet loss
round-trip min/avg/max = 19.951/38.805/798.323 ms
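The jump from seq=4 to seq=12 in the excerpt above is where replies went missing. As a minimal sketch (the function name `missing_sequences` is made up for illustration, and the input is assumed to be saved ping output in the format shown above), the gaps can be listed like this:

```python
import re

def missing_sequences(ping_lines):
    """Return the seq numbers without a reply, between the first and last reply seen."""
    seen = []
    for line in ping_lines:
        # Matches both "seq=3" (busybox ping) and "icmp_seq=3" (iputils ping).
        match = re.search(r"seq=(\d+)", line)
        if match:
            seen.append(int(match.group(1)))
    if not seen:
        return []
    received = set(seen)
    return [s for s in range(min(seen), max(seen) + 1) if s not in received]

# Reply lines copied from the post (addresses were already removed there).
excerpt = """\
64 bytes from seq=3 ttl=112 time=21.784 ms
64 bytes from seq=4 ttl=112 time=21.556 ms
64 bytes from seq=12 ttl=112 time=798.323 ms
64 bytes from seq=13 ttl=112 time=20.139 ms
64 bytes from seq=14 ttl=112 time=22.528 ms
64 bytes from seq=15 ttl=112 time=21.777 ms
""".splitlines()

print(missing_sequences(excerpt))  # [5, 6, 7, 8, 9, 10, 11]
```

For this excerpt the seven missing replies form one contiguous burst, which fits the 7 lost packets in the statistics (51 sent, 44 received) and suggests a short outage rather than evenly spread random loss.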

Sometimes I get two or three minutes of a stable connection before issues start to appear again. It is less noticeable during browsing and everyday work. Streaming (upload direction mostly?) or games on the other hand frequently have issues.

What I tried:

  • connecting my computer directly to the cable modem using the same ethernet cable as the OpenWrt-device: issue solved. Clearly this is not what I am aiming for but it seems to rule out faulty cables.
  • enable/disable SQM, tinker with SQM config: nothing substantial changes. My setup used to work fine with SQM, before at some point it didn't. Since issues persist without SQM enabled, this is also not the issue.
  • disable all kinds of extra services like adblock or dyndns: issues persist
  • reset OpenWrt firmware to factory defaults: issues persist
  • try older/newer OpenWrt releases (see above): issues persist
  • assorted tinkering with MTU-settings, firewall settings, interface settings: issues persist

I am not quite sure what else there is to try. If at all possible I'd like to avoid flashing back the original firmware, which I have not used in ages. Also, it clearly wouldn't solve the problem with OpenWrt.

Any help and suggestions would be greatly appreciated. If anyone needs more info, let me know. I was hoping for the latest snapshot builds to fix the issue, but apparently most people do not have any problems or no one is aware of it, hence this post.

Try the following on a linux computer in your internal network:

mtr -ezb4w -i 1.0 -c 240

and then post the output here. Then follow this by

mtr -ezb4w -i 0.1 -c 240

to see what happens with higher probe rates (also, please post the output here).

If you have sqm installed and enabled I would like to see the output of tc -s qdisc.

This might give an indication of where along the path the loss happens.

EDIT: fixed mrt to the correct mtr spelling...

Thanks for the quick reply. Firstly, small typo. It should be mtr, not mrt. I was confused at first. :smiley: No big deal, just in case someone else wants to replicate this.

As for the results, I will post four results in chronological order. It will become apparent that at times the connection works well, at other times it does not.

1st run: (240 probes, 1 second interval)

[xxx@xxxxx ~]$ sudo mtr -ezb4w -i 1.0 -c 240
Start: 2020-09-06T16:03:28+0200
HOST: xxxxx                                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    _gateway (                                    0.0%   240    0.9   0.8   0.8   1.7   0.1
  2. AS3209   xxxxx.dynamic.kabel-deutschland.de (        0.0%   240    8.2  10.3   6.9  61.0   7.8
  3. AS3209   ip5886d7e6.static.kabel-deutschland.de (   0.0%   240   10.8  10.1   7.5  26.4   2.9
  4. AS3209   ip5886c17e.static.kabel-deutschland.de (   0.0%   240   11.7  11.7   9.8  20.7   1.1
  5. AS3209                                              0.0%   240   12.5  12.0  10.1  21.1   1.3
  6. AS3209                                             0.0%   240   18.4  19.6  17.7  55.1   2.5
  7. AS3209                                             0.0%   240   18.8  19.6  17.2  87.5   5.6
  8. AS15169                                             0.0%   240   22.7  23.4  21.8  32.4   1.3
  9. AS15169                                            0.0%   240   25.5  25.0  23.5  57.4   2.3
 10. AS15169                                            0.0%   240   24.5  23.6  22.3  31.2   0.9
 11. AS15169  dns.google (                                      0.0%   240   22.3  22.3  21.1  31.8   1.1

2nd run: (240 probes, 1 second interval)

[xxx@xxxxx ~]$ sudo mtr -ezb4w -i 1.0 -c 240
Start: 2020-09-06T16:08:02+0200
HOST: xxxxx                                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    _gateway (                                    0.0%   240    0.9   0.8   0.8   1.2   0.1
  2. AS3209   ip5f5b6bfe.dynamic.kabel-deutschland.de (   0.0%   240    8.0 378.0   7.0 3065. 779.3
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  3. AS3209   ip5886d7e6.static.kabel-deutschland.de (   0.0%   240    9.6 394.5   7.5 3063. 797.9
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  4. AS3209   ip5886c17e.static.kabel-deutschland.de (   0.0%   240   10.9 397.8  10.4 3063. 802.5
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  5. AS3209                                              0.0%   240   12.1 393.5  10.2 3059. 790.0
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  6. AS3209                                             0.0%   240   19.2 398.3  18.0 3058. 782.4
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  7. AS3209                                             0.0%   240   19.0 373.6  17.3 2966. 750.4
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  8. AS15169                                             0.0%   240   24.2 354.8  21.9 2875. 716.8
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  9. AS15169                                            0.0%   240   24.6 334.4  23.0 2784. 684.3
     AS3209  xxxxx.dynamic.kabel-deutschland.de (
 10. AS15169                                            0.0%   240   24.3 352.4  22.1 3075. 716.4
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
 11. AS15169  dns.google (                                      0.0%   240   21.8 381.4  20.8 3061. 757.4
     AS3209   xxxxx.dynamic.kabel-deutschland.de (

3rd run (240 probes, 0.1 second interval)

[xxx@xxxxx ~]$ sudo mtr -ezb4w -i 0.1 -c 240
Start: 2020-09-06T16:12:52+0200
HOST: xxxxx                                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    _gateway (                                   87.5%   240    1.0   1.0   0.7   1.4   0.1
  2. AS3209   ip5f5b6bfe.dynamic.kabel-deutschland.de (  80.4%   240  2633. 588.8   7.2 3051. 840.7
     AS3209   xxxxx.dynamic.kabel-deutschland.de (

4th run (240 probes, 0.1 second interval)

[xxx@xxxxx ~]$ sudo mtr -ezb4w -i 0.1 -c 240
Start: 2020-09-06T16:13:57+0200
HOST: xxxxx                                                            Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    _gateway (                                   87.5%   240    0.7   0.9   0.7   1.4   0.2
  2. AS3209   ip5f5b6bfe.dynamic.kabel-deutschland.de (  84.2%   240    9.6 244.8   7.2 834.6 308.3
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  3. AS3209   ip5886d7e6.static.kabel-deutschland.de (  13.3%   240    8.2  79.5   7.0 928.9 191.2
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  4. AS3209   ip5886c17e.static.kabel-deutschland.de (  13.8%   240   12.2  78.3  10.1 919.6 189.1
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  5. AS3209                                             13.8%   240   10.9  77.1  10.5 910.1 186.3
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  6. AS3209                                            13.8%   240   19.4  83.9  17.6 903.5 182.3
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  7. AS3209                                            13.8%   240   19.3  82.0  17.5 894.0 179.7
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  8. AS15169                                            13.8%   240   24.1  84.3  21.8 886.7 176.9
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  9. AS15169                                           13.8%   240   24.5  84.6  23.1 878.6 174.0
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
 10. AS15169                                           13.3%   240   24.0  85.1  22.2 867.9 176.5
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
 11. AS15169  dns.google (                                     13.3%   240   22.4  82.4  20.9 858.0 174.2
     AS3209   xxxxx.dynamic.kabel-deutschland.de (

So to me this looks as though the packet loss occurs right "after" the router, which is consistent with my previous observations. The question is: why? Again, if I connect directly to the cable modem, everything seems to be fine (tested for over 20 minutes at a time..).

Correction: there seems to be packet loss at the first step already. I overlooked that since latency seemed normal.
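To compare runs like the ones above without reading the columns by eye, here is a rough Python sketch that pulls hop number, loss and latency out of mtr report lines. The helper names `parse_mtr_line` and `parse_report` are made up, and the parser simply assumes the last seven whitespace-separated fields are Loss%, Snt, Last, Avg, Best, Wrst and StDev, as in the reports posted here.

```python
import re

def parse_mtr_line(line):
    """Parse one hop line of an mtr report; return None for any other line."""
    parts = line.split()
    # A hop line starts with "N." and ends with the seven statistics columns.
    if len(parts) < 8 or not re.fullmatch(r"\d+\.", parts[0]):
        return None
    loss, snt, last, avg, best, wrst, stdev = parts[-7:]
    if not loss.endswith("%"):
        return None
    return {
        "hop": int(parts[0].rstrip(".")),
        "loss": float(loss.rstrip("%")),
        "avg": float(avg),
        "wrst": float(wrst),  # float() also accepts truncated values like "3065."
    }

def parse_report(lines):
    """Return the parsed hop lines, skipping headers and continuation lines."""
    return [hop for hop in map(parse_mtr_line, lines) if hop is not None]

# A few lines copied from the 4th run above (addresses were already redacted).
run4 = """\
  1. AS???    _gateway (                                   87.5%   240    0.7   0.9   0.7   1.4   0.2
  2. AS3209   ip5f5b6bfe.dynamic.kabel-deutschland.de (  84.2%   240    9.6 244.8   7.2 834.6 308.3
     AS3209   xxxxx.dynamic.kabel-deutschland.de (
  3. AS3209   ip5886d7e6.static.kabel-deutschland.de (  13.3%   240    8.2  79.5   7.0 928.9 191.2
 11. AS15169  dns.google (                                     13.3%   240   22.4  82.4  20.9 858.0 174.2
""".splitlines()

hops = parse_report(run4)
print([(h["hop"], h["loss"]) for h in hops])  # [(1, 87.5), (2, 84.2), (3, 13.3), (11, 13.3)]
print("loss at final hop:", hops[-1]["loss"])  # loss at final hop: 13.3
```

As a rule of thumb, loss that shows up at an early hop but does not carry through to the final hop is often just that hop rate-limiting its ICMP replies, while loss at the final hop reflects what traffic through the whole path actually experiences.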

As for SQM: I currently do not have SQM installed/enabled. I wanted as little interference as possible. Right now everything you see is done with today's snapshot and almost no customization.

Ooops, sorry, yes, mtr not mrt... thanks for figuring that out and fixing it.

I agree. Also the Wrst delays look crazy: 3065 ms, or 3 seconds, to the first upstream host (probably somewhere at/above the CMTS level). Was testing done over WiFi or over a wired connection?

With an interval of 100 ms some packet loss is expected, as most intermediate devices are configured to deprioritize and rate-limit ICMP processing. So it seems the -i 0.1 was too aggressive; maybe -i 0.5 or so might be a better way to probe things.

Good, that removes one possible culprit (and also bad, because the only thing I have some experience in is sqm).

I wonder if you have access configured to the modem itself. In that case running an mtr against its IP address would also be interesting.

Mmmh, since this is a cable link maybe the issue is IPv4 related, could you also try:
sudo mtr -ezb6 -i 1.0 -c 300 2001:4860:4860::8888
Depending on how your IPv4 is implemented this might make a difference...

Everything was done over a wired connection.

I reran the tests. Packet loss goes down to about 45% for the router, 20% to 30% for the next hop and 0% to 10% for the remaining hops. Latency on the other hand does not improve and is still around 3000 ms for the first upstream host in the worst case.

The modem exposes a web interface with some status info which I could share. There are no configuration options available. I ran mtr against it though and the results are basically the same as above (first hop after the gateway). In a quick test there was little to no packet loss for the modem itself, though very high latencies in the worst case. The router on the other hand showed high packet loss at very short intervals but consistently low latencies.

My connection is IPv4 only so this doesn't work. That has not changed in the last 7 or 8 years or so. Also the cable modem is from the stone age but used to work very well until recently. Actually it still seems to work, as long as there is no OpenWrt in between. I haven't called my ISP yet. They usually are like "Have you tried connecting your computer directly to the modem?", which in my case actually solves the problems...

Appendix (cable modem status info):

WebSTAR EPX2203

Modem Serial Number
Cable Modem MAC Address
Hardware Version
Software Version
Receive Power Level: 2.2 dBmV
Transmit Power Level: 46.0 dBmV
Cable Modem Status

Downstream Status
Channel ID
Downstream Frequency: 162000000 Hz
Bit Rate: 55616000 bits/sec
Power Level: 2.2 dBmV
Signal to Noise Ratio: 37.7 dB

Upstream Status
Channel ID
Upstream Frequency: 58600000 Hz
Bit Rate: 10240000 bits/sec
Power Level: 46.0 dBmV

Cable Modem Status
Cable Modem IP Address
Current Time: Sun Sep 06 15:33:52 2020
Time Since Last Reset: 9 days 07h:29m:24s
Configuration File
Cable Modem Certificate

CPE Connections
The data shown in the table below provides information about the customer premise equipment (CPE) connected to your cable modem.

Connected to	MAC Address	IP Address
Ethernet	00:00:00:00:00:00	--.--.--.--
Ethernet	C4:6E:xF:73:3F:xE	--.--.--.--
Ethernet	C4:6E:xF:73:3F:xF
Ethernet	C8:5B:76:06:4C:xx	95.91.59.xx

Try the same test with a laptop connected directly to the modem. How is it, the same?
I usually use mtr together with tcpdump and count pkts to see the pattern.
E.g. start tcpdump:
tcpdump -i ethX -s0 -n -w /tmp/mtr.pcap
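One possible way to do the counting afterwards, sketched in Python: read the capture back as text (for example with tcpdump -n -r /tmp/mtr.pcap) and pair ICMP echo requests with their replies. The function name `unanswered_requests` is made up, the sample lines use documentation addresses and are not from a real capture, and the regular expression assumes tcpdump's usual one-line ICMP text format. Note that this only pairs echo request/reply; probes answered by intermediate hops come back as "time exceeded" messages and are not matched here, so it is best suited to pinging a single target such as the modem.

```python
import re

# tcpdump prints e.g. "... ICMP echo request, id 7, seq 2, length 64"
ECHO = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")

def unanswered_requests(tcpdump_lines):
    """Return sorted (id, seq) pairs of echo requests that never got a reply."""
    requests, replies = set(), set()
    for line in tcpdump_lines:
        match = ECHO.search(line)
        if not match:
            continue
        key = (int(match.group(2)), int(match.group(3)))
        if match.group(1) == "request":
            requests.add(key)
        else:
            replies.add(key)
    return sorted(requests - replies)

# Hypothetical capture text, for illustration only.
sample = """\
16:20:01.000100 IP 192.0.2.10 > 192.0.2.1: ICMP echo request, id 7, seq 1, length 64
16:20:01.001900 IP 192.0.2.1 > 192.0.2.10: ICMP echo reply, id 7, seq 1, length 64
16:20:02.000100 IP 192.0.2.10 > 192.0.2.1: ICMP echo request, id 7, seq 2, length 64
16:20:03.000100 IP 192.0.2.10 > 192.0.2.1: ICMP echo request, id 7, seq 3, length 64
16:20:03.002000 IP 192.0.2.1 > 192.0.2.10: ICMP echo reply, id 7, seq 3, length 64
""".splitlines()

print(unanswered_requests(sample))  # [(7, 2)]
```

Running the capture on both sides of a device (for example on the laptop and on the router's WAN side) and comparing the unanswered lists can show on which segment the packets disappear.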

The problem could be due to a flaky eth port too... maybe.
Try connecting the router's WAN port to your laptop and set static IPs on both sides, just for testing.

It is substantially better. Running mtr against the modem gives little to no packet loss, depending on the interval. Latency is much more stable, even though there are some rare spikes up to about 500 ms.

When using the router, I frequently get a straight "no route to host" for 2 or 3 seconds when pinging the modem or running mtr against it. When directly connecting to the modem, I have not observed such outages yet. It's quite bizarre.

That seems possible, but then again the problems are (almost) gone or at least much, much better when directly connecting to the modem. It could be a faulty uplink port on the router, but why would that break? It sits in its corner and just does its thing. Also, shaking it a bit during tests does nothing. For now I'm inclined to blame either the modem or OpenWrt.

Okay, that IMHO rules out the ISP/cable segment as the most likely cause of the issue, something you knew all along.

Yes, it seems the wrong tree to bark up on, sorry.

Next question, do you see anything that might be related to ethernet drop outs in either logread or dmesg output (make sure to redact sensitive information if you post excerpts of that data here in the forum, like passwords and/or your IP address)?

That sounds like there might be multiple issues at work here, but let's try to figure out the 3 second stalls first :wink:

Yep, I'm gravitating towards some weird modem issue though. Something which in my case is still the ISP's business in some way.

Looks completely normal to me. logread lists my ssh-login as the last entry, while ping/mtr gives the usual weirdness at the same time. dmesg is also completely silent 35 seconds after a router reboot. Nothing to report there.

Mmmh, I wonder what $DEVICE you get from
ifstatus wan
and what do
ethtool $DEVICE
ethtool --show-pause $DEVICE
ethtool --show-eee $DEVICE
return? (Replace $DEVICE with whatever device name the ifstatus call reported.) In case ethtool is not already installed on the router, run opkg update ; opkg install ethtool first.

I vaguely remember running into EEE (energy efficient ethernet) issues some years ago, where an EEE-enabled router did not harmonize well with older pre-EEE gear until I disabled EEE for the router's port connected to the old device, but my memory is a bit dim and this could lead nowhere fast. Then again, your modem being old might make this at least worth a try.
Something like ethtool --set-eee $DEVICE eee off should help to disable this, assuming it is enabled in the first place...

The device is "eth0.2" (it used to be eth0 in earlier releases, I believe, but that's the OpenWrt default now, some virtual device). Both eee and pause return "not supported", for eth0.2 as well as eth0. Trying to set anything fails with the same message.

Nice idea though, I would never have thought about something like this.

Mmmh, too bad, but that brings me to a new idea potentially worth testing: if you have an Ethernet switch you could borrow, try placing it between modem and router so that mismatches between their Ethernet devices might hopefully be worked around. This is primarily intended as a testing vehicle and not necessarily as a permanent solution (but it probably is not going to help anyway).

Nice idea again, unfortunately it didn't help.

I tried two things:

  1. I put an old switch (10/100 Mbit, probably older than the modem) between modem and router. Initially things seemed more stable, but that was probably coincidence. There are still dropouts while pinging the modem, and still the occasional very high latency with mtr.

  2. I removed the switch again and replaced the cable between router and modem. No effect for better or worse either.

What is interesting though is that the maximum latency I observe with mtr is always around 3000ms, sometimes 2980, sometimes 3060, but never anything much more or less. Also I do not see a fixed pattern. Sometimes everything seems smooth for 3, 4, 5 minutes and then again there are several seconds of "outage" (including "host unreachable") at a time.

Not sure if you got my hint correctly. If I understand correctly, you have
modem - router(owrt) - lan, right?
OK, forget the modem for a moment and put together this network for testing:
laptop(eth) - router(eth) with the same uplink cable
Then test this way. Do you get the same pkt drops, at the same rate, etc.?



  1. I connected the Laptop directly to the router, using the same cable (not the same port though..) and everything looked normal.

  2. I reran the test you suggested above. I connected the laptop directly to the cable modem. This time I let mtr run for 5 minutes (300 probes against the modem itself, 1 second interval) and voila - unstable connection. Again some packet drops, again around ~3000ms worst case latency.

I am sorry for the confusion above; previous (shorter) tests didn't reveal the issue. Also I ran some other tests earlier today (an upstream test to Twitch, looking at dropped frames due to network issues), and while they gave me issues within 2 minutes with modem <=> router <=> laptop, they ran 10-15 minutes without a hiccup with modem <=> laptop. I have no idea what's going on.

My current thinking is that my router (and OpenWrt!) is probably not the (main?!) problem. I will call my ISP tomorrow and see what they suggest. If you have any other ideas, I am of course curious. Also I will keep this thread updated and correct my posts above as soon as I am definitely sure about what is happening here. Everything being smooth with old LEDE builds seems to be a temporal coincidence at this point.

So I think it looks like there is something between router and modem already that performs sub-optimally.

That something could be either one or more of:

  1. The router's OS
  2. The router's wan port
  3. The network cable between router and modem
  4. The modem's LAN port
  5. The modem's OS
  6. Potential issues between modem and CMTS that "knock" out the modem for longer periods of time

Might make sense to re-test these one by one, no? If you are dead certain about any of these just skip that test...

The router OS
Easy to test, by either switching back to an older release of OpenWrt or even the stock firmware by the manufacturer, or by temporarily testing a different router.
Your laptop test seems to be equivalent to this.

The router's wan port
Reconfigure the router to use one of its LAN ports for wan, requires a bit of VLAN trickery but nothing insurmountable. Alternatively use a known good replacement router. Also follow @rpmoomin's idea and replace the MODEM with the laptop and repeat your test, router-WAN -> ethernet cable -> laptop (you probably need to configure both router-wan and laptop for static IP addresses outside of your internal network address range).

The network cable between router and modem
Just use a known good ethernet cable instead, might make sense to test both cross-over and straight through cables just in case there is an issue with auto-MDI-X.

The modem's LAN port & The modem's OS
The easiest test for these would be a replacement modem, which given the switch to DOCSIS 3.1 might be a good idea independent of whether your current modem is broken or not.

Thanks again for all your suggestions. I called my ISP just now and to my great surprise they immediately agreed to send a replacement for the old modem. I'd assume that the new device is reasonably up to date and also DOCSIS 3.1 compatible, we will see. I will report back, hotline guy estimated 1-2 days.

In the meantime, let me reply to your last points.

Regarding the router's OS: that was my initial guess, but having tested the last 18.06 release, every single stable 19.x release and a recent snapshot, and having tested the modem without any involvement of the router, I am now doubtful about OpenWrt being the issue. Of course, if the issues persist with the new device, I will resort to stock firmware for final clarification.

Regarding the router's WAN port: again, this seems unlikely to me, but I will come back to this if everything else fails. I was already rattling and shaking the port (or rather the cable) slightly, and you'd expect to see at least some effect if there were mechanical issues or faulty electronics. Also, the issues are very similar or indeed the same when connecting my laptop directly to the modem. This is something I completely missed in the beginning due to good (or rather bad?) luck and which initially led me to believe that OpenWrt/the router is the problem.

Regarding the network cable: I already replaced it yesterday with no apparent effect.

Regarding the modem's LAN port and OS: this is what we are left with, and what I hope to be able to confirm or rule out within the next 2 or 3 days. Maybe it is even conceivable that the electronics degraded somewhat over the past 10 to 15 years? This is speculation of course, but we will see how the new device performs.

Once again, thank you so much for your support! As I said above, I will report back soon and hopefully clear OpenWrt's name in this case.

EEE, do you mean Energy-Efficient Ethernet?

Unless you are referring to your device only, EEE is supported in OpenWrt; but:

  • it probably won't make your latency go down (there is a slight delay when re-powering the port)
  • it has to be supported by the device connected to the port

And pause return...do you mean hardware flow control???

Yea, I doubt many supported devices even had that built in, hardware NAT, yes - pause return, no.

I am back with an update. It took me some time because the replacement device only arrived here yesterday evening.

To make it short: everything seems to work as expected. OpenWrt seems to be innocent! In a few hours of testing, no packet loss has been observed. The only thing I changed was the cable modem (actually a cable router in bridge mode now). Cables, OpenWrt-router and everything else stayed the same. That seems to pretty conclusively point to the old modem as the culprit. The new device apparently is DOCSIS 3.1 compatible but I am not quite sure which mode is actually used.

For now I am quite happy. I might be back with a few questions concerning optimal SQM-settings in my scenario, especially link layer settings but that might be stuff for another thread maybe? In case moeller0 has any advice straight away, here is the setup:

downlink 32000 kbps, uplink 2000 kbps, cable modem. The advertised and measured values pretty much match up. For now 31500, 1900, cake, piece of cake seems to work well, but if there is room for improvement, I'm always up for it.

So finally thank you all for your suggestions. Lots of these might be helpful in general to diagnose local network issues.

Sure. For cable/DOCSIS, use the following for testing:

config queue 'eth1'
	option ingress_ecn 'ECN'
	option egress_ecn 'NOECN'
	option itarget 'auto'
	option etarget 'auto'
	option verbosity '5'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option qdisc_really_really_advanced '1'
	option eqdisc_opts 'nat dual-srchost ack-filter'
	option linklayer 'ethernet'
	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option linklayer_adaptation_mechanism 'default'
	option debug_logging '1'
	option enabled '1'
	option iqdisc_opts 'nat dual-dsthost ingress'
	option interface 'YOURWANINTERFACEHERE'
	option download '31500'
	option upload '1900'
	option overhead '18'
	option tcMPU '64'

Make sure to replace YOURWANINTERFACEHERE with the name of your wan interface (the output of ifstatus wan should tell you).
This will account for the correct per-packet overhead (18 bytes) and minimal packet size (64 bytes) for a typical cable link.
It will also use ack-filtering on egress, which seems to be the right thing to do on bursty links like docsis.
The "nat dual-xxhost" keywords configure per-internal-IP fairness in each direction (dual-srchost on egress, dual-dsthost on ingress), so that all concurrently active machines get an equal share of the available bandwidth (each machine still gets the full bandwidth if nobody else sends or receives anything). This often works very well for home links, as it confines the bad consequences of, say, running a torrent client mainly to the machine running the client, while other machines work reasonably well in spite of ongoing background torrenting. If you dislike that, simply remove these keywords from the config file. (The nat keyword is required so that the dual-xxhost options can actually see the correct internal IP addresses.)
The ingress keyword helps to make up for the fact that for download traffic our shaper sits on the sub-optimal end of the actual bottleneck, and it makes the shaper less dependent on the number of concurrent flows.
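To make the overhead and minimum-packet-size settings a bit more concrete, here is a small back-of-the-envelope sketch in Python. It assumes the shaper charges each packet as its IP size plus the configured overhead, but never less than the configured MPU, and it treats per-host fairness as an idealized equal split among active hosts. The function names are made up for illustration; this is not code from sqm-scripts or cake.

```python
def accounted_size(ip_len, overhead=18, mpu=64):
    """Bytes the shaper charges for one packet: IP size plus overhead, floored at mpu."""
    return max(ip_len + overhead, mpu)

def per_host_share_kbps(rate_kbps, active_hosts):
    """Idealized equal split of the shaped rate among concurrently active hosts."""
    return rate_kbps / active_hosts

print(accounted_size(1500))            # 1518 (full-size packet)
print(accounted_size(40))              # 64   (small ACK: 40 + 18 = 58, raised to the MPU)
print(per_host_share_kbps(31500, 3))   # 10500.0 (three busy machines on the downlink)
```

The small-packet case is where the MPU matters: without it, a flood of tiny packets would be charged less than they actually occupy on the link.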