A question for the experts here. For a while now I have been seeing significant latency spikes that affect VoIP and page loading. To test, I pinged 8.8.8.8 while the router (Netgear R7800) was essentially idle. I would start 6 to 8 concurrent ping sessions, and every minute I would see two or three spikes to 50-100 ms like the ones below; most of the time they occurred simultaneously in several sessions.

2018-03-03 15:26:07 64 bytes from 8.8.8.8: icmp_seq=525 ttl=60 time=58.013 ms
2018-03-03 15:26:35 64 bytes from 8.8.8.8: icmp_seq=553 ttl=60 time=37.198 ms
2018-03-03 15:27:07 64 bytes from 8.8.8.8: icmp_seq=585 ttl=60 time=76.856 ms
2018-03-03 15:28:06 64 bytes from 8.8.8.8: icmp_seq=643 ttl=60 time=60.067 ms
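For anyone who wants to count spikes the same way, here is a rough Python sketch that scans timestamped ping output like the lines above and reports replies over a threshold. The function name `find_spikes` and the 50 ms default are my own choices for illustration, not part of any tool used in this post.

```python
import re

# Matches timestamped reply lines such as:
# 2018-03-03 15:26:07 64 bytes from 8.8.8.8: icmp_seq=525 ttl=60 time=58.013 ms
REPLY = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*icmp_seq=(\d+) .*time=([\d.]+) ms"
)

def find_spikes(lines, threshold_ms=50.0):
    """Return (timestamp, icmp_seq, rtt_ms) for replies at or above the threshold.

    threshold_ms is an assumed cutoff; pick one that suits your baseline RTT.
    """
    spikes = []
    for line in lines:
        m = REPLY.match(line)
        if m is None:
            continue  # header, statistics, or blank line
        rtt = float(m.group(3))
        if rtt >= threshold_ms:
            spikes.append((m.group(1), int(m.group(2)), rtt))
    return spikes

sample = [
    "2018-03-03 15:26:07 64 bytes from 8.8.8.8: icmp_seq=525 ttl=60 time=58.013 ms",
    "2018-03-03 15:26:35 64 bytes from 8.8.8.8: icmp_seq=553 ttl=60 time=37.198 ms",
    "2018-03-03 15:27:07 64 bytes from 8.8.8.8: icmp_seq=585 ttl=60 time=76.856 ms",
    "2018-03-03 15:28:06 64 bytes from 8.8.8.8: icmp_seq=643 ttl=60 time=60.067 ms",
    "2018-03-03 15:50:47 1000 packets transmitted, 1000 packets received, 0.0% packet loss",
]

spikes = find_spikes(sample)
for stamp, seq, rtt in spikes:
    print(f"{stamp} seq={seq} rtt={rtt} ms")
```

Running several of these against the logs of concurrent sessions and comparing timestamps is one way to confirm that the spikes line up across sessions.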

As suggested in this thread, I tried moving the IRQs between CPU0 and CPU1 in different permutations and tried both 17.01 and master, all with no success. Then I noticed that while the IRQs stay exclusively on their assigned CPU, all the other processes (kernel workers, hostapd, dnsmasq, etc.) constantly jump back and forth between the CPUs.
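In case it helps others reproduce the IRQ moves: on Linux an IRQ is pinned by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. Below is a minimal sketch of that. The helper names are hypothetical, and the /proc/interrupts excerpt is a made-up illustration (the real IRQ numbers and column layout on the R7800 will differ), so treat it as a sketch only.

```python
import re

def affinity_mask(cpu):
    """Hex bitmask selecting a single CPU: CPU0 -> '1', CPU1 -> '2'."""
    return format(1 << cpu, "x")

def irqs_for(interrupts_text, name):
    """Find IRQ numbers in /proc/interrupts-style text whose line names `name`."""
    found = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # CPU header row or non-numeric entries
        if name in re.split(r"[,\s]+", rest.strip()):
            found.append(int(head))
    return found

def pin_irq(irq, cpu):
    """Write the mask to procfs. Needs root on the router; not called here."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpu))

# Hypothetical excerpt for illustration only, not taken from a real R7800.
SAMPLE = """\
           CPU0       CPU1
 31:       1200          0   GIC  eth0
 32:        900          0   GIC  eth1
"""

for dev in ("eth0", "eth1"):
    print(dev, irqs_for(SAMPLE, dev), "-> mask for CPU1:", affinity_mask(1))
```

On the device you would pass the contents of /proc/interrupts to `irqs_for` and then call `pin_irq` for each result.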

I used @hnyman's build environment to build firmware with and without isolcpus=1 (based on master code). The only differences from the original are the additional boot parameter and a few more recent commits; the rest is unchanged.

Then I tested the four permutations below (using a wired connection and the same source code):

  1. No isolcpus=1 and IRQs on CPU0
  2. No isolcpus=1 and IRQs on CPU1
  3. isolcpus=1 and IRQs on CPU0
  4. isolcpus=1 and IRQs on CPU1

The first three made no difference, but the last one reduced the spikes to ~20 ms, and they are now 10+ minutes apart instead of several per minute.

2018-03-03 15:34:03 PING 8.8.8.8 (8.8.8.8): 56 data bytes
2018-03-03 15:37:49 64 bytes from 8.8.8.8: icmp_seq=226 ttl=60 time=21.615 ms
2018-03-03 15:50:47 
2018-03-03 15:50:47 --- 8.8.8.8 ping statistics ---
2018-03-03 15:50:47 1000 packets transmitted, 1000 packets received, 0.0% packet loss
2018-03-03 15:50:47 round-trip min/avg/max/stddev = 11.082/11.963/21.615/0.475 ms
2018-03-03 15:50:47 PING 8.8.8.8 (8.8.8.8): 56 data bytes
2018-03-03 15:57:12 64 bytes from 8.8.8.8: icmp_seq=382 ttl=60 time=22.734 ms
2018-03-03 16:07:33 
2018-03-03 16:07:33 --- 8.8.8.8 ping statistics ---
2018-03-03 16:07:33 1000 packets transmitted, 1000 packets received, 0.0% packet loss
2018-03-03 16:07:33 round-trip min/avg/max/stddev = 10.913/11.839/22.734/0.492 ms
2018-03-03 16:07:33 PING 8.8.8.8 (8.8.8.8): 56 data bytes
2018-03-03 16:24:15 
2018-03-03 16:24:15 --- 8.8.8.8 ping statistics ---
2018-03-03 16:24:15 1000 packets transmitted, 1000 packets received, 0.0% packet loss
2018-03-03 16:24:15 round-trip min/avg/max/stddev = 11.192/11.921/15.246/0.326 ms

So CPU1 now only services the IRQs for eth0, eth1, wifi0, and wifi1, while everything else runs on CPU0.
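To double-check that the boot parameter actually took effect, one option is to read the kernel command line and see which CPUs isolcpus names. Here is a small sketch; `parse_isolcpus` is a hypothetical helper, the example command line is invented, and it only handles plain CPU numbers and simple ranges.

```python
def parse_isolcpus(cmdline):
    """Return the set of CPUs listed in an isolcpus= kernel parameter.

    Handles single CPUs ("1") and simple ranges ("2-3"); flag words such
    as "domain" are skipped. Stride syntax is not handled.
    """
    cpus = set()
    for param in cmdline.split():
        key, sep, value = param.partition("=")
        if key != "isolcpus" or not sep:
            continue
        for item in value.split(","):
            first, dash, last = item.partition("-")
            if not first.isdigit():
                continue  # a flag, not a CPU number
            end = int(last) if dash and last.isdigit() else int(first)
            cpus.update(range(int(first), end + 1))
    return cpus

# Invented command line for illustration; on the router, read /proc/cmdline.
example = "console=ttyS0 isolcpus=1"
print(sorted(parse_isolcpus(example)))
```

On a running system the same function can be fed the contents of /proc/cmdline; an empty set would mean the parameter was not passed at boot.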

Does this make sense, or am I seeing things? I am not sure I can explain why there is such a difference.
