Speedtest: new package to measure network performance

If I remember right, it would be something like
low = 10-15%,
high = 90% on one core, 15% on the other

So, cake caused one core to be rather fully utilised.

Thanks for the clarification. I honestly didn't think the CAKE overhead was that high from the little A/B testing I did in the past, but note that was done by "eyeballing" top and without the benefit of this script. I'm hoping @dtaht could share his observations...

@hnyman @egross Could you please confirm whether you now see average CPU frequency reported on your multi-core, frequency-scaling boxes? My last update to the script added this capability.

I've successfully tested on multi-core Ubuntu and single-core (no freq. scaling) OpenWrt, but don't have an OpenWrt router that does frequency scaling to check on. Thanks again!

@guidosarducci I did some testing on the Raspberry Pi 3+: the first run with the CPU governor set to performance and the second with ondemand.

-----------------------------------------------------
 OpenWrt SNAPSHOT, r8467-dd02a19ff5
-----------------------------------------------------
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1400000
root@OpenWrt:~# speedtest.sh -H netperf-east.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-18 05:10:39 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-east.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
..................................................................................................................................
 Download:  34.33 Mbps
   Upload:   9.16 Mbps
  Latency: [in msec, 128 pings, 0.00% packet loss]
      Min:  13.693
    10pct:  14.644
   Median:  18.387
      Avg:  41.574
    90pct:  88.621
      Max: 350.122
 CPU Load: [in % busy (avg +/- std dev), 128 samples]
     cpu0:   1.1% +/-  1.4%
     cpu1:   1.2% +/-  1.6%
     cpu2:   1.9% +/-  2.5%
     cpu3:   1.4% +/-  1.7%
 Overhead: [in % total CPU used]
  netperf:  0.8%
root@OpenWrt:~# 
root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-18 05:14:34 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  33.32 Mbps
   Upload:   9.28 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  15.867
    10pct:  19.486
   Median:  24.562
      Avg:  33.013
    90pct:  38.205
      Max: 278.781
 CPU Load: [in % busy (avg +/- std dev), 59 samples]
     cpu0:   1.8% +/-  1.3%
     cpu1:   2.9% +/-  1.9%
     cpu2:   3.1% +/-  1.8%
     cpu3:   2.6% +/-  1.7%
 Overhead: [in % total CPU used]
  netperf:  1.5%

root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000
root@OpenWrt:~# speedtest.sh -H netperf-east.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-18 05:38:36 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-east.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  41.61 Mbps
   Upload:   9.16 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:  18.280
    10pct:  22.199
   Median:  32.515
      Avg:  43.444
    90pct:  79.759
      Max: 221.882
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:   8.9% +/-  3.7%
     cpu1:   8.6% +/-  4.1%
     cpu2:   8.5% +/-  4.3%
     cpu3:   9.8% +/-  3.5%
 Overhead: [in % total CPU used]
  netperf:  5.5%
root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-18 05:39:45 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  29.05 Mbps
   Upload:   9.31 Mbps
  Latency: [in msec, 62 pings, 0.00% packet loss]
      Min:  16.105
    10pct:  17.909
   Median:  24.875
      Avg:  52.382
    90pct: 116.738
      Max: 442.237
 CPU Load: [in % busy (avg +/- std dev), 59 samples]
     cpu0:   7.5% +/-  3.0%
     cpu1:   5.4% +/-  2.3%
     cpu2:   6.2% +/-  3.0%
     cpu3:   6.3% +/-  3.2%
 Overhead: [in % total CPU used]
  netperf:  4.2%

A quick update after a brief absence...

Thanks @m4r for the extra data points. The missing frequency information wasn't clear until the "raspberry pi3+" hint, which led me to check sample /proc/cpuinfo files online. It turns out the contents of /proc/cpuinfo aren't standardized and vary between Linux platforms, which is a common point of complaint.

In particular, Linux/arm doesn't include CPU frequency in /proc/cpuinfo while Linux/x86_64 does. For more reliable output, I've updated speedtest to read CPU frequency from the sysfs cpufreq interface instead, which should be robust across platforms.
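
For the curious, the sysfs approach boils down to something like the sketch below (illustrative only; the exact sampling logic in speedtest.sh may differ). Each online CPU exposes its current frequency in kHz under sysfs:

```shell
# Read each core's current frequency (in kHz) from sysfs and print the
# average in MHz; cores without cpufreq support are silently skipped.
for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq; do
    [ -r "$f" ] && cat "$f"
done | awk '{ sum += $1; n++ } END { if (n) printf "avg: %d MHz\n", sum / n / 1000 }'
```

Unlike /proc/cpuinfo, these paths have the same layout on arm and x86_64, which is the point of the change.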

The OP has been edited with the new package link: speedtest_0.9-6_all.ipk

I'd be interested to hear if those users with multi-core (non-x86) routers now see CPU frequency information. Thanks everyone!

Sorry for the delayed response.

I do see frequency in the output now. You will notice that I also started irqbalance first, which seems to lower the CPU load slightly (in the upload test, one CPU would previously get maxed out):

root@OpenWrt:~# irqbalance
root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1
2018-12-18 11:45:21 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 910.64 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:   4.924
    10pct:   5.323
   Median:   6.689
      Avg:   6.769
    90pct:   7.466
      Max:  18.762
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  77.7% +/-  9.6%  @ 1725 MHz
     cpu1:  59.7% +/- 10.0%  @ 1725 MHz
 Overhead: [in % used of total CPU available]
  netperf: 49.2%
...........................................................
   Upload: 815.01 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:   5.199
    10pct:  10.478
   Median:  21.578
      Avg:  27.282
    90pct:  51.101
      Max:  75.928
 CPU Load: [in % busy (avg +/- std dev), 52 samples]
     cpu0:  91.3% +/-  0.0%  @ 1725 MHz
     cpu1:  95.0% +/-  0.0%  @ 1725 MHz
 Overhead: [in % used of total CPU available]
  netperf: 39.2%

Concurrent test:

root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --concurrent
2018-12-18 11:49:30 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
............................................................
 Download: 136.61 Mbps
   Upload: 672.10 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:   5.317
    10pct:  12.724
   Median:  22.947
      Avg:  24.738
    90pct:  39.668
      Max:  54.965
 CPU Load: [in % busy (avg +/- std dev), 55 samples]
     cpu0:  89.0% +/-  0.0%  @ 1725 MHz
     cpu1:  97.1% +/-  1.9%  @ 1725 MHz
 Overhead: [in % used of total CPU available]
  netperf: 42.8%
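
If anyone wants to check what irqbalance actually changed, one rough approach (assuming the usual /proc/interrupts layout: a header row naming the CPU columns, then one count column per CPU on each line) is to total the interrupts per core before and after starting it:

```shell
# Sum the per-CPU interrupt counts in /proc/interrupts; heavily skewed
# totals suggest the NIC IRQs are pinned to one core.
awk 'NR == 1 { ncpu = NF; next }
     { for (i = 2; i <= ncpu + 1; i++) tot[i-1] += $i }
     END { for (c = 1; c <= ncpu; c++) printf "CPU%d: %d interrupts\n", c-1, tot[c] }' \
    /proc/interrupts
```

This lumps all interrupt sources together; for the per-device view, look at the individual rows for your ethernet driver.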

It shows up this way on an ipq806x R7800:

root@router1:/tmp# ./speedtest.sh
2018-12-18 21:56:55 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.............................................................
 Download:  80.41 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  11.478
    10pct:  11.572
   Median:  11.838
      Avg:  11.901
    90pct:  12.298
      Max:  12.686
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  35.3% +/- 17.8%  @ 1283 MHz
     cpu1:  35.9% +/- 19.3%  @ 1260 MHz
 Overhead: [in % used of total CPU available]
  netperf: 21.0%
.............................................................
   Upload:   9.19 Mbps
  Latency: [in msec, 62 pings, 0.00% packet loss]
      Min:  11.489
    10pct:  11.617
   Median:  12.308
      Avg:  12.382
    90pct:  13.118
      Max:  14.770
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:  16.1% +/- 13.4%  @  967 MHz
     cpu1:  14.9% +/- 14.9%  @  866 MHz
 Overhead: [in % used of total CPU available]
  netperf:  2.5%
root@router1:/tmp# ./speedtest.sh -c
2018-12-18 21:59:09 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  80.40 Mbps
   Upload:   8.18 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  11.409
    10pct:  11.758
   Median:  12.505
      Avg:  12.708
    90pct:  13.526
      Max:  17.203
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  38.4% +/- 19.2%  @ 1294 MHz
     cpu1:  41.2% +/- 15.6%  @ 1297 MHz
 Overhead: [in % used of total CPU available]
  netperf: 20.8%

Hi! I've done some very basic benchmarks to figure out whether an Asus AC68U (dual-core ARM @ 800 MHz) has enough CPU to manage SQM on a 100/100 Mbit connection. The aim is to use VLANs on the router as well, but I forgot to try that during the tests.

SQM in OpenWrt (18.06.1) was set to 100000 kbit/s for both upload and download, with the queue discipline set to Cake using the piece_of_cake.qos setup script.

The test setup consists of a local netperf server on my LAN, with the Asus AC68U connected to it via its WAN port. I'm not sure I get realistic results from limiting throughput with SQM on the router itself, compared to applying the limits further upstream.

Finally I installed the "speedtest" package from: https://github.com/guidosarducci/packages/tree/master-add-speedtest/net/speedtest/files

I then ran this command using different CPU frequencies: speedtest.sh -H -p 1.1.1.1 --concurrent

Below are test results:

CPU: 400 MHz
Memory: 800 MHz

Download: 93.20 Mbps
Upload: 93.96 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.280
10pct: 8.966
Median: 10.038
Avg: 10.225
90pct: 11.475
Max: 16.441
CPU Load: [in % busy (avg +/- std dev), 55 samples]
cpu0: 96.2% +/- 2.0%
cpu1: 99.7% +/- 0.2%
Overhead: [in % used of total CPU available]
netperf: 69.5%

CPU: 1200 MHz
Memory: 800 MHz

Download:  93.02 Mbps
   Upload:  93.27 Mbps
  Latency: [in msec, 59 pings, 0.00% packet loss]
      Min:   8.360
    10pct:   8.597
   Median:   9.043
      Avg:   9.281
    90pct:  10.396
      Max:  11.201
 CPU Load: [in % busy (avg +/- std dev), 56 samples]
     cpu0:  72.4% +/-  4.1%
     cpu1:  96.3% +/-  2.0%
 Overhead: [in % used of total CPU available]
  netperf: 35.2%

CPU: 1400 MHz
Memory: 800 MHz

Download:  91.16 Mbps
   Upload:  93.19 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:   8.197
    10pct:   8.349
   Median:   8.856
      Avg:   9.079
    90pct:   9.600
      Max:  16.938
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  66.2% +/-  6.5%
     cpu1:  88.3% +/-  3.1%
 Overhead: [in % used of total CPU available]
  netperf: 37.3%

Is the setup of the test correct? If so, what conclusions can I draw from the results?

Thanks! Erik

I would say the tests show that shaping 100/100 works on your router even when clocked down to 400 MHz, assuming the frequency stayed constant during the test (standard deviations for frequency would be nice to have, as already implemented for load). But you probably should not run heavy applications like netperf at 400 MHz: with load in the high nineties, you have essentially no reserve cycles left for, say, the wifi driver.
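
For what it's worth, a frequency standard deviation could be computed the same way one presumably is for load. A sketch over made-up samples in kHz:

```shell
# Mean and standard deviation of sampled CPU frequencies (kHz), printed
# in MHz; the input values here are invented for illustration.
printf '1400000\n1400000\n600000\n600000\n' |
    awk '{ s += $1; ss += $1 * $1; n++ }
         END { m = s / n; sd = sqrt(ss / n - m * m);
               printf "freq: %d +/- %d MHz\n", m / 1000, sd / 1000 }'
```

A large deviation would flag exactly the case I mentioned: the governor moving the clock around mid-test.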

Thanks for the feedback, moeller0! 1200 MHz CPU + 800 MHz memory seems to be a good balance: it avoids the need for additional cooling while staying below 90% load on most cores.

Wifi is not a problem because I don't plan to use it.

I targeted a netperf server outside of my LAN, adding two samples to the post.

Local netperf server:

Download: 93.02 Mbps
Upload: 93.27 Mbps
Latency: [in msec, 59 pings, 0.00% packet loss]
Min: 8.360
10pct: 8.597
Median: 9.043
Avg: 9.281
90pct: 10.396
Max: 11.201
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 72.4% +/- 4.1%
cpu1: 96.3% +/- 2.0%
Overhead: [in % used of total CPU available]
netperf: 35.2%

netperf-eu.bufferbloat.net sample #1:

Download: 86.95 Mbps
Upload: 93.61 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.264
10pct: 8.405
Median: 8.834
Avg: 8.952
90pct: 9.562
Max: 10.668
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 80.0% +/- 5.4%
cpu1: 86.4% +/- 5.6%
Overhead: [in % used of total CPU available]
netperf: 49.0%

netperf-eu.bufferbloat.net sample #2:

Download: 89.26 Mbps
Upload: 93.50 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.431
10pct: 8.602
Median: 9.231
Avg: 9.554
90pct: 10.538
Max: 14.159
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 82.2% +/- 3.9%
cpu1: 89.8% +/- 4.3%
Overhead: [in % used of total CPU available]
netperf: 51.6%

Using a local netperf server, the CPU load is less balanced between the two cores, and netperf used significantly less CPU (35% vs 50%).

The results still suggest it's possible to get a rough indication of where the CPU limit lies for a given CPU.

EDIT #1: fixed typo

The AC68U runs pretty cool, and in regular use I see no realistic difference in temperature between frequencies. Did you notice overheating at 1400 MHz? I played around with these too, and my temps always stay around 61 C in regular use. Also, I can see sqm working fine for my 150/10 connection at default clocks. Bumping it up to 1400 MHz hasn't really shown any improvement in anything other than synthetic compute tests.

Reading the SNB forums, some people mentioned issues with high temperatures, but I was surprised to find the temperature of my AC68U rev A2 staying around 65 degrees.

Running it at 1400 MHz I noticed barely any change in temperature, but if I remember correctly the router got slightly less stable, rebooting once in a while. Yet feeling it with my hand, the router didn't get any warmer. I tried adding a 120 mm fan; it made no big change in the reported temperature, but the router stabilized. So apparently some component did run cooler, even though the router never felt hot nor reported high temperatures, fan or not. That makes me question the accuracy of reading temps from /sys/class/thermal/thermal_zone0/temp, or whether the stability was impacted by something else.

Then I totally lost my mind and went straight for 1600 MHz. The router still booted but got into a boot loop after some time. I managed to restore settings using the Asus Recovery Utility after a period of cool-down.

I tend to spend way too much time chasing marginal gains that probably aren't noticeable in real-world usage. All these rabbit holes are just too tempting..! :slight_smile:

Yes, I don't think any frequency above 1400 MHz is supported. I tried increasing it too and had a similar experience. Another factor is that since there is no wifi with LEDE, those parts are presumably not even powered on, which also helps keep the overall temperature in check. The other AC68U I'm using as an AP with the Merlin fork shows a higher CPU temperature of 72 C at 1200 MHz. All told, this is actually a pretty good cheap setup for under $100 with two AC68Us; I imagine it will handle sqm fine up to 300 Mbps.

Almost. One issue the grand average CPU load hides is that the CPU might be cyclically overloaded: at, say, 50% average load over a 10-second test, the CPU might be running at 100% for 5 seconds and 0% for the other 5, in which case expect bad shaping performance. Having the average load is still much better than having no measure of load at all, but it is not temporally resolved enough to show such "micro-stalls".
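
To illustrate with made-up numbers: two load traces can share the same average while only one of them ever saturates the CPU, so reporting a peak (or high percentile) alongside the mean would expose the stalls:

```shell
# Two synthetic per-second load traces, both averaging 50%: the steady one
# never saturates, the bursty one pegs the CPU half the time. The mean
# alone cannot tell them apart; the peak can.
for trace in '50 50 50 50 50 50 50 50 50 50' '100 100 100 100 100 0 0 0 0 0'; do
    echo "$trace" | tr ' ' '\n' |
        awk '{ s += $1; if ($1 > max) max = $1 }
             END { printf "avg %d%%  peak %d%%\n", s / NR, max }'
done
```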

Hope everyone enjoyed the holidays and all the best to you in the new year!

The code has been fairly stable and I've only found a couple of minor issues, for which I've updated the package version and link in the top post. I also added some guidance on installing and running speedtest.sh from a LAN-connected Linux server, which can be useful if your router is heavily CPU-bound when running the script.

@gechu Erik, well done posting your detailed test results! I didn't realize the AC68U was so capable (reaching ~180 Mbps aggregate), although it's a shame the wireless chipset is Broadcom. I see them on sale cheap occasionally but never looked too closely. Do you normally run it at fixed frequencies, or was that only for testing? Is there perhaps a problem with the Linux frequency-scaling driver? According to WikiDevi the default CPU speed is 800 MHz, so that might be safer (if true) than 1200 MHz. You've got me curious about this device now...

Glad to see you tried putting your own upstream netperf server on the WAN side. This definitely gives you more flexibility for testing. If you haven't already, you might also try using a "downstream" server/VM connected to your router LAN port and running both netperf and speedtest.sh (see notes in top post). This would let you test the limits of forwarding/SQM on your router without any on-board test overhead.

From the on-board tests you did, however, you are right to think you're within your normal CPU limits, given the netperf overhead.

BTW, on OpenWrt you're safer installing the .ipk package from my repo, since that takes care of dependencies, permissions and upgrades.

I was thinking the same, or just use half the budget for a cheap wireless AP with great wifi (e.g. Atheros) but not-so-great NAT/routing/SQM.

Take care!

Anytime! Happy to share!

When setting the CPU frequency, I think it sets an upper cap, and the frequency is still allowed to scale up and down (hence not fixed).

800 MHz is the default speed for the revision of the router I own. Asus released a model with a trailing "P", the AC68P, which runs at 1400 MHz by default.

I experienced no issues running the CPU at 1200 MHz, which is a crazy 50% above the default frequency. I even clocked the memory up from the default 666 MHz to 800 MHz.

Thanks for the hint on how to install the speedtest package in a better way.

@guidosarducci thanks for your contribution and keep up the good work!

Is this going to be added to the official repo any time soon? I'm on 18.06.1 and can't find it. Also, when I add your feed to the custom feed list it fails (others work just fine):

Downloading https://raw.github.com/guidosarducci/papal-repo/master/Packages.gz
Updated list of available packages in /var/opkg-lists/papal_repo
Downloading https://raw.github.com/guidosarducci/papal-repo/master/Packages.sig
Signature check failed.
Remove wrong Signature file.

Oh the new raw link is here:

https://github.com/guidosarducci/papal-repo/blob/master/speedtest_0.9-7_all.ipk?raw=true

It would be awesome if we could get an accompanying LuCI interface for this! Thanks so much, great program.

Sorry I missed your post; it looks like you found the details in the top post for directly downloading the package in any case. :slight_smile: I regularly use the repo custom feed, so it definitely works. The "Signature check failed" message suggests there was a problem importing the repo public key. Try double-checking my repo instructions for Online Install, and check whether one of the keys in /etc/opkg/keys matches papal-repo.pub from the GitHub repo.
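
For a concrete way to do that comparison: usign/signify public keys are two lines (an "untrusted comment" header plus the base64 key material), so matching on the second line is enough. The sketch below demonstrates the logic on temporary stand-in files with placeholder key material, rather than the real /etc/opkg/keys:

```shell
# Stand-in files: one installed opkg key (named by its key id, as in
# /etc/opkg/keys) and the repo's papal-repo.pub. Key material is fake.
tmp=$(mktemp -d)
printf 'untrusted comment: installed key\nRWRfakekeymaterial0000\n' > "$tmp/1035ac73cc4e59e3"
printf 'untrusted comment: papal-repo\nRWRfakekeymaterial0000\n' > "$tmp/papal-repo.pub"

# Compare on the key material (line 2); a filename hit other than the
# .pub itself means the key is already installed.
material=$(sed -n 2p "$tmp/papal-repo.pub")
match=$(grep -Fl "$material" "$tmp"/* | grep -v 'papal-repo.pub$' || true)
if [ -n "$match" ]; then echo "matching key installed"; else echo "no matching key"; fi
rm -r "$tmp"
```

Against the real router, point the loop at /etc/opkg/keys/* and the downloaded papal-repo.pub instead of the temp files.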

Yes, but there are still a couple of small updates I'm looking into before making a PR (and other PRs in my queue). Since there's a direct package link as well as a custom feed, hopefully everyone who wants the package can get it in the meantime.

Heh heh, that was one of the things I'm thinking about...

You might do a PR and get this accepted as a regular package in the OpenWrt packages repo. It would be much easier to just download it from the main package repo.

I think that the package is good enough already :wink:

The download speed reported is inaccurate. I have a gigabit ISP connection; the upload speeds are closer to actual. Any comments on why? My hardware is a PC Engines APU2 running 18.06.

./speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --sequential
2019-02-16 22:17:51 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.............................................................
Download: 82.55 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.361
10pct: 15.410
Median: 15.569
Avg: 15.689
90pct: 15.922
Max: 17.397
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 52 samples]
cpu0: 2.9% +/- 1.9% @ 698 MHz
cpu1: 9.8% +/- 2.8% @ 698 MHz
cpu2: 10.2% +/- 3.8% @ 704 MHz
cpu3: 13.3% +/- 4.4% @ 749 MHz
Overhead: [in % used of total CPU available]
netperf: 3.3%
.............................................................
Upload: 526.89 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.434
10pct: 15.558
Median: 15.852
Avg: 16.245
90pct: 17.257
Max: 22.723
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 52 samples]
cpu0: 4.3% +/- 2.2% @ 722 MHz
cpu1: 11.8% +/- 2.9% @ 748 MHz
cpu2: 19.1% +/- 4.1% @ 732 MHz
cpu3: 20.1% +/- 4.3% @ 762 MHz
Overhead: [in % used of total CPU available]