Speedtest: new package to measure network performance


#41

This is how it looks on an ipq806x R7800:

root@router1:/tmp# ./speedtest.sh
2018-12-18 21:56:55 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.............................................................
 Download:  80.41 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  11.478
    10pct:  11.572
   Median:  11.838
      Avg:  11.901
    90pct:  12.298
      Max:  12.686
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  35.3% +/- 17.8%  @ 1283 MHz
     cpu1:  35.9% +/- 19.3%  @ 1260 MHz
 Overhead: [in % used of total CPU available]
  netperf: 21.0%
.............................................................
   Upload:   9.19 Mbps
  Latency: [in msec, 62 pings, 0.00% packet loss]
      Min:  11.489
    10pct:  11.617
   Median:  12.308
      Avg:  12.382
    90pct:  13.118
      Max:  14.770
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:  16.1% +/- 13.4%  @  967 MHz
     cpu1:  14.9% +/- 14.9%  @  866 MHz
 Overhead: [in % used of total CPU available]
  netperf:  2.5%
root@router1:/tmp# ./speedtest.sh -c
2018-12-18 21:59:09 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  80.40 Mbps
   Upload:   8.18 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  11.409
    10pct:  11.758
   Median:  12.505
      Avg:  12.708
    90pct:  13.526
      Max:  17.203
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  38.4% +/- 19.2%  @ 1294 MHz
     cpu1:  41.2% +/- 15.6%  @ 1297 MHz
 Overhead: [in % used of total CPU available]
  netperf: 20.8%

Ubiquiti EdgeRouter X, Loading OpenWrt and performance numbers
#42

Hi! I've done some very basic benchmarks to figure out whether an Asus AC68U (ARM dual core @ 800 MHz) has enough CPU to manage SQM on a 100/100 Mbit/s connection. The aim is to use VLANs on the router as well, but I forgot to try that during the tests.

The SQM settings in OpenWrt (18.06.1) were set to 100 000 kbit/s for both upload and download. The queue discipline was set to cake and the setup script to piece_of_cake.

The test setup consists of a local netperf server on my LAN, to which the Asus AC68U router was connected via its WAN port. I'm not sure I get realistic results from limiting the throughput with SQM parameters in the router compared to applying the limits further upstream.

Finally I installed the "speedtest package" from: https://github.com/guidosarducci/packages/tree/master-add-speedtest/net/speedtest/files

I then ran this command at different CPU frequencies: speedtest.sh -H -p 1.1.1.1 --concurrent

Below are test results:

CPU 400 MHz
Memory: 800 MHz

Download: 93.20 Mbps
Upload: 93.96 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.280
10pct: 8.966
Median: 10.038
Avg: 10.225
90pct: 11.475
Max: 16.441
CPU Load: [in % busy (avg +/- std dev), 55 samples]
cpu0: 96.2% +/- 2.0%
cpu1: 99.7% +/- 0.2%
Overhead: [in % used of total CPU available]
netperf: 69.5%

CPU 1200 MHz
Memory: 800 MHz

Download:  93.02 Mbps
   Upload:  93.27 Mbps
  Latency: [in msec, 59 pings, 0.00% packet loss]
      Min:   8.360
    10pct:   8.597
   Median:   9.043
      Avg:   9.281
    90pct:  10.396
      Max:  11.201
 CPU Load: [in % busy (avg +/- std dev), 56 samples]
     cpu0:  72.4% +/-  4.1%
     cpu1:  96.3% +/-  2.0%
 Overhead: [in % used of total CPU available]
  netperf: 35.2%

CPU 1400 MHz
Memory: 800 MHz

Download:  91.16 Mbps
   Upload:  93.19 Mbps
  Latency: [in msec, 60 pings, 0.00% packet loss]
      Min:   8.197
    10pct:   8.349
   Median:   8.856
      Avg:   9.079
    90pct:   9.600
      Max:  16.938
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  66.2% +/-  6.5%
     cpu1:  88.3% +/-  3.1%
 Overhead: [in % used of total CPU available]
  netperf: 37.3%

Is the setup of the test correct? If so, what conclusions can I draw from the results?

Thanks! Erik


#43

I would say the tests show that shaping 100/100 works on your router even clocked down to 400 MHz, assuming the frequency stayed constant during the test (standard deviations for the frequency would be nice to have, as already implemented for the load). But you probably should not run heavy applications like netperf on the router at 400 MHz (with load in the high nineties you basically have no reserve cycles left for, say, the wifi driver).


#44

Thanks for the feedback moeller0! 1200 MHz CPU + 800 MHz memory seems to be a good balance: it avoids additional cooling while keeping load below 90% on most cores.

Wifi is not a problem because I didn't plan to use it.

I also targeted a netperf server outside my LAN and am adding two samples to this post.

The three result blocks below are, in order: local netperf server, netperf-eu.bufferbloat.net sample #1, and netperf-eu.bufferbloat.net sample #2.

Download: 93.02 Mbps
Upload: 93.27 Mbps
Latency: [in msec, 59 pings, 0.00% packet loss]
Min: 8.360
10pct: 8.597
Median: 9.043
Avg: 9.281
90pct: 10.396
Max: 11.201
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 72.4% +/- 4.1%
cpu1: 96.3% +/- 2.0%
Overhead: [in % used of total CPU available]
netperf: 35.2%

Download: 86.95 Mbps
Upload: 93.61 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.264
10pct: 8.405
Median: 8.834
Avg: 8.952
90pct: 9.562
Max: 10.668
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 80.0% +/- 5.4%
cpu1: 86.4% +/- 5.6%
Overhead: [in % used of total CPU available]
netperf: 49.0%

Download: 89.26 Mbps
Upload: 93.50 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
Min: 8.431
10pct: 8.602
Median: 9.231
Avg: 9.554
90pct: 10.538
Max: 14.159
CPU Load: [in % busy (avg +/- std dev), 56 samples]
cpu0: 82.2% +/- 3.9%
cpu1: 89.8% +/- 4.3%
Overhead: [in % used of total CPU available]
netperf: 51.6%

With a local netperf server the CPU load is less balanced between the two cores, and netperf used significantly less CPU (35% vs. about 50%).

The results still suggest it's possible to get a rough indication of where the limit lies for a given CPU.

EDIT #1: fixed typo


#45

The AC68U runs pretty cool and I see no real difference in temperature in regular use between different frequencies. Did you notice overheating at 1400? I played around with these too, and my temps always remain around 61 C in regular use. Also, I can see sqm working fine for my 150/10 connection with default clocks. Bumping it up to 1400 MHz hasn't really shown any improvement in anything other than synthetic compute tests.


#46

Reading the SNB forums, I saw some people mention issues with high temperatures, so I was surprised to find my AC68U rev A2 staying around 65 degrees.

Running it at 1400 MHz I noticed barely any change in temperature, but if I remember correctly the router became slightly less stable, rebooting once in a while. Yet the router didn't feel any warmer to the touch. I tried adding a 120 mm fan; the reported temperature barely changed, but the router stabilized. So apparently some component got cooler, although the router never felt hot or reported high temps, with or without the fan. That makes me question the accuracy of /sys/class/thermal/thermal_zone0/temp for reading temps, or whether stability was affected by something else.

Then I totally lost my mind and went directly for 1600 MHz. The router still booted but got into a loop after some time. I managed to restore the settings using the Asus Recovery Utility after a cool-down period.

I tend to spend way too much time chasing marginal benefits which probably aren't very noticeable in real-world usage. All these rabbit holes are just too tempting..! :slight_smile:


#47

Yes, I don't think any freq more than 1400 is supported. I tried increasing that too, and had a similar experience. Another factor is that since there is no wifi with lede, at least those parts are not even powered on presumably. So that also helps keep the overall temp in check. The other 68U that I'm using as an AP with merlin fork shows a higher temp of 72C for the cpu with 1200 MHz. Also, this is actually a pretty good cheap setup for < $100 with two AC68Us. I imagine it will handle sqm fine up to 300 Mbps.


#48

Almost. One issue the grand average CPU load hides is that the CPU might be cyclically overloaded: at 50% average load over a 10 second test, the CPU might be running at 100% for 5 seconds and at zero for the other 5 seconds; in that case expect bad shaping performance. Having the average load is much better than having no measure of load at all, but its temporal resolution is not high enough to show "micro-stalls".
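
To make the point concrete, here is a minimal Python sketch with two made-up load traces (one sample per second, not measurements from any router). Both have the same 50% average, but only one of them ever saturates the CPU. The function name `load_summary` and the 95% "saturated" threshold are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def load_summary(samples, busy_threshold=95.0):
    """Summarise CPU load samples (in % busy) beyond the plain average."""
    saturated = sum(1 for s in samples if s >= busy_threshold)
    return {
        "avg": mean(samples),
        "std": pstdev(samples),
        "max": max(samples),
        # Share of samples in which the CPU had essentially no headroom.
        "saturated_fraction": saturated / len(samples),
    }

# Hypothetical 10-second traces, one sample per second.
steady = [50.0] * 10                 # constant 50% load
bursty = [100.0] * 5 + [0.0] * 5     # 5 s fully busy, 5 s idle

for name, trace in (("steady", steady), ("bursty", bursty)):
    print(name, load_summary(trace))
```

The standard deviation and the maximum separate the two traces, but only down to the sampling interval: stalls shorter than one sample would still be averaged away.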


#49

Hope everyone enjoyed the holidays and all the best to you in the new year!

The code has been fairly stable and I've only found a couple of minor issues, for which I've updated the package version and link in the top post. I also added some guidance on installing and running speedtest.sh from a LAN-connected Linux server, which can be useful if your router is heavily CPU-bound when running the script.

@gechu Erik, well done posting your detailed test results! I didn't realize the AC68U was so capable (reaching ~180 Mbps aggregate) although it's a shame the wireless chipset is Broadcom. I see them on sale cheap occasionally but never looked too closely. Do you normally run it at fixed frequencies or was that only for testing? Is there a problem with the Linux frequency scaling driver perhaps? According to WikiDevi the default CPU speed is 800MHz so that might be safer (if true) than 1200MHz. You've got me curious about this device now...

Glad to see you tried putting your own upstream netperf server on the WAN side. This definitely gives you more flexibility for testing. If you haven't already, you might also try using a "downstream" server/VM connected to your router LAN port and running both netperf and speedtest.sh (see notes in top post). This would let you test the limits of forwarding/SQM on your router without any on-board test overhead.

From the on-board tests you did however, you are right to think you're within your normal CPU limits, given the netperf overhead.

BTW, on OpenWRT you're safer to install the .ipk package from my repo, since that will take care of dependencies, permissions and upgrades.

I was thinking the same, or just use half the budget for a cheap wireless AP with great wifi (e.g. Atheros) but not-so-great NAT/routing/SQM.

Take care!


#50

Anytime! Happy to share!

When setting the CPU frequency I think it sets the upper cap, and the frequency is allowed to scale up and down (hence not fixed).

800 MHz is the default speed for the revision of the router I own. Asus released a model with a trailing "P", the AC68P, which runs at 1400 MHz by default.

I experienced no issues running the CPU at 1200 MHz, which is a crazy 50% above the default frequency. I even clocked the memory from the default 666 to 800.

Thanks for the hint how to install the speedtest package in a better way.

@guidosarducci thanks for your contribution and keep up the good work!


#51

Is this going to be added to the official repo any time soon? I'm on 18.06.1 and can't find it. Also when I add your feed to the custom feed list it fails (others work just fine):

Downloading https://raw.github.com/guidosarducci/papal-repo/master/Packages.gz
Updated list of available packages in /var/opkg-lists/papal_repo
Downloading https://raw.github.com/guidosarducci/papal-repo/master/Packages.sig
Signature check failed.
Remove wrong Signature file.

#52

Oh the new raw link is here:

https://github.com/guidosarducci/papal-repo/blob/master/speedtest_0.9-7_all.ipk?raw=true

It would be awesome if we could get an accompanying LuCI interface for this! Thanks so much, great program.


#53

Sorry I missed your post -- looks like you found the details in the top post for directly downloading the package in any case. :slight_smile: I regularly use the repo custom feed so it definitely works. The "Signature check failed" message suggests there was a problem importing the repo public key. Try double checking my repo instructions for Online Install. And check if one of your keys in /etc/opkg/keys matches papal-repo.pub from the Github repo.
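
As a rough illustration of that last check, the Python sketch below compares a downloaded public key file against the files in a copy of the key directory. It assumes the usual usign/signify layout (an "untrusted comment:" line followed by the base64 key) and compares only the key material, since the comment line may differ. The helper names and the paths are hypothetical, and it is meant to be run on any machine that has copies of the files, not necessarily on the router.

```python
from pathlib import Path

COMMENT_PREFIX = "untrusted comment:"  # assumed usign/signify comment line

def key_material(path):
    """Return the non-empty, non-comment lines of a key file."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.startswith(COMMENT_PREFIX)]

def find_matching_keys(keys_dir, pub_file):
    """Names of files in keys_dir whose key material equals that of pub_file."""
    wanted = key_material(pub_file)
    return sorted(
        p.name for p in Path(keys_dir).iterdir()
        if p.is_file() and key_material(p) == wanted
    )

if __name__ == "__main__":
    # Assumed paths: a copy of /etc/opkg/keys and a downloaded papal-repo.pub.
    keys_dir, pub_file = Path("keys"), Path("papal-repo.pub")
    if keys_dir.is_dir() and pub_file.is_file():
        print(find_matching_keys(keys_dir, pub_file) or "no matching key installed")
```

An empty result would point to the key import step as the likely cause of the "Signature check failed" message.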

Yes, but there are still a couple of small updates I'm looking into before making a PR (and other PRs in queue). Since there's a direct package link as well as a custom feed, hopefully everyone who wants can get the package in the meantime.

Heh heh, that was one of the things I'm thinking about...


#54

You might do a PR and get this accepted as a regular package in the OpenWrt packages repo. It is much easier to just download it from the main package repo.

I think that the package is good enough already :wink:


#55

The reported download speed is inaccurate.
I have a gigabit ISP connection.
Upload speeds are closer to actual.
Any comments on why?
My hardware is a PC Engines APU2 running 18.06.

./speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --sequential
2019-02-16 22:17:51 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.............................................................
Download: 82.55 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.361
10pct: 15.410
Median: 15.569
Avg: 15.689
90pct: 15.922
Max: 17.397
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 52 samples]
cpu0: 2.9% +/- 1.9% @ 698 MHz
cpu1: 9.8% +/- 2.8% @ 698 MHz
cpu2: 10.2% +/- 3.8% @ 704 MHz
cpu3: 13.3% +/- 4.4% @ 749 MHz
Overhead: [in % used of total CPU available]
netperf: 3.3%
.............................................................
Upload: 526.89 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.434
10pct: 15.558
Median: 15.852
Avg: 16.245
90pct: 17.257
Max: 22.723
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 52 samples]
cpu0: 4.3% +/- 2.2% @ 722 MHz
cpu1: 11.8% +/- 2.9% @ 748 MHz
cpu2: 19.1% +/- 4.1% @ 732 MHz
cpu3: 20.1% +/- 4.3% @ 762 MHz
Overhead: [in % used of total CPU available]


#56

This looks fantastic. I am keen to try it out when it hits the main repo.

(Some time down the road for the next major release. No urgency whatsoever :wink:). On the discussion around avoiding skewing the numbers / overloading the CPU, one possible (non-trivial) way might be to create an interface similar to what Speedtest uses: the client downloads the script into their browser, it executes there, and the browser pushes the results back to the HTTP server. The server side then adds this data point to the rest it has and displays the result back to the client. This would allow some cool stuff like grouping speed test numbers by connected device/MAC, WiFi channel ID, and time of day. These things are obviously outside the true benchmarking problem, but are nonetheless relevant for real-world usage. I've only dabbled in this kind of thing for small tasks/mini projects at work, but I'd be interested in helping anyone who wants to explore something like this. In my view, having both metrics would be the best, on-router and on-client, with all the data displayed together in a neat and pretty LuCI interface.


#57

Looks like I am not CPU-bound, so why are the results inaccurate?
Only @egross seems to have hit the right values.
I wonder what I am doing differently?
Is it about the distance from the servers?


#58

In a speedtest like this you basically determine the minimum of: the available bandwidth and CPU cycles of the remote end, the bandwidth and CPU cycles of the local end, and the available bandwidth along the network path the test packets travel over. Unfortunately, the only one you could rule out is CPU cycles of the local end...
In reality the backend of the bufferbloat testing network, paid for by volunteers out of their own pockets, simply is not prepared for reliably testing gigabit links. Partly this is because, IMHO, given that servers are often only connected at 1 Gbps themselves, one would need to cap the number of concurrent speedtests to make sure the server's bandwidth is not the bottleneck, something that Ookla's speedtest.net also struggles with. I also note that speedtest.net is not the most reliable for fast links either (but at least there you can try to find server locations with a good reputation for fast links, but I digress).
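
The "minimum of all the limits" idea can be sketched in a few lines of Python. Every number below is invented for illustration and the limit names are just labels for the factors listed above; nothing here is measured.

```python
def bottleneck(limits_mbps):
    """Return (name, value) of the smallest limit; it caps the measured rate."""
    name = min(limits_mbps, key=limits_mbps.get)
    return name, limits_mbps[name]

# Hypothetical limits in Mbps, expressed as the throughput each one allows.
limits_mbps = {
    "remote_bandwidth": 1000.0,
    "remote_cpu": 400.0,
    "network_path": 900.0,
    "local_bandwidth": 1000.0,
    "local_cpu": 800.0,
}

name, rate = bottleneck(limits_mbps)
print(f"expected result: about {rate:.0f} Mbps, limited by {name}")
```

The test only ever reports the final number, so a low result by itself does not say which of the limits was the smallest.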


#59

That's a fair point. I suspect the remote server is the bottleneck as well, at least sometimes.
To rule that out I have set up a dedicated server with 1 Gbps and a good CPU. On this server I always get upload speeds of 800 Mbps+, so I believe I should see at least that much when running speedtest from the router.
I shall post my results using this server soon.
However, I should mention that so far, running simultaneous curl downloads of a large file from this server hasn't given promising results. Hopefully netperf will provide a better result.


#60

Results with the dedicated private server.
While the download result improved, the upload result dropped significantly compared to netperf-west.bufferbloat.net.
PS: The same private server reports 800+ Mbps when tested with speedtest.net in a browser.
Results are consistent across tests.

Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
Download: 329.24 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.352
10pct: 15.417
Median: 15.530
Avg: 15.559
90pct: 15.721
Max: 16.127
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 51 samples]
cpu0: 9.1% +/- 2.8% @ 759 MHz
cpu1: 29.0% +/- 6.8% @ 793 MHz
cpu2: 24.9% +/- 5.1% @ 779 MHz
cpu3: 33.1% +/- 6.7% @ 811 MHz
Overhead: [in % used of total CPU available]
netperf: 9.3%
.............................................................
Upload: 129.49 Mbps
Latency: [in msec, 61 pings, 0.00% packet loss]
Min: 15.177
10pct: 15.224
Median: 15.366
Avg: 15.377
90pct: 15.518
Max: 15.806
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 52 samples]
cpu0: 3.7% +/- 2.1% @ 700 MHz
cpu1: 13.9% +/- 5.0% @ 734 MHz
cpu2: 3.3% +/- 2.0% @ 701 MHz
cpu3: 19.4% +/- 4.5% @ 754 MHz
Overhead: [in % used of total CPU available]
netperf: 1.4%

=====================

Update:
With 8 simultaneous streams I was able to hit:
Download: 577.23 Mbps
Upload: 230.73 Mbps

Results are now getting closer to actual. Thanks for your support, folks!
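
For anyone comparing the 5-stream and 8-stream runs, a small Python sketch that divides the aggregate figures quoted in this post by the stream count. The per-stream view is just one way of reading the numbers; the variable and function names are made up for the example.

```python
# Aggregate results quoted in this post, in Mbps, keyed by stream count.
results_mbps = {
    5: {"download": 329.24, "upload": 129.49},
    8: {"download": 577.23, "upload": 230.73},
}

def per_stream(results):
    """Divide each aggregate rate by the number of simultaneous streams."""
    return {
        streams: {direction: total / streams for direction, total in totals.items()}
        for streams, totals in results.items()
    }

for streams, rates in per_stream(results_mbps).items():
    for direction, rate in rates.items():
        print(f"{streams} streams, {direction}: {rate:.1f} Mbps per stream")
```

The per-stream rate changes little between the two runs while the total grows with the stream count. That would be consistent with a per-connection limit rather than a router CPU limit, though this reading is an assumption and not something the measurements prove.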