Speedtest: new package to measure network performance

I think I figured out what goes wrong with the CPU utilization stats:
It does not take into account the CPU frequency scaling that is available on some targets, like ipq806x for the R7800. Apparently the stats only reflect the utilisation at the current frequency.

Below is proof: first a run with the default "ondemand" CPU scaling governor, where the CPUs idle at 384 MHz and can scale up to 1700 MHz under load. Then a run with the "performance" governor, where the CPU is always at 1700 MHz.

On the first run the measured CPU load is 41/43%, while on the second run the reading is just 10/14%, reflecting better the true utilisation of the CPU's capacity. On both runs the speeds and latency are identical, so the true CPU load is also identical.

This will be visible at least on ipq806x routers, and likely also on x86 (and mvebu if you use the scaling driver patch).
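
For reference, the active governor and the momentary frequency can be read back from the same sysfs tree (policy paths as on ipq806x; other targets may differ):

# read back the active governor and the momentary frequency (kHz) for each policy
for p in /sys/devices/system/cpu/cpufreq/policy*; do
    echo "$p: $(cat "$p/scaling_governor") @ $(cat "$p/scaling_cur_freq") kHz"
done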

root@router1:~# echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
root@router1:~# echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
root@router1:~# /tmp/speed.sh -c -H netperf-eu.bufferbloat.net
2018-11-07 22:05:16 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
............................................................
 Download:  98.23 Mbps
   Upload:   8.34 Mbps
  Latency: (in msec, 57 pings, 0.00% packet loss)
      Min: 36.952
    10pct: 39.048
   Median: 46.507
      Avg: 45.769
    90pct: 49.395
      Max: 51.590
Processor: (in % busy, avg +/- stddev, 58 samples)
     cpu0: 41 +/-  4
     cpu1: 43 +/-  2
 Overhead: (in % total CPU used)
  netperf: 48


root@router1:~# echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
root@router1:~# echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
root@router1:~# /tmp/speed.sh -c -H netperf-eu.bufferbloat.net
2018-11-07 22:06:28 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-eu.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.............................................................
 Download:  98.23 Mbps
   Upload:   8.32 Mbps
  Latency: (in msec, 60 pings, 0.00% packet loss)
      Min: 10.915
    10pct: 40.384
   Median: 45.772
      Avg: 44.483
    90pct: 48.308
      Max: 49.213
Processor: (in % busy, avg +/- stddev, 59 samples)
     cpu0: 10 +/-  3
     cpu1: 14 +/-  3
 Overhead: (in % total CPU used)
  netperf: 33

CPU speed stats:

[screenshot of CPU frequency statistics]


I thought I would give it a try. I just installed it and ran the first command I saw, and I'm getting zeros on the speedtest. The hostname resolves. This is on a 3200acm.

root@lede:~# speedtest.sh -4 -H netperf-west.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-07 16:31:50 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
.
 Download:   0.00 Mbps
   Upload:   0.00 Mbps
  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 13.650 
    10pct: 0.000 
   Median: 0.000 
      Avg: 13.650 
    90pct: 0.000 
      Max: 13.650
Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
root@lede:~# nslookup netperf-west.bufferbloat.net
Server:		127.0.0.1
Address:	127.0.0.1#53

Name:      netperf-west.bufferbloat.net
netperf-west.bufferbloat.net	canonical name = flent-fremont.bufferbloat.net
Name:      flent-fremont.bufferbloat.net
Address 1: 23.239.20.41

EDIT

Thought I'd check, since the script depends on netperf, and it is there.

root@lede:~# opkg list-installed |grep netperf
netperf - 2.7.0-1
root@lede:~# which netperf
/usr/bin/netperf

Thanks for testing that! I've written the script to be as POSIX-compatible as possible, with only the netperf dependency, but didn't have older systems or VMs handy to test against.

@hnyman Thanks, Hannu, for the thorough testing and also posting the detailed results. I'm catching up to your posts en masse so will consolidate my observations.

Yes, on seeing your results, the smoking gun for me was similar aggregate network throughput but different CPU utilization. That is equivalent to taking much more time to do the same amount of work, and is commonly observed when benchmarking CPUs of different speeds. So I think the root problem, as you suggest, is inconsistent CPU frequency scaling.

I expect this has been going on for a long time on R7800 and similar devices, but we've highlighted it again due to the script's CPU monitoring.

A fair question. The script itself is pretty reliable, as it does no measurements of its own; it relies on core Linux functionality and other utilities, and only calculates summaries of their results. That means:

  1. CPU usage is based on the standard Linux cumulative statistics found in /proc/stat and /proc/<pid>/stat, which is the same information presented by the familiar top and htop (a simplified sketch follows this list).

  2. Latency calculation uses the results of ping, the same as the manual tests you made above.

  3. Throughput calculations use the results output from netperf.
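
For illustration only, here is the same idea as item 1 in a few lines of shell and awk. This is not the script's actual code, just the standard /proc/stat arithmetic:

# two /proc/stat samples one second apart, turned into per-CPU busy percentages
{ grep '^cpu[0-9]' /proc/stat; sleep 1; grep '^cpu[0-9]' /proc/stat; } | awk '
{
    tot = 0
    for (f = 2; f <= NF; f++) tot += $f          # total jiffies for this CPU
    if ($1 in tot1) {                            # second sample: compute the delta
        dt = tot - tot1[$1]
        printf "%s: %.0f%% busy\n", $1, (dt > 0 ? 100 * (dt - ($5 - idle1[$1])) / dt : 0)
    } else {                                     # first sample: remember the counters
        tot1[$1] = tot
        idle1[$1] = $5                           # field 5 is the idle counter
    }
}'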

I see this very rarely too, during the course of much testing, and it boils down to a problem with netperf. The script kicks off netperf instances configured for a 60-second test, and then waits for them to complete.

If netperf fails to start, that can yield the zeros seen by @davidc502. Similarly, netperf can timeout after a while, or simply take a long time to start up and complete, which can lead to the long runs seen by @hnyman.

It's hard to reproduce (maybe some network problems or the servers?) so I don't precisely know the root cause, but I'll take a look at improving the logging in the script. If either of you can remember/compare notes on the circumstances around the long runs or zero results, please let me know.

@davidc502 Thanks for giving it a try too. Did you manage to successfully retry afterwards?

Thanks again to everyone for the invaluable feedback.

Actually, the root problem is reading the CPU utilisation percentage without adjusting for the current CPU frequency.

So it looks like the CPU utilisation stats will be misleading on frequency-scaling devices (as the stats do not reflect the true utilisation of the full CPU capacity). There might be something worth mentioning in the help texts / advice, as I guess this will hit many users and the underlying reason for the high CPU stats is not obvious.

Others have stumbled into this utilisation reporting inconsistency, and there is e.g. this kind of discussion:

This CPU utilisation stat is a side-track for the high-power devices, but really useful for CPU-constrained old devices, so the stat itself is good to have. So I do not think you need to spend time trying to fix the calculation, but there should be some advice about the phenomenon.

In general, your package is nice.


Yes, the utilization by definition is a time-based "duty cycle", and so independent of whether a Commodore64 or Cray-1 is doing the work. And I'd be wary of fudging it to create a new synthetic parameter (as I've seen some suggest in the past) and creating inconsistency with top and htop.

True, I agree the stat could provide good discriminating power with these older devices. Let me look into the feasibility of adding the stat without over-complicating things. I'm thinking of a simple average freq. per core over the test duration.
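
As a rough sketch of the idea, assuming the usual cpufreq sysfs layout (not the final implementation):

# sample each core's scaling_cur_freq once per second for 10 seconds
# and print the per-core average in MHz
i=0
while [ "$i" -lt 10 ]; do
    for d in /sys/devices/system/cpu/cpu[0-9]*; do
        [ -r "$d/cpufreq/scaling_cur_freq" ] &&
            printf '%s %s\n' "${d##*/}" "$(cat "$d/cpufreq/scaling_cur_freq")"
    done
    i=$((i + 1))
    sleep 1
done | awk '{ sum[$1] += $2; n[$1]++ }
     END { for (c in sum) printf "%9s: %4.0f MHz avg\n", c, sum[c] / n[c] / 1000 }'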

That's much appreciated. Many hands make light work, so thanks for your time testing and troubleshooting.

Best regards...

I've tried several times with no results.

I put the script into debug mode and ran it again. Maybe there is a clue as to what may be happening in the output.

root@lede:~# speedtest.sh
+ TESTHOST=netperf.bufferbloat.net
+ TESTDUR=60
+ PINGHOST=gstatic.com
+ MAXSESSIONS=5
+ TESTPROTO=-4
+ TESTSEQ=1
+ '[' 0 -gt 0 ]
+ date '+%Y-%m-%d %H:%M:%S'
+ DATE='2018-11-08 07:19:05'
+ echo '2018-11-08 07:19:05 Starting speedtest for 60 seconds per transfer session.'
2018-11-08 07:19:05 Starting speedtest for 60 seconds per transfer session.
+ echo 'Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.'
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
+ echo -n 'Download and upload sessions are '
Download and upload sessions are + '[' '1 ' -eq 1 ]
+ echo -n sequential,
sequential,+ echo ' each with 5 simultaneous streams.'
 each with 5 simultaneous streams.
+ trap kill_background_and_exit HUP INT TERM
+ '[' 1 -eq 1 ]
+ measure_direction Download
+ mktemp /tmp/netperfUL.XXXXXX
+ ULFILE=/tmp/netperfUL.pBCDEJ
+ mktemp /tmp/netperfDL.XXXXXX
+ DLFILE=/tmp/netperfDL.PcBAao
+ mktemp /tmp/measurepings.XXXXXX
+ PINGFILE=/tmp/measurepings.EMeaka
+ mktemp /tmp/measureload.XXXXXX
+ LOADFILE=/tmp/measureload.OafFaP
+ DIRECTION=Download
+ dots_pid=3881
+ '[' -4 -eq -4 ]
+ + ping_pid=3882
print_dots
+ + load_pid=3883
+ '[' Download '=' Bidirectional ]
+ spd_test=TCP_MAERTS
+ start_netperf TCP_MAERTS /tmp/netperfDL.PcBAao
:
+ printf .
+ seq 5
.+ sleep 1s
+ sample_load
+ cat /proc/3875/stat
+ wait_netperf
+ ping gstatic.com
+ + + netperf -4+ netperf -4 -H netperf.bufferbloat.net -t TCP_MAERTS -l 60 -v 0 -P 0
 -H netperf.bufferbloat.net -tpgrep TCP_MAERTS+  -l -P 60 3875 -v netperf 0
 -P 0
netperf+ netperf -4 -H netperf.bufferbloat.net -t TCP_MAERTS -l 60 -v 0 -P 0
+ :
+ sleep 1s
 -4 -H netperf.bufferbloat.net -t TCP_MAERTS -l 60 -v 0 -P 0
netperf -4 -H netperf.bufferbloat.net -t TCP_MAERTS -l 60 -v 0 -P 0
+ wait 3886
+ wait 3887
+ wait 3889
+ wait 3890
+ wait 3891
+ kill_load
+ kill -9 3883
+ wait 3883
+ load_pid=0
+ kill_pings
+ kill -9 3882
+ wait 3882
+ ping_pid=0
+ kill_dots
+ kill -9 3881
+ wait 3881
+ dots_pid=0
+ echo

+ '[' Download '=' Bidirectional ]
+ summarize_speed Download /tmp/netperfDL.PcBAao
+ awk '{s+=$1} END {print s}' /tmp/netperfDL.PcBAao
+ printf '%9s: %6.2f Mbps\n' Download 0
 Download:   0.00 Mbps
+ summarize_pings /tmp/measurepings.EMeaka
+ grep -v PING
+ sort -n
+ awk '
BEGIN {numdrops=0; numrows=0;}
{
        if ( $0 ~ /timeout/ ) {
                numdrops += 1;
        } else {
                numrows += 1;
                arr[numrows]=$1; sum+=$1;
        }
}
END {
        pc10="-"; pc90="-"; med="-";
        if (numrows == 0) {numrows=1}
        if (numrows>=10) {
                ix=int(numrows/10); pc10=arr[ix]; ix=int(numrows*9/10);pc90=arr[ix];
                if (numrows%2==1) med=arr[(numrows+1)/2]; else med=(arr[numrows/2]);
        }
        pktloss = numdrops/(numdrops+numrows) * 100;
        printf("  Latency: (in msec, %d pings, %4.2f%% packet loss)\n      Min: %4.3f \n    10pct: %4.3f \n   Median: %4.3f \n      Avg: %4.3f \n    90pct: %4.3f \n      Max: %4.3f\n", numrows, pktloss, arr[1], pc10, med, sum/numrows, pc90, arr[numrows] )
}'+ sed 's/^.*time=\([^ ]*\) ms/\1/'

  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 46.035
    10pct: 0.000
   Median: 0.000
      Avg: 46.035
    90pct: 0.000
      Max: 46.035
+ summarize_load /tmp/measureload.OafFaP
+ cat /tmp/measureload.OafFaP /proc/3875/stat+ awk '
# total CPU of speedtest processes
$1 !~ /cpu/ {
        tot=$16+$17
        if (init_proc_cpu=="") init_proc_cpu=tot
        proc_cpu=tot-init_proc_cpu
}
# track aggregate CPU stats
$1 == "cpu" {
        tot=0; for (f=2;f<=NF;f++) tot+=$f
        if (init_cpu=="") init_cpu=tot
        tot_cpu=tot-init_cpu
}
# track per-CPU stats
$1 ~ /cpu[0-9]+/ {
        tot=0; for (f=2;f<=NF;f++) tot+=$f
        usg = tot - $5
        if (init_tot[$1]=="") {
                init_tot[$1]=tot
                init_usg[$1]=usg
                cpus[num_cpus++]=$1
        }
        if (last_tot[$1]>0) {
                sum_usg_2[$1] += ((usg-last_usg[$1])/(tot-last_tot[$1]))^2
        }
        last_tot[$1]=tot
        last_usg[$1]=usg
}
END {
        num_samp=(NR-2)/(num_cpus+1)-1
        printf("Processor: (in %% busy, avg +/- stddev, %d samples)\n", num_samp)
        for (i=0;i<num_cpus;i++) {
                c=cpus[i]
                if (num_samp>0) {
                        avg_usg=(last_tot[c]-init_tot[c])
                        avg_usg=avg_usg>0 ? (last_usg[c]-init_usg[c])/avg_usg : 0
                        std_usg=sum_usg_2[c]/num_samp-avg_usg^2
                        std_usg=std_usg>0 ? sqrt(std_usg) : 0
                        printf("%9s: %2.f +/- %2.f\n", c, avg_usg*100, std_usg*100)
                }
        }
        printf(" Overhead: (in %% total CPU used)\n")
        printf("%9s: %2.f\n", "netperf", tot_cpu>0 ? proc_cpu/tot_cpu*100 : 0)
}'

Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
+ rm -f /tmp/netperfDL.PcBAao
+ rm -f /tmp/netperfUL.pBCDEJ
+ rm -f /tmp/measurepings.EMeaka
+ rm -f /tmp/measureload.OafFaP
+ measure_direction Upload
+ mktemp /tmp/netperfUL.XXXXXX
+ ULFILE=/tmp/netperfUL.LaaFDo
+ mktemp /tmp/netperfDL.XXXXXX
+ DLFILE=/tmp/netperfDL.OCHbEp
+ mktemp /tmp/measurepings.XXXXXX
+ PINGFILE=/tmp/measurepings.PFafBN
+ mktemp /tmp/measureload.XXXXXX
+ LOADFILE=/tmp/measureload.HDkBkj
+ DIRECTION=Upload
+ dots_pid=3909
+ '[' -4 -eq -4 ]
+ + ping_pid=3910
print_dots
+ load_pid=3911
+ '[' Upload '=' Bidirectional ]
+ spd_test=TCP_STREAM
+ start_netperf TCP_STREAM /tmp/netperfDL.OCHbEp
+ :
+ printf+ sample_load
+ cat /proc/3875/stat
 .
.+ sleep 1s+ ping gstatic.com

+ seq 5
+ wait_netperf
+ + :
+ sleep 1s
+ + netperf+ pgrep -P 3875 netperf
 -4netperf -H netperf.bufferbloat.net -4 -t -H TCP_STREAM -l 60 netperf.bufferbloat.net+ netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 60 -v 0 -P 0
 -v -t 0 TCP_STREAM -P -l 0 60
 -v 0 -P 0
netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 60 -v 0 -P 0
+ netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 60 -v 0 -P 0
+ wait 3915
+ wait 3916
+ wait 3917
+ wait 3918
+ wait 3919
+ kill_load
+ kill -9 3911
+ wait 3911
+ load_pid=0
+ kill_pings
+ kill -9 3910
+ wait 3910
+ ping_pid=0
+ kill_dots
+ kill -9 3909
+ wait 3909
+ dots_pid=0
+ echo

+ '[' Upload '=' Bidirectional ]
+ summarize_speed Upload /tmp/netperfDL.OCHbEp
+ awk '{s+=$1} END {print s}' /tmp/netperfDL.OCHbEp
+ printf '%9s: %6.2f Mbps\n' Upload 0
   Upload:   0.00 Mbps
+ summarize_pings /tmp/measurepings.PFafBN
+ grep -v PING
+ sort -n
+ awk '
BEGIN {numdrops=0; numrows=0;}
{
        if ( $0 ~ /timeout/ ) {
                numdrops += 1;
        } else {
                numrows += 1;
                arr[numrows]=$1; sum+=$1;
        }
}
END {
        pc10="-"; pc90="-"; med="-";
        if (numrows == 0) {numrows=1}
        if (numrows>=10) {
                ix=int(numrows/10); pc10=arr[ix]; ix=int(numrows*9/10);pc90=arr[ix];
                if (numrows%2==1) med=arr[(numrows+1)/2]; else med=(arr[numrows/2]);
        }
        pktloss = numdrops/(numdrops+numrows) * 100;
        printf("  Latency: (in msec, %d pings, %4.2f%% packet loss)\n      Min: %4.3f \n    10pct: %4.3f \n   Median: %4.3f \n      Avg: %4.3f \n    90pct: %4.3f \n      Max: %4.3f\n", numrows, pktloss, arr[1], pc10, med, sum/numrows, pc90, arr[numrows] )
}'
+ sed 's/^.*time=\([^ ]*\) ms/\1/'
  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 62.481
    10pct: 0.000
   Median: 0.000
      Avg: 62.481
    90pct: 0.000
      Max: 62.481
+ summarize_load /tmp/measureload.HDkBkj
+ cat /tmp/measureload.HDkBkj /proc/3875/stat
+ awk '
# total CPU of speedtest processes
$1 !~ /cpu/ {
        tot=$16+$17
        if (init_proc_cpu=="") init_proc_cpu=tot
        proc_cpu=tot-init_proc_cpu
}
# track aggregate CPU stats
$1 == "cpu" {
        tot=0; for (f=2;f<=NF;f++) tot+=$f
        if (init_cpu=="") init_cpu=tot
        tot_cpu=tot-init_cpu
}
# track per-CPU stats
$1 ~ /cpu[0-9]+/ {
        tot=0; for (f=2;f<=NF;f++) tot+=$f
        usg = tot - $5
        if (init_tot[$1]=="") {
                init_tot[$1]=tot
                init_usg[$1]=usg
                cpus[num_cpus++]=$1
        }
        if (last_tot[$1]>0) {
                sum_usg_2[$1] += ((usg-last_usg[$1])/(tot-last_tot[$1]))^2
        }
        last_tot[$1]=tot
        last_usg[$1]=usg
}
END {
        num_samp=(NR-2)/(num_cpus+1)-1
        printf("Processor: (in %% busy, avg +/- stddev, %d samples)\n", num_samp)
        for (i=0;i<num_cpus;i++) {
                c=cpus[i]
                if (num_samp>0) {
                        avg_usg=(last_tot[c]-init_tot[c])
                        avg_usg=avg_usg>0 ? (last_usg[c]-init_usg[c])/avg_usg : 0
                        std_usg=sum_usg_2[c]/num_samp-avg_usg^2
                        std_usg=std_usg>0 ? sqrt(std_usg) : 0
                        printf("%9s: %2.f +/- %2.f\n", c, avg_usg*100, std_usg*100)
                }
        }
        printf(" Overhead: (in %% total CPU used)\n")
        printf("%9s: %2.f\n", "netperf", tot_cpu>0 ? proc_cpu/tot_cpu*100 : 0)
}'
Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
+ rm -f /tmp/netperfDL.OCHbEp
+ rm -f /tmp/netperfUL.LaaFDo
+ rm -f /tmp/measurepings.PFafBN
+ rm -f /tmp/measureload.HDkBkj

Looking this over I don't think it is a script issue. Netperf has an issue.

I've tried several variations of netperf commands and always get the following error.

"resolve_host: could not resolve host"

I run dnscrypt-proxy on port 5300, so I'm thinking netperf is expecting DNS on port 53. This may be why the script is displaying zeros...

*EDIT -- I do have the DNS forward in place, so now I'm not sure.


Thanks for sticking with this, David!

I agree there's no script issue I can see here, and it seems like a name resolution problem. Do you see problems with DDNS or NTP for example, which are also on-router services?

Could you try debugging DNS and netperf using the commands below on your router? I've included the output from my system for comparison if you could post yours as well.

First a few nslookup:

# nslookup netperf.bufferbloat.net
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:      netperf.bufferbloat.net
netperf.bufferbloat.net canonical name = netperf.richb-hanover.com
Name:      netperf.richb-hanover.com
netperf.richb-hanover.com       canonical name = atl.richb-hanover.com
Name:      atl.richb-hanover.com
Address 1: 23.226.232.80
netperf.bufferbloat.net canonical name = netperf.richb-hanover.com
netperf.richb-hanover.com       canonical name = atl.richb-hanover.com

and

# nslookup gstatic.com
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:      gstatic.com
Address 1: 172.217.1.163
Address 2: 2607:f8b0:400b:809::2003

and finally netperf with debug enabled:

# netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 10 -d

resolve_host called with host 'netperf.bufferbloat.net' port '(null)' family AF_INET
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '(null)'  family AF_INET
        cannonical name: 'atl.richb-hanover.com'
        flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
        sa_family: AF_INET sadata: 0 0 23 226 232 80 0 0 0 0 0 0 0 0 0 0
scan_omni_args called with the following argument vector
netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 10 -d
sizeof(omni_request_struct)=200/648
sizeof(omni_response_struct)=204/648
sizeof(omni_results_struct)=284/648
Program name: netperf
Local send alignment: 8
Local recv alignment: 8
Remote send alignment: 8
Remote recv alignment: 8
Local socket priority: -1
Remote socket priority: -1
Local socket TOS: cs0
Remote socket TOS: cs0
Report local CPU 0
Report remote CPU 0
Verbosity: 1
Debug: 1
Port: 12865
Test name: TCP_STREAM
Test bytes: 0 Test time: 10 Test trans: 0
Host name: netperf.bufferbloat.net

installing catcher for all signals
Could not install signal catcher for sig 32, errno 22
Could not install signal catcher for sig 33, errno 22
Could not install signal catcher for sig 34, errno 22
Could not install signal catcher for sig 128, errno 22
remotehost is netperf.bufferbloat.net and port 12865
resolve_host called with host 'netperf.bufferbloat.net' port '12865' family AF_INET
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '12865'  family AF_INET
        cannonical name: 'atl.richb-hanover.com'
        flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
        sa_family: AF_INET sadata: 50 65 23 226 232 80 0 0 0 0 0 0 0 0 0 0
resolve_host called with host '0.0.0.0' port '0' family AF_INET
getaddrinfo returned the following for host '0.0.0.0' port '0'  family AF_INET
        cannonical name: '0.0.0.0'
        flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
        sa_family: AF_INET sadata: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
establish_control called with host 'netperf.bufferbloat.net' port '12865' remfam AF_INET
                local '0.0.0.0' port '0' locfam AF_INET
bound control socket to 0.0.0.0 and 0
successful connection to remote netserver at netperf.bufferbloat.net and 12865
complete_addrinfo using hostname netperf.bufferbloat.net port 0 family AF_INET type SOCK_STREAM prot IPPROTO_TCP flags 0x0
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '0'  family AF_INET
        cannonical name: 'atl.richb-hanover.com'
        flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
        sa_family: AF_INET sadata: 0 0 23 226 232 80 0 0 0 0 0 0 0 0 0 0
local_data_address not set, using local_host_name of '0.0.0.0'
complete_addrinfo using hostname 0.0.0.0 port 0 family AF_UNSPEC type SOCK_STREAM prot IPPROTO_TCP flags 0x1
getaddrinfo returned the following for host '0.0.0.0' port '0'  family AF_UNSPEC
        cannonical name: '0.0.0.0'
        flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
        sa_family: AF_INET sadata: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0) port 0 AF_INET to atl.richb-hanover.com (23.) port 0 AF_INET : demo
create_data_socket: socket 4 obtained...
netperf: get_sock_buffer: send socket size determined to be 16384
netperf: get_sock_buffer: receive socket size determined to be 87380
send_omni_inner: 2 entry send_ring obtained...
recv_response: received a 656 byte response
remote listen done.
remote port is 34436
About to start a timer for 10 seconds.
netperf: get_sock_buffer: receive socket size determined to be 341760
netperf: get_sock_buffer: send socket size determined to be 376320
disconnect_data_socket sock 4 init 1 do_close 1 protocol 6
Adjusting elapsed_time by 0 seconds
recv_response: received a 656 byte response
remote results obtained
calculate_confidence: itr  1; time 10.296365; res  4.979503
                               lcpu -1.000000; rcpu -1.000000
                               lsdm -1.000000; rsdm -1.000000
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

174760  16384  16384    10.30       4.98
shutdown_control: shutdown of control connection requested.

@hnyman @davidc502 et al

Following up from my earlier post, I looked into the possibility of getting CPU frequency stats in addition to the normal processor utilization. Mostly for fun and curiosity...

In general, it's unnecessary, since the % busy by itself is sufficient to determine whether a system is CPU-bound, independent of frequency scaling. That's because if one observes e.g. 98% average usage over 60 seconds, the maximum frequency upscaling (which kicks in at 95% busy even with the conservative default) will already have been reached. I also think it turns speedtest.sh into more of a processor monitoring tool than its intended purpose of a latency-under-load tool. But I'll have to mull it over more...
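
For reference, that upscaling threshold can be checked directly; the path below is the typical ondemand sysfs location and may differ per target:

# typical sysfs location of the ondemand governor's upscaling threshold (% busy)
cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold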

But the fun and interesting part is the couple of examples below, run on an 8-core Ubuntu system.

First, a short concurrent test across the local LAN:

$ sh speedtest.sh -H 192.168.1.1 -p 192.168.1.1 -c -t 10
2018-11-09 01:52:40 Starting speedtest for 10 seconds per transfer session.
Measure speed to 192.168.1.1 (IPv4) while pinging 192.168.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
...........
 Download:  32.66 Mbps
   Upload:  33.09 Mbps
  Latency: (in msec, 11 pings, 0.00% packet loss)
      Min: 1.770
    10pct: 1.770
   Median: 303.000
      Avg: 249.706
    90pct: 365.000
      Max: 436.000
Processor: (as avg +/- stddev % busy @ avg freq, 9 samples)
     cpu0:  8% +/-  4%  @ 1529 MHz
     cpu1:  9% +/-  5%  @ 1502 MHz
     cpu2: 11% +/-  3%  @ 1449 MHz
     cpu3:  7% +/-  4%  @ 1396 MHz
     cpu4:  4% +/-  2%  @ 1862 MHz
     cpu5:  0% +/-  0%  @ 1197 MHz
     cpu6:  1% +/-  1%  @ 1303 MHz
     cpu7:  0% +/-  1%  @ 1223 MHz
 Overhead:  2% total CPU used by netperf

Next, a fun loopback test on the same server:

$ sh speedtest.sh -H 127.0.0.1 -p 127.0.0.1 -c -t 10
2018-11-09 01:25:12 Starting speedtest for 10 seconds per transfer session.
Measure speed to 127.0.0.1 (IPv4) while pinging 127.0.0.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
..........
 Download: 16248.00 Mbps
   Upload: 16722.80 Mbps
  Latency: (in msec, 11 pings, 0.00% packet loss)
      Min: 0.038
    10pct: 0.038
   Median: 0.061
      Avg: 0.065
    90pct: 0.082
      Max: 0.100
Processor: (as avg +/- stddev % busy @ avg freq, 8 samples)
     cpu0: 100% +/-  0%  @ 2794 MHz
     cpu1: 100% +/-  0%  @ 2794 MHz
     cpu2: 100% +/-  0%  @ 2794 MHz
     cpu3: 100% +/-  0%  @ 2794 MHz
     cpu4: 100% +/-  0%  @ 2794 MHz
     cpu5: 100% +/-  0%  @ 2794 MHz
     cpu6: 100% +/-  0%  @ 2794 MHz
     cpu7: 100% +/-  0%  @ 2794 MHz
 Overhead: 53% total CPU used by netperf

I like the last one. :wink:

A quick update: I've updated the OP with a new link for a speedtest version that also measures the average CPU frequency, as in the examples above.

This also includes some README and commentary updates, and will likely be the version for PR to openwrt-packages as "1.0".

@davidc502 Did you have any success understanding the DNS problem you encountered with netperf? Were the test suggestions above any help? That one's still bothering me...


I made further tests with the R7800, with CPU scaling disabled (full CPU frequency). I ran the test (and netperf) on another device in the network and watched the router's CPU utilisation with htop during the test. I made an interesting observation:

The high CPU usage is caused by the Cake qdisc.

  • Without SQM: 99/8 Mbit/s with 44 ms latency, no CPU utilisation
  • with simple fq_codel: 90/7 Mbit/s with 15 ms latency, low CPU utilisation
  • with cake layer_cake: 88/8 Mbit/s with 16 ms latency, high CPU utilisation
  • with cake piece_cake: 89/7 Mbit/s with 16 ms latency, high CPU utilisation

I see this very rarely too, during the course of much testing, and it boils down to a problem with netperf. The script kicks off netperf instances configured for a 60-second test, and then waits for them to complete.

If netperf fails to start, that can yield the zeros seen by @davidc502. Similarly, netperf can timeout after a while, or simply take a long time to start up and complete, which can lead to the long runs seen by @hnyman.

It's hard to reproduce (maybe some network problems or the servers?) so I don't precisely know the root cause, but I'll take a look at improving the logging in the script. If either of you can remember/compare notes on the circumstances around the long runs or zero results, please let me know.

I run the netperf-east server. I know that if the netperf server on my end goes down, the speedtest will return zeros. (I've had outages in the last month, and I think that netperf-west may also have: that may explain these reports of zero speeds.)

Perhaps the speedtest script could detect the "0.00 Mbps" state and warn that the server may be off-line.
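
Something as simple as a post-summary check would probably do. Here's a sketch, where SPEED is a hypothetical name for the summed Mbps result and TESTHOST is the script's existing server variable:

# warn when the summed result is zero (SPEED is a hypothetical variable name)
if [ "${SPEED%.*}" -eq 0 ] 2>/dev/null; then
    echo "Warning: measured 0.00 Mbps -- the netperf server at $TESTHOST may be down or unreachable." >&2
fi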


I used betterspeedtest.sh to test connectivity between netperf-east (netperf.bufferbloat.net) and the two other publicly available netperf servers: netperf-west and netperf-eu. (There's no way I can test this from home: the best service I can get is DSL at 7 Mbps/768 kbps. Praise be to fq_codel/cake!)

richb@netperf:~/src/OpenWrtScripts$ sh betterspeedtest.sh -t 30 -H netperf-west.bufferbloat.net
2018-11-11 08:46:26 Testing against netperf-west.bufferbloat.net (ipv4) with 5 simultaneous sessions while pinging gstatic.com (30 seconds in each direction)
...............................
 Download:  372.22 Mbps
  Latency: (in msec, 31 pings, 0.00% packet loss)
      Min: 0.974
    10pct: 1.000
   Median: 1.070
      Avg: 1.113
    90pct: 1.260
      Max: 1.520
...............................
   Upload:  752.77 Mbps
  Latency: (in msec, 31 pings, 0.00% packet loss)
      Min: 1.010
    10pct: 1.070
   Median: 1.480
      Avg: 3.565
    90pct: 7.430
      Max: 15.400
richb@netperf:~/src/OpenWrtScripts$ sh betterspeedtest.sh -t 30 -H netperf-eu.bufferbloat.net
2018-11-11 08:47:37 Testing against netperf-eu.bufferbloat.net (ipv4) with 5 simultaneous sessions while pinging gstatic.com (30 seconds in each direction)
................................
 Download:  603.21 Mbps
  Latency: (in msec, 32 pings, 0.00% packet loss)
      Min: 1.000
    10pct: 1.020
   Median: 1.130
      Avg: 1.179
    90pct: 1.330
      Max: 1.560
..................................
   Upload:  402.07 Mbps
  Latency: (in msec, 34 pings, 0.00% packet loss)
      Min: 0.988
    10pct: 0.993
   Median: 1.030
      Avg: 1.132
    90pct: 1.170
      Max: 3.110
richb@netperf:

I'm going to do some more testing tonight.

Here are some more results. For these I stopped forwarding to dnscrypt-proxy and just forwarded to 1.1.1.1 and 1.0.0.1. Even so, netperf refuses to work.

root@lede:/usr/bin# speedtest.sh
2018-11-11 20:46:19 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.
 Download:   0.00 Mbps
  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 35.123 
    10pct: 0.000 
   Median: 0.000 
      Avg: 35.123 
    90pct: 0.000 
      Max: 35.123
Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
.
   Upload:   0.00 Mbps
  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 16.721 
    10pct: 0.000 
   Median: 0.000 
      Avg: 16.721 
    90pct: 0.000 
      Max: 16.721
Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
root@lede:/usr/bin# speedtest.sh
2018-11-11 20:46:22 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf.bufferbloat.net (IPv4) while pinging gstatic.com.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.
 Download:   0.00 Mbps
  Latency: (in msec, 1 pings, 0.00% packet loss)
      Min: 19.562 
    10pct: 0.000 
   Median: 0.000 
      Avg: 19.562 
    90pct: 0.000 
      Max: 19.562
Processor: (in % busy, avg +/- stddev, -1 samples)
 Overhead: (in % total CPU used)
  netperf:  0
....................................................................................................................................
   Upload:   0.00 Mbps
  Latency: (in msec, 133 pings, 0.00% packet loss)
      Min: 12.133 
    10pct: 12.649 
   Median: 13.677 
      Avg: 15.950 
    90pct: 14.354 
      Max: 66.922
Processor: (in % busy, avg +/- stddev, 130 samples)
     cpu0:  8 +/-  9
     cpu1:  7 +/-  8
 Overhead: (in % total CPU used)
  netperf:  0

nslookup

root@lede:/usr/bin# nslookup netperf.bufferbloat.net
Server:		216.165.129.158
Address:	216.165.129.158#53

Name:      netperf.bufferbloat.net
netperf.bufferbloat.net	canonical name = netperf.richb-hanover.com
Name:      netperf.richb-hanover.com
netperf.richb-hanover.com	canonical name = atl.richb-hanover.com
Name:      atl.richb-hanover.com
Address 1: 23.226.232.80
netperf.bufferbloat.net	canonical name = netperf.richb-hanover.com
netperf.richb-hanover.com	canonical name = atl.richb-hanover.com

and

root@lede:/usr/bin# nslookup gstatic.com
Server:		216.165.129.158
Address:	216.165.129.158#53

Name:      gstatic.com
Address 1: 64.233.185.120
Address 2: 64.233.185.94
Address 3: 2607:f8b0:4002:c09::5e

netperf with debug enabled

root@lede:/usr/bin# netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 10 -d
resolve_host called with host 'netperf.bufferbloat.net' port '(null)' family AF_INET
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '(null)'  family AF_INET
	cannonical name: 'atl.richb-hanover.com'
	flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
	sa_family: AF_INET sadata: 0 0 23 226 232 80 0 0 0 0 0 0 0 0 0 0
scan_omni_args called with the following argument vector
netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 10 -d 
sizeof(omni_request_struct)=200/648
sizeof(omni_response_struct)=204/648
sizeof(omni_results_struct)=284/648
Program name: netperf
Local send alignment: 8
Local recv alignment: 8
Remote send alignment: 8
Remote recv alignment: 8
Local socket priority: -1
Remote socket priority: -1
Local socket TOS: cs0
Remote socket TOS: cs0
Report local CPU 0
Report remote CPU 0
Verbosity: 1
Debug: 1
Port: 12865
Test name: TCP_STREAM
Test bytes: 0 Test time: 10 Test trans: 0
Host name: netperf.bufferbloat.net

installing catcher for all signals
Could not install signal catcher for sig 32, errno 22
Could not install signal catcher for sig 33, errno 22
Could not install signal catcher for sig 34, errno 22
Could not install signal catcher for sig 65, errno 22
remotehost is netperf.bufferbloat.net and port 12865
resolve_host called with host 'netperf.bufferbloat.net' port '12865' family AF_INET
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '12865'  family AF_INET
	cannonical name: 'atl.richb-hanover.com'
	flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
	sa_family: AF_INET sadata: 50 65 23 226 232 80 0 0 0 0 0 0 0 0 0 0
resolve_host called with host '0.0.0.0' port '0' family AF_INET
getaddrinfo returned the following for host '0.0.0.0' port '0'  family AF_INET
	cannonical name: '0.0.0.0'
	flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
	sa_family: AF_INET sadata: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
establish_control called with host 'netperf.bufferbloat.net' port '12865' remfam AF_INET
		local '0.0.0.0' port '0' locfam AF_INET
bound control socket to 0.0.0.0 and 0
successful connection to remote netserver at netperf.bufferbloat.net and 12865
complete_addrinfo using hostname netperf.bufferbloat.net port 0 family AF_INET type SOCK_STREAM prot IPPROTO_TCP flags 0x0
getaddrinfo returned the following for host 'netperf.bufferbloat.net' port '0'  family AF_INET
	cannonical name: 'atl.richb-hanover.com'
	flags: 0 family: AF_INET: socktype: SOCK_STREAM protocol IPPROTO_TCP addrlen 16
	sa_family: AF_INET sadata: 0 0 23 226 232 80 0 0 0 0 0 0 0 0 0 0
local_data_address not set, using local_host_name of '0.0.0.0'
complete_addrinfo using hostname 0.0.0.0 port 0 family AF_UNSPEC type SOCK_STREAM prot IPPROTO_TCP flags 0x1
complete_addrinfo: could not resolve '0.0.0.0' port '0' af 0
	getaddrinfo returned -11 System error

Seems it still can't resolve the hostname.


Some results on my gigabit connection, no SQM, using snapshot r8430.

OpenWrt SNAPSHOT r8430-4d5b0efc09 / LuCI Master (git-18.311.58259-40de466)

I also have the following in /etc/rc.local to help with CPU scaling using the ondemand governor:

echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1 --concurrent
2018-11-11 18:41:07 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are concurrent, each with 5 simultaneous streams.
............................................................
 Download:  85.27 Mbps
   Upload: 770.63 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.970
    10pct:  25.491
   Median:  36.041
      Avg:  37.778
    90pct:  48.356
      Max:  79.930
 CPU Load: [in % busy (avg +/- std dev), 55 samples]
     cpu0:  97.4% +/-  0.9%
     cpu1:  97.4% +/-  0.0%
 Overhead: [in % total CPU used]
  netperf: 46.2%

root@OpenWrt:~# speedtest.sh -H netperf-west.bufferbloat.net -p 1.1.1.1
2018-11-11 18:47:50 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-west.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are sequential, each with 5 simultaneous streams.
............................................................
 Download: 906.98 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  14.153
    10pct:  14.474
   Median:  15.279
      Avg:  16.254
    90pct:  18.377
      Max:  32.466
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:  77.5% +/-  0.0%
     cpu1:  47.9% +/-  3.9%
 Overhead: [in % total CPU used]
  netperf: 49.5%
............................................................
   Upload: 815.07 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  14.229
    10pct:  15.731
   Median:  29.835
      Avg:  33.368
    90pct:  52.642
      Max:  83.194
 CPU Load: [in % busy (avg +/- std dev), 55 samples]
     cpu0:  92.7% +/-  3.5%
     cpu1:  98.6% +/-  0.0%
 Overhead: [in % total CPU used]
  netperf: 54.4%

root@OpenWrt:~# speedtest.sh -H netperf-east.bufferbloat.net -p 1.1.1.1
2018-11-11 18:56:18 Starting speedtest for 60 seconds per transfer session.
Measure speed to netperf-east.bufferbloat.net (IPv4) while pinging 1.1.1.1.
Download and upload sessions are sequential, each with 5 simultaneous streams.
.............................................................
 Download: 493.72 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  13.705
    10pct:  13.997
   Median:  14.227
      Avg:  14.573
    90pct:  15.825
      Max:  17.088
 CPU Load: [in % busy (avg +/- std dev), 58 samples]
     cpu0:  47.1% +/- 14.3%
     cpu1:  24.0% +/- 10.5%
 Overhead: [in % total CPU used]
  netperf: 32.5%
............................................................
   Upload: 592.40 Mbps
  Latency: [in msec, 61 pings, 0.00% packet loss]
      Min:  14.145
    10pct:  17.219
   Median:  36.497
      Avg:  35.704
    90pct:  50.932
      Max:  67.179
 CPU Load: [in % busy (avg +/- std dev), 57 samples]
     cpu0:  73.8% +/-  8.1%
     cpu1:  97.7% +/-  0.0%
 Overhead: [in % total CPU used]
  netperf: 24.6%

Thanks!


In the tests above, I tried using my ISP's assigned DNS servers. But I also tried 1.1.1.1 and 1.0.0.1.

That is why you see the IP of 216.165.129.158.

I checked the box - "Use DNS servers advertised by peer"

Thanks,
David


My thanks to everyone for the continued and thoughtful feedback. After catching up following a brief absence...

That's really great follow-up! I was going to suggest running speedtest.sh off the router to take its on-router load out of the equation. I'm afraid the CAKE CPU usage comes as not too much of a surprise, as @dtaht has been pointing out for a while. As additional context, could you provide a ballpark for the "low CPU" and "high CPU" figures?

That's a good suggestion, and it's been bothering me since realizing there's little error checking for netperf. I'll look into making an update that does two things:

  1. Check the netperf return status and warn/abort if an error is detected.
  2. Check elapsed time-to-complete for all netperf processes and warn/abort if too far out of bounds.

These two items should catch the "0.0" speeds as well as the "run extra long" situations.
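
Roughly along these lines; this is only a sketch with a single foreground netperf run, not the actual patch (the real script launches several instances in the background):

# illustrative sketch of the planned checks, reusing the script's existing variables
start=$(date +%s)
netperf "$TESTPROTO" -H "$TESTHOST" -t TCP_MAERTS -l "$TESTDUR" -v 0 -P 0 > "$DLFILE"
status=$?
elapsed=$(( $(date +%s) - start ))

[ "$status" -ne 0 ] &&
    echo "Warning: netperf exited with status $status; results may be empty." >&2
# allow some slack over the requested duration before calling the run stuck
[ "$elapsed" -gt $(( TESTDUR + 30 )) ] &&
    echo "Warning: netperf needed ${elapsed}s for a ${TESTDUR}s test; the server may be overloaded." >&2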

@davidc502 Thanks for sticking with the troubleshooting. Your DNS service does seem to work, both with nslookup and netperf itself resolving netperf.bufferbloat.net (note the correct IP 23.226.232.80 in sadata output below):

The problem you're seeing seems related to the local host name lookup. Refer to this snippet below from my working netperf example above:

And compare it to the corresponding error output in your posted netperf debug log:

Does any of this ring a bell with respect to your DNS/resolution setup? The error code appears to be EAGAIN from errno.h. @hnyman Something you've seen before perhaps?

One further suggestion to help narrow things down. If you could try what @hnyman did, install netperf on a Linux box in your LAN, and then run the same "debug" netperf command from there:

netperf -4 -H netperf.bufferbloat.net -t TCP_STREAM -l 10 -d

If this does work, then it might point to some discrepancy between your on-router vs. LAN DNS resolution.

@egross Thanks for your results! Those are very interesting. Nice to see consistent speeds in the gigabit range, and also the potential variation among netperf servers (as @richb-hanover-priv also highlighted).

I notice you see aggregate throughput around 800 or 900 Mbps, whether testing sequentially or concurrently. It's not obvious whether that's limited by your network link or by CPU exhaustion. Your CPU usage is high, but you have "headroom" available by taking netperf out of the picture (i.e. normal operation). If you could run speedtest.sh from a Linux server on your LAN with netperf installed, that would provide some good additional data.

One other odd thing I noticed is that you are tweaking the CPU frequency scaling on your 2-core router, but speedtest.sh doesn't find any CPU frequency information to display from /proc/cpuinfo. Are you certain scaling is working for you? Could you post the output of your /proc/cpuinfo file for my benefit?

Thanks again everyone for the feedback and testing!

If I remember right, it would be something like
low = 10-15%,
high = 90% on one core, 15% on the other

So, cake caused one core to be rather fully utilised.

Thanks for the clarification. I honestly didn't think the CAKE overhead was that high from the little A/B testing I did in the past, but note that was done by "eyeballing" top and without the benefit of this script. I'm hoping @dtaht could share his observations...

@hnyman @egross Could you please confirm whether you see the CPU average frequency on your multi-core / frequency-scaling boxes, now that my last update to the script added this capability?

I've successfully tested on multi-core Ubuntu and single-core (no freq. scaling) OpenWrt, but don't have an OpenWrt router that does frequency scaling to check on. Thanks again!