Shaping performance

I would love to have an expert help us collect these measurements. But it's always hard to crowd-source measurements, because it's always variable whether the contributors "do it right"...

  • The stats you list (router name, cpu type, cpu mhz, up & down bandwidth, idle percent) are only relevant if latency is under good control. (I can get 100% utilization of my ISP link if I don't care about latency...) It would be good to find a way to measure the latency increase during the test.
  • As the author of the OpenWrt scripts, I know that it's not useful to run the speed test script on the router. The netperf process pushes a lot of data, and takes away from the ability to route packets/run SQM. So people would need to generate traffic on their laptop/desktop computer, not the router.
  • Maybe it is good enough to use a DSL Reports bufferbloat grade in the stats above. The process might be a) Run DSL Reports speed test; b) while it's active, monitor top -d 1 and record the maximum and minimum %idle value during the test.
  • We might collect all measurements, but discount/ignore stats where the bufferbloat grade is below, say, a B.
  • Is there any value to collecting the ISP's nominal download & upload rates as well as measured rates?

@dlakelan Would you be able to write up a proposed procedure for setting up the test conditions and collecting the data? Thanks.

You make some good points. My assumption was that people had already tuned SQM to their liking so that bufferbloat was acceptable, and we're just trying to measure the cpu idle remaining when those settings are in place, but we should somehow be explicit about this. If you click the results link on a dslreports test you can actually read off the idle, download and upload ping time averages from the graph. Those would be good to collect too, I guess.

max idle during a test won't be particularly interesting, as it'll be nearly 100% in the pauses between up and download.

It wouldn't hurt.

Great topic, great idea; could I recommend also storing the kernel version? And how about instructing users to measure performance with a number of shaper settings, like 50, 75, 90% of contracted bandwidth for both upstream and downstream? And maybe add a field to denote measurements from the same router, like a hash of the MAC address?

The table you would like to see should look like this, right?
Can you provide some real world example values, just to see how well this table behaves.

| RouterName | CPUType | CPUMHz | kernel version | BWmbpsdown | BWmbpsup | Idlepercent |
| example | example | example | example | example | example | example |

I'd suggest we keep it as simple as possible, but perhaps in the thread, people can post how they tested with SQM settings, internet bandwidth, etc. If necessary, we could prompt for additional testing. Perhaps a limited number of knowledgeable people could review and "approve" the data before putting it in the table.

Although the forum is a good place for generating this data, it might be a good idea to put a more curated version in the official wiki. A lot of people will never make it here, but will see the wiki.

Obvious to us, but maybe not to everyone - it must be explicit that people test over wired ethernet. May need to occasionally watch out for people testing connections over 100mbps with a router that only has 100mbps ethernet hardware.

I do think the core info should be kept simple so that running this test isn't seen as a hassle and participation is relatively good.

The statistical model will let us estimate max routing and SQM performance for every device in the TOH so I'd suggest we add a field there for wiki purposes (Estimated max BW for SQM)

@tmomas, I run a custom shaper so I don't think I'm the best person to try to put example data in the table. Perhaps @moeller0 can give us an example entry?

Here's a start as to the testing methodology instructions:

  1. Configure SQM until you achieve acceptable latency under load using the dslreports speed test: a bufferbloat grade of B or better. Please don't provide test results where SQM is not yet performing acceptably well, as we are attempting to measure the ability to route and shape well.

  2. ssh into the router and run top -d 1 at the prompt

  3. Using an x86 computer attached to the LAN, run a dslreports speed test while watching the output of top. Near the top of the screen, find the line where top reports "% idle".

  4. During the speed test record the smallest % idle shown by top -d 1 (on a piece of paper or whatever)

  5. Enter your data into the table, using the dslreports bandwidth measurements and the smallest percent idle you observed. Make sure to record whether your test computer was connected to the router by wired or wifi.

Mmh, actually I believe most users will want to know shaping limits while using WIFI, so it might be better to ask whether LAN or WLAN was used in the first round and later extend to WLAN tests that do not show wlan bottlenecking... Maybe that is of secondary concern to @dlakelan and since he is volunteering his time...

I agree that people will want to know wireless shaping performance, but there will be two issues there, since the wifi connection speed can be the bottleneck as well as the CPU... But let's add a column for SpeedTestOnWiredWifi, which has either "wired" or "wifi"

any other thoughts?

EDIT: I edited my instructions above to include instructions about recording wired vs wifi client. @tmomas can we add the "SpeedTestOnWiredWifi" column to the table?

It's getting a bit crowded already... but here we go:

| Router Model | CPU Type | CPU MHz | kernel version | BWmbps down | BWmbps up | Idle % | Speed Test On Wired Wifi |
| example | example | example | example | example | example | example | example |

I think we're iterating toward a good solution here. I had envisioned that there could be a script that ran on the router to collect much of the info automatically. Specifically, model, CPU type/MHz, hash of MAC address, and other things are best left to a script that always does it right...

It might work like this:

  1. Download script to router
  2. Start script, which begins to periodically sample the idle percentage, remembering the minimum value
  3. Separately, the person starts DSL Reports Speed Test on their computer
  4. When the speed test completes, they hit Ctrl-C. The script then prompts them to enter several values: measured download, measured upload, bufferbloat grade from DSL Reports, contracted download, contracted upload
  5. Script then displays results in a standard format (CSV?) with this info:

Router Name: xxx
CPU type: xxx
CPU MHz: xxx
OpenWrt version: xxx
Kernel version: xxx
ISP Contracted Download (kbps): xxx
ISP Contracted Upload (kbps): xxx
Measured Download (kbps): xxx
Measured Upload (kbps): xxx
Bufferbloat grade: A/B/C/...
Min idle percent: xxx
Hash of router MAC address: xxx
Test duration (sec): xxx

The next step in the process would be to understand how to process/record that data into our system. Thanks!
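A few of the fields listed above could be gathered along these lines. This is only a rough sketch under stated assumptions, not the script proposed in this thread: the helper names `release_field` and `hash_mac` are made up for illustration, the release file is assumed to contain `KEY='value'` lines (as `/etc/openwrt_release` does on the systems I have seen), `sha256sum` is assumed to be installed, and `eth0` in the usage comment is just a placeholder interface name.

```shell
#!/bin/sh
# Hypothetical helpers for a few of the report fields; names are illustrative.

# Print the value of key $1 from a KEY='value' style release file.
# $2 is the file to read (assumed default: /etc/openwrt_release).
release_field() {
    sed -n "s/^$1=//p" "${2:-/etc/openwrt_release}" | tr -d "'\""
}

# Print the first 12 hex characters of the SHA-256 of the string in $1,
# so rows from the same router can be grouped without posting the raw MAC.
hash_mac() {
    printf '%s' "$1" | sha256sum | cut -c1-12
}

# Example use on a router (commented out so the helpers can be sourced safely):
#   echo "OpenWrt version: $(release_field DISTRIB_RELEASE)"
#   echo "Kernel version: $(uname -r)"
#   echo "Hash of router MAC address: $(hash_mac "$(cat /sys/class/net/eth0/address)")"
```

Keeping each field in its own tiny function should make the script easier to audit line by line.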

Sounds like a really good idea.

Yeah, having a script would be a good idea, the script could even measure the bandwidth actually in use (parse stats from /proc/net/dev). Ideally you could have the script just write an entry into a MySQL/Mariadb table in a database running in someone's container-based server. Then the instructions would look something like:

  1. Log into router
  2. run ./ourscript.sh
  3. From a computer connected to the router on the LAN run a dslreports speed test
  4. After the speed test, press Ctrl-C on the router... which would either upload the data point to a web site or directly connect to a database server or something

Anyone have suggestions for how to collect the data? There are tons of options: upload to a web page, direct connect to a mariadb instance, a graphite/carbon instance, and so on. If someone already has something set up that they can bolt an additional thing onto, that might be a way to go.

Love the script idea!

Same way as https://openwrt.org/docs/guide-user/perf_and_log/benchmark.openssl: the script outputs the numbers enclosed in "|", so that $user just has to copy this line and paste it into the forum or the OpenWrt wiki.

This sounds sweet, but as a user I would pretty much like to actually feel more informed and more in control. So having a script is fine, but it really should create text output that the user needs to upload manually, so that it is easy to check that the information truly is "innocent" and that the user can still, after collecting the data, decide not to upload it. I see no real way to do the same with an automatic upload; that will require way more trust that the script does what it claims...

Maybe also add the sqm configured bandwidths?

The point about the script being a security issue still remains... logging into your router and running a script as root... who's to say it's not forking a background shell and wgetting a rootkit or detecting all your IP cameras and redirecting their video stream to a secret lair, or uploading a compromised kernel to all your fembots?

The fact is the script should be very small and easily auditable by anyone who knows basic shell stuff... tens of lines of very normal shell script would be best.

it could also at the end ask you "here's the data we've collected, do you want us to automatically upload this data?"

if we don't want an auto-upload then I agree with @tmomas just have it output the data separated by | characters and they can copy and paste to the table.
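The "|"-separated output could be produced with something as small as the following. It is a minimal sketch only; `format_row` is a hypothetical name, and the exact column order would have to match whatever table is finally agreed on.

```shell
#!/bin/sh
# Hypothetical helper: print all arguments as one "|"-delimited table row
# that can be copied from the terminal and pasted into the table by hand.
format_row() {
    row="|"
    for field in "$@"; do
        row="$row $field |"
    done
    printf '%s\n' "$row"
}

# Example use (placeholder values, as in the field list earlier in the thread):
#   format_row "xxx" "xxx" "xxx"
#   prints: | xxx | xxx | xxx |
```

Because the script only prints the row, the user still sees exactly what would be shared before deciding to post it.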

I fully agree.

I am with @tmomas on this one, require explicit user action seems like the safest option here.
Now, I guess the proposed script needs to be prototyped... not sure when I will get around to doing that (not that I am great at shell scripting to begin with).

@richb-hanover, @moeller0, I'm not a real expert in shell either. I can do basic stuff, but in restricted shells like the one busybox provides (as opposed to bash) I would probably be pretty slow getting all the info parsed out correctly. I do know some nontrivial lua, though, and could probably write a lua script that parses the data in /proc/net/dev, /proc/cpuinfo and so forth. One issue seems to be that lua won't sleep with sub-second resolution without additional libraries (the main one that LEDE has is "socket" from the "luasocket" package, but it's not included by default).

shell seems to be unable to sleep for less than 1 second either without coreutils-sleep package, so measuring bandwidth and idle percent with reliable timing for bandwidth calculations is probably not that easy without this package.

/proc/stat seems to be the place to get info that will give you idle percentage, documented here https://www.kernel.org/doc/Documentation/filesystems/proc.txt

in particular I think every second you look at the change in user, nice, system, idle, irq, and softirq, and then calculate idle/(user+nice+system+idle+irq+softirq) you get idle percent. A loop that waits 1 second and then re-calculates those and keeps track of the minimum would be pretty easy. This isn't strongly tied to accurate timing. However /proc/net/dev gives bytes rx and tx and the change in that divided by the wait-time is the bandwidth usage, and if wait time is in increments of 1s with nontrivial jitter then you're not going to get the ability to get better than say 10 to 20% accuracy in bandwidth measurements.
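The idle calculation described above can be sketched in plain integer shell arithmetic. This is an illustration under assumptions rather than a finished script: `idle_pct` and `monitor_min_idle` are hypothetical names, the field order (user, nice, system, idle, iowait, irq, softirq) is taken from the kernel's proc documentation linked above, and iowait is left out of the total to match the formula in this post.

```shell
#!/bin/sh
# Idle percent between two snapshots of the aggregate "cpu" line of /proc/stat.
# $1 = earlier line, $2 = later line. Prints an integer 0..100.
idle_pct() {
    read -r label1 u1 n1 s1 i1 w1 q1 sq1 rest1 <<EOF
$1
EOF
    read -r label2 u2 n2 s2 i2 w2 q2 sq2 rest2 <<EOF
$2
EOF
    idle=$(( i2 - i1 ))
    # Total follows the post: user+nice+system+idle+irq+softirq (no iowait).
    total=$(( (u2 - u1) + (n2 - n1) + (s2 - s1) + idle + (q2 - q1) + (sq2 - sq1) ))
    [ "$total" -gt 0 ] || return 1
    echo $(( 100 * idle / total ))
}

# Sample once per second, remember the minimum, and report it on Ctrl-C.
# Not called here; run it by hand on the router.
monitor_min_idle() {
    min=100
    prev=$(grep '^cpu ' /proc/stat)
    trap 'echo "Min idle percent: $min"; exit 0' INT
    while sleep 1; do
        cur=$(grep '^cpu ' /proc/stat)
        pct=$(idle_pct "$prev" "$cur") && [ "$pct" -lt "$min" ] && min=$pct
        prev=$cur
    done
}
```

Since the result is a ratio of counter differences taken at the same two instants, jitter in the one-second sleep changes the length of the window but not the correctness of the percentage.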

the way to get around this seems to be to use coreutils-sleep package which is capable of sleeping for an amount of time specified in floating point.
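Another possible way to reduce the dependence on precise sleeps, offered here as an assumption rather than something proposed in the thread, is to measure how long each interval actually lasted instead of trusting `sleep`. The first field of /proc/uptime is printed in seconds with two decimals, so it can be turned into integer centiseconds without floating point. The names `to_centisec`, `iface_bytes` and `rate_kbps` below are hypothetical, and `eth0` is a placeholder interface.

```shell
#!/bin/sh
# Convert an uptime string such as "12345.08" (seconds with two decimals)
# into integer centiseconds.
to_centisec() {
    sec=${1%%.*}
    frac=${1#*.}
    # Prefixing "1" and subtracting 100 keeps "08" from being parsed as octal.
    echo $(( sec * 100 + 1$frac - 100 ))
}

# Print "rx_bytes tx_bytes" for interface $1, read from $2
# (default /proc/net/dev). The colon after the name is turned into a space
# first, so both "eth0: 123" and "eth0:123" layouts parse the same way.
iface_bytes() {
    sed 's/:/ /' "${2:-/proc/net/dev}" | awk -v ifc="$1" '$1 == ifc { print $2, $10 }'
}

# Average rate in kbit/s: $1 = earlier byte count, $2 = later byte count,
# $3 = elapsed time in centiseconds (bits per millisecond equals kbit/s).
rate_kbps() {
    [ "$3" -gt 0 ] || return 1
    echo $(( ($2 - $1) * 8 / ($3 * 10) ))
}

# Example use on a router (commented out; eth0 is a placeholder):
#   t1=$(to_centisec "$(cut -d' ' -f1 /proc/uptime)"); set -- $(iface_bytes eth0); rx1=$1
#   sleep 1
#   t2=$(to_centisec "$(cut -d' ' -f1 /proc/uptime)"); set -- $(iface_bytes eth0); rx2=$1
#   rate_kbps "$rx1" "$rx2" $(( t2 - t1 ))
```

This sketch assumes the shell's arithmetic is wide enough to hold the byte counters multiplied by 8; on a shell with 32-bit arithmetic it would overflow, and counter wrap-around is not handled either.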

Shell arithmetic without floats will be less than optimal. What do you think of a lua script and a coreutils-sleep dependency?