Transparent Cake Box

That looks good. I guess the only obvious improvement would be to specify the correct per-packet overhead, which for fiber links unfortunately is somewhat hard to get right... (that and testing whether internal host isolation works*)

*) You do not actually need to run flent on multiple internal hosts; as long as you have one flent-capable host, simply use the dslreports speedtest on the Windows hosts (configured for multiple streams; make sure to extend the test duration to 30 seconds or more). Then expect the flent-measured throughput to scale back to a 1/(number of concurrently active hosts) fraction whenever the Windows hosts have their speedtests running. BTW, https://www.dslreports.com/forum/speedtestbinary offers command line clients for the dslreports speedtest, making testing a bit simpler.

Good luck & Best Regards

Could you explain to me how to read those values? I'd like to be able to evaluate them.

I will try it this way; much simpler than setting up multiple virtual machines to run flent.

Many thanks
Lorenzo

Ciao Lorenzo,

Well, I can try to comment on the few I look at:

    This is the most important one, as it sort of summarizes how you ended up configuring cake. Since it says "raw" instead of "overhead NN", I believe that you did not configure the per-packet overhead explicitly, hence I mentioned it.
    The fact that there were drops tells me the shaper is doing its job...
    AFAIK this shows the peak delay cake induced; it looks quite nice.
    This tells me only very few of your data flows use ECN, as otherwise there would be fewer drops and more marks.

And this shows that you do not suffer from large meta-packets (from GRO or GSO), as otherwise max_len would be larger; 1514 is typical for MTU 1500 packets, as the linux kernel skb structure adds the size of the ethertype and the two ethernet MAC addresses (the kernel fills these fields, and hence 1514 is true from the kernel's perspective, even though it is not really suitable for any shaper).
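For reference, the arithmetic behind that number:

1500 (MTU) + 6 (dst MAC) + 6 (src MAC) + 2 (ethertype) = 1514 bytes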

I hope that helps a bit.

Best Regards

Thank you, I'll check my home setup's results now that I know what to look for.

Lorenzo

Hi moeller0,
I tried running multiple dslr tests in parallel and got unexpected results: no fair sharing and increased bufferbloat, so I'd like to try the dual-cake setup.

I think the right configuration is to set ingress or egress for both interfaces, because they face opposite directions (please correct me if I'm wrong), but I don't know which is better and how to apply the dual-xxxhost options.

Many thanks
Lorenzo

Since you use flent, could you post the RRUL_CS8 "all" plot here in the thread and annotate at what times the other Windows hosts ran their speedtests, please?

Correct, but ingress shaping requires an IFB, which incurs some processing cost, so in the case of using two interfaces, always instantiate sqm-scripts on egress (by setting the ingress bandwidth to 0, which denotes "do not shape", as "shape to 0" would end up with a non-functional link...)
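A minimal sketch of such an egress-only instance in /etc/config/sqm (interface name and rate are just the values from this thread):

# one sqm instance per interface, shaping egress only
config queue 'lan'
    option enabled '1'
    option interface 'eth0.1'
    option qdisc 'cake'
    # egress shaping rate in kbit/s
    option upload '18944'
    # 0 = do not shape ingress on this interface, hence no IFB is instantiated
    option download '0'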

Hope that helps

Best Regards

I didn't keep the results; next time I'll try both configurations and post all the info.

Bye
Lorenzo

Just to confirm this: flent will automatically save a data file (even if you just requested a plot). So unless you actively deleted that file, it should be somewhere on your linux machine, most likely in the directory from which you called flent.
The name would be (for a hypothetical rrul_cs8 test performed on 2017-06-06):

rrul_cs8-2017-06-06T235936.159804.$YOURNAME.flent.gz
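If it is not in the current directory, something like this should track it down (a sketch, assuming the tests were started somewhere under your home directory):

find ~ -name '*.flent.gz' 2>/dev/null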

Maybe we are lucky :wink:

Best Regards

I deleted everything; too bad, no results :wink:
Dual queue setup seems better: http://www.dslreports.com/speedtest/22049399

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 90494382844 bytes 124737125 pkt (dropped 0, overlimits 0 requeues 10)
backlog 0b 0p requeues 10
maxpacket 1514 drop_overlimit 0 new_flow_count 26 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-sqm root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 800f: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
Sent 334242905 bytes 378520 pkt (dropped 10576, overlimits 466115 requeues 0)
backlog 0b 0p requeues 0
memory used: 1411200b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 775us 382us
av_delay 0us 44us 61us
sp_delay 0us 8us 11us
pkts 0 388540 556
bytes 0 350124323 33914
way_inds 0 11158 0
way_miss 0 9256 21
way_cols 0 0 0
drops 0 10576 0
marks 0 1 0
sp_flows 0 1 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 90

qdisc cake 8011: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
Sent 203765278 bytes 366162 pkt (dropped 1734, overlimits 260102 requeues 0)
backlog 0b 0p requeues 0
memory used: 185472b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 2.1ms 263us
av_delay 0us 139us 30us
sp_delay 0us 9us 12us
pkts 0 366987 909
bytes 0 206316066 56386
way_inds 0 20898 0
way_miss 0 10093 14
way_cols 0 0 0
drops 0 1734 0
marks 0 0 0
sp_flows 0 1 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 167

qdisc mq 0: dev wlan0 root
Sent 4059264 bytes 14492 pkt (dropped 0, overlimits 0 requeues 179)
backlog 0b 0p requeues 179
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 49300 bytes 404 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 4009964 bytes 14088 pkt (dropped 0, overlimits 0 requeues 179)
backlog 0b 0p requeues 179
maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0

I'll post flent results in the next few days; now I need to work!

Lorenzo

Hi all,
some news from my tests about link layer adaptation: I found an optimal value of 12 bytes by trial and error; I applied it only to the WAN iface.

rrul_cs8

double-layer-cake-lla12-all-scaled

Now I'm trying to get fair sharing, but running 2 parallel flent tests from different machines against the same server (flent-london.bufferbloat.net) gave me these results:

parallel rrul_cs8

lla-12 is from the previous test
double-layer-cake-lla12-box-totals

What's going wrong?

p.s.
I have the flent results if it can be useful

Thanks
L.

Interesting, could you post the output of "cat /etc/config/sqm", "tc -d qdisc" and "tc -s qdisc" again please?

Not sure, the bandwidth sharing looks roughly okay, but the latency skyrockets. I wonder, could you repeat that test with both shapers set to 15000? Ingress shaping is a bit approximate and will generally need more playroom the more flows you have. The thing is, qdisc shapers traditionally shape their output to the desired rate, which means that most of the time more packets come in than go out and the excess is dropped; for ingress shaping that behaviour is not ideal. Cake's principal author believes he has a solution for that (by making cake attempt to shape its incoming rate instead), but that is still in testing.
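Just to put a number on that playroom: 15000/18944 ≈ 0.79, so shaping to 15000 leaves roughly 21% of headroom below the link rate for the ingress shaper to absorb transient bursts.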

Best Regards

I will try lowering the shapers; for now, here is the output you asked for:

cat /etc/config/sqm

config queue 'eth1'
option ingress_ecn 'ECN'
option itarget 'auto'
option etarget 'auto'
option linklayer 'none'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option eqdisc_opts 'dual-dsthost'
option upload '18944'
option qdisc 'cake'
option script 'layer_cake.qos'
option download '0'
option interface 'eth0.1'
option enabled '1'

config queue 'eth2'
option ingress_ecn 'ECN'
option itarget 'auto'
option etarget 'auto'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option egress_ecn 'NOECN'
option qdisc_really_really_advanced '1'
option interface 'eth0.2'
option upload '18944'
option qdisc 'cake'
option script 'layer_cake.qos'
option eqdisc_opts 'dual-srchost'
option download '0'
option enabled '1'
option linklayer 'ethernet'
option overhead '12'

tc -d qdisc

qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc noqueue 0: dev br-sqm root refcnt 2
qdisc cake 802b: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
qdisc cake 802d: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
linklayer ethernet overhead 12
qdisc mq 0: dev wlan0 root
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn

tc -s qdisc

qdisc noqueue 0: dev lo root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 294566352341 bytes 428909789 pkt (dropped 0, overlimits 0 requeues 61)
backlog 0b 0p requeues 61
maxpacket 1514 drop_overlimit 0 new_flow_count 496 ecn_mark 1
new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-sqm root refcnt 2
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 802b: dev eth0.1 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-dsthost rtt 100.0ms raw
Sent 4117621179 bytes 4669927 pkt (dropped 51697, overlimits 5501306 requeues 0)
backlog 0b 0p requeues 0
memory used: 1538208b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 0us 13.9ms 427us
av_delay 0us 2.9ms 58us
sp_delay 0us 11us 12us
pkts 0 4696334 25290
bytes 0 4188830147 1533464
way_inds 0 378899 0
way_miss 0 155289 103
way_cols 0 0 0
drops 0 51697 0
marks 0 144 0
sp_flows 0 2 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 0 1514 460

qdisc cake 802d: dev eth0.2 root refcnt 2 bandwidth 18944Kbit diffserv3 dual-srchost rtt 100.0ms raw
Sent 1549530083 bytes 3993040 pkt (dropped 3802, overlimits 1695814 requeues 0)
backlog 0b 0p requeues 0
memory used: 401184b of 4Mb
capacity estimate: 18944Kbit
Bulk Best Effort Voice
thresh 1184Kbit 18944Kbit 4736Kbit
target 15.3ms 5.0ms 5.0ms
interval 110.3ms 100.0ms 10.0ms
pk_delay 78.1ms 417us 237us
av_delay 8.6ms 23us 20us
sp_delay 8us 9us 9us
pkts 14841 3915078 66923
bytes 9174840 1510343623 35764375
way_inds 0 287762 0
way_miss 6 167085 123
way_cols 0 0 0
drops 459 2571 772
marks 0 49 0
sp_flows 0 0 0
bk_flows 0 1 0
un_flows 0 0 0
max_len 1526 1526 1526

qdisc mq 0: dev wlan0 root
Sent 5486303 bytes 19662 pkt (dropped 0, overlimits 0 requeues 204)
backlog 0b 0p requeues 204
qdisc fq_codel 0: dev wlan0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 66106 bytes 504 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 5420197 bytes 19158 pkt (dropped 0, overlimits 0 requeues 204)
backlog 0b 0p requeues 204
maxpacket 66 drop_overlimit 0 new_flow_count 1 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc fq_codel 0: dev wlan0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0

Again, many thanks for your support!

Lorenzo


As I expected, you are not using cake's overhead compensation but tc's stab option. That is not bad in itself (it actually is quite fine), but stab does not account for the amount of overhead the kernel automatically adds for ethernet interfaces, namely 14 bytes (6 dst MAC, 6 src MAC, 2 ethertype). In essence this expands your specified overhead of 12 bytes into a more reasonable 26 bytes. Why do I say 12 bytes is unreasonable? Because the ethernet overhead alone takes more than that...
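The arithmetic, which incidentally matches the max_len of 1526 visible in your eth0.2 statistics above:

1500 (MTU) + 14 (kernel-added ethernet header) + 12 (stab overhead) = 1526 bytes
=> effectively 26 bytes of overhead on top of each full-MTU IP packet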

Best Regards

Hi moeller0,
I've noticed some strange behaviours during my tests, so I reverted to LinkLayerAdaptation=none to keep it simple. I also disabled ingress_ecn for the LAN iface to reflect the single-cake setup's defaults.

/etc/config/sqm

config queue 'eth1'
option itarget 'auto'
option etarget 'auto'
option linklayer 'none'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option qdisc_really_really_advanced '1'
option eqdisc_opts 'dual-dsthost'
option qdisc 'cake'
option download '0'
option interface 'eth0.1'
option upload '16384'
option enabled '1'
option squash_dscp '0'
option squash_ingress '0'
option egress_ecn 'NOECN'
option ingress_ecn 'NOECN'
option script 'layer_cake.qos'

config queue 'eth2'
option itarget 'auto'
option etarget 'auto'
option debug_logging '0'
option verbosity '5'
option qdisc_advanced '1'
option squash_dscp '1'
option squash_ingress '1'
option qdisc_really_really_advanced '1'
option interface 'eth0.2'
option qdisc 'cake'
option eqdisc_opts 'dual-srchost'
option download '0'
option upload '16384'
option enabled '1'
option linklayer 'none'
option egress_ecn 'NOECN'
option ingress_ecn 'ECN'
option script 'layer_cake.qos'

Results:

rrul

rrul_double_piece_of_cake

rrul_cs8

rrul_cs8_double_piece_of_cake

Same settings for both tests except the queue setup script, but opposite ping results; could it be related to the squash_dscp / squash_ingress settings?
If I understand your post here correctly, piece_of_cake and layer_cake should have different defaults, is that correct?

Many thanks
Lorenzo

Quick question: I was just trying to understand your request when the data and the request disappeared. I hope you found a solution to your question. If so, would you mind sharing it, via PM if you prefer?

Best Regards

Hi moeller0,
the post is back again.
I double checked and found a misconfigured SQUASH setting that was causing that strange behaviour:

Hi moeller0,
given that I'm not a network admin and until a few weeks/days/hours ago I didn't know anything about QoS, DSCP, ECN etc., I'm trying to understand how the different cake scripts and options affect traffic shaping.

I summarise what I think I've understood:

I can't take advantage of layer_cake because I use this transparent approach and the main router that eventually sets the DS field sits after the cake-box. I could only rely on applications setting the DS bits at the source.

Layer_cake is also heavier, and it introduces a bit of latency trying to shape into different queues, so the best approach in my scenario is piece_of_cake with squash/ignore DSCP on ingress; is that correct?

I also have a couple of questions about flent tests:

  • What's the difference between the rrul and the rrul_cs8 test? Which one is better suited for troubleshooting?
  • Do you have a good example of a box_ping test using layer_cake? Just to know what it looks like.

Thank you very much.

Lorenzo

But that is the charming idea behind DSCP markings: ideally the end points of a connection set them, and the intermediary networks either honor or ignore them. Unfortunately, in reality what often happens is that intermediary networks actually re-map the DSCP fields to different values for their internal usage. But the idea is very much that the applications are the ones requesting a specific DSCP, and it is up to the network to either honor or ignore that request. Some applications actually set meaningful DSCPs already (I believe ssh does), so layer_cake might improve things even if only the egress packets carry meaningful markings...
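ssh, for instance, exposes this directly via its IPQoS option; a minimal ~/.ssh/config sketch (the chosen values are just illustrative):

Host *
    # first value: interactive sessions; second: bulk transfers (scp/sftp)
    IPQoS lowdelay throughput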

For ingress you would need to run a few packet captures to figure out whether you want to trust the incoming DSCPs or not; if you do, set both squash_dscp and squash_ingress to 0. (The first instructs cake to remap the DSCPs to all-zero, the default for the TOS field that is universally interpreted as best effort.)
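In /etc/config/sqm terms that would be (a sketch; the rest of the queue section stays as you have it):

# keep the incoming DSCP markings and let cake act on them
option squash_dscp '0'
option squash_ingress '0'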

Not necessarily; yes, layer_cake is more computationally expensive, but I have not quantified how much more expensive, and it might still do the right thing for you, assuming your internal applications set the "correct" DSCPs. I guess you need to try it out?

rrul uses four flows per direction, all with different DSCP markings. rrul_cs8 uses 8 flows per direction, each using one of the 8 DSCP class selector (CS) markings. So rrul_cs8 will simply offer more flows and will also sample the priority band strategy of the whole end-to-end link a bit better. For fast links, having 8 instead of 4 flows will make the measured total bandwidth come closer to the real limit (more interleaving of the different TCPs probing for the bandwidth limit).
I like the CS system as it a) only uses 3 of the 6 DSCP bits and b) I strongly believe that 8 different priority bands should be sufficient for most home users. (Heck, many ISPs use the 3 priority bits in the VLAN tags and do just fine with just 8 priority bands, and wifi/wmm uses just four different priority classes; so the full 6 bits of DSCP markings seem quite overkill. I would also love it if everybody agreed to split the 6 bits into two groups of three each, one group for the endpoints to code their intention and one group for each intermediary network to use for real; that way at least the intention would be carried end-to-end. But this is just a dream.)
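For completeness, a typical invocation (using the flent-london.bufferbloat.net server from your earlier tests; -H sets the target host, -l the test length in seconds, -t an extra title for the plots; the title is just a placeholder):

flent rrul_cs8 -H flent-london.bufferbloat.net -l 60 -t 'dual-cake-test'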

Not at the moment; also, I typically see more effects in local tests than when I go through the internet. I will see whether I can create one later. (Typically I see a stronger effect on the bandwidth, as the latency probes are still sparse and will typically be boosted in comparison to the bulk TCP packets in each of the priority bands that cake uses, so the pings are often flat even though the bandwidth graphs show differences)...

Best Regards


Thank you moeller0, your answers are very useful for me to understand how things work!

Just to be sure:
ingress -> cake on wan iface
egress -> cake on lan iface

There isn't a squash_egress option, right?

For now this is beyond my skills :wink:

I've already tried, and in every test layer_cake showed higher average (a few ms) and peak (CS1_BK up to more than double) ping values. I didn't report all my results, because I'm not sure I'm testing correctly.
In fact I'm almost sure I always miss something! :slight_smile:

I'm also having problems with the various DSCP marking standards, understanding how they overlap and/or work together.

while (!fully_understand) { try(); fail(); learn(); } :grinning:

Best regards
Lorenzo

I had forgotten about your exact topology, but I meant packets that pass from your internal network to the internet. Now that you remind me: cake will naturally be attached to the egress side of an interface (for ingress shaping you need the ifb device), so in your case the shaper on the WAN side effectively shapes egress towards the internet, and the one on the LAN egress side handles packets coming from the internet. I hope this clears things up?
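As a sketch of the resulting shaping directions (assuming, as your dual-dsthost/dual-srchost choices suggest, that eth0.1 faces the LAN and eth0.2 the internet):

internet <-> eth0.2 <-> [cake box bridge] <-> eth0.1 <-> LAN hosts
cake on eth0.2 egress (dual-srchost): packets heading to the internet, i.e. upload
cake on eth0.1 egress (dual-dsthost): packets heading to the LAN, i.e. download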

With a heavy emphasis on "now", you are making a lot of progress in understanding these things in a very short amount of time (it took me way longer).

Well, CS1_BK is the background "scavenger" class, so it is intended to only use up left-over bandwidth and to yield quickly to more important packets; higher RTT values for probes marked CS1 are to be expected and just show that things are working as intended. I would be more interested in the relative RTTs of the other classes. Could you maybe post the "all" plots here, as they allow a decent first glimpse into the general sqm performance?

Welcome to the club :wink: As far as I can tell everybody nowadays hates strict precedence, but other than that there is no really strict consensus on what to use when. There are some heuristics based on DSCP markings that actually are used in the wild (e.g. by VoIP applications and VoIP servers), but all in all it is a mess. IMHO not least because the DSCP bits are not guaranteed to be stable end-to-end; instead they are free for everybody to use and (re-)set whenever they please. That said, on the egress side you (potentially) have full control over which applications use which markings (I believe in Windows that can be set with a group policy, so it might not need to be configured explicitly on each machine, but I have zero actual experience myself) and over how the AQM interprets those... (Okay, by using cake you will need to make your applications use those DSCP markings that cake actually handles...)
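For the record (a pointer only, no personal experience here either): in the Group Policy editor this should live under Computer Configuration -> Windows Settings -> Policy-based QoS, where rules can match applications and assign a DSCP value per rule.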

Best Regards