Ultimate SQM settings: Layer_cake + DSCP marks

dlakelan · December 6, 2018, 5:20pm

It might, but I believe it would also add a bunch of CPU overhead compared to just shoving it into a separate namespace.

hisham2630 · December 6, 2018, 5:24pm

I have another idea, but I'm not sure if it's right or will work properly. The idea is:
1. Create a new bridge; let's call it br-all.
2. Add the other br-vlan bridges to br-all.
3. Use ebtables to prevent the br-vlans from talking to each other.
4. Add veth1 to br-all.
5. Add veth0 to a new routing table, then use SQM on veth0.
Can you kindly post the commands you used to create the second namespace?

dlakelan · December 6, 2018, 5:26pm

I think this could also work, but it becomes much more of a security issue, and it will also kinda break the structure of the OpenWrt firewall.

For most people, I think you just create the bridge between the WAN Ethernet device and veth0 and off you go: no namespace needed and no strange ebtables manipulation. The exception is when a lot of services (squid or something similar) run on the router itself; in that case, just shove the prewan bridge into a separate namespace and it works. The namespace part is only a few commands, maybe 5 or 6: set up the namespace, move the devices into it, create the bridge in the other netns, and load the DSCP tagging rules.
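A minimal sketch of those 5 or 6 steps as a POSIX shell script, following the layout described in this thread (veth0 bridged with the WAN device inside a prewan namespace, veth1 left in the main namespace as the new WAN). The WAN device name eth1, the bridge name br-prewan and the ruleset path /etc/prewan-retag.nft are hypothetical placeholders. It assumes a kernel with network namespace support and the nft binary, and it defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: bridge the physical WAN device with veth0 inside a "prewan"
# network namespace; veth1 stays in the main namespace and acts as the WAN.
# WAN_DEV, BRIDGE and NFT_RULES are hypothetical placeholders.
set -e

NS=${NS:-prewan}
WAN_DEV=${WAN_DEV:-eth1}                        # hypothetical WAN device
BRIDGE=${BRIDGE:-br-prewan}
NFT_RULES=${NFT_RULES:-/etc/prewan-retag.nft}   # hypothetical ruleset file
DRY_RUN=${DRY_RUN:-1}                           # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_prewan() {
    # 1. Create the namespace and the veth pair.
    run ip netns add "$NS"
    run ip link add veth0 type veth peer name veth1
    # 2. Move the WAN device and veth0 into the namespace.
    run ip link set "$WAN_DEV" netns "$NS"
    run ip link set veth0 netns "$NS"
    # 3. Bridge them inside the namespace and bring everything up.
    run ip netns exec "$NS" ip link add "$BRIDGE" type bridge
    run ip netns exec "$NS" ip link set "$WAN_DEV" master "$BRIDGE"
    run ip netns exec "$NS" ip link set veth0 master "$BRIDGE"
    for dev in "$WAN_DEV" veth0 "$BRIDGE"; do
        run ip netns exec "$NS" ip link set "$dev" up
    done
    run ip link set veth1 up
    # 4. Load the DSCP tagging rules into the namespace's own nftables.
    run ip netns exec "$NS" nft -f "$NFT_RULES"
}

setup_prewan
```

Once the printed commands look right, it could be run as root with DRY_RUN=0; the WAN connection itself (DHCP, PPPoE, etc.) is then configured on veth1 in the main namespace.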

hisham2630 · December 6, 2018, 5:30pm

So if someday I create a guest network, is it enough to just bridge the WAN Ethernet device and veth0,
then establish the WAN connection on veth1?
After this, do I use SQM on veth0 and veth1? Is this correct?
EDIT:
Do you bridge the second namespace to the machine's original default namespace?

dlakelan · December 6, 2018, 5:34pm

Yes, because all the traffic will go through veth0-veth1 and then be distributed out to your guests or LAN or whatever, so it will all get prioritized.

Of course you want to set your DSCP values in the firewall too! I think that requires bridge netfilter to be enabled, plus a rule that allows forwarding packets from the prewan firewall zone to the prewan firewall zone. So you'll want to put br-prewan into its own prewan firewall zone and set your firewall to allow forwarding from prewan to prewan and from prewan to wan. The prewan-to-prewan rule needs to be its own traffic rule, because normally that kind of forwarding doesn't happen, so the firewall doesn't account for it.
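A sketch of that firewall layout for OpenWrt's fw3, written as a shell function that prints `uci batch` commands so they can be reviewed before applying. It assumes a logical network interface named prewan that covers br-prewan (hypothetical), and that bridged traffic is handed to iptables through the br_netfilter module; the zone policies chosen here are illustrative, not taken from the thread.

```shell
#!/bin/sh
# Sketch: print uci batch commands for a "prewan" firewall zone with
# prewan->prewan and prewan->wan forwarding.
# The network name (default "prewan") is a hypothetical interface for br-prewan.

prewan_fw_batch() {
    net=${1:-prewan}
    cat <<EOF
add firewall zone
set firewall.@zone[-1].name='prewan'
add_list firewall.@zone[-1].network='$net'
set firewall.@zone[-1].input='REJECT'
set firewall.@zone[-1].output='ACCEPT'
set firewall.@zone[-1].forward='REJECT'
add firewall forwarding
set firewall.@forwarding[-1].src='prewan'
set firewall.@forwarding[-1].dest='wan'
add firewall rule
set firewall.@rule[-1].name='Allow-prewan-to-prewan'
set firewall.@rule[-1].src='prewan'
set firewall.@rule[-1].dest='prewan'
set firewall.@rule[-1].target='ACCEPT'
commit firewall
EOF
}

prewan_fw_batch
```

On the router this might be applied with `prewan_fw_batch | uci batch` followed by `/etc/init.d/firewall restart`, after loading br_netfilter and enabling `net.bridge.bridge-nf-call-iptables` (and `net.bridge.bridge-nf-call-ip6tables`) via sysctl.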

dlakelan · December 6, 2018, 5:35pm

In my config, which requires the namespace, I put veth0 into the prewan namespace and leave veth1 in the main namespace. veth1 then appears to be my WAN.

hisham2630 · December 6, 2018, 5:39pm

But strangely, when I run:

ip netns add prewan

it shows this error:
Failed to create a new network namespace "prewan": Invalid argument

dlakelan · December 6, 2018, 5:41pm

Hmm, I'm doing this stuff on a Debian server; on OpenWrt I only tried the bridge without the netns. Perhaps some kernel modules need to be installed?

hisham2630 · December 6, 2018, 5:44pm

I have a lot of kmods installed, so I'm not sure what is needed for this to work!

ip netns help
Usage: ip netns list
       ip netns add NAME
       ip netns set NAME NETNSID
       ip [-all] netns delete [NAME]
       ip netns identify [PID]
       ip netns pids NAME
       ip [-all] netns exec [NAME] cmd ...
       ip netns monitor
       ip netns list-id

I think the problem lies here; these options need to be enabled in the build:

CONFIG_KERNEL_NAMESPACES=y
CONFIG_KERNEL_NET_NS=y

So at the moment we can't use network namespaces!?
BTW: you can see interesting things if you run:
bridge fdb
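A small sketch for checking both sides of this before running `ip netns add`: whether the running kernel has network namespace support (on Linux, /proc/self/ns/net only exists when the kernel was built with CONFIG_NET_NS), and whether an OpenWrt build .config enables the two symbols quoted above. The function names are hypothetical, and the paths are parameters so they can be pointed elsewhere.

```shell
#!/bin/sh
# Sketch: detect network namespace support before running "ip netns add".
# Function names are hypothetical helpers, not OpenWrt tools.

# Succeeds if the running kernel exposes a network namespace handle.
# /proc/self/ns/net is only present when the kernel has CONFIG_NET_NS.
kernel_has_netns() {
    [ -e "${1:-/proc/self/ns/net}" ]
}

# Succeeds if an OpenWrt build .config enables both namespace symbols.
buildroot_has_netns() {
    cfg=${1:-.config}
    grep -qx 'CONFIG_KERNEL_NAMESPACES=y' "$cfg" &&
        grep -qx 'CONFIG_KERNEL_NET_NS=y' "$cfg"
}

if kernel_has_netns; then
    echo "running kernel supports network namespaces"
else
    echo "no netns support: rebuild with CONFIG_KERNEL_NAMESPACES=y and CONFIG_KERNEL_NET_NS=y" >&2
fi
```

The "Invalid argument" error above is consistent with the first check failing, since the kernel rejects a request for a new network namespace it was not built to support.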

dlakelan · December 6, 2018, 7:09pm

Further testing shows that when I put iptables/ip6tables into my prewan netns and just use the mangle table to adjust DSCP, things don't work at all: I get highly erratic speeds, with upload eventually dying out entirely due to massive numbers of TCP resets...

I'm not sure what goes on there. For the moment I get fabulous results without the DSCP tagging, and of course I am still reordering packets because I have a shaper on the LAN, so for now I am not using iptables to DSCP-tag in the prewan namespace. So, you might ask, why use the prewan setup at all? I definitely do get better results with the prewan shaper, so I think the DSCP problem comes down to some quirks of my particular setup, which is in fact very quirky.

I'll open a new thread on this once I've done debugging and can report a good working config.

Here is a speed test with the prewan namespace but no iptables operating in it. It has shapers on veth0, veth1, and the LAN.

As you can see, pretty darn good! This definitely requires the full power of an x86 box at these speeds.

Also, this is about the best I've ever gotten in my setup. Clearly the veth method is pretty good, even if I can't figure out how to do the DSCP tagging with bridge netfilter. The DSCP tagging does happen during routing in the main namespace, and a shaper on the LAN output then prioritizes things properly. My guess is that the prewan shaper somehow does a better job of modeling the WAN connection than the LAN shaper does, because my LAN is faster than my WAN.

Anyway, I'll take it for the moment.

hisham2630 · December 7, 2018, 7:02pm

@dlakelan, would macvlan help us with multiple br-vlans without using a namespace, or in your case?

dlakelan · December 8, 2018, 4:20am

Don't think so. I've been working on nftables all day, changed my whole firewall, now going to look at nftables ingress filters.

moeller0 · December 8, 2018, 10:01am

BTW, if you try nftables, have a look at the new nft-qos package, which promises per-IP/subnet throttling via a LuCI interface. It might be a good starting point for a DSCP re-marker app, if instead of using the rate limiter one optionally rewrites the DSCPs. Note that I have not yet tested how it behaves with regard to latency under load/bufferbloat, since I have not managed to install it yet (I would need to compile firmware from source and lack the disk space to do so).

hisham2630 · December 8, 2018, 12:38pm

@dlakelan, good luck with configuring nft.
But what is the real benefit of nft: simplicity, fewer rules?

But will this force you to use tc? In that case, can't we use the SQM diffserv classes?!
@moeller0, but how would we tag packets with nft-qos?
Are there any classes or prioritization?
Does this mean we will lose SQM's bufferbloat mitigation, or should we make SQM work with nft instead of iptables?
For DSCP, would we use something like this?

ip forward ip dscp set 42

Also, are there any attempts to base the OpenWrt firewall on nft instead of iptables, or any new package
like luci-app-nftfirewall?

dlakelan · December 8, 2018, 2:02pm

I run a Debian router with FireQOS. Converting to nftables made my config much simpler and easier to understand, and I suspect faster too: FireQOS had hundreds of rules in tens of chains, while my nftables config is around 20 lines of code.

When I tried to use iptables to DSCP-tag in my prewan namespace, my speed dropped from 750 megabits to about a megabit and became erratic. That motivated me to look into it.

I run OpenWrt on my access points, which have lighter duty.

nftables lets you select classes by setting the packet priority (meta priority), so it can be used instead of tc, which is a big benefit because tc is terrible to use. I think it means you can select classes at the same time as you tag DSCP.

hisham2630 · December 8, 2018, 2:15pm

Too funny, that's much smaller and will take up less disk space.

Haha, really strange. So, use nft to tag packets with DSCP.

It will make things easier; I think this will allow SQM to run on nft instead of tc!
tc is poorly documented and hard to understand; I still can't understand what priomap does or how to configure it.

Emtee · December 10, 2018, 3:35pm

Hi guys! Everyone's still nicely tweaking QoS and all, I see.

I have a question. I'm running at a conservative 15/15 Mbit right now, as my 4G connection randomly drops below 20 once in a while. But sometimes I just want to go to the max of ~40 Mbit for a quick download. I was thinking of using a desktop shortcut that uses SSH or something to send a command to my router.

I assume I have to use a command to reset the bandwidth on the veth0 SQM instance and restart SQM, but I'm not sure how to do it. Would I need to use the tc command? Another solution would be to swap the config file and then restart through init.d, but I'm not sure what the best approach is.

Thanks

Another weird thing: when I set it less conservatively, the WAN connection sometimes disconnects for a second or two under full load. The only thing I found on the web is about LCP echo starvation on PPPoE DSL connections, but I don't have such a connection. Could it be because of link layer adaptation?

dlakelan · December 10, 2018, 4:31pm

Ahh the smell of fresh QoS in the morning...

So, coming back from some nftables tweaking: I've changed my firewall from FireQOS using iptables to a custom nftables firewall. This seems to have improved my throughput a little, in that with the same bandwidth settings CPU idle only drops to about 25% instead of 5-10%. In the prewan namespace, where I've shoved the inbound Ethernet VLAN device and veth0 into a bridge, here is what I am loading into nftables to do the tagging (IPv4/IPv6 addresses and Ethernet device names are obscured because they're not important to the overall point):

table netdev retag {
    chain tagin {
          ## mangle priority
          type filter hook ingress device <MYEthDevHere> priority -149; policy accept;

          ip dscp set cs2 ## convert all to cs2 first
          ip6 dscp set cs2
          
          ## VOIP servers UDP traffic
          ip saddr {x.x.x.x, y.y.y.y} ip protocol udp ip dscp set cs6
          ip daddr {x.x.x.x, y.y.y.y} ip protocol udp ip dscp set cs6
          
          ip6 saddr {aaaa:aaaa::aaaa, bbbb:bbbb::bbbb} ip6 nexthdr udp ip6 dscp set cs6
          ip6 daddr {aaaa:aaaa::aaaa, bbbb:bbbb::bbbb} ip6 nexthdr udp ip6 dscp set cs6

          ## icmp/icmpv6 matches game traffic priority
          ip protocol icmp ip dscp set cs5
          ip6 nexthdr icmpv6 ip6 dscp set cs5 

          ## game traffic on ip and ipv6
          udp dport {7000-9000, 27000-27200} ip dscp set cs5
          udp sport {7000-9000, 27000-27200} ip dscp set cs5

          ip6 nexthdr udp udp dport {7000-9000, 27000-27200} ip6 dscp set cs5
          ip6 nexthdr udp udp sport {7000-9000, 27000-27200} ip6 dscp set cs5
          
          ## large transfers over 3 seconds long at 100Mbps (37.5
          ## MBytes) get deprioritized this doesn't work yet because
          ## conntrack during ingress isn't a thing, we do this later
          ## in the main namespace
          
#         ip protocol tcp ct bytes ge 37500000 ip dscp set cs1
#         ip6 nexthdr tcp ct bytes ge 37500000 ip6 dscp set cs1

    }
}

Connection tracking isn't available at the netdev/ingress hook, so I commented out the last two rules. I do use a rule like that in the main namespace, where I'm doing regular forwarding/routing; it further delays long-running transfers. It occurs to me that I should do the cs1 tagging before the VOIP-specific tagging: some calls might last long enough to be down-prioritized, and with the cs1 rule first, the later VOIP rule overrides the cs1 mark. So I just changed that in the main namespace.
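The 37500000 in the commented-out rules is just rate × duration ÷ 8 (100 Mbit/s × 3 s = 300 Mbit = 37.5 MB). A small sketch that recomputes the threshold for other link speeds and prints the cs1 demotion rules in the same form as above, for use at a hook where conntrack exists; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: byte threshold for "transfers longer than N seconds at R Mbit/s".
# bytes = R * 1,000,000 bit/s * N s / 8 bit/byte
# Function names are hypothetical helpers.

ct_bytes_threshold() {
    rate_mbit=$1
    seconds=$2
    echo $(( rate_mbit * 1000000 * seconds / 8 ))
}

# Print the cs1 demotion rules (same form as the commented-out ones above).
cs1_rules() {
    n=$(ct_bytes_threshold "$1" "$2")
    echo "ip protocol tcp ct bytes ge $n ip dscp set cs1"
    echo "ip6 nexthdr tcp ct bytes ge $n ip6 dscp set cs1"
}

cs1_rules 100 3
```

For example, at the 750 Mbit/s mentioned earlier, a 3-second transfer works out to 281250000 bytes. Per the reasoning above, these rules would go before the VOIP rules so the VOIP marks win.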

So, the upshot is that shoving the inbound device into a prewan namespace and bridging it to a veth works well. I get very good results, with an A+ rating at 700+ megabits using a J1900 motherboard as the router. Before doing this I was getting an A rating and had occasional latency spikes under high load. That was partly because I run a proxy on the router, so packets coming into the proxy were unqueued, and partly because the LAN download was set higher than the WAN, due to bonded interfaces and the fact that the box has other functions I didn't want to throttle.

Now, as for @Emtee and variable speeds: there is no great way to handle variable speeds automatically from outside the qdisc. It's definitely possible for the qdisc itself to have an algorithm that "learns" the bandwidth, constantly probing for more until it hits latency problems and then backing off, but that requires a level of sophistication that would cause a lot of computational overhead, and basically it isn't available in any qdisc right now. The biggest issue is: how does the qdisc measure the increase in round-trip latency directly? It's not like packets arrive with a timestamp included, so you have to do some kind of indirect inference, like looking at the time between TCP segments coming in on your interface and the ACKs going back out that interface, or something like that, and it's not obviously easy. Conceivably someone could create a latency-monitoring conntrack module that provides current latency estimates to sophisticated qdiscs, but for the moment, no.

So, how can you take advantage of your bandwidth more fully? It's a big drop from 40 to 15 Mbps! For simplicity, I think your best bet is to swap config files: create two SQM config files that you like, one for normal use and one for fast downloads, then copy whichever one you want into the /etc/config/ directory and restart SQM. It's quick and dirty, but it's easy.
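A quick-and-dirty sketch of that config swap as a shell function that could be triggered over SSH from a desktop shortcut. It assumes two pre-made copies of /etc/config/sqm stored as /etc/sqm-profiles/sqm.normal and /etc/sqm-profiles/sqm.fast (hypothetical directory and names), and that SQM is restarted with `/etc/init.d/sqm restart`; the directories and restart command are overridable variables.

```shell
#!/bin/sh
# Sketch: switch SQM between pre-made config profiles and restart it.
# SQM_PROFILE_DIR and the sqm.<profile> file names are hypothetical.

SQM_CONF_DIR=${SQM_CONF_DIR:-/etc/config}
SQM_PROFILE_DIR=${SQM_PROFILE_DIR:-/etc/sqm-profiles}
SQM_RESTART="${SQM_RESTART:-/etc/init.d/sqm restart}"

sqm_use_profile() {
    src="$SQM_PROFILE_DIR/sqm.$1"
    if [ ! -f "$src" ]; then
        echo "no such SQM profile: $src" >&2
        return 1
    fi
    # Replace the live config, then restart so the new rates take effect.
    cp "$src" "$SQM_CONF_DIR/sqm" || return 1
    $SQM_RESTART
}

# Only act when run as a script with a profile name, e.g. "fast".
if [ $# -ge 1 ]; then
    sqm_use_profile "$1"
fi
```

A desktop shortcut could then run something like `ssh root@<router> /root/sqm-profile.sh fast` (script path hypothetical). An alternative without separate files is to change the `download`/`upload` options (in kbit/s) of the queue section in /etc/config/sqm with uci and then restart SQM.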

Emtee · December 10, 2018, 5:00pm

Yeah, I think for my latency-sensitive gaming needs it would be a bad idea to try to 'compensate' with a weird algorithm. I will probably use the config swap method for the time being.

One thing I'm still unsure of when writing my iptables DSCP-setting rules: I'm using diffserv4 now, so layer_cake should have 4 tins, but with values ranging from EF to CS0 to CS6, how would I know which values go into which tin? Or does cake change this adaptively?

I'm currently using EF for highest priority, CS6 for right below that, CS3 for mid-priority, CS1 for bulk and CS0 for undefined/best effort.

Other than that I just hope the prioritization works 'right' at this point.

moeller0 · December 10, 2018, 5:13pm

Here is the relevant section from sch_cake.c:

static int cake_config_diffserv4(struct Qdisc *sch)
{
/*  Further pruned list of traffic classes for four-class system:
 *                                   
 *          Latency Sensitive  (CS7, CS6, EF, VA, CS5, CS4)
 *          Streaming Media    (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
 *          Best Effort        (CS0, AF1x, TOS2, and those not specified)
 *          Background Traffic (CS1)
 *
 *              Total 4 traffic classes.
 */

And here is the mapping for the 64 DiffServ codepoints (the array index is the DSCP value, and the entry is the tin that codepoint is placed in):

static const u8 diffserv4[] = {
        0, 2, 0, 0, 2, 0, 0, 0,
        1, 0, 0, 0, 0, 0, 0, 0,
        2, 0, 2, 0, 2, 0, 2, 0,
        2, 0, 2, 0, 2, 0, 2, 0,
        3, 0, 2, 0, 2, 0, 2, 0,
        3, 0, 0, 0, 3, 0, 3, 0,
        3, 0, 0, 0, 0, 0, 0, 0,
        3, 0, 0, 0, 0, 0, 0, 0,
};
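To answer the "which values go into what" question concretely, here is a sketch that does the same lookup cake does with this table (tin = diffserv4[dscp]) in shell, plus a helper that converts the DSCP names used in this thread to numeric codepoints (CSx = 8·x, AFxy = 8·x + 2·y, EF = 46, VA = 44). The tin names follow the comment in sch_cake.c; the helper functions are hypothetical, not part of cake or tc.

```shell
#!/bin/sh
# Sketch: reproduce cake's diffserv4 lookup (tin = diffserv4[dscp]) in shell.
# Rows are copied from the diffserv4[] table above, 8 codepoints per row.
DIFFSERV4="0 2 0 0 2 0 0 0"
DIFFSERV4="$DIFFSERV4 1 0 0 0 0 0 0 0"
DIFFSERV4="$DIFFSERV4 2 0 2 0 2 0 2 0"
DIFFSERV4="$DIFFSERV4 2 0 2 0 2 0 2 0"
DIFFSERV4="$DIFFSERV4 3 0 2 0 2 0 2 0"
DIFFSERV4="$DIFFSERV4 3 0 0 0 3 0 3 0"
DIFFSERV4="$DIFFSERV4 3 0 0 0 0 0 0 0"
DIFFSERV4="$DIFFSERV4 3 0 0 0 0 0 0 0"

# Convert a DSCP name (csX, afXY, ef, va) to its numeric codepoint.
dscp_value() {
    case $1 in
        cs[0-7]) echo $(( ${1#cs} * 8 )) ;;
        af[1-4][1-3])
            xy=${1#af}
            echo $(( 8 * ${xy%?} + 2 * ${xy#?} )) ;;
        ef) echo 46 ;;
        va) echo 44 ;;
        *) return 1 ;;
    esac
}

# Look up the diffserv4 tin for a numeric DSCP value (0-63, no leading zeros).
dscp_to_tin() {
    case $1 in ''|*[!0-9]*|0?*) return 1 ;; esac
    [ "$1" -le 63 ] || return 1
    printf '%s\n' "$DIFFSERV4" | cut -d' ' -f"$(( $1 + 1 ))"
}

# Human-readable tin names, matching the classes in the sch_cake.c comment.
tin_name() {
    case $1 in
        0) echo "Best Effort" ;;
        1) echo "Background Traffic" ;;
        2) echo "Streaming Media" ;;
        3) echo "Latency Sensitive" ;;
    esac
}

# The marks listed by Emtee.
for mark in ef cs6 cs3 cs1 cs0; do
    v=$(dscp_value "$mark")
    t=$(dscp_to_tin "$v")
    echo "$mark ($v) -> tin $t: $(tin_name "$t")"
done
```

Running it shows that EF and CS6 both land in tin 3 (Latency Sensitive), CS3 in tin 2, CS1 in tin 1 and CS0 in tin 0. So in diffserv4, cake does not rank EF above CS6; they share one tin, and the mapping is fixed by the table rather than adapted at runtime.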