Nftables: how to clear ecn bit

Hi!
Does someone know how to clear the ecn bit with nftables?
Thanks.

Is there some requirement to do this via the firewall?

Because this is usually done via sysctl:

sysctl -w net.ipv4.tcp_ecn=0

I think this setting only affects local connections of the router itself.
Because I have it set to 0 and ECN enabled on one of my devices in the network (for testing) and I can see a bunch of marked packets in the cake status dump.
Also, as a side note, the tcp_ecn firewall setting doesn't work with fw4.

Can ECN even work this way on a router/firewall?
ECN signals the sender to throttle down.
On a common setup cake is running on the wan interface.

  • LAN Device (Sending) > Router with cake/fq_codel on wan interface > Receiver.
  • egress gets congested
  • cake starts to mark the packets with ecn (ece, Echo of Congestion Encountered?)
    The receiver now gets the ECN/ECE packets but actually can't do anything because it is not sending?
    Or am I mistaken here?
    So it would be better to clear the ecn bit on the LAN side and just drop the packets on the wan side...?

Yes, the ECN bits are part of the IP headers of packets traversing your router. The sysctl only affects whether locally terminating TCP connections will try to negotiate ECN or not...

That signal is called CE (Congestion Experienced) and is carried in the IP header...

Well, that receiver now echoes that CE information back to the sender as the ECE flag in the TCP header of the reverse ACK packets; ECE is short for ECN-Echo. The receiver will keep sending ECE until it receives a data packet from the sender carrying the CWR flag (Congestion Window Reduced), which tells the receiver that the sender got the "message" and reacted to it. One consequence of this design is that rfc3168 style ECN will only be able to react to a single CE mark per RTT.
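
To make the two-bit IP field concrete, here is a small shell sketch that splits a DS field byte (IPv4 TOS / IPv6 traffic class) into its DSCP and its RFC 3168 ECN codepoint. The helper name `split_dsfield` is made up for illustration.

```shell
# Split a DS field byte into its 6-bit DSCP and 2-bit ECN codepoint.
# split_dsfield is a hypothetical helper name, not part of any tool.
split_dsfield() {
    dscp=$(( $1 >> 2 ))
    ecn=$(( $1 & 0x3 ))
    case $ecn in
        0) name="not-ECT" ;;
        1) name="ECT(1)" ;;
        2) name="ECT(0)" ;;
        3) name="CE" ;;
    esac
    echo "dscp=$dscp ecn=$ecn ($name)"
}

split_dsfield 0x02   # ECT(0): the sender declares it will react to CE
split_dsfield 0x03   # CE: an AQM on the path marked the packet
```

ECE and CWR are not in this byte at all; they are TCP flags, which is why the IP field and the TCP header have to be looked at separately.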

This really depends on why you want to clear the ECN bits in the first place...

Thanks moeller0,
this somewhat makes sense.
I want to clear it because with ECN latency skyrockets.
And on the egress path, doesn't it make more sense to instantly drop packets from the queue?
As you described, one CE mark per RTT...

That should not be the case. TCP is supposed to react to a CE mark just as it would react to a dropped packet (actually since dropped packets can only be deduced post-hoc, after getting >= 2 ACKs with the same sequence number, the CE signal should be slightly faster).
What kind of traffic do you have that does not react correctly to the CE marks?

That is what we did in pre-cake SQM as default, e.g. in simple.qos/simplest.qos. My rationale was that egress often is very narrow and it can make a noticeable difference which packet we send. Cake however always uses ECN (if a packet is deemed actionable, cake will send a CE mark for ECT(0)/ECT(1) marked packets and drop NOT-ECT packets in accordance with RFC3168). On my nominal 100/40 link I never ran into problems with cake (and I try to enable ECN negotiation in my end points).

Yes, but a drop has the same signaling frequency: it also takes at least a full RTT for the result of a drop to manifest at the place that dropped the packet...
But really for rfc3168 compliant flows a drop and a CE mark will elicit an identical response, a reduction in the congestion window and hence an effective reduction in sending rate.
So could I convince you to get a packet capture to figure out which traffic uses ECT(0) or ECT(1) while not responding properly to CE marks?
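
A capture along those lines could be sketched as follows. This assumes tcpdump is installed on the router and uses `eth2`, the WAN device named elsewhere in this thread; the wrapper name `capture_ecn` and the output path are placeholders, and the filters only match IPv4.

```shell
# pcap-filter expressions (IPv4 only). Byte 1 of the IP header is the DS field;
# byte 13 of the TCP header holds the flags, with CWR=0x80 and ECE=0x40.
ECT_OR_CE='ip and (ip[1] & 0x3 != 0)'       # ECN-capable or CE-marked packets
CE_ONLY='ip and (ip[1] & 0x3 = 3)'          # packets that carry a CE mark
ECE_OR_CWR='tcp and (tcp[13] & 0xc0 != 0)'  # TCP segments with ECE or CWR set

# capture_ecn is a hypothetical wrapper; run it as root on the router.
capture_ecn() {  # usage: capture_ecn IFACE FILTER OUTFILE
    tcpdump -i "$1" -n -w "$3" "$2"
}

# Example: capture_ecn eth2 "$CE_ONLY" /tmp/ce.pcap
```

A flow that keeps receiving CE marks while the reverse direction never shows ECE, or never slows down after ECE, would be the misbehaving traffic in question.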

Normal speedtest, http/s streams.
I mean, there is a possibility that there is a gateway on the path that clears ecn bits?

I can't follow here...
How is it not faster to instantly remove the packet from your own queue than to wait for the other side to acknowledge the congestion?

So here is the thing: once a link is congested and the queue runs full or even over, the only thing that hop can do immediately is drop packets quickly to shed the (over)load. But the real relief comes when the flows traversing that path reduce their sending rates, and for the effect of that rate reduction to reach the overloaded node, be it as a response to a drop or a CE mark, it will take >= 1 RTT.

How do you measure latency in that case?

speedtest.net gives a latency figure.
So, how to clear the ecn bits with nftables, is it even possible?

Ah, so if you disable ECN on the host machine you run the speedtest from, that would be the quickest way to test how/if ECN degrades latency...

I am not sure whether nftables allows that; even for iptables this mainly worked via interpreting the whole 6-bit DSCP + 2-bit ECN bitfields as the legacy 8-bit TOS bitmap... I think that using tc should allow manipulating any arbitrary bit...

Mmmh have a look at:
https://man7.org/linux/man-pages/man8/tc-pedit.8.html

" To rewrite just part of a field, use the retain directive. E.g.
to overwrite the DSCP part of a dsfield with $DSCP, without
touching ECN:

    tc filter add dev eth0 ingress flower ... \
        action pedit ex munge ip dsfield set $((DSCP << 2)) retain 0xfc

And vice versa, to set ECN to e.g. 1 without impacting DSCP:

    tc filter add dev eth0 ingress flower ... \
        action pedit ex munge ip dsfield set 1 retain 0x3"

This might be the closest we get to on-router ECN manipulations; however, this is clearly NOT using nftables.
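
The effect of `retain` on the 8-bit dsfield can be modelled with plain shell arithmetic. This is a sketch inferred from the two man-page examples quoted above (only the bits selected by the mask are replaced), not taken from the tc source; `pedit_retain` is a made-up name.

```shell
# Model of "set VAL retain MASK" on an 8-bit dsfield, as implied by the
# quoted man-page examples (an assumption, not derived from tc itself):
# bits inside MASK come from VAL, bits outside MASK are left untouched.
pedit_retain() {  # usage: pedit_retain OLD VAL MASK
    echo $(( ($1 & ~$3 & 0xff) | ($2 & $3) ))
}

# DSCP 46 (EF) with a CE mark is 0xbb; zeroing the ECN bits leaves 0xb8.
pedit_retain 0xbb 0 0x3
```

By analogy with the second man-page example, `set 0 retain 0x3` would be the form that zeroes the ECN bits while leaving DSCP alone; that variant is untested here.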

I don't know if it's what you're looking for, but you can take a look at the nftables raw payload expression.

Looking further in the man page under IPV4 HEADER EXPRESSION, it looks like ecn is defined, so maybe something like ... ip ecn set 0 would work.

Thanks for your suggestions.
I tried it with nftables before, but with tcp ecn set 0, because the site you linked lists ecn under tcp flags, and I couldn't get it working.
It always throws an error at the ecn part.
iptables was much easier :smiley:

// edit
hmm
ecn is also listed under ipv4/6 headers...

//edit2
something like this?
nft add rule inet fw4 mangle_postrouting oifname "eth2" ip ecn != 0 ip ecn set 0
nft add rule inet fw4 mangle_postrouting oifname "eth2" ip6 ecn != 0 ip6 ecn set 0

//edit3
I can't get this working...
I tried various different chains but cake still shows marked packets.
nftables output shows that packets hit this rule but doesn't seem to do anything :expressionless:

//edit4
I'm dumb, I should have added an ip6 rule too!! :rofl:
updated rules in my second edit, so I can have ECN enabled for the local network + local network to the router itself but disable ECN on outgoing connections...
And set the ECN mode on the router back to 2.
Thanks @boredhominid, for the hint to use ip instead of tcp!

Note though that for ingressing packets, nftables will see the packets after a qdisc on a wan-ifb already did its thing... so at worst, assuming this actually affects ingress traffic at all, this can convert a CE signal into not-ECT, which is pretty undesirable, as cake would have dropped that packet if it had been marked not-ECT...

Since I am still on OpenWrt 21 and hence on iptables: how does one integrate your command into fw4?
Do you just manually issue this from the command line, and if so, will this survive a firewall restart, or did you integrate that into a file that is automatically picked up by fw4?

(As you can see I am an unfrozen caveman in regards to nftables, but I realize that this is inevitable)

I think this will not work for inbound traffic.
But since ECN has to be negotiated by both sides, it should still work?

I created a file containing the following:

oifname "eth2" ip ecn != not-ect ip ecn set not-ect counter
oifname "eth2" ip6 ecn != not-ect ip6 ecn set not-ect counter

In /etc/config/firewall:

config include
	option type 'nftables'
	option path '/etc/firewall/clear-ecn.nft'
	option position 'chain-pre'
	option chain 'mangle_postrouting'

A side note/off topic:
This nftables include thing doesn't work with rules that contain meters.
It will throw an error on reload (resource busy) and eventually break everything firewall related.
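
To check that the include above is actually loaded and matching, something like the following can be run on the router. It is a sketch that assumes the WAN device is `eth2` as in the rules above and that cake is attached to it; `show_ecn_counters` is a made-up helper name.

```shell
IFACE=eth2   # WAN device used in this thread; adjust to your setup

# show_ecn_counters is a hypothetical helper; run it as root on the router.
show_ecn_counters() {
    # The "counter" keyword makes nft print packets/bytes for each rule.
    nft list chain inet fw4 mangle_postrouting
    # cake statistics, including the "marks" row for each tin.
    tc -s qdisc show dev "$IFACE"
}

# After saving the include and the config section:
#   /etc/init.d/firewall restart && show_ecn_counters
```

Because the rules live in a file referenced from the firewall config, they are re-applied on every firewall restart rather than having to be re-issued by hand.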

No, the negotiation happens "inside" the TCP header and the IP ECN field is left at 00 until the negotiation has succeeded; only after that will the packets be marked ECT(0)/ECT(1) to indicate responsiveness to the CE signal. See RFC3168 for the details. So forcing the ECN bitfield to 0 does not interfere with rfc3168 style ECN negotiation for TCP...

Thanks for the detailed instructions, this will come in handy once the L4S crap sees more usage (assuming it will actually see much usage; it works badly enough that I don't see much utility).

As the unfrozen caveman I am I need to ask: does the counter keyword in your instructions above act as a meter in that sense, and if yes is the remedy simply to either:
a) not restart the firewall (if that is actually possible)
b) omit the counter keyword

Wikipedia states that bits 01 and 10 indicate an ECN-capable stream.
When there is congestion, the bits are set to 11/CE.
And the rest is handled by TCP.
So as long as both sides don't agree on ECN and setting CE bits, it should work?
Assume there is an incoming connection with the 01/10 ECN bits set.
The router or a host from the internal network also has ECN enabled and sets bits 01/10.
They get nulled by the router when leaving.
So the host who initiated the incoming connection thinks ECN is not supported and doesn't echo back via TCP.
Or am I mistaken here?

The counter keyword should work fine.
Meters are like sets, but they don't need to be explicitly defined (they are created "on-the-fly").
I think they don't get released properly on an fw4 reload, and that's why fw4 throws an error
(because the set/meter already exists).

My point is that the negotiation that ECN is to be used happens within the TCP header, without using/evaluating the IP ECN bitfield (actually during negotiation it is supposed to be set to not-ECT). So your nftables rule will not interfere with ECN negotiation.
So both endpoints can negotiate ECN usage. Now assume typical download traffic, where the upload traffic is essentially pure ACK packets. These will also be sent as not-ECT (see rfc3168; the rationale is that legacy TCP will not reduce the ACK rate, so setting CE on ACK packets will not help reduce the load, and the congested node can and should drop the packet instead to get some relief). Our download packets are using ECT(0) because the endpoints negotiated ECN. Now cake on ifb4wan needs to act on a packet from our ECN flow: since the packet is still ECT(0), it goes and changes that to CE. But then this CE-marked packet gets into nftables' hands and the CE gets reset to not-ECT, so the receiver will not know that there was congestion along the path and hence will not be able to tell the sender, which will keep increasing its congestion window and send more and more instead of slowing down... Eventually cake's BLUE component will take over and drop from that hash bin, but for the affected flow life is miserable: cake still limits its rate into your network, but due to the cleared CE marks the sender keeps pushing, so that flow's queue will get longer and longer and the delay for that flow will increase with that queue.

So the problem here is the potential for silent removal of congestion signals... and IMHO it is a bit unfortunate that for cake we did not make ECN usage optional/togglable...

If TCP doesn't make use of the IP ECN header, why does this rule work then?
I get what you are explaining..
But I'm a bit sleepy and will have a deeper look later.
And maybe figure out a way to clear the ecn bits with nftables.