Effect of "Set tcp_ecn to off" on ECN

Based on recommendations from bufferbloat.net to enable ECN, I have been trying to figure out which devices on my network support it and ultimately enable it where possible.

Here's some background on my environment:

  • I have enabled ECN (via sysctl -w net.ipv4.tcp_ecn=1 in my rc.local; see the snippet after this list) on my OpenWrt x86 router and my OpenWrt AP (R7800). These are two separate devices, obviously.
  • Both devices are running custom builds off the master branch.
  • Only the x86 box is running the firewall service. I do NOT have a firewall installed on my AP build at all.
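
For reference, the relevant part of my /etc/rc.local looks roughly like this (a sketch; anything else in the file is not important here, just make sure the sysctl line runs before exit 0):

# /etc/rc.local -- executed at the end of the boot process
# Request ECN on outgoing TCP connections and accept it on incoming ones
sysctl -w net.ipv4.tcp_ecn=1

exit 0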

A couple questions for the smart people (@dtaht, @tohojo, and team) out there... :wink:

  1. Was it correct to enable ECN in both devices?
  2. On my x86 box, when I reload my firewall I see this in the output:
 * Set tcp_ecn to off
 * Set tcp_syncookies to on
 * Set tcp_window_scaling to on

What is the effect of "Set tcp_ecn to off" in relation to having enabled ECN via sysctl? Are they negating each other? Are they unrelated?

Thanks in advance for helping me and others gain wisdom on this topic!

After the firewall has been started, what value does

sysctl net.ipv4.tcp_ecn

show?

root@OpenWrt:~# sysctl -w net.ipv4.tcp_ecn=1
net.ipv4.tcp_ecn = 1
root@OpenWrt:~# sysctl net.ipv4.tcp_ecn
net.ipv4.tcp_ecn = 1
root@OpenWrt:~# service firewall reload
...
...
 * Set tcp_ecn to off
 * Set tcp_syncookies to on
 * Set tcp_window_scaling to on
 * Running script '/usr/lib/bcp38/run.sh'
 * Running script '/etc/firewall.nat6'
root@OpenWrt:~# sysctl net.ipv4.tcp_ecn
net.ipv4.tcp_ecn = 0

So it does look like the firewall setting is winning.

UPDATE
I feel like a dummy. Found it in the docs. I glazed over it previously because I falsely assumed the sysctl setting would persist.

Is there any downside to enabling it via /etc/config/firewall?

option  tcp_ecn               '1'
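
For anyone else landing here, the option goes into the defaults section of /etc/config/firewall (a sketch of my config, assuming the stock fw3 firewall; the uci commands below are equivalent to editing the file by hand):

config defaults
        option syn_flood        '1'
        option input            'ACCEPT'
        option output           'ACCEPT'
        option forward          'REJECT'
        option tcp_ecn          '1'

root@OpenWrt:~# uci set firewall.@defaults[0].tcp_ecn='1'
root@OpenWrt:~# uci commit firewall
root@OpenWrt:~# service firewall reload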
3 Likes

Doesn't matter where you enable it; the sysctl just takes effect for any TCP flows started after it is set. And yeah, you need it enabled on both sides, since the initiator of the connection is the one who sets the flag...
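
If you want to check whether a given connection actually negotiated ECN, ss can show it (a sketch; run on either endpoint):

root@OpenWrt:~# ss -ti state established
# connections that negotiated ECN show an "ecn" flag in the extended
# TCP info; "ecnseen" means CE-marked packets have actually arrived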

1 Like

Is there a reason why the OpenWrt firewall configuration in 2020 still defaults to disabling ECN? I know that’s probably a loaded question with lots of politics and history behind it. :slight_smile:

No idea. It doesn't matter that much for OpenWrt, though, as that setting only affects TCP connections terminated on the router, not traffic from clients behind it (which, presumably, is most of the traffic) :slight_smile:

3 Likes

Ah, this helps. I had been under the impression that ECN awareness/handling had to be enabled on each hop in order for it to pass successfully end-to-end between a source and destination. Thank you for setting the record straight for me.

So the state of the net.ipv4.tcp_ecn setting on an OpenWrt router has no effect on SQM’s ability to mark, as long as the source and destination endpoints have negotiated ECN. If I’m still misunderstanding, please let me know.

Thanks Toke!

Yup, exactly. The qdiscs have separate switches to turn off ECN marking but those should default to on :slight_smile:
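
For plain fq_codel, for example, the switch looks like this (a sketch; cake, as far as I remember, has no equivalent knob and always marks ECT-capable packets):

root@OpenWrt:~# tc qdisc replace dev eth0 root fq_codel ecn    # mark instead of drop (default)
root@OpenWrt:~# tc qdisc replace dev eth0 root fq_codel noecn  # always drop
root@OpenWrt:~# tc -s qdisc show dev eth0                      # the ecn_mark counter shows marks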

2 Likes

I'm just curious since it seems that the OP (and "team" :smiley:) have researched this...

So I checked my client, and I see this:

user@machine:~$ sysctl net.ipv4.tcp_ecn
net.ipv4.tcp_ecn = 2

What does the value of 2 mean?

Nevermind:

  2 Enable ECN when requested by incoming connections
    but do not request ECN on outgoing connections.
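
For completeness, the neighbouring values from the same kernel documentation (ip-sysctl):

  0 Disable ECN. Neither initiate nor accept ECN.
  1 Enable ECN when requested by incoming connections
    and also request ECN on outgoing connection attempts.

Default: 2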
1 Like

Indeed, so I have been actively setting it to ‘1’ on all my hosts (where possible). :+1:

In SQM scripts, IIRC we default to ECN marking for ingress, but not for egress. The rationale on my side was that for very low-rate asymmetric access links, the upstream is often so narrow that the decision to send even one more packet has a noticeable effect on the latency of all other packets. The rationale for downstream is different: there the packet already got past the real bottleneck, so dropping it now seems a waste of energy/effort. Given that upstream rates are slowly increasing, it might be time to revisit the default choice for the upstream?
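
For reference, these are the relevant knobs in /etc/config/sqm (a sketch from memory; the interface name is a placeholder, and these options matter for the fq_codel-based scripts):

config queue
        option interface        'eth1'
        option script           'simple.qos'
        option ingress_ecn      'ECN'       # default: mark on ingress
        option egress_ecn       'NOECN'     # default: drop on egress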

Thanks for calling that out. I had noticed the same and was curious about it. In the SQM settings, egress ECN defaults to NOECN. I had set it to ECN enabled, thinking that was the right thing to do for my 400x20Mbps (more like 480+x24Mbps before SQM) cable connection.

What is the definition of "low-rate" asymmetric access links that were in play when the SQM scripts were developed?

The problem with ECN is that all devices on the path need to support it.

The tcp_ecn firewall option (values 0-2) overrides the sysctl setting.
So there is no point in modifying this setting via sysctl.

It is better to use no ECN on the egress side, because dropping packets is faster at resolving congestion: the device is in control of its own queue and can directly drop its own packets at the bottleneck.
Also, I'm not sure ECN even works for egress? Doesn't the receiver have to signal the congestion?

//edit
Nvm, the receiver has to receive a CE-marked packet to learn about congestion, and this CE mark can be set by any (ECN-capable) device on the path. The receiver then sends back an ECE-marked ACK to signal the sender that there is congestion somewhere on the path and that it should throttle its sending rate. Still, I guess it is faster to drop the packets directly at the bottleneck.
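
If you want to watch this on the wire, something like the following should work (a sketch; the filters match the two ECN bits in the IPv4 header and the TCP ECE flag):

root@OpenWrt:~# tcpdump -ni eth0 'ip[1] & 0x3 == 0x3'   # CE-marked packets (congestion experienced)
root@OpenWrt:~# tcpdump -ni eth0 'ip[1] & 0x3 != 0'     # any ECT- or CE-marked packet
root@OpenWrt:~# tcpdump -ni eth0 'tcp[13] & 0x40 != 0'  # ECE flag set (receiver echoing congestion)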

If you enable ECN, qdiscs like fq_codel and pie can detect the ECN bits and act accordingly (provided you also enable ECN support in the qdisc).

I think the ECN setting in sqm-scripts has no effect on cake.

Most client systems still default to not requesting ECN.

1 Like

Or just condition it on the rate?

Oh, your network, your rules, so setting ECN to on for egress is totally fine IMHO, if you know and accept the trade-off.

As a rule of thumb, a full-MTU packet takes around 1 ms of transmission time on the medium at ~12 Mbps. So at 512 Kbps, which was not uncommon for ADSL, it takes more than 20 ms, and 20 ms of additional delay is IMHO unpleasant enough to not use ECN at such low bandwidths.
Now, the exact cutoff bandwidth is somewhat subjective and rather hard to define, IMHO, as it is a somewhat murky trade-off.
That said, @tohojo's idea might make for a better default, if we can agree on a threshold... Any proposals for such a threshold rate, with a rationale, anybody?
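
To make the arithmetic explicit, a quick back-of-the-envelope check (a sketch in POSIX shell; integer ms, so sub-millisecond results truncate to 0):

# serialization delay of a full 1500-byte packet at various access rates
for rate_kbps in 512 2000 12000 100000; do
    echo "${rate_kbps} kbps: $(( 1500 * 8 * 1000 / (rate_kbps * 1000) )) ms"
done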

moeller0 via OpenWrt Forum mail@forum.openwrt.org writes:

That said, @tohojo's idea might make for a better default, if we can
agree on a threshold... Any proposals for such a threshold rate, with
a rationale, anybody?

1 ms?

Hi Toke,

Toke Høiland-Jørgensen writes:

1 ms?

Okay, but why? I realize that others like L4S went all in for 1 ms, but mostly based on the marketability of that number :wink:
But I guess, as long as we really just use this as our default threshold without assigning any significance to the exact number, we should be fine.

moeller0 via OpenWrt Forum mail@forum.openwrt.org writes:

Okay, but why? I realize that others like L4S went all in for 1 ms,
but mostly based on the marketability of that number :wink:

Completely arbitrarily: it's 20% of the CoDel target (and yeah, it's a
nice round number). My larger point was just that it should be a number
measured in time units, not in bytes. We can quibble about the actual
number; I guess the question is "what is the maximum amount of queueing
latency we want to sacrifice to avoid a packet drop". And any answer we
come up with is going to be arbitrary :slight_smile:

2 Likes

As far as I can tell, the time to transfer a packet of a given size and the bandwidth/transfer rate are strongly correlated ;). So we can calculate the minimum required rate from packet size and acceptable queueing increase, but as long as the latter is fixed, so will be the former. I guess this actually does not require too much finesse or precision, so just accepting 1 ms of delay, i.e. access rates >= 12 Mbps, as the threshold to enable ECN for egress by default, seems fine.
The value equation for ingress is still different, because we still have the fact that using ECN should result in a tighter feedback loop and that flow's sender slowing down more quickly, which might be a win even if the queueing delay increases by more than 1 ms. But we can probably ignore this; rates below 12 Mbps are not that common anymore, and we will keep a manual override anyway.

1 Like

moeller0 via OpenWrt Forum mail@forum.openwrt.org writes:

So we can calculate the minimum required rate from packet size and
acceptable queueing increase [...] just accepting 1 ms of delay,
i.e. access rates >= 12 Mbps, as the threshold to enable ECN for
egress by default.

Well, one refinement could be to take the MTU into account. But as you
say it probably doesn't need that much precision for a default, so just
setting a threshold at 12 Mbps (or even rounding that to 10) would be
fine with me.

The value equation for ingress is still different [...]

For ingress I think we should just keep the current default (ECN on).

2 Likes