Notes on enabling ECN (slightly OT)

Explicit congestion notification (ECN for short) is a nifty method to decouple congestion signalling from packet loss (in typical situations, once sh!t hits the fan, packet loss will still happen and needs to be interpreted even with ECN active). So instead of dropping a packet on overload, an AQM can now 'mark' the packet with a congestion experienced (CE) mark, which the receiver will interpret as a sign of congestion and signal back to the sender so that it slows down. That has the obvious advantage of quicker congestion signalling, often without packet loss and hence without the need to retransmit a dropped packet.
Linux qdiscs like OpenWrt's default fq_codel and cake support ECN signalling, making it easy to employ this on an OpenWrt router. However, ECN is something the end points need to negotiate and evaluate for it to actually help. Which brings me to the reason for this thread: to give short instructions on how to do that for different operating systems and how to confirm it is working as intended:

Enabling active (rfc3168) ECN negotiation (inbound and outbound):
windows:

netsh interface tcp show global
as admin:
netsh interface tcp set global ecn=enabled
while you are at it consider enabling TCP timestamps:
netsh interface tcp set global timestamps=enabled

macos:*

sysctl -w net.inet.tcp.ecn_initiate_out=1
sysctl -w net.inet.tcp.ecn_negotiate_in=1
sysctl -w net.inet.tcp.disable_tcp_heuristics=1 # if this is not disabled macos will decide when to use ecn and when not to

linux# (see https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt):

sysctl -w net.ipv4.tcp_ecn=1

Confirming ECN on the IP level:
Note: these commands are intended to be run on your OpenWrt router (you might need to install tcpdump first: opkg update ; opkg install tcpdump). My wan interface is pppoe-wan, so replace that with your own interface of interest. When trying to confirm CE marks generated by an AQM on your OpenWrt router, please note that you will see outgoing/egress CE marks on the wan interface, but incoming/ingress CE marks only on a lan interface, so consider capturing on br-lan for those:

# ECN IPv4/6
tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4  != 0)' or '(ip and (ip[1] & 0x3) != 0)' # NOT Not-ECT
tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4  == 1)' or '(ip and (ip[1] & 0x3) == 1)' # ECT(1)
tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4  == 2)' or '(ip and (ip[1] & 0x3) == 2)' # ECT(0)
tcpdump -i pppoe-wan -v -n '(ip6 and (ip6[0:2] & 0x30) >> 4  == 3)' or '(ip and (ip[1] & 0x3) == 3)' # CE
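
In case it helps when reading the verbose tcpdump output: the ECN codepoint is simply the two lowest bits of the IPv4 TOS byte or the IPv6 traffic class byte, which is what the filters above test for. Here is a minimal shell sketch of that mapping (the function name ecn_codepoint is made up for illustration, it is not part of tcpdump or any other tool):

```shell
# Map an IPv4 TOS byte or IPv6 traffic class byte to its ECN codepoint name.
# The ECN field is the two least significant bits of that byte (rfc3168);
# the upper six bits are the DSCP and are ignored here.
ecn_codepoint() {
    case $(( $1 & 0x3 )) in
        0) echo "Not-ECT" ;;
        1) echo "ECT(1)" ;;
        2) echo "ECT(0)" ;;
        3) echo "CE" ;;
    esac
}
```

For example, ecn_codepoint 0x02 prints ECT(0) and ecn_codepoint 0xbb prints CE, independent of whatever DSCP sits in the upper six bits.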

Confirming ECN on the TCP level:
Most ECN action will happen on TCP connections, so it can be helpful to also capture ECN-related TCP packets. Note: since the endpoints set the TCP flags, these should be visible regardless of whether you capture on the WAN or a LAN interface:

# TCP ECN IPv4/6: (for IPv6 see https://ask.wireshark.org/question/27153/i-am-trying-to-capture-tcp-syn-on-ipv6-packets-but-i-only-get-ipv4/)
tcpdump -i pppoe-wan -v -n '(tcp[tcpflags] & (tcp-ece|tcp-cwr) != 0)' or '((ip6[6] = 6) and (ip6[53] & 0xC0 != 0))' # TCP ECN flags, ECN in action
tcpdump -i pppoe-wan -v -n '(tcp[tcpflags] & tcp-ece != 0)' or '((ip6[6] = 6) and (ip6[53] & 0x40 != 0))' # TCP ECN flags, ECE: ECN-Echo (reported as E)
tcpdump -i pppoe-wan -v -n '(tcp[tcpflags] & tcp-cwr != 0)' or '((ip6[6] = 6) and (ip6[53] & 0x80 != 0))' # TCP ECN flags, CWR: Congestion Window Reduced (reported as W)
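
A short note on the numbers used above: the TCP flags live in byte 13 of the TCP header, and ip6[53] is that same byte assuming a plain 40 byte IPv6 header without extension headers (40 + 13 = 53). ECE is bit 0x40 and CWR is bit 0x80. The sketch below classifies such a flags byte, including the rfc3168 handshake combinations (SYN with ECE and CWR requests ECN, SYN-ACK with only ECE accepts it). The function name classify_tcp_ecn is hypothetical and only for illustration:

```shell
# Classify a TCP flags byte (tcp[13]) with respect to rfc3168 ECN.
# Bit values used: SYN 0x02, ACK 0x10, ECE 0x40, CWR 0x80.
classify_tcp_ecn() {
    flags=$(( $1 ))
    syn=$(( flags & 0x02 )); ack=$(( flags & 0x10 ))
    ece=$(( flags & 0x40 )); cwr=$(( flags & 0x80 ))
    if [ "$syn" -ne 0 ] && [ "$ack" -eq 0 ] && [ "$ece" -ne 0 ] && [ "$cwr" -ne 0 ]; then
        echo "ECN-setup SYN"            # client asks for ECN
    elif [ "$syn" -ne 0 ] && [ "$ack" -ne 0 ] && [ "$ece" -ne 0 ] && [ "$cwr" -eq 0 ]; then
        echo "ECN-setup SYN-ACK"        # server agrees to ECN
    elif [ "$syn" -ne 0 ]; then
        echo "non-ECN handshake segment"
    elif [ "$ece" -ne 0 ] && [ "$cwr" -ne 0 ]; then
        echo "ECE+CWR"
    elif [ "$ece" -ne 0 ]; then
        echo "ECE"                      # receiver echoes a CE mark
    elif [ "$cwr" -ne 0 ]; then
        echo "CWR"                      # sender reduced its window
    else
        echo "no ECN flags"
    fi
}
```

For example, classify_tcp_ecn 0xc2 prints ECN-setup SYN and classify_tcp_ecn 0x52 prints ECN-setup SYN-ACK. If that reading is right, an IPv4 capture filter along the lines of 'tcp[13] & 0xd2 == 0xc2' should show only the ECN-requesting SYNs, but I have not tested that filter here.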

Quick note of the expected action sequence for an individual flow for rfc3168 ECN:

  1. The AQM marks a packet CE in the IP header (the packet will have been either ECT(0) or ECT(1); the AQM will not mark Not-ECT packets, instead they will simply be dropped on congestion).
  2. The receiver's TCP stack sees the CE mark and asserts the ECN-Echo TCP flag (ECE for short) on all outgoing reverse packets (pure ACKs or data segments carrying an ACK).
  3. The original sender receives that ECE flag and in response reduces its congestion window (cwnd) and sets the Congestion Window Reduced TCP flag (CWR for short).
  4. The receiver stops asserting ECE once it receives that CWR.

rfc3168 has a more detailed description, but this here should suffice to get an idea about what to look for and what sequence of flags to expect.
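
To make the ECE "latching" in steps 2 to 4 a bit more tangible, here is a toy shell simulation of the receiver side only. It is a sketch under simplifying assumptions (one token per received segment, no delayed ACKs, no loss), and the function name and token names are invented for this example:

```shell
# Toy model of the rfc3168 receiver: once a CE-marked segment arrives,
# every ACK carries ECE until a segment with CWR set is received.
# Arguments: one token per received segment:
#   plain = unmarked segment, CE = CE-marked segment, CWR = segment with CWR set
# Output: the ACKs sent in response, space separated.
simulate_receiver() {
    ece=0
    out=""
    for seg in "$@"; do
        case "$seg" in
            CE)  ece=1 ;;   # start echoing congestion
            CWR) ece=0 ;;   # sender confirmed it reacted, stop echoing
        esac
        if [ "$ece" -eq 1 ]; then ack="ACK+ECE"; else ack="ACK"; fi
        out="${out:+$out }$ack"
    done
    printf '%s\n' "$out"
}
```

Running simulate_receiver plain CE plain plain CWR plain prints ACK ACK+ECE ACK+ECE ACK+ECE ACK ACK, which is the pattern to look for in a capture: a run of ECE flags in one direction that ends shortly after a CWR in the other.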

*) macOS allows querying some TCP statistics with the following command: sudo netstat -sp tcp (note: without sudo this will return all zeros). Here is an example of the ECN-related data points:

        128984 client connections attempted to negotiate ECN
                29652 client connections successfully negotiated ECN
                86977 times graceful fallback to Non-ECN connection
                10011 times lost ECN negotiating SYN, followed by retransmission
                2157 server connections attempted to negotiate ECN
                2157 server connections successfully negotiated ECN
                0 time lost ECN negotiating SYN-ACK, followed by retransmission
                6498 times received congestion experienced (CE) notification
                190 times CWR was sent in response to ECE
                117716 times sent ECE notification
                261 connections received CE atleast once
                81 connections received ECE atleast once
                7216 connections using ECN have seen packet loss but no CE
                204 connections using ECN have seen packet loss and CE
                136 connections using ECN received CE but no packet loss
                91 connections fell back to non-ECN due to SYN-loss
                156 connections fell back to non-ECN due to reordering
                0 connection fell back to non-ECN due to excessive CE-markings
                2 connections fell back caused by connection drop due to RST
                5 connections fell back due to drop after multiple retransmits 
                0 connection fell back due to RST after SYN
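
One number that is easy to pull out of these statistics is how often outgoing ECN negotiation actually succeeded. The following is a small sketch that reads the netstat text on stdin and matches on the wording of the two "client connections" lines shown above; the function name is invented, and the wording may differ between macOS versions:

```shell
# Read `netstat -sp tcp` output on stdin and print the percentage of
# client-side ECN negotiation attempts that succeeded.
ecn_client_success_rate() {
    LC_ALL=C awk '
        /client connections attempted to negotiate ECN/  { attempted = $1 }
        /client connections successfully negotiated ECN/ { ok = $1 }
        END {
            if (attempted > 0) printf "%.1f\n", 100 * ok / attempted
            else print "n/a"
        }'
}
```

Usage would be something like sudo netstat -sp tcp | ecn_client_success_rate. For the example numbers above (29652 of 128984) it prints 23.0.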

#) Linux offers some ECN related information in the output of netstat -s (thanks @timur.davletshin ):

IpExt:
    InMcastPkts: 2310212
    OutMcastPkts: 143473
    InBcastPkts: 380057
    InOctets: 45238861245
    OutOctets: 76413395773
    InMcastOctets: 960413814
    OutMcastOctets: 13513741
    InBcastOctets: 126574293
    InNoECTPkts: 107561727
    InECT0Pkts: 1016946
    InCEPkts: 432
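
To put those counters into perspective, one can compute which share of the received packets was ECN-capable and which share arrived CE-marked. Below is a minimal sketch that parses "Name: value" lines like the ones above from stdin. The function name is made up, InECT1Pkts is included on the assumption that it shows up once it is non-zero, and CE packets are counted as ECN-capable since they must have been ECT before being marked:

```shell
# Read IpExt-style "Name: value" lines on stdin and print the share of
# received packets that were ECN-capable (ECT(0), ECT(1) or CE) and the
# share that carried a CE mark.
ecn_ingress_share() {
    LC_ALL=C awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }          # strip the indentation
        $1 == "InNoECTPkts" { noect = $2 }
        $1 == "InECT0Pkts"  { ect0  = $2 }
        $1 == "InECT1Pkts"  { ect1  = $2 }
        $1 == "InCEPkts"    { ce    = $2 }
        END {
            total = noect + ect0 + ect1 + ce
            if (total > 0)
                printf "ECN-capable: %.2f%% CE: %.4f%%\n", 100 * (ect0 + ect1 + ce) / total, 100 * ce / total
            else
                print "no packets counted"
        }'
}
```

Usage would be netstat -s | ecn_ingress_share. With the example values above this prints ECN-capable: 0.94% CE: 0.0004%.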

So 20-some years later 6/7 of the internet does not utilize ECN?

It depends. As I said, most servers will happily use it... client OSes, however, generally do not default to actively trying to negotiate ECN usage...
That is slowly changing though, and if I look at the IPv4/6 "transition", maybe "decades" is the time constant we need to expect on our maturing internet?

Sidenote: that slow transition somehow made some jokers believe that the low percentage of clients negotiating rfc3168 ECN could be used to justify re-defining how a sender is supposed to react to a CE mark, likely resetting that decades long "transition timer" back to zero...

Run ss -tio on Linux and check whether ecn and ecnseen are present.

netstat -s allows the same in Linux.

Do you still need it if you enable ECN support in SQM?

Apple has been using it for over a decade, AFAIK. Linux is just too conservative in its defaults.

Yes, SQM will allow you to toggle fq_codel's ECN support (cake will always use ECN independent of the settings in the GUI/UCI config). But the AQM will only CE-mark packets that carry either the ECT(0) or the ECT(1) codepoint, which is only set as part of ECN negotiation by the end points (that is, you can force that codepoint, but unless the endpoints are prepared to actually use ECN, that codepoint alone will not help).

I wish. Alas, Apple does not consistently use ECN, but uses it off and on as part of its TCP heuristics. That might have been different in the past, but today it takes a bit of persistence to make macOS actually negotiate ECN reliably.

That does not work for me right now (will need to look deeper once I am back home), but netstat -s does. Thanks!

In OpenWrt, the defaults seem to be:

net.ipv4.tcp_ecn = 2
net.ipv4.tcp_ecn_fallback = 1

What does this mean?

That's odd. I get output like this:

ts sack ecn ecnseen cubic wscale:8,8 rto:204 rtt:0.085/0.032 ato:40 mss:65483 pmtu:65535 rcvmss:536 advmss:65483 cwnd:10 ssthresh:35 bytes_sent:70969880 bytes_retrans:12491 bytes_acked:70957389 bytes_received:6086382 segs_out:36636 segs_in:19779 data_segs_out:33083 data_segs_in:7539 send 61.6Gbps lastsnd:792 lastrcv:5880 lastack:792 pacing_rate 123Gbps delivery_rate 52.4Gbps delivered:33084 app_limited busy:45856ms retrans:0/6 dsack_dups:6 reord_seen:1 rcv_rtt:5942.31 rcv_space:65535 rcv_ssthresh:130357 minrtt:0.026 snd_wnd:2096640
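
With many open connections it can be tedious to eyeball each info line. As a rough sketch, the following counts how many lines of ss output carry the ecn and ecnseen flags. The function name is invented, and my reading of the flags (ecn = ECN was negotiated, ecnseen = ECN-capable packets were actually received from the peer) is an assumption, so check the ss documentation before relying on it:

```shell
# Read `ss -tin` style output on stdin and count the info lines that
# contain the standalone words "ecn" and "ecnseen".
count_ecn_connections() {
    awk '
        {
            neg = 0; seen = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "ecn")     neg  = 1
                if ($i == "ecnseen") seen = 1
            }
            negotiated += neg; observed += seen
        }
        END { printf "ecn=%d ecnseen=%d\n", negotiated, observed }'
}
```

Usage would be ss -tin | count_ecn_connections. Fed only the single info line above, it prints ecn=1 ecnseen=1.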

... those are mere defaults. Best explained here: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

See here:

and here:

In short, net.ipv4.tcp_ecn = 2 means accept ECN when requested by the peer, but do not actively try to negotiate ECN for outgoing connections. net.ipv4.tcp_ecn = 1 means accept incoming ECN and actively request ECN for outgoing connections.
net.ipv4.tcp_ecn_fallback = 1 means what it says on the tin: if a nominal ECN connection misbehaves, demote it to not use ECN any longer. I have no idea how that is actually implemented.
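
As a quick way to check what a given Linux box is currently set to, here is a small sketch that turns the tcp_ecn value into the description from the kernel documentation. The function name is made up; the value 0 (ECN fully disabled) is not discussed above but is listed in ip-sysctl.txt:

```shell
# Translate a net.ipv4.tcp_ecn value into a human readable description.
describe_tcp_ecn() {
    case "$1" in
        0) echo "ECN disabled: neither request nor accept ECN" ;;
        1) echo "request ECN on outgoing connections and accept it on incoming ones" ;;
        2) echo "accept ECN when the peer requests it, but do not request it" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Usage on a Linux host:
#   describe_tcp_ecn "$(cat /proc/sys/net/ipv4/tcp_ecn)"
```

On a default OpenWrt install (tcp_ecn = 2) this prints the "accept ... but do not request" line.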

Oh, I believe you, that is why I said I need to look a bit deeper once I am at home.

Thanks for the comment. I had come to the same conclusion, but wanted to make sure that a pro like you confirms my humble observation.

Useful BTW to debug the effectiveness of TCP congestion control - another popular topic in optimizing or (more likely) breaking Linux performance.

Please note that these settings on the router only affect connections terminating at the router itself; they do not apply to forwarded packets. Hence the inconvenient need to individually configure every client to use ECN. To be clear, that mandatory opt-in is inconvenient but also safe by design, so all in all IMHO the right choice.

Sidenote: this is a hobby horse not my profession :wink:

Yeah, I remember those manuals on optimizing TCP congestion control in your network by setting BBR on... the router - probably the only excuse for kmod-tcp-bbr's existence in the OpenWrt package base.

I'm a brewer according to my first diploma :rofl:
