Incoming TCP MSS clamping and large packets in outgoing direction

So, what should I check on my Linux devices to make sure that their TCP stack respects this 540 MSS limit?

Or should I create an interface with a lower MTU and route outgoing packets through it? (So that TCP has no option but to comply.) Would that cause network failures?
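
Instead of a separate interface, a per-route MTU cap would be the simplest version of this idea; just a sketch, with a made-up gateway address and interface name:

# lock the path MTU for everything leaving via the default route (hypothetical names/values)
ip route change default via 192.0.2.1 dev pppoe-wan mtu lock 580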

My incoming MSS is OK, 1404.
I want my outgoing MSS to be lower than that, so I must intercept the incoming TCP handshake packets and lower the MSS they advertise (which I think is done properly, but it seems that some applications ignore it).

« SpeedGuide.net TCP Analyzer Results » 
Tested on: 2024.10.01 10:13 
IP address: 
Client OS/browser: Linux (Firefox 115.0) 
 
TCP options string: 0204057c0101040201030306 
MSS: 1404 
MTU: 1444 
TCP Window: 47808 (not multiple of MSS) 
RWIN Scaling: 6 bits (2^6=64) 
Unscaled RWIN : 747 
Recommended RWINs: 64584, 129168, 258336, 516672, 1033344 
BDP limit (200ms): 1912 kbps (191 Kilobytes/s) 
BDP limit (500ms): 765 kbps (76 Kilobytes/s) 
MTU Discovery: ON 
TTL: 49 
Timestamps: OFF 
SACKs: ON 
IP ToS: 00000000 (0) 

It should go without saying that this is not the same as official OpenWrt. Is there a reason you need to run Turris OS? Have you tried with official OpenWrt? It is possible that the advice here is not directly applicable to the Turris fork, which works considerably differently from the official firmware.

I would just set the MSS in the firewall for both directions, then run a TCP capacity test on an endpoint and do a packet capture there as well; you really want to see TCP packet sizes around 540 plus some overhead...
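
Roughly like this (a sketch only; the WAN interface name and the two MSS values are placeholders taken from this thread, pick what matches your link):

# SYNs leaving via WAN carry what your side advertises, i.e. this caps incoming packet sizes
iptables -t mangle -A FORWARD -o pppoe-wan -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1404
# SYNs arriving from WAN carry what the far end advertises to your clients, i.e. this caps outgoing packet sizes
iptables -t mangle -A FORWARD -i pppoe-wan -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 540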

I managed to make some progress on this.
On the client device the packets were aggregated and were sent in multiples of 540.
The culprit was the default TCP setting
net.ipv4.tcp_min_tso_segs=2
When I set this to 1, tcpdump on the client device reports that all packets are <540 now.
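
For the record, on the client this was simply (a runtime-only change; if it keeps helping I will persist it via a sysctl.d drop-in):

# allow TSO/GSO to emit single-MSS frames instead of batching at least two segments
sysctl -w net.ipv4.tcp_min_tso_segs=1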

However, when I run the same tcpdump on the router I see that some packets are aggregated again. The client is connected over wifi, and the router uses a bridge that joins 2 wlan NICs (LANWIFI).

10:27:52.196183 IP clientip.port > server1.port: Flags [.], seq 72361:72901, ack 68, win 182, length 540
10:27:52.211117 IP clientip.port > server2.port: Flags [.], seq 136081:136621, ack 136, win 182, length 540
10:27:52.233399 IP clientip.port > server2.port: Flags [.], seq 136621:137161, ack 136, win 182, length 540
10:27:52.240156 IP clientip.port > server2.port: Flags [.], seq 137161:137701, ack 136, win 182, length 540
10:27:52.243860 IP clientip.port > server1.port: Flags [.], seq 73981:75061, ack 68, win 182, length 1080
10:27:52.249732 IP clientip.port > server2.port: Flags [.], seq 137701:138241, ack 136, win 182, length 540
10:27:52.260070 IP clientip.port > server3.port: Flags [.], seq 150121:150661, ack 170, win 201, length 540
10:27:52.260070 IP clientip.port > server3.port: Flags [.], seq 151201:151741, ack 170, win 201, length 540
10:27:52.271144 IP clientip.port > server4.port: Flags [.], seq 120421:120961, ack 187, win 180, length 540
10:27:52.286884 IP clientip.port > server4.port: Flags [.], seq 120961:121501, ack 187, win 180, length 540
10:27:52.286989 IP clientip.port > server4.port: Flags [.], seq 121501:122041, ack 187, win 180, length 540

How can I check whether the router's wifi NIC aggregates packets (and maybe disable that)?

You are climbing the wrong tree: you insist outgoing packets exit unmodified. This has nothing to do with network offload.

I tried to DROP or REJECT large outgoing packets, however application performance suffers, because iptables has no REJECT message that would force fragmentation on the client.
MSS clamping is also enforced on incoming packets; there it is 1452.

Let me try to picture it in ASCII so it's clearer:

Internet -> router (packets should be 1452 wide)
router -> internet (packets should be 580 wide)

When a TCP connection is initiated, I clamp both directions.
The router tells the internet: "Send me 1452-sized packets please", and the handshake packets arriving from the internet are rewritten to tell the router & clients: "send me 580-sized packets".

The physical MTU of the WAN NIC is 1452.
It's not a coincidence that the large packets I see are 2*540 and not some random higher value.

If I've got it wrong, please elaborate.

How so? That is quite an uncommon MTU.

Maybe disable GSO GRO on the router? (and TSO, UFO,...)
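
Something along these lines should show and then turn them off (a sketch; wlan0 is an assumed name, repeat for the second radio, and the change does not survive a reboot):

# show the current offload state
ethtool -k wlan0 | grep -E 'segmentation|receive-offload'
# disable GRO/GSO/TSO on the radio
ethtool -K wlan0 gro off gso off tso off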

Well,

I recall something like this:

1492 is not optimal for pppoe-wan

This only applies to ADSL links, or more precisely to links using ATM/AAL5 encapsulation.

If that is true for your link, ignore me; otherwise, please reconsider the choice of MTU.

this is what I get from my modem:

 > adsl info --show
adsl: ADSL driver and PHY status
Status: Showtime
Retrain Reason: 8000
Last initialization procedure status:   0
Max:    Upstream rate = 1156 Kbps, Downstream rate = 10712 Kbps
Bearer: 0, Upstream rate = 1023 Kbps, Downstream rate = 9014 Kbps

Link Power State:       L0
Mode:                   ADSL2+ 
TPS-TC:                 ATM Mode
Trellis:                U:ON /D:ON
Line Status:            No Defect
Training Status:        Showtime

No clue how to interpret that.

This implies ATM/AAL5... you can try to empirically deduce this:

But it looks like your MTU has a rational basis :wink: Please do check the exact encapsulation following the link above, though, as the whole logic depends on knowing exactly how large the overhead truly is.


Thanks, I will try it in the upcoming days.

Meanwhile I disabled GRO on the router and now the client's packets are small!!!

ethtool -K wlan0 gro off

In the next days I will debug the other clients as well and report back.
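
Note to self: the ethtool -K change is runtime-only, so assuming ethtool stays installed on the router I will probably re-apply it at boot from /etc/rc.local (the second radio's name is a guess on my part):

# in /etc/rc.local, before the final 'exit 0'
ethtool -K wlan0 gro off
ethtool -K wlan1 gro off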

Hi, I did the analysis on my line, can you please help me interpret it?
(SQM was disabled, and the default MTU of 1492 was left on the interface for both directions.)
If I understand correctly, ATM is used and the overhead is 40 bytes?

Now I read the text...

Elapsed time is 342.214 seconds.
Minimum size of ping payload used: 16 bytes.
Saved figure (1) to: 
lower bound estimate for one ATM cell RTT based of specified up and downlink is 0.4971 ms.
estimate for one ATM cell RTT based on linear fit of the ping sweep data is 0.4971 ms.
Starting brute-force search for optimal stair fit, might take a while...
Best staircase fit cumulative difference is: 7.6075
Best linear fit cumulative difference is: 17.5165
Quantized ATM carrier LIKELY (cummulative residual: stair fit 7.6075 linear fit 17.5165
remaining ATM cell length after ICMP header is 13 bytes.
ICMP RTT of a single ATM cell is 0.4971 ms.

Estimated overhead preceding the IP header: 40 bytes
Saved figure (2) to:

According to http://ace-host.stuart.id.au/russell/files/tc/tc-atm/
40 bytes overhead indicate
Connection: PPPoE, LLC/SNAP RFC-2684
Protocol (bytes): PPP (2), PPPoE (6), Ethernet Header (14), ATM LLC (3), ATM SNAP (5), ATM pad (2), ATM AAL5 SAR (8) : Total 40

Add the following to both the egress root qdisc:
A) Assuming the router connects over ethernet to the DSL-modem:
stab mtu 2048 tsize 128 overhead 40 linklayer atm

Add the following to both the ingress root qdisc:

A) Assuming the router connects over ethernet to the DSL-modem:
stab mtu 2048 tsize 128 overhead 40 linklayer atm

Elapsed time is 362.963 seconds.
Done...

Yes, an overhead of 40 bytes is what the analysis confirmed, which is quite a common encapsulation.

That leaves us with

ceil((1492+40)/48) * 53 = 1696
But since ((1492+40)/48) = 31.9166666667 this means each packet carries a bit of padding after being split into ATM cells. If you want to avoid that you need to set the MTU for pppoe-wan to:
31 * 48 = 1488 Byte = MTU + overhead -> MTU = 1488 - 40 = 1448
((1448+40)/48)*53 = 1643

So maybe you should set the pppoe-wan MTU to 1448 if you want to waste no capacity on padding (at least for packets of maximum size, as is typical for bulk transfers)...
Your current ((1452+40)/48) = 31.0833333333 is just over 31 cells, so the last cell is almost empty and hence mostly wasted; compared to this I would argue the default MTU of 1492 is better...
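
If you want to play with other MTU values, the cell arithmetic is easy to reproduce in a shell (just a helper sketch, using the 40 byte overhead established above):

mtu=1448; overhead=40
cells=$(( (mtu + overhead + 47) / 48 ))   # 48-byte ATM payload cells, rounded up
echo "$mtu byte MTU -> $cells cells = $(( cells * 53 )) bytes on the ATM wire"

With mtu=1492 this gives 32 cells (1696 bytes), with mtu=1448 exactly 31 cells (1643 bytes), matching the numbers above.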

Personally, I would try to figure out what options I have to move away from ATM/AAL5...
I believe that AT&T switched their ADSL links over from ATM/AAL5 to PTM, which from our perspective is a much saner encapsulation....

Sidenote: since encapsulations are not universal, advice such as Sonic's will not necessarily transfer well between different ISPs... Personally, I never bothered with MTU games even when on ADSL; I did however take care of proper overhead and ATM cell quantisation.
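
For completeness, the detector's stab recommendation is meant to be spliced into the root qdisc on the WAN device, roughly like this (a sketch only; the device name and shaper rate are placeholders, and luci-app-sqm exposes the same knobs under its link layer adaptation settings):

tc qdisc replace dev pppoe-wan root stab mtu 2048 tsize 128 overhead 40 linklayer atm tbf rate 900kbit burst 16k latency 50ms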


I can modify my modem settings.
The encapsulation before was LLC/SNAP_BRIDGING.
Should I retry with VC/MUX?
What about service category?
I am not located in the US, but somewhere where half of the city is stuck with ADSL only :frowning:

It is set to match the infrastructure at the other end. You will just lose the connection.


As @brada4 wrote, it takes two to tango here. Typically your ISP sets the encapsulation and you can either use the exact same settings or lose internet connectivity. However, some ISPs do some form of autonegotiation, e.g. in the UK you can use either pppoe or pppoa on the same link and the server side autodetects this...
But while IPoA would be quite attractive, and pppoa would be slightly better than pppoe (better here meaning smaller per-packet overhead), this is IMHO not worth stressing about...