So the problem typically is that the true bottleneck (be it a physical link, or a traffic shaper somewhere at the ISP's network edge) has a true gross rate and a true minimal per-packet-overhead, but end-users typically know neither of these two values with sufficient precision to predict what values to set in SQM. The recommendation is to tentatively underestimate the gross rate and overestimate the per-packet-overhead to keep bufferbloat low...
Let's assume we know the true rate (100 Mbps) and the true per-packet-overhead (100B, on top of the TCP/IP headers), with MTU-sized (1500B) packets; the values are unrealistic but help to illustrate the issue. The achievable TCP/IPv4 goodput (what typical speedtests measure, what end-users consider to be the speed of their link, and what, in the EU, ISPs are allowed to advertise) calculates like this:
true_rate * ((TCP/IPv4 payload size) / (packet size on bottleneck link))
100 * ((1500-20-20)/(1500+100)) = 91.25 Mbps
now if we reduce the gross shaper rate to 95 Mbps:
95 * ((1500-20-20)/(1500+100)) = 86.7 Mbps
or if we instead increase the per-packet overhead to 184 bytes:
100 * ((1500-20-20)/(1500+184)) = 86.7 Mbps
in both cases we would send the same amount of traffic over the bottleneck (same goodput), but we reached that state by adjusting different parameters. If the link truly only carries 86.7 Mbps we will have managed bufferbloat successfully for packets of size 1500B.
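Since this arithmetic comes up repeatedly here, it is easy to script; here is a tiny helper to reproduce the numbers (a sketch only; the goodput function name is mine, and the hard-coded 40B assumes IPv4+TCP headers without TCP options, matching the -20-20 above):

# goodput <gross_rate_Mbps> <per_packet_overhead_B> <packet_size_B>
# prints the achievable TCP/IPv4 goodput in Mbps
goodput() {
    awk -v rate="$1" -v oh="$2" -v mtu="$3" \
        'BEGIN { printf "%.2f Mbps\n", rate * (mtu - 40) / (mtu + oh) }'
}

goodput 100 100 1500   # -> 91.25 Mbps
goodput  95 100 1500   # -> 86.69 Mbps
goodput 100 184 1500   # -> 86.70 Mbps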
Now, let's assume that the true per-packet-overhead really is a ridiculous 184 bytes, but we reduce the MTU from 1500 to 300B; then we get:
95 * ((300-20-20)/(300+100)) = 61.75 Mbps
and
100 * ((300-20-20)/(300+184)) = 53.72 Mbps
respectively. Since 61.75 Mbps is larger than the true capacity of the link for 300B packets (53.72 Mbps, given the true 184B overhead), the first setting will now produce bufferbloat, while the second does not.
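The helper from above reproduces the small-packet numbers as well:

goodput  95 100 300    # -> 61.75 Mbps (admits more than the link can carry)
goodput 100 184 300    # -> 53.72 Mbps (matches the true capacity)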
Without knowing at least one of the parameters with certainty, we are reduced to a game of educated guesses. I hope this illustrates the issue...
The obvious sanity check would be to first run a speedtest with the normal MTU/MSS, and then confirm the SQM settings optimised for that condition after using MSS clamping to reduce the effective packet size to something small. Here is a snippet that should work in /etc/firewall.user:
# special rules to allow MSS clamping for in and outbound traffic
# use ip6tables -t mangle -S ; iptables -t mangle -S to check
forced_MSS=216
# affects both down- and upstream, egress seems to require at least 216 (on macos)
iptables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
ip6tables -t mangle -A FORWARD -p tcp -m tcp --tcp-flags SYN,RST SYN -m comment --comment "custom6: Zone wan MTU fixing" -j TCPMSS --set-mss ${forced_MSS}
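For completeness, on newer OpenWrt releases that have moved from iptables to fw4/nftables, something along these lines should be roughly equivalent (an untested sketch; inet fw4 and mangle_forward are the table/chain names fw4 creates by default, and the inet family covers IPv4 and IPv6 in one rule):

# clamp the MSS of all forwarded TCP SYNs to 216 (fw4/nftables variant)
nft insert rule inet fw4 mangle_forward tcp flags syn tcp option maxseg size set 216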
Depending on your client OS and the servers, you might need to play with the actual forced_MSS setting, as different OSes have different minimal MSS values they accept. I had to resort to taking packet captures on the router to confirm, for a given speedtest server, that TCP packets did honor the clamped MSS...
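For reference, a capture along these lines on the router's WAN interface will show the MSS option on the SYN packets, so you can verify the clamp actually took effect (pppoe-wan is just an example interface name; adjust to your setup):

# capture TCP SYNs in verbose mode and check the options field for "mss 216"
# note: this classic byte-offset filter only matches IPv4 traffic
tcpdump -n -v -i pppoe-wan 'tcp[tcpflags] & tcp-syn != 0'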