Unstable PPPoE on UniFi Security Gateway

I have a UniFi Security Gateway that I am trying to set up as a replacement for the router issued by my ISP, KPN. They provide these instructions (in Dutch, page 6). As far as I understand it, it all boils down to running PPPoE with any username and password on VLAN 6.

I tried setting this up through LuCI, and was able to get a connection... sort of. Here is the relevant part of /etc/config/network:

config device
        option type '8021q'
        option ifname 'eth0'
        option vid '6'
        option name 'eth0.6'

config interface 'wan'
        option device 'eth0.6'
        option proto 'pppoe'
        option username 'internet'
        option password 'internet'
        option ipv6 '1'
        option mtu '1500'

config interface 'wan6'
        option device 'pppoe-wan'
        option proto 'dhcpv6'
        option reqaddress 'try'
        option reqprefix 'auto'

When I ran a ping test, I saw a lot of packets being lost:

root@OpenWrt:~# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=60 time=3.030 ms
64 bytes from 1.1.1.1: seq=1 ttl=60 time=1.972 ms
64 bytes from 1.1.1.1: seq=2 ttl=60 time=3.100 ms
64 bytes from 1.1.1.1: seq=3 ttl=60 time=2.549 ms
64 bytes from 1.1.1.1: seq=5 ttl=60 time=2.485 ms
64 bytes from 1.1.1.1: seq=6 ttl=60 time=17.950 ms
64 bytes from 1.1.1.1: seq=8 ttl=60 time=3.640 ms
64 bytes from 1.1.1.1: seq=10 ttl=60 time=2.108 ms
^C                         
--- 1.1.1.1 ping statistics ---
11 packets transmitted, 8 packets received, 27% packet loss
round-trip min/avg/max = 1.972/4.604/17.950 ms

Shortly after that, the PPPoE connection dropped.

I found the following output from pppd in the logs:

Mon Jan 15 20:39:34 2024 daemon.notice netifd: Interface 'wan' is setting up now
Mon Jan 15 20:39:34 2024 daemon.info pppd[3318]: Plugin pppoe.so loaded.
Mon Jan 15 20:39:34 2024 daemon.info pppd[3318]: PPPoE plugin from pppd 2.4.9
Mon Jan 15 20:39:34 2024 daemon.notice pppd[3318]: pppd 2.4.9 started by root, uid 0
Mon Jan 15 20:39:34 2024 daemon.info pppd[3318]: PPP session is 19894
Mon Jan 15 20:39:34 2024 daemon.warn pppd[3318]: Connected to [redacted MAC 1] via interface eth0.6
Mon Jan 15 20:39:34 2024 daemon.info pppd[3318]: Using interface pppoe-wan
Mon Jan 15 20:39:34 2024 daemon.notice pppd[3318]: Connect: pppoe-wan <--> eth0.6
Mon Jan 15 20:39:35 2024 daemon.info pppd[3318]: Remote message: Authentication success,Welcome!
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: PAP authentication succeeded
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: peer from calling number [redacted MAC 1] authorized
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: local  LL address [redacted IPv6 1]
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: remote LL address [redacted IPv6 1]
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: local  IP address [redacted IPv4 1]
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: remote IP address [redacted IPv4 2]
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: primary   DNS address 195.121.1.34
Mon Jan 15 20:39:35 2024 daemon.notice pppd[3318]: secondary DNS address 195.121.1.66
Mon Jan 15 20:39:35 2024 daemon.notice netifd: Network device 'pppoe-wan' link is up
Mon Jan 15 20:39:35 2024 daemon.notice netifd: Interface 'wan' is now up
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.d/resolv.conf.auto
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using nameserver 195.121.1.34#53
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using nameserver 195.121.1.66#53
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for test
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for onion
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for localhost
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for local
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for invalid
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for bind
Mon Jan 15 20:39:35 2024 daemon.info dnsmasq[1]: using only locally-known addresses for lan
Mon Jan 15 20:39:35 2024 user.notice firewall: Reloading firewall due to ifup of wan (pppoe-wan)
Mon Jan 15 20:39:37 2024 user.notice firewall: Reloading firewall due to ifupdate of wan (pppoe-wan)
Mon Jan 15 20:39:41 2024 daemon.info pppd[3318]: No response to 5 echo-requests
Mon Jan 15 20:39:41 2024 daemon.notice pppd[3318]: Serial link appears to be disconnected.
Mon Jan 15 20:39:41 2024 daemon.info pppd[3318]: Connect time 0.2 minutes.
Mon Jan 15 20:39:41 2024 daemon.info pppd[3318]: Sent 228 bytes, received 0 bytes.
Mon Jan 15 20:39:42 2024 daemon.notice netifd: Network device 'pppoe-wan' link is down
Mon Jan 15 20:39:42 2024 daemon.notice netifd: Interface 'wan' has lost the connection

After this, pppd was restarted, and it either ran into the same issue with the echo requests or gave the following output:

Mon Jan 15 20:42:45 2024 daemon.notice netifd: Interface 'wan' is setting up now
Mon Jan 15 20:42:45 2024 daemon.info pppd[5221]: Plugin pppoe.so loaded.
Mon Jan 15 20:42:45 2024 daemon.info pppd[5221]: PPPoE plugin from pppd 2.4.9
Mon Jan 15 20:42:45 2024 daemon.notice pppd[5221]: pppd 2.4.9 started by root, uid 0
Mon Jan 15 20:43:00 2024 daemon.warn pppd[5221]: Timeout waiting for PADO packets
Mon Jan 15 20:43:00 2024 daemon.err pppd[5221]: Unable to complete PPPoE Discovery
Mon Jan 15 20:43:00 2024 daemon.info pppd[5221]: Exit.
Mon Jan 15 20:43:00 2024 daemon.notice netifd: Interface 'wan' is now down

At this point, I reverted to the vendor firmware and entered the same configuration: PPPoE over VLAN 6. This worked immediately.

I then went back to OpenWrt and tried setting up the connection manually, i.e., by removing the configuration in /etc/config/network, rebooting, and running

ip link add link eth0 name eth0.6 type vlan id 6
ip link set dev eth0 up
ip link set dev eth0.6 up
pppd logfd 1 debug noauth nodetach ifname pppoe-wan user internet password internet mtu 1500 mru 1500 plugin pppoe.so lcp-echo-interval 1 lcp-echo-failure 5 lcp-echo-adaptive +ipv6 nic-eth0.6

This still gave the same results: either the connection drops after five missed echo requests, or the discovery phase fails. Here are the different kinds of output I got from pppd with debug enabled:

Missed echo requests
Plugin pppoe.so loaded.
PPPoE plugin from pppd 2.4.9
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src [redacted MAC 1]
 [service-name] [host-uniq  00 00 06 d6]
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src [redacted MAC 1]
 [service-name] [host-uniq  00 00 06 d6]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 34
 dst [redacted MAC 1]  src [redacted MAC 2]
 [service-name] [host-uniq  00 00 06 d6] [AC-name <redacted IPv4 1>] [end-of-list]
Send PPPOE Discovery V1T1 PADR session 0x0 length 12
 dst [redacted MAC 2]  src [redacted MAC 1]
 [service-name] [host-uniq  00 00 06 d6]
Recv PPPOE Discovery V1T1 PADS session 0xfa24 length 16
 dst [redacted MAC 1]  src [redacted MAC 2]
 [service-name] [host-uniq  00 00 06 d6] [end-of-list]
PADS: Service-Name: ''
PPP session is 64036
Connected to [redacted MAC 2] via interface eth0.6
using channel 3
Using interface pppoe-wan
Connect: pppoe-wan <--> eth0.6
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x3075e4a6>]
rcvd [LCP ConfReq id=0x2 <mru 1500> <auth pap> <magic 0x9cd5b684>]
sent [LCP ConfAck id=0x2 <mru 1500> <auth pap> <magic 0x9cd5b684>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x3075e4a6>]
sent [LCP EchoReq id=0x0 magic=0x3075e4a6]
sent [PAP AuthReq id=0x1 user="internet" password=<hidden>]
rcvd [LCP EchoRep id=0x0 magic=0x9cd5b684]
rcvd [PAP AuthAck id=0x1 "Authentication success,Welcome!"]
Remote message: Authentication success,Welcome!
PAP authentication succeeded
peer from calling number [redacted MAC 2] authorized
sent [IPCP ConfReq id=0x1 <addr 0.0.0.0>]
sent [IPV6CP ConfReq id=0x1 <addr [redacted IPv6 1]>]
rcvd [IPV6CP ConfReq id=0x1 <addr [redacted IPv6 2]>]
sent [IPV6CP ConfAck id=0x1 <addr [redacted IPv6 2]>]
rcvd [IPCP ConfReq id=0x1 <addr [redacted IPv4 1]>]
sent [IPCP ConfAck id=0x1 <addr [redacted IPv4 1]>]
rcvd [IPCP ConfNak id=0x1 <addr [redacted IPv4 2]>]
sent [IPCP ConfReq id=0x2 <addr [redacted IPv4 2]>]
rcvd [IPV6CP ConfAck id=0x1 <addr [redacted IPv6 1]>]
local  LL address [redacted IPv6 1]
remote LL address [redacted IPv6 2]
rcvd [IPCP ConfAck id=0x2 <addr [redacted IPv4 2]>]
local  IP address [redacted IPv4 2]
remote IP address [redacted IPv4 1]
No response to 5 echo-requests
Serial link appears to be disconnected.
Connect time 0.2 minutes.
Sent 228 bytes, received 0 bytes.
sent [LCP TermReq id=0x2 "Peer not responding"]
sent [LCP TermReq id=0x3 "Peer not responding"]
Connection terminated.
Connect time 0.2 minutes.
Sent 228 bytes, received 0 bytes.
Modem hangup
Failed discovery
Plugin pppoe.so loaded.
PPPoE plugin from pppd 2.4.9
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src [redacted MAC]
 [service-name] [host-uniq  00 00 06 d3]
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src [redacted MAC]
 [service-name] [host-uniq  00 00 06 d3]
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src [redacted MAC]
 [service-name] [host-uniq  00 00 06 d3]
Timeout waiting for PADO packets
Unable to complete PPPoE Discovery

I also checked if there was a duplex mismatch of some sort, but that does not seem to be the case:

Output from ethtool
root@OpenWrt:~# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
                                1000baseX/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
                                1000baseX/Full 
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Full 
                                             100baseT/Full 
                                             1000baseT/Full 
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 7
        Transceiver: external
        Auto-negotiation: on
        Link detected: yes

Reproducing this same configuration on my laptop (with the same commands) gives me a fully working connection.

The issue also occurs whether I run an OpenWrt snapshot or 23.05.2.

I also found this thread from a year ago that describes very similar symptoms. The configuration there is almost the same, and it does not make a difference for me either. Sadly, the OP there seems to have given up; it is not clear whether they got it working with OpenWrt on another device.

I have run out of things to try. Does anyone have an idea?


see if https://openwrt.org/docs/guide-user/network/wan/isp-configurations#netherlands helps.


Try adding option keepalive '0 1' to the wan section. This should disable disconnection if the ISP does not answer keepalive echo requests.
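
In your /etc/config/network that would be a single extra line in the wan section from your paste:

```
config interface 'wan'
        option device 'eth0.6'
        option proto 'pppoe'
        option username 'internet'
        option password 'internet'
        option ipv6 '1'
        option mtu '1500'
        option keepalive '0 1'
```

I believe the first number is the failure threshold (0 disables disconnecting on missed echoes) and the second is the interval between echo requests in seconds - double-check against the OpenWrt wiki.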

Also check that the physical Ethernet link is running at the expected speed (1 Gb). If the Ethernet cable is bad you can get inconsistent results with different routers.

It might be MTU-related. I've specifically set the MTU for the bare interface to 1512, the VLAN interface to 1508, and the PPPoE interface to 1500.

Here's my working /etc/config/network for OpenWrt on KPN, though not with a USG. 🙂
Obviously, change eth3 to eth0 for your case.

config interface 'pppoe0'
	option proto 'pppoe'
	option device 'eth3.6'
	option username 'KPN'
	option password 'KPN'
	option ipv6 'auto'
	option metric '1'
	option pppd_options 'debug'
	option mtu '1500'

config interface 'pppoe0_6'
	option proto 'dhcpv6'
	option device '@pppoe0'
	option reqaddress 'none'
	option reqprefix '48'
	option metric '1'

config device
	option name 'pppoe-pppoe0'
	option mtu '1500'

config device
	option type '8021q'
	option ifname 'eth3'
	option vid '6'
	option name 'eth3.6'
	option mtu '1508'

config device
	option name 'eth3'
	option mtu '1512'

Thanks everyone for your suggestions! I'll find some time later this week, or on the weekend, to run more experiments.

Thanks again for all the pointers. I got around to experimenting some more tonight.

I originally based my configuration on this, but I went back and tried to replicate the configuration as closely as I could, down to the custom DNS servers (although I changed option device 'pppoe-kpn' to option device 'pppoe-wan' to make the names match up). Unfortunately, I couldn't see any improvement.

This helps because it stabilises the link - i.e., it is no longer restarted when there is no response to an echo request. However, the packet loss observed from ping persists, even when I ping the gateway instead of Cloudflare.

With the connection now stable, speedtest.net gives me an up/down of about 80Mbps. I think this may also be due to the packet loss.

I do not have a cable tester, but I am using the cable supplied by KPN. When I use this in combination with their router I get near-gigabit speeds, so it is at least capable of it. The same happens when I set up the stock firmware to do PPPoE on the USG.

When I run ethtool it also says that eth0 is configured for 1000Mbps full duplex.

I tried replicating this as closely as possible as well, but no dice. I'm very rusty on my Ethernet, but does the MTU matter for very small packets, like the ICMP echo requests produced by ping?
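
To partly answer my own question: if I have the framing right, small packets sit far below any of these limits, so the MTU ladder should only matter for full-size frames. The 1500/1508/1512 numbers look like stacked overheads - this is my own understanding, so treat it with care:

```shell
# PPPoE framing (RFC 2516) adds 8 bytes: a 6-byte PPPoE header plus the
# 2-byte PPP protocol field. An 802.1Q VLAN tag adds another 4 bytes.
pppoe_overhead=8
vlan_overhead=4

ppp_mtu=1500                             # what IP traffic over the tunnel sees
vlan_mtu=$((ppp_mtu + pppoe_overhead))   # needed on the VLAN interface: 1508
eth_mtu=$((vlan_mtu + vlan_overhead))    # headroom on the parent NIC: 1512

echo "$ppp_mtu $vlan_mtu $eth_mtu"       # prints: 1500 1508 1512
```

So a 64-byte ping should be unaffected either way.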

I'm now considering trying to snoop on the traffic between the router and KPN to see if the lost packets are being sent at all... but I'm not sure what the best way to do that would be.

Today I set up a PPPoE server on my laptop in an effort to replicate the problem, with my best guess at a configuration. The USG running OpenWrt is able to connect to it without any problems. The LCP echo requests and replies all seem to work. So for some reason, the problem only occurs between the USG and the KPN fiber box, and only when the USG is running OpenWrt.

I bought the USG second-hand, and I do not have the original power supply. I thought the power supply I was using might not be good enough. So I swapped the adapters for the KPN fiber box and the USG (both 12V/1A) and that has not made a difference either.

I also ran tcpdump on an outside server while I pinged it from behind the USG. It seems that the packet loss I am experiencing is asymmetric: some ICMP echo requests never arrive at the server, but every reply to a request that did arrive makes it back home. I do not know if that is relevant, but it seems strange.
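
If I need to pin down exactly which probes go missing, comparing the sequence numbers seen on both ends should work. A sketch of what I have in mind - the file names and capture lines below are made up, standing in for real output from something like tcpdump -n icmp on each side:

```shell
# lan.txt: echo requests as seen leaving my LAN; server.txt: as seen arriving
# at the remote server. Sample contents are fabricated for illustration.
cat > lan.txt <<'EOF'
IP 192.0.2.10 > 198.51.100.5: ICMP echo request, id 7, seq 0, length 64
IP 192.0.2.10 > 198.51.100.5: ICMP echo request, id 7, seq 1, length 64
IP 192.0.2.10 > 198.51.100.5: ICMP echo request, id 7, seq 2, length 64
EOF
cat > server.txt <<'EOF'
IP 203.0.113.9 > 198.51.100.5: ICMP echo request, id 7, seq 0, length 64
IP 203.0.113.9 > 198.51.100.5: ICMP echo request, id 7, seq 2, length 64
EOF

# Pull out the sequence numbers from each capture and diff the two sets.
seqs() { grep -o 'seq [0-9]*' "$1" | awk '{print $2}' | sort -u; }
seqs lan.txt > sent.txt
seqs server.txt > arrived.txt
dropped=$(comm -23 sent.txt arrived.txt)
echo "dropped: $dropped"                  # prints: dropped: 1
```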

It'd be great if I could use this device with OpenWRT, but I'm starting to lose hope. Could this be a driver issue? I'd be up for trying to compile the kernel myself and see if I can get a custom image running to debug - but it'd be good to have some idea of where to look.

Because the stock firmware of the USG is Linux-based, my attention has now turned towards figuring out how it differs from OpenWrt.

The first thing I tried was to retrieve the pppd configuration from the stock firmware and use it on OpenWrt. That did not make a difference.

I then looked at the kernel modules loaded on the stock firmware.

lsmod output
ubnt@ubnt:~$ lsmod
Module                  Size  Used by
pppoe                  10378  2
pppox                   1922  1 pppoe
ppp_generic            24111  6 pppoe,pppox
slhc                    5234  1 ppp_generic
8021q                  18812  0
garp                    5710  1 8021q
stp                     1829  1 garp
llc                     4017  2 stp,garp
ipt_MASQUERADE          1706  2
xt_set                  5792  4
xt_LOG                 12563  1
xt_conntrack            2961  4
xt_comment               939  21
xt_TCPMSS               3631  4
xt_tcpudp               2519  7
ip6table_mangle         1788  1
ip6table_filter         1356  1
ip6table_raw            1280  1
ip6_tables             16877  3 ip6table_filter,ip6table_mangle,ip6table_raw
iptable_nat             2950  1
nf_conntrack_ipv4       7878  5
nf_defrag_ipv4          1283  1 nf_conntrack_ipv4
nf_nat_ipv4             3904  1 iptable_nat
iptable_mangle          1672  1
xt_CT                   4171  4
iptable_raw             1356  1
nf_nat_pptp             1978  0
nf_conntrack_pptp       4368  1 nf_nat_pptp
nf_conntrack_proto_gre     4623  1 nf_conntrack_pptp
nf_nat_h323             6087  0
nf_conntrack_h323      41900  1 nf_nat_h323
nf_nat_sip              8661  0
nf_conntrack_sip       23229  1 nf_nat_sip
nf_nat_proto_gre        1509  1 nf_nat_pptp
nf_nat_tftp              958  0
nf_nat_ftp              1804  0
nf_nat                 13634  9 nf_nat_ftp,nf_nat_sip,ipt_MASQUERADE,nf_nat_proto_gre,nf_nat_h323,nf_nat_ipv4,nf_nat_pptp,nf_nat_tftp,iptable_nat
nf_conntrack_tftp       3929  1 nf_nat_tftp
nf_conntrack_ftp        7502  1 nf_nat_ftp
nf_conntrack           62676  18 nf_nat_ftp,nf_nat_sip,xt_CT,nf_conntrack_proto_gre,ipt_MASQUERADE,nf_nat,nf_nat_h323,nf_nat_ipv4,nf_nat_pptp,nf_nat_tftp,xt_conntrack,nf_conntrack_ftp,nf_conntrack_sip,iptable_nat,nf_conntrack_h323,nf_conntrack_ipv4,nf_conntrack_pptp,nf_conntrack_tftp
iptable_filter          1432  1
ip_tables              16599  4 iptable_filter,iptable_mangle,iptable_nat,iptable_raw
x_tables               19172  16 ip6table_filter,xt_CT,ip6table_mangle,xt_comment,ip_tables,xt_tcpudp,ipt_MASQUERADE,xt_conntrack,xt_LOG,xt_set,iptable_filter,ip6table_raw,xt_TCPMSS,iptable_mangle,ip6_tables,iptable_raw
ip_set_hash_net        21210  8
ip_set                 22410  2 ip_set_hash_net,xt_set
nfnetlink               3789  1 ip_set
configfs               24851  1
unifigpio               6128  0
unifihal               58054  0
cvm_ipsec_kame         36927  0
ipv6                  359928  36 ip6table_mangle,cvm_ipsec_kame
imq                     6288  0
cavium_ip_offload     216183  0
ubnt_nf_app            10220  1 cavium_ip_offload
tdts                  549314  2 cavium_ip_offload,ubnt_nf_app
octeon_rng              1794  0
rng_core                3928  2 octeon_rng
octeon_ethernet        54284  1 cavium_ip_offload
mdio_octeon             3619  1 octeon_ethernet
ethernet_mem            3960  1 octeon_ethernet
octeon_common           2400  1 octeon_ethernet
of_mdio                 2726  2 octeon_ethernet,mdio_octeon
ubnt_platform         283665  0
libphy                 19239  4 ubnt_platform,octeon_ethernet,mdio_octeon,of_mdio

It looks like there are a bunch of Octeon-specific modules in there. I suppose that could make a difference. Perhaps the Ethernet driver shipped by Ubiquiti does something differently.

As a shot in the dark, I finally tried booting the stock kernel with an OpenWrt userland; unsurprisingly, that did not work either.

I noticed that when I put the newest vendor firmware on the USG, it also had degraded performance with PPPoE, but not in the same way: the packet loss that I see on OpenWrt is not there, but speedtest.net gives me 100 Mbit/s instead of 1 Gbit/s. Take that observation with a grain of salt, though - maybe the previous owner toggled an option through the controller that is off by default.

I did find a copy of the GPL parts of the USG firmware source, containing the drivers mentioned in the previous post. But I'm way out of my depth in trying to port these to OpenWrt. For one thing, their kernel is based on 3.10, which is more than ten years old, so I suspect much has changed since then.

So that's the end of the road for me; I think I'll just run OpenWrt on the USG and cope with the packet loss.

Is the CPU perhaps maxing out? The performance you describe sounds about right for software-only routing. I don’t know whether hardware offloading is implemented for this chipset, but if so, have you enabled it?

Hmm, that's a good point. I can't check this until the weekend, but I'm fairly confident that hardware offloading is not implemented for the USG's Octeon chipset on OpenWrt. That would explain the performance drop.

The ping packet loss - occurring only on OpenWrt, only over PPPoE, and only between the USG and my ISP's modem - is still very strange to me, though. It happens even under the lightest possible load. So I don't think it is explained by the absence of offloading.

By the way, I took another look at the GPL sources for the stock firmware. I am not a kernel hacker by any means, but it seems that they put in a bunch of hooks like cvm_ipfwd_rx_hook which are then implemented by a proprietary module. Adapting that to OpenWrt, if it can be done, is going to cost a lot of time.

Packet loss would be expected if the CPU maxes out, but a regular non-flood ping shouldn’t do that. It’s worth running htop during testing to see what’s going on all the same.
You may want to try software offloading as well, I believe it typically improves performance.
I’m also no kernel hacker, but I believe the way hardware offloading was done before kernel support landed (4.14?) is unlikely to integrate cleanly, or at all.

OK - I'll monitor the CPU load the next time I experiment with the device, and I'll also give software offloading a try. Thanks!

I had the time to run some more experiments.

CPU usage when running a ping is very low, as expected. CPU usage when running a speed test maxes out a single core. This happens even if I set the CPU affinity of one of the Ethernet IRQs to the other core.
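
(For reference, this is how I moved the IRQ. The mask is a hex CPU bitmap; the IRQ number below is made up, the real one comes from /proc/interrupts:)

```shell
# /proc/irq/<n>/smp_affinity takes a hex bitmask: bit i set => CPU i allowed.
mask_for_cpu() { printf '%x\n' $((1 << $1)); }

mask_for_cpu 0    # prints: 1  (CPU0 only)
mask_for_cpu 1    # prints: 2  (CPU1 only)

# On the router, as root, with the actual IRQ number from /proc/interrupts:
#   echo "$(mask_for_cpu 1)" > /proc/irq/40/smp_affinity
```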

Enabling software offload has a measurable impact on performance - I went from 97Mbps down / 107Mbps up to 127Mbps down / 340Mbps up. So that's great! Unfortunately the packet loss is still there, but I think I'll deal with it for now (and maybe find another device in the future).
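
(For anyone finding this later: I enabled it in the defaults section of /etc/config/firewall - as far as I can tell, this is the relevant option on 23.05:)

```
config defaults
        option flow_offloading '1'
```

followed by a firewall restart. There is also a flow_offloading_hw option, but as discussed above, hardware offloading is presumably not available for this SoC.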

The packet loss seems weird to me; I wonder what's going on. I've never tried this, but perhaps you can capture the outgoing pings on the LAN while also capturing the outgoing PPPoE packets - if this is due to data-dependent packet drops or corruption, then I'm sure there are devs who'd like to know.

I use an EdgeRouter X to route our symmetrical 1 Gbps fiber connection. It uses an MT7621 SoC, which doesn’t even break a sweat routing at line rate with HW offloading enabled. Pretty impressive for a device that runs on ~2 W. The connection is PPPoE on a VLAN, so I imagine it would perform just as well for you.

At least in North America these things frequently pop up used for next to no money.

I can try looking into this. I made some packet dumps when trying to debug this in the past, and nothing stood out. But I suppose it's good to document them in case I missed something.

That's very interesting. I think I went with the USG because it offered an easier way to install OpenWrt (just write it to a USB stick), but given how far I've come, I suppose I might as well look into the ER-X a bit more. I did not know that OpenWrt supports hardware offloading for that device. Thanks for the recommendation!

I’m no expert, but it seems to me the MT7621/22/28 are quite well supported for HW offloading. The ‘21 doesn’t have per-flow metrics, and generally I believe HW offloading drops QoS and such. Still, these are IMHO impressive devices.
From memory the ER-X installation process was reasonably straightforward, and I believe the recovery options are solid in case of goofs…

I got a second-hand ER-X SFP and it works a treat. As you said, the installation was doable, it has no problem at all forwarding at 1Gbps, and PPPoE over VLAN works fine. Thanks again for the hint!

I also made some packet captures of the lost pings, but I'm a bit uncomfortable posting those on a public forum. If any developer is interested, please message me privately and I can send them to you.

Awesome, glad it works for you. The MT7621 is even able to route 2 Gbps under the right conditions (which I don’t quite understand), e.g. 1 Gbps up concurrently with 1 Gbps down. This might require you to use the SFP port on the ER-X SFP, though I’m not sure about that…

There are two more things that I will try.

  1. I have a friend who also has a KPN connection of the same kind, and he kindly offered to let me try and reproduce the problem at his place.

  2. I also have a managed switch that I can try and put in between the USG and the KPN fiber modem to try and capture the traffic going both ways. Because all pings arrive at their destination (see this post), my current hunch is that response packets are being sent back, but get dropped somehow by the USG. I am not sure if sniffing traffic like this is even supposed to work (maybe the L2 stuff is all handled by hardware and not visible to tcpdump), but I suppose it is worth a try.