PPPoE disconnects every few hours

Okay, so this means that your side of the PPPoE tunnel is shutting down, which leaves the question: why? ;)

Ah thanks, yes, this rather looks like GPON, but I had thought that in GPON the downstream packets are encrypted individually for each user, so your neighbours' PADTs should never ever show up readable on your link.

Mmmh, 44:4E:6D AVM Audiovisuelles Marketing und Computersysteme GmbH, so it seems Nokia wants to talk to a Fritzbox somewhere, you do not happen to use a secondary fritzbox as VoIP-box inside your network? Just asking, probably not...

I think we are looking at a red herring. These bad packets are received and acted upon because the interface is in promiscuous mode, and this is because you didn't add the "-p" option to tcpdump. On the other hand, the earlier "ifconfig" output says that, without tcpdump, the interface is not in promiscuous mode. This is really confusing. Please always use the "-p" option.

Anyway, we already have a verbose tcpdump log with the "-p" option, and it does not show the PADT message. Perhaps because of the wrong filter? How about:

tcpdump -evpni eth1.3902 '(not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57) or pppoed'
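
To spell out what that filter expression matches: on eth1.3902 the VLAN tag is already stripped, so bytes 12-13 hold the EtherType and, for a PPPoE session frame, the 2-byte PPP protocol field sits at offset 0x14 (14 bytes of Ethernet header plus 6 bytes of PPPoE header). 0x21 is IPv4 and 0x57 is IPv6, so the filter hides ordinary IP traffic and keeps everything else, plus all PPPoE discovery frames (EtherType 0x8863, which is what pppoed matches). Below is a rough Python model of that predicate, for illustration only. The function names are made up, and it simplifies how frames too short to contain offset 0x14 are handled.

```python
import struct

ETH_P_PPP_DISC = 0x8863   # PPPoE discovery (PADI/PADO/PADR/PADS/PADT)
ETH_P_PPP_SES = 0x8864    # PPPoE session
PPP_IPV4 = 0x0021         # PPP protocol number for IPv4
PPP_IPV6 = 0x0057         # PPP protocol number for IPv6


def build_frame(ethertype, code=0x00, sid=1, ppp_proto=None):
    """Hypothetical helper: build a minimal untagged PPPoE frame for testing."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ethertype)
    payload = b"" if ppp_proto is None else struct.pack("!H", ppp_proto)
    pppoe = struct.pack("!BBHH", 0x11, code, sid, len(payload))
    return eth + pppoe + payload


def filter_matches(frame):
    """Model of: (not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57) or pppoed"""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype == ETH_P_PPP_DISC:
        return True                      # the "or pppoed" branch
    if len(frame) < 0x16:
        return False                     # simplification: nothing at offset 0x14
    proto = struct.unpack_from("!H", frame, 0x14)[0]
    return proto not in (PPP_IPV4, PPP_IPV6)
```

So an LCP frame (protocol 0xc021) or a PADT passes the filter, while IPv4/IPv6 payload frames are dropped from the capture.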

No.
The original ISP router (not connected currently) has 6C:38:xxxx

So nothing like that on my side.

Yes, but this is GPON, and there the downstream should be individually encrypted per subscriber/ONT, so these PADTs for MACs not on his network should never reach him at all I believe?

I'll try that.

Apparently ifconfig does not show if eth1 is in promiscuous mode or not.
Neither does ip link.

I started 'tcpdump -e -v -i eth1.3902 pppoed' for a few seconds to trigger promiscuous mode, and then compared the output of the above commands. There was no difference (except the packet counts).

ip monitor link showed a line for start and stop, but no details hinting at promiscuous mode:

[2020-04-29 14:43:24] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
[2020-04-29 14:43:24]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 14:43:24] 48: eth1.3902@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
[2020-04-29 14:43:24]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 14:43:38] 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default
[2020-04-29 14:43:38]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff
[2020-04-29 14:43:38] 48: eth1.3902@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
[2020-04-29 14:43:38]     link/ether c4:XX:XX:XX:XX:ed brd ff:ff:ff:ff:ff:ff

Only syslog reported that promiscuous mode was activated and then deactivated.

Wed Apr 29 14:43:24 2020 kern.info kernel: [853998.131984] device eth1.3902 entered promiscuous mode
Wed Apr 29 14:43:24 2020 kern.info kernel: [853998.137162] device eth1 entered promiscuous mode
Wed Apr 29 14:43:38 2020 kern.info kernel: [854012.994273] device eth1.3902 left promiscuous mode
Wed Apr 29 14:43:38 2020 kern.info kernel: [854012.999393] device eth1 left promiscuous mode

Anyway, the problem happens regardless of whether I run tcpdump or not.

Also:

There Dianne Skoll says that in their code, the PADT packet is checked to have the correct (our) ethHdr.h_dest

I took a quick look at the kernel's pppoe.c and there is no such check (I might have missed it though).

My take under current findings is:

  • ISP sends PADT to another party, but on "my" line; it shouldn't, but that's life...
  • the network interface in my router picks it up; it shouldn't, but that's life...
  • the PPPoE code picks it up (it shouldn't, but that's life...) and then terminates the connection

If that is correct, my options are:

  • switch ISP (the previous one I had did not use PPPoE, just plain E... sweet memories...)
  • fix kernel pppoe code to ignore PADT packets sent to others
  • fix netif code? netif firmware?
  • filter out the bogus packets (maybe with ebtables)

As for fresh data:

output of : tcpdump -e -v -p -i eth1 vlan 3902 and pppoed

[2020-04-29 12:41:56] tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
[2020-04-29 13:49:13] 13:49:13.163623 a4:7b:2c:9e:c7:44 (oui Unknown) > 44:4e:6d:fd:c7:39 (oui Unknown), ethertype 802.1Q (0x8100), length 101: vlan 3902, p 1, ethertype PPPoE D, PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq 0x444E6DFDC739AAAAAAAAB9000000AAAAAAAA4621426F78202020AAAAAAAA] [AC-Cookie ".5b.u...~.F.mlKv"]

and simultaneously running: tcpdump -evpni eth1.3902 '(not ether[0x14:2] = 0x21 and not ether[0x14:2] = 0x57) or pppoed'

13:49:13.163623 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE D (0x8863), length 97: PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq 0x444E6DFDC739AAAAAAAAB9000000AAAAAAAA4621426F78202020AAAAAAAA] [AC-Cookie ".5b.u...~.F.mlKv"]
13:49:13.168608 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 21: LCP, Conf-Request (0x01), id 120, length 21
13:49:13.170601 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE D (0x8863), length 58: PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq 0x444E6DFDC739AAAAAAAAB9000000AAAAAAAA4621426F78202020AAAAAAAA]
14:10:11.180520 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE D (0x8863), length 97: PPPoE PADO [Service-Name] [AC-Name "SIMB_TABOR_BNG1"] [Host-Uniq 0x444E6DFDC739AAAAAAAABB000000AAAAAAAA4621426F78202020AAAAAAAA] [AC-Cookie ".5b.u...~.F.mlKv"]
14:10:11.182514 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE S (0x8864), length 56: PPPoE  [ses 0x1] LCP (0xc021), length 21: LCP, Conf-Request (0x01), id 75, length 21
14:10:11.184493 a4:7b:2c:9e:c7:44 > 44:4e:6d:fd:c7:39, ethertype PPPoE D (0x8863), length 58: PPPoE PADS [ses 0x1] [Service-Name] [Host-Uniq 0x444E6DFDC739AAAAAAAABB000000AAAAAAAA4621426F78202020AAAAAAAA]

(note: I filtered out packets sent to or from my MAC)

So I get packets sent to someone else, even when eth1 is not in promiscuous mode.

ebtables could work, but then you should create a bridge and add eth1.3902 to it, and specify that bridge as the physical wan interface.

One more thing about promiscuous mode:

# cat /sys/class/net/eth1/flags
0x1003
# tcpdump -i eth1.3902 &
# cat /sys/class/net/eth1/flags
0x1103
# killall tcpdump
# cat /sys/class/net/eth1/flags
0x1003

So it is recorded internally; the ifconfig and ip commands just don't show it.
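
The only difference between the two values is bit 0x100, which is IFF_PROMISC in linux/if.h. As a small illustration, here is a Python sketch that decodes the flags value read from sysfs. The function names are made up for this example, and only the four flags seen in this thread are listed.

```python
# Interface flag bits from linux/if.h (only the ones relevant here).
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0100: "PROMISC",
    0x1000: "MULTICAST",
}
IFF_PROMISC = 0x0100


def decode_flags(value):
    """Turn a string such as '0x1103' into a list of known flag names."""
    bits = int(value, 16)
    return [name for bit, name in sorted(IFF_FLAGS.items()) if bits & bit]


def is_promiscuous(value):
    """True if the IFF_PROMISC bit is set in the given flags string."""
    return bool(int(value, 16) & IFF_PROMISC)


if __name__ == "__main__":
    for sample in ("0x1003", "0x1103"):
        print(sample, decode_flags(sample), is_promiscuous(sample))
```

On a real system the input would be the contents of /sys/class/net/eth1/flags; int() tolerates the trailing newline.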

Mystery solved.

I added a debug output to kernel pppoe.c function pppoe_disc_rcv:

pr_info("pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=%d, SRC: %02x:%02x:%02x:%02x:%02x:%02x, DST: %02x:%02x:%02x:%02x:%02x:%02x\n",
        ph->sid,
        eth_hdr(skb)->h_source[0],
        eth_hdr(skb)->h_source[1],
        eth_hdr(skb)->h_source[2],
        eth_hdr(skb)->h_source[3],
        eth_hdr(skb)->h_source[4],
        eth_hdr(skb)->h_source[5],
        eth_hdr(skb)->h_dest[0],
        eth_hdr(skb)->h_dest[1],
        eth_hdr(skb)->h_dest[2],
        eth_hdr(skb)->h_dest[3],
        eth_hdr(skb)->h_dest[4],
        eth_hdr(skb)->h_dest[5]);

When the problem happens, it prints:

pppoe: pppoe_disc_rcv /my patch/  PADT received, sid=1, SRC: a4:7b:2c:9e:c7:44, DST: 44:4e:6d:fd:c7:39

My HWaddr is c4:xxxxxx:ed

After that, the pppoe module closes the connection.

So the problem is, as already suspected, that the pppoe module does not properly check whether a received PADT packet belongs to its own PPP session.
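
To make the failure mode concrete, here is a small Python model of the two behaviours. It is neither the kernel code nor the actual patch. It assumes, as the findings above suggest, that a PADT is matched to a session on session ID and the peer's (BNG's) MAC only. The local MAC c4:00:00:00:00:ed is a made-up placeholder, since the real one is masked in this thread; the other addresses and the session ID come from the debug output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    local_mac: str   # our interface's MAC (placeholder value in this sketch)
    peer_mac: str    # the access concentrator's MAC
    sid: int         # PPPoE session ID


def padt_terminates(session, src, dst, sid, check_dst=True):
    """Return True if a PADT with these header fields would end the session.

    check_dst=False models the observed behaviour (destination ignored);
    check_dst=True models a fix that drops PADTs addressed to someone else.
    """
    if sid != session.sid or src.lower() != session.peer_mac.lower():
        return False
    if check_dst and dst.lower() != session.local_mac.lower():
        return False
    return True


if __name__ == "__main__":
    mine = Session("c4:00:00:00:00:ed", "a4:7b:2c:9e:c7:44", 1)
    stray = ("a4:7b:2c:9e:c7:44", "44:4e:6d:fd:c7:39", 1)
    print("without check:", padt_terminates(mine, *stray, check_dst=False))
    print("with check:   ", padt_terminates(mine, *stray, check_dst=True))
```

Because the BNG seems to hand out session ID 0x1 to the other subscriber as well, a PADT meant for that subscriber matches on both keys, and only the destination MAC tells the two sessions apart.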

I'll patch it up soon (if someone does not beat me to it).

Do you use OpenWrt to establish the PPPoE connection?

Anyway, I wrote a patch (for the pppoe kernel module) to ignore PADT packets sent to others, fixing the issue.

See https://bugzilla.kernel.org/show_bug.cgi?id=207597

Do you have any idea what sort of equipment your ISP is using as the access concentrator (i.e. PPPoE server)?

I may have seen something like this in the past, where I would lose the PPPoE session randomly from minutes up to days (very rarely got more than 10 days, usually less than 96 hours). I tried an OPNSense setup (FreeBSD based) to see whether a different PPP code heritage changed things but saw the same behaviour. I haven't seen the same behaviour for some months and my current PPPoE session has now been up for just over 35 days, so I suspect either a software update or hardware change to my ISP's AC... (and I know what AC setup they were using while I was experiencing the issue).

Sorry, I don't.
I can do some simple checks (if you suggest any), but I'm just a regular customer.

Awesome, thank you. I'll try this patch out when I build a new image for my router.

Do report about your experience...
What type of router do you use?

You'd have to ask, and you'd probably not be told (because security through obscurity :unamused:).

Have you submitted a ticket to your ISP reporting that you're receiving PADTs for a device other than yours, and that the link is being terminated because of this? [I wouldn't tell them you're patching around it!]

I understand that the ISP equipment in use when I had similar symptoms was by MikroTik, but I don't know model numbers.

EDIT: I'll take a look if other "more sensitive" packets arrive.

Fairly typical reaction unfortunately :roll_eyes:.

Yes, if you can find other packets that you shouldn't be getting you can call that a security issue. ISP will probably try and duck that too based on above :frowning:.

At least you have a solution! BTW, in case the kernel-land patch goes nowhere you might try sending the patch to OpenWrt so it could be incorporated pending a kernel resolution.

There is a bunch of DNS, NTP, XMPP and some HTTPS packets visible (packets sent to some other MAC addr, it is the same MAC for the day, then it changes and packets for a different MAC addr start arriving).

So a couple of days short of 6 weeks the random PPPoE drops have reappeared :frowning: - now had 2 in 48 hours. Will have to see whether I can apply your patch to my firmware build to see whether it helps...

The patch just adds two lines to the pppoe kernel module, so you can simply build the module and load it into the running OpenWrt system:

  • stop ppp (it uses the kernel module when connected): ifdown wan
  • rmmod pppoe
  • insmod /root/new_module/pppoe.ko
  • ifup wan

Just to be sure, you can use the verbose patch ("patch with debug output"). It prints a message when it is loaded (or first used, I'm not sure), so you can see that the patched module is in use. Sometimes insmod loads the original module from the /lib/modules/ folder instead; I'm not sure why.

Also note that you must use the stripped module file, not the one in the build folder! That one might crash your router.
Just compare the md5sum of unchanged modules with the installed ones to find the folder with the "correct" version.
(Sorry if you are an OpenWrt build expert; this confused me on my first tries.)
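
If you'd rather not compare checksums by eye, the idea can be sketched in Python on the build host. This is only an illustration: it assumes you have copied one unchanged module from the router to the build host, and the function names and any paths you pass in are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, like md5sum prints."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def matching_copies(installed_module, build_root):
    """List files under build_root with the same name and MD5 as installed_module."""
    installed = Path(installed_module)
    wanted = md5_of(installed)
    return sorted(
        candidate
        for candidate in Path(build_root).rglob(installed.name)
        if candidate.is_file() and md5_of(candidate) == wanted
    )
```

The directory that holds the matching copy of an unchanged module is then the one to take the rebuilt pppoe.ko from.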

PS: the debug version also writes a log entry when a PADT is received, noting the destination address, so you can see if "wrong" packets arrive on your connection. (If they don't, then the reason for your problem might be something else.)
