IPQ8065 and 8064 irregular reboot with TP-Link TL-SG1016D

Changed the topic because of new tests. It is NOT the switch that causes this problem.

Greetings!

I had an issue that cost me about 16 hours to track down. My main network switch is a TP-Link TL-SG1016D connected directly to my router. I wanted to replace my venerable TP-Link Archer C5 with something faster. The first try was a TP-Link Archer C2600 with the IPQ8064 SoC, the second a Netgear R7800 with the IPQ8065. Setting up both at my workplace, connected to a TP-Link TL-SG2008 switch, gave no problems. As soon as I moved either router into place, connected to the 1016D, it started to reboot after 2 to 11 minutes, killing my internet connection. Sometimes the log looked like this:

Sep  6 10:43:22 kerberos kernel: [  129.277622] Unable to handle kernel paging request at virtual address 65aa45e0
Sep  6 10:43:22 kerberos kernel: [  129.277701] pgd = c0204000
Sep  6 10:43:22 kerberos kernel: [  129.283777] [65aa45e0] *pgd=00000000
Sep  6 10:43:22 kerberos kernel: [  129.289846] Internal error: Oops: 5 [#1] SMP ARM
Sep  6 10:43:23 kerberos kernel: [  129.290088] Modules linked in: pppoe ppp_async iptable_nat ip6table_nat pptp pppox ppp_mppe ppp_generic nf_nat_ipv6 nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbserial slh

Often there was no clue in the logs. After many attempts to find the cause, including swapping the C2600 for the R7800, I used an old Zyxel switch instead of the 1016D and bingo: the reboots were gone.
Since my old Archer C5 never showed such behaviour, the 1016D switch was my last guess...
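
For anyone chasing similar reboots: a minimal Python sketch for finding them in a saved log. It assumes the syslog is stored somewhere that survives the reboot (for example on a remote syslog host) and that the lines carry the kernel uptime stamp in the same format as the excerpt above. The function name `scan_log` and the marker strings are my own choices for illustration, not part of any tool.

```python
import re
import sys

# Matches the kernel uptime stamp in syslog lines such as
# "Sep  6 10:43:22 kerberos kernel: [  129.277622] Unable to handle ..."
UPTIME_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")
# Assumed markers, taken from the oops shown in the log excerpt.
OOPS_MARKERS = ("Unable to handle kernel", "Internal error: Oops")


def scan_log(lines):
    """Return (reboots, oopses) found in an iterable of syslog lines.

    reboots: uptimes (seconds) reached just before the counter reset.
    oopses:  (uptime, message) pairs for lines that look like a kernel oops.
    """
    reboots, oopses = [], []
    last_uptime = None
    for line in lines:
        match = UPTIME_RE.search(line)
        if not match:
            continue  # not a kernel line with an uptime stamp
        uptime = float(match.group(1))
        # Kernel uptime only grows, so a drop means the box rebooted.
        if last_uptime is not None and uptime < last_uptime:
            reboots.append(last_uptime)
        last_uptime = uptime
        if any(marker in line for marker in OOPS_MARKERS):
            oopses.append((uptime, line[match.end():].strip()))
    return reboots, oopses


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        reboots, oopses = scan_log(log)
    print("reboots after uptime (s):", reboots)
    for uptime, message in oopses:
        print(f"oops at {uptime:.1f}s: {message}")
```

Run it with the path to the saved log as the only argument. A reboot without a matching oops entry corresponds to the "no clue in the logs" case.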

That raises some questions for me:
What is the difference between the C5 with its Atheros QCA9558 and the IPQ806x that causes a problem with the TP-Link SG1016D?
Why is there a kernel panic and not a network error of some kind?

Some additional information:
On the C2600 I tried: 17.01.02, snapshots dated 2017/08/29 and 30.
On the R7800 only the build r3498 by hnyman (A big thanks for that!) was used.
Jumbo frames of 9k are used in my network on every machine that supports them.

I hope this post will warn others who see these weird symptoms.

Any suggestions welcome!

Ciao,

Martin

Hi!

I have new results... After replacing both my old Zyxel and the TP-Link TL-SG1016D with a TL-SG2008, the R7800 ran for ~30 hours without a reboot. After that it rebooted again every 2-10 minutes. Several tries later, the conclusion is:

Jumbo frames (an MTU greater than 1500 bytes) in the network are the cause of the router reboots.
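
To check whether frames of a given size actually pass a network path, a ping with fragmentation forbidden can help. The sketch below only does the arithmetic and prints the command; it assumes IPv4 without options and the Linux iputils `ping` (its `-M do` and `-s` options). The address 192.168.1.1 is a placeholder for the router, not taken from my setup.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(target, mtu):
    """Build an iputils ping call that forbids fragmentation (-M do)."""
    return ["ping", "-c", "3", "-M", "do", "-s", str(max_ping_payload(mtu)), target]


# Placeholder target; replace with the address of the device to probe.
for mtu in (1500, 9000):
    print(mtu, "->", " ".join(ping_command("192.168.1.1", mtu)))
```

For an MTU of 9000 this gives a payload of 8972 bytes, for 1500 it gives 1472.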

My computer dual-boots Windows and Linux. The router was stable while I used Windows, and as soon as I started Linux with an MTU of 9000 configured, the rebooting fun began... Since reverting Linux to an MTU of 1500 there have been no problems so far.
It turned out the old Zyxel switch is not capable of jumbo frames, so no frames larger than 1500 bytes reached the router through it (as far as I understand, a plain layer-2 switch does not resize oversized frames, it drops them). On the other hand, my Archer C5 was never unstable while I used an MTU of 9000 in my LAN. Both the TP-Link C2600 and the Netgear R7800 crash and reboot when jumbo frames are in use.
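
To find out which Linux machines in the LAN still send jumbo frames, the configured MTU of each interface can be read from sysfs. A minimal sketch, assuming a Linux host that exposes `/sys/class/net/<interface>/mtu`; the function name `jumbo_interfaces` is made up for this example.

```python
from pathlib import Path


def jumbo_interfaces(sysfs_net="/sys/class/net", limit=1500):
    """Return {interface: mtu} for every interface whose MTU exceeds limit."""
    base = Path(sysfs_net)
    if not base.is_dir():
        return {}  # not a Linux host, or sysfs is not mounted
    found = {}
    for iface in sorted(base.iterdir()):
        try:
            mtu = int((iface / "mtu").read_text().strip())
        except (OSError, ValueError):
            continue  # no mtu attribute or unreadable value; skip the entry
        if mtu > limit:
            found[iface.name] = mtu
    return found


if __name__ == "__main__":
    for name, mtu in jumbo_interfaces().items():
        print(f"{name}: MTU {mtu}")
```

The loopback interface `lo` normally reports a very large MTU and can be ignored; any physical interface listed here is a candidate for setting back to 1500.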