NBG6817: OpenWrt rebooting constantly

Is this bug only present in the 4.14.xx kernel (18.06.xx)?
Is the 4.4.xx kernel in 17.01.xx free from this bug?

The 4.4.x driver is very different. I quickly browsed the latest 4.4.x release (4.4.176), and it has neither the oversized frame reception bug nor the incorrect dma_free code. That said, a similar bug related to the pre-allocation of SKB buffers was also present in 4.4.x, but it was fixed in late 2015 here. Unfortunately, the 4.4 version of the driver also had its fair share of issues; just take a look here. So unless the 17.01.xx version is based on a very recent 4.4.x kernel, there is a good chance you will run into some kind of issue, though as far as I can tell not the ones discussed above for 18.06.xx.

How can I set an MTU other than the standard 1500?

option mtu '1954'

in /etc/config/network does not work.

/sbin/ifconfig eth0 mtu 1954 up

in /etc/rc.local does not work either; it fails with this error:

kern.err kernel: [ 31.621766] ipq806x-gmac-dwmac 37400000.ethernet eth1: must be stopped to change its MTU
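The error comes from the driver's MTU-change handler, which in this 4.14-era stmmac driver refuses to change the MTU while the interface is running. So the order matters: take the interface down, set the MTU, then bring it back up. Below is a minimal sketch that only builds that iproute2 command sequence; `mtu_change_commands` and `apply` are hypothetical helper names, and it assumes `ip` is available and that briefly taking the interface down (which drops traffic on it) is acceptable.

```python
import subprocess
from typing import List


def mtu_change_commands(iface: str, mtu: int) -> List[List[str]]:
    """Build the ip(8) commands that change an interface's MTU.

    Hypothetical helper: the interface is taken down first because the
    driver rejects MTU changes on a running interface
    ("must be stopped to change its MTU").
    """
    # 68 is the IPv4 minimum MTU; the driver's real maximum may be lower.
    if not 68 <= mtu <= 65535:
        raise ValueError(f"implausible MTU: {mtu}")
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "mtu", str(mtu)],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or also run them (needs root) when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(mtu_change_commands("eth0", 1954))
```

On OpenWrt, netifd manages the interfaces and may reapply its own settings, so a manual change like this is mainly useful for testing; the UCI configuration is the persistent route.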

config device
	option name 'eth0'
	option mtu '1954'

This does not work either:

kern.err kernel: [ 420.840774] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1550 larger than size (1536)
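The numbers in this error can be reproduced with a little arithmetic. The sketch below is a simplified model loosely based on the stmmac driver's receive-buffer sizing (`stmmac_set_bfsize()`). It assumes a 1536-byte default buffer for MTUs up to 1536 and a 2048-byte buffer above that (larger tiers are not modelled), and it assumes the reported length counts the Ethernet header, an optional 802.1Q tag and the FCS. Under those assumptions, a 1500-byte MTU always fits, while a 1550-byte frame cannot fit a 1536-byte buffer. The fact that the driver still reports 1536 suggests that the larger MTU was not in effect when the receive buffers were set up; this is an inference, not something the log states.

```python
ETH_HLEN = 14      # destination MAC + source MAC + EtherType
VLAN_HLEN = 4      # one 802.1Q tag
ETH_FCS_LEN = 4    # frame check sequence

DEFAULT_BUFSIZE = 1536  # assumed default RX buffer size
BUF_SIZE_2KIB = 2048    # assumed next buffer tier


def rx_buffer_size(mtu: int) -> int:
    """Simplified model of stmmac_set_bfsize(): only the two smallest tiers."""
    if mtu >= BUF_SIZE_2KIB:
        raise NotImplementedError("larger buffer tiers are not modelled")
    return BUF_SIZE_2KIB if mtu > DEFAULT_BUFSIZE else DEFAULT_BUFSIZE


def max_frame_len(mtu: int, vlan: bool = False) -> int:
    """Largest frame for a given MTU: header (+ VLAN tag) + payload + FCS."""
    return ETH_HLEN + (VLAN_HLEN if vlan else 0) + mtu + ETH_FCS_LEN


def fits(frame_len: int, mtu: int) -> bool:
    """True if a received frame fits into the RX buffer chosen for `mtu`."""
    return frame_len <= rx_buffer_size(mtu)


print(max_frame_len(1500, vlan=True))  # 1522, fits in 1536
print(fits(1550, 1500))                # False: "len 1550 larger than size (1536)"
print(rx_buffer_size(1954))            # 2048 if MTU 1954 were in effect
```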

  • What didn't work?
  • Did you do it for all interfaces (bridges and/or VLANs)?

Yes, for all of them. I still get the kernel error.

I have been experiencing the same issue ever since installing Fedora 30 on my laptop, with both wireless and wired Ethernet. I get the ipq806x-gmac-dwmac warnings and a subsequent crash.

This didn't happen with Fedora 29, so a bug must have been introduced in Fedora 30 that messes up the way MTU values are set. Does anybody have any tips on how to debug this? I would like to report it upstream.

EDIT: I thought I had found a workaround by setting the MTU manually to 1500 for my network in the KDE system settings, but for some reason jumbo packets are still being sent when my laptop reconnects to my network after waking up from suspend.
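One way to narrow down the suspend/resume behaviour is to record the MTU of every interface on the laptop before suspending and again after resuming, and see whether something (for example NetworkManager) raises it on reconnect. Below is a minimal sketch that reads the standard Linux sysfs files `/sys/class/net/<iface>/mtu`; the helper names are hypothetical, and loopback is skipped because its large MTU is normal.

```python
from pathlib import Path
from typing import Dict, Iterable


def interface_mtus(sys_net: Path = Path("/sys/class/net")) -> Dict[str, int]:
    """Read the current MTU of every network interface from Linux sysfs."""
    mtus: Dict[str, int] = {}
    if not sys_net.is_dir():
        return mtus  # not Linux, or sysfs not mounted
    for iface in sorted(sys_net.iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def jumbo_interfaces(mtus: Dict[str, int], limit: int = 1500,
                     ignore: Iterable[str] = ("lo",)) -> Dict[str, int]:
    """Return interfaces whose MTU exceeds `limit`, skipping loopback."""
    skip = set(ignore)
    return {name: mtu for name, mtu in mtus.items()
            if mtu > limit and name not in skip}


if __name__ == "__main__":
    # Run once before suspend and once after resume, then compare the output.
    print(jumbo_interfaces(interface_mtus()))
```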

I initially didn't notice this, but the ipq806x-gmac-dwmac errors are followed by these warnings:

daemon.warn dnsmasq[2721]: reducing DNS packet size for nameserver 92.220.228.70 to 1280

I have also figured out that disabling NetworkManager and connecting to my network manually does not produce any warnings or crashes. By manually I mean with wpa_supplicant:

$ sudo wpa_supplicant -B -iwlp18s0 -cwpa.conf -Dnl80211
$ sudo dhclient wlp18s0

And just with dhclient after plugging in an Ethernet cable:

$ sudo dhclient enp19s0

I assume that the warnings about reducing the DNS packet size are somehow related to the previous errors. Can anybody make sense of this?

There are some fixes for oversized packets causing memory corruption in the stmmac driver:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.42&qt=grep&q=stmmac

As mentioned earlier in this thread, it would help a lot if those affected by these issues could test a build using kernel 4.19, to check whether this has already been fixed in mainline Linux.

Yes, there might still be some unresolved issues with USB 3.0, but that shouldn't hinder testing Ethernet stability with jumbo frames, and after testing you can easily revert to your unchanged previous (4.14-based) installation using luci-app-advanced-reboot (or nbg6817-dualboot).


I'm now running master with the same patch applied, and while I have had some messages in the log about oversized frames, there have been no crashes. I'll keep running the build to see whether it will crash, but so far things look promising!

I believe those messages were triggered by a large DNS response sent as a fragmented UDP packet. This happens because of DNSSEC: the response cannot fit into a single frame. I've run packet captures both on the router and on my computer connected directly to the media converter (fibre to Ethernet), and I haven't seen any oversized packets, only fragmented ones. Should the router actually be complaining about this with the ipq806x-gmac-dwmac error?
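The capture result is consistent with how IPv4 fragmentation works: a UDP datagram larger than the MTU is split into fragments whose data (except in the last fragment) is a multiple of 8 bytes, so no correctly built fragment exceeds the MTU. Below is a minimal sketch of that arithmetic, assuming a 20-byte IPv4 header without options, an 8-byte UDP header and an untagged Ethernet frame with FCS; the 3000-byte DNS response is a hypothetical size. Properly fragmented traffic at MTU 1500 tops out at 1518-byte frames, which points toward something else (a sender with a larger MTU, or extra encapsulation) as the source of 1541-byte frames. That last point is an inference, not a diagnosis.

```python
from typing import List

IPV4_HEADER = 20       # bytes, no IP options
UDP_HEADER = 8
ETH_OVERHEAD = 14 + 4  # Ethernet header + FCS, no VLAN tag


def ipv4_fragment_sizes(udp_payload: int, mtu: int = 1500) -> List[int]:
    """Total IPv4 packet size of each fragment for one UDP datagram."""
    data = udp_payload + UDP_HEADER             # everything after the IP header
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8    # non-final fragments: multiple of 8
    sizes = []
    while data > 0:
        chunk = min(max_chunk, data)
        sizes.append(IPV4_HEADER + chunk)
        data -= chunk
    return sizes


fragments = ipv4_fragment_sizes(3000)
print(fragments)                                 # [1500, 1500, 68]
print(max(f + ETH_OVERHEAD for f in fragments))  # 1518
```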


I've been pulling my hair out trying to sort out why my nbg6817 router reboots multiple times throughout the day. I'm glad to see I'm not the only one experiencing this issue.

@huaracheguarache Have you seen any crashes since you applied those patches? I'd like to create a Docker container that pulls in the OpenWrt master branch, applies those patches and then compiles it. I need to learn git a bit more, though. Could you show me how you applied those patches to master?

I'm seeing tons of these messages in the kernel log:

[Sun Jun  9 17:46:36 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
[Sun Jun  9 17:48:09 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
[Sun Jun  9 17:48:10 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
[Sun Jun  9 17:48:11 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
[Sun Jun  9 17:48:14 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
[Sun Jun  9 17:48:29 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)
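When the log is this noisy, it can help to tally the messages by interface and size instead of reading them one by one. Below is a minimal sketch that parses lines in the format shown above (for example from `dmesg` or `logread` output saved to a file); the regular expression is written against these sample lines only and `oversized_frames` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

OVERSIZE_RE = re.compile(
    r"(?P<dev>\S+) (?P<iface>\w+): len (?P<len>\d+) larger than size \((?P<size>\d+)\)"
)


def oversized_frames(lines: Iterable[str]) -> Counter:
    """Count (interface, frame length, buffer size) triples in kernel log lines."""
    counts: Counter = Counter()
    for line in lines:
        m = OVERSIZE_RE.search(line)
        if m:
            counts[(m["iface"], int(m["len"]), int(m["size"]))] += 1
    return counts


sample = [
    "[Sun Jun  9 17:46:36 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)",
    "[Sun Jun  9 17:48:09 2019] ipq806x-gmac-dwmac 37200000.ethernet eth0: len 1541 larger than size (1536)",
]
for (iface, length, size), n in oversized_frames(sample).items():
    print(f"{iface}: {n} x {length} bytes, {length - size} over the {size}-byte buffer")
```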

I don't remember ever seeing this before I upgraded all of my systems from Fedora 29 to Fedora 30. Maybe it's related somehow...
Note: my DNS server is running on a Fedora 30 box.

Update:
After playing around with this all weekend, I've learned how the build process works. I've applied the kernel 4.19 related patches and compiled the firmware. I need to test it for a day or two to see whether this issue is resolved...

FROM fedora:30

ENV FORCE_UNSAFE_CONFIGURE 1

RUN dnf upgrade -y &&\
    dnf install -y git wget hostname bzip2 findutils ccache gcc which ncurses-devel unzip patch openssl-devel subversion python file libtool perl-Thread-Queue zlib-static xz gcc-c++ make rsync &&\
    git clone https://github.com/openwrt/openwrt.git /openwrt &&\
    wget "https://git.openwrt.org/?p=openwrt/staging/chunkeey.git;a=snapshot;h=60802da81d41959d19cf5644e19b8fc3c8462c3a;sf=tgz" -O /tmp/chunkeey-60802da.tar.gz &&\
    tar xvf /tmp/chunkeey-60802da.tar.gz --directory / &&\
    rm /tmp/chunkeey-60802da.tar.gz &&\
    mv /chunkeey-60802da /staging &&\
    /bin/cp /staging/package/kernel/linux/modules/usb.mk /openwrt/package/kernel/linux/modules/usb.mk &&\
    /bin/cp /staging/target/linux/generic/config-4.19 /openwrt/target/linux/generic/config-4.19 &&\
    rsync -avr --delete /staging/target/linux/ipq806x/ /openwrt/target/linux/ipq806x/ &&\
    sed -i '/^--- a\/init\/main.c/,$ d' /openwrt/target/linux/ipq806x/patches-4.19/0067-generic-Mangle-bootloader-s-kernel-arguments.patch &&\
    /openwrt/scripts/feeds update -a &&\
    /openwrt/scripts/feeds install -a &&\
    echo -e 'CONFIG_TARGET_ipq806x=y\nCONFIG_TARGET_ipq806x_DEVICE_zyxel_nbg6817=y\nCONFIG_TARGET_BOARD="ipq806x"' > /openwrt/.config &&\
    make -C /openwrt defconfig &&\
    make -C /openwrt download &&\
    make -C /openwrt V=s -j $(nproc) &&\
    mkdir /openwrt.upload &&\
    cp /openwrt/bin/targets/ipq806x/generic/openwrt-ipq806x-zyxel_nbg6817-squashfs-sysupgrade.bin /openwrt.upload/ &&\
    tar czf /openwrt.upload/openwrt.ipq806x.tar.gz -C /openwrt/bin/ .

CMD ["cp","-r","/openwrt.upload/","/tmp"]

This is a Dockerfile that can be used to build the nbg6817 firmware with kernel 4.19.
Just build and run it:

docker build -t zyxtel.kernel.419 .
docker run --rm -v /tmp:/tmp zyxtel.kernel.419

The compiled files will be placed in /tmp/openwrt.upload on the host system. You can obviously customize this to suit your needs. I've just used the default config options here (no LuCI, etc.), but you could easily supply your own .config file.

I had to remove the last hunk from the patch

0067-generic-Mangle-bootloader-s-kernel-arguments.patch

as it didn't match what was in main.c.
Anyway, I hope someone finds this useful...
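The `sed -i '/^--- a\/init\/main.c/,$ d'` line in the Dockerfile performs that trimming: it deletes everything from the `--- a/init/main.c` header to the end of the patch file. Below is a minimal Python sketch of the same operation, for anyone who prefers not to use sed; the helper names are hypothetical. Like the sed command, it keeps every line before the header, so any preamble line that belongs to the removed file section stays in place and the result should be checked by hand.

```python
from pathlib import Path


def drop_from_file_header(patch_text: str, path: str) -> str:
    """Remove everything from the '--- a/<path>' line to the end of a patch.

    Mirrors: sed -i '/^--- a\\/<path>/,$ d' <patch>
    If the header is not found, the text is returned unchanged.
    """
    header = f"--- a/{path}"
    kept = []
    for line in patch_text.splitlines(keepends=True):
        if line.startswith(header):
            break
        kept.append(line)
    return "".join(kept)


def trim_patch_file(patch_file: Path, path: str = "init/main.c") -> None:
    """Rewrite `patch_file` in place without the hunks for `path`."""
    patch_file.write_text(drop_from_file_header(patch_file.read_text(), path))
```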

@slh I found your previous post extremely useful...thanks!

@leif.liddy Great to see that you figured it out. Sorry that I didn't reply to you earlier, but I was a bit busy with other stuff.

That last hunk doesn't apply since kernel 4.19.44, so the patch needs to be modified to apply after the following changes in 4.19.44.

That's just a matter of adjusting the context to follow the upstream/stable changes; you can follow the example for 4.14.


As the kernel patch level increases, the staging repo patches will need to be rebased. I'm not sure where the OpenWrt roadmap stands, but I'm hoping that the next major release includes kernel 4.19 and comes out by the end of the year.

Kernel 4.19 definitely resolved the reboot issues I was having. Router has been up for almost 48 hours without a single reboot!

Same issue here with an ASUS WL-500gP v2. 18.06.4 is the latest firmware for it; is there any chance to fix the issue somehow?

No, your issue is definitely not related to the one discussed in this thread (completely different architecture, target, and Ethernet, switch and WLAN chips/drivers): the ~3-year-old QCA/ARMv7-based nbg6817 and the 12-year-old Broadcom MIPS32-based wl-500gpv2 have nothing in common.

@GCRaistlin, please open a new thread to discuss your wl-500gpv2 related issues, but the most likely reason is insufficient RAM - something that is not an issue on the nbg6817.

