Netgear R7800 exploration (IPQ8065, QCA9984)

Interesting. Could you please run ifconfig and check if the interface has either PROMISC or ALLMULTI set? I suspect that having either attribute set on any interface which uses the same ethernet device (i.e. eth1 or eth0) would be enough to avoid the problem. If neither attribute is set then it seems you do not see the problem in your case, and it must be more complex.

In my setup, I have no "wan" -- this is an internal router and I map the various wired ports into various vlans on eth0 and eth1. However, it is the case that all my testing has been done using the wired connection to the port labelled "Internet". I will test to see if I see the same problem on other physical ports.

I am running 17.01 (kernel 4.4.50).

Nope, none of that at all, and the system is currently streaming audio to IPTV as well.

root@Router:~# ifconfig | grep PROMISC
root@Router:~# ifconfig | grep ALL
root@Router:~# iftop -i br-lan

                12.5Kb          25.0Kb          37.5Kb          50.0Kb    62.5Kb
└───────────────┴───────────────┴───────────────┴───────────────┴──────────────
213.75.11.11              => 224.3.2.6                  18.7Kb  12.0Kb  12.6Kb
                          <=                                0b      0b      0b

Thanks for the feedback. Is there an interface on that device that is part of a bridge? As part of my debugging, I have discovered that the device gets the promiscuous mode bit set if it is used in a bridge, even though the PROMISC flag doesn't get set on the interface.
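
A rough way to compare what ifconfig reports with the raw flags word is to read it from sysfs and decode the two bits by hand. This is only a sketch: it assumes the kernel exposes /sys/class/net/<iface>/flags as a hex value, it uses the IFF_PROMISC (0x100) and IFF_ALLMULTI (0x200) bit values from linux/if.h, and decode_flags is a made-up helper name. As far as I can tell the sysfs value reflects the kernel's internal flags, so it may show promiscuity requested by a bridge even when ifconfig does not.

```shell
#!/bin/sh
# Decode the promiscuous / all-multicast bits from a raw interface flags word.
# Bit values are IFF_PROMISC and IFF_ALLMULTI from <linux/if.h>.
IFF_PROMISC=0x100
IFF_ALLMULTI=0x200

# decode_flags (hypothetical helper): takes a hex flags value such as 0x1303
# and prints the names of whichever of the two bits are set.
decode_flags() {
    out=""
    if [ $(( $1 & $IFF_PROMISC )) -ne 0 ]; then out="$out PROMISC"; fi
    if [ $(( $1 & $IFF_ALLMULTI )) -ne 0 ]; then out="$out ALLMULTI"; fi
    echo "${out# }"
}

# Report every interface named on the command line, e.g. eth0 eth1 br-lan.
for dev in "$@"; do
    f=/sys/class/net/$dev/flags
    if [ -r "$f" ]; then
        echo "$dev: $(decode_flags "$(cat "$f")")"
    fi
done
```

Saved as a script (the name is up to you), it would be run with the interface names as arguments; an empty result after the colon means neither bit is set.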

I did some more testing and the problem affects both eth0 and eth1 and all physical ports. I have also tried various hacks in the driver to test things out (see the bug report for more info) and have convinced myself it is a real hardware or firmware problem in the device's multicast filter. It is easy for the driver to work around this simply by treating the device as if it had no multicast filter and receiving all multicast frames.

Does anyone know if the R7800 has firmware for the LAN (dwmac) device? If so, I would be interested in testing different versions, if there are any.

If not, I will work around it in the driver. What is the best way in the LEDE world to have the driver behave differently on a particular router? Conditional code in the driver? A specific patch which gets selected at build time? Is there a good way for the driver to work out at runtime which device it is running on, to avoid needing a specific image for the R7800? Or should I use a boot parameter?
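
Until the driver itself is sorted out, a userspace stopgap (not an answer to the driver question) could detect the board at boot and force all-multicast on the affected interfaces. A sketch under several assumptions: that the device tree model string is readable at /proc/device-tree/model and contains "R7800", that eth0 and eth1 are the dwmac interfaces as in this thread, and that setting ALLMULTI really does bypass the filter, which is still only a suspicion. is_r7800 is a made-up helper name.

```shell
#!/bin/sh
# is_r7800 (hypothetical helper): succeed if the model string mentions R7800.
is_r7800() {
    case "$1" in
        *R7800*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the device tree exposes the board model at this path.
model_file=/proc/device-tree/model

if [ -r "$model_file" ]; then
    # Device tree strings are NUL-terminated; strip the NUL before matching.
    model=$(tr -d '\0' < "$model_file")
    if is_r7800 "$model"; then
        for dev in eth0 eth1; do
            # Assumption: ALLMULTI makes the MAC accept all multicast frames.
            ifconfig "$dev" allmulti
        done
    fi
fi
```

This keeps a single image for all boards, at the cost of fixing the symptom outside the driver.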

Yes, but that is the default; the LAN and Wi-Fi ports are bridged:

config interface 'lan'
        option type 'bridge'
        option igmp_snooping '1'
        option ifname 'eth1'
        option proto 'static'

If it's happening at the switch level, you can check the data sheet for the switch registers and fix it in the DT with qca8327-initvals
http://www.datasheet4u.com/download_new.php?id=771154

@johnnysl

Good point that the default config means the bug only shows up on the "wan" interface. But many people using LEDE (like me) don't use the default config. And even people using the default are seeing the problem when trying to use IPv6 on the wan port (see the thread "IPv6 works only with wan in promiscuous mode"). We certainly need a fix.

@dissent1

Thanks for the suggestion about the switch. I had not checked that, but judging by both the counters and the ar8327 driver code that sets the registers, the switch seems to be forwarding the multicast packets on to the dwmac device.

So, I still think the problem is in the dwmac.

Is there anything new to try regarding kernel 4.9? Or was the memory mapping problem left unsolved for now?

(I looked at the bl49-t2 branch, which I think is the newest(?), but the patches do not apply cleanly.)

Just wondering. Does the stock firmware have these issues as well?
I saw you mention that the issue might be hardware related to the switch. Wouldn't the stock firmware show the same issue? And if not, could we maybe extract the switch drivers from the GPL source code and use those as a workaround?

I pushed an ipq branch to my staging tree. This is based on dissent's tree with some cleanups and a few missing patches added. Could folks please help test this tree?

I tried that already earlier today, but it fails to apply patches. Failure at patch 0031:

Applying /Openwrt/k49/target/linux/ipq806x/patches-4.9/0031-mtd-add-SMEM-parser-for-QCOM-platforms.patch using plaintext: 
patching file drivers/mtd/Kconfig
Hunk #1 succeeded at 190 with fuzz 2 (offset 35 lines).
patching file drivers/mtd/Makefile
Hunk #1 FAILED at 13.
1 out of 1 hunk FAILED -- saving rejects to file drivers/mtd/Makefile.rej
patching file drivers/mtd/qcom_smem_part.c
Patch failed!  Please fix /Openwrt/k49/target/linux/ipq806x/patches-4.9/0031-mtd-add-SMEM-parser-for-QCOM-platforms.patch!
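
When a long build log scrolls past, it can help to pull out just the patch and file pairs that failed. A small sketch, assuming the log uses the same "Applying ...", "patching file ..." and "Hunk #N FAILED" wording as the output above and that paths contain no spaces; failed_patches is a hypothetical helper that reads a log on stdin.

```shell
#!/bin/sh
# failed_patches (hypothetical helper): read a build log on stdin and print
# "<patch file>: <target file>" for every hunk that failed to apply.
failed_patches() {
    awk '
        /^Applying /            { patch = $2 }   # remember the current patch
        /^patching file /       { file = $3 }    # remember the file being patched
        /^Hunk #[0-9]+ FAILED/  { print patch ": " file }
    '
}
```

With the log saved to a file (build.log here is just an example name), `failed_patches < build.log` would print one line per failed hunk.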

oops, forgot to push a fix, will do so after breakfast

pushed the fixed tree

Sorry, but no beef.
I pulled the commit "ipq: more v4.9 fixes" on top of LEDE master and compiled a 4.9 build for my R7800, but the router gets into a reboot loop, so no improvement.

I've been absent and busy for a while and still am.
Try to apply these


You may also need to remove zreladdr from config-4.9

The zreladdr issue was patched by Felix and is already part of master now, isn't it?

Edit: I mean this one: https://git.lede-project.org/?p=source.git;a=commit;h=9f09bd66064cd3a00631d54e579b1fb2baa0b262

It has returned in @blogic staging commit https://git.lede-project.org/?p=lede/blogic/staging.git;a=blobdiff;f=target/linux/ipq806x/config-4.9;h=4e6e915739eab417f33a50c5f30f0095205459ba;hp=798f5a6771428914b0a38230d1fe236f569688dd;hb=d83c8c7cc7d517a96e5bef1d70671d7d6f396ed1;hpb=3a06dd60eba362df90705315bbbddced39566a2e

You probably referenced the wrong commit...

The patch from Felix has been there for two days as commit https://git.lede-project.org/?p=source.git;a=commitdiff;h=2a3952bcd5ab1818da6e7a9859c0e08086f01c3b
"kernel: re-apply 300-arch-arm-force-ZRELADDR-on-arch-qcom.patch on 4.9 (FS#549)"

But as dissent1 says, the new patch from blogic sets CONFIG_AUTO_ZRELADDR=y in .config again.

I will try compiling both ways.
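
Before each build it is worth confirming which way the option actually ended up. A minimal sketch, assuming the usual kernel config syntax (`CONFIG_FOO=y` or `# CONFIG_FOO is not set`); kconfig_state is a made-up helper, and the config path is the one discussed in this thread, taken relative to the top of the source tree.

```shell
#!/bin/sh
# kconfig_state (hypothetical helper): report how a symbol is set in a
# kernel config file. Usage: kconfig_state SYMBOL FILE
kconfig_state() {
    sym=$1
    file=$2
    if grep -q "^$sym=" "$file"; then
        # If the symbol appears more than once, the last assignment is shown.
        grep "^$sym=" "$file" | tail -n 1
    elif grep -q "^# $sym is not set" "$file"; then
        echo "$sym is not set"
    else
        echo "$sym is absent"
    fi
}

# Assumption: run from the top of the LEDE source tree.
cfg=target/linux/ipq806x/config-4.9
if [ -r "$cfg" ]; then
    kconfig_state CONFIG_AUTO_ZRELADDR "$cfg"
fi
```

Running it once per variation makes it easier to be sure which combination a given image was built from.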

I have now tested three different variations, and all failed, but slightly differently:

  • blogic's patch (incl ZRELADDR=y): bootloop
  • blogic's patch (incl ZRELADDR=y) + 2 nand patches from dissent: stuck with red power plus two switch lights
  • blogic's patch (ZRELADDR reverted) + 2 nand patches from dissent: stuck with red power plus two switch lights

I have not yet tried plain blogic's patch with ZRELADDR reverted. EDIT: that fails too, bootloop.

Looks like we need to get proper boot logs from the R7800 to solve the kernel 4.9 problem. I will install a serial cable on my router.

Has anybody opened the R7800 and used the serial header? Based on a first inspection, I think that the case screws are hidden under the rubber feet. I tried to find any reference about the R7800 serial connection, but did not find anything really good. Based on the FCC photos (and looking through the side vents), it looks like there is a proper serial header with pins, but the header is unlabelled and the pins are not identified.

I will likely attach a short patch cable to the internal header and use it to create a new permanent serial header outside the router, so that I will not need to open the router later.