Netgear R7800 exploration (IPQ8065, QCA9984)

Does not seem to work. dmesg shows the initial frequency setup, but there is nothing in /sys/devices to monitor it afterward.
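
For anyone who wants to check this on their own unit: when a cpufreq driver is registered, the kernel normally exposes each CPU's policy under /sys/devices/system/cpu/cpuN/cpufreq/, with scaling_cur_freq reported in kHz. Below is a minimal sketch of reading it from userspace. The helper names cpufreq_attr_path and cpufreq_read_khz are made up for this example, and a result of -1 only means the attribute could not be read, not why.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Build the sysfs path of one cpufreq attribute for one CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
int cpufreq_attr_path(char *buf, size_t len, unsigned int cpu, const char *attr)
{
	int n;

	assert(buf != NULL && attr != NULL);
	n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/cpufreq/%s", cpu, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a frequency attribute such as scaling_cur_freq.
 * Returns the value in kHz, or -1 if the file is missing or unparsable. */
long cpufreq_read_khz(unsigned int cpu, const char *attr)
{
	char path[128];
	char line[32];
	long khz = -1;
	FILE *f;

	if (cpufreq_attr_path(path, sizeof(path), cpu, attr) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* no cpufreq directory for this CPU, or no such attribute */

	if (fgets(line, sizeof(line), f)) {
		char *end;
		long v = strtol(line, &end, 10);

		if (end != line && v >= 0)
			khz = v;
	}
	fclose(f);
	return khz;
}
```

Calling cpufreq_read_khz(0, "scaling_cur_freq") and cpufreq_read_khz(1, "scaling_cur_freq") in a loop is enough to see whether the two cores move together or independently.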

What does CPUFREQ_DT_PLATDEV actually do? Do we need it, or did it get enabled in 4.9 just for fun? I am thinking of reverting to my approach and disabling it (if that config item is what conflicts).

Well, with the current defaults CPU frequency scaling works in any case (all cores are scaled together), so it is much better than with mvebu (where no cpufreq driver seems to be available for my WRT3200ACM).

That platdev driver seems to be a unification that collects the code duplicated across the cpufreq drivers in one place. It is selected in Kconfig when cpufreq-dt is selected, so we have to work around this if we want to do it the right way :slight_smile:

I got my approach to work. In my first try I had not noticed the upstream commit "cpufreq: dt: Remove unused code" https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/drivers/cpufreq/cpufreq-dt.c?h=linux-4.9.y&id=e86eee6bc2aaa6b3637f6497b26beee09a91bde9

After reverting that commit as well, it works.

The solution is to replace the current 0046 patch with the patch below. Last time I had not included the one-line change at line 379:

--- a/drivers/cpufreq/cpufreq-dt.h
+++ b/drivers/cpufreq/cpufreq-dt.h
@@ -14,6 +14,7 @@
 
 struct cpufreq_dt_platform_data {
 	bool have_governor_per_policy;
+	bool independent_clocks;
 };
 
 #endif /* __CPUFREQ_DT_H__ */
--- a/drivers/cpufreq/qcom-cpufreq.c
+++ b/drivers/cpufreq/qcom-cpufreq.c
@@ -20,7 +20,7 @@
 #include <linux/platform_device.h>
 #include <linux/pm_opp.h>
 #include <linux/slab.h>
-#include <linux/cpufreq-dt.h>
+#include "cpufreq-dt.h"
 
 static void __init get_krait_bin_format_a(int *speed, int *pvs, int *pvs_ver)
 {
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -221,7 +221,10 @@ static int cpufreq_init(struct cpufreq_p
 	}
 
 	if (fallback) {
-		cpumask_setall(policy->cpus);
+		struct cpufreq_dt_platform_data *pd = cpufreq_get_driver_data();
+
+		if (!pd || !pd->independent_clocks)
+			cpumask_setall(policy->cpus);
 
 		/*
 		 * OPP tables are initialized only for policy->cpu, do it for
@@ -376,6 +379,8 @@ static int dt_cpufreq_probe(struct platf
 	if (data && data->have_governor_per_policy)
 		dt_cpufreq_driver.flags |= CPUFREQ_HAVE_GOVERNOR_PER_POLICY;
 
+	dt_cpufreq_driver.driver_data = dev_get_platdata(&pdev->dev);
+
 	ret = cpufreq_register_driver(&dt_cpufreq_driver);
 	if (ret)
 		dev_err(&pdev->dev, "failed register driver: %d\n", ret);
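
To make the effect of the changed fallback branch easier to see, here is a small userspace model of it. It is only a sketch: the struct mirrors the two fields from the patch, fallback_policy_cpus is a made-up name, and it assumes that the policy initially contains just the CPU being set up, so skipping cpumask_setall() leaves that single CPU in the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors struct cpufreq_dt_platform_data after the patch. */
struct dt_platform_data {
	bool have_governor_per_policy;
	bool independent_clocks;
};

/* Model of the patched fallback branch in cpufreq_init(): returns a
 * bitmask of the CPUs that end up sharing the policy of `cpu`. */
unsigned long fallback_policy_cpus(const struct dt_platform_data *pd,
				   unsigned int cpu, unsigned int nr_cpus)
{
	assert(nr_cpus > 0 && nr_cpus <= 31 && cpu < nr_cpus);

	/* Same condition as the patch: without platform data, or without
	 * independent clocks, every CPU joins one shared policy. */
	if (!pd || !pd->independent_clocks)
		return (1ul << nr_cpus) - 1ul;

	/* Independent clocks: the policy keeps only its own CPU (assumption
	 * stated above about the initial contents of the mask). */
	return 1ul << cpu;
}
```

On the dual-core IPQ8065 this gives mask 0x3 for both cores without the flag, and 0x1 and 0x2 with independent_clocks set, which is what allows the cores to be scaled separately.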

EDIT:
I made a PR about it: https://github.com/lede-project/source/pull/962

BTW, on the topic of the serial console, it turned out that it was probably a grounding issue or a line length issue. I picked up an Adafruit FTDI Friend, and the console works flawlessly. Deets: https://plus.google.com/+TedLemon/posts/j31qJbQksLb

Hello there folks!

Could one of the R7800 owners tell me the 'overlay' size in the LEDE FW? I suspect it has the same issue as the WNDR3700v4 with the OpenWRT 15.x FW: the full NAND size is not available for use (part of the 128MB is reserved for unknown reasons) until the FW is recompiled with some fixes.

Cheers!

@Bobb, look there - everything is explained

Thanks a lot Wally! Your help was really precious! Have you considered adding '...R7800-large-squashfs-factory.img' to the LEDE download section (if that is possible, of course)? I doubt that only two of us would like to use about 98MB as ubi (overlay). :wink:

Nope, it's not my build. It's a custom build made by @cezary and hosted on his server. But you can download it from there, and if any problem occurs don't hesitate to ask on his forum: eko.one.pl forum. Language is not a problem; it isn't limited to Polish.

Thank you Wally for the answer and advice! Yes, I have read your history (like a lab mouse!). You have a nice sense of humor! :slight_smile: I mean that maybe that img (created by @cezary) could be a second variant of the official release (the primary one for the official layout and the secondary one for the non-standard layout) if the LEDE maintainers included the fixes suggested by @hnyman in the official branch for the buildbot. Of course this is only worthwhile if more than two people use this layout. I'm not asking for it, I just suppose it would be useful.

I don't think that is possible, as LEDE should be easy to revert to OFW. Unfortunately the 'netgear' partition contains data used by the OFW (I found some data there that I had set/changed in OFW before removing it), and without it the OFW could be faulty.
The only 'proper' way is a custom build, as many users do. So if you want to use the whole NAND space, you can make a custom build yourself (there are patches in @cezary's repo) or use his build.
Maybe this info could go into the wiki/FAQ, but that request/suggestion has to be posted on the documentation forum, I think.

Oh, OK! Now I understand the reason for the standard layout in the LEDE FW. Thanks a lot Wally for your exhaustive explanation!

Now that I have my serial console working, I've been able to install a LEDE build downloaded from the LEDE downloads tree: lede-17.01.0-rc2-r3131-42f3c1f-ipq806x-R7800-squashfs-factory.img

However, having installed this image, when I try to install a sysupgrade that I've built from a tree updated about an hour ago, I get this (in the LuCI upgrade UI):

The uploaded image file does not contain a supported format. Make sure that you choose the generic image format for your platform.

If I try to flash the sysupgrade that's equivalent to what's already running, it accepts it as a valid image (I didn't actually proceed to flash it). Am I missing something? Should I not expect a build from the head of the tree to be usable? There's such a long stream of discussion on this thread that although I am sure the answer to my question is in there somewhere, I haven't been able to find it.

I'm wondering if some of this stuff might belong on the wiki... :slight_smile:

(BTW, I should say that the factory image from the same build also fails; what I get in this case is:

MODEL ID on image: D7800
Firmware Image MODEL ID do not match open source firmware ID
131072 bytes read: OK
HW ID on board: 29764958+0+128+512+4x4+4x4+cascade
HW ID on image: 29764958+0+128+512+4x4+4x4
Firmware Image HW ID do not match Board HW ID
Board HW ID mismatch,it is forbidden to be written to flash!!)

You have built the image for a different router...
D7800 is not R7800.
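
The log above suggests that both identifiers have to match before the image is accepted. As a rough illustration only (the real bootloader check is not shown anywhere in this thread), here is a sketch that assumes a plain exact string comparison and assumes the board expects the model ID "R7800". The struct fw_ids and image_matches_board names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The two identifiers printed in the log. */
struct fw_ids {
	const char *model_id;
	const char *hw_id;
};

/* Hypothetical model of the check implied by the log: both strings must
 * match exactly, so a missing "+cascade" suffix is already a mismatch. */
bool image_matches_board(const struct fw_ids *board, const struct fw_ids *image)
{
	assert(board != NULL && image != NULL);
	return strcmp(board->model_id, image->model_id) == 0 &&
	       strcmp(board->hw_id, image->hw_id) == 0;
}
```

With the HW ID strings from the log, a D7800 image fails on both fields, which fits the two mismatch messages printed before the flash is refused.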

That would do it. :slight_smile:

@blogic @dissent1

With r3824 my R7800 fails to boot. The kernel starts but then seems to stall. Every ~60 seconds it prints a note about being stalled:

[    2.504598]  TX Checksum insertion supported
[    2.506857]  Wake-Up On Lan supported
[    2.511457]  Enable RX Mitigation via HW Watchdog Timer
[   23.499306] INFO: rcu_sched detected stalls on CPUs/tasks:
[   23.503683]  1-...: (2 ticks this GP) idle=145/140000000000000/0 softirq=28/28 fqs=1050
[   23.503775]  (detected by 0, t=2102 jiffies, g=-282, c=-283, q=13)
[   23.513058] Task dump for CPU 1:
[   23.517917] swapper/0       R  running task        0     1      0 0x00000002
[   23.525581] [<c05c5c54>] (__schedule) from [<00000058>] (0x58)
[   86.549311] INFO: rcu_sched detected stalls on CPUs/tasks:
[   86.553679]  1-...: (2 ticks this GP) idle=145/140000000000000/0 softirq=28/28 fqs=4202
[   86.553773]  (detected by 0, t=8407 jiffies, g=-282, c=-283, q=13)

The router booted ok yesterday with r3799 with no other kernel-related changes than using 4.9 instead of 4.4. Otherwise vanilla build.

Bootlogs:

Looking at the changelog, this looks to me like the most suspicious commit:

"ipq806x: add ipq4019 support"
https://git.lede-project.org/?p=source.git;a=commit;h=c2d50bdeb34cfc359f28aeb2fe7648cc335bc623

(As it modifies config symbols, adds several patches and makes DTS changes.)

Other commits look more innocent to me.

will investigate on my ap148 later today, sorry for the inconvenience

I am trying to find the regression range, and sadly it seems that there are multiple failures. I have so far made two minimal test builds and it looks like this:

  • with r3811-eb3ac8281b everything still works at first glance
  • with r3816-5c617aec05 wifi firmware is broken and wifi is unusable but the router boots and there is normal wired connectivity. Most likely the ath10k-firmware update has broken things :frowning:

Wifi breakage:

 Reboot (SNAPSHOT, r3816-5c617aec05)

[   12.808966] procd: - init -
[   12.961880] kmodloader: loading kernel modules from /etc/modules.d/*
[   12.964879] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   12.972196] Loading modules backported from Linux version wt-2017-01-31-0-ge882dff19e7f
[   12.972593] Backport generated by backports.git backports-20160324-13-g24da7d3c
[   12.999550] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[   12.999637] ath10k_pci 0000:01:00.0: enabling bus mastering
[   13.000105] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   13.130838] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
[   13.130888] ath10k_pci 0000:01:00.0: Falling back to user helper
[   13.183995] firmware ath10k!pre-cal-pci-0000:01:00.0.bin: firmware_loading_store: map pages failed
[   13.184338] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/cal-pci-0000:01:00.0.bin failed with error -2
[   13.191960] ath10k_pci 0000:01:00.0: Falling back to user helper
[   13.673417] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[   13.673451] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   13.684393] ath10k_pci 0000:01:00.0: firmware ver 10.4-3.4-00074 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 fa32e88e
[   15.728598] ath10k_pci 0000:01:00.0: unable to read from the device
[   15.728621] ath10k_pci 0000:01:00.0: could not execute otp for board id check: -110
[   15.733663] ath10k_pci 0000:01:00.0: failed to get board id from otp: -110
[   15.741474] ath10k_pci 0000:01:00.0: could not probe fw (-110)
[   15.749117] ath10k_pci 0001:01:00.0: enabling device (0140 -> 0142)
[   15.754119] ath10k_pci 0001:01:00.0: enabling bus mastering
[   15.754552] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   15.890686] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0001:01:00.0.bin failed with error -2
[   15.890727] ath10k_pci 0001:01:00.0: Falling back to user helper
[   15.941501] firmware ath10k!pre-cal-pci-0001:01:00.0.bin: firmware_loading_store: map pages failed
[   15.941716] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/cal-pci-0001:01:00.0.bin failed with error -2
[   15.949434] ath10k_pci 0001:01:00.0: Falling back to user helper
[   16.201093] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[   16.201123] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   16.211923] ath10k_pci 0001:01:00.0: firmware ver 10.4-3.4-00074 api 5 features no-p2p,mfp,peer-flow-ctrl,btcoex-param,allows-mesh-bcast crc32 fa32e88e
[   18.248608] ath10k_pci 0001:01:00.0: unable to read from the device
[   18.248634] ath10k_pci 0001:01:00.0: could not execute otp for board id check: -110
[   18.253677] ath10k_pci 0001:01:00.0: failed to get board id from otp: -110
[   18.261457] ath10k_pci 0001:01:00.0: could not probe fw (-110)
[   18.269981] ip_tables: (C) 2000-2006 Netfilter Core Team

EDIT:
I am currently recompiling those two builds and will re-test. I forgot that "git reset --hard" also reset the tree back to kernel 4.4, so those two test builds are invalid from the kernel 4.9 perspective. :frowning2:

EDIT2:
The same result with properly compiled kernel 4.9 versions. Wifi is broken at r3816 but the router boots.

I have narrowed down the non-boot to somewhere between r3818 and r3820.

  • r3818-1cb406d019 boots ok (but wifi is broken, already by the earlier commits)
  • r3820-d5b10bb560 does not boot. (Note: I tried both as it is and with the MTD config item change reverted, as I can't see how that item is related to the crypto stuff that the commit is about)

So it looks like the non-boot condition is caused by one of these:

  • d5b10bb560 ipq806x: make the dwc3 driver and required phy drivers built-in
  • 7dc5617173 ipq806x: enable QCE hardware crypto inside the kernel

I am currently compiling r3819

EDIT:
r3819-7dc5617173 boots ok (without wifi) when the MTD change was reverted, so it looks like the culprit is r3820:

  • d5b10bb560 ipq806x: make the dwc3 driver and required phy drivers built-in

Ps. Luckily the whole ipq806x seems to be broken in the phase1 buildbot, so the faulty builds will not reach the general public.
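
The narrowing above was done by hand, one test build at a time. The same search can be written as a bisection; this is just a sketch with made-up names (first_bad_revision, boot_fails). The boot_fails stub stands in for "build this revision, flash it and see whether it boots", and it encodes the results reported here for the tested revisions while assuming the untested ones in between behave the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*rev_is_bad_fn)(int rev);

/* Bisect a revision range. Preconditions: `good` is known good, `bad` is
 * known bad, good < bad, and every revision after the first bad one is
 * bad as well. Returns the first bad revision. */
int first_bad_revision(int good, int bad, rev_is_bad_fn is_bad)
{
	assert(good < bad && is_bad != NULL);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* the breakage is at mid or earlier */
		else
			good = mid;	/* mid still works, look later */
	}
	return bad;
}

/* Stand-in for a real build-and-boot test: r3819 boots, r3820 does not. */
bool boot_fails(int rev)
{
	return rev >= 3820;
}
```

Starting from r3811 (known good) and r3824 (known bad), first_bad_revision(3811, 3824, boot_fails) needs four test builds to arrive at r3820. The wifi breakage would need its own predicate, since it starts earlier than the boot failure.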

@hnyman

Sorry, I just saw that John merged the IPQ40XX. I guess I'll be here answering questions in the meantime.
The wifi breakage is caused by the removal of 936-ath10k_skip_otp_check.patch from the mac80211 package.

I've talked to Michal Kazior about this back in November 2016. But he hasn't come up with a solution either.
The issue with the 936- patch is that it breaks IPQ40XX device identification, so the device gets detected as a PCIe device
(rather than AHB), and this in turn causes the ath10k driver to abort since it can't find the matching boarddata.

Since I don't have an IPQ806x, I can't test what would be a good solution/WA for this. However, this needs to
be fixed in some way, since the upcoming IPQ807X (from what I know) also integrates the WIFI-MAC into the
SoC (so it is going to be AHB/AXI too).

From IPQ40XX perspective:

  /* otp and board file not needed if calibration data is present */
  if (calret) {
          ret = ath10k_core_get_board_id_from_otp(ar);
          if (ret && ret != -EOPNOTSUPP) {

is the problematic part. Qualcomm switched from "cal" to "pre-cal". The pre-cal data contains the actual project/board id, which is later returned to the driver. Since the pre-cal is mandatory, the calret value is 0 for the IPQ40XX and the
board identification step is skipped. (And I think ath10k just assumes it's a PCIe chip.)
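
To spell out the control flow of the quoted branch, here is a small userspace model. It is a sketch, not driver code: probe_board_id and the otp_* stubs are made-up names, read_otp stands in for ath10k_core_get_board_id_from_otp(), and the body of the inner if (cut off in the quote) is assumed to abort the probe with the error, which is at least consistent with the "could not probe fw (-110)" lines in the R7800 log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*otp_read_fn)(void);

/* Model of the quoted branch. `calret` is the result of loading the
 * calibration data (0 = loaded). Returns 0 or a negative errno. */
int probe_board_id(int calret, otp_read_fn read_otp)
{
	assert(read_otp != NULL);

	/* otp and board file not needed if calibration data is present */
	if (calret) {
		int ret = read_otp();

		/* Assumption: any error except -EOPNOTSUPP aborts the probe. */
		if (ret && ret != -EOPNOTSUPP)
			return ret;
	}
	return 0;
}

/* Stubs for the outcomes discussed in the thread. */
int otp_times_out(void)   { return -ETIMEDOUT; }	/* shows up as -110 in the log */
int otp_unsupported(void) { return -EOPNOTSUPP; }
int otp_ok(void)          { return 0; }
```

With calret == 0 (the IPQ40XX pre-cal case) the OTP read is never attempted, so probe_board_id(0, otp_times_out) returns 0. On the R7800 both calibration loads fail with -2 in the log, so calret is non-zero, the OTP read runs, and its timeout is what ends the probe.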

So I wonder, what's wrong with the QCA988x/QCA99xx, since ath10k_core_get_board_id_from_otp() should work there as well?