Netgear R7800 exploration (IPQ8065, QCA9984)

If someone notices something wrong in the driver, tell me and I will rework it. Still, I should already have polished it enough compared to the original driver in QSDK.

it is... That's why I decided to not merge them in the cache group.

Is it possible that your patch has an impact on system temperature? I've noticed the temperature is 2 degrees higher under the same system load.

It should cause lower temperatures,
but I will check the stock voltage... It could be related to the L2 voltage actually being set now...


It looks like the regulator is initially set to 1050000 and then the patch scales it to 1150000 (at high frequency).
So yes, the temperature increase can be caused by the patch. Now that I notice this, I wonder HOW the L2 cache worked with 100000 uV less.
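To put the two regulator values in proportion, here is a small sketch. The helper name `voltage_delta` is made up for illustration, and the power figure relies on the rough rule of thumb that dynamic power grows with the square of the voltage at a fixed clock. It ignores leakage and only concerns the L2 rail, so it says nothing precise about the resulting temperature.

```shell
#!/bin/sh
# Hypothetical helper: compare two regulator settings given in microvolts.
# Prints the voltage step in percent and the estimated change in dynamic
# power, assuming P ~ V^2 at a fixed clock (a rule of thumb, not a measurement).
voltage_delta() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        ratio = hi / lo
        printf "voltage +%.1f%%, dynamic power +%.1f%%\n",
               100 * (ratio - 1), 100 * (ratio * ratio - 1)
    }'
}

# Values quoted in the post above.
voltage_delta 1050000 1150000
# prints: voltage +9.5%, dynamic power +20.0%
```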


Modified the dts to set the regulator to the lowest value at the idle L2 frequency. Should save a few extra °C.

Compared to what? Master build from yesterday without this? Or some 19.07 build? or?

A generic comment that something has changed in the CPU frequency scaling logic of ipq8065 maybe 2 months ago, before the 4.19 version bump. (So, not related to Ansuel's 4.19 patch and the follow-up work)

Earlier, both cores in ipq8065 or R7800 were 98% of time at 384M when idle (as shown by LuCI statistics CPU freq additional info). For the past two months(?) one core has spent maybe 25% of time at the next speed level of 600M although the system is idle.

This is from my R7800 with Ansuel's new patches (as of two hours ago, so maybe not quite the current ones).


That may contribute to the temp increase, especially if compared to a 19.07 build or so.
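For anyone without the LuCI statistics graphs, the same residency numbers can be approximated from sysfs. This is only a sketch: it assumes the kernel exposes `stats/time_in_state` (cpufreq statistics enabled), and `freq_residency` is a name invented here. Each line of that file is a frequency in kHz followed by the time spent there in units of 10 ms, counted since boot or the last stats reset.

```shell
#!/bin/sh
# Hypothetical helper: print the share of time spent at each frequency,
# given the path to a cpufreq time_in_state file.
# Input lines look like "<frequency in kHz> <time in units of 10 ms>".
freq_residency() {
    awk '{ freq[NR] = $1; ticks[NR] = $2; total += $2 }
         END {
             if (total == 0) exit
             for (i = 1; i <= NR; i++)
                 printf "%d MHz %.1f%%\n", freq[i] / 1000, 100 * ticks[i] / total
         }' "$1"
}

# On the router: one report per CPU, skipped silently if the file is missing.
for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/stats/time_in_state; do
    if [ -r "$f" ]; then
        echo "$f"
        freq_residency "$f"
    fi
done
```

Because the counters are cumulative, comparing two builds fairly means letting each run idle for a similar length of time after boot.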

Compared to R7800-master-r11834-10cbc896c0-20191230-2315. Currently compiling the same build as the one with Ansuel's patch to compare apples to apples.
In my case the frequency distribution looks different (the gap marks the move from r11834 without Ansuel's patch to r11890 with it).


You may have some different services (or something like that), which explains the higher base case in CPU frequency. Your long-term series shows a change in early October.
Have you changed the CPU scaling parameters? Changed to "performance" governor for a while? Adjusted the scaling minimum frequency for "ondemand"?
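A quick way to answer these questions on a running router is to dump the current cpufreq settings. The sketch below assumes the standard cpufreq sysfs layout. `cpufreq_summary` is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different directory tree.

```shell
#!/bin/sh
# Hypothetical helper: print governor and min/max scaling frequency (kHz)
# for every CPU found under a sysfs root (default: /sys/devices/system/cpu).
cpufreq_summary() {
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue
        printf '%s governor=%s min=%s max=%s\n' \
            "$(basename "$(dirname "$d")")" \
            "$(cat "$d/scaling_governor")" \
            "$(cat "$d/scaling_min_freq")" \
            "$(cat "$d/scaling_max_freq")"
    done
}

# On the router, run it without arguments:
#   cpufreq_summary
```

A raised `scaling_min_freq` or a non-default governor would show up here immediately and would explain a different idle frequency distribution.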

Obviously, I am testing Ansuel's patch on 'production' router with some tweaks:

(while true; do (sleep 3d; logger "IRQBalance oneshot"; irqbalance --oneshot); done) &
echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
for FILE in /sys/class/net/*/queues/[rt]x-0/[rx]ps_cpus; do
   [ -w "$FILE" ] && echo 3 > "$FILE" 2>/dev/null
done

The current config was applied a few months back.

echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold

Remove this... problem should be fixed now...

Are you sure this still works in master?

Indeed it doesn't, and it isn't even attempting to do anything. I left it in case I move back.

It seems to be a red herring. The same build without your patch still causes a slightly higher temperature.
From 9am it ran with your patch, and from 4pm the same build without it.

So it's not related to my patch

Dumb question: what's stopping us from going to 5.4 if kernel 4.19 is already EOL this year?

Making the kernel jump with hardware-oriented specials for ~50 targets is not that simple. Feel free to author and test the version bumps needed for the 100-200 OpenWrt generic kernel patches & 20-50 target specific patches per target.

Actually 5.4 might well be the next kernel. Please read the discussion on the mailing list from a couple of months ago:
http://lists.infradead.org/pipermail/openwrt-devel/2019-October/thread.html#19610

There is already work going on for 5.4, e.g. by xback and blocktrron:
https://git.openwrt.org/?p=openwrt/staging/xback.git;a=summary
https://git.openwrt.org/?p=openwrt/staging/blocktrron.git;a=shortlog;h=refs/heads/k54

It is great that we have now got ipq806x to 4.19 as the first step, as the 4.19 work for ipq806x was stalled for almost a year until ansuel did the final uplifting.

Thank you for the insight. By 'us' I meant our R7800 / ipq806x targets; last I recall, usb/sata driver support was the holdup...

I think the kernel bump from 4.19 to 5.4 will be """easy"""
Most of the patches are just backports from mainline, so... I will do some work, but it looks like OpenWrt support for 5.4 is still WIP.

4.19 --> 5.4 is certainly a smaller and hopefully easier step than 4.14 --> 5.4, not just because of the backports from newer kernels we are already carrying (and which can simply be dropped when going to 5.4), but also because of the tangential development of the kernel as a whole that requires rebasing patches.

Ideally, OpenWrt would follow every single stable kernel update (so 4.19 --> 5.0 --> 5.1 --> 5.2 --> 5.3 --> 5.4 --> …), as that would make the steps more manageable and also help with upstreaming and also soliciting help from upstream (fresh memory of needed changes) in an informal way (no one working on upstream remembers what happened between 4.14 --> 4.15, but they do remember what happened within the last 3 months).

In practice that is not possible for a distribution like OpenWrt, which needs heaps of custom patches (many of which have no chance to pass upstream scrutiny, nor anyone working on them to get these in over time), both for hardware support and to beat the kernel into shape to run on these resource-constrained devices at all. Therefore a compromise is always needed between staying reasonably on top of upstream development and keeping the porting effort manageable. If you are not at least remotely in sync with upstream, you will clash in many ways, e.g. in the mtd subsystem: new NAND/SPI-NOR chips are constantly being added to the upstream kernel, and vendors will switch to them, even for 'old' targets, often silently, so you're regularly dealing with the task of having to backport new device support, bugfixes, etc.

Given the number of different targets and their nature of being pretty deeply embedded (all custom hacks, but also very security sensitive - and at the same time profiting a lot from new wireless, mtd, USB developments), this is always a difficult question, with different positions. E.g. 'obsolete' targets like ar7 and ath25 only feel the pain - there is nothing to gain from 'new' development (aside from security), but a very immediate pain of growing kernels/userland and having to spend time on forward-porting out-of-tree code that accumulated over a decade and more. At the same time contemporary targets (e.g. anything ARM, anything using 802.11ac wireless or newer) profit very directly from upstream development - and upcoming targets (SoCs targeted at 802.11ax) hard-depend on it; there, kernel and wireless drivers can't be new enough for at least 3-5 years to come.

The compromise selected by OpenWrt is to follow LTS kernels - which is still hard for the old targets (e.g. ar7, ixp4xx, orion have not made it beyond kernel 4.9) - and as you have observed, keeping up needs quite some work and attention even for the targets that have the most to gain from updating, such as ipq806x. Is that ideal? No, and depending on your interests it falls short for different reasons (too fast for old targets, too slow for newer ones), but it's a reasonable one.

Now the question of which LTS branch to follow has been asked and answered many times… I don't really want to go into details again, but it's very, very unlikely that v4.19.x will cease being a first-class LTS branch at the end of the year (important projects are using it, Debian, Android, etc. - they will push to extend it as the deadline gets closer, as has happened with all previous LTS kernels as well) - at the same time kernel 5.4 might be convincing as well. The decision is still pending and time will tell which path to follow (work is happening, that's the important aspect, and you'll only see whether the jump to 5.4 can succeed reasonably once you start working on it).

Alright, just bought two of these. Planning on getting one up and running with the latest build, and then messing around with the 2nd to hopefully help contribute something useful back to the community.

@Ansuel I have just built the latest master with your latest commits and I think the router's memory access got a little slower. A build from a few days ago would do ~750MiB/s, while the best throughput for today's build is within the 600MiB/s range. The router is idle and is running with the performance governor. Does that make sense?

while sleep 1; do nice -n -10 ./tools/mbw 32 | grep AVG; done | egrep "Copy: [67]"
AVG Method: MEMCPY Elapsed: 0.05225 MiB: 32.00000 Copy: 612.436 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04682 MiB: 32.00000 Copy: 683.473 MiB/s
AVG Method: MEMCPY Elapsed: 0.04761 MiB: 32.00000 Copy: 672.125 MiB/s
AVG Method: MEMCPY Elapsed: 0.05318 MiB: 32.00000 Copy: 601.766 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04831 MiB: 32.00000 Copy: 662.411 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05255 MiB: 32.00000 Copy: 609.001 MiB/s
AVG Method: MEMCPY Elapsed: 0.05285 MiB: 32.00000 Copy: 605.538 MiB/s
AVG Method: MEMCPY Elapsed: 0.05255 MiB: 32.00000 Copy: 608.902 MiB/s
AVG Method: MEMCPY Elapsed: 0.04756 MiB: 32.00000 Copy: 672.809 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05210 MiB: 32.00000 Copy: 614.202 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05241 MiB: 32.00000 Copy: 610.582 MiB/s
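
To compare builds on more than the best single run, the AVG lines can be averaged per method. This is a sketch only: `mbw_summary` is a name invented here, and it assumes the line layout shown above, with the method in the third field and the rate in the ninth. Unlike the `egrep "Copy: [67]"` filter, it keeps every AVG line, so slow outliers are not silently dropped.

```shell
#!/bin/sh
# Hypothetical helper: read mbw "AVG" lines on stdin and print, per method,
# the number of runs and the mean copy rate.
# Expected layout: AVG Method: <name> Elapsed: <s> MiB: <size> Copy: <rate> MiB/s
mbw_summary() {
    awk '/^AVG/ { sum[$3] += $9; n[$3]++ }
         END {
             for (m in sum)
                 printf "%s %d runs, mean %.1f MiB/s\n", m, n[m], sum[m] / n[m]
         }' | sort
}

# Possible usage on the router (same mbw invocation as above, ten samples):
#   for i in 1 2 3 4 5 6 7 8 9 10; do ./tools/mbw 32 | grep AVG; sleep 1; done | mbw_summary
```

Running the same loop on the older build and on today's build gives two sets of means that can be compared directly.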