Netgear R7800 exploration (IPQ8065, QCA9984)

5.10 with NSS is working well for most ipq806x devices. 22.xx should be easy to support since it is branching off here shortly.

If upstream is a pain I’d love to get 5.15 working, at least in a custom build sense. Especially if we eventually get access to newer qsdk versions.

oh you actually use the qsdk11 firmware?

The problem with upstreaming patches is that it's slow and sometimes they ask for very strange changes...

Looking at the .bin files I think I made an error and I’m still on 10.0. Need to change the naming convention back.

:frowning: I'd advise putting some effort into switching to the "leaked" qsdk11, so at least we don't use a too ancient version

@robimarko funny story for you

I'm doing some work on the clk drivers and I found something incredible...

All this time we were lucky that the driver was shit and just ignored the provided clocks...
PXO_SRC is not defined in the gcc driver, only in the include...
I'm converting everything to a sane implementation and I just noticed this... I had to enable earlycon to discover it... I never expected a definition that wrong...

So in short

  • The documentation documents something that nobody ever followed.
  • The DTS contains a phandle to something never defined in gcc.
  • The driver ignores the clock and works because it has hardcoded parent names.

Now that I'm converting everything to the parent_data API and dropping every hardcoded value, all this shit comes up LOL...

Anyway, fun story... I'm playing with the mux and it's so funny setting the core to source its clk from the pxo clock (that is 25 MHz). It's too fun how the system runs that slow but still works. AHAHAH

Also, the real problem now is how to fix this mess... This bad value was pushed upstream and they already said that making this kind of change is a NONO... Considering that fixing the driver causes the router to panic and not boot at all... I'm stuck now...
I'm testing whether I can add a pxo definition to the gcc driver, but no idea if it will work...

That just screams QCA, it would be way too easy otherwise.
BTW, I still have not figured out how the IPQ40xx SDCC clock was ever supposed to work. One of its parents is XO, a fixed reference clock that gets divided down for 140 and 400 kHz, and that really upsets clk_set_rate(): it gets NULL back from the function that is supposed to find the topmost parent that needs changing, and then it just throws an error and that's it.
And it all boils down to fixed clocks not having a determine_rate or round_rate op, since they are obviously just fixed.

Is that PXO even needed?
Cause if not just get rid of it

I should check that, but I'm scared of all the mess they could have made...

pxo is a fixed clock. The problem is that they don't really like this kind of modification as it would cause a regression... And this is a big regression, considering it causes a kernel panic with a fixed driver...

Ok, so what's the issue with defining the PXO in DTS as a fixed clock?

I assume that gets fed somewhere without error checking and then you get a NULL pointer

Well fk them, I just found a way to fix it and avoid any regression from this change...

Ready for the big hack?
But first some explanation. The problem here is that maintainers will 100% complain about this change.
Again, merging the changes to the driver will result in a kernel panic as the DTS still has the old values (and maintainers explained that DTS and arm code are merged separately, so we can have a small window where the platform is completely broken).
I had the exact same problem when I converted the entire gcc driver to parent_data, and they said that I had to keep the fixed clock definitions in the gcc driver even if they are declared in the DTS as fixed clocks.

The same happens here, but even worse. The main problem is that the driver uses parent_names, while what I'm doing is converting it to parent_data using the fw_name definition. fw_name checks whether the dev has the clks defined and tries to use them. And here comes the problem.

While the acpu0/1_aux clks don't have any clock definition (cause fk the Documentation pushed in the same series), the l2_aux clk does have clks defined. And the pxo clk is defined using the stupid <&gcc PXO_SRC>

Now, PXO_SRC exists in the include but is not present in the clk table.
This little thing causes the kernel to panic: when the clk tries to find its parent, it finds the pxo clk defined in the DTS, tries to fetch it from the gcc driver, finds that it is not defined, and the kernel panics.
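
The difference between the two lookup styles can be sketched with a toy model. This is not kernel code: the dicts, the index values and both function names are made up for illustration, and the DTS node contents are assumed apart from the PXO_SRC reference. The real failure is a kernel panic; here it is modelled as an exception.

```python
# Toy model only: plain dicts stand in for the DTS, the gcc clk table and
# the global clock list. None of these names are kernel APIs.

PXO_SRC = 0    # hypothetical index, standing in for the constant in the include
PLL8_VOTE = 1  # hypothetical index

# The toy gcc clk table has an entry for PLL8_VOTE but none for PXO_SRC.
gcc_table = {PLL8_VOTE: "pll8_vote"}

# Clocks that can be found by their global name.
global_clocks = {"pxo", "pll8_vote"}

# Toy DTS node for l2_aux: clock name -> (provider, index).
l2_aux_dt = {"pxo": ("gcc", PXO_SRC), "pll8_vote": ("gcc", PLL8_VOTE)}


def resolve_by_name(name):
    """parent_names style: the DTS is ignored, only the global name matters."""
    return name if name in global_clocks else None


def resolve_by_fw_name(dt_node, fw_name, providers):
    """fw_name style: follow the DTS reference into the provider's table."""
    provider, index = dt_node[fw_name]
    table = providers[provider]
    if index not in table:
        # The thread reports a kernel panic here; the toy raises instead.
        raise LookupError(f"{provider} has no clock at index {index}")
    return table[index]


providers = {"gcc": gcc_table}
print(resolve_by_name("pxo"))
print(resolve_by_fw_name(l2_aux_dt, "pll8_vote", providers))
try:
    resolve_by_fw_name(l2_aux_dt, "pxo", providers)
except LookupError as err:
    print("lookup failed:", err)
```

The name-based lookup never touches the DTS, which is why the wrong phandle went unnoticed for so long.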

The problem is that we can't change the DTS, or rather we can, but we still have to provide a way to make the driver work with the old implementation.

And here is the big hack mentioned at the start...

We provide a NULL pxo clk so the gcc probe doesn't panic on strange values, and after the probe we enter the pxo clk manually. This way, when PXO_SRC is looked up in the gcc clk table it is found and the kernel doesn't crash :smiley:
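
As a rough illustration of the ordering only (empty slot at probe time, filled in afterwards), here is a toy model. The class, its methods and the index values are hypothetical and do not correspond to the actual driver code.

```python
# Toy model only: a list stands in for the gcc clk table.

PXO_SRC = 0    # hypothetical index
PLL8_VOTE = 1  # hypothetical index


class ToyGcc:
    def __init__(self, clks):
        self.clks = list(clks)
        self.registered = []

    def probe(self):
        # Empty (None) slots are skipped instead of being registered.
        self.registered = [c for c in self.clks if c is not None]

    def lookup(self, index):
        clk = self.clks[index]
        if clk is None:
            raise LookupError(f"no clock at index {index}")
        return clk


gcc = ToyGcc([None, "pll8_vote"])  # the PXO_SRC slot starts out empty
gcc.probe()                        # probe sees nothing strange in that slot
gcc.clks[PXO_SRC] = "pxo"          # filled in manually after the probe
print(gcc.lookup(PXO_SRC))
```

An old DTS that still points at PXO_SRC then resolves to a valid entry, while the probe itself never had to handle it.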

This is so wrong, but I still think it's the only way to permit the conversion...

FUN THING.


Also an interesting thing is how the mux state is left... the secondary mux is set in a strange state...
cpu0 is set to pll8 (384 MHz), l2 and cpu1 to pxo (25 MHz). While this doesn't make any difference as we always source out of qsb, it's still strange...

Anyway, I don't know why, but with these changes I get these values from mbw. 900 MiB/s and 190 MiB/s are a record, I assume...

root@OpenWrt:/# ./mbw 32
Long uses 4 bytes. Allocating 2*8388608 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.03592        MiB: 32.00000   Copy: 890.918 MiB/s
1       Method: MEMCPY  Elapsed: 0.03582        MiB: 32.00000   Copy: 893.306 MiB/s
2       Method: MEMCPY  Elapsed: 0.03576        MiB: 32.00000   Copy: 894.755 MiB/s
3       Method: MEMCPY  Elapsed: 0.03586        MiB: 32.00000   Copy: 892.409 MiB/s
4       Method: MEMCPY  Elapsed: 0.03562        MiB: 32.00000   Copy: 898.246 MiB/s
5       Method: MEMCPY  Elapsed: 0.03562        MiB: 32.00000   Copy: 898.271 MiB/s
6       Method: MEMCPY  Elapsed: 0.03578        MiB: 32.00000   Copy: 894.454 MiB/s
7       Method: MEMCPY  Elapsed: 0.03559        MiB: 32.00000   Copy: 899.230 MiB/s
8       Method: MEMCPY  Elapsed: 0.03563        MiB: 32.00000   Copy: 898.094 MiB/s
9       Method: MEMCPY  Elapsed: 0.03667        MiB: 32.00000   Copy: 872.648 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.03583        MiB: 32.00000   Copy: 893.171 MiB/s
0       Method: DUMB    Elapsed: 0.17263        MiB: 32.00000   Copy: 185.364 MiB/s
1       Method: DUMB    Elapsed: 0.17124        MiB: 32.00000   Copy: 186.874 MiB/s
2       Method: DUMB    Elapsed: 0.17121        MiB: 32.00000   Copy: 186.906 MiB/s
3       Method: DUMB    Elapsed: 0.17123        MiB: 32.00000   Copy: 186.878 MiB/s
4       Method: DUMB    Elapsed: 0.17134        MiB: 32.00000   Copy: 186.768 MiB/s
5       Method: DUMB    Elapsed: 0.17170        MiB: 32.00000   Copy: 186.369 MiB/s
6       Method: DUMB    Elapsed: 0.17261        MiB: 32.00000   Copy: 185.391 MiB/s
7       Method: DUMB    Elapsed: 0.17114        MiB: 32.00000   Copy: 186.980 MiB/s
8       Method: DUMB    Elapsed: 0.17120        MiB: 32.00000   Copy: 186.916 MiB/s
9       Method: DUMB    Elapsed: 0.17119        MiB: 32.00000   Copy: 186.924 MiB/s
AVG     Method: DUMB    Elapsed: 0.17155        MiB: 32.00000   Copy: 186.535 MiB/s
0       Method: MCBLOCK Elapsed: 0.03591        MiB: 32.00000   Copy: 891.141 MiB/s
1       Method: MCBLOCK Elapsed: 0.03593        MiB: 32.00000   Copy: 890.720 MiB/s
2       Method: MCBLOCK Elapsed: 0.03604        MiB: 32.00000   Copy: 888.001 MiB/s
3       Method: MCBLOCK Elapsed: 0.03597        MiB: 32.00000   Copy: 889.680 MiB/s
4       Method: MCBLOCK Elapsed: 0.03585        MiB: 32.00000   Copy: 892.683 MiB/s
5       Method: MCBLOCK Elapsed: 0.03589        MiB: 32.00000   Copy: 891.688 MiB/s
6       Method: MCBLOCK Elapsed: 0.03691        MiB: 32.00000   Copy: 866.997 MiB/s
7       Method: MCBLOCK Elapsed: 0.03574        MiB: 32.00000   Copy: 895.355 MiB/s
8       Method: MCBLOCK Elapsed: 0.03810        MiB: 32.00000   Copy: 839.829 MiB/s
9       Method: MCBLOCK Elapsed: 0.03589        MiB: 32.00000   Copy: 891.737 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03622        MiB: 32.00000   Copy: 883.465 MiB/s

Wait, wth? Reverting qsb to a rate of 1 (aka making the sec mux source out of pll8_vote), I get these results??

root@OpenWrt:/# ./mbw 32
Long uses 4 bytes. Allocating 2*8388608 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.04374        MiB: 32.00000   Copy: 731.596 MiB/s
1       Method: MEMCPY  Elapsed: 0.04379        MiB: 32.00000   Copy: 730.827 MiB/s
2       Method: MEMCPY  Elapsed: 0.04359        MiB: 32.00000   Copy: 734.130 MiB/s
3       Method: MEMCPY  Elapsed: 0.04368        MiB: 32.00000   Copy: 732.601 MiB/s
4       Method: MEMCPY  Elapsed: 0.04389        MiB: 32.00000   Copy: 729.079 MiB/s
5       Method: MEMCPY  Elapsed: 0.04573        MiB: 32.00000   Copy: 699.698 MiB/s
6       Method: MEMCPY  Elapsed: 0.04379        MiB: 32.00000   Copy: 730.827 MiB/s
7       Method: MEMCPY  Elapsed: 0.04775        MiB: 32.00000   Copy: 670.199 MiB/s
8       Method: MEMCPY  Elapsed: 0.04370        MiB: 32.00000   Copy: 732.332 MiB/s
9       Method: MEMCPY  Elapsed: 0.04637        MiB: 32.00000   Copy: 690.072 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.04460        MiB: 32.00000   Copy: 717.455 MiB/s
0       Method: DUMB    Elapsed: 0.17335        MiB: 32.00000   Copy: 184.597 MiB/s
1       Method: DUMB    Elapsed: 0.17300        MiB: 32.00000   Copy: 184.969 MiB/s
2       Method: DUMB    Elapsed: 0.17345        MiB: 32.00000   Copy: 184.494 MiB/s
3       Method: DUMB    Elapsed: 0.17314        MiB: 32.00000   Copy: 184.816 MiB/s
4       Method: DUMB    Elapsed: 0.17484        MiB: 32.00000   Copy: 183.020 MiB/s
5       Method: DUMB    Elapsed: 0.17862        MiB: 32.00000   Copy: 179.154 MiB/s
6       Method: DUMB    Elapsed: 0.17318        MiB: 32.00000   Copy: 184.777 MiB/s
7       Method: DUMB    Elapsed: 0.17310        MiB: 32.00000   Copy: 184.866 MiB/s
8       Method: DUMB    Elapsed: 0.17301        MiB: 32.00000   Copy: 184.960 MiB/s
9       Method: DUMB    Elapsed: 0.17298        MiB: 32.00000   Copy: 184.988 MiB/s
AVG     Method: DUMB    Elapsed: 0.17387        MiB: 32.00000   Copy: 184.048 MiB/s
0       Method: MCBLOCK Elapsed: 0.04373        MiB: 32.00000   Copy: 731.713 MiB/s
1       Method: MCBLOCK Elapsed: 0.04411        MiB: 32.00000   Copy: 725.476 MiB/s
2       Method: MCBLOCK Elapsed: 0.04579        MiB: 32.00000   Copy: 698.797 MiB/s
3       Method: MCBLOCK Elapsed: 0.04390        MiB: 32.00000   Copy: 728.846 MiB/s
4       Method: MCBLOCK Elapsed: 0.04786        MiB: 32.00000   Copy: 668.589 MiB/s
5       Method: MCBLOCK Elapsed: 0.04388        MiB: 32.00000   Copy: 729.262 MiB/s
6       Method: MCBLOCK Elapsed: 0.04683        MiB: 32.00000   Copy: 683.337 MiB/s
7       Method: MCBLOCK Elapsed: 0.04389        MiB: 32.00000   Copy: 729.112 MiB/s
8       Method: MCBLOCK Elapsed: 0.04376        MiB: 32.00000   Copy: 731.211 MiB/s
9       Method: MCBLOCK Elapsed: 0.04386        MiB: 32.00000   Copy: 729.528 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04476        MiB: 32.00000   Copy: 714.881 MiB/s
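
To put the two runs side by side, the AVG lines can be compared directly. The snippet below is a small sketch using only the averages printed above; the dict and function names are made up. It shows MEMCPY and MCBLOCK roughly 24% faster in the first run, while DUMB differs by only about 1%.

```python
# AVG copy rates in MiB/s, taken from the two mbw runs above.
first_run = {"MEMCPY": 893.171, "DUMB": 186.535, "MCBLOCK": 883.465}
second_run = {"MEMCPY": 717.455, "DUMB": 184.048, "MCBLOCK": 714.881}


def percent_faster(a, b):
    """How much faster rate a is than rate b, in percent."""
    return (a / b - 1.0) * 100.0


gains = {
    method: percent_faster(first_run[method], second_run[method])
    for method in first_run
}

for method, gain in gains.items():
    print(f"{method:8s} {gain:5.1f}% faster in the first run")
```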

Ok, just stumbled on your reply as I did not get a notification.
Let's just say that I hate the notion that DT bindings are something written in stone that must not be broken.
I think that just creates more issues than the issue it tries to solve.

Yeah, I hate it too... especially if only one device is supported... but I understand that they want to minimize any problems for downstream projects (which target other stuff and are probably based on ancient kernel versions, so the complaint is a bit strange)

Anyway, the main problem is that there can be a delay where the clk part is merged in linux master and the dts only some days later, and that causes a broken platform. They said that right after the dts change is merged I can drop all the compatibility workarounds. (But I don't know if I will, honestly...)

Can someone do a quick test and comment out the cpu idle nodes in ipq8064.dtsi and ipq8065.dtsi?

I'm doing a build and can load it likely tomorrow if that helps. What, where from, and how?

Never mind, sorry, it's already disabled for r7800....

@anon98444528 actually I may have something to test, can you wait 3 minutes?

i'm around and can make some time

sorry again, you must be hating me... I'm checking complex code, better wait for some broader testing... (unless you want to test your router with a single cpu and probably very bad perf ahahha)

Can someone remember what the problem was with idle state on ipq8065? And why we had that disabled?

(I have some ideas that the problem is idle state... we have that disabled but still it's misconfigured currently)

which one, wfi or spc? If you are referring back to 4.19, I don't think anyone figured out why spc did not work on ipq8065 (my last suggestions here)

AFAIK, wfi works on ipq8065. Both wfi and spc work for me on ipq8064.