R7800 cache scaling issue

We already have the voltage, I need to check on what is set by default by the rpm...

not just the DTS, the code doesn't support the L2 cache scaling with a different freq to the core.

BTW you've done great work on this, been snowed under IRL and have not had the energy to work on the L2 stuff. i'm sure all the ipq806x guys out there are very appreciative.

I certainly appreciate all the hard work done by everyone in openwrt... I need an "openwrt for dummies" book to understand it all though.

I have a short question to @facboy and @Ansuel

Does anybody know whether the L2 scaling bug was fixed in 19.07.x?
Maybe it was fixed in 'master' or some other custom build?
I mean the bug discussed above, where the L2 frequency scales linearly with the CPU frequency instead of following the proper implementation (according to the above post and the Code Aurora 'powerctl' comments):

  • 384 MHz when idle
  • 1.0 GHz at 600, 800, 1000
  • 1.2 GHz at 1400, 1725
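
The list above can be written down as a small lookup, which makes it easier to compare against what a running system reports. This is only a sketch of the table as quoted in this thread: the name `expected_l2_khz` is made up for illustration, and treating the 384 MHz CPU step as "idle" is an assumption.

```python
# Expected L2 rate per CPU frequency, as quoted in this thread.
# Values are in kHz, the unit cpufreq uses in sysfs.
# Assumption: "idle" means the cores sit at the 384 MHz step.
EXPECTED_L2_KHZ = {
    384000: 384000,
    600000: 1000000,
    800000: 1000000,
    1000000: 1000000,
    1400000: 1200000,
    1725000: 1200000,
}


def expected_l2_khz(cpu_khz):
    """Return the L2 rate the list expects, or None for an unknown CPU step."""
    return EXPECTED_L2_KHZ.get(cpu_khz)


if __name__ == "__main__":
    for cpu in sorted(EXPECTED_L2_KHZ):
        print("cpu %7d kHz -> L2 %7d kHz" % (cpu, expected_l2_khz(cpu)))
```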

PS: I briefly tested my fresh 19.07 installation and found that the L2 cache frequency is pinned to the minimum supported value of 384 MHz while the CPU runs at 600 MHz and 800 MHz.

That was a very disappointing discovery.

IMHO such an artificial slowdown of the L2 cache frequency can have an extremely negative impact not only on memory bandwidth but also on soft-IRQ processing performance.

It would be interesting to measure IRQ processing speed for the different L2 cache frequencies.
By comparing the results it would be possible to evaluate the impact on system latency and responsiveness introduced by the forced down-clocking of the L2 cache at the 600/800 MHz CPU frequencies.
The only problem is that I do not know how to measure soft-IRQ performance, or what tool could be used for such tests.
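
One rough starting point, not a real latency measurement, is to snapshot the kernel's `/proc/softirqs` counters before and after a fixed workload at each L2 frequency and compare the deltas. The sketch below only parses that file's text. The function names are made up for illustration, and it assumes a Python 3 interpreter is available on the router. Counts say how many soft-IRQs ran, not how long each one took.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs text into {name: [count_cpu0, count_cpu1, ...]}."""
    counts = {}
    for line in text.splitlines()[1:]:  # the first line is the CPU header
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "NAME: n n ..." row
        counts[name.strip()] = [int(v) for v in rest.split()]
    return counts


def softirq_delta(before, after):
    """Total per-type increase (summed over CPUs) between two snapshots."""
    return {name: sum(after[name]) - sum(before.get(name, []))
            for name in after}
```

In use, one would read `/proc/softirqs` into a string, run the same workload for the same duration at each CPU/L2 setting, read the file again, and compare the `NET_RX` deltas together with the throughput reached.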

19.07 / ondemand / CPU freq: min 600 max 600

L2 cache clock (clk_rate, Hz)                       Idle          Under max load

/sys/kernel/debug/clk/acpu_l2_aux/clk_rate          384000000     384000000
/sys/kernel/debug/clk/hfpll_l2/clk_rate             1200000000    1200000000
/sys/kernel/debug/clk/hfpll_l2_div/clk_rate         600000000     600000000
/sys/kernel/debug/clk/krait_l2_pri_mux/clk_rate     384000000   > 384000000 <
/sys/kernel/debug/clk/krait_l2_sec_mux/clk_rate     384000000     384000000

19.07 / ondemand / CPU freq: min 800 max 800

L2 cache clock (clk_rate, Hz)                       Idle          Under max load

/sys/kernel/debug/clk/acpu_l2_aux/clk_rate          384000000     384000000
/sys/kernel/debug/clk/hfpll_l2/clk_rate             1200000000    1200000000
/sys/kernel/debug/clk/hfpll_l2_div/clk_rate         600000000     600000000
/sys/kernel/debug/clk/krait_l2_pri_mux/clk_rate     384000000   > 384000000 <
/sys/kernel/debug/clk/krait_l2_sec_mux/clk_rate     384000000     384000000
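
For anyone who wants to collect the same readings repeatedly, here is a minimal sketch that reads the `clk_rate` files listed above. It assumes debugfs is mounted at `/sys/kernel/debug`, that it runs as root, and that a Python 3 interpreter is available on the device. The function name `read_clk_rates` is made up for illustration.

```python
import os

# Clock names taken from the listings above.
L2_CLOCKS = [
    "acpu_l2_aux",
    "hfpll_l2",
    "hfpll_l2_div",
    "krait_l2_pri_mux",
    "krait_l2_sec_mux",
]


def read_clk_rates(names=L2_CLOCKS, base="/sys/kernel/debug/clk"):
    """Return {clock_name: rate in Hz, or None if the file cannot be read}."""
    rates = {}
    for name in names:
        path = os.path.join(base, name, "clk_rate")
        try:
            with open(path) as f:
                rates[name] = int(f.read().strip())
        except (OSError, ValueError):
            rates[name] = None  # missing clock, no debugfs, or no permission
    return rates


if __name__ == "__main__":
    for name, hz in read_clk_rates().items():
        print(name, "n/a" if hz is None else "%d MHz" % (hz // 1000000))
```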

i only ever run some variant of master. but given that i posted in dec 19 and 19.07 would have been cut in jul 19, i would guess it's never been in 19.07 even if it is in master. pretty sure @Ansuel merged some variant of it into master when we updated to kernel 5.4.

the current cpufreq driver addresses 2 main problems

  1. the cache can't run at 384 while another core is at 1 ghz (so everything must be at 384)
  2. any transition (e.g. from 384 to 1.4) first sets the low value (384) before scaling

in theory this should fix the problems... if you still find problems, they could be caused by 1 and 2, which should then be removed
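
To make the two points above concrete, here is a rough sketch of how I read them. It is not the actual driver code, and the names `l2_target_khz`, `l2_steps` and `busy_l2_khz` are hypothetical.

```python
IDLE_L2_KHZ = 384000  # lowest L2 step mentioned in this thread


def l2_target_khz(core_khz, busy_l2_khz):
    """Point 1: the L2 may sit at the idle rate only if every core is idle.

    core_khz is a list with the current rate of each core;
    busy_l2_khz is whatever L2 rate would otherwise be picked.
    """
    if all(f <= IDLE_L2_KHZ for f in core_khz):
        return IDLE_L2_KHZ
    return busy_l2_khz


def l2_steps(current_khz, target_khz):
    """Point 2: go through the idle rate before moving to a new rate."""
    if target_khz == current_khz:
        return []  # nothing to do
    if IDLE_L2_KHZ in (current_khz, target_khz):
        return [target_khz]  # already at idle, or idle is the destination
    return [IDLE_L2_KHZ, target_khz]
```

With this reading, a change from 1.0 GHz to 1.2 GHz on the L2 becomes two clock writes (384 MHz, then 1.2 GHz), which is the "set twice" behaviour discussed in this thread.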

does someone want to do some testing in their free time?
I need to test something

The test is to build a custom image with a modded cpufreq driver
Then use mbw and test if the cache scaling issue is present

I can do it.

@facboy can you give me a hint if you notice a problem in this output?
This is a running system with wifi and everything, watching Twitch right now

root@No-Lag-Router:/tmp/openwrt-r7800-freq-test-master# ./set_scaling_max_freq.sh -m 384 600 800 1000 1400 1725 800 1000 1400 1725 600 600 600 1000 1000 1000 1000
Setting scaling_max_freq to 384000
cpu0 cur_freq: 384000
cpu1 cur_freq: 384000
AVG     Method: MEMCPY  Elapsed: 0.05134        MiB: 16.00000   Copy: 311.631 MiB/s
AVG     Method: DUMB    Elapsed: 0.52978        MiB: 16.00000   Copy: 30.201 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04965        MiB: 16.00000   Copy: 322.228 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG     Method: MEMCPY  Elapsed: 0.03255        MiB: 16.00000   Copy: 491.490 MiB/s
AVG     Method: DUMB    Elapsed: 0.32122        MiB: 16.00000   Copy: 49.810 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03198        MiB: 16.00000   Copy: 500.375 MiB/s
Setting scaling_max_freq to 800000
cpu0 cur_freq: 800000
cpu1 cur_freq: 800000
AVG     Method: MEMCPY  Elapsed: 0.02887        MiB: 16.00000   Copy: 554.299 MiB/s
AVG     Method: DUMB    Elapsed: 0.20144        MiB: 16.00000   Copy: 79.429 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03076        MiB: 16.00000   Copy: 520.178 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.02973        MiB: 16.00000   Copy: 538.112 MiB/s
AVG     Method: DUMB    Elapsed: 0.16321        MiB: 16.00000   Copy: 98.035 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.02929        MiB: 16.00000   Copy: 546.183 MiB/s
Setting scaling_max_freq to 1400000
cpu0 cur_freq: 1400000
cpu1 cur_freq: 1400000
AVG     Method: MEMCPY  Elapsed: 0.02492        MiB: 16.00000   Copy: 642.116 MiB/s
AVG     Method: DUMB    Elapsed: 0.11526        MiB: 16.00000   Copy: 138.813 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.02461        MiB: 16.00000   Copy: 650.219 MiB/s
Setting scaling_max_freq to 1725000
cpu0 cur_freq: 1725000
cpu1 cur_freq: 1725000
AVG     Method: MEMCPY  Elapsed: 0.03135        MiB: 16.00000   Copy: 510.325 MiB/s
AVG     Method: DUMB    Elapsed: 0.09918        MiB: 16.00000   Copy: 161.319 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03332        MiB: 16.00000   Copy: 480.195 MiB/s
Setting scaling_max_freq to 800000
cpu0 cur_freq: 800000
cpu1 cur_freq: 800000
AVG     Method: MEMCPY  Elapsed: 0.04237        MiB: 16.00000   Copy: 377.643 MiB/s
AVG     Method: DUMB    Elapsed: 0.22685        MiB: 16.00000   Copy: 70.530 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04008        MiB: 16.00000   Copy: 399.178 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.03900        MiB: 16.00000   Copy: 410.208 MiB/s
AVG     Method: DUMB    Elapsed: 0.16905        MiB: 16.00000   Copy: 94.649 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03789        MiB: 16.00000   Copy: 422.317 MiB/s
Setting scaling_max_freq to 1400000
cpu0 cur_freq: 1400000
cpu1 cur_freq: 1400000
AVG     Method: MEMCPY  Elapsed: 0.03301        MiB: 16.00000   Copy: 484.709 MiB/s
AVG     Method: DUMB    Elapsed: 0.11744        MiB: 16.00000   Copy: 136.242 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03103        MiB: 16.00000   Copy: 515.647 MiB/s
Setting scaling_max_freq to 1725000
cpu0 cur_freq: 1725000
cpu1 cur_freq: 1725000
AVG     Method: MEMCPY  Elapsed: 0.03229        MiB: 16.00000   Copy: 495.514 MiB/s
AVG     Method: DUMB    Elapsed: 0.10060        MiB: 16.00000   Copy: 159.047 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03152        MiB: 16.00000   Copy: 507.580 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG     Method: MEMCPY  Elapsed: 0.04720        MiB: 16.00000   Copy: 338.967 MiB/s
AVG     Method: DUMB    Elapsed: 0.32533        MiB: 16.00000   Copy: 49.181 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04495        MiB: 16.00000   Copy: 355.930 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG     Method: MEMCPY  Elapsed: 0.04518        MiB: 16.00000   Copy: 354.121 MiB/s
AVG     Method: DUMB    Elapsed: 0.33433        MiB: 16.00000   Copy: 47.857 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04542        MiB: 16.00000   Copy: 352.284 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG     Method: MEMCPY  Elapsed: 0.04607        MiB: 16.00000   Copy: 347.312 MiB/s
AVG     Method: DUMB    Elapsed: 0.33129        MiB: 16.00000   Copy: 48.296 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04627        MiB: 16.00000   Copy: 345.764 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.03835        MiB: 16.00000   Copy: 417.214 MiB/s
AVG     Method: DUMB    Elapsed: 0.17045        MiB: 16.00000   Copy: 93.869 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03944        MiB: 16.00000   Copy: 405.730 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.03937        MiB: 16.00000   Copy: 406.352 MiB/s
AVG     Method: DUMB    Elapsed: 0.16515        MiB: 16.00000   Copy: 96.882 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03940        MiB: 16.00000   Copy: 406.064 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.03762        MiB: 16.00000   Copy: 425.328 MiB/s
AVG     Method: DUMB    Elapsed: 0.16697        MiB: 16.00000   Copy: 95.825 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03704        MiB: 16.00000   Copy: 432.010 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG     Method: MEMCPY  Elapsed: 0.03988        MiB: 16.00000   Copy: 401.227 MiB/s
AVG     Method: DUMB    Elapsed: 0.16731        MiB: 16.00000   Copy: 95.631 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.03849        MiB: 16.00000   Copy: 415.641 MiB/s
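
Output like the above is easier to compare once it is in a table. The sketch below pulls the `Setting scaling_max_freq to ...` and `AVG` lines into a list. The function name `parse_run` is made up, and the regular expressions only assume the line layout visible in this output.

```python
import re

SET_RE = re.compile(r"Setting scaling_max_freq to (\d+)")
AVG_RE = re.compile(
    r"AVG\s+Method:\s+(\w+)\s+Elapsed:\s+([\d.]+)"
    r"\s+MiB:\s+([\d.]+)\s+Copy:\s+([\d.]+)\s+MiB/s"
)


def parse_run(text):
    """Return [(cpu_khz, {method: copy_mib_s}), ...] in the order tested."""
    steps = []
    for line in text.splitlines():
        m = SET_RE.search(line)
        if m:
            steps.append((int(m.group(1)), {}))
            continue
        m = AVG_RE.search(line)
        if m and steps:
            steps[-1][1][m.group(1)] = float(m.group(4))
    return steps


if __name__ == "__main__":
    # First step of the output above, pasted as a sample.
    sample = """Setting scaling_max_freq to 384000
cpu0 cur_freq: 384000
cpu1 cur_freq: 384000
AVG     Method: MEMCPY  Elapsed: 0.05134        MiB: 16.00000   Copy: 311.631 MiB/s
AVG     Method: DUMB    Elapsed: 0.52978        MiB: 16.00000   Copy: 30.201 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04965        MiB: 16.00000   Copy: 322.228 MiB/s
"""
    for khz, rates in parse_run(sample):
        print(khz, rates)
```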

what i'm testing here is whether the problem was caused by the cpu running at a low freq with the cache at 1ghz+, with the workaround that sets the cache to the idle freq before scaling removed

what governor is that on? if it's on performance it's not right, at 1.4 and 1.7g it should be getting 650+MB/s, 130+MB/s, 650+MB/s or so. all those results around 400M are not right.

this is on mine:

t# ./set_scaling_max_freq.sh -m 384 600 800 1000 1400 1725 800 1000 1400 1725 600 600 600 1000 1000 1000 1000
Setting scaling_max_freq to 384000
cpu0 cur_freq: 384000
cpu1 cur_freq: 384000
AVG	Method: MEMCPY	Elapsed: 0.04424	MiB: 16.00000	Copy: 361.664 MiB/s
AVG	Method: DUMB	Elapsed: 0.47620	MiB: 16.00000	Copy: 33.599 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04474	MiB: 16.00000	Copy: 357.624 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG	Method: MEMCPY	Elapsed: 0.03243	MiB: 16.00000	Copy: 493.338 MiB/s
AVG	Method: DUMB	Elapsed: 0.28509	MiB: 16.00000	Copy: 56.123 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02988	MiB: 16.00000	Copy: 535.511 MiB/s
Setting scaling_max_freq to 800000
cpu0 cur_freq: 800000
cpu1 cur_freq: 800000
AVG	Method: MEMCPY	Elapsed: 0.02820	MiB: 16.00000	Copy: 567.446 MiB/s
AVG	Method: DUMB	Elapsed: 0.18499	MiB: 16.00000	Copy: 86.490 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.03003	MiB: 16.00000	Copy: 532.785 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.03252	MiB: 16.00000	Copy: 491.938 MiB/s
AVG	Method: DUMB	Elapsed: 0.17978	MiB: 16.00000	Copy: 89.000 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.03760	MiB: 16.00000	Copy: 425.492 MiB/s
Setting scaling_max_freq to 1400000
cpu0 cur_freq: 1400000
cpu1 cur_freq: 1400000
AVG	Method: MEMCPY	Elapsed: 0.02701	MiB: 16.00000	Copy: 592.470 MiB/s
AVG	Method: DUMB	Elapsed: 0.11439	MiB: 16.00000	Copy: 139.877 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02552	MiB: 16.00000	Copy: 626.954 MiB/s
Setting scaling_max_freq to 1725000
cpu0 cur_freq: 1725000
cpu1 cur_freq: 1725000
AVG	Method: MEMCPY	Elapsed: 0.02529	MiB: 16.00000	Copy: 632.656 MiB/s
AVG	Method: DUMB	Elapsed: 0.09081	MiB: 16.00000	Copy: 176.198 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02388	MiB: 16.00000	Copy: 670.036 MiB/s
Setting scaling_max_freq to 800000
cpu0 cur_freq: 800000
cpu1 cur_freq: 800000
AVG	Method: MEMCPY	Elapsed: 0.02979	MiB: 16.00000	Copy: 537.086 MiB/s
AVG	Method: DUMB	Elapsed: 0.19294	MiB: 16.00000	Copy: 82.925 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02814	MiB: 16.00000	Copy: 568.553 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.02736	MiB: 16.00000	Copy: 584.729 MiB/s
AVG	Method: DUMB	Elapsed: 0.15168	MiB: 16.00000	Copy: 105.488 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02761	MiB: 16.00000	Copy: 579.473 MiB/s
Setting scaling_max_freq to 1400000
cpu0 cur_freq: 1400000
cpu1 cur_freq: 1400000
AVG	Method: MEMCPY	Elapsed: 0.02674	MiB: 16.00000	Copy: 598.435 MiB/s
AVG	Method: DUMB	Elapsed: 0.11472	MiB: 16.00000	Copy: 139.475 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02686	MiB: 16.00000	Copy: 595.746 MiB/s
Setting scaling_max_freq to 1725000
cpu0 cur_freq: 1725000
cpu1 cur_freq: 1725000
AVG	Method: MEMCPY	Elapsed: 0.02407	MiB: 16.00000	Copy: 664.866 MiB/s
AVG	Method: DUMB	Elapsed: 0.09042	MiB: 16.00000	Copy: 176.943 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02439	MiB: 16.00000	Copy: 656.007 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG	Method: MEMCPY	Elapsed: 0.03071	MiB: 16.00000	Copy: 521.006 MiB/s
AVG	Method: DUMB	Elapsed: 0.28277	MiB: 16.00000	Copy: 56.582 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.03052	MiB: 16.00000	Copy: 524.272 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG	Method: MEMCPY	Elapsed: 0.03290	MiB: 16.00000	Copy: 486.356 MiB/s
AVG	Method: DUMB	Elapsed: 0.30361	MiB: 16.00000	Copy: 52.700 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.03295	MiB: 16.00000	Copy: 485.533 MiB/s
Setting scaling_max_freq to 600000
cpu0 cur_freq: 600000
cpu1 cur_freq: 600000
AVG	Method: MEMCPY	Elapsed: 0.02993	MiB: 16.00000	Copy: 534.652 MiB/s
AVG	Method: DUMB	Elapsed: 0.28420	MiB: 16.00000	Copy: 56.299 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.03007	MiB: 16.00000	Copy: 532.019 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.02928	MiB: 16.00000	Copy: 546.448 MiB/s
AVG	Method: DUMB	Elapsed: 0.15735	MiB: 16.00000	Copy: 101.682 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02844	MiB: 16.00000	Copy: 562.580 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.02876	MiB: 16.00000	Copy: 556.251 MiB/s
AVG	Method: DUMB	Elapsed: 0.16007	MiB: 16.00000	Copy: 99.957 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02920	MiB: 16.00000	Copy: 547.893 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.02874	MiB: 16.00000	Copy: 556.777 MiB/s
AVG	Method: DUMB	Elapsed: 0.16041	MiB: 16.00000	Copy: 99.746 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02908	MiB: 16.00000	Copy: 550.165 MiB/s
Setting scaling_max_freq to 1000000
cpu0 cur_freq: 1000000
cpu1 cur_freq: 1000000
AVG	Method: MEMCPY	Elapsed: 0.02750	MiB: 16.00000	Copy: 581.820 MiB/s
AVG	Method: DUMB	Elapsed: 0.15203	MiB: 16.00000	Copy: 105.245 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.02770	MiB: 16.00000	Copy: 577.657 MiB/s
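
One way to compare the two runs above is to look at how much the result for the same CPU frequency moves between visits. The sketch below does that for the MEMCPY averages at 600 MHz, with the values copied from the two outputs in this thread. The helper name `spread_by_freq` is made up for illustration.

```python
def spread_by_freq(samples):
    """samples: [(cpu_khz, mib_s), ...] -> {cpu_khz: (min, max, max / min)}."""
    by_freq = {}
    for khz, rate in samples:
        by_freq.setdefault(khz, []).append(rate)
    return {khz: (min(v), max(v), max(v) / min(v))
            for khz, v in by_freq.items()}


# MEMCPY AVG values at 600 MHz, in the order they appear in each run.
first_run = [(600000, 491.490), (600000, 338.967),
             (600000, 354.121), (600000, 347.312)]
second_run = [(600000, 493.338), (600000, 521.006),
              (600000, 486.356), (600000, 534.652)]

for label, run in (("first run", first_run), ("second run", second_run)):
    low, high, ratio = spread_by_freq(run)[600000]
    print("%s: min %.1f  max %.1f  max/min %.2f" % (label, low, high, ratio))
```

In the first run the 600 MHz result drops by roughly 30% once the higher frequencies have been visited, while the second run stays within about 10%.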

is this with load or not?

some load, nothing v heavy.

EDIT: mind you this is an NSS build too.

oh wow i notice only now that i'm running the cache at 1.2ghz and not 1.4ghz... can 200mhz make that much of a difference?

can't understand why i can't reach the 650+

root@No-Lag-Router:/etc/config/openwrt-r7800-freq-test-master# ./mbw 32
Long uses 4 bytes. Allocating 2*8388608 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.04169        MiB: 32.00000   Copy: 767.497 MiB/s
1       Method: MEMCPY  Elapsed: 0.04292        MiB: 32.00000   Copy: 745.608 MiB/s
2       Method: MEMCPY  Elapsed: 0.04208        MiB: 32.00000   Copy: 760.492 MiB/s
3       Method: MEMCPY  Elapsed: 0.04288        MiB: 32.00000   Copy: 746.286 MiB/s
4       Method: MEMCPY  Elapsed: 0.04139        MiB: 32.00000   Copy: 773.059 MiB/s
5       Method: MEMCPY  Elapsed: 0.04228        MiB: 32.00000   Copy: 756.823 MiB/s
6       Method: MEMCPY  Elapsed: 0.04212        MiB: 32.00000   Copy: 759.698 MiB/s
7       Method: MEMCPY  Elapsed: 0.04335        MiB: 32.00000   Copy: 738.110 MiB/s
8       Method: MEMCPY  Elapsed: 0.04192        MiB: 32.00000   Copy: 763.450 MiB/s
9       Method: MEMCPY  Elapsed: 0.05502        MiB: 32.00000   Copy: 581.586 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.04357        MiB: 32.00000   Copy: 734.521 MiB/s
0       Method: DUMB    Elapsed: 0.18382        MiB: 32.00000   Copy: 174.083 MiB/s
1       Method: DUMB    Elapsed: 0.18301        MiB: 32.00000   Copy: 174.851 MiB/s
2       Method: DUMB    Elapsed: 0.18235        MiB: 32.00000   Copy: 175.486 MiB/s
3       Method: DUMB    Elapsed: 0.18195        MiB: 32.00000   Copy: 175.872 MiB/s
4       Method: DUMB    Elapsed: 0.18399        MiB: 32.00000   Copy: 173.920 MiB/s
5       Method: DUMB    Elapsed: 0.18321        MiB: 32.00000   Copy: 174.666 MiB/s
6       Method: DUMB    Elapsed: 0.18171        MiB: 32.00000   Copy: 176.100 MiB/s
7       Method: DUMB    Elapsed: 0.18234        MiB: 32.00000   Copy: 175.492 MiB/s
8       Method: DUMB    Elapsed: 0.18172        MiB: 32.00000   Copy: 176.095 MiB/s
9       Method: DUMB    Elapsed: 0.18312        MiB: 32.00000   Copy: 174.749 MiB/s
AVG     Method: DUMB    Elapsed: 0.18272        MiB: 32.00000   Copy: 175.128 MiB/s
0       Method: MCBLOCK Elapsed: 0.04140        MiB: 32.00000   Copy: 772.891 MiB/s
1       Method: MCBLOCK Elapsed: 0.04190        MiB: 32.00000   Copy: 763.687 MiB/s
2       Method: MCBLOCK Elapsed: 0.04164        MiB: 32.00000   Copy: 768.547 MiB/s
3       Method: MCBLOCK Elapsed: 0.04181        MiB: 32.00000   Copy: 765.331 MiB/s
4       Method: MCBLOCK Elapsed: 0.04121        MiB: 32.00000   Copy: 776.586 MiB/s
5       Method: MCBLOCK Elapsed: 0.04190        MiB: 32.00000   Copy: 763.760 MiB/s
6       Method: MCBLOCK Elapsed: 0.04130        MiB: 32.00000   Copy: 774.800 MiB/s
7       Method: MCBLOCK Elapsed: 0.04104        MiB: 32.00000   Copy: 779.651 MiB/s
8       Method: MCBLOCK Elapsed: 0.04162        MiB: 32.00000   Copy: 768.861 MiB/s
9       Method: MCBLOCK Elapsed: 0.04527        MiB: 32.00000   Copy: 706.807 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04191        MiB: 32.00000   Copy: 763.546 MiB/s
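
Note that run 9 of the MEMCPY test above is a clear outlier, and it pulls the AVG line down. A quick sketch comparing mean and median of the per-run values (copied from the output above) shows how much:

```python
import statistics

# Per-run MEMCPY copy rates (MiB/s) from the mbw output above.
memcpy_runs = [767.497, 745.608, 760.492, 746.286, 773.059,
               756.823, 759.698, 738.110, 763.450, 581.586]

mean = statistics.mean(memcpy_runs)
median = statistics.median(memcpy_runs)

print("mean   %.1f MiB/s" % mean)
print("median %.1f MiB/s" % median)
```

The median is less sensitive to a single slow run, so it may be the better number to quote when the box is also busy doing other things.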

ok....

No luck... it seems the workaround is still needed but i think there is a wrong implementation in the qcom clk driver... i will have some fun using their bad implementation and checking if this solves the workaround....

I honestly don't really like the fact that to scale the cpu cache, it must be set twice...

not sure it isn't required though, there is some comment somewhere in one of the qsdk drivers about needing a pulldown before switching clock speeds. it's what made me try it in the first place.

can you find that comment?

not until at least the weekend. it was ages ago :(.

For the sake of science/statistics.

Linux 5.10.37 (DSA) - ondemand

AVG     Method: MEMCPY  Elapsed: 0.04424        MiB: 32.00000   Copy: 723.300 MiB/s
AVG     Method: DUMB    Elapsed: 0.18647        MiB: 32.00000   Copy: 171.613 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.05025        MiB: 32.00000   Copy: 636.863 MiB/s

The above results were from a good run tho', generally it's closer to the 4.14.151 results.

Linux 5.10.37 (DSA) - performance

AVG     Method: MEMCPY  Elapsed: 0.04284        MiB: 32.00000   Copy: 747.013 MiB/s
AVG     Method: DUMB    Elapsed: 0.18988        MiB: 32.00000   Copy: 168.527 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04380        MiB: 32.00000   Copy: 730.529 MiB/s

It really shines here, not sure if it's the kernel change or something you did @ansuel.

Linux 5.10.37 (DSA) - under high cpu load - (iperf3 pumping packets)

AVG     Method: MEMCPY  Elapsed: 0.06352        MiB: 32.00000   Copy: 503.764 MiB/s
AVG     Method: DUMB    Elapsed: 0.21935        MiB: 32.00000   Copy: 145.883 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.06311        MiB: 32.00000   Copy: 507.028 MiB/s