R7800 cache scaling issue

the patch improves memory performance significantly; I tested it with the same software as @facboy and can confirm the gain.

Does it translate into a higher throughput?

not with cake, or if it does, then not by a noticeable amount.

What did improve throughput dramatically was this:
# 3 is the bitmask 0b11, i.e. both cores (CPU0 and CPU1)
for file in /sys/class/net/*
do
    echo 3 > "$file/queues/rx-0/rps_cpus"
    echo 3 > "$file/queues/tx-0/xps_cpus"
done
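
The value 3 in the loop above is a hex bitmask selecting CPU0 and CPU1. A minimal sketch of the same idea that derives the mask from the core count and skips interfaces that lack these files. The names `cpu_mask` and `set_steering` and the optional sysfs root argument are my own, not part of any OpenWrt script:

```shell
#!/bin/sh
# cpu_mask N: print the hex bitmask covering CPUs 0..N-1 (2 -> 3, 4 -> f).
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# set_steering MASK [ROOT]: write MASK to rx-0/rps_cpus and tx-0/xps_cpus of
# every interface under ROOT (default /sys/class/net), skipping missing files.
set_steering() {
    mask=$1
    root=${2:-/sys/class/net}
    for dev in "$root"/*; do
        for f in "$dev/queues/rx-0/rps_cpus" "$dev/queues/tx-0/xps_cpus"; do
            [ -w "$f" ] && echo "$mask" > "$f"
        done
    done
    return 0
}
```

On a dual-core router this would be called as `set_steering "$(cpu_mask 2)"` (as root). Like the original loop it only touches queue 0 of each interface, and the setting does not survive a reboot.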

that looks like it doesn't apply cleanly to 4.19. this is really a 4.14 patch tbh, it just deletes a bunch of stuff that i think has now been merged to 4.19 and replaces it. would need to look at vanilla 4.19 and work out what to change again.

i don't really know how to swap the cpufreq-dt driver to a specific driver for ipq806x...quite a lot of cpufreq-dt is still used. i don't really want to copy/paste the whole thing. i was planning to move all the L2 stuff to a specific driver, but still call it from cpufreq-dt (like the fab_scaling atm).

obviously i reworked it to apply on 4.19 and it compiles fine

Only problem is this... also, is the fix you posted about L2 scaling included in your repo?

sorry what i meant is that it doesn't 'work' properly on 4.19, i'm sure you fixed it to compile :). from what i remember a lot of stuff around krait L2 and the voltage management has changed in 4.19, and the patch set on 4.14 is an 'old' version that was never applied. i assumed that a completely different patch would be needed on 4.19, so i gave up trying to work out what was going on with the 4.14 patch set and just continued with dissent's approach of deleting and replacing it.

you mean the fix so that it transitions to the "correct" L2 speed? yes that is in my repo. i remember seeing references to it in one of the patch sets i think, it talks about switching the PLL on L2 transition to the secondary or something (which is 384MHz).

ok can confirm the L2 scaling is still busted on 4.19 after building your PR. i will look into a fix when i have time.

Thanks, interestingly this also helps other multi-core arm routers along (in my case an mvebu turris omnia).

@facboy searching through the mess of L2 cache patches i found THIS

port to linux-4.19 and running normally

the link is broken but LUCKILY i found it

Will check it and compare them... As we are finally putting some work into kernel 4.19 on master i think we can finally implement this...


With the original PR ported to 4.19 i got this result

root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.07640        MiB: 32.00000   Copy: 418.856 MiB/s
AVG     Method: DUMB    Elapsed: 0.46355        MiB: 32.00000   Copy: 69.032 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.08433        MiB: 32.00000   Copy: 379.459 MiB/s

I modified the patch with your changes and here is the report

root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.05278        MiB: 32.00000   Copy: 606.316 MiB/s
AVG     Method: DUMB    Elapsed: 0.49566        MiB: 32.00000   Copy: 64.561 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.05122        MiB: 32.00000   Copy: 624.724 MiB/s
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.06195        MiB: 32.00000   Copy: 516.577 MiB/s
AVG     Method: DUMB    Elapsed: 0.51844        MiB: 32.00000   Copy: 61.724 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.05544        MiB: 32.00000   Copy: 577.208 MiB/s
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG     Method: MEMCPY  Elapsed: 0.06237        MiB: 32.00000   Copy: 513.046 MiB/s
AVG     Method: DUMB    Elapsed: 0.49267        MiB: 32.00000   Copy: 64.952 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04944        MiB: 32.00000   Copy: 647.219 MiB/s

Also can you tell me why the mutex lock is there?
I added a check because i noticed that new_l2_volt may never be updated (which causes a problem when setting the voltage regulator)

	/* Only touch the regulator when new_l2_volt was actually updated */
	if (new_l2_volt > 0) {
		ret = regulator_set_voltage_tol(l2_regulator, new_l2_volt, tol);
		if (ret)
			goto l2_reg_fail;
	}

Did you test without the new L2 scaling code? I remember the patch slowing down the router (throughput with SQM and cake) by 30% in my tests.

actually no... can you test for me?

Also the slowdown was caused by a bug in the patch... I tested with max frequency and the results stay high

With 4.19? I cannot take the router out of service without putting my life at risk :)

Is that the IF statement in the post above?

No, test it with 4.14 (the patchset hasn't changed), I mean a stock build without this patch.
The bug was in the transition directly from the old frequency to the new one... It looks like we first need to set the idle frequency and only then set the new frequency.

The if statement fixes a problem with kernel 4.19

I will do that later today. Assuming there are no major performance differences between the kernels, the results should be transferable.
I will test on the 19.07-SNAPSHOT build that I am running.

The stock build seems to be faster...

I have this unscientific feeling that throughput is higher when running on CPU0 and lower when running on CPU1. Is there a way to pin a process to a specific CPU?
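
One way to check this is `taskset`, which exists in util-linux and as a BusyBox applet (I'm assuming it is enabled in the build on the router). A sketch that runs a command pinned to one core; `affinity_mask` and `bench_on_cpu` are hypothetical helper names:

```shell
#!/bin/sh
# affinity_mask CPU: print the hex affinity mask with only that CPU's bit set
# (CPU0 -> 0x1, CPU1 -> 0x2).
affinity_mask() {
    printf '0x%x\n' $(( 1 << $1 ))
}

# bench_on_cpu CPU CMD...: run CMD pinned to the given CPU number.
bench_on_cpu() {
    cpu=$1
    shift
    echo "== CPU$cpu =="
    taskset "$(affinity_mask "$cpu")" "$@"
}

# Example, to be run on the router:
# bench_on_cpu 0 ./mbw 32 | grep -e '==' -e AVG
# bench_on_cpu 1 ./mbw 32 | grep -e '==' -e AVG
```

If the CPU0 and CPU1 numbers differ consistently across several runs, that would support the feeling above; if not, the variation is more likely down to frequency scaling.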

    echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
    echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
    sleep 1
    echo 1750000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 1750000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
while true; do ./mbw 32 | grep AVG; done
AVG Method: MEMCPY Elapsed: 0.05188 MiB: 32.00000 Copy: 616.839 MiB/s
AVG Method: DUMB Elapsed: 0.18568 MiB: 32.00000 Copy: 172.335 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04571 MiB: 32.00000 Copy: 700.046 MiB/s

AVG Method: MEMCPY Elapsed: 0.04487 MiB: 32.00000 Copy: 713.208 MiB/s
AVG Method: DUMB Elapsed: 0.19181 MiB: 32.00000 Copy: 166.831 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04665 MiB: 32.00000 Copy: 686.021 MiB/s

AVG Method: MEMCPY Elapsed: 0.04886 MiB: 32.00000 Copy: 654.923 MiB/s
AVG Method: DUMB Elapsed: 0.18831 MiB: 32.00000 Copy: 169.937 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04642 MiB: 32.00000 Copy: 689.294 MiB/s

AVG Method: MEMCPY Elapsed: 0.05121 MiB: 32.00000 Copy: 624.866 MiB/s
AVG Method: DUMB Elapsed: 0.20197 MiB: 32.00000 Copy: 158.439 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05017 MiB: 32.00000 Copy: 637.890 MiB/s

AVG Method: MEMCPY Elapsed: 0.05931 MiB: 32.00000 Copy: 539.583 MiB/s
AVG Method: DUMB Elapsed: 0.24462 MiB: 32.00000 Copy: 130.814 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04943 MiB: 32.00000 Copy: 647.329 MiB/s

AVG Method: MEMCPY Elapsed: 0.04622 MiB: 32.00000 Copy: 692.332 MiB/s
AVG Method: DUMB Elapsed: 0.19109 MiB: 32.00000 Copy: 167.461 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04570 MiB: 32.00000 Copy: 700.242 MiB/s

AVG Method: MEMCPY Elapsed: 0.04622 MiB: 32.00000 Copy: 692.380 MiB/s
AVG Method: DUMB Elapsed: 0.19142 MiB: 32.00000 Copy: 167.171 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04497 MiB: 32.00000 Copy: 711.622 MiB/s

AVG Method: MEMCPY Elapsed: 0.04687 MiB: 32.00000 Copy: 682.801 MiB/s
AVG Method: DUMB Elapsed: 0.20309 MiB: 32.00000 Copy: 157.567 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04563 MiB: 32.00000 Copy: 701.282 MiB/s

AVG Method: MEMCPY Elapsed: 0.04839 MiB: 32.00000 Copy: 661.266 MiB/s
AVG Method: DUMB Elapsed: 0.18894 MiB: 32.00000 Copy: 169.368 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04539 MiB: 32.00000 Copy: 704.998 MiB/s

AVG Method: MEMCPY Elapsed: 0.04668 MiB: 32.00000 Copy: 685.542 MiB/s
AVG Method: DUMB Elapsed: 0.18796 MiB: 32.00000 Copy: 170.251 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04534 MiB: 32.00000 Copy: 705.783 MiB/s

AVG Method: MEMCPY Elapsed: 0.04464 MiB: 32.00000 Copy: 716.785 MiB/s
AVG Method: DUMB Elapsed: 0.18786 MiB: 32.00000 Copy: 170.341 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04567 MiB: 32.00000 Copy: 700.604 MiB/s
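
Since individual runs vary quite a bit, it may be easier to compare builds on per-method averages rather than by eye. A sketch using awk on `AVG` lines in the format shown above; `mbw_summary` is my own name, not part of mbw:

```shell
#!/bin/sh
# mbw_summary: read mbw "AVG" lines on stdin and print, per method,
# the number of runs plus the mean and minimum copy rate in MiB/s.
mbw_summary() {
    awk '
    $1 == "AVG" {
        m = $3                  # method name: MEMCPY, DUMB or MCBLOCK
        rate = $(NF - 1) + 0    # copy rate is the field before "MiB/s"
        n[m]++
        sum[m] += rate
        if (!(m in min) || rate < min[m]) min[m] = rate
    }
    END {
        for (m in n)
            printf "%s runs=%d mean=%.1f min=%.1f\n", m, n[m], sum[m] / n[m], min[m]
    }' | sort
}
```

For example, `for i in 1 2 3 4 5 6 7 8 9 10; do ./mbw 32 | grep AVG; done | mbw_summary` gives one line per method. The minimum is included because the occasional slow run is what this thread is mostly about.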

was thinking the same thing re: pinning to a cpu...

master-4.14-stock, with an interesting performance run
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04440	MiB: 32.00000	Copy: 720.708 MiB/s
AVG	Method: DUMB	Elapsed: 0.19026	MiB: 32.00000	Copy: 168.190 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04474	MiB: 32.00000	Copy: 715.312 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04445	MiB: 32.00000	Copy: 719.938 MiB/s
AVG	Method: DUMB	Elapsed: 0.19087	MiB: 32.00000	Copy: 167.658 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04481	MiB: 32.00000	Copy: 714.193 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh ondemand
[root@syno-rt2600ac /mbw-l2-scalingtest 53°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04533	MiB: 32.00000	Copy: 705.989 MiB/s
AVG	Method: DUMB	Elapsed: 0.19149	MiB: 32.00000	Copy: 167.107 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04530	MiB: 32.00000	Copy: 706.339 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04511	MiB: 32.00000	Copy: 709.421 MiB/s
AVG	Method: DUMB	Elapsed: 0.19192	MiB: 32.00000	Copy: 166.738 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04521	MiB: 32.00000	Copy: 707.860 MiB/s


>>>>>>>>>>>>>
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07416	MiB: 32.00000	Copy: 431.526 MiB/s
AVG	Method: DUMB	Elapsed: 0.20964	MiB: 32.00000	Copy: 152.643 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07472	MiB: 32.00000	Copy: 428.245 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07404	MiB: 32.00000	Copy: 432.189 MiB/s
AVG	Method: DUMB	Elapsed: 0.20878	MiB: 32.00000	Copy: 153.269 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07582	MiB: 32.00000	Copy: 422.069 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07639	MiB: 32.00000	Copy: 418.882 MiB/s
AVG	Method: DUMB	Elapsed: 0.20901	MiB: 32.00000	Copy: 153.104 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07409	MiB: 32.00000	Copy: 431.881 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07407	MiB: 32.00000	Copy: 432.005 MiB/s
AVG	Method: DUMB	Elapsed: 0.20792	MiB: 32.00000	Copy: 153.903 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07433	MiB: 32.00000	Copy: 430.509 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07436	MiB: 32.00000	Copy: 430.331 MiB/s
AVG	Method: DUMB	Elapsed: 0.20926	MiB: 32.00000	Copy: 152.919 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07646	MiB: 32.00000	Copy: 418.498 MiB/s
>>>>>>>>>>>>>


[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh ondemand
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04608	MiB: 32.00000	Copy: 694.392 MiB/s
AVG	Method: DUMB	Elapsed: 0.19306	MiB: 32.00000	Copy: 165.747 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04627	MiB: 32.00000	Copy: 691.629 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.07214	MiB: 32.00000	Copy: 443.589 MiB/s
AVG	Method: DUMB	Elapsed: 0.21197	MiB: 32.00000	Copy: 150.961 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.07222	MiB: 32.00000	Copy: 443.085 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04543	MiB: 32.00000	Copy: 704.331 MiB/s
AVG	Method: DUMB	Elapsed: 0.19170	MiB: 32.00000	Copy: 166.925 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04542	MiB: 32.00000	Copy: 704.490 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04475	MiB: 32.00000	Copy: 715.007 MiB/s
AVG	Method: DUMB	Elapsed: 0.19062	MiB: 32.00000	Copy: 167.869 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04494	MiB: 32.00000	Copy: 712.062 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04942	MiB: 32.00000	Copy: 647.570 MiB/s
AVG	Method: DUMB	Elapsed: 0.19690	MiB: 32.00000	Copy: 162.517 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.05529	MiB: 32.00000	Copy: 578.731 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04502	MiB: 32.00000	Copy: 710.822 MiB/s
AVG	Method: DUMB	Elapsed: 0.19053	MiB: 32.00000	Copy: 167.954 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04597	MiB: 32.00000	Copy: 696.158 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04478	MiB: 32.00000	Copy: 714.665 MiB/s
AVG	Method: DUMB	Elapsed: 0.19011	MiB: 32.00000	Copy: 168.326 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04504	MiB: 32.00000	Copy: 710.416 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04487	MiB: 32.00000	Copy: 713.143 MiB/s
AVG	Method: DUMB	Elapsed: 0.18991	MiB: 32.00000	Copy: 168.499 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04521	MiB: 32.00000	Copy: 707.863 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG	Method: MEMCPY	Elapsed: 0.04484	MiB: 32.00000	Copy: 713.687 MiB/s
AVG	Method: DUMB	Elapsed: 0.19013	MiB: 32.00000	Copy: 168.304 MiB/s
AVG	Method: MCBLOCK	Elapsed: 0.04521	MiB: 32.00000	Copy: 707.752 MiB/s

these results are with which build?

If it's stock then it's confirmed that there is a problem with scaling... it's strange that we get 400 MiB/s with the performance governor

Also note that my tests are done with SoftEther VPN running... i sometimes see an average of 700 MiB/s, but the good part is that i never saw 300-400 MiB/s

Can you pls use the steps from my post just above yours to set the performance governor? There were posts in the past saying that the frequencies are not properly set unless a specific sequence of steps is followed.
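
For anyone repeating this, the same sequence (performance governor, cap the max frequency low, wait, raise it again) can be wrapped in a small function so it is applied identically on every run. This is only a sketch: `kick_cpufreq` and the optional root argument are hypothetical, and the 800000/1750000 values are simply the ones used earlier in this thread:

```shell
#!/bin/sh
# kick_cpufreq LOW HIGH [ROOT]: for every cpufreq policy under ROOT
# (default /sys/devices/system/cpu/cpufreq), select the performance governor
# and cap scaling_max_freq at LOW; after one second raise the cap to HIGH.
kick_cpufreq() {
    low=$1
    high=$2
    root=${3:-/sys/devices/system/cpu/cpufreq}
    for p in "$root"/policy*; do
        [ -d "$p" ] || continue
        echo performance > "$p/scaling_governor"
        echo "$low" > "$p/scaling_max_freq"
    done
    sleep 1
    for p in "$root"/policy*; do
        [ -d "$p" ] || continue
        echo "$high" > "$p/scaling_max_freq"
    done
}

# Example, as root on the router:
# kick_cpufreq 800000 1750000
```

Reading back `scaling_cur_freq` in each policy directory afterwards is a quick way to see which frequency cpufreq reports before starting a benchmark.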

in my tests one is with ondemand... the other is with the minimum scaling frequency set to the maximum

I am confused: the stock build seems to be delivering better memory throughput (700 MiB/s vs 600 MiB/s). What is the problem then?