The patch improves memory performance significantly. I tested it using the same software as @facboy and can confirm the improvement.
Does it translate into a higher throughput?
Not with cake, or if it does, then not by a noticeable amount.
What did improve throughput dramatically was this:
for file in /sys/class/net/*
do
    # 3 is a hex CPU bitmask: CPU0 and CPU1
    echo 3 > "$file/queues/rx-0/rps_cpus"
    echo 3 > "$file/queues/tx-0/xps_cpus"
done
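The value 3 written above is a hexadecimal CPU bitmask (bit 0 is CPU0, bit 1 is CPU1), so on a dual-core router it lets both cores take part in receive and transmit packet steering. A minimal sketch of how such a mask could be derived for other core selections; cpu_mask is a made-up helper name, not part of any existing tool:

```shell
#!/bin/sh
# cpu_mask: hypothetical helper that prints the hex bitmask for the
# CPU numbers given as arguments (CPU0 -> bit 0, CPU1 -> bit 1, ...).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0 1      # both cores of a dual-core SoC
cpu_mask 1        # CPU1 only
```

The printed value is what would be echoed into rps_cpus or xps_cpus in the loop above.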
that looks like it doesn't apply cleanly to 4.19. this is really a 4.14 patch tbh, it just deletes a bunch of stuff that i think has now been merged to 4.19 and replaces it. would need to look at vanilla 4.19 and work out what to change again.
i don't really know how to swap the cpufreq-dt driver to a specific driver for ipq806x...quite a lot of cpufreq-dt is still used. i don't really want to copy/paste the whole thing. i was planning to move all the L2 stuff to a specific driver, but still call it from cpufreq-dt (like the fab_scaling atm).
Obviously I reworked it to apply on 4.19, and it compiles fine.
The only problem is this... also, is the fix you posted about L2 scaling included in your repo?
sorry what i meant is that it doesn't 'work' properly on 4.19, i'm sure you fixed it to compile :). from what i remember a lot of stuff around krait L2 and the voltage management has changed in 4.19, and the patch set on 4.14 is an 'old' version that was never applied. i assumed that a completely different patch would be needed on 4.19, so i gave up trying to work out what was going on with the 4.14 patch set and just continued with dissent's approach of deleting and replacing it.
you mean the fix so that it transitions to the "correct" L2 speed? yes that is in my repo. i remember seeing references to it in one of the patch sets i think, it talks about switching the PLL on L2 transition to the secondary or something (which is 384MHz).
ok can confirm the L2 scaling is still busted on 4.19 after building your PR. i will look into a fix when i have time.
Thanks, interestingly this also helps other multi-core arm routers along (in my case an mvebu turris omnia).
@facboy, while searching through the mess around the L2 cache I found THIS
ported to linux-4.19 and running normally
The link is broken, but LUCKILY I found it
Will check it and compare them... As we are finally putting some work into getting kernel 4.19 into master, I think we can finally implement this...
With the original PR ported to 4.19 I got this result:
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07640 MiB: 32.00000 Copy: 418.856 MiB/s
AVG Method: DUMB Elapsed: 0.46355 MiB: 32.00000 Copy: 69.032 MiB/s
AVG Method: MCBLOCK Elapsed: 0.08433 MiB: 32.00000 Copy: 379.459 MiB/s
I modified the patch with your changes, and here is the report:
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.05278 MiB: 32.00000 Copy: 606.316 MiB/s
AVG Method: DUMB Elapsed: 0.49566 MiB: 32.00000 Copy: 64.561 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05122 MiB: 32.00000 Copy: 624.724 MiB/s
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.06195 MiB: 32.00000 Copy: 516.577 MiB/s
AVG Method: DUMB Elapsed: 0.51844 MiB: 32.00000 Copy: 61.724 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05544 MiB: 32.00000 Copy: 577.208 MiB/s
root@No-Lag-Router:/tmp# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.06237 MiB: 32.00000 Copy: 513.046 MiB/s
AVG Method: DUMB Elapsed: 0.49267 MiB: 32.00000 Copy: 64.952 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04944 MiB: 32.00000 Copy: 647.219 MiB/s
Also, can you tell me why the mutex lock is there?
I added a check because I noticed that new_l2_volt may never be updated (which then causes problems when setting the voltage regulator):
if (new_l2_volt > 0) {
	ret = regulator_set_voltage_tol(l2_regulator, new_l2_volt, tol);
	if (ret) {
		goto l2_reg_fail;
	}
}
Did you test without the new L2 scaling code? I remember the patch slowing down the router (throughput with SQM and cake) by 30% in my tests.
actually no... can you test for me?
Also, the slowdown was caused by a bug in the patch... I tested with max frequency and the results stay high
With 4.19? I cannot take the router out of service without putting my life at risk
Is that the IF statement in the post above?
No, test it with 4.14 (the patchset hasn't changed). I mean a stock build without this patch.
The bug was the direct transition from the old frequency to the new one... It looks like we first need to set the idle frequency and only then set the new frequency.
The if statement fixes a problem with kernel 4.19
I will do that later today. Assuming there are no major performance differences between the kernels, the results should be transferable.
I will test in 19.07-SNAPSHOT that I am running.
The stock seems to be faster....
I have this unscientific feeling that the higher throughput is observed when running on CPU0 and slower when running on CPU1. Is there a way to pin a process to the CPU?
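One way to pin a process is taskset, which sets the CPU affinity of the command it launches. A minimal sketch, assuming taskset is available on the router (it may have to be installed separately) and using the plain hex mask form; affinity_mask and run_pinned are made-up helper names:

```shell
#!/bin/sh
# affinity_mask: hypothetical helper, prints the hex affinity mask
# that selects exactly one CPU (CPU0 -> 1, CPU1 -> 2, ...).
affinity_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# run_pinned: hypothetical wrapper that runs a command on one CPU only.
# Usage: run_pinned <cpu> <command> [args...]
run_pinned() {
    cpu=$1
    shift
    if ! command -v taskset >/dev/null 2>&1; then
        echo "taskset not found" >&2
        return 127
    fi
    taskset "$(affinity_mask "$cpu")" "$@"
}

# Example (on the router): same benchmark, once per core.
# run_pinned 0 ./mbw 32 | grep AVG
# run_pinned 1 ./mbw 32 | grep AVG
```

If the CPU0 run is consistently faster than the CPU1 run, that would support the suspicion above; if both match, the variation comes from somewhere else.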
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
sleep 1
echo 1750000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1750000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
while true; do ./mbw 32 | grep AVG; done
AVG Method: MEMCPY Elapsed: 0.05188 MiB: 32.00000 Copy: 616.839 MiB/s
AVG Method: DUMB Elapsed: 0.18568 MiB: 32.00000 Copy: 172.335 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04571 MiB: 32.00000 Copy: 700.046 MiB/s
AVG Method: MEMCPY Elapsed: 0.04487 MiB: 32.00000 Copy: 713.208 MiB/s
AVG Method: DUMB Elapsed: 0.19181 MiB: 32.00000 Copy: 166.831 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04665 MiB: 32.00000 Copy: 686.021 MiB/s
AVG Method: MEMCPY Elapsed: 0.04886 MiB: 32.00000 Copy: 654.923 MiB/s
AVG Method: DUMB Elapsed: 0.18831 MiB: 32.00000 Copy: 169.937 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04642 MiB: 32.00000 Copy: 689.294 MiB/s
AVG Method: MEMCPY Elapsed: 0.05121 MiB: 32.00000 Copy: 624.866 MiB/s
AVG Method: DUMB Elapsed: 0.20197 MiB: 32.00000 Copy: 158.439 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05017 MiB: 32.00000 Copy: 637.890 MiB/s
AVG Method: MEMCPY Elapsed: 0.05931 MiB: 32.00000 Copy: 539.583 MiB/s
AVG Method: DUMB Elapsed: 0.24462 MiB: 32.00000 Copy: 130.814 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04943 MiB: 32.00000 Copy: 647.329 MiB/s
AVG Method: MEMCPY Elapsed: 0.04622 MiB: 32.00000 Copy: 692.332 MiB/s
AVG Method: DUMB Elapsed: 0.19109 MiB: 32.00000 Copy: 167.461 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04570 MiB: 32.00000 Copy: 700.242 MiB/s
AVG Method: MEMCPY Elapsed: 0.04622 MiB: 32.00000 Copy: 692.380 MiB/s
AVG Method: DUMB Elapsed: 0.19142 MiB: 32.00000 Copy: 167.171 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04497 MiB: 32.00000 Copy: 711.622 MiB/s
AVG Method: MEMCPY Elapsed: 0.04687 MiB: 32.00000 Copy: 682.801 MiB/s
AVG Method: DUMB Elapsed: 0.20309 MiB: 32.00000 Copy: 157.567 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04563 MiB: 32.00000 Copy: 701.282 MiB/s
AVG Method: MEMCPY Elapsed: 0.04839 MiB: 32.00000 Copy: 661.266 MiB/s
AVG Method: DUMB Elapsed: 0.18894 MiB: 32.00000 Copy: 169.368 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04539 MiB: 32.00000 Copy: 704.998 MiB/s
AVG Method: MEMCPY Elapsed: 0.04668 MiB: 32.00000 Copy: 685.542 MiB/s
AVG Method: DUMB Elapsed: 0.18796 MiB: 32.00000 Copy: 170.251 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04534 MiB: 32.00000 Copy: 705.783 MiB/s
AVG Method: MEMCPY Elapsed: 0.04464 MiB: 32.00000 Copy: 716.785 MiB/s
AVG Method: DUMB Elapsed: 0.18786 MiB: 32.00000 Copy: 170.341 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04567 MiB: 32.00000 Copy: 700.604 MiB/s
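Comparing long runs like the one above by eye is error-prone, so it may help to reduce them to one mean per method. A minimal sketch, assuming the input lines have exactly the "AVG Method: ... Copy: ... MiB/s" layout shown in this thread; avg_mbw is a made-up helper name:

```shell
#!/bin/sh
# avg_mbw: hypothetical helper that reads mbw "AVG" lines on stdin and
# prints the mean Copy rate and the number of runs for each method.
# Field 3 is the method name, field 9 is the Copy rate in MiB/s.
avg_mbw() {
    awk '$1 == "AVG" && $8 == "Copy:" {
             sum[$3] += $9
             n[$3]++
         }
         END {
             for (m in sum)
                 printf "%s %.1f MiB/s (n=%d)\n", m, sum[m] / n[m], n[m]
         }'
}

# Example (on the router): ten runs, then one summary line per method.
# for i in 1 2 3 4 5 6 7 8 9 10; do ./mbw 32 | grep AVG; done | avg_mbw
```

The order of the summary lines is not fixed, since awk does not guarantee an iteration order for arrays.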
was thinking the same thing re: pinning to a cpu...
master-4.14-stock w interest performance run
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04440 MiB: 32.00000 Copy: 720.708 MiB/s
AVG Method: DUMB Elapsed: 0.19026 MiB: 32.00000 Copy: 168.190 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04474 MiB: 32.00000 Copy: 715.312 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04445 MiB: 32.00000 Copy: 719.938 MiB/s
AVG Method: DUMB Elapsed: 0.19087 MiB: 32.00000 Copy: 167.658 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04481 MiB: 32.00000 Copy: 714.193 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh ondemand
[root@syno-rt2600ac /mbw-l2-scalingtest 53°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04533 MiB: 32.00000 Copy: 705.989 MiB/s
AVG Method: DUMB Elapsed: 0.19149 MiB: 32.00000 Copy: 167.107 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04530 MiB: 32.00000 Copy: 706.339 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04511 MiB: 32.00000 Copy: 709.421 MiB/s
AVG Method: DUMB Elapsed: 0.19192 MiB: 32.00000 Copy: 166.738 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04521 MiB: 32.00000 Copy: 707.860 MiB/s
>>>>>>>>>>>>>
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07416 MiB: 32.00000 Copy: 431.526 MiB/s
AVG Method: DUMB Elapsed: 0.20964 MiB: 32.00000 Copy: 152.643 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07472 MiB: 32.00000 Copy: 428.245 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07404 MiB: 32.00000 Copy: 432.189 MiB/s
AVG Method: DUMB Elapsed: 0.20878 MiB: 32.00000 Copy: 153.269 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07582 MiB: 32.00000 Copy: 422.069 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07639 MiB: 32.00000 Copy: 418.882 MiB/s
AVG Method: DUMB Elapsed: 0.20901 MiB: 32.00000 Copy: 153.104 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07409 MiB: 32.00000 Copy: 431.881 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07407 MiB: 32.00000 Copy: 432.005 MiB/s
AVG Method: DUMB Elapsed: 0.20792 MiB: 32.00000 Copy: 153.903 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07433 MiB: 32.00000 Copy: 430.509 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07436 MiB: 32.00000 Copy: 430.331 MiB/s
AVG Method: DUMB Elapsed: 0.20926 MiB: 32.00000 Copy: 152.919 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07646 MiB: 32.00000 Copy: 418.498 MiB/s
>>>>>>>>>>>>>
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./set_scaling_governor.sh ondemand
[root@syno-rt2600ac /mbw-l2-scalingtest 55°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04608 MiB: 32.00000 Copy: 694.392 MiB/s
AVG Method: DUMB Elapsed: 0.19306 MiB: 32.00000 Copy: 165.747 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04627 MiB: 32.00000 Copy: 691.629 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.07214 MiB: 32.00000 Copy: 443.589 MiB/s
AVG Method: DUMB Elapsed: 0.21197 MiB: 32.00000 Copy: 150.961 MiB/s
AVG Method: MCBLOCK Elapsed: 0.07222 MiB: 32.00000 Copy: 443.085 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 56°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04543 MiB: 32.00000 Copy: 704.331 MiB/s
AVG Method: DUMB Elapsed: 0.19170 MiB: 32.00000 Copy: 166.925 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04542 MiB: 32.00000 Copy: 704.490 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./set_scaling_governor.sh performance
[root@syno-rt2600ac /mbw-l2-scalingtest 54°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04475 MiB: 32.00000 Copy: 715.007 MiB/s
AVG Method: DUMB Elapsed: 0.19062 MiB: 32.00000 Copy: 167.869 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04494 MiB: 32.00000 Copy: 712.062 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04942 MiB: 32.00000 Copy: 647.570 MiB/s
AVG Method: DUMB Elapsed: 0.19690 MiB: 32.00000 Copy: 162.517 MiB/s
AVG Method: MCBLOCK Elapsed: 0.05529 MiB: 32.00000 Copy: 578.731 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04502 MiB: 32.00000 Copy: 710.822 MiB/s
AVG Method: DUMB Elapsed: 0.19053 MiB: 32.00000 Copy: 167.954 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04597 MiB: 32.00000 Copy: 696.158 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04478 MiB: 32.00000 Copy: 714.665 MiB/s
AVG Method: DUMB Elapsed: 0.19011 MiB: 32.00000 Copy: 168.326 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04504 MiB: 32.00000 Copy: 710.416 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04487 MiB: 32.00000 Copy: 713.143 MiB/s
AVG Method: DUMB Elapsed: 0.18991 MiB: 32.00000 Copy: 168.499 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04521 MiB: 32.00000 Copy: 707.863 MiB/s
[root@syno-rt2600ac /mbw-l2-scalingtest 57°]# ./mbw 32 | grep AVG
AVG Method: MEMCPY Elapsed: 0.04484 MiB: 32.00000 Copy: 713.687 MiB/s
AVG Method: DUMB Elapsed: 0.19013 MiB: 32.00000 Copy: 168.304 MiB/s
AVG Method: MCBLOCK Elapsed: 0.04521 MiB: 32.00000 Copy: 707.752 MiB/s
Which build are these results from?
If it's stock, then that confirms there is a problem with scaling... it's strange that we get about 400 MiB/s with the performance governor
Also note that my tests were done with SoftEther VPN running... I sometimes see averages of 700 MiB/s, but the good part is that I never saw 300-400 MiB/s
Can you please use the steps from my post just above yours to set the performance governor? There were posts in the past saying that the frequencies are not properly set unless a specific sequence of steps is followed.
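To confirm that such a sequence actually took effect, the cpufreq sysfs files can be read back afterwards. A minimal sketch, assuming the standard per-policy files scaling_governor, scaling_cur_freq and scaling_max_freq; check_policies is a made-up helper name, and the directory argument exists only so the function can be tried against a copy of the tree:

```shell
#!/bin/sh
# check_policies: hypothetical helper that prints governor, current and
# maximum frequency for every cpufreq policy below the given directory
# (default: the live sysfs tree).
check_policies() {
    root=${1:-/sys/devices/system/cpu/cpufreq}
    for p in "$root"/policy*; do
        [ -d "$p" ] || continue
        printf '%s: governor=%s cur=%s max=%s\n' \
            "${p##*/}" \
            "$(cat "$p/scaling_governor")" \
            "$(cat "$p/scaling_cur_freq")" \
            "$(cat "$p/scaling_max_freq")"
    done
}

# Example (on the router), right after setting the governor:
# check_policies
```

This only reports the CPU core frequency; it says nothing about the L2 clock, which is what the patch under discussion changes.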
In my tests, one run is with ondemand... the other is with the minimum scaling frequency set to the maximum
I am confused: the stock seems to be delivering better memory throughput (700MB/s vs 600MB/s). What is the problem then?