R7500v2 kernel 4.19 test

@anon98444528

Wed Jun 26 02:51:50 2019 kern.notice kernel: [    1.546629] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations
Wed Jun 26 02:51:50 2019 kern.err kernel: [    1.547062] CPUidle arm: CPU 0 failed to init idle CPU ops. Error code: -6
Wed Jun 26 02:51:50 2019 kern.notice kernel: [    1.554347] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations
Wed Jun 26 02:51:50 2019 kern.err kernel: [    1.561193] CPUidle arm: CPU 1 failed to init idle CPU ops. Error code: -6

I think that's

#define	ENXIO		 6	/* No such device or address */

but I don't know why it's negative


Errors are always negative: kernel functions return -errno on failure, so ENXIO shows up as -6.
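As a rough userspace illustration of that sign convention (not kernel code; `fake_init` and `describe` are hypothetical names made up for this sketch), a function can return 0 on success and a negated errno constant on failure, and the caller flips the sign before looking up the message:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: mimics a kernel-style function that returns 0 on
 * success and a negative errno value on failure.
 */
static int fake_init(int device_present)
{
	return device_present ? 0 : -ENXIO;
}

/*
 * Hypothetical helper: turns a kernel-style return code into text.
 * strerror() expects a positive errno, so the sign is flipped first.
 */
static const char *describe(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling `describe(fake_init(0))` should give the same text as the comment next to the ENXIO define quoted above, which is how the -6 in the log maps back to "No such device or address".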

Anyway, yes, it's that... so I think they changed the specification in the dts?

You should take a look. I will need some time to think, and fresh eyes (it's time for me to sleep).


If you are willing to keep looking at the 4.19 cpuidle-arm.c driver, as opposed to other approaches like the dts files, I think it might be helpful to try a quick test to register the driver before initialization, as you suggested.

It would be helpful to know if the kernel errors go away and if cpuidle/current_driver shows something other than none.

Something like

...

	ret = cpuidle_register_driver(drv);
	if (ret) {
		if (ret != -EBUSY)
			pr_err("Failed to register cpuidle driver\n");
		goto out_kfree_drv;
	}

	/*
	 * Call arch CPU operations in order to initialize
	 * idle states suspend back-end specific data
	 */
	ret = arm_cpuidle_init(cpu);
...

No worries if you're busy or have lost interest. It'll just take me a while to do it myself and I'm still contemplating...

Actually, I tested that yesterday, and you just get a bootloop from a NULL pointer exception.

OK (and thank you), I think that means I need to do some reading and broaden what I'm looking at. I'll post back on this about what I find.

Also if you want to try "socializing" this a bit to see if the answer really is already known as I suspect please feel free...

I think we should write to the ARM kernel mailing list... IMHO this is "fixed" for some platforms and broken for others... or just broken for everybody...

I'm unfamiliar with and unknown on the arm kernel mailing list, but if you feel comfortable with that, then sure. I was thinking something more casual like irc...

quick update:
5+ days on 4.19.53 no new issues
upgraded and using 4.19.56 for just under a day w/o issues

no real progress understanding what's up with cpuidle

I did compile 4.19.56 with "CPU Idle Driver for QCOM processors" deselected (added by "0059-ARM-cpuidle-Add-cpuidle-support-for-QCOM-cpus.patch") but left "Generic ARM/ARM64 CPU idle Driver" selected - compiles and runs but same errors and "none" for current_driver.
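For anyone repeating this check, the "none" value is what the cpuidle sysfs attribute reports when no driver is registered. A minimal sketch of reading it from C follows; the helper names are hypothetical, and the usual path `/sys/devices/system/cpu/cpuidle/current_driver` is passed in by the caller rather than hard-coded, in case it differs on a given build:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: reads the first line of a sysfs-style attribute
 * into buf and strips the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: returns 1 if the attribute names a driver,
 * 0 if it reads "none", and -1 if it could not be read at all.
 */
static int cpuidle_driver_active(const char *path)
{
	char name[64];

	if (read_sysfs_line(path, name, sizeof(name)) != 0)
		return -1;
	return strcmp(name, "none") != 0;
}
```

On the router, a plain `cat` of the same file gives the same information; the C version is only useful if the check needs to be scripted into a test binary.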

the linux-arm-kernel mailing list is helpful as you can see the comments of the devs as they are making changes but I haven't found a good clue there yet...

the kernel docs on idle-states and the device tree specification docs (v1) are helpful to understand how things should work.

Right now I'm looking for what might generate ENXIO errors related to cpuidle, to try and trace the error back through the code. The comment

	/*
	 * SPM probe for the cpu should have happened by now, if the
	 * SPM device does not exist, return -ENXIO to indicate that the
	 * cpu does not support idle states.
	 */
check_spm:
	return per_cpu(cpu_spm_drv, cpu) ? 0 : -ENXIO;

in drivers/soc/qcom/spm.c under

static int __init qcom_cpuidle_init(struct device_node *cpu_node, int cpu)

looks promising but I'm still working through if this gets called from arm_cpuidle_init(cpu)

can we check if spm is actually working?


Notice that we have patch 0070 that "fixes" something in spm

change -ENXIO in spm.c to -788 and edit pr_err as before?
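The idea behind swapping in a distinctive value can be sketched in userspace like this. Everything here is hypothetical (`site_a`, `site_b`, `init_chain`, and the `DEBUG_SENTINEL` name); 788 is simply the number suggested above, assumed not to collide with any errno the code path normally returns:

```c
#include <errno.h>

/* Arbitrary marker value from the suggestion above; not a real errno. */
#define DEBUG_SENTINEL 788

/* Hypothetical failure site that still returns the ordinary error code. */
static int site_a(int fail)
{
	return fail ? -ENXIO : 0;
}

/* Hypothetical failure site patched to return the marker instead. */
static int site_b(int fail)
{
	return fail ? -DEBUG_SENTINEL : 0;
}

/*
 * Hypothetical init chain: the first failing site decides the return
 * value, so the caller's error print reveals which site was hit.
 */
static int init_chain(int fail_a, int fail_b)
{
	int ret = site_a(fail_a);

	if (ret)
		return ret;
	return site_b(fail_b);
}
```

If the boot log then shows "Error code: -788" instead of -6, the patched return is the one being taken; if it still shows -6, the ENXIO is coming from somewhere else.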

yeah, part of today I spent diffing 4.14 patches and 4.19 patches... still need to get back to that

Let's try removing that, as the crash could be caused by the wrong cpuidle driver being registered before initialization?

not sure I follow, but since it might take longer for you to explain it than try it, I'd say try it

Compiling... and I will check with git for changes to that file... hope they changed something between kernel 4.14 and 4.19.

I think we should fully debug that file... and check per_cpu(qcom_idle_ops, cpu); it could very well be that qcom_idle_ops is NULL.

The problem is finding what they changed in idle_ops.

"learning" kernel debugging during boot is also on my mind... too many learning curve items for me atm combined with old hardware, old brain - it seems like as soon as I dust off the cobwebs the spiders make new ones faster than I can keep up

OK, I have bloated the code with pr_warn LOL... let's see if it goes directly to the goto or actually initializes something.


Hmm, we should test whether the driver does the same thing with kernel 4.14...

Anyway, it goes directly to check_spm, so we only need to investigate

per_cpu(cpu_spm_drv, cpu)

@anon98444528 OK, I think our problem is in the spm probing...

spm_dev_probe is responsible for

per_cpu(cpu_spm_drv, cpu) = drv; and if this is still NULL when the check runs, the error is triggered
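To make that dependency concrete, here is a small userspace model of it, not the actual spm.c code. A plain array stands in for the kernel's per-CPU variable, `NR_CPUS` is assumed to be 2 to match the CPU 0 / CPU 1 log lines, and the `model_*` function names and the struct contents are hypothetical:

```c
#include <errno.h>
#include <stddef.h>

/* Assumption: two cores, matching the CPU 0 and CPU 1 errors in the log. */
#define NR_CPUS 2

/* Stand-in for the real driver data; the contents do not matter here. */
struct spm_driver_data {
	int placeholder;
};

/* Plain array modelling the per-CPU cpu_spm_drv pointer. */
static struct spm_driver_data *cpu_spm_drv[NR_CPUS];

/* Models the assignment a successful probe makes for one CPU. */
static void model_spm_probe(int cpu, struct spm_driver_data *drv)
{
	cpu_spm_drv[cpu] = drv;
}

/*
 * Models the check_spm label: succeed only if the probe has already
 * stored a pointer for this CPU, otherwise report -ENXIO.
 */
static int model_cpuidle_init(int cpu)
{
	return cpu_spm_drv[cpu] ? 0 : -ENXIO;
}
```

In this model, calling `model_cpuidle_init()` before `model_spm_probe()` gives -ENXIO for that CPU, which is the same symptom as in the log. So either the probe never runs, it fails before the assignment, or it runs too late.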
