I repeated the changes to the dts in this post above. This time there was no kernel oops on boot/reboot and no loss of wifi. This makes me doubt that I tried rebooting/power cycling after the wifi outage... but I'm pretty sure I did (bad nvram flash?).
In any event, it may not have been related and this might be worth testing (I'm using it now with functional wifi and thermal sensors). After the nvram flash only part of my settings were retained (I lost some files in ~/, easily restored). I have experienced this type of update strangeness before, so it may not be related. Or the 2KB after 0x904000 is used for something and is causing the nvram flashing inconsistencies.
If you're brave, would you mind testing these dts changes as well?
For the record, I also tried several other variants, none of which worked. If I expand the gcc mem range to 0x8000 and try several variants of placing the thermal sensor range in that range, I get the same error. This makes me think this is not a lack of memory issue but an inability to use the gcc range in 4.19 as was done in 4.14. The router will not boot if I make the gcc range size smaller - it wants to be 0x4000.
I made some attempts at using reserved-memory but the best result I got from this indicates I'll need to edit thermal sensor driver code to get further... and I don't know if it will work even if I do it correctly.
5+ days up and no obvious errors or issues; however, at least one R7800 user can't boot after building an image based on my k419 GitHub branch. I don't know the reason for this boot failure yet but I suspect the thermal sensor patch.
My best guess about why the thermal sensors fail on 4.19 but work on 4.14 has to do with how the current thermal sensor driver requests memory. The thermal sensor driver (init_common in drivers/thermal/qcom/tsens-common.c) calls devm_ioremap_resource (in lib/devres.c). Functions called from devm_ioremap_resource check whether another driver has already requested the memory region, which of course the gcc driver has. This might have worked in 4.14 but was likely a bug that has been fixed in 4.19 (I've seen a reference to a similar type of bug).
If all this speculation is relevant, one solution might be to keep the 4.14 device tree definition and change the devm_ioremap_resource call to devm_ioremap (also in lib/devres.c - I'll need to work out how to do the function args for this) (ref).
It will be 2-3 weeks before I'll have a chance to try this... unless someone else wants to have a go in the interim.
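To make the suspected failure mode concrete, here is a minimal userland sketch, not kernel code. It assumes the gcc driver has exclusively claimed 0x900000 + 0x4000 and that tsens then asks for a window at offset 0x3680 inside it. The names claim_and_map and map_only are hypothetical stand-ins for "request the region, then map it" and "map it without requesting it", and the 0x20 window size used in the usage note is made up for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a small table of exclusively claimed address ranges. */
struct claim {
    uint32_t start;
    uint32_t size;
    const char *owner;
};

#define MAX_CLAIMS 8
static struct claim claims[MAX_CLAIMS];
static size_t nclaims;

/* Do the half-open ranges [a, a+alen) and [b, b+blen) overlap? */
static int overlaps(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Hypothetical model of "request, then map": fails with -EBUSY if any
 * part of the range is already owned by someone else. */
int claim_and_map(uint32_t start, uint32_t size, const char *owner)
{
    for (size_t i = 0; i < nclaims; i++)
        if (overlaps(start, size, claims[i].start, claims[i].size))
            return -EBUSY;
    if (nclaims == MAX_CLAIMS)
        return -ENOMEM;
    claims[nclaims++] = (struct claim){ start, size, owner };
    return 0;
}

/* Hypothetical model of "map only": no ownership check is made, so an
 * existing claim cannot make it fail. */
int map_only(uint32_t start, uint32_t size)
{
    (void)start;
    return size ? 0 : -EINVAL;
}
```

In this model, claim_and_map(0x900000, 0x4000, "gcc") succeeds, a later claim_and_map(0x900000 + 0x3680, 0x20, "tsens") returns -EBUSY, and map_only on the same window succeeds. That is the behaviour difference the guess above relies on; whether the real 4.19 driver fails for exactly this reason is still speculation.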
I tried several variants splitting 0x900000 to 0x904000 between gcc and tsens - none would boot. It seems gcc has a minimum size but I don't know what that is. I do know (0x4000-0x3680) = 0x980 is too small.
The "ranges" device tree property has functionality that seems relevant to allocating a memory range but I suspect that it won't work here; however, I have not tried it.
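For anyone checking the arithmetic on such a split, a small sketch follows. The reg_range struct and split_window helper are hypothetical names used only for illustration; the numbers are the ones quoted above.

```c
#include <stdint.h>

/* A register window described by base address and size, as in a DT "reg" entry. */
struct reg_range {
    uint32_t base;
    uint32_t size;
};

/* Split 'whole' into two adjacent pieces at 'offset' bytes from its base.
 * Returns 0 on success, -1 if the offset would leave either piece empty. */
int split_window(struct reg_range whole, uint32_t offset,
                 struct reg_range *low, struct reg_range *high)
{
    if (offset == 0 || offset >= whole.size)
        return -1;
    low->base = whole.base;
    low->size = offset;
    high->base = whole.base + offset;
    high->size = whole.size - offset;
    return 0;
}
```

Splitting { 0x900000, 0x4000 } at offset 0x3680 gives a low piece of size 0x3680 and a high piece at 0x903680 of size 0x980, matching the figure above. The sketch only checks that the two pieces tile the window; it says nothing about which sizes the hardware or the drivers will accept.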
Several questions I don't have the answer for:
The kernel can address memory by logical, virtual, or physical addresses. Are the addresses in the device tree physical addresses?
Is some firmware code secretly using (physical) addresses 0x904000+?
Is there another memory location that can be safely utilized? (I tried having tsens use 0x42000000+0x3680 but that would not boot.)
If I do request more memory, like I did from 0x904000 + 0x3680, do I have to adjust some other parameter in the device tree when I do that?
@anon50098793 has pointed out that the dev structures have changed (increased and need more memory?). Is there now more than one issue related to sharing the 0x900000+0x4000 memory region between gcc and tsens?
Sorry for all the questions, I don't expect you or others to answer them, I'm just admitting what I know that I don't know.
I've had 17+ days up (including the temperature sensor patch mentioned above); however, I've also had a few unexplained crashes (one where the 5 GHz radio quit working). I've not been paying close attention to why but I continue to suspect the thermal sensor patch.
I tried updating my source today and ran into multiple build failures (likely unrelated to my patches). One failure was the PF_RING kernel module - easy enough to de-select as I don't use it. However, ath10k-ct-htt complains that it wants to install the board.bin file but can't due to conflicts with ath10k-ct (non htt).
In short, it seems there is little or no ipq806x progress (especially related to kernel 4.19) so I've set this aside for now and will use 4.14.
Thermal sensor update: in short, I found another way to get them working without any hint of strange boot or wireless issues.
Based on the speculation above, I made the following changes to init_common in drivers/thermal/qcom/tsens-common.c:
int __init init_common(struct tsens_device *tmdev)
{
	resource_size_t size;
	void __iomem *base;
	struct resource *res;
	struct platform_device *op = of_find_device_by_node(tmdev->dev->of_node);

	if (!op)
		return -EINVAL;

	/* The driver only uses the TM register address space for now */
	if (op->num_resources > 1) {
		tmdev->tm_offset = 0;
	} else {
		/* old DTs where SROT and TM were in a contiguous 2K block */
		tmdev->tm_offset = 0x1000;
	}

	res = platform_get_resource(op, IORESOURCE_MEM, 0);
	if (!res)
		return -EINVAL;
	size = resource_size(res);

	/* was: base = devm_ioremap_resource(&op->dev, res); */
	/* devm_ioremap maps the range without requesting (claiming) it */
	base = devm_ioremap(&op->dev, res->start, size);
	if (!base)	/* devm_ioremap returns NULL on failure, not an ERR_PTR */
		return -ENOMEM;

	tmdev->map = devm_regmap_init_mmio(tmdev->dev, base, &tsens_config);
	if (IS_ERR(tmdev->map))
		return PTR_ERR(tmdev->map);

	return 0;
}
and this works (no changes made to dts(i) files). No non-reproducible boot issues or wireless failures - it just works (so far). I'll test for a week or two and report back then.
I'll try to post a patch to my git tree, but I'd prefer that non-r7500v2 users not use my branch to test this change for now. If you'd like to try it, obviously go right ahead, just make the change "by hand" yourself and please report back how it turned out.
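One detail worth double-checking in a change like this: as far as I know, devm_ioremap reports failure by returning NULL, whereas devm_ioremap_resource returns an errno encoded as an error pointer. An IS_ERR() test therefore does not catch a failed devm_ioremap, and a plain NULL check is the safer test. The userland sketch below re-creates the kernel's error-pointer convention with hypothetical lowercase helpers (is_err, err_ptr) to show that NULL is not an error pointer. It is an illustration of the convention, not the kernel's own implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* The kernel reserves the top 4095 addresses for encoded errno values. */
#define MAX_ERRNO 4095

/* Hypothetical userland equivalent of IS_ERR(): true only for pointers
 * that fall in the reserved error range at the top of the address space. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical userland equivalent of ERR_PTR(): encode a negative errno
 * as a pointer value. */
void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}
```

With these helpers, is_err(err_ptr(-16)) is true but is_err(NULL) is false, so a NULL returned by a mapping call would pass an is_err check and be used as if it were a valid address.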
nice work!... will test when i get a chance but definitely looks like you nailed it in the guts... should be much easier to backpedal, to any struct issues etc. later.... now that the root is found.
As in the apq8064 definition the registers that control cpuidle are called "generic 806x", I wonder if it's better to also patch the driver and add the compatible to the cpuidle driver.
I mean it's strange to see apq8064 in ipq8064 dtsi.
I mean as we need to add a patch for this... patch also the driver while we are there.
I could be wrong but the conclusion I came to after reading about dts etiquette and conventions (see here under "Understanding the compatible Property") is that it should be left as apq.
glad you tested it... but I'm under the impression @anon50098793 tested it above. Perhaps not, and the warnings above are valid.
r7500v2 is IPQ8064
r7800 is IPQ8065
I'm running an image built off the patch set here. So as far as cpuidle goes the only change should be in the dtsi (but my patch set makes other changes for usb).