OK, so this SoC uses GICv2m, and things are not as simple as just reverting the commit then...
Regards.
I did some debugging, and this is the ChatGPT explanation:
Regards.
Hi,
We are seeing a regression introduced in Linux 6.12.58 affecting PCIe
devices using GICv2m MSI on a Qualcomm (arm64) platform.
The issue was bisected to:
irqchip/gic-v2m: Handle Multiple MSI base IRQ Alignment
commit 2ef3886ce626dcdab0cbc452dbbebc19f57133d8
Hardware:
- Qualcomm platform (arm64, GICv2m MSI)
- PCIe modem: SDX55 (MHI-based)
Symptoms:
- Modem reaches MHI M0 / AMSS / READY
- Channels are created (mbim, at, qcdm)
- But channel START/RESET command completions fail:
"Failed to receive START channel command completion"
- Device becomes unusable
Investigation:
Instrumenting gicv2m_irq_domain_alloc() shows different MSI base
allocation between working and broken kernels:
Linux 6.12.57 (working):
spi_start=448 nr_spis=32 nr_irqs=5
offset=8 → spi=456
Linux 6.12.58 (broken):
spi_start=448 nr_spis=32 nr_irqs=5
align_mask=4 align_off=0
offset=1 → spi=449
So the MSI base moves from 456 → 449.
This change correlates directly with the regression:
- 456 → modem works
- 449 → modem fails (missing command completions)
Additional observations:
- IRQs are received, but MSI distribution appears incorrect
- Some MSI vectors remain unused or misrouted
- Reverting the above commit restores correct operation
Analysis:
The new alignment logic uses:
align_mask = nr_irqs - 1
However, for PCI MSI, vector encoding relies on lower bits and
effectively requires power-of-two alignment (as handled previously via
get_count_order(nr_irqs)).
In this case, nr_irqs=5 (non power-of-two), and the new alignment leads
to a base (449) that breaks the device.
The previous allocation (offset=8 → spi=456) corresponds to a power-of-two
aligned region and works correctly.
Conclusion:
The new alignment logic appears incorrect or insufficient when
nr_irqs is not a power-of-two, at least for PCIe devices using GICv2m.
Workaround:
Reverting the commit fixes the issue.
Potential fix direction:
Use power-of-two alignment:
nvec = roundup_pow_of_two(nr_irqs)
align_mask = nvec - 1
instead of aligning directly on nr_irqs.
---
Please let me know if additional logs or testing are needed.
Thanks
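To make the "vector encoding relies on lower bits" point from the report concrete, here is a minimal userspace sketch (not kernel code). It rests on two assumptions that the report does not state: that the MSI message data programmed into the device is the SPI number itself, and that a request for 5 vectors enables the device for 8, so it overwrites the low 3 bits of the data with the vector index. round_up_pow2() and msi_data_for_vector() are hypothetical helper names.

```c
#include <stdint.h>

/* Smallest power of two >= n, for n >= 1.
 * Userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint32_t round_up_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Message data a Multi-MSI endpoint would send for vector 'vec',
 * ASSUMING it replaces the low bits of the programmed base value
 * with the vector index (block size = nr_irqs rounded up to a
 * power of two). */
static uint32_t msi_data_for_vector(uint32_t base, uint32_t nr_irqs,
                                    uint32_t vec)
{
    uint32_t mask = round_up_pow2(nr_irqs) - 1;

    return (base & ~mask) | (vec & mask);
}
```

Under those assumptions, msi_data_for_vector(456, 5, 0) gives 456, so vector 0 arrives on the SPI the host allocated, while msi_data_for_vector(449, 5, 0) gives 448, one below the allocated base. That would fit the "unused or misrouted" observation.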
I tried this small patch on top of 6.12.74, and it works fine. I think it would be easier to push it upstream instead of reverting the previous one, as we're just fixing an alignment issue.
@mkrle what's your view on this?
--- a/drivers/irqchip/irq-gic-v2m.c 2026-03-20 09:45:22.170192561 +0100
+++ b/drivers/irqchip/irq-gic-v2m.c 2026-03-20 09:45:26.284210783 +0100
@@ -158,7 +158,7 @@
struct v2m_data *v2m = NULL, *tmp;
int hwirq, i, err = 0;
unsigned long offset;
- unsigned long align_mask = nr_irqs - 1;
+ unsigned long align_mask = roundup_pow_of_two(nr_irqs) - 1;
spin_lock(&v2m_lock);
list_for_each_entry(tmp, &v2m_nodes, entry) {
Tested: the modem is detected and accessible, and IRQs are assigned as they were before the commit:
50: 1 0 0 0 GICv2m-PCI-MSI-0000:01:00.0 0 Edge bhi
51: 5 0 0 0 GICv2m-PCI-MSI-0000:01:00.0 1 Edge mhi
52: 0 0 0 0 GICv2m-PCI-MSI-0000:01:00.0 2 Edge mhi
53: 0 0 0 0 GICv2m-PCI-MSI-0000:01:00.0 3 Edge mhi
54: 0 0 0 0 GICv2m-PCI-MSI-0000:01:00.0 4 Edge mhi
The question is whether this patch affects other platforms when the requested nr_irqs != 5.
Regards.
I don't understand this process well enough to confidently judge if that's a good fix or not. I think submitting it upstream is a good way to get good feedback at the very least. And if it's accepted, then back-porting it to OpenWrt should be quite smooth.
It's been something like 25 years since I last sent a patch to the Linux kernel, so I don't think "I'm a bit rusty" on the process is an exaggeration. But I agree, it'll be a great way to get feedback.
Meanwhile, just adding this patch to target/linux/qualcommax/patches-6.12 in the OpenWrt tree will allow custom builds for anyone willing to try.
It will if there's any other platform with nr_irqs != power_of_two, but as I read the code, I wonder how it could work in those cases. The ~mask resulting from such a number will be kind of weird. For example, in this case, with nr_irqs = 5, the mask is 4 and ~mask is 0xfffb (clearing only the third bit), while with the patch the mask is 7 and ~mask is 0xfff8 (clearing the last 3 bits), which is much more reasonable.
The original patch seemed to assume log2(nr_irqs) was an integer, which is not the case when nr_irqs != power_of_two. roundup_pow_of_two just fixes this.
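A small sketch of that mask arithmetic, assuming the bitmap search aligns each candidate offset the way the kernel's __ALIGN_MASK(x, mask) macro does, i.e. (x + mask) & ~mask. align_with_mask() is a hypothetical userspace stand-in, not the driver code.

```c
/* Same arithmetic as the kernel's __ALIGN_MASK(x, mask):
 * round x up so that all bits set in 'mask' are clear.
 * This only yields a real alignment when mask is 2^k - 1. */
static unsigned long align_with_mask(unsigned long x, unsigned long mask)
{
    return (x + mask) & ~mask;
}
```

With mask 4 (nr_irqs - 1 for nr_irqs = 5), align_with_mask(1, 4) returns 1, so offset 1 is accepted as "aligned", which matches the offset=1 in the debug output. With mask 7, align_with_mask(1, 7) returns 8, the same offset seen on the working kernel.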
I agree it's a hassle to set everything up if you're not doing it on a regular basis (and I've only done it once). If I can help let me know.
Hi
The only thing missing now is a USB host... Looking at the OP's photos, it seems the mainboard has a micro-USB port in OTG configuration (ID pin to GND)... I'll give it a try and share the results.
Regards.
Patch submitted. Let's see how it goes...
I've got good feedback from Marc:
"This looks wrong for a bunch of reasons:
- you're hacking the allocation path, but not the free path -- what
could possibly go wrong?
- nr_irqs not being a power of two to start with is more indicative of
a bug somewhere else in the system. The only case where we allocate
more than a single IRQ at a time is for Multi-MSI, and that is
definitely a power-of-two construct.
I have seen other reports, all concerning QC based HW allocating silly
(aka non Po2) numbers of interrupts for Multi-MSI devices, and I think
we should instead address the root cause, most likely in the PCI code."
While I don't see why the free path would do anything wrong, I understand the concern about something wrong somewhere else. Why is nr_irqs = 5? Any clue?
@gmtii did you trace where the allocation came from?
hi
This is what ChatGPT said:
- spi_start and nr_spis come from hardware (MSI_TYPER), read in gicv2m_init_one() in drivers/irqchip/irq-gic-v2m.c
- nr_irqs comes from the PCIe device: pci_alloc_irq_vectors(dev, min, max, flags);
SDX55 is a multi-MSI device, so the patch seems adequate for this case...
Regards.
Next try for a patch. I've moved it from the main IRQ tree to the MHI one, so that it focuses on QC devices.
It works fine in the latest main branch (6.12.77), but I'm not sure whether it's good enough to round up the number of IRQs in the pci_alloc_irq_vectors call or whether it should be rounded up in the mhi_cntrl structure itself (2 lines above). I cannot find anywhere else where it would be relevant.
Any ideas?
--- a/drivers/bus/mhi/host/pci_generic.c 2026-02-19 16:29:56.000000000 +0100
+++ b/drivers/bus/mhi/host/pci_generic.c 2026-03-29 13:14:17.053879617 +0200
@@ -1014,7 +1014,7 @@
*/
mhi_cntrl->nr_irqs = 1 + mhi_cntrl_config->num_events;
- nr_vectors = pci_alloc_irq_vectors(pdev, 1, mhi_cntrl->nr_irqs, PCI_IRQ_MSI);
+ nr_vectors = pci_alloc_irq_vectors(pdev, 1, roundup_pow_of_two(mhi_cntrl->nr_irqs), PCI_IRQ_MSI);
if (nr_vectors < 0) {
dev_err(&pdev->dev, "Error allocating MSI vectors %d\n",
nr_vectors);
I had a chat with Gemini and it says this should be handled in the QC pci host driver and not in MHI. I guess it makes some sense, as sdx55 modems are also found in x86 laptops, and they probably don't encounter this issue because x86 code apparently enforces power of two (if I understood Thomas Gleixner correctly).
I know this doesn't help a lot, and could be completely wrong, but I guess it's ok to leave it here.
Well, this is the file where the different events (for the different models) are defined, which end up determining the number of IRQs requested from the PCI subsystem, so it looks like the right place to round it up.
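For illustration, the rounding in the patch works out as follows. This is a userspace sketch with hypothetical helper names; only the 1 + num_events formula comes from pci_generic.c.

```c
/* Number of MSI vectors to request for a controller config with
 * 'num_events' event rings: 1 + num_events, rounded up to the
 * next power of two (what roundup_pow_of_two() does in the patch). */
static unsigned int msi_vectors_to_request(unsigned int num_events)
{
    unsigned int nr_irqs = 1 + num_events;
    unsigned int p = 1;

    while (p < nr_irqs)
        p <<= 1;
    return p;
}

/* Vectors that are requested but never used by the driver. */
static unsigned int unused_vectors(unsigned int num_events)
{
    return msi_vectors_to_request(num_events) - (1 + num_events);
}
```

For the case in this thread (nr_irqs = 5, so 4 event rings), msi_vectors_to_request(4) is 8 and unused_vectors(4) is 3. A config with 3 event rings already gives a power of two and wastes nothing.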
I'll give it a shot with the developers and let's see what's their feedback.
It looks like the patch will be integrated upstream, but only for kernel 7.1, so for the time being I guess we should add it as a patch in the OpenWrt tree.
Awesome, great job! Do you want to submit the PR or shall I do it?
Please go ahead!
Have a look here:
I'm building it now but I won't be able to test.
Btw, I had some issues extracting the patch from LKML, looks like some extra line breaks got in there. I had to adapt the patch for 6.12 and 6.18 anyway so no worries there, but I hope you won't get push back upstream.
Yep, from 6.18 onwards there's also MSI-X. As the power-of-two alignment isn't needed there, the maintainer's approach is to leave it as it is for now and see if anyone complains about wasted resources with MSI-X.