IPQ4019 xhci-hcd host died - modem crashing?

I am testing a Unielec U4019-01 and running into some crashes. I am testing with a Sierra Wireless EM7565 in conjunction with ModemManager, tested in MBIM (1,3,1009) and QMI (1,3,10D) mode. I am using a NGFF to mPCIe adaptor. The rest of this post is my logs, and list of things I have tried.

Are any issues with USB3 on IPQ40xx?

In QMI Mode
After an arbitrarily long amount of time I will start to see the following errors:

Sun Oct 27 05:53:56 2019 kern.warn kernel: [  590.528176] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 05:56:42 2019 kern.warn kernel: [  756.749648] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37

Eventually the xhci-hcd driver will crash, and the system will need to be rebooted.

Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4372.964719] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4372.983113] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.002448] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.021859] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.err kernel: [ 4373.024543] qmi_wwan 1-1:1.8: nonzero urb status received: -71
Sun Oct 27 06:56:58 2019 kern.err kernel: [ 4373.027175] qmi_wwan 1-1:1.8: wdm_int_callback - 0 bytes
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.049578] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.068801] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.088147] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37
Sun Oct 27 06:56:58 2019 kern.warn kernel: [ 4373.107536] xhci-hcd xhci-hcd.0.auto: ERROR unknown event type 37

In MBIM Mode
With MBIM mode I do not get the "unknown event type 37" error, but I will get the following crash:

Sat Oct 26 11:06:24 2019 kern.err kernel: [ 5405.878356] cdc_mbim 1-1:1.12: Tx URB error: -19
Sat Oct 26 11:06:26 2019 kern.warn kernel: [ 5407.782904] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
Sat Oct 26 11:06:26 2019 kern.warn kernel: [ 5407.799267] xhci-hcd xhci-hcd.0.auto: Host halt failed, -110
Sat Oct 26 11:06:26 2019 kern.err kernel: [ 5407.799498] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
Sat Oct 26 11:06:26 2019 kern.err kernel: [ 5407.804779] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up

It seems to be correlated with stress testing the device. Any suggestions on what to check?

Here is the full kernel log from an MBIM mode crash:

Sat Oct 26 11:06:20 2019 daemon.warn [8652]: <warn>  Reloading stats failed: Transaction timed out
Sat Oct 26 11:06:20 2019 daemon.warn [8652]: <warn>  Couldn't load extended signal information: Transaction timed out
Sat Oct 26 11:06:21 2019 kern.info kernel: [ 5402.778911] usb 1-1: USB disconnect, device number 2
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.832709] ------------[ cut here ]------------
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.832761] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:466 dev_watchdog+0x16c/0x248
Sat Oct 26 11:06:21 2019 kern.info kernel: [ 5402.836404] NETDEV WATCHDOG: wwan0 (cdc_mbim): transmit queue 0 timed out
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.844507] Modules linked in: xt_connlimit qcserial pppoe ppp_async option nf_conncount iptable_nat ipt_MASQUERADE cdc_mbim ath10k_pci ath10k_core ath xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_FLOWOFFLOAD xt_CT usb_wwan usb_serial_simple sierra_net sierra qmi_wwan pppox ppp_generic nf_nat_ipv4 nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack mac80211 ipt_REJECT cfg80211 cdc_ncm cdc_ether xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY xr_usb_serial_common wireguard usbserial usbnet ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda slhc nf_reject_ipv4 nf_log_ipv4
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.900486]  nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hwmon crc_ccitt compat cdc_wdm xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ip6_udp_tunnel udp_tunnel usb_storage leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom ehci_pci ehci_platform sd_mod scsi_mod ehci_hcd gpio_button_hotplug mii
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.962805] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W         4.19.79 #0
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.985014] Hardware name: Generic DT based system
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.992413] [<c030e310>] (unwind_backtrace) from [<c030ab70>] (show_stack+0x10/0x14)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5402.997006] [<c030ab70>] (show_stack) from [<c0783a5c>] (dump_stack+0x88/0x9c)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.004906] [<c0783a5c>] (dump_stack) from [<c031bad0>] (__warn+0xd4/0xf0)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.011934] [<c031bad0>] (__warn) from [<c031b778>] (warn_slowpath_fmt+0x38/0x48)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.018795] [<c031b778>] (warn_slowpath_fmt) from [<c06ac288>] (dev_watchdog+0x16c/0x248)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.026348] [<c06ac288>] (dev_watchdog) from [<c0371eb8>] (call_timer_fn+0x24/0x90)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.034506] [<c0371eb8>] (call_timer_fn) from [<c03725a4>] (run_timer_softirq+0x1ac/0x418)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.041972] [<c03725a4>] (run_timer_softirq) from [<c03021c8>] (__do_softirq+0xe0/0x25c)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.050302] [<c03021c8>] (__do_softirq) from [<c031f9c8>] (irq_exit+0x9c/0xfc)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.058550] [<c031f9c8>] (irq_exit) from [<c035f7a4>] (__handle_domain_irq+0x9c/0xac)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.065583] [<c035f7a4>] (__handle_domain_irq) from [<c0538ec0>] (gic_handle_irq+0x5c/0x90)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.073482] [<c0538ec0>] (gic_handle_irq) from [<c0301a0c>] (__irq_svc+0x6c/0x90)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.081631] Exception stack(0xdf473f68 to 0xdf473fb0)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.089271] 3f60:                   00000000 001b3ea0 dfb532c4 c0313ac0 ffffe000 c0b04c6c
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.094310] 3f80: 00000004 c0b04cac c0a3bda8 00000000 00000000 00000000 c0b0d740 df473fb8
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.102465] 3fa0: c0308358 c030835c 60000013 ffffffff
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.110631] [<c0301a0c>] (__irq_svc) from [<c030835c>] (arch_cpu_idle+0x34/0x38)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.115672] [<c030835c>] (arch_cpu_idle) from [<c0341a20>] (do_idle+0xf0/0x200)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.123134] [<c0341a20>] (do_idle) from [<c0341d8c>] (cpu_startup_entry+0x18/0x1c)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.130161] [<c0341d8c>] (cpu_startup_entry) from [<803023ec>] (0x803023ec)
Sat Oct 26 11:06:21 2019 kern.warn kernel: [ 5403.137828] ---[ end trace 1916688659126e01 ]---
Sat Oct 26 11:06:23 2019 daemon.warn [8652]: <warn>  Couldn't load extended signal information: Transaction timed out
Sat Oct 26 11:06:24 2019 kern.err kernel: [ 5405.878356] cdc_mbim 1-1:1.12: Tx URB error: -19
Sat Oct 26 11:06:26 2019 kern.warn kernel: [ 5407.782904] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
Sat Oct 26 11:06:26 2019 kern.warn kernel: [ 5407.799267] xhci-hcd xhci-hcd.0.auto: Host halt failed, -110
Sat Oct 26 11:06:26 2019 kern.err kernel: [ 5407.799498] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
Sat Oct 26 11:06:26 2019 kern.err kernel: [ 5407.804779] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.815149] qcserial ttyUSB0: Qualcomm USB modem converter now disconnected from ttyUSB0
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.818541] qcserial 1-1:1.0: device disconnected
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.831180] qcserial ttyUSB1: Qualcomm USB modem converter now disconnected from ttyUSB1
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.832591] qcserial 1-1:1.2: device disconnected
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.844663] qcserial ttyUSB2: Qualcomm USB modem converter now disconnected from ttyUSB2
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.846265] qcserial 1-1:1.3: device disconnected
Sat Oct 26 11:06:26 2019 kern.info kernel: [ 5407.855899] cdc_mbim 1-1:1.12 wwan0: unregister 'cdc_mbim' usb-xhci-hcd.0.auto-1, CDC MBIM
Sat Oct 26 11:06:26 2019 daemon.notice netifd: Network device 'wwan0' link is down
Sat Oct 26 11:06:26 2019 kern.warn kernel: [ 5407.866783] failed to kill vid 0081/0 for device wwan0
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: <warn>  Couldn't load extended signal information: Transaction timed out
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: remove serial interface ttyUSB0: event processed
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: event reported: action=remove, name=ttyUSB0, subsystem=tty
Sat Oct 26 11:06:26 2019 daemon.debug [8652]: [/dev/cdc-wdm0] unexpected port hangup!
Sat Oct 26 11:06:26 2019 daemon.debug [8652]: [/dev/cdc-wdm0] channel destroyed
Sat Oct 26 11:06:26 2019 daemon.info [8652]: <info>  Connection to mbim-proxy for /dev/cdc-wdm0 lost, reprobing
Sat Oct 26 11:06:26 2019 daemon.info [8652]: <info>  [device /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1] creating modem with plugin 'Sierra' and '5' ports
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: <warn>  Could not grab port (tty/ttyUSB1): 'Cannot add port 'tty/ttyUSB1', unhandled serial type'
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: <warn>  Could not grab port (tty/ttyUSB0): 'Cannot add port 'tty/ttyUSB0', unhandled serial type'
Sat Oct 26 11:06:26 2019 daemon.debug [8652]: [/dev/cdc-wdm0] unexpected port hangup!
Sat Oct 26 11:06:26 2019 daemon.debug [8652]: [/dev/cdc-wdm0] channel destroyed
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: [/dev/cdc-wdm0] QMI client for service 'pdc' with CID '3' wasn't released
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: [/dev/cdc-wdm0] QMI client for service 'nas' with CID '4' wasn't released
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: [/dev/cdc-wdm0] QMI client for service 'dms' with CID '3' wasn't released
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: [/dev/cdc-wdm0] QMI client for service 'loc' with CID '3' wasn't released
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: <warn>  Couldn't start initialization: Couldn't query file info: Error when getting information for file “/dev/cdc-wdm0”: No such file or directory
Sat Oct 26 11:06:26 2019 daemon.warn [8652]: <warn>  couldn't initialize the modem: 'Modem is unusable, cannot fully initialize'
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: cleanup wait for modem at sysfs path /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1
Sat Oct 26 11:06:26 2019 daemon.info [8652]: <info>  (tty/ttyUSB0): released by device '/sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1'
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: remove serial interface ttyUSB1: event processed
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: event reported: action=remove, name=ttyUSB1, subsystem=tty
Sat Oct 26 11:06:26 2019 user.notice ModemManager: hotplug: cleanup wait for modem at sysfs path /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1
Sat Oct 26 11:06:26 2019 daemon.info [8652]: <info>  (tty/ttyUSB1): released by device '/sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1'
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: remove serial interface ttyUSB2: event processed
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: event reported: action=remove, name=ttyUSB2, subsystem=tty
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: cleanup wait for modem at sysfs path /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1
Sat Oct 26 11:06:27 2019 daemon.info [8652]: <info>  (tty/ttyUSB2): released by device '/sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1'
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: remove network interface wwan0: event processed
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: event reported: action=remove, name=wwan0, subsystem=net
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: cleanup wait for modem at sysfs path /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1
Sat Oct 26 11:06:27 2019 daemon.info [8652]: <info>  (net/wwan0): released by device '/sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1'
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: remove cdc interface cdc-wdm0: custom event processed
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: event reported: action=remove, name=cdc-wdm0, subsystem=usbmisc
Sat Oct 26 11:06:27 2019 user.notice ModemManager: hotplug: cleanup wait for modem at sysfs path /sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1
Sat Oct 26 11:06:27 2019 daemon.info [8652]: <info>  (usbmisc/cdc-wdm0): released by device '/sys/devices/platform/soc/8af8800.usb3/8a00000.dwc3/xhci-hcd.0.auto/usb1/1-1'
Sat Oct 26 11:06:29 2019 daemon.warn [8652]: <warn>  Couldn't load extended signal information: Transaction timed out

What I have tried so far
I have attempted to access the modem over USB2 mode, but this is a little difficult as the mPCIe slot is connected to the SoC's USB3 pins. I may attempt to tape over the USB 3.0 pins on the adaptor or modem and test again. The following list is what I have done so far and the result:

  • Setting usb-superspeed to "disabled" in the .dts file
    • modem is never detected
  • unloading xhci_pci driver
    • no difference
  • Using usb3_disable kernel command line
    • modem is never detected
  • compiling without usb3 kernel module
    • modem is never detected
  • upgrade to kernel 4.19.80
    • no change
  • changing SMP affinity for the USB 3 device to another core smp_affinity_list set to 2-3
    • no change
  • Using Gobiserial and cdc_mbim driver
    • so far so good after 1 day

Crashed after 8 days, on a second device using qcserial and cdc_mbim drivers, not the gobiserial driver.

[566389.792034] usb 1-1: USB disconnect, device number 2
[566394.603851] cdc_mbim 1-1:1.12: Tx URB error: -19
[566394.829730] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
[566394.846083] xhci-hcd xhci-hcd.0.auto: Host halt failed, -110
[566394.846315] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
[566394.851573] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
[566394.862108] qcserial ttyUSB0: Qualcomm USB modem converter now disconnected from ttyUSB0
[566394.865473] qcserial 1-1:1.0: device disconnected
[566394.880516] qcserial ttyUSB1: Qualcomm USB modem converter now disconnected from ttyUSB1
[566394.881892] qcserial 1-1:1.3: device disconnected
[566394.892850] cdc_mbim 1-1:1.12 wwan0: unregister 'cdc_mbim' usb-xhci-hcd.0.auto-1, CDC MBIM
[566394.897722] failed to kill vid 0081/0 for device wwan0

Crashed with cdc_mbim and gobiserial

[83201.380250] usb 1-1: USB disconnect, device number 2
[83206.417927] xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command.
[83206.434273] xhci-hcd xhci-hcd.0.auto: Host halt failed, -110
[83206.434505] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
[83206.439121] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
[83206.447154] GobiSerial driver ttyUSB0: GobiSerial converter now disconnected from ttyUSB0
[83206.452528] gobiserial 1-1:1.0: device disconnected
[83206.461258] GobiSerial driver ttyUSB1: GobiSerial converter now disconnected from ttyUSB1
[83206.465390] gobiserial 1-1:1.3: device disconnected
[83206.473888] cdc_mbim 1-1:1.12 wwan0: unregister 'cdc_mbim' usb-xhci-hcd.0.auto-1, CDC MBIM
[83206.480815] failed to kill vid 0081/0 for device wwan0

This made me curious about the NGFF to mPCIe adaptor. Is this a special adapter intended for mPCIe slots with the USB3 SS pairs, or just an arbitrary one intended for mPCIe slots with the usuaul PCIe pairs connected to those pins?

Note the the EM7365 supports PCIe connections as well as USB2 and USB3, in theory at least. So a normal NGFF-adapter will likely cause problems by connecting the host USB3 pins to the modem PCIe pins

Did you try this? If the problem is what I suspect, then taping over the USB3 pins will fix it. I believe the USB2 pins should be connected. Looking at pictures of the U4019 shows 3 differential pairs leading to the mPCIe slot.

It would be really nice to be able to use USB3 with this router though. I don't understand why they made it with a mPCIe slot instead of an m.2 slot. It would have been so much more useful with m.2. I guess mPCIe modems like the MC7455 will work fine with USB3 here. But those are old and rare, and not as fast as the EM7565.

Hey Bjørn, thanks for your input. I have not tried taping it yet as it seems the modem is running in High Speed mode already, and I think it's because the adaptor is USB 2.0 only. But at this stage I better give it a go and see what happens.

Here are photos of the adaptor I am currently using


I have also tested with these adaptors

And here's the AT command on the modem

at!usbspeed?
SUPPORTED:Super-Speed
CURRENT  :High-Speed

Thanks. That looks good. The PCIe pins of the mini-PCIe connector are not connected, so taping won't make any difference. AFAICT this adapter should work fine, although in USB2 - High Speed - mode only.

So what could be causing it to crash? It's very irregular. Sometimes it will go for a week or longer before crashing, sometimes it crashes multiple times per evening. I have contacted Sierra Wireless support and they want me to use the GobiSerial and GobiNet drivers, but I want to use ModemManager, and using those drivers forces PPP mode.

It just crashed again, here's a log with MM debug output

https://pastebin.com/9jjFV0A5

well, this shows

xHCI host controller not responding, assume dead

again, so looking at ModemManager logs is pointless. There's nothing ModemManager, USB device drivers, or the USB device can do about this. It's a software or hardware problem related to the host controller.

I assume the problem might be triggered by cabling/slot/adapter issues, but confirming this is hard. I'd start by trying to verify the mini-PCIe slot with another device. Do you have an older modem you can test the slot with? Or maybe one of those laptop wlan cards with Bluetooth? The Bluetooth controller connects by USB2. You'd obviously need to mask off the USB3 SS/PCIe pins with tape for this test. The point is to try to eliminate issues related to the adapter and/or EM7565 (I find the latter unlikely here).

But unless there are unexpected issues with the adapter, then this will end up pointing to a hardware problem or a bug in the xhci host controller driver. Is this a platform xhci host?

I have an MC7430, I'll give that a go with masked pins.
Update - I currently have it running in Super Speed mode, is it necessary to mask the pins?

at!usbspeed?
SUPPORTED:Super-Speed
CURRENT  :Super-Speed

The point is to try to eliminate issues related to the adapter and/or EM7565 (I find the latter unlikely here).

Well, hearing that coming from you is a bit of relief. I can also pop the EM7565 into another board that I know is working fine with an MC7430 to validate it. I guess that would narrow it down to board, adaptor, or maybe xhci_hcd, though I feel that's as unlikely as a problem with the modem.

Is this a platform xhci host?

Unsure what you mean.

Probably not. You may want to repeat the test with the pins masked to eliminate issues related to the USB2 pair only, but I don't find that very likely. BTW: Great to see that this does indeed work in SuperSpeed mode with the MC74xx modems.

I was wondering if the router was using some common PCI connected USB3 controller, or if the USB3 controller is part of the SoC. There is a much higher probablility of driver bugs or missing hardware quirks in the latter case, simply based on the code exposure.

It's the one in the IPQ4019.

Right, I see. So that's a qcom,dwc3 block with Qualcomm phys. And the driver in OpenWrt is based on some SDK driver, using a number of magic values. Not much help there. Makes me wonder if the drivers/phy/qualcomm/phy-qcom-dwc3.c used by ipq806x wouldn't be a better fit? But I don't know... At least it documents what all those control bits do.

Hey Bjørn,

Over a year later, with a different modem (EM12-G) but same target (IPQ4019) I am seeing something that resembles this issue. The device is using kmod-usb-dwc3 and kmod-usb-dwc3-qcom, OpenWrt master branch. Are you suggesting that something is wrong with these drivers and I need to change them? It looks like some work has been done here in 5.10, could switching to 5.10 help?

Here's a recent crash:

[510237.193479] ------------[ cut here ]------------
[510237.193526] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:448 dev_watchdog+0x26c/0x270
[510237.193537] NETDEV WATCHDOG: wwan0 (qmi_wwan): transmit queue 0 timed out
[510237.193551] Modules linked in: xt_connlimit pppoe ppp_async option nf_conncount iptable_nat cdc_mbim ath10k_pci ath10k_core ath xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT usb_wwan qmi_wwan pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack_netlink nf_conntrack mac80211 ipt_REJECT cfg80211 cdc_ncm xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY wireguard usbserial usbnet slhc sch_cake nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hwmon crc_ccitt compat cdc_wdm sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred ledtrig_oneshot ledtrig_heartbeat ledtrig_gpio xt_set ip_set_list_set ip_set_hash_netportnet
[510237.193746]  ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb ip6_udp_tunnel udp_tunnel tun leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom gpio_button_hotplug mii
[510237.193864] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.87 #0
[510237.193869] Hardware name: Generic DT based system
[510237.193894] [<c030e524>] (unwind_backtrace) from [<c030afb0>] (show_stack+0x10/0x14)
[510237.193910] [<c030afb0>] (show_stack) from [<c08505f8>] (dump_stack+0x94/0xa8)
[510237.193926] [<c08505f8>] (dump_stack) from [<c0321404>] (__warn+0xbc/0xd8)
[510237.193937] [<c0321404>] (__warn) from [<c03214a4>] (warn_slowpath_fmt+0x84/0x94)
[510237.193951] [<c03214a4>] (warn_slowpath_fmt) from [<c0756a88>] (dev_watchdog+0x26c/0x270)
[510237.193967] [<c0756a88>] (dev_watchdog) from [<c037e830>] (call_timer_fn.constprop.8+0x24/0x98)
[510237.193980] [<c037e830>] (call_timer_fn.constprop.8) from [<c037eee0>] (run_timer_softirq+0x3fc/0x428)
[510237.193995] [<c037eee0>] (run_timer_softirq) from [<c0302288>] (__do_softirq+0x120/0x2b0)
[510237.194009] [<c0302288>] (__do_softirq) from [<c0325424>] (irq_exit+0xb0/0xd4)
[510237.194022] [<c0325424>] (irq_exit) from [<c0369da0>] (__handle_domain_irq+0x60/0xb4)
[510237.194035] [<c0369da0>] (__handle_domain_irq) from [<c05a19b4>] (gic_handle_irq+0x4c/0x90)
[510237.194047] [<c05a19b4>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[510237.194054] Exception stack(0xc0c01f28 to 0xc0c01f70)
[510237.194065] 1f20:                   00000000 5f0e2588 dfb381f4 c0313c00 ffffe000 c0c04e68
[510237.194077] 1f40: c0c04ea8 00000001 00000000 c0b3ede8 c0c0dc14 00000000 c0c04f14 c0c01f78
[510237.194085] 1f60: c03083d8 c03083dc 60000013 ffffffff
[510237.194100] [<c0301a8c>] (__irq_svc) from [<c03083dc>] (arch_cpu_idle+0x38/0x3c)
[510237.194115] [<c03083dc>] (arch_cpu_idle) from [<c0349868>] (do_idle+0x1e4/0x21c)
[510237.194128] [<c0349868>] (do_idle) from [<c0349b30>] (cpu_startup_entry+0x18/0x1c)
[510237.194143] [<c0349b30>] (cpu_startup_entry) from [<c0b00df4>] (start_kernel+0x488/0x498)
[510237.194157] ---[ end trace 4fb34ea0df558a4a ]---


Not sure about this at all, but testing new kernel versions if there has been work on the involved code is always good.

Wrt the warning you post: This can be a symptom of serious issues. But it doesn't by itself imply an error. It means literally what it says: transmit queue timed out. This is something that can happen on radio networks without anything being wrong. qmi_wwan use the usbnet default timeout of 5 seconds. This works most of the time, but sometimes there will be a stall lasting longer.

The warning will also appear if there is an error casuing TX to time out. This was the case earlier in this thread. But then there will usually be other error messsages telling us what the real issue is. Do you see anything else in the logs? Is the modem completely dead when this happens, or are you only worried about the warning?

Thanks, yes the modem is completely dead after this happens. The only thing I have found, apart from a reboot, that gets the device back into working shape is removing and reloading the xhci_hcd, xhci_pci, xhci_plat_hcd modules (in the correct order). The LTE signal has been rather good when this happens (20dB SNR, -78dBm RSRP, -7dBm RSRQ -- something like that). I am thinking it could caused by either the modem firmware, or, (probably more likely since it happened to the Sierra Wireless one too) something related to the USB host controller.

I wonder if anyone else has had problems with IPQ4019 USB 3.0?

I had a similar problem here with InHand Networks VG710 and Quectel EM06-E modem. xHCI controller was dying when mmcli -m a -r was called, the workaround is to enable some USB quirks:

		dwc3@8a00000 {
			compatible = "snps,dwc3";
			reg = <0x8a00000 0xf8000>;
			interrupts = <0x00 0x84 0x04>;
			phys = <0x22 0x23>;
			phy-names = "usb2-phy\0usb3-phy";
			dr_mode = "host";
			snps,dis_u2_susphy_quirk;
			snps,dis_u3_susphy_quirk;
			snps,parkmode-disable-hs-quirk;
			snps,parkmode-disable-ss-quirk;
			snps,dis_enblslpm_quirk;
		};

		dwc3@6000000 {
			compatible = "snps,dwc3";
			reg = <0x6000000 0xf8000>;
			interrupts = <0x00 0x88 0x04>;
			phys = <0x24>;
			phy-names = "usb2-phy";
			dr_mode = "host";
			snps,dis_u2_susphy_quirk;
			snps,dis_u3_susphy_quirk;
			snps,parkmode-disable-hs-quirk;
			snps,parkmode-disable-ss-quirk;
			snps,dis_enblslpm_quirk;
		};

I also got RCU lockups in __irq_svc:

[  140.704215] rcu: INFO: rcu_sched self-detected stall on CPU
[  140.704645] rcu:     0-....: (2157 ticks this GP) idle=8ba/1/0x40000002 softirq=6468/6469 fqs=853
[  140.708903]  (t=2100 jiffies g=8105 q=3076)
[  140.717534] NMI backtrace for cpu 0
[  140.721745] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.10.120 #0
[  140.725173] Hardware name: Generic DT based system
[  140.731956] [<c030e3d4>] (unwind_backtrace) from [<c030ad50>] (show_stack+0x10/0x14)
[  140.736507] [<c030ad50>] (show_stack) from [<c06257cc>] (dump_stack+0x94/0xa8)
[  140.744394] [<c06257cc>] (dump_stack) from [<c062d508>] (nmi_cpu_backtrace+0xc8/0xf4)
[  140.751428] [<c062d508>] (nmi_cpu_backtrace) from [<c062d638>] (nmi_trigger_cpumask_backtrace+0x104/0x13c)
[  140.759337] [<c062d638>] (nmi_trigger_cpumask_backtrace) from [<c03817f8>] (rcu_dump_cpu_stacks+0xe4/0x10c)
[  140.768897] [<c03817f8>] (rcu_dump_cpu_stacks) from [<c0386e3c>] (rcu_sched_clock_irq+0x6e4/0x8ac)
[  140.778562] [<c0386e3c>] (rcu_sched_clock_irq) from [<c038d844>] (update_process_times+0x5c/0x84)
[  140.787593] [<c038d844>] (update_process_times) from [<c039e788>] (tick_sched_timer+0x7c/0x118)
[  140.796747] [<c039e788>] (tick_sched_timer) from [<c038dd7c>] (__hrtimer_run_queues+0x15c/0x218)
[  140.805132] [<c038dd7c>] (__hrtimer_run_queues) from [<c038ec0c>] (hrtimer_interrupt+0x118/0x334)
[  140.814162] [<c038ec0c>] (hrtimer_interrupt) from [<c07c959c>] (arch_timer_handler_virt+0x28/0x30)
[  140.822946] [<c07c959c>] (arch_timer_handler_virt) from [<c03780a4>] (handle_percpu_devid_irq+0x78/0x148)
[  140.831872] [<c03780a4>] (handle_percpu_devid_irq) from [<c0372430>] (__handle_domain_irq+0x84/0xd8)
[  140.841413] [<c0372430>] (__handle_domain_irq) from [<c063d514>] (gic_handle_irq+0x80/0x94)
[  140.850603] [<c063d514>] (gic_handle_irq) from [<c0300b4c>] (__irq_svc+0x6c/0x90)
[  140.858568] Exception stack(0xc1091e20 to 0xc1091e68)
[  140.866265] 1e20: c4596d80 c4597000 00000040 0000003f ef795918 00001000 00000000 ffffffb9
[  140.871225] 1e40: 00000000 00000040 00000100 c1091f08 38e38e39 c1091e70 c0313590 c03170a4
[  140.879292] 1e60: 80000013 ffffffff
[  140.887576] [<c0300b4c>] (__irq_svc) from [<c03170a4>] (v7_dma_inv_range+0x3c/0x50)
[  140.890883] [<c03170a4>] (v7_dma_inv_range) from [<c0313590>] (__dma_page_dev_to_cpu+0x2c/0xc4)
[  140.898529] [<c0313590>] (__dma_page_dev_to_cpu) from [<c074fb54>] (usb_hcd_unmap_urb_for_dma+0xbc/0x124)
[  140.907182] [<c074fb54>] (usb_hcd_unmap_urb_for_dma) from [<c074fca8>] (__usb_hcd_giveback_urb+0xec/0x138)
[  140.916895] [<c074fca8>] (__usb_hcd_giveback_urb) from [<c0750110>] (usb_giveback_urb_bh+0x9c/0xec)
[  140.926459] [<c0750110>] (usb_giveback_urb_bh) from [<c03297e4>] (tasklet_action_common.constprop.0+0x70/0xf8)
[  140.935408] [<c03297e4>] (tasklet_action_common.constprop.0) from [<c030132c>] (__do_softirq+0x104/0x29c)
[  140.945927] [<c030132c>] (__do_softirq) from [<c0329a80>] (run_ksoftirqd+0x2c/0x38)
[  140.955245] [<c0329a80>] (run_ksoftirqd) from [<c0348eb0>] (smpboot_thread_fn+0xd0/0x18c)
[  140.963000] [<c0348eb0>] (smpboot_thread_fn) from [<c0343c40>] (kthread+0x148/0x14c)
[  140.971303] [<c0343c40>] (kthread) from [<c0300128>] (ret_from_fork+0x14/0x2c)
[  140.978697] Exception stack(0xc1091fb0 to 0xc1091ff8)
[  140.985741] 1fa0:                                     00000000 00000000 00000000 00000000
[  140.991232] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  140.999075] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000

The workaround for this is to enable performance scaling governor:

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

I think the lockup is caused by an interrupt storm coming from the modem, I don't know if it's HW or SW issue though.

1 Like