Sierra Wireless EM7565 LTE-card crashes regularly

Hi,

I am using an EM7565 card for a WWAN connection, and when it works, it works really well. Unfortunately, it crashes every hour or so, which makes using the internet a real pain.

Currently I am using MBIM, but it was the same with ModemManager.

When it happens, there is a kernel warning in the logs:

Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.687146] ------------[ cut here ]------------
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.691911] WARNING: CPU: 0 PID: 8 at net/sched/sch_generic.c:467 0x805386f8
Thu Mar 23 02:58:17 2023 kern.info kernel: [  206.699110] NETDEV WATCHDOG: wwan0 (cdc_mbim): transmit queue 0 timed out
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.706006] Modules linked in: ath9k ath9k_common qcserial pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet cdc_mbim ath9k_hw ath wireguard usb_wwan sierra_net sierra qmi_wwan pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mac80211 libchacha20poly1305 cfg80211 cdc_ncm cdc_ether usbserial usbnet slhc poly1305_mips nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_ipv6 nf_log_ipv4 nf_log_common nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c crc_ccitt compat chacha_mips cdc_wdm fsl_mph_dr_of ehci_platform ehci_fsl ip6_udp_tunnel udp_tunnel mii sha256_generic libsha256 seqiv jitterentropy_rng drbg kpp hmac cmac ehci_hcd gpio_button_hotplug usbcore nls_base usb_common crc32c_generic
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.794751] CPU: 0 PID: 8 Comm: ksoftirqd/0 Not tainted 5.10.161 #0
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.801122] Stack : 807efc40 800bf594 80800000 806f0f38 00000000 00000000 00000000 00000000
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.809638]         00000000 00000000 00000000 00000000 00000000 00000001 80c47c70 b0327216
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.818149]         80c47d08 00000000 00000000 80c47b18 00000038 80396f44 00000000 ffffffea
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.826644]         000000ee 80c47b24 000000ee 80785aa8 80c47c50 806bf2d0 00000000 805386f8
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.835150]         00000009 00000000 ffffdbb9 807efc54 00000018 803fc700 00000000 80940000
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.843657]         ...
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.846145] Call Trace:
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.846153] [<800bf594>] 0x800bf594
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.852187] [<80396f44>] 0x80396f44
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.855732] [<805386f8>] 0x805386f8
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.859291] [<803fc700>] 0x803fc700
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.862832] [<8006697c>] 0x8006697c
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.866369] [<80066984>] 0x80066984
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.869920] [<8008543c>] 0x8008543c
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.873490] [<805386f8>] 0x805386f8
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.877052] [<80085534>] 0x80085534
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.880597] [<805386f8>] 0x805386f8
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.884147] [<81e5a49c>] 0x81e5a49c [mac80211@2b85b995+0x7fe40]
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.890176] [<8053848c>] 0x8053848c
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.893720] [<800cfe3c>] 0x800cfe3c
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.897280] [<800d0178>] 0x800d0178
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.900832] [<80810000>] 0x80810000
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.904378] [<806ab1f0>] 0x806ab1f0
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.907938] [<80810000>] 0x80810000
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.911497] [<80088754>] 0x80088754
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.915043] [<806a73b8>] 0x806a73b8
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.918607] [<800a840c>] 0x800a840c
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.922161] [<800a2e20>] 0x800a2e20
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.925700] [<806a7678>] 0x806a7678
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.929263] [<800a82c4>] 0x800a82c4
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.932808] [<800a2fac>] 0x800a2fac
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.936347] [<800a2e70>] 0x800a2e70
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.939900] [<800a2e70>] 0x800a2e70
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.943439] [<80062178>] 0x80062178
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.946994]
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.948507] ---[ end trace 3f5b629c6c8ab98e ]---
Thu Mar 23 02:58:17 2023 kern.warn kernel: [  206.958193] ttyS ttyS0: 1 input overrun(s)

After that, this gets spammed into the logs:

Thu Mar 23 03:01:34 2023 kern.err kernel: [  403.685002] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:34 2023 kern.err kernel: [  403.691083] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:36 2023 kern.err kernel: [  405.604990] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:36 2023 kern.err kernel: [  405.611056] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:38 2023 kern.err kernel: [  407.524989] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:38 2023 kern.err kernel: [  407.531061] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:40 2023 kern.err kernel: [  409.444989] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:40 2023 kern.err kernel: [  409.451061] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:42 2023 kern.err kernel: [  411.364993] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:42 2023 kern.err kernel: [  411.371065] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:44 2023 kern.err kernel: [  413.284989] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:44 2023 kern.err kernel: [  413.291057] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:45 2023 kern.err kernel: [  415.204992] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:45 2023 kern.err kernel: [  415.211068] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes
Thu Mar 23 03:01:47 2023 kern.err kernel: [  417.124991] cdc_mbim 1-1:1.12: nonzero urb status received: -71
Thu Mar 23 03:01:47 2023 kern.err kernel: [  417.131073] cdc_mbim 1-1:1.12: wdm_int_callback - 0 bytes

What is going on?

Okay... I think I might be on to something here.

It hung up again, but this time without anything in the log. And it happened as soon as I started hitting wifi with a lot of traffic.

This might be unrelated to the module, but maybe a power thing?!

That's likely. If you are able to test it with different power options under the same network conditions, then try that.

The kernel warning doesn't tell us much. It just means that the modem firmware stopped transmission for too long, which can be caused by a number of things, including normal network issues.

But the fact that it stops working shows that there's a real error behind it.
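To make it easier to see how the watchdog timeout and the later URB errors relate in time, here is a minimal Python sketch that summarizes a saved log. It assumes the log lines have the same shape as the `logread` output quoted above. The function name `summarize` is just illustrative. As far as I know, status -71 corresponds to `EPROTO` on Linux, which for USB usually points at a low-level bus or protocol error, but treat that reading as an assumption rather than a diagnosis.

```python
import re
from collections import Counter

# Matches the kernel part of a logread line, e.g.
# "... kern.err kernel: [  403.685002] cdc_mbim 1-1:1.12: nonzero urb status received: -71"
LINE_RE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s(?P<msg>.*)$")
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue \d+ timed out"
)
URB_RE = re.compile(r"nonzero urb status received: (?P<status>-?\d+)")


def summarize(lines):
    """Collect watchdog timeouts and URB error counts from kernel log lines."""
    watchdogs = []          # (uptime_seconds, interface, driver)
    urb_errors = Counter()  # status code -> number of occurrences
    first_urb_error = None  # uptime of the first URB error seen, if any
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a kernel line in the expected format
        uptime = float(m.group("uptime"))
        msg = m.group("msg")
        wd = WATCHDOG_RE.search(msg)
        if wd:
            watchdogs.append((uptime, wd.group("iface"), wd.group("driver")))
            continue
        urb = URB_RE.search(msg)
        if urb:
            urb_errors[int(urb.group("status"))] += 1
            if first_urb_error is None:
                first_urb_error = uptime
    return {
        "watchdogs": watchdogs,
        "urb_errors": dict(urb_errors),
        "first_urb_error": first_urb_error,
    }
```

One way to use it would be to save the router log to a file on a PC (the file name is up to you) and call `summarize(open(path))`. Comparing the watchdog uptime with `first_urb_error` shows how long the modem stayed silent before the USB errors started.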

What exactly do you mean by "different power options"?

Are there different power options for the LTE module? Or do you mean for wifi? (dBi setting)

Didn't mean anything magical. Any change in host port power limits will provide useful test data. If the behaviour changes, either for better or for worse, then you can be pretty sure power is the issue.

Different hosts and USB ports will have different characteristics. All high bandwidth 4G and 5G modems exceed the USB spec for short periods, depending on hosts being able to provide higher currents than the spec requires.

Adding or removing a USB extension cable will, for example, change the current response of the USB port. Adding one will obviously make the problem worse, but it should let you confirm the issue by making it show up more often.
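If you want to compare setups a bit more systematically, a small sketch like the following could help. It groups the time until the modem hangs by test setup and reports the median per setup. The function name, the setup labels and the idea of using the median are all assumptions for illustration, and the numbers in the example are placeholders, not measurements from this thread.

```python
from statistics import median


def summarize_runs(runs):
    """Group (setup, minutes_until_hang) pairs and report count and median per setup."""
    by_setup = {}
    for setup, minutes in runs:
        by_setup.setdefault(setup, []).append(minutes)
    return {
        setup: {"runs": len(times), "median_minutes": median(times)}
        for setup, times in by_setup.items()
    }


# Placeholder values only: replace them with your own observations.
example_runs = [
    ("direct port", 60),
    ("direct port", 40),
    ("with extension cable", 10),
]
for setup, stats in summarize_runs(example_runs).items():
    print(setup, stats)
```

A clear shift in the median between setups, in either direction, would support the power theory. With only a handful of runs per setup it stays a rough indication.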

As for permanent solutions with more power: Some m.2 adapters (if you are using one?) come with additional power supply inputs. There is also great variation in the regulators they use (USB input is 5V, while the m.2 slot VCC is 3.3V). If the adapter is OK but the host port is the problem, then you might be able to work around that with a powered hub. But most of those will have the same current limiters as ordinary host ports, so there is no guarantee that it helps with the out-of-spec requirements.

As a last resort, you could try to limit the capabilities of the modem. It's better to run without CA than to crash, if a simple thing like that is enough to avoid issues. There's no guarantee it is, though.

I am not aware of any power settings in OpenWrt or the Linux kernel, so it would be great if you could guide me there. What exactly is a "host power limit"? I would happily run out of spec if it works.

I use an mPCIe to M.2 adapter for the EM7565 card, but it does not have an extra power input.

The thing is, I used this setup for years (to be fair, without wifi) and it was rock solid. I have now upgraded from LEDE to OpenWrt 22.03.03, and that's where the problems began. So I hoped it would be some kind of setting, but I fear the issue sits in a deeper layer than that.

If I cannot fix it on this version, I might go back a few versions of OpenWrt and see if things get better.

To be clear, it very well could be a power issue. Everything points to it, because when I hit wifi with a lot of traffic, the LTE module gets knocked out. BUT it also happens, though rarely, when wifi is disabled and my computer is the only device connected via Ethernet cable. The fact that this never happened with the old LEDE build bothers me.

Is this issue fixed?

No, unfortunately. I am waiting for the next stable release, but I do not have a lot of hope.