RFC - wwan/uqmi revamp

yogo1212 · November 1, 2022, 12:52pm

I'm very excited to have published my recent changes related to the wwan/umbim/uqmi packages.
At the same time, I'm unsure whether they will resonate well with the OpenWrt community.

The goal is to improve the WWAN experience with OpenWrt and to clean up the ridiculously long netifd handler.

The change with the most impact should be to not call proto_block_restart when the network registration times out.
Imo, the registration should be seen as layer 2 from netifd's perspective; retrying is equivalent to not refusing to work with an Ethernet cable just because it wasn't plugged in a moment ago.

This abstraction is slightly broken again for the next functional change:
Keeping the interface up as long as any data connections on it are active.
The new qmi.script, started via proto_run_command, regularly polls the state of the configured data connections.
If there is no active data connection, the script will end, which should trigger a restart of the main interface (and, thus, re-run the registration code).
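
As a rough illustration of the polling idea described above (not the actual qmi.script from the PR), the loop could look like the sketch below. It assumes that `uqmi --get-data-status` prints `"connected"` while a data connection is up; `DEVICE`, `get_data_status` and `watch_connection` are names made up for this example.

```sh
#!/bin/sh
# Sketch only; not the qmi.script from the PR.
# DEVICE, get_data_status and watch_connection are hypothetical names.

DEVICE="${DEVICE:-/dev/cdc-wdm0}"

# Assumption: uqmi reports the state as a quoted string such as "connected".
get_data_status() {
	uqmi -s -d "$DEVICE" --get-data-status
}

# Block while the data connection is up, checking every $1 seconds.
# Returning lets the calling script end, which is what would make netifd
# restart the interface.
watch_connection() {
	local interval="${1:-60}"

	while [ "$(get_data_status 2>/dev/null)" = '"connected"' ]; do
		sleep "$interval"
	done
}
```

A script run by netifd would call `watch_connection 60` as its last step, so the script ends as soon as the modem stops reporting an active connection.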

A few improvements to the hotplug handlers are also included - which will now be more careful when choosing the interface to (re-)start when a modem is attached or spontaneously resets. This is mainly useful for setups with multiple modems but there, it's a necessity (otherwise, attaching modem 2 via USB will reset the interface for modem 1).
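
To show what "more careful" could mean here, below is a minimal sketch of matching a hotplug event against the USB bus id configured for an interface. It is an assumption about the approach, not the code from the PR; `devpath_on_bus` is a hypothetical helper.

```sh
#!/bin/sh
# Sketch only: decide whether a hotplug event belongs to a configured modem.
# devpath_on_bus is a hypothetical helper; the PR may do this differently.

# $1: sysfs device path from the hotplug event (e.g. $DEVPATH)
# $2: USB bus id from the interface config (e.g. 2-1)
# USB interface directories are named <bus id>:<config>.<interface>, so a
# path component starting with "<bus id>:" identifies that exact device.
devpath_on_bus() {
	case "$1" in
		*/"$2":*) return 0 ;;
	esac
	return 1
}
```

With this check, an event for a modem on bus 2-2 would leave an interface configured for bus 2-1 untouched.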

A more controversial change should be the addition of an ifname parameter that renames the associated Linux network device - maybe that needs renaming - maybe it's undesired.

For those people interested in tinkering with WWAN who don't have hardware:
I've got a Huawei ME906E that is missing a UFL connector for AUX/GPS. There's also a Sierra Wireless EM7455 (with all UFL connectors). To complete the package, there are three pigtails and a USB-to-NGFF adapter.
Both have been used with OpenWrt (a kernel patch is required for the Huawei ME906E - I can provide that) but I lack the time to properly integrate them.
Preferably, you'll add the missing pieces, help get them upstream, and be willing to pay for shipping (not required if you're short on money).
DM me if you're interested!

If this gets merged I want to encapsulate the SIM unlocking and network registration code somehow; maybe using extdev? I need feedback!

Here is:

github.com/openwrt/openwrt

improve WWAN options and handling

openwrt:master ← yogo1212:wwan_busy

opened 09:15AM - 06 Oct 22 UTC

yogo1212

+442 -202

This PR adds a few quality-of-life features to the wwan/umbim/uqmi packages. Most notably, a connection script is added for uqmi. I could go on changing more and more but this PR is already rather large for my taste.

Message from before:

This is now a small collection of changes I implemented to make the use of the WWAN modems easier. These are open topics I see as being relevant:

- ~~the `ifname` option is ineffective~~ (implemented)
- ~~`network reload` sometimes doesn't do anything either if the interface is down~~ (that is probably the right thing to do. hardly any options change behaviour leading to `proto_block_restart` are affected, except for plmn - ignoring that for now)
- increase ability to automatically recover from these events:
  - ~~screwing in the antennas a while after the router has booted~~
  - ~~patch of clouds during boot~~
  - ~~a particular SIM card takes a while longer to register~~ (one of mine does)
  - address change (needs testing)
- netifd considers the interface to be up despite:
  - automatic regular disconnect by the provider (needs testing)
  - USB device was detached
  - ~~involuntary modem reset (USB device reappears)~~
- ~~pdptype 'auto' - connect to ipv4/ipv6 as available but require at least one~~ (at least for uqmi, the interface stays up if both are configured and only one is used)

Just imagine having to reboot after plugging in the ethernet cable for a DHCP interface because the network device is locked up (happens with dangling uqmi instances blocking `/dev/cdc-wdm*`). The error recovery and restart logic *really* need more love. Often, bringing the interface down and up is enough for the long registration but sometimes it isn't - because uqmi hangs and it's quicker to just always reboot. This is not a complaint - I just want to provide motivation for the changes.

Original message:

The netifd handler `qmi.sh` sets `raw-ip` and calls `uqmi --sync` both of which fail if the interface is up.

```
Thu Oct 6 08:36:31 2022 daemon.notice netifd: n_fb2 (27090): Device does not support 802.3 mode. Informing driver of raw-ip only for wwan-n_fb2 ..
Thu Oct 6 08:36:31 2022 daemon.notice netifd: n_fb2 (27090): + echo Y
# this is from `echo "Y" > /sys/class/net/$ifname/qmi/raw_ip`:
Thu Oct 6 08:36:31 2022 daemon.notice netifd: n_fb2 (27090): sh: write error: Resource busy
Thu Oct 6 08:36:31 2022 daemon.notice netifd: n_fb2 (27090): + uqmi -s -d /dev/cdc-wdm0 --sync
Thu Oct 6 08:36:31 2022 kern.err kernel: [ 3494.189112] qmi_wwan 1-2:1.4 wwan-n_fb2: Cannot change a running device
```

I'm not sure this is the right way but as it should be an improvement nonetheless.. "works for me"
Open for feedback :-)

Lynx · November 1, 2022, 1:26pm

Dear @yogo1212,

Very interesting.

Could you comment on how this relates to the issue I outlined here:

Might your proposal address / fix the issue I outline in the thread above (which seems to be an issue encountered by several users)? If so, that seems positive to me. Does it mean users can forego ModemManager and just use e.g. 'luci-proto-qmi' + these fixes? ModemManager does seem a bit bloated.

Otherwise in general for what it's worth the wwan handling in OpenWrt seems like a bit of a mess. Is that a fair assessment?

It feels to me like the present wwan handling in OpenWrt really could do with some revamping and/or consolidation. My understanding is that there are various different tools (e.g. luci-proto-qmi or luci-proto-modemmanager, uqmi or ModemManager and mmcli with proxy, and more?) with overlapping, unclear or poorly documented utility and even in 22.03.2 the handling seems a bit broken at the moment.

At the moment various users seem to be relying upon their own DIY hacks to try to overcome wwan disconnects and reconnections, e.g. by restarting the router every 24 hours or other techniques.

The overall capability seems all there across the various different tools, it just needs some looking at to converge on something robust that is well documented.

yogo1212 · November 1, 2022, 3:18pm

Hi

Could you comment on how this relates to the issue I outlined here:

You mention a few different issues.
I have little experience with ModemManager but I come to the same assessment: It's crufty.
Ofono is just more pleasant overall but, for OpenWrt, I only use the standard packages.

What I did mostly relates to uqmi. No idea what the interface restart encompasses for the change with ModemManager. I can say for my change that it tries to only restart the data connection and leaves the rest be.

Can't say anything about FCC unlock.

Waiting before re-attempting to register is an interesting option.
delay could be used for something similar but, really, a proto_block_restart .. 600 would be more fitting.

Otherwise in general for what it's worth the wwan handling in OpenWrt seems like a bit of a mess. Is that a fair assessment?

Yes, I do think wwan is 'a bit of a mess' - but it's hard to hold the volunteers driving OpenWrt accountable.
I would prefer if ofmodemsandmen contributed to OpenWrt instead of creating a one-off project that will easily die the same way that every Qualcomm SDK dies (in a rock cave?).

At the moment various users seem to be relying upon their own DIY hacks to try to overcome wwan disconnects and reconnections, e.g. by restarting the router every 24 hours or other techniques.

This is what I see as well and I think it hurts the project.
But again, we can't make demands. Community effort means having to chime in from time to time.

Would you mind testing for me?
What's your hardware?

Lynx · November 1, 2022, 3:43pm

I'd be happy to test although would that mean switching from 22.03.2 to master?

I have a Zyxel NR7101:

Selfishly the specific situation I am keen to address is that my ISP modem disconnects my device at 48 hours. An immediate reconnection is needed with possible wan IP refresh.

So originally I just used 'luci-proto-qmi' and that worked well save for the 48 hour disconnect resulting in loss of connectivity and an unhappy wife.

At this time a call to uqmi confirmed the connection had been lost. Apparently I could set 'set-autoconnect=enabled' but I am told that at this point OpenWrt wouldn't even know to assign a new IP address.

That is why I tried ModemManager which seemed to pull in a bunch of packages. Given its age and weightiness I imagined it would just effortlessly deal with this 48 hour disconnect out of the box.

Helpfully it gave the error message:

Mon Oct 31 17:37:46 2022 daemon.info [2719]: <info>  [modem0/bearer1] verbose call end reason (6,36): [3gpp] regular-deactivation

But then to my astonishment the fat lazy man just sat there doing nothing, I have loss of internet connectivity and OpenWrt doesn't even know about it. And naturally my heart sank.

Apparently there is a patch in master to deal with this:

But not in 22.03.2.

I'd personally like to ditch ModemManager and just work with uqmi if that's even possible. Although maybe there's no getting away from the need for a daemon of some form. I'm pretty new to this having relied upon a Huawei B818-263 in bridge mode before. Bridge mode in this context in OpenWrt doesn't seem to be a thing.

I'm not altogether sure if your proposed fixes would address my issue or not.

yogo1212 · November 2, 2022, 10:12am

To my knowledge, the modem figures out the addressing for a given data connection and OpenWrt pulls it. The data connection is invalidated at the whims of the provider (20 hours with one of my providers and 4 hours with another).
The qmi script I'm proposing checks whether this has happened every minute and refreshes only the addresses - if it's able to establish a new data connection. Otherwise, the main interface goes down and restarts network registration.

This is enough to bring me up:

config interface 'rm502q'
        option proto 'wwan'
        option ifname 'wwan-rm502q'
        option apn 'sipgate'
        option bus '2-1' # only required with more than one modem

I've rebased my changes on the release tag. zip/branch.
The same is possible with the commit of aleksander0m.
All this assumes you build OpenWrt yourself.

I've built images for your router with a vanilla config + screen in case you haven't got a build system set up (expires in two weeks).

patrakov · November 2, 2022, 10:26am

I would NAK this kind of one-minute polling. If the modem announces this kind of change via one of its serial ports immediately, then there should be a daemon that listens for such unsolicited announcements, parses them, and reacts accordingly - immediately.

bmork · November 2, 2022, 10:40am

I agree. Subscribe to QMI notifications and act on them.

Polling is bad. Constantly closing and reopening the /dev/cdc-wdmX device because we start a new uqmi process for every poll session makes it even worse. Lots of work for both driver and modem firmware for absolutely no reason at all.

Either fix uqmi to support this or use something more suitable.

yogo1212 · November 2, 2022, 1:58pm

that's reasonable.
for me personally, the extra cycles wasted for polling are less of a problem than frequent connection interrupts but i also don't want to add to the list of hacks that everyone is using.

i'm willing to continue but i need some pointers.
in my experience, each uqmi instance locks up the device exclusively and it's unclear to me how to 'subscribe' to messages without starving all other instances.
i'm looking at libqmi as a reference. do you have any keywords that could help me identify a method for not blocking the device while waiting for messages?

yogo1212 · November 2, 2022, 2:15pm

@Lynx i made a mistake while squashing commits and introduced a syntax error.

the two images won't work and i'll delete them. let me know if you want new ones.

Lynx · November 2, 2022, 3:28pm

@yogo1212 I concur here with @patrakov and @bmork - I am not so keen on the idea of polling. My Huawei B818-263 reconnected from these disconnects in about one second. I am keen to minimize as much as possible any disruption. In my case since I use cake-autorate I could just restart the modem on a detected stall, but that would also be a hack and I'm really looking for an immediate reconnect to minimize disruption as far as possible. I don't know the specifics of uqmi but I wonder if a proxy service can be used to facilitate commands being issued but at the same time wait for these disconnection events?

In a general sense I think efforts here are absolutely worth it given that wwan is surely gaining and gaining in popularity and its support in OpenWrt clearly needs some attention.

sandberg · November 2, 2022, 6:27pm

With TP-Link TL-MR6400 v5 I have been modifying qmi.sh to add a hard coded timeout to a call to uqmi like shown here.

Currently the QMI protocol has configuration options for 'timeout' and 'delay' but they are used for other things. How do you see things, do your recent changes perhaps introduce a workaround or remedy to this?

yogo1212 · November 4, 2022, 7:43am

hi

i've found that any of the uqmi invocations can hang and creating a function to 'overload' uqmi is less error-prone than changing all invocations manually.
in my PR, it's done like this:

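uqmi() {
  # default to 10 seconds unless the caller sets TIMEOUT
  local t="${TIMEOUT-10}"

  # -t: uqmi's own timeout in milliseconds
  # timeout(1): SIGTERM after t + 1 seconds, SIGKILL if that is ignored
  timeout -k "$(( t + 2 ))" "$(( t + 1 ))" /sbin/uqmi -t "$(( t * 1000 ))" "$@"
}

it asks uqmi to finish after t seconds and sends SIGTERM one second later if it is still running. should the process ignore SIGTERM, SIGKILL follows; the -k value counts from the moment SIGTERM is sent, so that is another t + 2 seconds later.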
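
The same wrapper pattern can be tried without a modem. The sketch below applies it to an arbitrary command; `run_limited` is a made-up name and not part of the PR. The exit status on timeout depends on the timeout implementation (GNU coreutils returns 124), so only "non-zero" is assumed here.

```sh
#!/bin/sh
# Sketch only: the timeout wrapper pattern applied to any command.
# run_limited is a hypothetical name, not part of the PR.

# $1: seconds before SIGTERM is sent; remaining args: command to run.
# -k 1 sends SIGKILL one second after SIGTERM if the command is still alive.
run_limited() {
	local t="$1"
	shift

	timeout -k 1 "$t" "$@"
}
```

For example, `run_limited 1 sleep 10` gives up after about one second and returns a non-zero status, while `run_limited 5 true` returns 0 immediately.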

this is still part of the PR - only the polling-based mechanism to handle provider-issued disconnects of data connections was withdrawn.
please leave a comment on the PR should you decide to test it

Lynx · November 4, 2022, 10:00am

@yogo1212 for the 'qmi' protocol of netifd there is surely a way to achieve what ModemManager does and interpret messages from the modem a la:

Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0] state changed (connected -> disconnecting)
Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0] state changed (disconnecting -> registered)
Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0/bearer3] connection #1 finished: duration 121s, tx: 258333 bytes, rx: 371928 bytes
Wed Nov  2 20:38:54 2022 user.notice modemmanager: interface wan (network device wwan0) disconnected

or:

Mon Oct 31 17:37:46 2022 daemon.info [2719]: <info>  [modem0/bearer1] verbose call end reason (6,36): [3gpp] regular-deactivation

And react to that by immediately taking the interface down, reconnecting, and bringing the interface back up (if I am getting this right)?

This would bring the 'qmi' protocol up to speed with the 'ModemManager' protocol, and perhaps even surpass it since presently with the 'ModemManager' protocol netifd doesn't even bother to reconnect, which baffles me.

yogo1212 · November 4, 2022, 10:59am

@patrakov @bmork

looking at libqmi, it appears there are unsolicited messages in qmi as well.

to be able to react to unsolicited messages, there will need to be a process that runs permanently and that uses the cdc device exclusively - not obsoleting but also disabling applications like uqmi.

i would like to create a new tool that translates ubus <-> qmi per cdc device.

i don't see a good way of doing it with the existing uqmi architecture. it appears to have been designed for transactional interactions. it should be possible to transform uqmi to behave more like a daemon but maybe it's more sensible to split it in two:

- a new daemon-like application like mentioned above (ubus <-> qmi)
- the uqmi tool mostly keeps its syntax and becomes a frontend for the daemon-like application

@nbd is there any chance that such changes would be accepted into uqmi?

nbd · November 4, 2022, 11:23am

@yogo1212 I think such a change is a very good idea. Ideally, starting a ubus service should be possible via a command line switch from the uqmi tool.
I think it would also be a good idea to decide via command line flag if a command should be run directly, or if it should be passed to a running daemon instead.
This could be done by simply translating the cli commands to ubus method + blobmsg arguments and then at the final step using the flag to decide whether to run ubus_invoke or call the method handler function directly.
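
To make the idea concrete, here is a rough shell sketch of that last step: one request, and a flag that decides whether it goes to a running daemon over ubus or is executed directly. The real change would live in uqmi's C code; the `qmi.<device>` ubus object, the `USE_DAEMON` flag and the `qmi_request` helper are invented for illustration and do not exist today.

```sh
#!/bin/sh
# Sketch only. The ubus object "qmi.<device>" and USE_DAEMON are hypothetical.

# $1: control device, $2: request name (e.g. get-data-status)
qmi_request() {
	local device="$1" request="$2"

	if [ "${USE_DAEMON:-0}" = 1 ]; then
		# hand the request to a (not yet existing) per-device daemon
		ubus call "qmi.${device##*/}" "$request"
	else
		# run it directly, as uqmi does today
		uqmi -s -d "$device" "--$request"
	fi
}
```

Callers would not need to know which path is taken: `qmi_request /dev/cdc-wdm0 get-data-status` looks the same either way.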

Lynx · November 4, 2022, 11:58am

@yogo1212 since your post above, @jow and I had a discussion on #openwrt-devel about this very issue here covering:

- the general brokenness of wireless WAN in OpenWrt right now because of the lack of disconnection handling in either the ModemManager or qmi protocols;
- ModemManager;
- @aleksander0m's patch to ModemManager; and
- uqmi and your proposed changes to the qmi protocol.

@jow gives some thoughts relative to netifd which might be useful in terms of fixing uqmi to properly handle disconnections.

Here are some titbits that seem relevant:

[11:21:12] [jow] maybe. problem is that neither mm nor uqmi really fit into netifd's expected process model
[11:21:17] [jow] what netifd really needs is:
[11:21:34] [jow] a process it starts in order to create/configure/keep running a network device
[11:21:45] [jow] when the process dies, the network device dies along with it
[11:22:03] [jow] this holds true for pppd, openvpn and various other tunnel protocols
[11:22:22] [jow] there's also protocols using one-shot commands (ip link ... or ip tunnel ...) to set things up

[11:27:11] [jow] I could imagine a theoretical "uqmi monitor" subcommand
[11:27:24] [jow] it will start a daemon process that subscribes to events from thje given device
[11:27:34] [jow] and if it catches a disconnection event, it will exit(0) itself
[11:27:58] [jow] this "uqmi montor" daemon process could then be launched by netifd

[11:31:21] [jow] I mean uqmi oneshot commands could be redesigned in a way that they firs try to communicate through the monitor process if one is found active for the device it is attmepting to talk to
[11:31:29] [jow] and falling back to the old way of doing things
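
A sketch of the process model jow describes, for illustration only: a long-running process that returns as soon as it sees a disconnect event. There is no `uqmi monitor` subcommand at the time of writing, so the event source here is simply lines on stdin, and the event names are invented.

```sh
#!/bin/sh
# Sketch only: "uqmi monitor" is hypothetical; events arrive as lines on stdin.

# Returns 0 once a disconnect event is seen, 1 if the event stream ends first.
wait_for_disconnect() {
	local event

	while IFS= read -r event; do
		case "$event" in
			*disconnected*) return 0 ;;
		esac
	done
	return 1
}
```

If such a monitor existed, a proto handler could pipe its output into `wait_for_disconnect` inside the process netifd supervises, so that the process ends on disconnect and netifd restarts it.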

Please see full log here: https://pastebin.com/raw/7ePH0W5N

One observation is that this assumption:

is apparently incorrect - netifd will not automatically reconnect. @jow suggested that in this instance ModemManager ought to reconnect, then a report-up script like the existing report down-script would need to communicate new details to netifd.

So actually the state of the ModemManager protocol is still broken in OpenWrt right now notwithstanding the patch because ModemManager expects netifd to reconnect, but netifd expects ModemManager to reconnect. And the poor user has to resort to dodgy DIY hacks like rebooting the router every 12 hours or reconnecting on ICMP failure.

So we really need a knight in shining armour to fix up netifd qmi so that disconnection events are handled, the modem is reconnected, and the new configuration details are passed to netifd.

I can help with testing, subject to not wanting to brick my Zyxel NR7101.

aleksander0m · November 4, 2022, 2:06pm

But then to my astonishment the fat lazy man just sat there doing nothing

LoL that fat lazy man does nothing by itself, he must have a partner asking him to do the things, pretty much like in my marriage.

Lynx · November 4, 2022, 2:50pm

Please forgive my impertinence and ignorant comments; I likewise rely upon my wife to tell me off for speaking before thinking and other inappropriate mannerisms, but that important feedback mechanism does not apply in respect of some of my online interactions.

Now I feel a bit like a bull in a china shop bandying about my ill-conceived thoughts and ideas here and there without really properly understanding the underlying issues.

I think what you see is just a reflection of frustration on my part in trying to get things up and running properly. I have so enjoyed setting up OpenWrt on my downstream router and access points, and it turned out doing the same on my 4G modem replacement has not been so easy.

For completeness, I enclose the follow-up conversation between @jow and @aleksander0m on #openwrt-devel, which describes how a 'watcher' process should perhaps be set up. @yogo1212 I presume this technique could also work for the 'qmi' protocol?

[13:30:34] <aleksander> oh wow, I missed a long discussion
[13:31:16] <aleksander> MM doesn't automatically reconnect, MM does not have any logic for that, it's not a connection manager
[13:31:32] <aleksander> MM only acts on what the upper layer connection manager requests to do, be it connect or disconnect
[13:31:57] <aleksander> MM will monitor network initiated disconnections, and report the disconnection to the upper layers so that they decide what to do
[13:33:31] <aleksander> jow, not sure if I understood it correctly, but are you suggesting triggering the reconnection from within the MM dispatcher script when it reports a disconnection? that's a bit convoluted
[13:38:49] <jow> aleksander: problem is that netifd regards mm to be the connection manager as it is not supervised by netifd
[13:39:04] <aleksander> "nobody likes ModemManager since it uses dbus and all these extra packages and it's really a big elephant" MM was never targeting devices with estremely low amount of memory, but a lot of openwrt devices have tonds of memory available and MM is not such a big elephant there
[13:40:15] <aleksander> jow, there's no connection management done by MM itself though, it's all done at netifd and protocol handler level
[13:41:03] <jow> aleksander: I don't want to repeat all the discussion again
[13:41:19] <jow> but in essence, netifd only does reconnections for supervised proto handler processes
[13:41:28] <aleksander> jow, the dispatcher script called by MM on disconnect (.e.g when a network initiated disconnect happens) was supposed to notify netifd that the "underlying" connection is down, so that netifd can report the iface as down
[13:41:58] <aleksander> ah, wrongly assumed that then
[13:42:49] <aleksander> what needs to happen to have it a supervised proto handler process? is that something that can be developed in the proto handler implementation itself?
[13:43:24] <jow> maybe you can get away with a simple brute force approach of simply calling   ubus call network.interface down '{ "interface": "$CFG" }' && sleep 1 && ubus call network.interface up '{ "interface": "$CFG" }'
[13:43:32] <jow> in that disconnect notify script
[13:44:05] <jow> aleksander: basically you need some sort of process that exits when the underlying device loses connection
[13:44:20] <aleksander> doing that breaks the purpose of the disconnect notify script though
[13:44:35] <jow> aleksander: netifd will then kill the netdev, restart that process and await proto updates
[13:44:59] <aleksander> we could have a watcher process launched on a connect, and have that process killed by the disconnect dispatcher script
[13:45:06] <jow> yep
[13:45:23] <jow> netifd will restart the watcher process
[13:45:30] <aleksander> that's doable and clean, because the watcher process would be launched within the protocol handler
[13:45:40] <jow> but it is also assumed that this "watcher process" is also the "setup process"
[13:45:54] <aleksander> that's fine
[13:46:09] <jow> so the watcher would also need to do all the necessary things to initiate a connection, fetch details, trigger a proto update notify
[13:46:27] <aleksander> yes, yes, that's doable I think
[13:47:00] <jow> you will also only need this watcher process for non-ppp connection types
[13:47:37] <jow> for ppp, the proto handler already does proto_run_command /usr/sbin/pppd ...
[13:47:45] <jow> which has the required semantics
[13:47:55] <aleksander> yep
[13:48:58] <jow> (while looking at it: https://github.com/openwrt/packages/blob/master/net/modemmanager/files/modemmanager.proto#L159 - $username and $password should be quoted)
[13:49:50] <jow> nvm, they don't need to
[13:54:15] <aleksander> oh well, already did this: https://github.com/openwrt/packages/pull/19811
[13:54:31] <aleksander> want me to close that?
[13:54:40] <jow> keep it open, it does not hurt
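
For reference, the brute-force restart jow mentions could be wrapped as below. Note that inside single quotes the shell does not expand `$CFG`, so the sketch builds the JSON argument with double quotes instead. `restart_interface` is a made-up helper name, and the sketch assumes the interface name contains no characters that need JSON escaping.

```sh
#!/bin/sh
# Sketch only: restart_interface is a hypothetical helper.

# $1: netifd interface name (e.g. wan), $2: pause in seconds (default 1)
restart_interface() {
	local cfg="$1" pause="${2:-1}"

	ubus call network.interface down "{ \"interface\": \"$cfg\" }" &&
		sleep "$pause" &&
		ubus call network.interface up "{ \"interface\": \"$cfg\" }"
}
```

A disconnect notify script would call it as `restart_interface "$CFG"`, although, as discussed in the log, a supervised watcher process is the cleaner option.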

yogo1212 · November 4, 2022, 8:58pm

thank you both for the valuable feedback!
i'll do the first stub on sunday to see what it feels like.

IceG · November 5, 2022, 6:14am

Hi everyone
As a mobile internet user (qmi/mbim; for me, ModemManager is unnecessary on routers), I am glad that the problem with connections dropping might get solved. There would be no need to install watchcat or similar packages.

I just hope that what's new will not take away the possibility of frequent communication with the modem using AT commands because it would kill my LuCI addons.