RFC - wwan/uqmi revamp

that's reasonable.
for me personally, the extra cycles wasted on polling are less of a problem than frequent connection interruptions, but i also don't want to add to the list of hacks that everyone is using.

i'm willing to continue but i need some pointers.
in my experience, each uqmi instance claims the device exclusively, and it's unclear to me how to 'subscribe' to messages without starving all other instances.
i'm looking at libqmi as a reference. do you have any keywords that could help me identify a method for not blocking the device while waiting for messages?

@Lynx i made a mistake while squashing commits and introduced a syntax error.

the two images won't work and i'll delete them. let me know if you want new ones.

@yogo1212 I concur here with @patrakov and @bmork - I am not so keen on the idea of polling. My Huawei B818-263 reconnected after these disconnects in about one second, and I am keen to minimize any disruption as much as possible. In my case, since I use cake-autorate, I could just restart the modem on a detected stall, but that would also be a hack; I'm really looking for an immediate reconnect to keep disruption to a minimum. I don't know the specifics of uqmi, but I wonder if a proxy service could be used so that commands can still be issued while it waits for these disconnection events?

In a general sense I think efforts here are absolutely worth it given that wwan is steadily gaining popularity and its support in OpenWrt clearly needs some attention.

With the TP-Link TL-MR6400 v5 I have been modifying qmi.sh to add a hard-coded timeout to a call to uqmi, as shown here.

Currently the QMI protocol has configuration options for 'timeout' and 'delay', but they are used for other things. How do you see things: do your recent changes perhaps introduce a workaround or remedy for this?

hi :slight_smile:

i've found that any of the uqmi invocations can hang, and creating a function to 'overload' uqmi is less error-prone than changing every invocation manually.
in my PR, it's done like this:

uqmi() {
  local t="${TIMEOUT-10}"

  timeout -k "$(( t + 2 ))" "$(( t + 1 ))" /sbin/uqmi -t "$(( t * 1000 ))" "$@"
}

it asks uqmi itself to finish after t seconds, waits one more second before sending SIGTERM, and another second before sending SIGKILL.
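
with the wrapper in place, existing calls in qmi.sh go through it unchanged; a longer per-call timeout can be set via the TIMEOUT variable from the snippet above (the device path is just an example):

# goes through the wrapper and gets the default 10 s timeout
uqmi -d /dev/cdc-wdm0 --get-data-status

# allow more time for a specific call by setting TIMEOUT first
TIMEOUT=30
uqmi -d /dev/cdc-wdm0 --get-serving-system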

this is still part of the PR - only the polling-based mechanism for handling provider-issued disconnects of data connections was reworked.
please leave a comment on the PR should you decide to test it :wink:

@yogo1212 for the 'qmi' protocol of netifd there is surely a way to achieve what ModemManager does and interpret messages from the modem a la:

Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0] state changed (connected -> disconnecting)
Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0] state changed (disconnecting -> registered)
Wed Nov  2 20:38:54 2022 daemon.info [2716]: <info>  [modem0/bearer3] connection #1 finished: duration 121s, tx: 258333 bytes, rx: 371928 bytes
Wed Nov  2 20:38:54 2022 user.notice modemmanager: interface wan (network device wwan0) disconnected

or:

Mon Oct 31 17:37:46 2022 daemon.info [2719]: <info>  [modem0/bearer1] verbose call end reason (6,36): [3gpp] regular-deactivation

And react to that by immediately putting the interface down, reconnecting, and bringing the interface back up (if I am getting this right)?

This would bring the 'qmi' protocol up to speed with the 'ModemManager' protocol, and perhaps even surpass it, since presently with the 'ModemManager' protocol netifd doesn't even bother to reconnect, which baffles me.
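
For illustration, the 'interface down, then back up' step could be as simple as bouncing the interface via ubus ('wan' here is just an example logical interface name):

# 'wan' is the logical interface name from /etc/config/network
ubus call network.interface down '{ "interface": "wan" }'
ubus call network.interface up '{ "interface": "wan" }'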

@patrakov @bmork

looking at libqmi, it appears there are unsolicited messages in qmi as well.

to be able to react to unsolicited messages, there will need to be a process that runs permanently and uses the cdc device exclusively - not obsoleting applications like uqmi, but preventing them from talking to the device directly.

i would like to create a new tool that translates ubus <-> qmi per cdc device.

i don't see a good way of doing it with the existing uqmi architecture. it appears to have been designed for transactional interactions. it should be possible to transform uqmi to behave more like a daemon, but maybe it's more sensible to split it in two:

  • a new daemon-like application as mentioned above (ubus <-> qmi)
  • the uqmi tool mostly keeps its syntax and becomes a frontend for that daemon (rough sketch of the ubus side below)
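
purely hypothetical - none of these ubus objects or methods exist today - but from the shell the daemon could end up being used roughly like this:

# one-shot query, answered by the daemon instead of touching the cdc device directly
# (object and method names are made up)
ubus call qmi.cdc-wdm0 get_data_status '{}'

# long-lived subscription; unsolicited messages (e.g. a network-initiated disconnect)
# would arrive here as notifications
ubus subscribe qmi.cdc-wdm0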

@nbd is there any chance that such changes would be accepted into uqmi?

@yogo1212 I think such a change is a very good idea. Ideally, starting a ubus service should be possible via a command line switch from the uqmi tool.
I think it would also be a good idea to decide via command line flag if a command should be run directly, or if it should be passed to a running daemon instead.
This could be done by simply translating the cli commands to ubus method + blobmsg arguments and then at the final step using the flag to decide whether to run ubus_invoke or call the method handler function directly.
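
From the user's point of view that could look something like this (only the direct path exists today; the extra flag is made up):

# run the command in-process, talking to the cdc device directly (current behaviour)
uqmi -d /dev/cdc-wdm0 --get-data-status

# same command, but handed off to a running daemon over ubus (hypothetical flag)
uqmi -d /dev/cdc-wdm0 --via-daemon --get-data-status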

@yogo1212 since your post above, @jow and I had a discussion on #openwrt-devel about this very issue, covering:

  • general brokenness of wireless wan in OpenWrt right now because of the lack of disconnection handling in either the ModemManager or qmi protocols;
  • ModemManager;
  • @aleksander0m's patch to ModemManager; and
  • uqmi and your proposed changes to the qmi protocol.

@jow offers some thoughts on netifd which might be useful in terms of fixing uqmi to properly handle disconnections.

Here are some titbits that seem relevant:

[11:21:12] [jow] maybe. problem is that neither mm nor uqmi really fit into netifd's expected process model
[11:21:17] [jow] what netifd really needs is:
[11:21:34] [jow] a process it starts in order to create/configure/keep running a network device
[11:21:45] [jow] when the process dies, the network device dies along with it
[11:22:03] [jow] this holds true for pppd, openvpn and various other tunnel protocols
[11:22:22] [jow] there's also protocols using one-shot commands (ip link ... or ip tunnel ...) to set things up

[11:27:11] [jow] I could imagine a theoretical "uqmi monitor" subcommand
[11:27:24] [jow] it will start a daemon process that subscribes to events from the given device
[11:27:34] [jow] and if it catches a disconnection event, it will exit(0) itself
[11:27:58] [jow] this "uqmi monitor" daemon process could then be launched by netifd

[11:31:21] [jow] I mean uqmi oneshot commands could be redesigned in a way that they first try to communicate through the monitor process if one is found active for the device it is attempting to talk to
[11:31:29] [jow] and falling back to the old way of doing things

Please see full log here: https://pastebin.com/raw/7ePH0W5N
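
To make @jow's 'uqmi monitor' idea concrete: presumably the qmi proto handler would launch it as a supervised process, roughly like this (proto_run_command is netifd's standard shell helper; the --monitor subcommand is hypothetical):

# sketch only: inside proto_qmi_setup() in /lib/netifd/proto/qmi.sh,
# after the connection details have been sent to netifd
# (the --monitor subcommand does not exist yet)
proto_run_command "$interface" uqmi -d "$device" --monitor

When that process exits on a disconnection event, netifd tears the interface down and runs the setup again.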

One observation is that the assumption that netifd would take care of reconnecting is apparently incorrect - netifd will not automatically reconnect. @jow suggested that in this instance ModemManager ought to reconnect, and then a report-up script, like the existing report-down script, would need to communicate the new details to netifd.

So the state of the ModemManager protocol is actually still broken in OpenWrt right now, notwithstanding the patch, because ModemManager expects netifd to reconnect while netifd expects ModemManager to reconnect. And the poor user has to resort to dodgy DIY hacks like rebooting the router every 12 hours or reconnecting on ICMP failure.

So we really need a knight in shining armour to fix up the netifd qmi protocol so that disconnection events are handled, the modem is reconnected, and the new configuration details are provided to netifd.

I can help with testing, subject to not wanting to brick my Zyxel NR7101.

But then to my astonishment the fat lazy man just sat there doing nothing

LoL :smiley: that fat lazy man does nothing by himself; he needs a partner asking him to do things, pretty much like in my marriage.

Please forgive my impertinence and ignorant comments; I likewise rely upon my wife to tell me off for speaking before thinking and other inappropriate mannerisms, but that important feedback mechanism does not apply in respect of some of my online interactions.

Now I feel a bit like a bull in a china shop bandying about my ill-conceived thoughts and ideas here and there without really properly understanding the underlying issues.

I think what you see is just a reflection of frustration on my part in trying to get things up and running properly. I have so enjoyed setting up OpenWrt on my downstream router and access points, and it turned out doing the same on my 4G modem replacement has not been so easy.

For completeness, I enclose the follow-up conversation between @jow and @aleksander0m on #openwrt-devel, which describes how a 'watcher' process could perhaps be set up. @yogo1212 I presume this technique could also work for the 'qmi' protocol?

[13:30:34] <aleksander> oh wow, I missed a long discussion
[13:31:16] <aleksander> MM doesn't automatically reconnect, MM does not have any logic for that, it's not a connection manager
[13:31:32] <aleksander> MM only acts on what the upper layer connection manager requests to do, be it connect or disconnect
[13:31:57] <aleksander> MM will monitor network initiated disconnections, and report the disconnection to the upper layers so that they decide what to do
[13:33:31] <aleksander> jow, not sure if I understood it correctly, but are you suggesting triggering the reconnection from within the MM dispatcher script when it reports a disconnection? that's a bit convoluted
[13:38:49] <jow> aleksander: problem is that netifd regards mm to be the connection manager as it is not supervised by netifd
[13:39:04] <aleksander> "nobody likes ModemManager since it uses dbus and all these extra packages and it's really a big elephant" MM was never targeting devices with extremely low amounts of memory, but a lot of openwrt devices have tons of memory available and MM is not such a big elephant there
[13:40:15] <aleksander> jow, there's no connection management done by MM itself though, it's all done at netifd and protocol handler level
[13:41:03] <jow> aleksander: I don't want to repeat all the discussion again
[13:41:19] <jow> but in essence, netifd only does reconnections for supervised proto handler processes
[13:41:28] <aleksander> jow, the dispatcher script called by MM on disconnect (e.g. when a network initiated disconnect happens) was supposed to notify netifd that the "underlying" connection is down, so that netifd can report the iface as down
[13:41:58] <aleksander> ah, wrongly assumed that then
[13:42:49] <aleksander> what needs to happen to have it a supervised proto handler process? is that something that can be developed in the proto handler implementation itself?
[13:43:24] <jow> maybe you can get away with a simple brute force approach of simply calling   ubus call network.interface down '{ "interface": "$CFG" }' && sleep 1 && ubus call network.interface up '{ "interface": "$CFG" }'
[13:43:32] <jow> in that disconnect notify script
[13:44:05] <jow> aleksander: basically you need some sort of process that exits when the underlying device loses connection
[13:44:20] <aleksander> doing that breaks the purpose of the disconnect notify script though
[13:44:35] <jow> aleksander: netifd will then kill the netdev, restart that process and await proto updates
[13:44:59] <aleksander> we could have a watcher process launched on a connect, and have that process killed by the disconnect dispatcher script
[13:45:06] <jow> yep
[13:45:23] <jow> netifd will restart the watcher process
[13:45:30] <aleksander> that's doable and clean, because the watcher process would be launched within the protocol handler
[13:45:40] <jow> but it is also assumed that this "watcher process" is also the "setup process"
[13:45:54] <aleksander> that's fine
[13:46:09] <jow> so the watcher would also need to do all the necessary things to initiate a connection, fetch details, trigger a proto update notify
[13:46:27] <aleksander> yes, yes, that's doable I think
[13:47:00] <jow> you will also only need this watcher process for non-ppp connection types
[13:47:37] <jow> for ppp, the proto handler already does proto_run_command /usr/sbin/pppd ...
[13:47:45] <jow> which has the required semantics
[13:47:55] <aleksander> yep
[13:48:58] <jow> (while looking at it: https://github.com/openwrt/packages/blob/master/net/modemmanager/files/modemmanager.proto#L159 - $username and $password should be quoted)
[13:49:50] <jow> nvm, they don't need to
[13:54:15] <aleksander> oh well, already did this: https://github.com/openwrt/packages/pull/19811
[13:54:31] <aleksander> want me to close that?
[13:54:40] <jow> keep it open, it does not hurt
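
If I follow the idea correctly, the 'watcher' for the 'qmi' protocol would be a long-lived script started via proto_run_command that performs the setup, notifies netifd and then simply blocks until the modem reports a disconnect. Very roughly, as a sketch only (the two helper functions are made up):

#!/bin/sh
# hypothetical /lib/netifd/qmi-watch.sh - sketch only, both helpers below are placeholders
. /lib/netifd/netifd-proto.sh

interface="$1"; device="$2"

# bring the data connection up and push addresses/routes to netifd (proto update notify)
qmi_connect_and_notify "$interface" "$device"

# block until the modem signals a disconnect, then exit;
# netifd sees the supervised process die, tears the interface down and re-runs the setup
qmi_wait_for_disconnect "$device"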

thank you both for the valuable feedback!
i'll do the first stub on sunday to see what it feels like.

Hi everyone :slight_smile:
As a mobile internet user (qmi/mbim; ModemManager is unnecessary for me on routers), I am glad that the problem of dropped connections could be solved. There would then be no need to install watchcat or similar packages.

I just hope that whatever is new will not take away the possibility of frequent communication with the modem using AT commands, because that would kill my LuCI addons.

Yeah, there is clearly huge demand for these changes, so efforts to improve things are hugely welcome.

Hi !

I've improved the qmi scripts for our usage. The code is really bad, but it has given me high stability as well as muxing and packet aggregation. Perhaps you can use it for inspiration:

Thanks for sharing. Does this include reconnection handling and, if so, how is that implemented? As you can see from the above, there is a desire to avoid polling and instead react to modem events as soon as they happen, which requires some special handling to retain the ability to run manual commands whilst the connection watchdog is running.

There's a watchdog implemented which checks the connection status. It also has some more advanced checks:

  • whether the PDP context is still active
  • whether traffic is flowing, in case the modem or the provider forgets the PDP context (don't ask, I know this from big global m2m providers)
  • a count of critical qmi errors

The watchdog is started for every qmi connection and handled by netifd, so the common netifd process flow works.

Every qmi call goes through a timeout and locking to make sure that parallel access is not a problem.
It is also important for stability that you register client ids for the several qmi services once and only reuse them later. A special case is the WDS service, where a new client id is bound to the PDP context.
If you don't take care of this, the modem may hang. This is done through a hotplug script which also initializes the modem and creates the several muxes needed for packet aggregation (speed, speed).
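
As an illustration only (not the exact code, and the device path is an example), allocating a WDS client id once and reusing it for later calls looks roughly like this with uqmi:

# allocate a WDS client id once and remember it
cid="$(uqmi -d /dev/cdc-wdm0 --get-client-id wds)"

# later calls reuse that client id instead of allocating and releasing a new one each time
uqmi -d /dev/cdc-wdm0 --set-client-id wds,"$cid" --get-data-status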

That's also the reason for the hotplug script. If the modem reboots because it is unreliable, you have to make sure it is correctly initialized again.

Currently it needs both qmicli and uqmi, so it has big dependencies. It also calls a usb-repower shell script somewhere, which turns USB power off and on to reboot the modem in case of catastrophic failure. But that's not included.

Sounds pretty comprehensive. I haven't looked in detail at the code, but isn't the watchdog relying upon polling every sixty seconds?

At the moment ModemManager interprets disconnects from the modem as they come in and can react accordingly. Yet presently it just informs netifd of the disconnect and doesn't reconnect, because the author expects netifd to handle the reconnection; and since netifd does not act on the disconnect, the net result (no pun intended!) is loss of internet connectivity for the user.

So even though the ModemManager implementation in OpenWrt is broken, at least it is configured to react immediately without polling.

It polls every 60 seconds, correct. But relying only on disconnect events will not be enough to make sure you have a reliable connection. I recommend also checking data counters if you want it to be stable long term.
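
Just to illustrate the idea, a crude 'is traffic still flowing' check can be built on the kernel's interface counters (wwan0 is an example interface name):

# compare received byte counters a minute apart
rx1="$(cat /sys/class/net/wwan0/statistics/rx_bytes)"
sleep 60
rx2="$(cat /sys/class/net/wwan0/statistics/rx_bytes)"
[ "$rx2" -gt "$rx1" ] || logger -t qmi-watchdog "no downstream traffic in the last minute"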

So to summarize, in the ideal case there'd be:

  • continuous monitoring for disconnect events
  • periodic evaluation of data transfer

Anything else?