Mwan3 and Maximum number of concurrent DNS queries reached

Hello,
I'm encountering a strange problem.
I use mwan3 to load-balance a VDSL connection (wan0) and an LTE connection (eth4).
The LTE connection is reset by the ISP every 4 hours (my external modem needs 2-3 seconds to reconnect), and this happens:

Wed Aug 16 13:55:33 2023 user.info mwan3track[9571]: Check (ping) failed for target "1.1.1.1" on interface wanb (eth4). Current score: 6
Wed Aug 16 13:55:36 2023 user.info mwan3track[9571]: Check (ping) failed for target "9.9.9.9" on interface wanb (eth4). Current score: 6
Wed Aug 16 13:55:46 2023 user.info mwan3track[9571]: Lost 4 ping(s) on interface wanb (eth4). Current score: 5
Wed Aug 16 13:58:23 2023 daemon.warn dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150)
Wed Aug 16 13:58:29 2023 daemon.warn dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150)
Wed Aug 16 13:58:36 2023 daemon.warn dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150)
Wed Aug 16 13:58:57 2023 daemon.warn dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150)

No device is able to connect to the internet, and the only fix is to restart wan0 and eth4.
Why doesn't mwan3 deactivate eth4?
Any idea how to solve the problem?
Thanks to all

no idea about mwan3, but you can bump up the dnsmasq query limit
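
For reference, the limit in that log message corresponds to dnsmasq's --dns-forward-max setting (default 150). A minimal sketch of raising it on OpenWrt, assuming the stock dnsmasq section in /etc/config/dhcp and its dnsforwardmax option; the value 300 is only an illustration, not a recommendation:

```
# /etc/config/dhcp (fragment: add the option to the existing dnsmasq section)
config dnsmasq
        # 300 is an arbitrary example value; the default is 150
        option dnsforwardmax '300'
```

dnsmasq needs a restart to pick this up. It only raises the ceiling; it does not explain why queries are piling up in the first place.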


Tried that, it doesn't work... it only takes a few milliseconds longer to fall over.
It seems to be a mwan3 problem, as if it's unable to fail over and crashes dnsmasq.
I'm posting my mwan3 config to give more clues:

config globals 'globals'
        option mmx_mask '0x3F00'

config interface 'wan'
        option enabled '1'
        option family 'ipv4'
        option reliability '2'
        option initial_state 'online'
        option track_method 'ping'
        option count '1'
        option size '56'
        option timeout '4'
        option interval '10'
        option failure_interval '5'
        option recovery_interval '5'
        option down '5'
        option up '5'
        option max_ttl '60'
        list track_ip '1.0.0.1'
        list track_ip '1.1.1.1'
        list track_ip '208.67.222.222'
        list track_ip '9.9.9.9'

config interface 'wan6'
        option enabled '0'
        list track_ip '2606:4700:4700::1001'
        list track_ip '2606:4700:4700::1111'
        list track_ip '2620:0:ccd::2'
        list track_ip '2620:0:ccc::2'
        option family 'ipv6'
        option reliability '2'

config interface 'wanb'
        option family 'ipv4'
        option enabled '1'
        option initial_state 'online'
        option track_method 'ping'
        option size '56'
        option failure_interval '5'
        option count '2'
        option timeout '2'
        option down '3'
        option up '3'
        option reliability '2'
        option interval '3'
        option recovery_interval '3'
        list track_ip '1.1.1.1'
        list track_ip '9.9.9.9'
        list track_ip '193.110.81.0'
        option max_ttl '60'

config interface 'wanb6'
        option enabled '0'
        list track_ip '2606:4700:4700::1001'
        list track_ip '2606:4700:4700::1111'
        list track_ip '2620:0:ccd::2'
        list track_ip '2620:0:ccc::2'
        option family 'ipv6'
        option reliability '1'

config member 'wan_m1_w3'
        option interface 'wan'
        option metric '1'
        option weight '55'

config member 'wanb_m1_w3'
        option interface 'wanb'
        option metric '1'
        option weight '45'

config policy 'wan_only'
        list use_member 'wan_m1_w3'
        option last_resort 'unreachable'

config policy 'wanb_only'
        option last_resort 'unreachable'
        list use_member 'wanb_m1_w3'

config policy 'balanced'
        list use_member 'wan_m1_w3'
        list use_member 'wanb_m1_w3'
        option last_resort 'unreachable'

config rule 'https'
        option sticky '1'
        option dest_port '443'
        option proto 'tcp'
        option use_policy 'balanced'

config rule 'Geforcenow'
        option proto 'udp'
        option src_ip '192.168.1.53/32'
        option sticky '0'
        option use_policy 'wan_only'
        option family 'ipv4'

config rule 'Qnap431'
        option family 'ipv4'
        option proto 'all'
        option src_ip '192.168.1.16/32'
        option sticky '0'
        option use_policy 'wanb_only'

config rule 'default_rule_v4'
        option dest_ip '0.0.0.0/0'
        option use_policy 'balanced'
        option family 'ipv4'

config rule 'default_rule_v6'
        option dest_ip '::/0'
        option use_policy 'balanced'
        option family 'ipv6'

You might want to pile on here: Dnsmasq: Maximum concurrent DNS queries limit


I think mwan3 isn't taking eth4 down because the outages are too brief for the timers you have configured.

To test that, try changing the ping interval to 1 second. With your configured "down" option of 3 pings, mwan3 should then take the interface down for outages of about 3 seconds.
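
As a sketch, that test would look something like this in /etc/config/mwan3. Only the changed option matters; the rest of the wanb section stays as posted. Note that the posted section also has failure_interval '5', which the mwan3 wiki appears to describe as the ping interval used once checks have started failing, so it may need lowering as well. Treat that part as an assumption to verify.

```
config interface 'wanb'
        # was '3': ping every second while the link is healthy
        option interval '1'
        # unchanged: three failed checks take the interface offline
        option down '3'
        # assumption: may also need lowering, see the note above
        # option failure_interval '1'
```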

Before reading your post I tried to streamline the configuration by removing the dns-http plugin.
At 20:56 it seems mwan3 correctly took the interface down and brought it back up, with the interval still set to 3 seconds as before.

Wed Aug 16 20:56:17 2023 user.info mwan3track[1599]: Check (ping) failed for target "9.9.9.9" on interface wanb (eth4). Current score: 6
Wed Aug 16 20:56:20 2023 user.info mwan3track[1599]: Check (ping) failed for target "193.110.81.0" on interface wanb (eth4). Current score: 6
Wed Aug 16 20:56:20 2023 user.notice mwan3track[1599]: Interface wanb (eth4) is disconnecting
Wed Aug 16 20:56:27 2023 user.info mwan3track[1599]: Lost 4 ping(s) on interface wanb (eth4). Current score: 5

Tomorrow morning I'll check and report. If it fails during the night I'll try lowering the interval to 1 second.

When mwan3 takes the interface down it logs "interface is offline". The "interface is disconnecting" message is logged when the first pings are missed. See the log below, from a test I made earlier by disconnecting my fibre link from its ONT:

Wed Aug 16 20:31:16 2023 user.info mwan3track[6122]: Check (ping) failed for target "208.67.220.220" on interface wan (wan). Current score: 9
Wed Aug 16 20:31:17 2023 user.info mwan3track[6122]: Check (ping) failed for target "8.8.4.4" on interface wan (wan). Current score: 9
Wed Aug 16 20:31:17 2023 user.notice mwan3track[6122]: Interface wan (wan) is disconnecting
Wed Aug 16 20:31:42 2023 user.info mwan3track[6122]: Check (ping) failed for target "208.67.220.220" on interface wan (wan). Current score: 8
Wed Aug 16 20:31:43 2023 user.info mwan3track[6122]: Check (ping) failed for target "8.8.4.4" on interface wan (wan). Current score: 8
Wed Aug 16 20:31:49 2023 user.info mwan3track[6122]: Check (ping) failed for target "208.67.220.220" on interface wan (wan). Current score: 7
Wed Aug 16 20:31:50 2023 user.info mwan3track[6122]: Check (ping) failed for target "8.8.4.4" on interface wan (wan). Current score: 7
Wed Aug 16 20:31:50 2023 user.notice mwan3track[6122]: Interface wan (wan) is offline
Wed Aug 16 20:31:51 2023 user.notice mwan3-hotplug[6507]: Execute disconnected event on interface wan (wan)
Wed Aug 16 20:33:43 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 0
Wed Aug 16 20:33:43 2023 user.info mwan3track[6122]: Lost 22 ping(s) on interface wan (wan). Current score: 0
Wed Aug 16 20:33:43 2023 user.notice mwan3track[6122]: Interface wan (wan) is connecting
Wed Aug 16 20:33:48 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 1
Wed Aug 16 20:33:53 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 2
Wed Aug 16 20:33:58 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 3
Wed Aug 16 20:34:04 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 4
Wed Aug 16 20:34:09 2023 user.info mwan3track[6122]: Check (ping) success for target "208.67.220.220" on interface wan (wan). Current score: 5
Wed Aug 16 20:34:09 2023 user.notice mwan3track[6122]: Interface wan (wan) is online
Wed Aug 16 20:34:09 2023 user.notice mwan3-hotplug[6912]: Execute connected event on interface wan (wan)

As far as I can tell (I've only just worked this out by comparing your logs and mwan3 config with mine), the "score" starts counting down from the sum of the "down" and "up" options, and the interface is taken offline when the score reaches the "up" value. That way, when the link comes back, the score restarts from 0 and the interface is brought back online once it reaches the "up" value again.

My mwan3 WAN options are "down" 3, "up" 6 and ping interval 10 seconds. In your case the interface should be taken offline when the score reaches 3.
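
Applied to the wanb section posted earlier (down 3, up 3), and assuming the reading above is correct, the numbers would work out as follows. This is an annotated fragment for illustration, not a change to make:

```
config interface 'wanb'
        # score starts at down + up = 3 + 3 = 6
        # (matches "Current score: 6" in the first log)
        option down '3'
        # interface goes offline when the score drops to 'up' = 3,
        # i.e. after 3 failed check rounds; recovery then counts 0 -> 3
        option up '3'
```

In the logs posted so far the score only got as far as 5 before the link recovered, which is consistent with wanb never being marked offline.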

I'm not using mwan3's load balancing, it's only a simple primary/backup setup as the primary is a good fibre link.

Very interesting explanation...
But if it works as you say, even lowering the interval to 1 second might not be enough, given that the disconnection lasts only 2 to 3 seconds at most...

It should catch 3-second outages. It may miss some, but for testing that would have been enough. To catch 2-second outages you could lower the "down" option to 2 (pings).

But maybe the problem you were having was caused by those load balancing options and you may not need to change any of the timers at all, see how it goes.


For now no more crashes to report.
Seems that disabling dns-https proxy has solved the bug.


It's not a bug, it's a feature. dnsmasq uses the DNS proxy as its upstream resolver. If the HTTPS connection from the proxy to its own upstream (Google?) breaks, that connection hangs until it times out, but dnsmasq keeps sending more and more DNS queries to the proxy. It gets even worse on failover: the HTTPS connection opened before the failover is still used afterwards, and it hangs unless it is explicitly broken to force a reconnection. Without the proxy, a plain DNS request to the upstream resolver simply times out waiting for the response, which dnsmasq is used to. The switchover causes no problem either, because plain DNS runs over UDP.


Ah you meant the https-dns-proxy yesterday, I thought you were talking about the load balancing rules you had on mwan3.

I had the same problem with https-dns-proxy causing the DNS to fail after a primary/backup switch, I uninstalled it and I've been using stubby which works fine.

Interesting... I'll wait a few days to be sure everything works, and then I'll try stubby and DoT.

You did not "break" the failed connection, to force reconnection.

I'm thinking the mwan3 wiki page should mention the incompatibility with the dns-https-proxy package...
How can I propose the addition?

There is no incompatibility with dns-https-proxy. You need to set up and use mwan3 correctly, for this to work. But first, you obviously need to understand my explanation.

Well, your explanation seemed to offer no viable solution for my specific use case (a 2-3 second disconnection every 4 hours)...
Do you also think I should lower the mwan3 check interval to 1 second to force a disconnection? And if I force it, how long will the HTTPS connection take to time out?

Yes, I realized that after reading your previous post on this thread, I was referring to what happened to me a few months ago when I saw that the DNS was failing after a WAN switchover and that restarting dns-https-proxy solved the problem.

As stubby is perfectly fine for my use case I didn't bother to dig deeper to get dns-https-proxy to work with mwan3 and I just switched to stubby.

When I tested this, the problem existed also with a full mwan3 switchover from fibre to lte and vice versa but I didn't test for how long it would take for the DNS to sort itself out as I was looking for a seamless and full failover, which I got with stubby.

Edit: @reinerotto what would be the best way to break the failed connection after a mwan3 switchover?

Simplest, from https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3 :
    Name:        flush_conntrack
    Type:        list
    Required:    no
    Default:     (none)
    Description: Flush global firewall conntrack table on interface events. See alerts/notifications for a list of interface events
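
A minimal sketch of how that option might be set for the LTE interface from this thread. The placement in the interface section and the event names (ifup, ifdown, connected, disconnected) are assumptions based on the wiki page's interface table and event list, so check them against the mwan3 version you run:

```
config interface 'wanb'
        # flush the conntrack table when tracking marks wanb as failed ...
        list flush_conntrack 'disconnected'
        # ... and again when tracking reports it as working
        list flush_conntrack 'connected'
```

Since the flush is global, established connections on both WANs are dropped and have to be re-established, which is the point here: the stale HTTPS connection is forced to reconnect.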

Best, because most flexible, but more complicated: dig into the possibilities of mwan3.user.
