Mesh, AP and DNS question

Hi Guys

I have my 5 node HH5a 22.03 rc4 mesh with roaming working fairly well, but with one problem which keeps cropping up.

It seems as though DNS is not being served consistently to the AP clients attached to the mesh nodes (clients on the master node and on cabled ethernet seem ok). The symptoms are as follows:

  1. Connecting to the master mesh node (which is running 21.02.1) via its AP everything works as it should all of the time.

  2. Connecting to one of the mesh nodes results in no internet connectivity in the browser, but I can ping the other nodes and external sites by address or by name from the terminal command prompt.

  3. Using the openwrt network diagnostic ping tool with a name results in a fail but entering an ip address 8.8.8.8 works fine.

From the above I conclude that the mesh nodes are not getting DNS settings; specifically, they seem to forget them overnight.

The strange thing is at step 2 using the command prompt on the client I can ping external sites with name or ip address and get a return.

I suspect I haven't got the DHCP / DNS setting quite right somewhere and would appreciate a little guidance.

Regards Tim

Mesh nodes don't really need DNS themselves; it's only for the clients.
Are the nodes using static IPs or DHCP?

If pinging dns names works, check the browser settings, like DoH.

What kind of clients are we talking about, anyway?

Hi thanks for the reply.

In my simple way of thinking (which may be incorrect), as the mesh nodes are L2, all the dhcp / dns functionality should come from the master node. So on the mesh nodes anything related to dns / dhcp can be disabled, which is what I've tried to do, but I still get the problem even if I add 8.8.8.8 to dns forwardings under dhcp.

To answer your questions

All the mesh nodes are set up as dhcp clients, but their MAC addresses are bound to static ip addresses in the master node for ease of tracking problems.

The clients are mostly laptops, ipads and iphones. The problem seems more prevalent with ios devices, which are entirely wifi. Laptops with a cable connection to a mesh node seem ok.

Interesting about DoH as some ios devices give a warning about dns privacy but seem to work ok, but this could be part of the problem if I knew how to dig a bit deeper.

Regards Tim

I've been digging into this a little and I might have gone down a rabbit hole, but about once a day the master mesh node drops the bridge to the rest of the mesh so preventing clients getting on the web. Clients connected to the master node AP are ok.

This is not the same problem as originally listed but maybe related.

Looking in both the system and kernel logs I get an out of memory error, as shown below.
Could it be that dnsmasq is running out of memory?

Either way running out of memory is a bad thing and needs fixing. Only seems to happen on the master node.

Regards Tim

Mem-Info:
[38259.540412] active_anon:726 inactive_anon:5 isolated_anon:0
[38259.540412]  active_file:172 inactive_file:212 isolated_file:0
[38259.540412]  unevictable:0 dirty:0 writeback:0 unstable:0
[38259.540412]  slab_reclaimable:377 slab_unreclaimable:2774
[38259.540412]  mapped:130 shmem:59 pagetables:85 bounce:0
[38259.540412]  free:4428 free_pcp:75 free_cma:0
[38259.572211] Node 0 active_anon:2904kB inactive_anon:20kB active_file:688kB inactive_file:848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:520kB dirty:0kB writeback:0kB shmem:236kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[38259.594394] Normal free:17680kB min:20480kB low:24576kB high:28672kB active_anon:2904kB inactive_anon:20kB active_file:700kB inactive_file:844kB unevictable:0kB writepending:0kB present:131072kB managed:121892kB mlocked:0kB kernel_stack:552kB pagetables:340kB bounce:0kB free_pcp:300kB local_pcp:140kB free_cma:0kB
[38259.622145] lowmem_reserve[]: 0 0
[38259.625389] Normal: 144*4kB (UM) 258*8kB (UM) 177*16kB (UM) 92*32kB (UM) 44*64kB (UM) 10*128kB (UM) 2*256kB (M) 1*512kB (U) 4*1024kB (UM) 0*2048kB 0*4096kB = 17632kB
[38259.640225] 458 total pagecache pages
[38259.643868] 0 pages in swap cache
[38259.647094] Swap cache stats: add 0, delete 0, find 0/0
[38259.652410] Free swap  = 0kB
[38259.655179] Total swap = 0kB
[38259.658146] 32768 pages RAM
[38259.660895] 0 pages HighMem/MovableOnly
[38259.664767] 2295 pages reserved
[38259.667878] Tasks state (memory values in pages):
[38259.672596] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[38259.681237] [    602]    81   602      321       25    16384        0             0 ubusd
[38259.689420] [    603]     0   603      235        9    12288        0             0 askfirst
[38259.697865] [    637]     0   637      261       12    12288        0             0 urngd
[38259.706044] [   1040]   514  1040      316       33    20480        0             0 logd
[38259.714154] [   1092]     0  1092      565      105    16384        0             0 rpcd
[38259.722293] [   1501]     0  1501      287       12    16384        0             0 dropbear
[38259.730659] [   1662]     0  1662      435      114    16384        0             0 netifd
[38259.738927] [   1719]     0  1719      373       55    20480        0             0 odhcpd
[38259.747193] [   1884]     0  1884     1013       74    24576        0             0 uhttpd
[38259.755461] [   2104]     0  2104      331       28    20480        0             0 sh
[38259.763383] [   2692]     0  2692      314       12    16384        0             0 ntpd
[38259.771501] [   3194]   453  3194      357      119    16384        0             0 dnsmasq
[38259.779803] [   1237]     0  1237     1212       92    20480        0             0 hostapd
[38259.788242] [   1238]     0  1238     1256      115    24576        0             0 wpa_supplicant
[38259.797179] [   7031]     0  7031      331       27    20480        0             0 sh
[38259.805091] [   7032]     0  7032      284       15    20480        0             0 iw
[38259.813015] [   7033]     0  7033      315       12    20480        0             0 awk
[38259.821018] [   7034]     0  7034      346       15    12288        0             0 modprobe
[38259.829440] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=dnsmasq,pid=3194,uid=453
[38259.842197] Out of memory: Killed process 3194 (dnsmasq) total-vm:1428kB, anon-rss:120kB, file-rss:352kB, shmem-rss:4kB, UID:453 pgtables:16kB oom_score_adj:0
[38259.865772] oom_reaper: reaped process 3194 (dnsmasq), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[38259.876770] logd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[38259.884780] CPU: 1 PID: 1040 Comm: logd Tainted: G        W         5.4.154 #0

A functional 802.11s mesh is a layer 2, dynamic mac-routing network. In simple terms it functions like a virtual switch. The mesh builds internal mac-routing tables and you can view each meshnode's table using:
iw dev mesh0 mpath dump
and
iw dev mesh0 mpp dump
assuming the mesh interface is mesh0.

A dhcp server sits listening for layer 2 requests from devices wanting a layer 3 ip address and responds with an appropriate set of information including an ip address, ip gateway address, dns server address, etc.

In a typical "home" mesh, there will only be one dhcp server and it will reside on the internet gateway router - let's call this the "master" meshnode.

You mean static ip address leases, but yes, this is the best way to do it for numerous reasons.

Laptops, ipads, iphones etc all use a browser and all browsers cache previously resolved dns requests. So if dns stops working for some reason, they will still work for urls already in the browser cache. This can give the illusion that dns is working.

It is normal to configure the Internet connected router to relay dns for all the local clients. You can set numerous external dns servers on the router that will be tried in turn. Dhcp should then hand out only the router's own ip address (the dhcp server's address) as the dns server for clients.
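As a rough sketch of how that could be set from the command line on the master node: on OpenWrt, the LuCI "DNS forwardings" field corresponds to the server list in the dnsmasq section of /etc/config/dhcp. The function name set_upstream_dns is just something I made up for illustration, and the second forwarder in the usage comment is only an example; 8.8.8.8 is the one already mentioned in this thread.

```shell
# Hypothetical helper: replace dnsmasq's upstream forwarders with the
# addresses given as arguments. Run on the master node only.
set_upstream_dns() {
    # Drop any existing forwarders so re-running does not duplicate entries.
    # (delete fails harmlessly if the list does not exist yet)
    uci -q delete 'dhcp.@dnsmasq[0].server' || true
    for server in "$@"; do
        uci add_list "dhcp.@dnsmasq[0].server=$server"
    done
    uci commit dhcp
}

# Example usage (addresses are examples, pick your own upstream servers):
#   set_upstream_dns 8.8.8.8 8.8.4.4
#   /etc/init.d/dnsmasq restart
```

Clients keep receiving the router's own address as their dns server; only the router talks to the external servers.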

You can then test if dns is working from a meshnode by doing something like:
nslookup openwrt.org

You should get something like:

Server:		10.187.1.1
Address:	10.187.1.1#53

Non-authoritative answer:
Name:	openwrt.org
Address: 139.59.209.225
Name:	openwrt.org
Address: 2a03:b0c0:3:d0::1af1:1

where the first two lines will be for the actual dns server in the Internet connected router in your case (this is an example from my test system).
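If you want to repeat that test from several meshnodes or clients, something like the following might help. It is only a sketch: dns_status is a made-up name, and it assumes your nslookup exits with a non-zero status when the lookup fails, which is worth checking on your build. Asking the router and an external server separately shows whether the router's relay or the upstream path is at fault.

```shell
# Resolver command; overridable so the function can be tried without a network.
NSLOOKUP=${NSLOOKUP:-nslookup}

# Hypothetical helper: print "ok" or "fail" for one lookup.
#   $1 = name to resolve
#   $2 = dns server to ask (optional; the system default is used if omitted)
dns_status() {
    if "$NSLOOKUP" "$1" ${2:+"$2"} >/dev/null 2>&1; then
        echo "ok"
    else
        echo "fail"
    fi
}

# Example usage (10.187.1.1 is the router address from my test system):
#   dns_status openwrt.org 10.187.1.1   # via the router's dns relay
#   dns_status openwrt.org 8.8.8.8      # straight to an external server
```

If the second lookup works and the first fails, the problem is on the router (for example dnsmasq not running) rather than in the mesh.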

Are you sure that your mesh network has fully converged, with full mesh-routing tables on every meshnode?

If it has not fully converged, then any layer 3 running on the incomplete layer 2 mesh can and will give strange results, if it works at all.
Check the mac-routing tables with mpath and mpp dump commands above.
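A quick way to compare meshnodes is to count the entries in the mpath table on each one. This is a sketch that assumes the interface is mesh0 and that every entry line in the dump starts with the destination mac address (the header line does not); count_mesh_paths is a made-up name. In a 5 node mesh I would expect each node to list up to 4 others, but paths are built on demand, so an idle node may show fewer.

```shell
# Hypothetical helper: count lines on stdin that begin with a mac address.
count_mesh_paths() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps that case from being treated as an error.
    grep -Eci '^([0-9a-f]{2}:){5}[0-9a-f]{2}[[:space:]]' || true
}

# Example usage on a meshnode:
#   iw dev mesh0 mpath dump | count_mesh_paths
```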

Yes a bad thing.

  1. The HH5a has 128MB ram so not so bad.
  2. If you are using the 5GHz radio, the ath10k drivers are notorious for consuming vast amounts of ram, so not good.
  3. If you are using the built in vdsl modem, it too will use loads of memory, so not so good.
  4. Your log shows the OOM killer picking dnsmasq as its victim once free memory (17680kB) had dropped below the kernel's minimum watermark (20480kB). Unlikely to be dnsmasq's fault, as its resident memory is tiny - something else consumed it all most likely.
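The task table in the OOM report can be used to check this. Here is a sketch that adds up the rss column for all listed processes; sum_oom_rss_kb is a made-up name, and the 4kB page size is taken from the report itself (32768 pages RAM for 131072kB). Fields are counted from the end of each line because the pid brackets split unpredictably.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the total
# rss (in kB) of the processes in the OOM "Tasks state" table.
sum_oom_rss_kb() {
    awk -v page_kb=4 '
        # Task rows look like: [timestamp] [  pid]  uid tgid total_vm rss ...
        /^\[ *[0-9.]+\] \[ *[0-9]+\] / {
            # Columns from the right: name, oom_score_adj, swapents,
            # pgtables_bytes, rss  ->  rss is field NF-4 (in pages).
            pages += $(NF - 4)
        }
        END { print pages * page_kb }
    '
}

# Example usage:
#   dmesg | sum_oom_rss_kb
```

For the report posted above this comes to 874 pages, about 3.5MB, out of roughly 119MB managed. So user processes are holding very little, which suggests the bulk is held by the kernel and drivers.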

If I remember correctly, you are using 5GHz radio for the mesh network.
In this case, 128MB ram is not enough, and there is very little you can do about it unless I am mistaken. You will find that up to date dual band (2GHz/5GHz) routers generally have at least 256MB ram.
The HH5a is obsolete for a reason.
Some older devices use ath9k for the 2GHz radio and ath10k for the 5GHz. In these devices you can disable the 5GHz and use only the 2GHz for both AP and mesh, thus saving a very significant amount of ram.
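To see how much difference a change like this makes, you could note the available memory before and after. A minimal sketch, assuming the kernel exposes MemAvailable in /proc/meminfo (recent kernels do); mem_available_kb is a made-up name.

```shell
# Hypothetical helper: print the MemAvailable value in kB.
#   $1 = meminfo file to read (optional, defaults to /proc/meminfo)
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Example usage: run once with the 5GHz radio enabled and once with it
# disabled, a few minutes after boot each time, and compare the numbers.
#   mem_available_kb
```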


Hi bluewavenet

Thanks for your comments re 5GHz and RAM. Yes, you are correct: I'm using the HH5a with a 5GHz mesh and a 2.4GHz AP, but to compound the problem I recently set up an AP as well as the mesh on 5GHz!

This is likely what caused the lack of RAM.

I think I'll run both the mesh and AP on 2.4GHz to see if that helps the RAM situation - longer term I'll likely upgrade the hardware as these HH5a units only cost me £3.50 from the web so no big deal and they've been useful for learning.

Many thanks again for helping me out.

Regards Tim

Again thank you for taking the time to write such a detailed reply - very useful for a network novice like myself.

As I mentioned in my other post about the HH5a and lack of ram, I suspect this may be the cause of some if not all of my stability issues.

The output from
iw dev mesh0 mpath dump and iw dev mesh0 mpp dump

all looks ok, which is good, and likewise nslookup gave the same results as your example.

I'll leave the mesh running again with the AP on 2.4GHz and the mesh on 5GHz (but no 5GHz AP now) to see how things go, before turning off the 5GHz radio and running both mesh and AP on 2.4GHz, which may lead to other problems.

Right, I think I'll go and investigate a better router with more ram etc.

Regards Tim

If you're running your mesh on 5GHz, be sure that you're not using a DFS channel (you can see a list of DFS channels in this article). If you are using DFS, you could very likely have unexpected shutdowns of the 5G radio as a result of a positive radar "hit" on your channel.

Thanks for the radar tip - I'll check I don't have that problem, currently using channel 36.

Regards Tim

Channel 36 is not a DFS channel unless you are in China or South Africa. So, if you are in any other part of the world, the 5GHz radio should not be subject to DFS related issues.
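As a rough rule of thumb, and only as a sketch: under the usual ETSI and FCC rules the 5GHz DFS channels are 52 to 64 and 100 to 144. The function below hard-codes those ranges as an assumption (is_dfs_channel is a made-up name) and does not know about regional exceptions, so treat the device's own regulatory data as the authority; iw list marks the affected frequencies with "radar detection".

```shell
# Hypothetical helper: print "yes" if the channel falls in the commonly
# used DFS ranges (52-64, 100-144), otherwise "no".
# Assumption: typical ETSI/FCC rules; other regions may differ.
is_dfs_channel() {
    ch=$1
    if { [ "$ch" -ge 52 ] && [ "$ch" -le 64 ]; } ||
       { [ "$ch" -ge 100 ] && [ "$ch" -le 144 ]; }; then
        echo "yes"
    else
        echo "no"
    fi
}

# Example usage:
#   is_dfs_channel 36
# To see what the radio itself reports for the current regulatory domain:
#   iw list | grep -i radar
```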