Mac80211: 802.11s TCP/IP Connectivity on "master" Failing After 2018-09


#1

Edit: TL;DR -- On master builds from commits after late September 2018, the 802.11s mesh appears up but does not seem to transport TCP/IP. The early portion of this thread is about identifying the "first bad" commit, later identified as "mac80211: update to version based on 4.19-rc4", commit db90c243a0 on master. This commit does not appear to be present in openwrt-18.06.

Diagnosis of issue begins at Post 9 of this thread.


I'm trying to bisect to find a problem related to 802.11s that seems to have been introduced shortly after

commit e9d92bf1e1af71ff19e4cdc753de3f65963c58a5
Author: Koen Vandeputte <redacted>
Date:   Wed Sep 26 12:53:35 2018 +0200

Unfortunately, the source tarball backports-v4.18.5.tar.xz seems to have a different hash than the one indicated in the Makefile. Has anyone run into this before?

make[2]: Entering directory '/home/jeff/devel/openwrt/package/kernel/mac80211'
mkdir -p /home/jeff/devel/openwrt/dl
SHELL= flock /home/jeff/devel/openwrt/tmp/.backports-v4.18.5.tar.xz.flock -c '  	/home/jeff/devel/openwrt/scripts/download.pl "/home/jeff/devel/openwrt/dl" "backports-v4.18.5.tar.xz" "9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2" "" "http://mirror2.openwrt.org/sources"    '
+ curl -f --connect-timeout 20 --retry 5 --location --insecure http://mirror2.openwrt.org/sources/backports-v4.18.5.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404 Not Found
Download failed.
+ curl -f --connect-timeout 20 --retry 5 --location --insecure https://sources.lede-project.org/backports-v4.18.5.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6577k  100 6577k    0     0  1536k      0  0:00:04  0:00:04 --:--:-- 1538k
Hash of the downloaded file does not match (file: a0c272d29b415f34e3f841d8f6ab444ebd3046e49e2717934a7b11b28953ebfe, requested: 9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2) - deleting download.
+ curl -f --connect-timeout 20 --retry 5 --location --insecure https://mirror2.openwrt.org/sources/backports-v4.18.5.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404 Not Found
Download failed.
+ curl -f --connect-timeout 20 --retry 5 --location --insecure https://downloads.openwrt.org/sources/backports-v4.18.5.tar.xz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   184  100   184    0     0    272      0 --:--:-- --:--:-- --:--:--   272
100 6577k  100 6577k    0     0  1357k      0  0:00:04  0:00:04 --:--:-- 1720k
Hash of the downloaded file does not match (file: a0c272d29b415f34e3f841d8f6ab444ebd3046e49e2717934a7b11b28953ebfe, requested: 9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2) - deleting download.
No more mirrors to try - giving up.
Makefile:1936: recipe for target '/home/jeff/devel/openwrt/dl/backports-v4.18.5.tar.xz' failed

#2

It's not a question of hashsums, but that the file in question has never been uploaded to OpenWrt's source mirror. Master switched directly from backports-2017-11-01 (~4.14) to a 4.19-based backports code base; 4.18.x-based snapshots were only used briefly in two developers' staging trees (and yes, the tarballs were available back then) before moving on to 4.19 as the next LTS kernel. 4.18-based tarballs being part of master's git history has most likely been an oversight (patches not squashed). For practical purposes, the easiest approach would be to skip the affected commits (there aren't intermediate steps between 4.14 and 4.18 at all).
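
For anyone bisecting across this range, skipping the affected commits could look roughly like the sketch below. It assumes that an "affected" commit is one whose mac80211 package Makefile pins a v4.18.x backports tarball; mac80211_needs_skip is a made-up helper name, not something that ships with OpenWrt.

```shell
# Sketch only. Assumption: a commit is "affected" when the mac80211 package
# Makefile pins a v4.18.x backports tarball (a line such as
# "PKG_VERSION:=v4.18.5").
# mac80211_needs_skip is a hypothetical helper name, not part of OpenWrt.
mac80211_needs_skip() {
    grep -Eq '^PKG_VERSION:=v4\.18\.' "${1:-package/kernel/mac80211/Makefile}"
}

# Manual loop, run from the top of the openwrt tree. Shown as comments because
# every step involves a build, a flash and a test on real hardware:
#   git bisect start
#   git bisect bad                      # the checked-out commit shows the failure
#   git bisect good <last-good-commit>
#   if mac80211_needs_skip; then git bisect skip; fi
#   # otherwise: build, flash, test, then "git bisect good" or "git bisect bad"
```

If the first bad commit sits directly next to skipped commits, git bisect can only narrow the result down to a range, so skipping trades certainty for not having to touch the Makefile.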


#3

https://sources.lede-project.org/backports-v4.18.5.tar.xz seems to exist though, although you will have to 'fix' the hash for that file (as it indeed doesn't match).
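
Before overriding the hash, it may be worth confirming what the mirror actually serves. The build system deletes a mismatching download, so the tarball has to be fetched by hand first. A minimal sketch, where check_sha256 is a hypothetical helper name and the dl/ path is only an example:

```shell
# check_sha256 FILE EXPECTED
# Prints both hashes on mismatch; returns 0 only when they agree.
# check_sha256 is a hypothetical helper name; sha256sum is from coreutils
# (busybox ships an applet of the same name).
check_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: $1 (file: $actual, requested: $2)"
        return 1
    fi
}

# Example, from the top of the openwrt tree, with the tarball saved by hand:
#   check_sha256 dl/backports-v4.18.5.tar.xz \
#       9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2
```

A mismatch here only shows that the served file differs from what the Makefile expects; it says nothing about which of the two is the "right" tarball.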


#4

As I'm seeing my name pop up in your explanation,
could you share what the problem is exactly?

Thanks,

Koen


#5

Ah, OK, saw the name, so now I know the context -- the referenced commit e9d92bf1e1 is a good commit, from my testing! The "puzzling" tarball was committed after that commit as well.

I'll likely open a thread for it and/or query the dev mailing list, but the short of it is that 802.11s TCP/IP transport seems to have been "broken" in master, after commit e9d92bf1e1 was merged, by

commit db90c243a0b9bd72fc691cd09e58a96ac2a452cf
Author: Hauke Mehrtens <redacted>
Date:   Sun Sep 23 18:02:35 2018 +0200

    mac80211: update to version based on 4.19-rc4

Confirming this through git bisect required testing the just-prior commits, which try to use the v4.18.5 tarball. Changing the hash allowed confirmation of the "update to version based on 4.19-rc4" commit as the "first bad" commit.

diff --git a/package/kernel/mac80211/Makefile b/package/kernel/mac80211/Makefile
index fcb7181655..66401984b8 100644
--- a/package/kernel/mac80211/Makefile
+++ b/package/kernel/mac80211/Makefile
@@ -13,7 +13,8 @@ PKG_NAME:=mac80211
 PKG_VERSION:=v4.18.5
 PKG_RELEASE:=1
 PKG_SOURCE_URL:=http://mirror2.openwrt.org/sources
-PKG_HASH:=9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2
+# PKG_HASH:=9c13660e98b9397260266f98c9db76bdad2b48462cb376b5862dfbd18369edf2
+PKG_HASH := a0c272d29b415f34e3f841d8f6ab444ebd3046e49e2717934a7b11b28953ebfe
 
 PKG_SOURCE:=backports-$(PKG_VERSION).tar.xz
 PKG_BUILD_DIR:=$(KERNEL_BUILD_DIR)/backports-$(PKG_VERSION)

The symptom under examination is that builds off master from early September 2018 function properly with respect to establishing and using 802.11s, either for both connectivity and routing, or for connectivity with batman-adv for routing. Builds off the "update to version based on 4.19-rc4" commit and later establish the 802.11s mesh, but there is no connectivity at the TCP/IP layer with either 802.11s routing or batman-adv routing. Comparing the output of iw dev mesh0 station dump between a "good" and a "bad" build doesn't provide any hints, as the mesh itself appears connected in both cases.

	mesh plink:	ESTAB
	mesh local PS mode:	ACTIVE
	mesh peer PS mode:	ACTIVE
	mesh non-peer PS mode:	ACTIVE
	authorized:	yes
	authenticated:	yes
	associated:	yes
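
To compare several nodes or builds at a glance, the station dump can be reduced to one line per peer. This is only a sketch: plink_summary is a hypothetical helper name, and it assumes the usual iw layout of a "Station <mac> (on <ifname>)" header followed by indented fields like the ones above.

```shell
# plink_summary: read "iw dev <ifname> station dump" output on stdin and print
# one "MAC plink-state" line per peer. Hypothetical helper name.
plink_summary() {
    awk '
        $1 == "Station"  { mac = $2 }        # "Station <mac> (on <ifname>)"
        /mesh plink:/    { print mac, $NF }  # last field is the state, e.g. ESTAB
    '
}

# Example on a mesh node:
#   iw dev mesh0 station dump | plink_summary
```

In the case described here both "good" and "bad" builds would report ESTAB, so this only helps rule out a peering problem; it does not point at the cause.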

Setting up authenticated mesh with wpad-mesh


#6

Thanks for the detailed explanation.

I notice batman-adv was updated to v2019.0 2 days ago.
Does the issue still occur in this version?


#7

It occurs with "plain" 802.11s with 802.11s routing as well, but with batman-adv present on another interface. I've been using the same "pull" of the packages and .config (as close as possible across 1200 commits) and batman-adv is at 2018.4 right now.

Thanks for the info. I'll definitely check 2019.0 as well as working down my running config to a minimal, testable case.


#8

I forgot to ask, is the wifi hardware in use ath9k or ath10k?
Could you share some more details about the hardware setup in use (board, wifi radios, ...)?

This should help a lot in trying to get it fixed.

Thanks again,


#9

Was hoping to get more details and confirm things in greater detail (that git bisect of something like 20 build/test cycles drained me last night, UTC+8), but this is what I've got so far:

Archer C7 v2, ath10k (non-CT), ath79 target

Setup functioning well since last build in early September with four units in mesh on 5 GHz and batman-adv routing over the mesh.

  • iw dev mesh0 station dump and iw dev mesh1 station dump suggest mesh is up in both cases
  • batctl n sees the peers in both cases
  • ping to a peer's mesh0 IPv4 address succeeds with "earlier" builds, fails with "later" builds
  • ping to a peer's IPv4 address on a bat0 bridge succeeds with "earlier" builds, fails with "later" builds

When ping succeeds, arp -a shows the MAC and IPv4 address. When ping fails, arp -a shows an unresolved (all zeros) MAC address.
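
The same check can be scripted against the kernel's neighbor table, which Linux also exposes as /proc/net/arp. A rough sketch, with unresolved_neighbors as a hypothetical helper name:

```shell
# unresolved_neighbors [FILE]
# Print "IP device" for every entry whose hardware address is all zeros.
# FILE defaults to /proc/net/arp, whose columns are:
#   IP address, HW type, Flags, HW address, Mask, Device
# unresolved_neighbors is a hypothetical helper name.
unresolved_neighbors() {
    awk 'NR > 1 && $4 == "00:00:00:00:00:00" { print $1, $6 }' \
        "${1:-/proc/net/arp}"
}

# Example on a mesh node, after a failed ping to a peer:
#   unresolved_neighbors
```

An entry listed for mesh0 after a failed ping would indicate that the ARP request went unanswered, which is consistent with the link being up while no IPv4 traffic gets through.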

Edit: Routing tables are the same in both cases.

Edit: Confirmed similar failure mode on master-based ath79 WIP for GL.iNet AR750S, QCA9887 and non-CT driver/firmware (802.11s only and no batman-adv). CT driver/firmware not yet able to support mesh. Looking into "backing up" to the last-known-good commit and testing there. The need for "rawmode" from the CT driver/firmware hints that the mesh connectivity may be handled differently than TCP/IP traffic.

Edit: Confirmed that the newer (newest?) firmware (ver 10.2.4-1.0-00041) works on the "old" build.


config wifi-device 'radio5'
        option type 'mac80211'
        option channel '149'
        option hwmode '11a'
        option path 'pci0000:00/0000:00:00.0'
        option htmode 'VHT80'
        option require_mode 'n'

config wifi-iface 'mesh0'
        option device 'radio5'
        option ifname 'mesh0'
        option network 'nwi_mesh0'
        option mode 'mesh'
        option mesh_id '<redacted id 0>'
        option mesh_fwding '1'
        option encryption 'psk2+ccmp'
        option key '<redacted key 0>'

config wifi-iface 'mesh1'
        option device 'radio5'
        option ifname 'mesh1'
        option network 'nwi_mesh1'
        option mode 'mesh'
        option mesh_id '<redacted id 1>'
        option mesh_fwding '0'
        option encryption 'psk2+ccmp'
        option key '<redacted key 1>'

config interface 'nwi_mesh0'
        option ifname 'mesh0'
        option mtu '2304'
        option proto 'static'
        option ipaddr '172.xx.yy.zz'
        option netmask '255.255.255.0'

config interface 'nwi_mesh1'
        option ifname 'mesh1'
        option mtu '2304'
        option proto 'batadv'
#       option routing_algo 'BATMAN_V'
        option mesh 'bat0'

bat0 has various VLAN sub-interfaces bridged with IPv4 addresses assigned to those bridges. For example:

config interface 'vlanNNNN'
        option type 'bridge'
        option stp '1'
        option ifname 'eth1.NNNN bat0.NNNN'
        option proto 'static'
        option ipaddr 'xx.yy.zz.kk'
        option netmask '255.255.255.0'
        option delegate '0'

jeff@office:~$ lsmod
ath                    18931  5 ath9k,ath9k_htc,ath9k_common,ath9k_hw,ath10k_core
ath10k_core           319481  1 ath10k_pci
ath10k_pci             24591  0 
ath9k                 100088  0 
ath9k_common           11949  2 ath9k,ath9k_htc
ath9k_htc              55715  0 
ath9k_hw              348460  3 ath9k,ath9k_htc,ath9k_common
batman_adv            157157  0 
cfg80211              231008  7 ath9k,ath9k_htc,ath9k_common,batman_adv,ath10k_core,ath,mac80211
compat                  2850  6 ath9k,ath9k_htc,ath9k_common,ath10k_pci,mac80211,cfg80211
crc_ccitt               1035  0 
crc16                   1031  2 batman_adv,ext4
crc32c_generic          1424  1 
crypto_hash            10002  4 libcrc32c,ext4,jbd2,crc32c_generic
ehci_hcd               35575  1 ehci_platform
ehci_platform           5072  0 
ext4                  382832  0 
gpio_button_hotplug     6416  0 
ip_tables              10029  0 
ip6_tables              9921  0 
jbd2                   51362  1 ext4
ledtrig_usbport         2784  0 
libcrc32c                663  1 batman_adv
mac80211              448654  3 ath9k,ath9k_htc,ath10k_core
mbcache                 3182  1 ext4
nf_conntrack           56440  5 nft_ct,nf_conntrack_ipv6,nf_conntrack_ipv4,nf_flow_table,nf_conntrack_rtcache
nf_conntrack_ipv4       4368  0 
nf_conntrack_ipv6       4608  0 
nf_conntrack_rtcache    2640  0 
nf_defrag_ipv4          1078  1 nf_conntrack_ipv4
nf_defrag_ipv6          4862  1 nf_conntrack_ipv6
nf_flow_table          13951  1 nf_flow_table_hw
nf_flow_table_hw        2160  0 
nf_reject_ipv4          2147  3 nft_reject_ipv4,nft_reject_inet,nft_reject_bridge
nf_reject_ipv6          2472  3 nft_reject_ipv6,nft_reject_inet,nft_reject_bridge
nf_tables              68617 22 nft_set_rbtree,nft_set_hash,nft_reject_ipv6,nft_reject_ipv4,nft_reject_inet,nft_reject_bridge,nft_reject,nft_quota,nft_numgen,nft_meta_bridge,nft_meta,nft_log,nft_limit,nft_exthdr,nft_ct,nft_counter,nft_chain_route_ipv6,nft_chain_route_ipv4,nf_tables_ipv6,nf_tables_ipv4,nf_tables_inet,nf_tables_bridge
nf_tables_bridge        1072  0 
nf_tables_inet           752  0 
nf_tables_ipv4           592  0 
nf_tables_ipv6           656  0 
nfnetlink               4487  1 nf_tables
nft_chain_route_ipv4     816  0 
nft_chain_route_ipv6    1008  0 
nft_counter             1744  0 
nft_ct                  4432  0 
nft_exthdr              3312  0 
nft_limit               3472  0 
nft_log                 1584  0 
nft_meta                3991  1 nft_meta_bridge
nft_meta_bridge          944  0 
nft_numgen              1632  0 
nft_quota               1712  0 
nft_reject              1057  4 nft_reject_ipv6,nft_reject_ipv4,nft_reject_inet,nft_reject_bridge
nft_reject_bridge       3728  0 
nft_reject_inet         1136  0 
nft_reject_ipv4          656  0 
nft_reject_ipv6          656  0 
nft_set_hash           13424  0 
nft_set_rbtree          2800  0 
nls_base                5152  1 usbcore
usb_common              2551  1 usbcore
usbcore               131363  4 ath9k_htc,ledtrig_usbport,ehci_platform,ehci_hcd
x_tables               13839  2 ip6_tables,ip_tables


Output of custom build-time script that captures feed information:

----- feeds/luci/ -----

2019-02-03 11:45:51 +0200
9f520b48d (HEAD -> master, origin/master, origin/HEAD)

    Merge pull request #2505 from aparcar/master

(clean)

Best remote: https://git.openwrt.org/project/luci.git

At local ref to origin/master

----- feeds/packages/ -----

2019-02-03 15:38:32 +0000
e5910b983 (HEAD -> master, origin/master, origin/HEAD)

    bcp38: Allow class-e through bcp38

(clean)

Best remote: https://git.openwrt.org/feed/packages.git

At local ref to origin/master

----- feeds/routing/ -----

2019-01-31 19:08:51 +0100
13a4dad (HEAD -> master, origin/master, origin/HEAD)

    Merge pull request #445 from diizzyy/patch-1

(clean)

Best remote: https://git.openwrt.org/feed/routing.git

At local ref to origin/master

----- feeds/telephony/ -----

2019-01-27 16:51:41 +0100
6c2b619 (HEAD -> master, origin/master, origin/HEAD)

    Merge pull request #408 from micmac1/dahdi-tools-execinfo2

(clean)

Best remote: https://git.openwrt.org/feed/telephony.git

At local ref to origin/master

.config is my "production" build

jeff@deb-devel:~/devel/openwrt$ ./scripts/diffconfig.sh 
CONFIG_TARGET_ath79=y
CONFIG_TARGET_ath79_generic=y
CONFIG_TARGET_ath79_generic_DEVICE_tplink_archer-c7-v2=y
CONFIG_DEVEL=y
CONFIG_BUSYBOX_CUSTOM=y
CONFIG_ATH10K_LEDS=y
# CONFIG_ATH9K_UBNTHSR is not set
CONFIG_BATMAN_ADV_BATMAN_V=y
CONFIG_BATMAN_ADV_BLA=y
CONFIG_BATMAN_ADV_DAT=y
CONFIG_BATMAN_ADV_MCAST=y
CONFIG_BUILD_LOG=y
CONFIG_BUSYBOX_CONFIG_FEATURE_EDITING_SAVEHISTORY=y
CONFIG_BUSYBOX_CONFIG_FEATURE_REVERSE_SEARCH=y
CONFIG_BUSYBOX_CONFIG_FEATURE_USERNAME_COMPLETION=y
# CONFIG_BUSYBOX_CONFIG_FIND is not set
# CONFIG_BUSYBOX_CONFIG_LESS is not set
CONFIG_CCACHE=y
CONFIG_LIBCURL_COOKIES=y
CONFIG_LIBCURL_FILE=y
CONFIG_LIBCURL_FTP=y
CONFIG_LIBCURL_HTTP=y
CONFIG_LIBCURL_NO_SMB="!"
CONFIG_LIBCURL_OPENSSL=y
CONFIG_LIBCURL_PROXY=y
CONFIG_OPENSSL_WITH_DEPRECATED=y
CONFIG_OPENSSL_WITH_EC=y
CONFIG_OPENSSL_WITH_NPN=y
CONFIG_OPENSSL_WITH_PSK=y
CONFIG_OPENSSL_WITH_SRP=y
CONFIG_PACKAGE_ALFRED_VIS=y
CONFIG_PACKAGE_alfred=y
CONFIG_PACKAGE_ath10k-firmware-qca988x=y
CONFIG_PACKAGE_ath10k-firmware-qca988x-ct=m
CONFIG_PACKAGE_ath10k-firmware-qca988x-ct-htt=m
CONFIG_PACKAGE_ath9k-htc-firmware=y
CONFIG_PACKAGE_batctl=y
CONFIG_PACKAGE_block-mount=y
CONFIG_PACKAGE_build-details=y
CONFIG_PACKAGE_ca-bundle=y
CONFIG_PACKAGE_diffutils=y
# CONFIG_PACKAGE_dnsmasq is not set
# CONFIG_PACKAGE_dropbear is not set
CONFIG_PACKAGE_findutils-find=y
# CONFIG_PACKAGE_firewall is not set
CONFIG_PACKAGE_git=y
CONFIG_PACKAGE_glib2=y
CONFIG_PACKAGE_htop=y
CONFIG_PACKAGE_ip-full=y
# CONFIG_PACKAGE_ip6tables is not set
# CONFIG_PACKAGE_iptables is not set
CONFIG_PACKAGE_kmod-ath10k=y
CONFIG_PACKAGE_kmod-ath10k-ct=m
CONFIG_PACKAGE_kmod-ath9k-htc=y
CONFIG_PACKAGE_kmod-batman-adv=y
CONFIG_PACKAGE_kmod-crypto-crc32c=y
CONFIG_PACKAGE_kmod-crypto-hash=y
CONFIG_PACKAGE_kmod-fs-ext4=y
CONFIG_PACKAGE_kmod-hwmon-core=m
# CONFIG_PACKAGE_kmod-ip6tables is not set
# CONFIG_PACKAGE_kmod-ipt-conntrack is not set
# CONFIG_PACKAGE_kmod-ipt-core is not set
# CONFIG_PACKAGE_kmod-ipt-nat is not set
# CONFIG_PACKAGE_kmod-ipt-offload is not set
CONFIG_PACKAGE_kmod-lib-crc16=y
CONFIG_PACKAGE_kmod-lib-crc32c=y
# CONFIG_PACKAGE_kmod-nf-nat is not set
CONFIG_PACKAGE_kmod-nfnetlink=y
CONFIG_PACKAGE_kmod-nft-bridge=y
CONFIG_PACKAGE_kmod-nft-core=y
# CONFIG_PACKAGE_kmod-ppp is not set
CONFIG_PACKAGE_less=y
CONFIG_PACKAGE_libattr=y
CONFIG_PACKAGE_libcares=y
CONFIG_PACKAGE_libcurl=y
CONFIG_PACKAGE_libdbi=y
CONFIG_PACKAGE_libffi=y
# CONFIG_PACKAGE_libip4tc is not set
# CONFIG_PACKAGE_libip6tc is not set
CONFIG_PACKAGE_libmnl=y
CONFIG_PACKAGE_libmosquitto-ssl=y
CONFIG_PACKAGE_libncurses=y
CONFIG_PACKAGE_libnftnl=y
CONFIG_PACKAGE_libopenssl=y
CONFIG_PACKAGE_libpcap=y
CONFIG_PACKAGE_libpcre=y
CONFIG_PACKAGE_librt=y
CONFIG_PACKAGE_libuuid=y
# CONFIG_PACKAGE_libxtables is not set
# CONFIG_PACKAGE_logd is not set
CONFIG_PACKAGE_mosquitto-client-ssl=y
CONFIG_PACKAGE_nftables=y
# CONFIG_PACKAGE_odhcp6c is not set
# CONFIG_PACKAGE_odhcpd-ipv6only is not set
CONFIG_PACKAGE_openssh-client=y
CONFIG_PACKAGE_openssh-keygen=y
CONFIG_PACKAGE_openssh-server=y
CONFIG_PACKAGE_patch=y
# CONFIG_PACKAGE_ppp is not set
CONFIG_PACKAGE_procps-ng=y
CONFIG_PACKAGE_procps-ng-free=y
CONFIG_PACKAGE_procps-ng-kill=y
CONFIG_PACKAGE_procps-ng-pgrep=y
CONFIG_PACKAGE_procps-ng-pkill=y
CONFIG_PACKAGE_procps-ng-pmap=y
CONFIG_PACKAGE_procps-ng-ps=y
CONFIG_PACKAGE_procps-ng-pwdx=y
CONFIG_PACKAGE_procps-ng-skill=y
CONFIG_PACKAGE_procps-ng-slabtop=y
CONFIG_PACKAGE_procps-ng-snice=y
CONFIG_PACKAGE_procps-ng-tload=y
CONFIG_PACKAGE_procps-ng-top=y
CONFIG_PACKAGE_procps-ng-uptime=y
CONFIG_PACKAGE_procps-ng-vmstat=y
CONFIG_PACKAGE_procps-ng-w=y
CONFIG_PACKAGE_procps-ng-watch=y
CONFIG_PACKAGE_sudo=y
CONFIG_PACKAGE_syslog-ng=y
CONFIG_PACKAGE_tcpdump-mini=y
CONFIG_PACKAGE_terminfo=y
CONFIG_PACKAGE_wpad-mesh-openssl=y
# CONFIG_PACKAGE_wpad-mini is not set
CONFIG_PACKAGE_zlib=y
CONFIG_PACKAGE_kmod-lib-crc-ccitt=y
CONFIG_PACKAGE_kmod-nf-flow=y
CONFIG_PACKAGE_kmod-nf-ipt=y
CONFIG_PACKAGE_kmod-nf-ipt6=y

#10

Cross referencing bug reports on this topic


#11

@jeff

Not sure if my issue is related. It shows up in my recent attempts to build origin/master, including LibreMesh, from source.

The mesh is up and IPv6 pings work; however, no IPv4 pings get a response, so there is no internet access via anygw.

The same procedure using 18.06.2/1 has no issues.


#12

@jeff
Hauke's staging tree contains a bump for mac80211.

Could you give that a spin?

I tried to reproduce this using my custom mesh protocol over IBSS, but was unable to reproduce it that way.

Will continue the search on this ..

Thanks!