WDS Configuration on 21.02 - Atheros Specific WDS Issue

No worries. We all try to help :)

The problem is on ath9k as well, not just 10k.
Even though the tplink devices I've seen the problem on are still using swconfig, I strongly suspect it's something to do with the DSA changes as they are the most obvious areas that impact on bridge/vlan/packet movement. ymmv

Thanks everyone. I've decided to try compiling my own firmware to see if I can figure out which commit breaks WDS.

Unfortunately, I'm completely new to using git and compiling OpenWrt. Here is what I have done so far:

  • Created a local git branch based off the RC3 tag
    git checkout tags/v21.02.0-rc3 -b wds_test

  • Pulled changes from the remote repository
    git fetch
    git merge
    ./scripts/feeds update -a
    ./scripts/feeds install -a

  • Set up the configuration file
    make menuconfig

  • Compiled the firmware

What I was hoping you could help me out with is how I can easily step between different commits between RC3 and RC4. I want to start in the middle between RC3 and RC4 and see if that works, then keep stepping halfway between the points where I know WDS works or doesn't. I found this but am not sure if it's the best approach:

  • git reset --hard <commit_id>

I don't know if this will reference the 21.02 branch or back to the master branch.

Thanks

There's no need to reset to a specific commit. You can use:

$ git checkout $commit_id

Which will return:

$ git checkout 134ac824c5
Note: switching to '134ac824c5'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 134ac824c5 OpenWrt v21.02.0-rc4: adjust config defaults

As you can see, git itself already tells you how to revert back to the default state (HEAD) of the branch. It won't be switching branches without you telling it so explicitly.

Edit: @chadneufeld What you could also try (and which would be more fine-grained) is reverting specific commits, building an image, and seeing if that helps. As easy as:

$ git revert 134ac824c5

If it turns out that doesn't do anything, you can do a:

$ git reset --hard HEAD~1

That might return you to 21.02 HEAD instead of the 21.02.0 tag (not sure), but at this point changes beyond 21.02.0 are minimal (one mvebu commit which doesn't concern your devices).
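
What you describe (start in the middle, then keep halving) is what git bisect automates. Below is a minimal sketch on a throwaway repository, not on the OpenWrt tree: the repo, the tags rc3 and rc4, and the file named wds are all made up for the demo, and the "test" is just a grep.

```shell
#!/bin/sh
# Toy demo of git bisect. Everything here (repo, tags, the "wds" file) is
# made up for illustration; nothing touches a real OpenWrt checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "bisect demo"
git config user.email "demo@example.invalid"

# Eight commits. The "wds" file says "works" up to commit 5, "fails" from 6 on.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 6 ]; then echo fails > wds; else echo works > wds; fi
    echo "$i" > counter
    git add wds counter
    git commit -q -m "commit $i"
done
git tag rc3 HEAD~7   # known good (commit 1)
git tag rc4 HEAD     # known bad  (commit 8)

# Bad revision first, then the good one.
git bisect start rc4 rc3 > /dev/null
# Exit status 0 marks a commit good, 1 marks it bad.
git bisect run grep -q works wds > /dev/null
# When bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad"
```

The demo should report commit 6. On the real tree there is no automated test, so you would do it by hand: git bisect start v21.02.0-rc4 v21.02.0-rc3 (assuming the rc4 tag follows the same naming as the rc3 one), build and flash whatever it checks out, answer with git bisect good or git bisect bad, repeat until it names a commit, and finish with git bisect reset.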

After some trial and error I think I have found the offending commit. There are 4 netifd commits that happened between RC3 and RC4. The earliest one, fe498dd3f1, causes issues with the stability of the WDS link on ath9k for me. I wasn't able to roll back just that commit; I had to roll back all 4. Git threw an error when I tried to roll back fe498dd3f1 without rolling back the other 3 commits.

55d9c020a1 netifd: update to the latest version
089efd61e9 netifd: update to the latest version
f3f70fb956 netifd: update to the latest version
fe498dd3f1 netifd: update to the latest version

I think these are the individual fixes within the larger fe498dd3f1 commit. @Borromini, is there a way that I could work with these smaller commits inside of the main commit?

61a71e5e49c3 bridge: dynamically create vlans for hotplug members
cb6ee9608e10 bridge: fix dynamic delete of hotplug vlans
7f199050f395 wireless: pass the real network ifname to the setup script
50381d0a2998 bridge: allow adding/removing VLANs to configured member ports via hotplug
f12b073c0cc3 wireless: add some comments to functions
b0d090688302 bridge: fix setting pvid for updated vlans
ff3764ce28e0 device: move hotplug handling logic from system-linux.c to device.c
16bff892f415 ubus: add a dummy mode ubus call to simulate hotplug events
7f30b02013f2 examples: make dummy wireless vif names shorter
013a1171e9b0 device: do not treat devices with non-digit characters after . as vlan devices
f037b082923a wireless: handle WDS per-sta devices
db0fa24e1c17 bridge: fix enabling hotplug-added VLANs on the bridge port
4e92ea74273f bridge: bring up pre-existing vlans on hotplug as well
1f283c654aeb bridge: fix hotplug vlan overwrite on big-endian systems

Glad you narrowed it down! It's pretty normal you can't roll back just the one commit, since those build on the previous ones. When you check the commit log, you'll see the commits in the external git tree that are part of the bump:

$ git show fe498dd3f1
commit fe498dd3f108de494594ae8e0eba207fdbf14594
Author: Felix Fietkau <nbd@nbd.name>
Date:   Fri Jun 4 09:11:37 2021 +0200

    netifd: update to the latest version
    
    61a71e5e49c3 bridge: dynamically create vlans for hotplug members
    cb6ee9608e10 bridge: fix dynamic delete of hotplug vlans
    7f199050f395 wireless: pass the real network ifname to the setup script
    50381d0a2998 bridge: allow adding/removing VLANs to configured member ports via hotplug
    f12b073c0cc3 wireless: add some comments to functions
    b0d090688302 bridge: fix setting pvid for updated vlans
    ff3764ce28e0 device: move hotplug handling logic from system-linux.c to device.c
    16bff892f415 ubus: add a dummy mode ubus call to simulate hotplug events
    7f30b02013f2 examples: make dummy wireless vif names shorter
    013a1171e9b0 device: do not treat devices with non-digit characters after . as vlan devices
    f037b082923a wireless: handle WDS per-sta devices
    db0fa24e1c17 bridge: fix enabling hotplug-added VLANs on the bridge port
    4e92ea74273f bridge: bring up pre-existing vlans on hotplug as well
    1f283c654aeb bridge: fix hotplug vlan overwrite on big-endian systems

I'm betting your issue is caused by one of the wireless changes; the repo is here. A final test would be to build directly from the git repo instead of using the git tarball the build environment pulls. At this point, you can just recompile the single netifd package and upgrade/downgrade it with opkg to your liking. No need to keep reflashing firmware.

My money would be on the f037b082923a netifd commit. I've tried myself to revert just that in the netifd tree but it won't let me, so it's time to call in the cavalry. I'd recommend you try IRC and ping @nbd there to see if he can assist; he committed fe498dd3f1 and did a lot of work on refactoring the networking code. Not sure if he replies on/reads the forum, and IRC makes for an easier back and forth (more realtime than a forum).

I'll include a quick write-up for completeness' sake on how to build straight from the git tree. It might expedite things if you already test a few commits.

  1. Enable CONFIG_SRC_TREE_OVERRIDE in the buildroot settings.
  2. Grab the netifd git tree and link it into your OpenWrt buildroot as follows:
     $ git clone git://git.openwrt.org/project/netifd.git
     $ cd path/to/21.02_tree/
     $ ln -sv /path/to/netifd_git_tree package/network/config/netifd/git-src/
  3. Compile netifd:
     $ make package/netifd/{clean,compile} V=s
  4. Install the new netifd package on your router and test.

If you need to test a specific earlier commit, extend step 2 with the following commands:

$ cd /path/to/netifd_git_tree
$ git checkout $git_hash

To revert a specific commit:

$ git revert f037b082923a

After that you can run step 3 again. What I'd recommend is that you build from the netifd source before prodding nbd (he'll probably ask you to do so anyway so you can test patches at some point). Commit 013a117 is the one just before f037b08 (the WDS change), and db0fa24 is the one just after. If a build from 013a117 works and a build from db0fa24 breaks WDS, you can be pretty sure the WDS commit is the one causing trouble.

This might be a super simple answer, but how do I enable CONFIG_SRC_TREE_OVERRIDE?

Thanks

.... Found it
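
For anyone else looking for it: as far as I know the option sits in make menuconfig under "Advanced configuration options (for developers)" as "Enable package source tree override", and it only shows up once that developer menu (CONFIG_DEVEL) is enabled. The same can be done from the shell. A rough sketch, where enable_src_tree_override is a made-up helper name and the dependency on CONFIG_DEVEL is my understanding rather than something confirmed in this thread:

```shell
# Hypothetical helper: force CONFIG_DEVEL and CONFIG_SRC_TREE_OVERRIDE to "y"
# in an OpenWrt .config (default: ./.config). Assumes CONFIG_DEVEL gates the
# override option in the buildroot's Config files.
enable_src_tree_override() {
    cfg=${1:-.config}
    touch "$cfg"
    for sym in CONFIG_DEVEL CONFIG_SRC_TREE_OVERRIDE; do
        # Drop any existing setting for this symbol, then append "=y".
        grep -v -e "^$sym=" -e "^# $sym is not set" "$cfg" > "$cfg.tmp"
        mv "$cfg.tmp" "$cfg"
        echo "$sym=y" >> "$cfg"
    done
}
```

Run it from the top of the buildroot as enable_src_tree_override .config, then run make defconfig so the build system re-expands the configuration.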

@Borromini Thanks again for all the help. I think I have the netifd git tree set up and compiling with my main OpenWrt tree.

I'll reach out to nbd and see if we can sort it out.

I did some more detailed testing with the netifd package. I think the problem started with the WDS-specific commit in netifd. @nbd or @Borromini, would either of you be able to help troubleshoot?

This commit added a wireless_device_hotplug_event among some other changes.

f037b082923abc2dad0d14c8401ebe0afd816b5c wireless: handle WDS per-sta devices

Some more info... I went through the NETIFD code and manually undid the changes from f037b082923abc2dad0d14c8401ebe0afd816b5c related to the wireless_device_hotplug_event.

After undoing just those changes, WDS seems to work perfectly with the updated NETIFD (minus hotplug changes) and 21.02.0.

I cannot help you debug this, I can barely even read C code. Try pinging nbd on openwrt-devel on IRC and link to the bug report.

2nd post this thread

Quick update. There is a patch available for NETIFD that I am testing. So far it looks like the patch fixes the WDS connection issue. I am running 21.02.0 (official release) with an updated NETIFD package.

According to https://bugs.openwrt.org/index.php?do=details&task_id=3961 the fix is in the snapshots.
Could I therefore just download the netifd package from snapshot and install it on 21.02?
e.g. from https://downloads.openwrt.org/snapshots/packages/mipsel_24kc/base/

You should be able to, yes.

Hi @chadneufeld, hi community,

In some of your posts you write that you run the official 21.02.0 release with an updated netifd package. I am fairly new to OpenWrt but an experienced Linux user. Could you describe how to update netifd to the desired version while leaving the official 21.02.0 firmware otherwise untouched? I would like to do this on TP-Link Archer C7 v5 routers.

Thank you very much in advance!
Raphael

I have a similar problem between two Tenbay T-MB5EU-V01 units (MT7915, WDS master and client).
On 2.4 GHz there is no problem.

On 5 GHz (Wi-Fi AX or AC) I get this error: wlan1: failed to insert STA entry for the AP (error -22)

@raphael Just read what fireburner posted, two posts above yours, and then my answer, one post above yours.

With OpenWrt on both Tenbay T-MB5EU-V01 units (MT7915, WDS master and client):
On 2.4 GHz there is no problem.
On 5 GHz (Wi-Fi AX or AC) I get this error: wlan1: failed to insert STA entry for the AP (error -22)

With X-WRT on both units I see the same problem.

Between one OpenWrt unit and one X-WRT unit the problem does not occur.

@Borromini fireburner's post was about the mipsel_24kc architecture. Four or five days ago I could not find an up-to-date package for TP-Link Archer C7 v5 routers. Am I right that the package under this link, https://downloads.openwrt.org/snapshots/packages/arc_arc700/base/netifd_2021-10-21-f78bdec2-1_arc_arc700.ipk, is for the Archer C7 architecture?

If so, I guess the command to update the netifd ipk package is:

opkg install --force-reinstall packagename.ipk

I am not with the devices yet, but it would be great to solve this tomorrow evening.

Kind regards,
Raphael
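
One way to avoid guessing the architecture is to ask the router itself: opkg print-architecture lists the architectures it accepts, one per line as "arch <name> <priority>". Here is a small sketch that picks the device-specific one. pick_arch is a made-up helper; it assumes the device architecture is the entry with the highest priority and that awk is available on the router.

```shell
# Hypothetical helper: read `opkg print-architecture` output on stdin and
# print the architecture with the highest priority, skipping the generic
# "all" and "noarch" entries.
pick_arch() {
    awk '$1 == "arch" && $2 != "all" && $2 != "noarch" && $3 + 0 > best {
             best = $3 + 0; name = $2
         }
         END { if (name != "") print name }'
}

# Intended use on the router (not run here):
#   arch=$(opkg print-architecture | pick_arch)
#   echo "https://downloads.openwrt.org/snapshots/packages/$arch/base/"
```

For an ath79 device like the Archer C7 I would expect this to print mips_24kc; arc_arc700 is for ARC CPUs, so I doubt that package would install. Once the right .ipk is in /tmp on the router, a plain opkg install of that file should be enough for an upgrade, and --force-reinstall only matters if the same version is already installed. A snapshot package may want newer libraries than 21.02.0 ships, in which case opkg will say so.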