WNDR3800 support for more than 15 VLAN

Hello,

I believe there is a bug in the VLAN configuration for the NETGEAR WNDR3800 and more precisely a regression since a previous release.

Unfortunately, I am not able to pinpoint exactly when the issue started so let me explain:

Around 2014 (I upgraded to the latest release in July 2014, so that is at least one known working version), I had a working configuration with 2 WANs (using the mwan3 package):

  • ISP1 using ADSL on WAN port, nothing fancy.
  • ISP2 using FTTH with the ONT plugged on LAN4 port (much faster but filtered, so SMTP and whatnot were routed through ISP1).

To make this work, I had to set up VLAN 835 tagged on the ONT port, and it worked just fine. ==> VLAN IDs greater than 15 are possible on this device.

This configuration is no longer in use because I moved and none of the previous subscriptions are active so I cannot test if it still works.

However, in the LuCI web interface, I get the following message:

Switch rtl8366s has an unknown topology - the VLAN settings might not be accurate.

In particular, it complains about the VLAN 835 which is displayed in red and I cannot validate any modification if I keep it.

My issue is that, for my current connection, I need to set up VLANs 100 and 400 (the ISP imposes these numbers, so I don’t have a choice). I’m not sure whether the configuration fails because I made a mistake or because OpenWRT is blocking the use of VLAN IDs above 15 (as explained, I cannot test VLAN 835, although the configuration part should still be valid).
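To see at a glance which VLAN IDs in a swconfig-style /etc/config/network are above 15, a sketch like the following may help. The sample configuration is hypothetical (the port numbers and table indexes are assumptions, not the actual file from this setup); only VLAN ID 835 comes from the description above.

```shell
#!/bin/sh
# Hypothetical swconfig-style network config; only VID 835 is taken from the post.
sample_network_config() {
    cat <<'EOF'
config switch
    option name 'switch0'
    option enable_vlan '1'
    option enable_vlan4k '1'

config switch_vlan
    option device 'switch0'
    option vlan '1'
    option ports '1 2 3 5t'

config switch_vlan
    option device 'switch0'
    option vlan '2'
    option vid '835'
    option ports '0t 5t'
EOF
}

# Print every VLAN ID above 15 found in "option vid" lines.
high_vids() {
    awk '$1 == "option" && $2 == "vid" {
        vid = $3
        gsub(/[^0-9]/, "", vid)        # drop the quotes around the value
        if (vid + 0 > 15) print vid
    }'
}

sample_network_config | high_vids
```

On a real device the same filter could be fed with `high_vids < /etc/config/network`; with the sample above it reports 835 only.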

I found 2 data sheets from Realtek, and both seem to indicate that the switch supports all 4096 VLANs and that all 4096 table entries are usable.

While it is possible that the OpenWRT implementation only takes care of the 16 directly accessible entries, it should allow the use of VLAN IDs up to 4K.

In the latest version available in July 2014, everything was working fine.
There were probably some upgrades in between (Attitude Adjustment, Barrier Breaker, LEDE 17.01.2, OpenWRT 19.07.3, if my archives are to be believed), all of them accepting VLAN 835 without complaint as far as I remember.
In the latest version as of today (23.05.3), there is an issue.

Before opening an issue on GitHub (I searched the open issues to no avail), I have a few questions to make sure I open it in the right place:

  1. Where is this limitation set? OpenWRT core, uci, LuCI? Source or config file? If you can point me towards the relevant code path (in principle, at least), I might even be able to propose a patch.
  2. Is the limitation tied to the board (NETGEAR WNDR3800) or the chip (Realtek RTL8366S)?
  3. Is there a way to remove (at least temporarily) this limitation on a current build without recompiling to test?

Thanks,

Edit: I have just found the information about enable_vlan4k, which did not appear in my previous searches. After checking, swconfig dev switch0 get enable_vlan4k gives me 1, which I’m guessing means “enabled”, so it would point towards a LuCI issue if I understand correctly.
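For anyone wanting to repeat that check, here is a minimal sketch. The switch name switch0 is the one used above, and the function name check_switch is made up for illustration. It is meant to run on the router itself and only prints a notice where swconfig is not installed.

```shell
#!/bin/sh
# Sketch: print the 4K VLAN flag and the full switch state.
# check_switch is a hypothetical helper name, not part of OpenWrt.
check_switch() {
    dev="${1:-switch0}"   # switch0 is the name reported in the post
    if ! command -v swconfig >/dev/null 2>&1; then
        echo "swconfig not found on this system"
        return 1
    fi
    # A value of 1 is assumed to mean the 4K VLAN table is enabled.
    swconfig dev "$dev" get enable_vlan4k
    # Dump global settings, per-port state and the VLAN table.
    swconfig dev "$dev" show
}
```

Calling `check_switch switch0` on the router should show whether the VLANs from /etc/config/network actually reached the switch, independently of what LuCI displays.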

You are omitting a ton of detail. Does swconfig show "correct" configuration?
LuCI shows an error, OK, but that's the only indication in your whole essay.

Are ports tagged as expected?
If you backup configuration, reset device, then recreate vlan835 - does it look ok?
Now compare /etc/config/network; perhaps the migration did not account for something?

Part of the problem is that I’m not sure where I need to look, so thanks for the pointers.

Does swconfig show "correct" configuration?

Yes. Even though the syntax is slightly different from the one in /etc/config/network, everything seems to match.

Are ports tagged as expected?

Not certain as I will need to use more machines to check (I’ll get a managed switch to mirror the interesting port next week).

If you backup configuration, reset device, then recreate vlan835 - does it look ok?

I had to wait for a time when no other user would be impacted by the reset, and the result is that the web interface looks very different from before. In particular, the ports are labelled differently, which seems to be related to DSA (I’ll need to dig into the changes), but VLAN 835 does not seem to be rejected any more.

It is possible that my configuration was weird enough for the migration from swconfig to DSA to not happen smoothly.

I’ll have to write down every page of the configuration to recreate it in the new format.

Meanwhile, there’s no need to investigate more and I’ll post the differences when I manage to recreate the configuration under the new format in case anyone faces the same issue.

Sorry for the incomplete report.

Once you back up the config, it takes about 10 minutes to test: just set 835, check that it gives a valid internet connection, then reset and restore the old config.

Yes, but for some users, 10 minutes is already too much and I didn’t want to handle the commotion. :scream:

For those who face the same kind of issue (or are simply interested in the required configuration changes), I was finally able to mostly reproduce the configuration after resetting the device.

Please keep in mind that due to the end of my subscription, I was not able to test it and mostly tried to get the LuCI interface to look the same.

A possible cause of the issue is that, for this device, the upgrade included a transition to DSA, combined with a fairly complex configuration:

  • Multiple WAN connections configured with mwan3 (which worked wonderfully, by the way).
  • Use of different VLANs depending on the Ethernet port.
  • The FTTH connection ticked every box I can think of for breaking an automated upgrade:
    • The ONT was plugged into a LAN port, which was therefore used as WAN.
    • The ONT required the use of a specific VLAN (tagged, of course).
    • The FTTH connection used PPPoE on top of all that.

The normal Ethernet connections were working fine, but I could no longer use VLAN IDs above 15 (the existing ones appeared in red) and the wireless LAN was out of service (I initially thought that the faulty power supply I had just replaced had killed it and didn’t associate it with the upgrade).

Below are the changes observed between both formats for the relevant configuration files (other files changed as well, but not enough to matter here).

/etc/config/wireless

The wifi-device configuration sections for the hardware are so different that I’m not surprised it no longer worked; a reset is basically the only sane path.

The wifi-iface sections, on the other hand, did not change at all other than the auto-generated names, since I did not create them in the same order as in the original configuration.

/etc/config/firewall

For the redirect configurations, all option proto became list proto, which is quite a trivial change (and a nice simplification in some cases).
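The option proto to list proto change could be applied mechanically with something like the sketch below. It assumes that a space-separated value such as 'tcp udp' maps to one list entry per protocol, which is a guess about the new format rather than something confirmed here. The sample redirect and rule are made up.

```shell
#!/bin/sh
# Hypothetical old-format firewall snippet (names and ports are made up).
sample_firewall() {
    cat <<'EOF'
config redirect
    option name 'smtp-in'
    option src 'wan'
    option proto 'tcp udp'
    option src_dport '25'

config rule
    option name 'allow-ping'
    option proto 'icmp'
EOF
}

# Rewrite "option proto" as one "list proto" line per protocol,
# but only inside "config redirect" sections.
convert_redirect_proto() {
    awk -v q="'" '
    $1 == "config" { in_redirect = ($2 == "redirect") }
    in_redirect && $1 == "option" && $2 == "proto" {
        indent = $0; sub(/option.*/, "", indent)
        vals = $0;   sub(/^[ \t]*option[ \t]+proto[ \t]+/, "", vals)
        gsub(q, "", vals); gsub(/"/, "", vals)
        n = split(vals, p, " ")
        for (i = 1; i <= n; i++)
            printf "%slist proto %s%s%s\n", indent, q, p[i], q
        next
    }
    { print }'
}

sample_firewall | convert_redirect_proto
```

Sections other than redirect are passed through untouched, since only the redirect sections were observed to change.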

Additionally, option family 'ipv4' was added to all redirects, but I’m not sure whether the redirections were working (the web interface showed IPv4, so I think it was properly interpreted).

/etc/config/network

This is the main configuration file and the one where the VLAN migration failed.

The first thing I noted was that the list of ports in the Switch configuration page went from Port 1 … Port 5 to LAN 1 … LAN 4, hiding the 5th port that is not linked to anything (number 4 in the file). I think initially, the ports were in fact numbered Port 0 … Port 4, matching the internal numbers (which are in reverse order: LAN 4…1, nothing, CPU), while the new LAN 1 … LAN 4 names match the names printed on the device and the LEDs, which is much more convenient (this alone is enough to prefer DSA).

The second thing I noted was that the name of the switch changed from rtl8366s to switch0 in the DSA-enabled version, with the additional information (RTL8366S), ports: 6 (cpu @ 5). This probably indicates that the rtl8366s name is no longer valid, which would explain why the old configuration was rejected.
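A quick way to spot sections that still point at the old switch name is sketched below. The sample file is hypothetical and the helper name stale_switch_refs is made up; the only facts taken from above are the two names, rtl8366s and switch0.

```shell
#!/bin/sh
# Hypothetical pre-upgrade network config that still uses the old switch name.
sample_old_network() {
    cat <<'EOF'
config switch
    option name 'rtl8366s'
    option enable_vlan '1'

config switch_vlan
    option device 'rtl8366s'
    option vlan '2'
    option vid '835'
    option ports '0t 5t'
EOF
}

# Report switch and switch_vlan sections whose switch name differs
# from the one passed as $1 (the name the running system uses).
stale_switch_refs() {
    awk -v want="$1" '
    $1 == "config" { type = $2; next }
    $1 == "option" &&
    ((type == "switch" && $2 == "name") ||
     (type == "switch_vlan" && $2 == "device")) {
        v = $3
        gsub(/[^A-Za-z0-9_]/, "", v)   # drop the surrounding quotes
        if (v != want) printf "line %d: %s refers to %s\n", NR, type, v
    }'
}

sample_old_network | stale_switch_refs switch0
```

With the sample above, both sections are reported; running it against a freshly generated configuration should print nothing.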

The configuration file showed that a migration had taken place: the ifname options had already been replaced with their device counterparts, and _orig_ifname was left as a reminder.

Given the configuration was a bit of a mess due to the numerous attempts at adding the new VLAN 100, I wouldn’t rely too much on it but it seems that the rest of the configuration did not change much and the automated migration worked fine (except for the switch name).

Conclusion

Hopefully, this will help anyone who faces the same kind of issue, but the main point is that I don’t think it’s worth the trouble of manually converting the old configuration files to the new format. It’s probably easier to reset everything and then reapply all the configuration, even if it takes a couple of hours.

One thing can be slightly faster when you have a lot of similar configurations (maclist for allow or deny lists, DHCP fixed addresses, DNS, …):

  • Manually add a few entries depending on the variations you have.
  • Make a backup of the new configuration.
  • Find the differences between the old and new formats (hopefully not too many).
  • Reintroduce the batch in the new backup (make a copy, just in case).
  • Import the new backup.
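The "reintroduce the batch" step can be sketched as pulling every section of one type out of the old file, so that it can be reviewed, adapted to the new format and appended to the new one. The sample DHCP entries and the helper name extract_sections are hypothetical.

```shell
#!/bin/sh
# Hypothetical old /etc/config/dhcp with two fixed-address entries.
sample_old_dhcp() {
    cat <<'EOF'
config dnsmasq
    option domain 'lan'

config host
    option name 'printer'
    option mac '00:11:22:33:44:55'
    option ip '192.168.1.10'

config host
    option name 'nas'
    option mac '00:11:22:33:44:66'
    option ip '192.168.1.11'
EOF
}

# Print only the sections whose type matches $1 (for example "host").
extract_sections() {
    awk -v type="$1" '
    $1 == "config" { keep = ($2 == type) }
    keep { print }'
}

sample_old_dhcp | extract_sections host
```

On real files this would be something like `extract_sections host < old/etc/config/dhcp > hosts.batch`, with the paths depending on where the old backup was unpacked.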
