VLAN, Cannot access LuCI (but ssh fine) [Solved]

I agree that it is unusual that you don't see the VLANs in the interface. It could be that the VLAN doesn't show up if the VLAN is off on the CPU port. Try enabling the CPU (tagged) and see if it shows up.

I would stay with the release builds as your devices are mature enough not to need to be running on snapshot/master.

That's it! Whew :laughing:.

Agreed, NP!

OK, now - re-enabled tagging on my switch (external to this device), connected to LAN1 => and VLAN 2 is all happy. LuCI works, ssh, scp, DHCP. All good ... except (there always has to be an except :stuck_out_tongue_winking_eye:). WAN is still down whenever the VLAN is enabled. I did assign the new interface to the LAN (firewall zone), so that's not it. Even from the router itself (i.e. over ssh), I can't ping upstream. Hmm. Let me look deeper now at your file vs. mine, see if I see anything.

BTW, one thing I did see earlier ... your wan is eth0.1 (i.e. VLAN 1 on eth0, agreed?), whereas mine is eth1. Not sure that's an issue, but just mentioning it.

BTW, unplugged, re-plugged wan cable ... yep, it is eth1 :smiley:

[76191.732496] ess_edma c080000.edma: eth1: GMAC Link is down
[76197.972731] ess_edma c080000.edma: eth1: GMAC Link is up with phy_speed=1000

Try tagging vlan1 on the cpu.

Nope - and it works as expected (even an old dog like me can learn a trick or two :laughing:) => I lose access on untagged ports, but I still have VLAN 2 ... and that access works great.

Dang it! It's so close - just the traffic not getting out / to the wan interface. And I really do think it's on the router - I say that because I checked the ARP table for that port on my switch. Empty ... so no traffic going there (from the router wan port).

Thanks!

OK, tried wide open firewall ... nope. Static IP, nope. ARP on the switch => never any traffic.

Then I found this link! This is very much what I am seeing. But a long thread, lots of back and forth - need to wrap my head around it, see if this helps :laughing:

Thanks!

Now that you have the basics down, can you try replicating this on your other device? I would be very surprised if the same issue would be present in both architectures (they use different chipsets), be it hardware or software.

I'd recommend using LuCI to configure the VLANs to ensure that the syntax is right. Start with a clean install of 19.07.8. Then visit the switch page and add a VLAN (connect it to the CPU as tagged, set it untagged on one of the ports and turn the default VLAN off on that port). Then add a new network interface with a static IP that doesn't overlap with your other networks (including upstream) and attach it to the new VLAN interface, setup your DHCP server on that new network, and associate the new network with the LAN firewall zone. Then test and see what happens.
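For anyone who wants to see roughly what those LuCI steps should end up writing, here is a minimal sketch that just prints a swconfig-style (19.07 syntax) fragment. The function name `gen_vlan_fragment`, the switch name `switch0`, CPU port `0`, `eth0` as the CPU-facing interface, and the port/subnet values in the example call are all assumptions for illustration; check them against your own device before copying anything.

```shell
#!/bin/sh
# Sketch only: prints a swconfig-style UCI fragment for one extra VLAN.
# Assumptions (not from the thread): switch is 'switch0' and the CPU port
# faces eth0. Port numbers differ per device.
gen_vlan_fragment() {
    vid="$1"    # VLAN ID, e.g. 3
    port="$2"   # switch port that becomes untagged in this VLAN
    cpu="$3"    # CPU port number (tagged in this VLAN)
    net="$4"    # first three octets of a subnet that overlaps nothing else

    cat <<EOF
config switch_vlan
        option device 'switch0'
        option vlan '$vid'
        option vid '$vid'
        option ports '${cpu}t $port'

config interface 'vlan$vid'
        option ifname 'eth0.$vid'
        option proto 'static'
        option ipaddr '$net.1'
        option netmask '255.255.255.0'
EOF
}

# Example: VLAN 3, untagged on port 3, tagged on CPU port 0, 192.168.3.0/24
gen_vlan_fragment 3 3 0 192.168.3
```

The fragment covers only the switch VLAN and the interface. The chosen port still has to be removed from the default VLAN's `ports` list, and the DHCP server and LAN firewall zone membership still need to be added as described above.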

Will give it a try, thanks!

OK, some good news :smiley:

Close ... LOL. I did re-flash clean, but had to use 21.02. Long story there, but I'm rolling my own, and have an issue going all the way back to 19.07 (files have moved around, can't really apply my latest kernel fixes). But that's OK, because ...

Again, sort of :slight_smile:. I did use LuCI, like you suggest - but in 21.02, for this router, there is no switch page (DSA router, right?). Rather, I went to Devices ... created two VLANs. VLAN 1 on Port 3 and 4 (untagged), and VLAN 250 on Port 1 and 2 (tagged) => keep them isolated.

First test, VLAN 1 ... only Port 3 and 4 included, and untagged. Connected to my PC (untagged, access port, like you said) ... works! But a slight note - I had to modify the lan interface, to point to this new VLAN (i.e. br-lan.1, vs br-lan). Agreed? At least to me it makes sense to do that.

Yep. Next up, new interface, connected to br-lan.250. Similar to above, have to connect the interface to the VLAN device (i.e. br-lan.250) ... set up the static IP (different subnet), DHCP on. Connect to my switch (tagged traffic to this port) - and it works! And LuCI is happy, also ssh, scp. It's all flying :laughing:. And WAN still works. So cool.

As above, and then I decided to take the training wheels off - or that's what it felt like, when I disconnected the access port (Port 4). Still all good over the VLAN.

BTW, I also did check, for interest, tcpdump -ennv -i br-lan 'vlan 250 and (port 67 or port 68)'. I could see the VLAN tagged traffic (correct vlan), and the DHCP packets flying around. So cool.

Really appreciate the help and pointers - thanks!

Now, do we want to go back to fighting with the other router (switch issues?)? Ya, I'm a glutton for punishment ... LOL! But that is the router I'd actually like to use, if possible. And I think it should be workable?

In any case - sincere thanks!!!

Glad you're in a better spot with this, even if it's not the router you want to use. This gives you the ability to design the architecture of your network and verify that everything is working as expected upstream and downstream... when you swap in your other device, you'll have confidence in the rest of the network. Everything you've done sounds right.

I have no idea what is happening with the other device, but the fact that you have an alternate router for the immediate implementation is fortunate.

I haven't yet messed with DSA. I am looking forward to getting used to the new architecture, but I'm also kind of dreading this because of the cognitive context switching that may be necessary when working with devices that do and don't support it. It's not complicated, but it is just similar enough and yet different enough to be a minor annoyance. Long term, though, this is the right path :slight_smile: .

So very much agreed - again, really appreciate it. Thanks!

Will play a bit with the other router, see if I can get anywhere. Will report back - if nothing else, if I can help someone else avoid a headache or two, then it's worth it.

OK, I think I see what is going on on the "other" (rt-ac58u) router ... and where LuCI is going awry :grinning_face_with_smiling_eyes:.

So, clean setup, access port configured (that is so handy, thanks for the pointer!), and I on purpose let LuCI configure my VLANs. Added 1 (untagged), and 250 (tagged) - just like the EA3500 above. But ... here is the resulting snippet from /etc/config/network,

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option ports '0t 2t 1t'
        option vid '250'

What the heck?!?! Why is vlan '2'? Upstream is broken, as I sort of expect - I'm thinking vlan 2 is for WAN. So ... manually changed (vlan to 250), restart network, and ... uplink is happy again! So the issue (I think) is LuCI setting the vlan to 2, when it should be 250 (or at least, not 2 :stuck_out_tongue_winking_eye:). Agreed?
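A quick way to spot this kind of clash is to list every `switch_vlan` section side by side. The sketch below is a hypothetical helper (`list_switch_vlans` is not an OpenWrt tool) that prints the `vlan`, `vid` and `ports` of each section and flags any `vlan` value used more than once. It assumes that `vid` falls back to the `vlan` value when it is not set, which is my understanding of swconfig's behavior rather than something confirmed in this thread.

```shell
#!/bin/sh
# list_switch_vlans FILE
# Prints "vlan=<n> vid=<n> ports=<list>" for each switch_vlan section and
# marks any 'option vlan' value that appears in more than one section.
list_switch_vlans() {
    awk -v q="'" '
        function emit() {
            if (!in_sv) return
            if (vid == "") vid = vlan      # assumption: vid defaults to vlan
            printf "vlan=%s vid=%s ports=%s", vlan, vid, ports
            if (seen[vlan]++) printf "  <-- duplicate vlan index"
            printf "\n"
        }
        $1 == "config" {
            emit()                          # finish the previous section
            in_sv = ($2 == "switch_vlan")
            vlan = vid = ports = ""
            next
        }
        in_sv && $1 == "option" {
            val = $0
            sub(/^[ \t]*option[ \t]+[^ \t]+[ \t]+/, "", val)  # keep the value
            sub(/[ \t\r]+$/, "", val)
            gsub(q, "", val)                # strip the single quotes
            if ($2 == "vlan")  vlan = val
            if ($2 == "vid")   vid = val
            if ($2 == "ports") ports = val
        }
        END { emit() }
    ' "$1"
}
```

On the router it would be run as `list_switch_vlans /etc/config/network`. If the new entry shares its `vlan` value with the existing WAN entry, the second one gets flagged.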

OK, a couple other observations though,

  1. With the 250 vlan added, I do get a new device showing up ... eth0.250. As I'd expect. And I add a new interface to this. But ... it's not working, more on this below.
  2. I don't get an eth0.1 device - like I did on the other router. I leave lan connected to eth0, but thinking it really should be eth0.1, no?

OK, and the odd thing. I don't get traffic through to eth0.250 (even checked with tcpdump). And I think the reason is ... I check the MAC table on my switch, when connected to this router - it's empty, for this port. I take the cable, just move it over to the "old" router (EA3500) ... immediately I see a MAC entry for VLAN 250. It's like the "new" (rt-ac58u) router isn't broadcasting some needed info on vlan? I may be off base there.

But basically ... I can make it configure, and fix upstream - by correcting the LuCI vlan setting. Please let me know if you have any thoughts on the MAC issue (not getting to the switch). I do need to dig as well, not sure what the difference is between vlan and vid.

Thanks!

I was wondering this, too, but dismissed it... I was thinking that maybe it was like VLAN entry 2, but not VLAN ID 2. But this is a good find, for sure. So if removing it fixes the upstream issues, it makes a ton of sense. Why it is setting VLAN 2 in the first place is a mystery to me.

Let's see the whole network config file.

Regarding the 2nd question, it should actually be okay to have the LAN associated with eth0, but I can't be certain without playing with the hardware. eth0 and eth0.1 would really only be different in that eth0.1 is VLAN 1 tagged on the CPU. If VLAN 1 is currently untagged on the CPU, that would likely explain it.

Let's compare the respective VLAN/port configuration on each of the routers as it pertains to the connection to the switch.

Me too! :laughing:. I admit, not sure how to get a display of the vlan settings (kinda like ifconfig or brctl, but show vlan info). Still looking for that.
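For reference, a minimal sketch of the two commands that should show this, wrapped so it picks whichever one the build has. `show_vlans` is a made-up wrapper name; `swconfig dev switch0 show` applies to swconfig (non-DSA) builds, and `bridge vlan show` to DSA builds, assuming the iproute2 `bridge` utility is installed (on OpenWrt I believe that comes from the `ip-bridge` package, but treat that as an assumption).

```shell
#!/bin/sh
# show_vlans: print the VLAN table using whichever tool this build provides.
show_vlans() {
    if command -v swconfig >/dev/null 2>&1; then
        # swconfig targets (the rt-ac58u here): dump the state of switch0,
        # including per-VLAN port membership.
        swconfig dev switch0 show
    elif command -v bridge >/dev/null 2>&1; then
        # DSA targets (the EA3500 on 21.02): per-port VLAN membership.
        bridge vlan show
    else
        echo "neither swconfig nor bridge found" >&2
        return 1
    fi
}
```

Run `show_vlans` over ssh on each router and compare the port lists against what /etc/config/network says.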

Sorry, not sure I follow that one - and interestingly, it is different on the two routers (though that may be DSA vs. non-DSA).

Good point! Let me paste them below - just the switch-related parts, so as not to paste too much junk here. I can add more if you want, no issue!

  1. Working router (EA3500)
config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'ethernet1'
        list ports 'ethernet2'
        list ports 'ethernet3'
        list ports 'ethernet4'

config bridge-vlan
        option device 'br-lan'
        option vlan '1'
        list ports 'ethernet3'
        list ports 'ethernet4'

config bridge-vlan
        option device 'br-lan'
        option vlan '250'
        list ports 'ethernet1:t'
        list ports 'ethernet2:t'

config interface 'lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option device 'br-lan.1'

config interface 'vlan250'
        option proto 'static'
        option device 'br-lan.250'
        option ipaddr '192.168.250.1'
        option netmask '255.255.255.0'

  2. Non-working router (rt-ac58u)
config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth0'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option vid '1'
        option ports '0 4 3'
		
config switch_vlan
        option device 'switch0'
        option vlan '250'
        option ports '0t 2t 1t'
        option vid '250'

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config interface 'vlanWG'
        option proto 'static'
        option device 'eth0.250'
        option ipaddr '192.168.250.1'
        option netmask '255.255.255.0'

Thanks!

FYI, in case this was the switch (being "fussy"), I tried direct NIC (PC) connection, to the VLAN 250 port. No go. But take that same cable to the other router ... no issues!

Something odd about that VLAN. Hmmm.

What about if you change the VLAN ID to 5 instead of 250. This would test the possibility that the VLAN aware switch chip might only be able to handle up to VLAN ID 15. I don't know if this is the issue, but worth trying.

Be sure to change it everywhere, of course, so things are consistent. And likewise your VLAN configuration on the other switch and/or your PC would need to be set to work with VLAN 5 tagged.
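To find every place the old ID appears before editing, something like the sketch below may help. `find_vlan_refs` is a hypothetical helper, not an OpenWrt command; it only looks for `option vlan` / `option vid` entries and for quoted VLAN sub-interface names such as `'eth0.250'`, so IP addresses that happen to contain the same number are left alone.

```shell
#!/bin/sh
# find_vlan_refs ID FILE...
# Lists lines that reference the VLAN either as a switch entry (vlan/vid)
# or as a VLAN sub-interface name such as 'eth0.250' or 'br-lan.250'.
find_vlan_refs() {
    id="$1"; shift
    grep -nE "option (vlan|vid) '$id'|'[A-Za-z][A-Za-z0-9_-]*\.$id'" "$@"
}
```

For example, `find_vlan_refs 250 /etc/config/network` should list the `switch_vlan` options and the interface device line that need to change together. The external switch and the PC have to be updated separately.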

That's it! It's working now, through the switch and all :smiley:. Really appreciate it! But I do have to ask ... what made you think of that? LOL. Just wondering.

Thanks again so much for the help and pointers.

PS: tcpdump only shows vlan headers on eth0, not eth0.5. Still need to wrap my head around that one ... LMAO.

If your problem is solved, please consider marking this topic as [Solved]. See How to mark a topic as [Solved] for a short how-to.

Awesome! Glad you're running now!

TBH, I don't recall who on this forum had contributed similar advice... I learned about the VLAN limitation of some switch chips from that person/people, but I've never experienced it myself. Fortunately, though, the info remained resident in my mind and I could offer that suggestion to you. Once we had gotten as far as not bringing down the upstream interface (good work, btw!), that seemed like the only other thing that could explain the behavior you were seeing. I was out of ideas, so glad this one worked!

Yes, I plan to! I'll summarize the key items, so they are in one place - make it easier for folks (like me, thinking most folks don't want to read the entire thread :laughing:). I don't really like marking my summary as the solution, but seems from the link that's the way to do it.

@psherman, thinking these are the key items I need / plan to capture. Anything to add?

  1. LuCI incorrectly sets vlan (for the ipq40xx) ... the first one added gets option vlan '2', no matter what vid was entered, which breaks wan. Change vlan to a different value (and restart network) ... wan should return, and vlan will be OK (with the new ID).
  2. The ipq40xx switch seems to be limited in terms of vlan ID that can be used. Stick to 15 or less.
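A small check along these lines could catch item 2 before a network restart. This is only a sketch: `check_vlan_limit` is a hypothetical helper, and the default of 15 is simply the value that worked in this thread, not a verified hardware limit.

```shell
#!/bin/sh
# check_vlan_limit FILE [MAX]
# Warns about switch_vlan entries whose vlan or vid exceeds MAX.
# MAX defaults to 15 (assumption taken from this thread, not a datasheet).
check_vlan_limit() {
    awk -v max="${2:-15}" -v q="'" '
        $1 == "config" { in_sv = ($2 == "switch_vlan"); next }
        in_sv && $1 == "option" && ($2 == "vlan" || $2 == "vid") {
            val = $3
            gsub(q, "", val)                 # strip the single quotes
            if (val + 0 > max + 0) {
                printf "line %d: %s %s exceeds %d\n", NR, $2, val, max
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }           # non-zero when anything is over
    ' "$1"
}
```

Usage would be `check_vlan_limit /etc/config/network`. It prints nothing and returns success when every switch VLAN is within the limit.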

And some key notes I plan to also capture,
a) Keep one switch port as access (untagged vlan). It's really helpful for debugging, and is always accessible (even when you screw things up, trust me :grinning_face_with_smiling_eyes:)
b) Hats off to @psherman for the help and patience, working through this

BTW, I'll then post to the ipq40xx thread this solution as well, try to help folks there out.

Thanks!
