IPQ40xx Switch Config "Strangeness"

Hi...
I'm one of the developers who got support for the EA6350v3 working. By studying the OEM firmware, I also found a way to revert to stock, and now I want to get involved in this.

Again, by studying the OEM firmware I found this piece of text:

# switch configuration
#
# Defines how the interfaces (switches) are configured. This information is
# used by linkstate, linkmgr, and other processes to access the correct ports
# for information, and in case of Broadcom, this is also used to configure
# the switches.
#
# Each interface is numbered by mode.
# <mode> is currently "router" or "bridge"
#
# switch::<mode>_max - the number of interfaces defined for this mode
#
# switch::<mode>_#::<parameters> - where parameters are vendor dependent
#   Required:
#     ifname - the OS interface name
#     physical_ifname - the physical interface (may be the same as ifname)
#     port_numbers - list of ports that belong to this interface
#     port_names - list of human-readable names for each port
#     wan_monitor_port - the port to be monitored (but only if this is WAN)
#   Optional (vendor dependent):
#     vlan_num - if vlan is used, the vlan number
#     cpu_port_number - number designated as CPU port
#     driver_hw_name - name of the Ethernet device (according to Broadcom
#                      driver) for this interface
#     interface_suffix - can be empty, "*" (default), or "u" (untag)
#

# router mode
$switch::router_max=2

$switch::router_1::ifname=eth1
$switch::router_1::physical_ifname=eth1
$switch::router_1::port_numbers=1 2 3 4
$switch::router_1::port_names=port1 port2 port3 port4

$switch::router_2::ifname=eth0
$switch::router_2::physical_ifname=eth0
$switch::router_2::port_numbers=5
$switch::router_2::port_names=wan
$switch::router_2::wan_monitor_port=5

# bridge mode
$switch::bridge_max=1

$switch::bridge_1::ifname=eth1
$switch::bridge_1::physical_ifname=eth1
$switch::bridge_1::port_numbers=1 2 3 4 5
$switch::bridge_1::port_names=port1 port2 port3 port4 wan
$switch::bridge_1::wan_monitor_port=5
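
For anyone who wants to compare OEM dumps from other Linksys units, here is a minimal Python sketch that reads the `$switch::<mode>_#::<key>=<value>` lines into a nested dictionary. The function name `parse_switch_config` and the shortened `SAMPLE` text are only for illustration; the line format is the only thing taken from the OEM file.

```python
import re
from collections import defaultdict

# Shortened excerpt of the OEM fragment quoted above.
SAMPLE = """\
$switch::router_max=2
$switch::router_1::ifname=eth1
$switch::router_1::port_numbers=1 2 3 4
$switch::router_2::ifname=eth0
$switch::router_2::port_numbers=5
$switch::router_2::wan_monitor_port=5
"""

# Matches "$switch::<mode>_<index>::<key>=<value>".
# Lines such as "$switch::router_max=2" do not match and are skipped.
ENTRY = re.compile(r"^\$switch::(\w+?)_(\d+)::(\w+)=(.*)$")


def parse_switch_config(text):
    """Return {mode: {index: {key: value}}} for the per-interface entries."""
    result = defaultdict(lambda: defaultdict(dict))
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            mode, index, key, value = match.groups()
            result[mode][int(index)][key] = value
    return result


config = parse_switch_config(SAMPLE)
for index, params in sorted(config["router"].items()):
    print(index, params["ifname"], params["port_numbers"].split())
```

In router mode this makes the split easy to see: one interface owns ports 1 to 4 and the other owns only port 5.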

I'm modifying the firmware now. Given that it currently works once port 5 is added in /etc/config/network, and given the Linksys fragment quoted above, I think these two changes will fix the problem of configuring the switch in the LuCI UI:

  1. Exposing the CPU eth1 port
  2. Exposing port 5 in the “wan” role

If you have a Linksys EA6350v3, you may want to test the next build (v0.17) and provide me with feedback as I have no Ethernet devices to test (100% wireless ;).

So, after some investigation focused more on LuCI and UCI than on kernel or hardware stuff... I found some interesting things.

  1. Most configurations of 02_network simply don’t work. I’ve destroyed my device’s flash by trying so many configurations inside 02_network, and almost none work out of the box (no WAN traffic at all).
  2. Applying the default switch settings (1: 4 3 2 1 0; 2: 5 0t) in UCI always fails. When you touch anything in vlan2 (even if you reapply the default setting), everything stops working.
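
To make the "default switch settings" concrete, here is a small Python sketch that renders the two port lists as `switch_vlan` stanzas for /etc/config/network. The helper name `switch_vlan_stanza` is made up, and the device name `switch0` is an assumption borrowed from the stanza posted further down the thread. This is only meant to show what is being applied, not a configuration known to work.

```python
def switch_vlan_stanza(vlan, ports, device="switch0"):
    """Render one UCI switch_vlan section as text."""
    lines = [
        "config switch_vlan",
        f"\toption device '{device}'",  # assumption: the switch is named switch0
        f"\toption vlan '{vlan}'",
        f"\toption ports '{ports}'",
    ]
    return "\n".join(lines)


# The default port lists reported above: VLAN 1 and VLAN 2.
DEFAULTS = {1: "4 3 2 1 0", 2: "5 0t"}

print("\n\n".join(switch_vlan_stanza(v, p) for v, p in DEFAULTS.items()))
```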

Some findings (I cannot test further because I don’t use VLAN)

  • Setting up port 5 as WAN does not work; the router cannot talk to WAN
  • Setting up port 5 as CPU (eth1) does not break the normal WAN functionality
  • Changing anything in vlan2, irrespective of the setting and of 02_network, will break the normal WAN
  • It looks like double tagging is required, on both CPU (eth0): 0 and CPU (eth1): 5

So, after some painful testing on my part, I would like people who use VLANs to test LuCI.
To make LuCI (and users) ignore VLAN 2, it’s important to add a configuration where all ports are “off” in vlan2.

It “should work” if you don’t touch vlan2 and if you tag both CPUs (eth0 and eth1). It shows two eth1 devices, but that is just to remind users to tag both (and also because eth1 is in fact ports 0 and 5 at the same time, according to my tests).


CPU (eth0) | CPU (eth1) | CPU (eth1) | lan1 | lan2 | lan3 | lan4

More images:

  1. Applying the configuration in UCI
  2. /etc/config/network
  3. swconfig

It should work. It’s dirty and a real fix is needed, but if I’m not wrong, you should be able to set up VLANs in LuCI.

VLAN 1 and VLAN 2 are used at the driver level for GMAC0 and GMAC1 (see above, or target/linux/ipq40xx/patches-4.19/711-dts-ipq4019-add-ethernet-essedma-node.patch).

In my experience, VLAN 1 and VLAN 2 are best avoided in any "custom" configuration of the switch.

Bridging the same VLAN across eth0 and eth1 has some "interesting" challenges as well, seemingly related to one MAC for a given IP address being needed for connections on the "WAN" port and another on the "LAN" ports.

Interestingly, the default configuration that swconfig shows on a fresh upstream install does not work at all when it is applied through LuCI or UCI.
Bridging is not happening because LuCI just ignores vlan2. However, if you do anything to vlan2 other than ignoring it, it stops working altogether. No traffic can flow from localhost or the LAN to the outside world.

Even so, when vlan2 is set from UCI as shown in swconfig, eth1 and the software VLAN eth1.2 show up in LuCI and a bridge between the two is created as br-wan.

The device gets an IPv4 and an IPv6 address (gateway, DNS, you know, everything needed to establish a connection) but I cannot reach the Internet. However, I can reach my gateway’s configuration webpage (the gateway is on wan/eth1).

I don’t understand why. My theory is that traffic is being tagged as vlan2 and the ISP just ignores it, even though port 5 is marked as untagged.

Recently @Deoptim has posted a bunch of datasheets, (QCA9563 Datasheet), or directly here: https://github.com/Deoptim/atheros. The IPQ40xx datasheets might bring some light into the "darkness of the builtin switch".

Cheers,
Thomas

Well, I don't actually want to do hardware stuff (even if I'm an electronics engineer) but testing seems to be going in a good direction. I would call these empirical results.

After applying the kernel patches and modifying /etc/board.d/02_network, it works out of the box in my configuration (my ISP does not use or even support VLANs), and I've heard from a tester (I admit, more testing is needed) that the "hacky version" works.

I have a prebuilt image here for the EA6350v3 which applies the never-included kernel patches from @chunkeey and modifies 02_network to reflect the changes as shown here.

The result looks like this is the correct configuration for OpenWrt (OEM firmware apart, I studied it and it's bull****). I've not modified the wan/wan6 interfaces nor the switch. This is what we get with the changes and it works out-of-the-box (as opposed to hacking the configuration in the previous attempts)


Sources?

If it's the DSA patches, they don’t support VLANs, at least last I checked, even in 5.x. There’s also an old patch to remove the code that sets a default VLAN tag that I’ve referenced.

Sources seen at https://github.com/NoTengoBattery/openwrt/commit/3433c6308d59bc84e80a03e7a89a4a89ce646aa0

which appear to be the removal of the default VLAN tagging.

Apologies for not getting back to you on this sooner. What is going on, as I understand it, is that there is a proprietary "Atheros header"1 on the RGMII that indicates, among other things, on which switch port the packet arrived. At least as I read the driver code, if the packet is untagged, it is tagged by the driver as VLAN 1 or VLAN 2.

Code references:

  • patches-4.19/710-net-add-qualcomm-essedma-ethernet-driver.patch
  • patches-4.19/711-dts-ipq4019-add-ethernet-essedma-node.patch

1 See, for example, section 3.6 of the QCA8337N data sheet.

Dear Jeff
You can check the source code by yourself.

Hi @NoTengoBattery
Thanks for the work
I'm not 100% clear, though: should VLANs be working with your firmware?

I tried creating a situation where I have 1 trunked port with 3 vlans and the rest untagged (I need the trunk for guest wireless and another wireless network I want to isolate) and I keep getting errors that the traffic is coming to eth0 on the untagged ports which does not have an address (since it should be coming to eth0.1).

I really can't say if it works for you, because I can't test VLANs much: I'm a wireless guy and my ISP is not the most advanced in the world.
But in theory, and from what other testers have told me, it should work as expected. You probably want to give it a try and let us all know.
I can tell you, though, that the changes included in the firmware make quite a difference compared to OpenWrt's firmware.

I'm using VLANs locally, I have 2 routers and I have 3 SSIDs that I wish to be separate networks.

Current situation on the router I was hoping to replace with this switch.

All works well on this switch if you understand how it works and don't use LuCI for the switch config.

Pilot, care to expand?

If I bind my LAN to eth0.1 then I start getting complaints in the logs for Ethernet traffic that is coming in on untagged ports that the device sent it to eth0 which has no IP address/network configured. While the wireless network connected to the same VLAN will continue to work.

Stay away from VLAN 1 and VLAN 2 in your configs.

Be aware that there may be ARP problems with bridging LAN and WAN ports (Edit: What I am guessing happens is that the ARP request is answered by the bridge master, which is "wrong" either for requests coming in from the "LAN" ports or from the "WAN" ports, depending on how the bridge was configured. I haven't pursued this further to confirm.)

Indeed, things worked with VLAN IDs other than 1 and 2.

That solves that, though there is one strange thing I guess (or maybe that is what this whole topic is about and I just missed the point completely)

For the WAN port to work I had to explicitly assign it and the CPU vlan 2 in the configuration file:

config switch_vlan
	option device 'switch0'
	option vlan '5'
	option ports '5 0t'
	option vid '2'
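
For readers unfamiliar with the `ports` notation, here is a minimal Python sketch that splits the string into port numbers and tagged/untagged flags. The function name `parse_ports` is made up for this post, and it only handles the trailing `t` (tagged) marker used in this thread, not any other suffix swconfig may accept.

```python
def parse_ports(ports):
    """Split a ports string such as '5 0t' into (port, tagged) pairs."""
    result = []
    for token in ports.split():
        tagged = token.endswith("t")  # a trailing "t" marks the port as tagged
        result.append((int(token.rstrip("t")), tagged))
    return result


for port, tagged in parse_ports("5 0t"):
    print(f"port {port}: {'tagged' if tagged else 'untagged'}")
```

For the stanza above, that reads as port 5 untagged and port 0 (the CPU port) tagged.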

It still doesn't show in the LuCI GUI, and as a result changes to the configuration made from LuCI tend to remove port 5 and thus disable it again, but other than that things seem to be working OK.

I also completely removed the configuration for VID 1 since it had no ports associated with it, this does not seem to have caused any issues.

Add to above, "Don't use LuCI to configure the switch" :wink:

The reason I avoid using VLAN tag 1 or 2 (either explicitly or with vid) is that the driver reserves them for untagged packets. As I understand the driver, GMAC0 always gets packets coming in from the "LAN" ports and they are "hard-coded" to be tagged with VLAN 1 if they are untagged. GMAC1, similarly, always gets packets coming in from the "WAN" port, tagged VLAN 2 if they are untagged. The reverse mapping is also true, as far as I know; GMAC0-originated packets can only exit a "LAN" port and GMAC1-originated packets can only exit the "WAN" port.
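
As a rough illustration of that understanding (a toy model, not the driver code), here is a Python sketch of the mapping. The port sets are assumptions based on the numbering used in this thread (ports 1 to 4 as "LAN", port 5 as "WAN"), and the function names are made up.

```python
LAN_PORTS = {1, 2, 3, 4}  # assumption: switch ports wired as "LAN"
WAN_PORTS = {5}           # assumption: switch port wired as "WAN"


def ingress(port, vlan=None):
    """Return (gmac, vlan) for a packet arriving on a switch port.

    vlan=None means the packet arrived untagged, so the default applies.
    """
    if port in LAN_PORTS:
        return ("GMAC0", 1 if vlan is None else vlan)
    if port in WAN_PORTS:
        return ("GMAC1", 2 if vlan is None else vlan)
    raise ValueError(f"unknown port {port}")


def egress_ports(gmac):
    """Return the ports a packet originated by the given GMAC can exit."""
    return {"GMAC0": LAN_PORTS, "GMAC1": WAN_PORTS}[gmac]


# An untagged LAN packet is indistinguishable from one tagged VLAN 1.
print(ingress(1), ingress(1, vlan=1))
# A custom VLAN ID keeps its own tag.
print(ingress(1, vlan=10))
```

In this model a user-defined VLAN 1 or VLAN 2 collides with the tag given to untagged traffic, which is one way to read why other IDs behave more predictably.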

By avoiding any use of VLAN 1 and VLAN 2, the behavior is a lot more predictable, at least for me.

It's strange though because if I associate all untagged ports with specific vlans then they should get that VID and not 1/2

I'll see now if configuring other parts of the settings with LuCI that touch that file (interfaces, wireless networks) will mess these settings back up again or not....