IPQ40xx Switch Config "Strangeness"

This affects the EA8300 and, apparently, the EA6350v3: the IPQ40xx switch in these devices seems to have a mind of its own when it comes to VLANs and configuration.

From what I've read, this is a "non-standard" switch that may require additional development work to have functional VLANs. It seems as though the two Ethernet phys are internally "wired" to VLANs off switch port 0.

These appear to be dual-NIC devices, which is almost certainly a different wrinkle from the ones dealt with in other threads / patches, as the NICs appear to be "hard wired" to particular ports by the DT at driver-init time.

The goal of this topic is to collect any additional information on these or related devices and what might be done to get VLAN configuration into OpenWrt (especially as @chunkeey's patch, referenced below, doesn't look like it was accepted).


Current Synopsis


References

http://wiki.dreamrunner.org/public_html/Embedded-System/Qcom-ipq40xx/ipq40xx-device-tree-overview.html

  1. QCA Ethernet Switch ESS-SWITCH is used to forward the packet among LAN, WAN and Host processor. Documentation/devicetree/bindings/arm/msm/ess-switch.txt

(I have been unable to locate ess-switch.txt so far.)

http://wiki.dreamrunner.org/public_html/Embedded-System/Qcom-ipq40xx/ipq40xx-ethernet-analysis.html

Potentially Related Threads / Topics

https://bugs.openwrt.org/index.php?do=details&task_id=1652


Observed Behavior

If you accept the default configuration, you end up with a functional switch for OEM-style use:

VLAN 1:
	vid: 1
	ports: 0t 1 2 3 4 
VLAN 2:
	vid: 2
	ports: 0t 5 

If you try something different, it fights back:

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'
 
config switch_vlan
        option device 'switch0'
        option vlan '1' 
        option ports '0 1 2 3 4'
 
config switch_vlan
        option device 'switch1'
        option vlan '2'
        option ports '5 6'

VLAN 1:
        vid: 1
        ports: 0 1 2 3 4 
VLAN 2:
        vid: 2
        ports: 0t 5 

or

config switch_vlan
	option device 'switch0'
	option vlan '10'
	option ports '0 1 2 3 4'

config switch_vlan
	option device 'switch1'
	option vlan '56'
	option ports '5 6'

VLAN 1:
        vid: 1
        ports: 0t 1t 2t 3t 4t 
VLAN 2:
        vid: 2
        ports: 0t 5 
VLAN 10:
        vid: 10
        ports: 0 1 2 3 4 

(Edit: Yes, I have since confirmed that the QCA AR40xx only has six ports supported by the driver -- I was poking at a "black box" to try to figure out where that second phy was actually connected.)


Being a firm believer in helping yourself when you can (and hopefully helping @guidosarducci as well):

[PATCH] net: essedma: disable default vlan tagging

This patch is close to applying on Linux 4.14. The second "hunk" is now at around line 1400 and is clearly recognizable; the main difference is the addition of the unlikely() branch-prediction hint macro.

That section is now also wrapped in if (!adapter->edma_cinfo->is_single_phy), which appears to be true for the two Linksys devices.

Edit: Setting the first value of the vlan_tag property to zero for the two gmac devices may have the same effect as the chunkeey patch (the conditionals would then change the behavior in the same way the patch does). Assuming there are no other adverse impacts, and that it improves the situation, this may be a non-patch way to move forward.

I don't have enough information to understand the vlan_tag properties of the gmac nodes (Edit: see later in this post for a "decoding" of them), though changing qcom,num_gmac = <2>; to <1>, as that patch did for single-PHY devices, probably isn't the right approach for dual-MAC devices.

There is an "interesting" correspondence between some of the ess-switch properties and the two gmac properties:

                        switch_cpu_bmp = <0x1>;
                        switch_lan_bmp = <0x1e>;
                        switch_wan_bmp = <0x20>;
                        gmac0: gmac0 {
                                local-mac-address = [00 00 00 00 00 00];
                                vlan_tag = <1 0x1f>;
                        };

                        gmac1: gmac1 {
                                local-mac-address = [00 00 00 00 00 00];
                                [...]
                                vlan_tag = <2 0x20>;
                        };

Edit: The three *_bmp properties seem to define which physical ports are associated with each of the two VLANs; see ar40xx_vlan_init() in drivers/net/phy/ar40xx.c.

0x01  0000 0001 -- CPU
0x1e  0001 1110 -- LAN
0x20  0010 0000 -- WAN
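To sanity-check that reading, here is a minimal Python sketch that expands each bitmap into port numbers. It assumes bit N corresponds to switch port N (bit 0 being the CPU port); that is my interpretation of the values above, not code taken from ar40xx.c.

```python
# Assumption: bit N of each *_bmp value selects switch port N (bit 0 = CPU port 0).
SWITCH_BITMAPS = {
    "switch_cpu_bmp": 0x01,
    "switch_lan_bmp": 0x1E,
    "switch_wan_bmp": 0x20,
}


def ports_from_bitmap(bmp: int) -> list[int]:
    """Return the switch port numbers whose bits are set in bmp."""
    return [port for port in range(bmp.bit_length()) if (bmp >> port) & 1]


if __name__ == "__main__":
    for name, bmp in SWITCH_BITMAPS.items():
        print(f"{name} = {bmp:#04x} -> ports {ports_from_bitmap(bmp)}")
```

With those values it prints ports [0], [1, 2, 3, 4] and [5] (CPU, LAN 1-4 and WAN), and gmac0's 0x1f is simply the CPU bit plus the LAN bits.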

drivers/net/ethernet/qualcomm/essedma/edma_axi.c as amended by patches/platform/710-net-add-qualcomm-essedma-ethernet-driver.patch should reveal what vlan_tag does.

The first number is adapter[idx]->default_vlan_tag, which, as I recall, the chunkeey patch "defangs". (Edit: setting it to 0 likely would as well, though I haven't evaluated the full impact of that yet.)

The second number is ->dp_bmp, which is also used (after shifting off the CPU bit) as the portid_bmp bitmap for populating edma_cinfo->portid_netdev_lookup_tbl[<port index>]. I'm chasing this down now, though the fact that there can only be one value per port is "going to be interesting".

Yep, the netdev instances appear to be "hard wired" to ports through that map in drivers/net/ethernet/qualcomm/essedma/edma.c.
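Here is a rough model of that table-building step, in Python rather than the driver's C. It assumes, per the reading above, that each gmac's dp_bmp (minus the CPU bit) lists the ports it owns and that each port can point to only one netdev. MAX_PORT_ID and build_portid_lookup are made up for illustration and are not the driver's names.

```python
# Hypothetical model of how essedma fills portid_netdev_lookup_tbl from the
# second vlan_tag cell (dp_bmp) of each gmac node. This is a reading of the
# driver, not a copy of it; MAX_PORT_ID is an assumed bound.
MAX_PORT_ID = 5


def build_portid_lookup(gmacs: dict[str, tuple[int, int]]) -> dict[int, str]:
    """Map switch port -> gmac name from {gmac: (default_vlan_tag, dp_bmp)}."""
    table: dict[int, str] = {}
    for gmac, (_default_vlan_tag, dp_bmp) in gmacs.items():
        portid_bmp = dp_bmp >> 1  # shift off the CPU port bit
        port = 1  # after the shift, bit 0 stands for switch port 1
        while portid_bmp:
            if portid_bmp & 1:
                if port > MAX_PORT_ID:
                    raise ValueError(f"{gmac}: port {port} exceeds {MAX_PORT_ID}")
                # One slot per port: if two gmacs claim a port, the last one wins.
                table[port] = gmac
            portid_bmp >>= 1
            port += 1
    return table


if __name__ == "__main__":
    # vlan_tag values from the gmac0 / gmac1 nodes quoted above.
    ea8300 = {"gmac0": (1, 0x1F), "gmac1": (2, 0x20)}
    for port, gmac in sorted(build_portid_lookup(ea8300).items()):
        print(f"switch port {port} -> {gmac}")
```

With the quoted values, ports 1-4 land on gmac0 and port 5 on gmac1, which is the "one phy gets one set of ports, the other gets the rest" split.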



So it looks like with two phys, one gets to be "permanently" associated with one port or set of ports, and the other with "the rest" of them (short of modifying the DT and re-initializing the driver).

(TBD if one phy can transmit out a port typically associated with the other phy, but that doesn't seem to be too practical for most applications.)

Both seem to go through a single, logical switch port, making me wonder if that pathway is limited to 1 Gbps, or if it is actually faster. Hmmm, wonder how (and how well) that OEM "bonding" feature works.

Going to have to sleep on this, as well as hope that there are magically more than two phys in there.


Unfortunately, I don't have much to add here right now. I too discovered the problem with VLANs and the switch when I started all this, but so far I don't know of any solution that doesn't come with problems of its own. Ideally, ipq40xx will move away from essedma and ar40xx to ipqess + qca8k_mmio. This will go along with a change from the current VLAN-based switch setup to a port-based switch setup with DSA. The trouble is that the integrated switch in particular isn't as cooperative as the external QCA8337 it is based on. I had some luck getting both going, but single-external-PHY devices like the MR33 are the show-stopper (and a big one). Still, I'm hoping to have at least a working preview version ready by the time ipq40xx is switched to 4.19.


I did note that Google ChromeOS dropped the essedma drivers some time after chromeos-3.18. I haven't dug in yet to see if / how it was replaced.

https://chromium.googlesource.com/chromiumos/third_party/kernel

drivers/net/ethernet/qualcomm/essedma

(Jeff requested to take a look)

As the commit message states, this was only tested on the RT-AC58U. The switch comes up and the very basic stuff should be working.

But be warned: DSA changes the way VLANs are set up. There is no more swconfig tool or ucidev_switch integration; instead, VLANs need to be set up with the standard Linux "ip" command,
for example:

# ip link add link wan name wan.100 type vlan id 100

This should open up the vlan with id 100 on the wan port.


A couple notes for anyone else deciding to pursue this path.

First, I'm moving to 4.19: I'm not going to be "done" with the EA8300 before the v19 branch, and I'll have to move to 4.19 anyway. Unless you like being on the "bleeding edge", this isn't a turn-key solution to VLAN configuration.

Rebase / merge of current work onto chunkeey's branch was as painless/painful as any rebase / merge.

While obvious in retrospect, changes in target/linux/ipq40xx/files-4.14/ and target/linux/ipq40xx/patches-4.14/ need to be applied to their -4.19 counterparts.

Initial builds had all kinds of strange problems with the Linux build wanting to recreate .config at first, with

*
* Restart config...
*

appearing in the verbose logs, even with make clean download world. Wiping build_dir/ resolved them.

Make sure you've got serial access to the device and a way of flashing images that doesn't require OpenWrt to have network connectivity (such as U-Boot TFTP) as, while my first build runs, it has no network connectivity from a "cold start".

As most of what I'll pursue next is potentially device-specific, I'll post progress on the EA8300 thread, unless I'm confident that it is general in nature.

On a "generic" note, linux-4.19.25/arch/arm/boot/dts/qcom-ipq4019.dtsi has some problems, as it defines label = "lan1" twice.
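A quick way to spot this kind of duplicate, sketched in Python: count every label = "..." string in a .dts/.dtsi file. This is a plain-text regex check, not a devicetree parser, so it knows nothing about node scope, comments or #include; treat any hits as leads to inspect by hand.

```python
import re
import sys
from collections import Counter

# Naive pattern for lines such as:   label = "lan1";
LABEL_RE = re.compile(r'^\s*label\s*=\s*"([^"]*)"\s*;', re.MULTILINE)


def duplicate_labels(dts_text: str) -> dict[str, int]:
    """Return {label: count} for every label string defined more than once."""
    counts = Counter(LABEL_RE.findall(dts_text))
    return {label: n for label, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Usage: python3 dup_labels.py path/to/qcom-ipq4019.dtsi [more files...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for label, n in sorted(duplicate_labels(f.read()).items()):
                print(f'{path}: label = "{label}" appears {n} times')
```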

Maybe you could try this branch.


The same device is used on the Compex WPJ428.

Is the device using double VLAN tags (tagged VLAN 1 and 2 from the devicetree)?
There is only one switch (QCA8072), with ports 4 and 5 connected to the external RJ45 sockets, so the LuCI switch web config doesn't always work on this system.

I have to set it in /etc/config/network and specify eth1.x or eth0.x even though they are both on the same switch. That makes it look like they are on separate switches (or on the same switch with two gmac / MDIO buses). Strange indeed.

I've gotten VLANs to work by manually adding the switch configs to /etc/config/network (part of an mwan3 config we were doing; the Compex only has two physical ports on the board, so I extended the ports via a managed switch):

config switch
   option name 'switch0'
   option reset '1'
   option enable_vlan '1'

config switch_vlan 'eth1_11'
   option device 'switch0'
   option vlan '11'
   option vid '11'
   option ports '0t 4t 5t'

config switch_vlan 'eth1_12'
   option device 'switch0'
   option vlan '12'
   option vid '12'
   option ports '0t 4t 5t'

config switch_vlan 'eth0_15'
   option device 'switch0'
   option vlan '15'
   option vid '15'
   option ports '0t 4t 5t'

From the devicetree, it looks like VLAN tags are set for "wan" and "lan":

&gmac0 {
	qcom,phy_mdio_addr = <4>;
	qcom,poll_required = <1>;
	qcom,forced_speed = <1000>;
	qcom,forced_duplex = <1>;
	vlan_tag = <2 0x20>;
};

&gmac1 {
	qcom,phy_mdio_addr = <3>;
	qcom,poll_required = <1>;
	qcom,forced_speed = <1000>;
	qcom,forced_duplex = <1>;
	vlan_tag = <1 0x10>;
};

but in 02_network they are called "eth0" and "eth1":

compex,wpj428
	ucidef_set_interface_lan "eth0 eth1"
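Applying the same bitmap reading to these two nodes suggests each gmac owns exactly one RJ45 port: gmac0 with default VLAN 2 on port 5, and gmac1 with default VLAN 1 on port 4. Below is a Python sketch of that decoding. The interpretation of vlan_tag is the one worked out earlier in this thread for the EA8300, not something confirmed for the WPJ428. If gmac0 and gmac1 enumerate as eth0 and eth1 (not verified here), that would explain why the ports have to be addressed as eth0.x and eth1.x even though they share one switch.

```python
# Decode the WPJ428 gmac vlan_tag cells quoted above as (default VLAN, dp_bmp).
# Assumption: bit 0 of dp_bmp is the CPU port and bit N is switch port N,
# the same reading used for the EA8300 devicetree earlier in the thread.
WPJ428_VLAN_TAGS = {
    "gmac0": (2, 0x20),
    "gmac1": (1, 0x10),
}


def decode_vlan_tag(vlan_tag: tuple[int, int]) -> tuple[int, list[int]]:
    """Return (default VLAN ID, non-CPU switch ports) for one vlan_tag pair."""
    default_vid, dp_bmp = vlan_tag
    ports = [p for p in range(1, dp_bmp.bit_length()) if (dp_bmp >> p) & 1]
    return default_vid, ports


if __name__ == "__main__":
    for gmac, tag in WPJ428_VLAN_TAGS.items():
        vid, ports = decode_vlan_tag(tag)
        print(f"{gmac}: default VLAN {vid}, switch port(s) {ports}")
```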

Hi...
I'm one of the developers who got support for the EA6350v3 working. By studying the OEM firmware, I found a way to revert back to stock, and now I want to get involved in this.

Again, by studying the OEM firmware I found this piece of text:

# switch configuration
#
# Defines how the interfaces (switches) are configured. This information is
# used by linkstate, linkmgr, and other processes to access the correct ports
# for information, and in case of Broadcom, this is also used to configure
# the switches.
#
# Each interface is numbered by mode.
# <mode> is currently "router" or "bridge"
#
# switch::<mode>_max - the number of interfaces defined for this mode
#
# switch::<mode>_#::<parameters> - where parameters are vendor dependent
#   Required:
#     ifname - the OS interface name
#     physical_ifname - the physical interface (may be the same as ifname)
#     port_numbers - list of ports that belong to this interface
#     port_names - list of human-readable names for each port
#     wan_monitor_port - the port to be monitored (but only if this is WAN)
#   Optional (vendor dependent):
#     vlan_num - if vlan is used, the vlan number
#     cpu_port_number - number designated as CPU port
#     driver_hw_name - name of the Ethernet device (according to Broadcom
#                      driver) for this interface
#     interface_suffix - can be empty, "*" (default), or "u" (untag)
#

# router mode
$switch::router_max=2

$switch::router_1::ifname=eth1
$switch::router_1::physical_ifname=eth1
$switch::router_1::port_numbers=1 2 3 4
$switch::router_1::port_names=port1 port2 port3 port4

$switch::router_2::ifname=eth0
$switch::router_2::physical_ifname=eth0
$switch::router_2::port_numbers=5
$switch::router_2::port_names=wan
$switch::router_2::wan_monitor_port=5

# bridge mode
$switch::bridge_max=1

$switch::bridge_1::ifname=eth1
$switch::bridge_1::physical_ifname=eth1
$switch::bridge_1::port_numbers=1 2 3 4 5
$switch::bridge_1::port_names=port1 port2 port3 port4 wan
$switch::bridge_1::wan_monitor_port=5
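To compare the OEM's per-mode port split with the devicetree's vlan_tag bitmaps, the $switch:: lines above can be read into a small structure. Here is a Python sketch. The grammar is guessed from this excerpt alone, OEM_EXCERPT is a trimmed copy of the router-mode lines, and parse_switch_config is a hypothetical helper, not anything from the Linksys firmware.

```python
# Minimal parser for the "$switch::<mode>_<n>::<parameter>=<value>" lines in
# the Linksys excerpt above. The format is inferred from that excerpt only.
OEM_EXCERPT = """\
# router mode
$switch::router_max=2
$switch::router_1::ifname=eth1
$switch::router_1::port_numbers=1 2 3 4
$switch::router_2::ifname=eth0
$switch::router_2::port_numbers=5
$switch::router_2::wan_monitor_port=5
"""

PREFIX = "$switch::"


def parse_switch_config(text: str) -> dict[str, dict]:
    """Return {mode: {"max": int, "interfaces": {index: {param: value}}}}."""
    modes: dict[str, dict] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX):
            continue  # comments and blank lines
        key, _, value = line[len(PREFIX):].partition("=")
        head, *rest = key.split("::")
        mode, _, suffix = head.rpartition("_")  # "router_1" -> ("router", "1")
        entry = modes.setdefault(mode, {"interfaces": {}})
        if not rest:
            entry[suffix] = int(value)  # e.g. router_max=2
            continue
        param = rest[0]
        iface = entry["interfaces"].setdefault(int(suffix), {})
        if param == "port_numbers":
            iface[param] = [int(p) for p in value.split()]
        elif param == "port_names":
            iface[param] = value.split()
        else:
            iface[param] = value
    return modes


if __name__ == "__main__":
    router = parse_switch_config(OEM_EXCERPT)["router"]
    for index, iface in sorted(router["interfaces"].items()):
        print(f"router_{index}: {iface['ifname']} -> ports {iface['port_numbers']}")
```

For the router-mode excerpt this gives eth1 on ports [1, 2, 3, 4] and eth0 on port [5].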

I'm modifying the firmware now: given that it currently works by adding port 5 in /etc/config/network, and given the Linksys fragment cited above, I think that

  1. exposing the CPU eth1 port, and
  2. exposing port 5 in the "wan" role

will fix the problem of configuring the switch in the LuCI UI.

If you have a Linksys EA6350v3, you may want to test the next build (v0.17) and provide me with feedback as I have no Ethernet devices to test (100% wireless ;).


So, after some investigation focused on LuCI and UCI rather than kernel or hardware stuff, I found some interesting things.

  1. Most 02_network configurations plainly don't work. I've destroyed my device's flash by trying so many configurations in 02_network, and almost none of them work out of the box (no WAN traffic at all).
  2. Applying the default switch settings (1: 4 3 2 1 0; 2: 5 0t) in UCI always fails. When you touch anything in vlan2 (even if you just reapply the default setting), everything stops working.

Some findings (I cannot test further because I don't use VLANs):

  • Setting up port 5 as WAN does not work; the router cannot talk to the WAN.
  • Setting up port 5 as CPU (eth1) does not break the normal WAN functionality.
  • Changing anything in vlan2, irrespective of the setting and of 02_network, breaks the normal WAN.
  • It looks like double tagging is required, both on CPU (eth0): 0 and on CPU (eth1): 5.

So, after some painful testing on my part, I would like some people who use VLANs to test LuCI.
To make LuCI (and the users) ignore vlan2, it's important to add a configuration where all ports are "off" in vlan2.

It "should work" if you don't touch vlan2 and if you tag both CPUs (eth0 and eth1). It shows two eth1 devices, but that is just to remind users to tag both (and also because eth1 is in fact ports 0 and 5 at the same time, according to my tests).


(Screenshot: LuCI switch configuration, columns CPU (eth0) | CPU (eth1) | CPU (eth1) | lan1 | lan2 | lan3 | lan4)

More images:

  1. Applying the configuration in UCI
  2. /etc/config/network
  3. swconfig

It should work. It's dirty and a real fix is needed, but if I'm not mistaken, you should be able to set up VLANs in LuCI.

VLAN 1 and VLAN 2 are used at the driver level for GMAC0 and GMAC1 (see above, or target/linux/ipq40xx/patches-4.19/711-dts-ipq4019-add-ethernet-essedma-node.patch).

In my experience, VLAN 1 and VLAN 2 are best avoided in any "custom" configuration of the switch.

Bridging the same VLAN across eth0 and eth1 has some "interesting" challenges as well, seemingly related to one MAC for a given IP address being needed for connections on the "WAN" port and another on the "LAN" ports.

Interestingly, the default configuration that swconfig shows on a fresh upstream install does not work at all when applied through LuCI or UCI.
Bridging doesn't happen because LuCI just ignores vlan2. However, if you do anything to vlan2 other than ignoring it, everything stops working: no traffic can flow from localhost or the LAN to the world.

Even when vlan2 is set from UCI as shown in swconfig, eth1 and the software VLAN eth1.2 show up in LuCI, and a bridge between them is created as br-wan.

The device gets an IP and an IPv6 address (gateway, DNS, you know, everything needed to establish a connection), but I cannot reach the Internet. However, I can reach my gateway's configuration webpage (the gateway is on wan/eth1).

I don't understand why. My theory is that traffic is being tagged as vlan2 and the ISP just ignores it, even though port 5 is marked as untagged.
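One way to test that theory is to capture what actually leaves port 5 (for example on a PC plugged in where the modem would be) and check whether the frames carry an 802.1Q tag. Here is a Python sketch of the check on raw frame bytes. Capturing the bytes is left to whatever tool you prefer, and the sketch only recognizes a single outer tag with TPID 0x8100.

```python
import struct
from typing import Optional

ETH_P_8021Q = 0x8100  # 802.1Q tag protocol identifier (TPID)


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the outer 802.1Q VLAN ID of a raw Ethernet frame, or None."""
    if len(frame) < 16:  # need dst+src MAC (12) + TPID (2) + TCI (2)
        return None
    (tpid,) = struct.unpack_from("!H", frame, 12)  # right after the two MACs
    if tpid != ETH_P_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF  # low 12 bits of the TCI carry the VLAN ID


if __name__ == "__main__":
    # Hand-built frame: zeroed MACs, 802.1Q tag (VID 2), then the IPv4 EtherType.
    frame = bytes(12) + bytes.fromhex("8100 0002 0800") + bytes(46)
    print(vlan_id(frame))
```

If frames captured on the WAN side come back with VLAN ID 2 rather than None, the ISP would indeed be seeing tagged traffic.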

@Deoptim recently posted a bunch of datasheets (QCA9563 Datasheet), also available directly here: https://github.com/Deoptim/atheros. The IPQ40xx datasheets might shed some light on the "darkness of the builtin switch".

Cheers,
Thomas

Well, I don't really want to do hardware stuff (even though I'm an electronics engineer), but testing seems to be heading in a good direction. Call these empirical results.

After applying the kernel patches and modifying /etc/board.d/02_network, it works out of the box in my configuration (my ISP does not use or even support VLANs), and I've heard from a tester (admittedly, more testing is needed) that the "hacky version" works.

I have a prebuilt image here for the EA6350v3 that applies the never-included kernel patches from @chunkeey and modifies 02_network to reflect the changes shown here.

The result looks like the correct configuration for OpenWrt (the OEM firmware aside; I studied it and it's bull****). I haven't modified the wan/wan6 interfaces or the switch. This is what we get with the changes, and it works out of the box (as opposed to hacking the configuration as in the previous attempts).



Sources?

If it's the DSA patches, they don't support VLANs, at least last I checked, even in 5.x. There's also an old patch, which I've referenced, that removes the code that sets a default VLAN tag.

Sources seen at https://github.com/NoTengoBattery/openwrt/commit/3433c6308d59bc84e80a03e7a89a4a89ce646aa0

which appears to be the removal of the default VLAN tagging.

Apologies for not getting back to you on this sooner. What is going on, as I understand it, is that there is a proprietary "Atheros header"[1] on the RGMII that indicates, among other things, which switch port the packet arrived on. At least as I read the driver code, if the packet is untagged, the driver tags it as VLAN 1 or VLAN 2.

Code references:

  • patches-4.19/710-net-add-qualcomm-essedma-ethernet-driver.patch
  • patches-4.19/711-dts-ipq4019-add-ethernet-essedma-node.patch

[1] See, for example, section 3.6 of the QCA8337N data sheet.
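Here is a Python sketch of the receive-side behavior described above: read the source port and "already tagged" bit from a 2-byte QCA receive header, then push the default VLAN of the owning gmac only if the frame is untagged. The bit layout (version in bits 15:14, tagged flag in bit 3, source port in bits 2:0) follows Linux's net/dsa/tag_qca.c for the QCA8337. Whether essedma on IPQ40xx sees the same layout, or gets this information from its receive descriptors instead, is an assumption. Treating a default VLAN of 0 as "no tag" models the vlan_tag = <0 ...> idea earlier in the thread; it is not verified driver behavior.

```python
from typing import Optional

# Field layout as in Linux's net/dsa/tag_qca.c (QCA8337 receive header).
QCA_HDR_VERSION = 0x2
QCA_HDR_RECV_FRAME_IS_TAGGED = 1 << 3
QCA_HDR_RECV_SOURCE_PORT_MASK = 0x7


def parse_qca_rx_header(hdr: int) -> tuple[int, bool]:
    """Return (source_port, frame_is_tagged) from a 16-bit receive header."""
    version = (hdr >> 14) & 0x3
    if version != QCA_HDR_VERSION:
        raise ValueError(f"unexpected header version {version}")
    port = hdr & QCA_HDR_RECV_SOURCE_PORT_MASK
    return port, bool(hdr & QCA_HDR_RECV_FRAME_IS_TAGGED)


def vlan_to_add(hdr: int, default_vlan_by_port: dict[int, int]) -> Optional[int]:
    """VLAN ID a driver following this model would push, or None for no tag."""
    port, is_tagged = parse_qca_rx_header(hdr)
    default_vid = default_vlan_by_port.get(port, 0)
    if is_tagged or default_vid == 0:
        # Already tagged, or default VLAN 0: leave the frame as it is.
        return None
    return default_vid


if __name__ == "__main__":
    # Hypothetical EA8300-style map: LAN ports 1-4 -> VLAN 1, WAN port 5 -> VLAN 2.
    ports = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2}
    print(vlan_to_add(0x8005, ports))  # untagged frame from port 5
    print(vlan_to_add(0x800B, ports))  # already-tagged frame from port 3
```

The two calls print 2 and None: an untagged WAN frame picks up VLAN 2, while a frame that arrives tagged is left alone.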

Dear Jeff,
You can check the source code yourself.


Hi @NoTengoBattery,
Thanks for the work.
I'm not 100% clear, though: should VLANs be working with your firmware?

I tried setting up one trunked port with three VLANs and the rest untagged (I need the trunk for guest wireless and another wireless network I want to isolate), and I keep getting errors that traffic on the untagged ports is arriving at eth0, which does not have an address (it should be arriving at eth0.1).