I've just started developing support for some Qualcomm IPQ4019 devices, and thereby also started using them.
During development I stumbled upon the strange VLAN and network interface configuration required for IPQ40xx devices, which has also been discussed here in multiple threads before.
It struck me as rather odd and after doing a deep dive through the ess-edma driver I'd like to propose a set of conceptual changes to that driver.
But let me first describe how it is currently set up, just so that we are all on the same page:
The IPQ4019 integrates a single MAC and an internal VLAN-capable switch with six ports. One of the ports is connected to the MAC.
Notably the remaining five ports are not exposed via five individual physical interfaces. The remaining ports are in fact exposed via what Qualcomm calls PSGMII. PSGMII stands for Penta Serial Gigabit Media Independent Interface. As the name suggests it is an interface for connecting (up to) five PHYs via a single serial link. There are special PHY chips offered by Qualcomm that have a single PSGMII interface and up to five gigabit media dependent Ethernet interfaces. The important takeaway is that this external chip is just a PHY. It is not a switch. Thus all packets received by the PHY are sent to the VLAN-capable switch inside the IPQ40xx and right back out via PSGMII again if they need to be forwarded to any other switch ports.
If you have used an IPQ40xx device before you will be aware that there are usually two network interfaces, eth0 and eth1. But as previously stated there is only one MAC, which is hardwired to the integrated switch of the IPQ40xx. This is where the architecture of the ess-edma driver comes in. The two interfaces are not a hardware feature but a software emulation feature of the ess-edma driver. In fact the number of interfaces created by the ess-edma driver can be configured through the device tree. In the device tree each interface is associated with two parameters: a default VLAN ID and a port bitmap. These two values influence packet reception and transmission.
Whenever a packet is transmitted through one of the interfaces created by the ess-edma driver, its VLAN tag is inspected by the driver. If the packet is untagged, the driver tags it with the default VLAN ID configured for this interface. If the packet is already tagged, the VLAN tag is left untouched. Next the packet is also tagged with the port bitmap configured for the interface. (I think this limits the set of switch ports this packet can be forwarded to?) After that the packet is enqueued for transmission to the internal switch.
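To make the transmit path easier to follow, here is a minimal Python model of it. This is only a sketch of my description above, not the driver code: the names `EdmaIface`, `TxFrame` and `edma_tx` are made up for illustration, and the port bitmap value is a hypothetical example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdmaIface:
    """One emulated interface; both values come from the device tree."""
    name: str
    default_vid: int   # default VLAN ID of this interface
    port_bitmap: int   # switch ports associated with this interface


@dataclass(frozen=True)
class TxFrame:
    """Simplified view of what is handed to the internal switch."""
    vid: int
    port_bitmap: int


def edma_tx(iface: EdmaIface, vid: Optional[int]) -> TxFrame:
    """Model the TX path; vid is None for an untagged packet."""
    if vid is None:
        # Untagged packets get the interface's default VLAN ID.
        vid = iface.default_vid
    # Already tagged packets keep their tag. Either way the
    # interface's port bitmap is attached to the packet.
    return TxFrame(vid=vid, port_bitmap=iface.port_bitmap)


# Hypothetical example values, not taken from any real dts.
eth0 = EdmaIface("eth0", default_vid=1, port_bitmap=0b011110)
untagged = edma_tx(eth0, None)  # leaves the driver tagged with VLAN 1
tagged = edma_tx(eth0, 3)       # keeps VLAN 3
```

Note that in this model no packet ever reaches the switch without a VLAN tag.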
When a packet is received on a switch port, the switch tags it with the id of the port the packet was received on. If the switch forwards the packet to the CPU port, the ess-edma driver inspects the port id tag. It then searches for a network interface where BIT(port id) is set in the associated port bitmap. That is how ess-edma identifies the network interface a packet should be received on. Additionally the ess-edma driver inspects the VLAN tag of the packet. If the VLAN tag matches the default VLAN ID for that network interface, it drops the VLAN tag before passing the packet to the network stack. Else the VLAN tag is left in place.
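The receive path can be modelled in the same way. Again this is just a sketch of my understanding, not the driver code: `EdmaIface` and `edma_rx` are illustrative names, the bitmaps are hypothetical, and returning `None` when no interface matches is my assumption since I have not described what the driver does in that case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class EdmaIface:
    """One emulated interface; both values come from the device tree."""
    name: str
    default_vid: int
    port_bitmap: int


def edma_rx(ifaces: Sequence[EdmaIface], src_port: int,
            vid: Optional[int]) -> Optional[Tuple[str, Optional[int]]]:
    """Model the RX path.

    Returns (interface name, VLAN ID seen by the network stack),
    where a VLAN ID of None means the packet is delivered untagged.
    """
    for iface in ifaces:
        # Find the interface with BIT(src_port) set in its bitmap.
        if iface.port_bitmap & (1 << src_port):
            if vid == iface.default_vid:
                vid = None  # default VLAN tag is stripped
            return iface.name, vid
    return None  # assumption: no matching interface, packet is dropped


# Hypothetical example configuration.
IFACES = [
    EdmaIface("eth0", default_vid=1, port_bitmap=0b011110),
    EdmaIface("eth1", default_vid=2, port_bitmap=0b100000),
]

on_eth0 = edma_rx(IFACES, src_port=1, vid=1)  # eth0, tag stripped
on_eth1 = edma_rx(IFACES, src_port=5, vid=1)  # eth1, tag kept
```

The last two calls show that the same VLAN ends up on different interfaces, once untagged and once tagged, depending only on the switch port the packet came in on.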
The procedures described above create multiple problems:
In the ess-edma driver every network interface must be associated with a default VLAN ID. Since a VLAN tag for this VLAN ID will be automatically added to and removed from packets it renders the VLAN ID unusable in the network external to the IPQ40xx based device if that network uses multiple VLANs.
Let's explore a simple example setup to illustrate this issue:
Let's assume the IPQ40xx (from here on called "router") has the ess-edma driver configured to set up two interfaces. eth0 has default VLAN ID 1, while eth1 has default VLAN ID 2. The specific port bitmap used on those interfaces is not relevant to this example.
The network the router is supposed to be used in uses VLANs. Currently VLANs 1, 2, and 3 are in use there.
To support reception of data for all those VLANs the CPU port of the switch in the router is set up as a tagged member of all those VLANs. Additionally one of the switch ports in eth0's port bitmap is also set up as a member of all those VLANs. This switch port is then connected to the rest of the network.
On the router, VLAN interfaces need to be set up to handle the VLAN traffic. For VLAN ID 3 this is straightforward: adding a VLAN interface with id 3 to eth0 as eth0.3 does the trick.
VLAN ID 2 is a bit more complicated. Since it is configured as the default VLAN ID of eth1, the ports from eth1's port bitmap must be included in that VLAN on the switch, too. Else reception of packets on these ports becomes impossible. Depending on the exact setup this forces bridging of ports that are not intended to be bridged. Thus VLAN ID 2 cannot be configured on switch level alone without side effects.
For VLAN ID 1 it is also a bit more complicated. Here the traffic must be sent either tagged or untagged on eth0 and always received untagged since the ess-edma driver will automagically add and remove the VLAN tag because it matches eth0's default VLAN ID. Note that it is not possible to send untagged traffic from eth0 to the switch. Neither is it possible to distinguish between untagged packets and packets tagged with the default VLAN ID received on eth0.
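The VLAN ID 1 case boils down to two lossy mappings, which a few lines of Python can illustrate. This is a sketch under the assumptions of the example above (eth0 with default VLAN ID 1); the function names are made up and `None` stands for "untagged".

```python
from typing import Optional

DEFAULT_VID = 1  # eth0's default VLAN ID in the example


def tx_vid_to_switch(vid: Optional[int]) -> int:
    """VLAN ID the switch sees for a packet sent on eth0."""
    return DEFAULT_VID if vid is None else vid


def rx_vid_to_stack(vid: Optional[int]) -> Optional[int]:
    """VLAN ID the network stack sees for a packet received on eth0."""
    return None if vid == DEFAULT_VID else vid


# Map each possible tagging state through both directions.
tx_results = {sent: tx_vid_to_switch(sent) for sent in (None, 1, 3)}
rx_results = {got: rx_vid_to_stack(got) for got in (None, 1, 3)}

# Untagged and VLAN 1 collapse into the same value in each direction.
tx_collision = tx_results[None] == tx_results[1]
rx_collision = rx_results[None] == rx_results[1]
```

Both `tx_collision` and `rx_collision` are true in this model, which is exactly the loss of information described above.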
As you can see VLAN configuration becomes a lot more complicated with the current ess-edma network device model and may even be impossible in certain configurations.
The fact that every interface created by ess-edma has an associated port bitmap can cause packets on the same VLAN to arrive on different interfaces, depending on the switch port they were received on. This can complicate configuration even further, e.g. requiring multiple netfilter rules where otherwise only one would be required.
A less pragmatic problem is the fact that VLANs should not be part of hardware description. VLANs are inherently very flexible and should be fully software configurable. Thus I'd argue that the whole concept of configuring VLANs and port bitmaps in the dts is not a good idea.
In summary the current network device model presented by ess-edma is unintuitive, complicated to configure and encourages bad dts style.
The only upside over the usual single interface representation of MACs connected to a switch is that it provides a method for propagating port link state to interface link state via association of Ethernet PHYs with network interfaces created by ess-edma. Since most other OpenWrt targets don't offer this feature on switches either I consider it neither a big problem nor one that should be solved by ess-edma.
I'd like to drop the support for multiple interfaces in ess-edma. This would make IPQ40xx behave like most other SoCs supported by OpenWrt. No more default VLANs and port bitmaps. I suspect this change would also help greatly if we want to mainline the driver. As far as I can tell the change should not have any ill effects other than necessitating a configuration change for existing devices. While this is of course not desirable, I'd like to propose the following options going forward:
The first option would be to just make it a breaking change, changing behaviour of newly introduced and old devices in the ipq40xx target. This would of course be the simplest option.
The second option would be to add a software switch (maybe depending on dts) that can enable either the old multi-interface legacy behaviour or the new single-interface behaviour.
Yet another option would be to combine Option 1 with a switchover to DSA. Since switching to DSA will probably break all old configurations anyway we might as well combine the two. I'm not sure if I will have the necessary spare time to pull this one off but I would at the very least like to consider it.
I've only been working with IPQ40xx for about one and a half days now. It is entirely possible I did not understand some of the inner workings of the SoC correctly. Please let me know if I did get something wrong.
The text above describes only the situation with multi-port PSGMII PHYs. I'm pretty sure my proposed changes would also work for single-PHY setups. But again, it is entirely possible I did miss something.
Has this been discussed before? What are your thoughts on this topic?