RFC: Represent IPQ40xx ess-edma MAC as single interface

Hey all,

I've just started developing support for some Qualcomm IPQ4019 devices, and thereby also using them.
During development I stumbled upon the strange VLAN and network interface configuration required for IPQ40xx devices, which has also been discussed here in multiple threads before.
It struck me as rather odd, and after doing a deep dive through the ess-edma driver I'd like to propose a set of conceptual changes to that driver.
But let me first describe how it is currently set up, just so that we are all on the same page:

Preface

The IPQ4019 integrates a single MAC and an internal VLAN-capable switch with six ports. One of the ports is connected to the MAC.
Notably the remaining five ports are not exposed via five individual physical interfaces. The remaining ports are in fact exposed via what Qualcomm calls PSGMII. PSGMII stands for Penta Serial Gigabit Media Independent Interface. As the name suggests it is an interface for connecting (up to) five PHYs via a single serial link. There are special PHY chips offered by Qualcomm that have a single PSGMII interface and up to five gigabit media dependent Ethernet interfaces. The important takeaway is that this external chip is just a PHY. It is not a switch. Thus all packets received by the PHY are sent to the VLAN-capable switch inside the IPQ40xx and right back out via PSGMII again if they need to be forwarded to any other switch ports.

ESS-EDMA

If you have used an IPQ40xx device before you will be aware that there are usually two network interfaces, eth0 and eth1. But as previously stated there is only one MAC, which is hardwired to the integrated switch of the IPQ40xx. This is where the architecture of the ess-edma driver comes in. The two interfaces are not a hardware feature but a software emulation feature of the ess-edma driver. In fact the number of interfaces created by the ess-edma driver can be configured through the device tree. In the device tree each interface is associated with two parameters: a default VLAN ID and a port bitmap. These two values influence packet reception and transmission.

Packet TX

Whenever a packet is transmitted through one of the interfaces created by the ess-edma driver, its VLAN tag is inspected by the driver. If the packet is untagged, the driver tags it with the default VLAN ID configured for this interface. If the packet is already tagged, the VLAN tag is left untouched. Next the packet is also tagged with the port bitmap configured for the interface. (I think this limits the set of switch ports this packet can be forwarded to?) After that the packet is enqueued for transmission to the internal switch.
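
To make the TX steps above concrete, here is a toy model in Python. It is not driver code: the names `Iface`, `Frame` and `edma_tx`, as well as the VLAN ID and bitmap values, are made up for illustration and only mirror my reading of the behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Iface:
    """Hypothetical stand-in for one ess-edma network interface."""
    name: str
    default_vid: int   # default VLAN ID from the device tree
    port_bitmap: int   # switch port bitmap from the device tree


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame, reduced to the fields this sketch needs."""
    vid: Optional[int] = None          # None means untagged
    port_bitmap: Optional[int] = None  # attached by the driver on TX


def edma_tx(iface: Iface, frame: Frame) -> Frame:
    """Model of the TX steps: default-tag untagged frames, attach the bitmap."""
    vid = frame.vid if frame.vid is not None else iface.default_vid
    return Frame(vid=vid, port_bitmap=iface.port_bitmap)


# Illustrative values only, not taken from a real device tree.
eth0 = Iface("eth0", default_vid=1, port_bitmap=0b011110)

untagged = edma_tx(eth0, Frame())       # untagged in, default VLAN 1 out
tagged = edma_tx(eth0, Frame(vid=3))    # existing tag is left untouched
print(untagged)
print(tagged)
```

Note that in this model no frame ever leaves `edma_tx` without a VLAN ID, which is the root of the default VLAN problems described further down.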

Packet RX

When a packet is received on a switch port, the switch tags it with the ID of the port the packet was received on. If the switch forwards the packet to the CPU port, the ess-edma driver inspects the port ID tag. It then searches for a network interface where BIT(port ID) is set in the associated port bitmap. That is how ess-edma identifies the network interface a packet should be received on. Additionally the ess-edma driver inspects the VLAN tag of the packet. If the VLAN tag matches the default VLAN ID for that network interface, it drops the VLAN tag before passing the packet to the network stack. Else the VLAN tag is left in place.
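
The RX lookup can be sketched the same way. Again this is a toy Python model under assumptions, not the actual driver: the `IFACES` table, its values and the function name `edma_rx` are hypothetical, and what the real driver does when no interface claims a port is not covered.

```python
from typing import Optional, Tuple

# Hypothetical interface table: name -> (default VLAN ID, port bitmap).
IFACES = {
    "eth0": (1, 0b011110),  # switch ports 1-4
    "eth1": (2, 0b100000),  # switch port 5
}


def edma_rx(port_id: int, vid: Optional[int]) -> Tuple[Optional[str], Optional[int]]:
    """Return (interface name, VLAN ID as seen by the network stack).

    A VLAN ID of None means the frame is delivered untagged. An interface
    name of None means no interface in the table claims the port.
    """
    for name, (default_vid, bitmap) in IFACES.items():
        if bitmap & (1 << port_id):   # BIT(port id) set in the bitmap?
            if vid == default_vid:    # default VLAN tag is stripped
                return name, None
            return name, vid          # any other tag is left in place
    return None, vid


print(edma_rx(port_id=2, vid=1))  # ('eth0', None)
print(edma_rx(port_id=2, vid=3))  # ('eth0', 3)
print(edma_rx(port_id=5, vid=3))  # ('eth1', 3)
```

The last two calls carry the same VLAN ID but end up on different interfaces, purely because of the ingress port.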

Problems

The procedures described above create multiple problems:

VLAN IDs

In the ess-edma driver every network interface must be associated with a default VLAN ID. Since a VLAN tag for this VLAN ID will be automatically added to and removed from packets it renders the VLAN ID unusable in the network external to the IPQ40xx based device if that network uses multiple VLANs.

Example

Let's explore a simple example setup to illustrate this issue:
Let's assume the IPQ40xx (from here on called "router") has the ess-edma driver configured to set up two interfaces. eth0 has default VLAN ID 1, while eth1 has default VLAN ID 2. The specific port bitmap used on those interfaces is not relevant to this example.
The network the router is supposed to be used in uses VLANs. Currently VLANs 1, 2, and 3 are in use there.
To support reception of data for all those VLANs the CPU port of the switch in the router is set up as a tagged member of all those VLANs. Additionally one of the switch ports in eth0's port bitmap is also set up as a member of all those VLANs. This switch port is then connected to the rest of the network.
On the router, VLAN interfaces need to be set up to handle the VLAN traffic. For VLAN ID 3 this is straightforward: adding a VLAN interface with ID 3 to eth0 as eth0.3 does the trick.
VLAN ID 2 is a bit more complicated. Since it is configured as the default VLAN ID of eth1 the ports from eth1's port bitmap must be included in that VLAN on the switch, too. Else reception of packets on these ports becomes impossible. Depending on the exact setup this forces bridging of ports that are not intended to be bridged. Thus configuring VLAN ID 2 is impossible without side effects on switch level alone.
For VLAN ID 1 it is also a bit more complicated. Here the traffic must be sent either tagged or untagged on eth0 and always received untagged since the ess-edma driver will automagically add and remove the VLAN tag because it matches eth0's default VLAN ID. Note that it is not possible to send untagged traffic from eth0 to the switch. Neither is it possible to distinguish between untagged packets and packets tagged with the default VLAN ID received on eth0.

As you can see VLAN configuration becomes a lot more complicated with the current ess-edma network device model and may even be impossible in certain configurations.
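
The VLAN ID 1 case from the example can be checked with a few lines of Python. This is only a sketch of the tagging rules as I described them above, with hypothetical helper names `tx` and `rx`; it does not touch any real driver code.

```python
from typing import Optional

DEFAULT_VID = 1  # eth0's default VLAN ID in the example above


def tx(vid: Optional[int]) -> int:
    """eth0 TX: untagged frames (None) get the default VLAN tag."""
    return DEFAULT_VID if vid is None else vid


def rx(vid: Optional[int]) -> Optional[int]:
    """eth0 RX: the default VLAN tag is stripped, other tags are kept."""
    return None if vid == DEFAULT_VID else vid


# Every possible TX input (untagged or VLAN 1..4094) ends up tagged.
wire_vids = {tx(v) for v in [None, *range(1, 4095)]}
print(None in wire_vids)   # False: untagged frames never reach the switch

# Untagged and VLAN-1-tagged frames are indistinguishable after RX ...
print(rx(None) == rx(1))   # True
# ... and after TX.
print(tx(None) == tx(1))   # True
```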

Inflexible port use

The fact that every interface created by ess-edma has an associated port bitmap can cause packets on the same VLAN to arrive on different interfaces, depending on the switch port they were received on. This can complicate configuration even further, e.g. requiring multiple netfilter rules where otherwise only one would be required.

Superfluous configuration in DTS

A less pragmatic problem is the fact that VLANs should not be part of the hardware description. VLANs are inherently very flexible and should be fully software configurable. Thus I'd argue that the whole concept of configuring VLANs and port bitmaps in the dts is not a good idea.

Summary

In summary the current network device model presented by ess-edma is unintuitive, complicated to configure and encourages bad dts style.
The only upside over the usual single interface representation of MACs connected to a switch is that it provides a method for propagating port link state to interface link state via association of Ethernet PHYs with network interfaces created by ess-edma. Since most other OpenWrt targets don't offer this feature on switches either I consider it neither a big problem nor one that should be solved by ess-edma.

Proposal

I'd like to drop the support for multiple interfaces in ess-edma. This would make IPQ40xx behave like most other SoCs supported by OpenWrt. No more default VLANs and port bitmaps. I suspect this change would also help greatly if we want to mainline the driver. As far as I can tell the change should not have any ill effects other than necessitating a configuration change for existing devices. While this is of course not desirable, I'd like to propose the following options going forward:

Option 1: Breakage

The first option would be to just make it a breaking change, changing behaviour of newly introduced and old devices in the ipq40xx target. This would of course be the simplest option.

Option 2: Keep behaviour for old devices

The second option would be to add a software switch (maybe depending on dts) that can enable either the old multi-interface legacy behaviour or the new single-interface behaviour.

Option 3: Breakage + DSA

Yet another option would be to combine Option 1 with a switchover to DSA. Since switching to DSA will probably break all old configurations anyway we might as well combine the two. I'm not sure if I will have the necessary spare time to pull this one off but I would at the very least like to consider it.

Disclaimer

I've only been working with IPQ40xx for about one and a half days now. It is entirely possible I did not understand some of the inner workings of the SoC correctly. Please let me know if I did get something wrong.
The above text describes only the situation with multi-port PSGMII PHYs. I'm pretty sure my proposed changes would also work for single-PHY setups. But again, it is entirely possible I did miss something.

Has this been discussed before? What are your thoughts on this topic?

Cheers,
Tobias

The only correct approach is a proper Ethernet controller driver + DSA.
There has been an attempt, though it's not working properly.

Hi @robimarko,

Are these the latest attempts?

Cheers

Internally we are still working on it, so there is newer stuff.
I am working on getting everyone to prepare and release the current version.

Hi Tobias,
may I draw your attention to this post?

The OP also has a github repository with some patches.

Cheers,
Thomas

So as far as I know, my fixes make VLANs act as expected. The DTS gets simplified, but the configuration can't be removed from it entirely without an even deeper change to the driver that I would not work on alone.

It does have breaking changes and I am unable to test all the devices required to send a proper patch upstream. I add more devices as users willing to test request them.

Long story short: it works, but needs more extensive testing. At least one device for each shared configuration.

In my case, I have poor-country problems, which means I have a constrained budget, and even with money, tech isn't sold here (so I need to pay customs, shipping expenses, intermediaries and wait weeks for stuff to come). I can't just buy each device and make it happen.

So if we can collect at least one user from each missing config, we can actually send this to OpenWrt with a solid argument.

Yes, it is true.
I'm using your patch on a hAP ac2. Grateful for the single eth interface / free VLAN assignment.
Thank you.

This "feature" bit me really hard this summer. You can read about my two-month struggle here: Issues with multiple networks / VLANs with a router and a dumb AP - #20 by paraskevas

tl;dr: I have an ASUS RT-AC58U configured as a dumb AP that mirrors packets back, and I suspect it's due to how its ports work. The two issues I faced were DHCP working erratically (if at all for some devices) and the main router's syslog getting flooded. Performance was also really degraded. My solution was to move the LAN cable coming from the main router from WAN to LAN1. This fixed everything (and my sleeping schedule). I'm still not using its WAN port and I'm going to try @NoTengoBattery's firmware when I have the time.

Regarding this topic, since I've tried DSA on another router and since (I believe) this is the way forward, I find Option 3 the best. A lot of extremely popular routers broke when upgraded to 21.02, so I don't think breakage is an issue with this platform. It's less popular and this interface issue has been known for quite a while. Also, this kind of config breakage is supported by OpenWrt internally, so no work needs to be done there. Users will be prompted before flashing and can choose to either upgrade or stay at the old version.

BTW, I am working on DSA currently:

So in order to support my device and check, all I have to do is make a similar change to this and build your branch?

Yes, it should be straightforward for most devices, as the current network config tells you how many ports there are as well as how they are connected.
