Help me understand VLAN tagging performance penalty

The first time I experienced performance issues from adding a VLAN to an interface was on a Netgear R7800: wired-to-wired speed was capped at around 80 MB per second after I added a VLAN tag to an otherwise untagged port so that one cable could carry two subnets.

As I'm using a lot more VLANs these days on my Proxmox server, which hosts OpenWrt, and I'm planning for a future 10 Gb (or faster) home network, I'd really like to get to the root of the performance issue.

My understanding is that most modern NICs have hardware-accelerated VLAN tagging (VLAN offload), and I'm certainly NOT having any issues with my x86 hardware. So why did that issue happen on the R7800 in the first place, given that the R7800 undoubtedly uses VLANs and a managed switch chip to provide its five ports? Is this the same kind of performance penalty as software NAT?
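For what it's worth, on the x86 side I believe you can check whether the NIC driver actually advertises VLAN offload with ethtool; a rough sketch, where `eth0` is just a placeholder for the real interface name:

```
# List the NIC's offload features and filter for VLAN-related ones.
# "eth0" is a placeholder; substitute your actual interface.
ethtool -k eth0 | grep -i vlan
# On a NIC with VLAN offload enabled you'd typically see lines like:
#   rx-vlan-offload: on
#   tx-vlan-offload: on
```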

I'm hoping that with this question answered I'll be able to make plans for my next-level home network. I know about, and have used, a managed switch to offload the tagging process, but it would be more helpful to understand the knowledge behind it. Thanks.

Are you measuring inter-VLAN routing throughput between different subnets, or the throughput of a single VLAN between two different switch ports (one tagged, one untagged)?

I personally do not see anything like this limit on my very similar nbg6817: all LAN clients connect via a single trunk port on my router to a managed switch and diversify from there (and even my WAN speed exceeds 80 MBit/s many times over, so I would notice).


Thank you for bringing that up, I'm also looking for more probable causes.

When I had that issue with the R7800, I had set up a hybrid port on one of the "LAN" ports, using "unmanaged" as the protocol, to bridge the untagged and tagged interfaces to two SSIDs. When accessing it wirelessly, I assumed the 80 MB/s ceiling was just down to the wireless capabilities. Then one day I accessed it from another LAN port that's untagged and hit the same limit. The whole time I was, I think, just using the R7800 as a managed switch without getting into Layer 3 territory, which led me to suspect some deficiency in the process.
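For context, the hybrid port would have looked roughly like this in `/etc/config/network` on swconfig-era OpenWrt; a sketch with illustrative VLAN and port numbers, not my exact config (port 6 is one of the CPU ports on the R7800's internal switch):

```
# /etc/config/network (excerpt) -- illustrative swconfig sketch.

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

# VLAN 1: untagged on LAN ports 1-4, tagged towards the CPU port
config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '1 2 3 4 6t'

# VLAN 10: tagged on port 4 as well, making port 4 the hybrid port
config switch_vlan
	option device 'switch0'
	option vlan '10'
	option ports '4t 6t'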

Whether that data ever flowed through the switch port to the CPU and back, or was handled inside the managed switch without ever reaching the CPU, I don't know; probably the former, since there was some high CPU load.

But I do hope to also get your opinion on the other matter: cross-subnet routing. I use multiple subnets, and the CPU in my x86 machine should be capable of routing at least 10 Gbps, but I've noticed that cross-subnet traffic does raise CPU usage. What would be the proper way to handle future whole-home 10 Gb (or faster) cross-subnet routing? Is the cross-subnet routing itself causing the CPU usage, or something else? Is a Layer 3 switch the answer for that? Thank you.
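(Side note: I understand OpenWrt exposes flow offloading in `/etc/config/firewall`, which is supposed to short-cut the kernel's netfilter path for established routed flows and reduce the per-packet CPU cost; a sketch, not my actual config, and the hardware variant only works with supported NIC/switch drivers:)

```
# /etc/config/firewall (excerpt) -- illustrative sketch.
config defaults
	# Software flow offloading: established flows bypass most of netfilter.
	option flow_offloading '1'
	# Hardware flow offloading: only effective with supported drivers.
	option flow_offloading_hw '1'
```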

In short, re: L3 switch, yes... personally, I found the whole Cisco 'core distribution edge' philosophy significantly helpful when it comes to understanding overall network design/performance...

you may wish to read some of their docs regarding this...

Just a quick search, and I think it's similar to what I do with my OpenWrt setup, albeit at a different scale. I guess cost savings, performance, and functionality can't all be achieved at once.


what happens on small networks is that all three layers get compressed into one... and it's more the 'logical philosophy' that is beneficial...

in particular tho', the 'core' gets compressed with 'distribution'... it is this distribution layer where all the gains are to be had... (and what your question truly relates to)

That's really interesting to know, I'll do more reading into that solution, thank you!


fundamentally this is true... what the above does is help you to isolate what-is-what and target the area that needs it...