I am running a recent build of OpenWrt from the master branch on a TP-Link Archer AX23, connected to a TP-Link "smart" switch, the TL-SG116E.
Since I configured a trunk connection between the two devices, I have been seeing the RxBadPkt counter on the switch increase on the trunk port. (The trunk port carries both untagged and tagged frames, for the "default" lan and the "separated" vlan respectively.)
After some googling I found the following topics on the TP-Link forums:
which, if I understood correctly, basically say that the RxBadPkt counter increase is a false positive. The hardware miscounts good tagged frames as bad because it is hardwired to check frames against a minimum size: it expects untagged frames to be at least 60 bytes, but assumes that the minimum size of tagged frames is 64 bytes (60 + 4-byte tag).
The response on the forum is not very clear about whether the 64 byte limit for tagged frames is actually a real requirement or just a bad implementation of the standard.
My hypothesis, though, is that my OpenWrt router pads small payloads to 60 bytes even for tagged frames, and that this triggers the bad packet counter on the switch. One way to verify this would be to set OpenWrt to pad frames to 64 bytes regardless of whether they are tagged or untagged. This should satisfy the switch and keep the RxBadPkt counter clean.
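The size arithmetic behind this hypothesis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_frame` and `switch_counts_as_bad` are hypothetical helpers, the MAC addresses are made up, lengths exclude the 4-byte FCS, and the switch check is modeled on the forum description rather than on any documented TP-Link behavior.

```python
import struct

ETH_HEADER_LEN = 14      # dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_TAG_LEN = 4         # TPID 0x8100 (2) + TCI (2)
MIN_FRAME_NO_FCS = 60    # 64-byte Ethernet minimum minus the 4-byte FCS


def build_frame(payload, vlan_id=None, pad_to=MIN_FRAME_NO_FCS):
    """Build an Ethernet frame (without FCS) and zero-pad it to pad_to bytes."""
    dst = bytes.fromhex("ffffffffffff")   # broadcast, illustrative only
    src = bytes.fromhex("020000000001")   # made-up locally administered MAC
    frame = dst + src
    if vlan_id is not None:
        # 802.1Q tag: TPID 0x8100 followed by the TCI (priority 0, VLAN ID)
        frame += struct.pack("!HH", 0x8100, vlan_id & 0x0FFF)
    frame += struct.pack("!H", 0x0800) + payload
    if len(frame) < pad_to:
        frame += bytes(pad_to - len(frame))
    return frame


def switch_counts_as_bad(frame):
    """Assumed switch check: tagged frames must be at least 60 + 4 bytes."""
    tagged = frame[12:14] == b"\x81\x00"
    minimum = MIN_FRAME_NO_FCS + VLAN_TAG_LEN if tagged else MIN_FRAME_NO_FCS
    return len(frame) < minimum


small = b"x" * 10
print(switch_counts_as_bad(build_frame(small)))                         # False
print(switch_counts_as_bad(build_frame(small, vlan_id=10)))             # True
print(switch_counts_as_bad(build_frame(small, vlan_id=10, pad_to=64)))  # False
```

If the hypothesis holds, only the middle case (a tagged frame padded to 60 bytes) would increment RxBadPkt, which is why padding to 64 bytes should keep the counter clean.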
The question is whether this is possible, and if so, how?
Can you configure the port to only carry tagged frames? IIRC, it's recommended to not mix tagged and untagged as that might lead to different sorts of problems.
I did configure the trunk between the switch and the router to pass only tagged traffic. As expected, it did not help (the "bad" frames counter started to increase). As I already wrote in my OP, the problem seems to be in the TP-Link switch and its misdetection of bad frames.
In particular, if the TP-Link switch misdetects minimum-size tagged frames as bad, switching the trunk connection to tagged-only traffic should make the misdetection happen more often.
My goal is to avoid this misdetection so that I can use the RxBadPkt counter to detect actual bad frames.
So basically you're trying to work around a broken switch...
I hate to say that, but: get yourself a decent switch. I replaced my two TL-SG108E because of their stupid behavior and nobody even wants them used for cheap.
You are basically right. I was just curious whether I could do it by tweaking OpenWrt (which, BTW, is running on TP-Link hardware, ramips arch, and my suspicion is that the switch also contains Ralink hardware).
Any idea what might be a good candidate? Until now I was running an 8-port Ovislink (Airlive) switch, which was working fine but has been abandoned for probably more than 15 years. I would still be running it if it did not crash when I tried to configure VLANs on it :).
I have two Netgear switches (5-port and 8-port) from their "easy managed" (cheap) line, but they have some other quirks (for example, I cannot connect to their web interface over a WireGuard tunnel, because they cannot handle frames with a smaller-than-default MTU).
I do not need any advanced features apart from those already present on those Netgears though (VLAN, IGMP snooping, and eventually load balancing - with some access to set them up).
The only 16-port switch I am using is the T2600G-18TS 2.0
it is working as expected, a fully fledged managed switch, 16+2
the only drawback is that DDM is missing on the SFP ports, so you will never have info about laser dBm/temperatures
I replaced all my switches with models that can run OpenWrt to have a unified configuration experience. I have some small and cheap models (like Netgear GS308T, GS108Tv3) and a few 24/48-port models (HP JG927A, ZyXEL GS1900-24HP, D-Link DGS-1210-28MP). There are not many 16-port models that are supported, but the 24-port models can be found cheaply on the used market (sometimes below €50,-).
So, I replaced the buggy switch with a new one, a TL-SG3210 v3.0, flashed the latest firmware (3.0.8), and configured it the same way. This time I am seeing the Jumbo frames counter increase in the "Statistics" on the trunk port, despite the fact that according to "System Info" I have "Jumbo Frame" disabled.
My suspicion is that the switch now wrongly counts tagged packets of maximal length as Jumbo frames.
Do you see similar behavior (if you run a tagged VLAN on any port)?
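A minimal Python sketch of this suspicion, assuming the counter simply compares the frame length (including FCS) against a fixed 1518-byte limit without allowing for the 4-byte VLAN tag. The function names are hypothetical and the logic is a guess at the behavior, not TP-Link's actual implementation.

```python
MAX_UNTAGGED_FRAME = 1518  # 14-byte header + 1500-byte payload + 4-byte FCS
VLAN_TAG_LEN = 4           # 802.1Q tag adds 4 bytes, so a full tagged frame is 1522


def is_jumbo_naive(frame_len, tagged):
    """Assumed buggy check: same limit for tagged and untagged frames."""
    return frame_len > MAX_UNTAGGED_FRAME


def is_jumbo_tag_aware(frame_len, tagged):
    """Check that allows for the extra 4 bytes of a VLAN tag."""
    limit = MAX_UNTAGGED_FRAME + (VLAN_TAG_LEN if tagged else 0)
    return frame_len > limit


full_tagged = MAX_UNTAGGED_FRAME + VLAN_TAG_LEN  # full-size tagged frame
print(is_jumbo_naive(full_tagged, tagged=True))      # True
print(is_jumbo_tag_aware(full_tagged, tagged=True))  # False
```

Under that assumption, every full-size tagged frame on the trunk would land in the Jumbo counter even with jumbo frames disabled.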
yes
if you look at this line: 1024-to-1518-Octets Packets
you will notice that this counter is increasing
and the default jumbo frame size is 1518 on the switch
it could be called a bug, or misinterpreting things, but yes, in the default condition the jumbo frame counter is increasing too
now, my switch is set up for 9000 jumbo
and yes, i am using many vlans
in this setup, if you don't use 9000 byte frames, the jumbo counter stays well below 1024-to-1518-Octets Packets
ok, looks like tp-link has its cosmetic problems
the bright side is that it is only the web statistics, and lldp/snmp/ddm/l2/l3/arp&dhcp protection, rtsp and, most importantly, multicast snooping work as expected
Thanks for checking and for the analysis. I guess it is still better for it to miscount jumbo frames than undersized packets, as the previous TP-Link switch did, but I wonder why it is so difficult for TP-Link to get the diagnostics right.
My venerable Ovislink switch (15+ years old) gives me far more detailed error statistics and, most importantly, gets them right even on trunk ports. Its problem is that it crashes when configured for VLANs (but then honors the config) and needs to be "reinitialized" in order to continue working.
Unfortunately, I have already had a case of problematic wiring where the traffic diagnostics on the switch were how I actually noticed the problem, so I am really picky about the diagnostics being right.