Support for RTL838x based managed switches

While looking into L3 routing on rtl838x I stumbled over a mention of an undocumented hardware limitation in the HPE community forums:

The ARP table on the HP 1920 24G (JG924A) can only reliably handle 60 entries. Go over this and you will start to experience the loss in performance reported here.
In plain English: if you have more than 60 devices across all of your VLANs, you will have this problem. This is a pretty significant limitation for what is meant to be a layer 3 switch. I also note that this is not mentioned in any of the device's specifications.
We are currently talking to our supplier about returning our 1920 24G as it is not fit for purpose. It was sold as a Layer 3 switch. The specifications issued by HP make no reference to only supporting a maximum of 60 ARP entries (the ARP table can actually hold many more). If you contact HP support you eventually get this information, but they will make you jump through hoops before they admit it.

See: http://h20565.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7399514&docId=mmr_kc-0130409&docLocale=en_US
My comments:

  • On the 1920-24G I managed to reproduce this slow routing condition (see the reproduction sketch after this list).
  • On the 1920-24G there is no syslog event generated when the ARP table grows beyond 64 entries.
  • On the 1920-24G the ARP table can't grow beyond 256 entries, so no CPU-based routing at all for host #257! This condition does generate a syslog message: "Oct 31 08:25:14 2015 HP1920G %%10TPMB/4/ARP TABLE FULL(t): 1.3.6.1.4.1.25506.2.38.1.2.4.1: ARP table is full and number of items is 256."
  • HP doesn't specify a maximum IPv6 neighbor table size, so we can assume the same limits apply.
    Note: on IPv6 the number of hosts does not equal the neighbor table size! A host uses multiple addresses (its link-local and privacy extension addresses).
    On the 1920-24G I did manage to get this IPv6 event at 256 entries:
    "Oct 31 09:11:02 2015 HP1920G %%10TPMB/4/ND TABLE FULL(t): 1.3.6.1.4.1.25506.2.38.1.5.4.1: Neighbor table is full and number of items is 256."
  • A switch reboot seems useless to me; to make neighbors time out earlier, use commands like:
    arp timer aging 4 (minutes, default=20)
    ipv6 neighbor stale-aging 1 (hours! default=4)
    After the ARP table size is back within the safe limit (60), hardware-speed routing returns.
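For anyone who wants to reproduce the slow-path condition from the first bullet: the quickest way to inflate an L3 switch's ARP table is to sweep one of its routed subnets from a host behind it, so the switch has to resolve every probed address. A minimal sketch, assuming fping is available and 192.168.40.0/24 is one of the routed VLAN subnets (both are placeholders, not taken from the HPE thread):

# fping -a -q -g 192.168.40.0/24
# iperf -c 192.168.40.10 -t 10

A single /24 sweep is enough to push the table well past the 60-entry threshold, after which the routed iperf run should drop to software-forwarding speeds.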

So, back to the 1920-24 I'm testing with:

# brctl showmacs switch | wc -l
177

Every local port has 2 or 7 entries for its local MAC address; 115 of the 177 entries are local, for the 28 ports.
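To split those numbers without counting by hand, the third column of brctl showmacs ("is local?") can be filtered directly; a quick sketch, assuming the standard bridge-utils output layout:

# brctl showmacs switch | awk '$3 == "yes"' | wc -l
# brctl showmacs switch | awk '$3 == "no"' | wc -l

The first command counts the bridge's own (local) addresses, the second the learned ones; the same split is available from bridge fdb show br switch on newer iproute2.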

# cat /sys/kernel/debug/rtl838x/l2_table | grep mac | wc -l
148

Looks like it is one entry per VLAN:

  mac 5c:8a:38:86:6a:e3 vid 0 rvid 0
  mac 5c:8a:38:86:6a:e3 vid 1 rvid 1
  mac 5c:8a:38:86:6a:e3 vid 10 rvid 10
  mac 5c:8a:38:86:6a:e3 vid 20 rvid 20
  mac 5c:8a:38:86:6a:e3 vid 22 rvid 22
  mac 5c:8a:38:86:6a:e3 vid 30 rvid 30
  mac 5c:8a:38:86:6a:e3 vid 40 rvid 40
  mac 5c:8a:38:86:6a:e4 vid 0 rvid 0
  mac 5c:8a:38:86:6a:e4 vid 1 rvid 1
  mac 5c:8a:38:86:6a:e4 vid 10 rvid 10
  mac 5c:8a:38:86:6a:e4 vid 20 rvid 20
  mac 5c:8a:38:86:6a:e4 vid 22 rvid 22
  mac 5c:8a:38:86:6a:e4 vid 30 rvid 30
  mac 5c:8a:38:86:6a:e4 vid 40 rvid 40
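To check that this per-VLAN duplication holds for the whole table and not just these two addresses, the debugfs output can be grouped by MAC (field 2 in the format above); a small sketch:

# grep mac /sys/kernel/debug/rtl838x/l2_table | awk '{print $2}' | sort | uniq -c | sort -rn | head
# grep mac /sys/kernel/debug/rtl838x/l2_table | awk '{print $2}' | sort -u | wc -l

The first command shows how many VLAN entries each MAC occupies, the second how many distinct MACs the hardware table actually holds.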

iperf between the same two nodes using L2 and L3 (the ~25 Mbit/s runs are the routed path, the ~733 Mbit/s run is plain L2 switching):

[  1] 0.0000-10.0717 sec  30.2 MBytes  25.2 Mbits/sec
[  2] 0.0000-10.0506 sec   878 MBytes   733 Mbits/sec
[  3] 0.0000-10.1007 sec  30.7 MBytes  25.5 Mbits/sec
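For reference, a comparison like this only needs two iperf targets, one in the same VLAN and one behind a routed interface; a minimal sketch with placeholder addresses (not the actual test hosts):

# iperf -s &                        # on each target host
# iperf -c 192.168.20.10 -t 10      # same VLAN: plain L2 switching
# iperf -c 192.168.30.10 -t 10      # other VLAN: routed by the rtl838x

Same client, same cabling, only the destination subnet differs, so any gap between the two numbers is purely down to the forwarding path.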

So I brought the table down below 60 entries, but that did not change anything; it is still routing on the CPU... Leaving this here for later anyway.
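One way to confirm that traffic is really taking the CPU path is to watch the SoC while a routed iperf run is in flight; on the OpenWrt side something like this is enough (a rough sketch, no rtl838x-specific counters assumed):

# top -d 1
# cat /proc/softirqs

If forwarding were offloaded to the switch ASIC, the CPU would stay essentially idle and the NET_RX counters would barely move during the test; with software routing they climb in step with the traffic.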
