I have noticed that both L2 patches (original and modified) noticeably increase CPU utilization, so I decided to test NAT-ed throughput and saw a drop of roughly 35% (upload) to 45% (download).
Test setup: performance CPU governor, iperf3 with a single stream, router placed between two computers (one on a LAN port, one on the WAN port), WAN configured with a static IP, otherwise a normal setup (firewall rules, etc.), LAN on CPU0 and WAN on CPU1.
Software offload enabled:
19.07: upload 700 Mbps / download 800 Mbps
L2 patch: upload 450 Mbps / download 475 Mbps
Software offload disabled:
19.07: upload 622 Mbps / download 730 Mbps
L2 patch: upload 403 Mbps / download 400 Mbps
Results fluctuated a bit across multiple runs but stayed consistent with the values above.
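For reference, here is a small sketch that recomputes the relative drops from the figures in this post (it only does arithmetic on the numbers above; the dictionary layout and the `drop_percent` helper are just illustrative). The ~35% figure matches upload, while download loses more, about 41% with software offload and 45% without.

```python
# Throughput figures in Mbps, (upload, download), copied from the runs above.
results = {
    "offload enabled": {"19.07": (700, 800), "L2 patch": (450, 475)},
    "offload disabled": {"19.07": (622, 730), "L2 patch": (403, 400)},
}


def drop_percent(baseline: float, patched: float) -> float:
    """Relative throughput loss of `patched` versus `baseline`, in percent."""
    return (baseline - patched) / baseline * 100


drops = {}
for mode, runs in results.items():
    base_up, base_down = runs["19.07"]
    l2_up, l2_down = runs["L2 patch"]
    drops[mode] = (drop_percent(base_up, l2_up), drop_percent(base_down, l2_down))
    print(f"{mode}: upload -{drops[mode][0]:.1f}%, download -{drops[mode][1]:.1f}%")
```

This prints roughly 35.7% / 40.6% with offload enabled and 35.2% / 45.2% with offload disabled.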
Any ideas, hints, or suggestions would be appreciated, as this is a completely unfamiliar area to me.