I tried inserting the two rules below before 'flow add':
ip protocol udp ip saddr 192.168.2.2 accept
ip protocol udp ip daddr 192.168.2.2 accept
meta l4proto { tcp, udp } flow add @ft
This works pretty well: all UDP packets originating from or destined to 192.168.2.2 stay on the slow path, and no out-of-order packets are seen, even at rates as high as 900 Mbps.
So this looks like it could be a workaround if necessary.
Now I recall that you intended to create a 'flow_handler' chain and jump to it to process 'flow add' and 'accept', with the precondition 'ct state { established, related }'. I think that is a good idea. It would look something like this:
In chain 'forward':
ct state { established, related } jump flow_handler
In chain 'flow_handler':
<flowtable exemptions to be inserted by users here>
flow add @ft
accept
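Put together as one loadable file, that layout might look roughly like the sketch below. It is only a sketch under assumptions: the table name fw_sketch, the flowtable devices eth0/eth1 (replace them with real interfaces) and the example exemption rules are made up for illustration, and this is not what the pull request actually generates. flow_handler is declared before the forward chain that jumps to it, and the forward policy is accept only to keep the sketch harmless.

```nft
#!/usr/sbin/nft -f
# Sketch only: table name, devices and exemptions are hypothetical.
table inet fw_sketch {
	flowtable ft {
		hook ingress priority 0;
		devices = { eth0, eth1 };
	}

	chain flow_handler {
		# user-supplied flowtable exemptions go first, e.g. keep
		# UDP to/from 192.168.2.2 on the slow path
		ip protocol udp ip saddr 192.168.2.2 accept
		ip protocol udp ip daddr 192.168.2.2 accept

		# everything not exempted is offloaded, then accepted
		meta l4proto { tcp, udp } flow add @ft
		accept
	}

	chain forward {
		type filter hook forward priority filter; policy accept;

		# only established/related traffic goes through the offload logic
		ct state { established, related } jump flow_handler

		# ... rules for new connections would follow here ...
	}
}
```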
Files in /etc/nftables.d/, like gamer.nft, are included right after the flowtable definition and before any other rules; see templates/ruleset.uc:
chain handle_offload {
iif !="lo" udp sport 9010 accept
}
If the firewall also emits a 'handle_offload' chain, these rules end up prepended to it; otherwise this chain just sits idle.
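As far as I understand nft's behaviour, declaring a chain that already exists in the same table simply adds the new rules after the existing ones, which is what gives the prepend-or-idle effect. A minimal sketch of the assembled ruleset, assuming the firewall later emits its own handle_offload block (the table name, devices and forward chain are hypothetical):

```nft
#!/usr/sbin/nft -f
# Sketch of an assembled ruleset; names and devices are hypothetical.
table inet fw_sketch {
	flowtable ft {
		hook ingress priority 0;
		devices = { eth0, eth1 };
	}

	# contents of /etc/nftables.d/gamer.nft, spliced in right after
	# the flowtable definition
	chain handle_offload {
		iif != "lo" udp sport 9010 accept
	}

	# a later block with the same name: its rules are appended
	# after the user's rule above
	chain handle_offload {
		meta l4proto { tcp, udp } flow add @ft
		accept
	}

	chain forward {
		type filter hook forward priority filter; policy accept;
		ct state { established, related } jump handle_offload
	}
}
```

After loading, nft list chain inet fw_sketch handle_offload should show the udp sport 9010 rule ahead of the flow add rule.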
Evaluate it with nft -c -d netlink -f file_with_table_inet_around.nft. Some constructs are very slow, like meta vs. packet access or iif vs. iifname. More or less, you need to try permutations of a particular construct until you reach the fastest one. You can run 1000 repetitions of a LAN ping to measure it visually, then add counters between conditions and reorder them so that the most picky/selective/fastest conditions come first.
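A sketch of such a test file, wrapping variants of the same match in a throwaway table so their netlink bytecode can be compared side by side. The file, table and chain names are made up. With -c nothing is loaded, so the chains do not need hooks; the counters in the last variant only matter once that rule is used in a live chain, where each counter shows how many packets got past the match before it.

```nft
#!/usr/sbin/nft -f
# Throwaway test file (hypothetical names), check with:
#   nft -c -d netlink -f offload_test.nft
table inet offload_test {
	# interface index comparison
	chain variant_iif {
		iif != "lo" udp sport 9010 accept
	}

	# interface name (string) comparison
	chain variant_iifname {
		iifname != "lo" udp sport 9010 accept
	}

	# same rule with counters between the conditions, to see how
	# selective each match is before reordering
	chain variant_counted {
		udp sport 9010 counter iif != "lo" counter accept
	}
}
```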
Had I known about this feature, I wouldn't have created my own 'table-prepend' user includes. LOL
This is a better version than my quick & dirty test cases. So rules like these could be inserted into 'handle_offload' (formerly 'flow_handler') by user includes, with infrastructure like this (changed slightly to better align with your pull-request code):
Looking at your pull request, I think you still mostly intend to do this:
My only comment now is that you could merge the two 'ct state' statements into a single one, as above.
An interesting idea for a heuristic (I hope):
Add this as the first rule of the handle_offload chain:
meta length lt 128 accept
I tried thresholds from 64 to 1024; the jitter is lowest around 100 to 300.
I'd speculate it's some CPU load added to each individual packet, which becomes insignificant for bigger ones.
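If the heuristic is shipped as a user include, like gamer.nft above, the file could hold something along these lines. This is only a sketch: the file name small-packets.nft is hypothetical, 128 is simply the threshold from the test above, and the counter is optional, there only to see how many packets the rule keeps away from 'flow add'.

```nft
# hypothetical /etc/nftables.d/small-packets.nft
# (spliced into the firewall table right after the flowtable definition)
chain handle_offload {
	# packets shorter than 128 bytes (as reported by meta length) are
	# accepted here, so they never reach 'flow add'
	meta length lt 128 counter accept
}
```

Note that this only keeps individual short packets from triggering the offload; the connection itself is still offloaded as soon as a larger packet of the same flow reaches 'flow add'.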