Yep, look up dlakeland's threads on gigabit routing. They're a good reference on how heavy a task it is to shape that much data, and they'll give you an idea of what's needed to do it properly. Pretty much no off-the-shelf router is currently up to shaping 1400 Mbit, though some x86 mini PCs, and an RPi 4, would be.
I used to run a C7 on a 300/30 Mbit cable connection. The C7 only has enough CPU to shape about 110-120 Mbit, but I was able to get perhaps 3/4 of the debloating benefit by shaping just the uplink. 30 Mbit was easy for the C7, and I got a surprising amount of improvement on the downlink too. It helped a lot with my oversubscribed, bursty-lag cable service, until I broke down and got a little x86 box to do the heavy lifting. Now, even after upgrading to 940/35 Mbit, I can run SQM all day with CPU to spare...
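If it helps to see that reasoning spelled out, here's a rough Python sketch of the decision I was making. It's only an illustration: the function name `plan_shaping` and the rule itself (shape both directions if the combined rate fits the CPU budget, otherwise fall back to whichever single direction fits, uplink first) are my own simplification, and the 2000 Mbit figure for the x86 box is a made-up placeholder, not a measurement.

```python
def plan_shaping(cpu_capacity_mbit, down_mbit, up_mbit):
    """Pick which directions a router can afford to shape.

    Hypothetical helper: cpu_capacity_mbit is the total rate the CPU
    can shape, treated here as one budget shared by both directions
    (a simplification).
    """
    # Enough CPU for everything: shape both directions.
    if down_mbit + up_mbit <= cpu_capacity_mbit:
        return ["download", "upload"]
    # Otherwise prefer the uplink, which is usually the slower side.
    if up_mbit <= cpu_capacity_mbit:
        return ["upload"]
    # Uplink alone is too much, but the downlink might still fit.
    if down_mbit <= cpu_capacity_mbit:
        return ["download"]
    # Nothing fits: leave the link unshaped.
    return []


# C7 at roughly 110 Mbit of shaping capacity on a 300/30 link.
print(plan_shaping(110, 300, 30))
# Assumed (not measured) 2000 Mbit capacity for an x86 box on 940/35.
print(plan_shaping(2000, 940, 35))
```

With the C7 numbers it picks the uplink only, which is what I ended up doing by hand.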
Software offload... hmmm... I think it's not supposed to interfere with SQM, while hardware offload does? Either one might affect the queuing negatively, though. I'm not sure where things currently stand, so I'd avoid both to be on the safe side.
I would suggest reading your way through the three SQM doc sections on the main OpenWrt site, including the "make cake sing and dance" section, and following that. It's pretty much what I've been using, and I've been pretty happy with it.