Your DSLReports results are very good. It's hard to know whether the hitreg / micro lag is real unless you experience it consistently under both conditions. If it's only an occasional feeling, variation in traffic at the game server and similar factors can easily confuse things.
Here is a link to a post where I give examples of how to add custom firewall commands that mark DSCP; you can put this kind of thing into your /etc/firewall.user. But marking DSCP properly requires being able to decide which packets to prioritize. For example, you could give your gaming rig a static IP and then tag all UDP packets to or from that rig.
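As a rough sketch of the static-IP approach, the Python snippet below only prints the kind of iptables lines you might paste into /etc/firewall.user. It assumes an iptables-based firewall (the kind that reads /etc/firewall.user); the address 192.168.1.50 and the class EF are hypothetical placeholders, not values taken from the linked post. The rules go in the mangle table's FORWARD chain because on a NAT router that chain sees the rig's LAN address in both directions.

```python
# Hypothetical values: substitute your gaming rig's real static IP and the
# DSCP class your layer cake setup treats as high priority.
GAME_RIG_IP = "192.168.1.50"
DSCP_CLASS = "EF"


def dscp_rules(ip, dscp_class):
    """Build iptables commands that tag UDP packets to or from one host.

    In the mangle FORWARD chain of a NAT router the LAN address is the
    source for uploads and the destination for downloads, so one rule per
    direction covers both.
    """
    rules = []
    for direction in ("-s", "-d"):
        rules.append(
            f"iptables -t mangle -A FORWARD -p udp {direction} {ip} "
            f"-j DSCP --set-dscp-class {dscp_class}"
        )
    return rules


if __name__ == "__main__":
    for line in dscp_rules(GAME_RIG_IP, DSCP_CLASS):
        print(line)
```

Review the printed lines before adding anything to /etc/firewall.user.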
As I said, instead of shaping both the uplink and the downlink on WAN, just shape the uplink (egress) on WAN and the uplink (egress) on LAN, with the LAN rate set to your ISP's downlink speed. Packets coming in on WAN will then pass through iptables, where they can get their DSCP marks, before being queued and shaped as they leave via LAN. Note that this ignores traffic going to wifi, so you might also add a simple piece of cake downlink on WAN in addition to the uplink on LAN, so that total bandwidth is limited, not just wired LAN bandwidth.
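A minimal sketch of that layout as an /etc/config/sqm fragment, generated from Python so the rates are easy to change. The interface names eth0.2 (WAN) and eth0.1 (LAN) and the 30000 kbit/s download rate are hypothetical; 3700 kbit/s corresponds to the roughly 3.7 Mbps uplink mentioned in this post. In SQM, "upload" means egress on the named interface, so the LAN section carries your download rate there, and a rate of 0 leaves that direction unshaped.

```python
# Hypothetical values: replace with your real interface names and rates (kbit/s).
WAN_IFACE = "eth0.2"
LAN_IFACE = "eth0.1"
UPLOAD_KBPS = 3700     # roughly the uplink discussed in this thread
DOWNLOAD_KBPS = 30000  # placeholder; use your measured downlink


def sqm_queue(name, iface, download_kbps, upload_kbps, script):
    """Render one SQM queue section in UCI syntax.

    In SQM, "upload" is egress on the named interface and "download" is
    ingress; a rate of 0 leaves that direction unshaped.
    """
    lines = [
        f"config queue '{name}'",
        "\toption enabled '1'",
        f"\toption interface '{iface}'",
        f"\toption download '{download_kbps}'",
        f"\toption upload '{upload_kbps}'",
        "\toption qdisc 'cake'",
        f"\toption script '{script}'",
    ]
    return "\n".join(lines)


def sqm_config():
    return "\n\n".join([
        # WAN egress: your real upload direction.
        sqm_queue("wan", WAN_IFACE, 0, UPLOAD_KBPS, "layer_cake.qos"),
        # LAN egress: your real download direction, shaped after iptables
        # has had a chance to set DSCP marks on forwarded packets.
        sqm_queue("lan", LAN_IFACE, 0, DOWNLOAD_KBPS, "layer_cake.qos"),
    ]) + "\n"


if __name__ == "__main__":
    print(sqm_config(), end="")
```

Setting a non-zero `download` on the WAN section would be the knob for also capping traffic that bypasses the LAN queue, such as wifi; the sketch leaves that out.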
Shapers work by delaying packets very slightly so that traffic stays within the configured bandwidth. You will therefore see slightly more delay when the link is idle than you would without a shaper, but substantially less delay under load. In short, latency becomes more consistent.
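To see where that small idle delay comes from, here is a simplified FIFO rate limiter (not cake's actual shaper): each packet waits for the one ahead of it to finish and then takes size × 8 / rate seconds to leave. The 4 Mbps "line rate" is a hypothetical figure used only to compare against a shaper set at 3.7 Mbps.

```python
LINE_RATE_BPS = 4_000_000    # hypothetical raw modem rate, for comparison only
SHAPED_RATE_BPS = 3_700_000  # shaper set a little below the line rate


def paced_departures(arrivals, sizes_bytes, rate_bps):
    """Finish times (s) for packets leaving a simple FIFO rate limiter.

    Each packet starts once it has arrived and the previous packet is done,
    then takes size * 8 / rate seconds to send.
    """
    finish = 0.0
    out = []
    for t, size in zip(arrivals, sizes_bytes):
        start = max(t, finish)
        finish = start + size * 8 / rate_bps
        out.append(finish)
    return out


def idle_delay_ms(size_bytes, rate_bps):
    # A lone packet on an idle link: its delay is just its own send time.
    return paced_departures([0.0], [size_bytes], rate_bps)[0] * 1000


if __name__ == "__main__":
    shaped = idle_delay_ms(1500, SHAPED_RATE_BPS)
    unshaped = idle_delay_ms(1500, LINE_RATE_BPS)
    print(f"shaped {shaped:.3f} ms, unshaped {unshaped:.3f} ms, "
          f"extra {shaped - unshaped:.3f} ms")
```

With those numbers a lone 1500-byte packet takes about 3.24 ms through the shaper versus 3.00 ms unshaped, roughly a quarter of a millisecond extra. The payoff under load is that the queue builds in the router, where the shaper controls it, rather than in the modem's buffer.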
One issue is that your uplink is about 3.7 Mbps, and the largest packet you can send is probably 1500 bytes. At 3.7 Mbps a 1500-byte packet (12,000 bits) takes 3.2 ms to send down the wire. So under load, a keypress may be delayed by up to 3.2 ms while a full-size packet ahead of it finishes sending, and there is nothing you can do to reduce that (except ensure that your game is the only thing sending packets). Game packets are probably only a few hundred bytes, so a 200-byte packet takes about 0.4 ms to send. I don't know what the reaction time is for a twitch gamer, but looking online suggests it's about 100 ms (more like 200 ms for a non-gamer), so a few milliseconds here is small by comparison.

A bigger issue, then, is probably packet drop rather than packet delay. If the game sends a state update every, say, 100 ms and one packet is lost in flight, the next update arrives 200 ms after the previous one, so it will look like you have a 200 ms delay. Even if the game sends state every 20 ms, a lost packet may still be noticeable, since the gap between updates grows from 20 ms to 40 ms.
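The arithmetic above can be checked with two small functions. `gap_after_loss_ms` assumes the game sends one state update per fixed interval and never retransmits; that is an illustrative assumption, not a description of how any particular game works.

```python
def serialization_ms(size_bytes, rate_mbps):
    """Time to clock one packet onto the wire, in milliseconds."""
    return size_bytes * 8 / (rate_mbps * 1e6) * 1000


def gap_after_loss_ms(interval_ms, lost_in_a_row=1):
    """Gap between received state updates when consecutive packets are lost.

    Assumes one update every interval_ms and no retransmission.
    """
    return interval_ms * (lost_in_a_row + 1)


if __name__ == "__main__":
    uplink = 3.7  # Mbps
    print(f"1500 B: {serialization_ms(1500, uplink):.1f} ms")  # full-size packet
    print(f"200 B:  {serialization_ms(200, uplink):.1f} ms")   # typical game packet
    for interval in (100, 20):
        print(f"update every {interval} ms, one loss -> "
              f"{gap_after_loss_ms(interval)} ms gap")
```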
Using layer cake with DSCP marks should prioritize your game packets so that they don't get delayed and, more importantly, don't get dropped. Simpler setups like piece of cake won't know to avoid dropping your game packets, because they aim to share bandwidth fairly among all the different machines.
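A toy illustration of why classification matters: once the buffer is full, a queue that knows each packet's DSCP class can take its drops from the least important class. This is not cake's real tin and AQM logic, which is considerably more involved, and the class-to-priority mapping in `PRIORITY` is an assumption made for the example.

```python
from collections import deque

# Assumed mapping for this toy: higher number = more important.
PRIORITY = {"EF": 2, "CS0": 1, "CS1": 0}


class PriorityDropQueue:
    """Shared buffer that drops from the lowest-priority backlogged class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queues = {cls: deque() for cls in PRIORITY}
        self.dropped = []

    def __len__(self):
        return sum(len(q) for q in self.queues.values())

    def enqueue(self, packet, dscp):
        self.queues[dscp].append(packet)  # unknown classes raise KeyError
        if len(self) > self.capacity:
            # Over capacity: drop the newest packet of the least important
            # class that currently has something queued.
            victim = min((c for c in self.queues if self.queues[c]),
                         key=PRIORITY.get)
            self.dropped.append(self.queues[victim].pop())


if __name__ == "__main__":
    q = PriorityDropQueue(capacity=4)
    for i in range(4):
        q.enqueue(f"bulk{i}", "CS0")
    q.enqueue("game0", "EF")
    q.enqueue("game1", "EF")
    print(q.dropped)  # ['bulk3', 'bulk2']
```

Both game packets survive because the overflow is charged to the bulk class; a scheduler that only aims for fairness between hosts has no such signal to go on.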