SQM with OpenVPN and VPN bypass

I have a custom x86 build with WAN and VPN interfaces eth1 and tun0. I've been testing SQM but haven't managed to get good results in the following scenario:

Some destination IPs bypass the VPN interface (using iptables --mark). If there is download load on both eth1 (bypassed traffic) and tun0 (VPN), and SQM is set up on both interfaces, pinging a destination through tun0 gives very poor results. Pinging a destination through eth1 is always fine. If there is no download load on eth1, pinging through tun0 is OK. I get the best results when SQM is set up only on tun0 and there is load only on tun0. Adding SQM for eth1 gives poorer results in this case, even if there's no load on eth1.

I have tried various configurations, but I'm not sure whether this is a setup that should work and, if it is, what kind of SQM configuration should be used. Any ideas?

You could post the output of:
cat /etc/config/sqm
tc -s qdisc
ifconfig
ifstatus wan
Maybe that will reveal something. But is the VPN traffic, by any chance, also being transmitted over eth1?

Yes, VPN traffic goes through eth1, which I think is what makes this problematic. My WAN connection is a 50/10 Mbit DOCSIS 3.0 cable modem in bridge mode. With load on both eth1 and tun0, ping to 8.8.8.8 is around 60-100 ms; without eth1 load and with tun0 load it is around 19-25 ms (normal). Traffic is generated from a computer connected to wlan0. The CPU is a quad-core Intel Pentium N4200 (more than enough?). If I set the tun0 download rate closer to the eth1 download rate, it gets worse with or without eth1 load.

root@OpenWrt:~# cat /etc/config/sqm

config queue 'eth1'
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '1'
        option squash_ingress '1'
        option ingress_ecn 'ECN'
        option egress_ecn 'NOECN'
        option qdisc_really_really_advanced '1'
        option iqdisc_opts 'nat dual-dsthost'
        option eqdisc_opts 'nat dual-srchost'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option linklayer 'none'
        option interface 'eth1'
        option download '45000'
        option upload '8000'
        option enabled '1'

config queue 'tun0'
        option debug_logging '0'
        option verbosity '5'
        option qdisc_advanced '1'
        option squash_dscp '1'
        option squash_ingress '1'
        option ingress_ecn 'ECN'
        option egress_ecn 'NOECN'
        option qdisc_really_really_advanced '1'
        option iqdisc_opts 'nat dual-dsthost'
        option eqdisc_opts 'nat dual-srchost'
        option interface 'tun0'
        option qdisc 'cake'
        option linklayer 'none'
        option upload '7000'
        option script 'piece_of_cake.qos'
        option download '40000'
        option enabled '1'
root@OpenWrt:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc mq 0: dev eth0 root
 Sent 11522671208 bytes 8567133 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
 Sent 3342044332 bytes 2509450 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 166 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
 Sent 8180626876 bytes 6057683 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 118 ecn_mark 0
  new_flows_len 0 old_flows_len 1
qdisc cake 81b9: dev eth1 root refcnt 65 bandwidth 8Mbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 128812839 bytes 1191251 pkt (dropped 15, overlimits 206719 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 191552b of 4Mb
 capacity estimate: 8Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh          8Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        155us
  av_delay          9us
  sp_delay          1us
  backlog            0b
  pkts          1191266
  bytes       128828338
  way_inds           13
  way_miss          148
  way_cols            0
  drops              15
  marks               0
  ack_drop            0
  sp_flows            2
  bk_flows            1
  un_flows            0
  max_len          3162
  quantum           300

qdisc ingress ffff: dev eth1 parent ffff:fff1 ----------------
 Sent 3033032405 bytes 2207966 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 81bd: dev tun0 root refcnt 2 bandwidth 7Mbit besteffort dual-srchost nat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 46216397 bytes 888395 pkt (dropped 13, overlimits 61919 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 66304b of 4Mb
 capacity estimate: 7Mbit
 min/max network layer size:           40 /    1378
 min/max overhead-adjusted size:       40 /    1378
 average network hdr offset:            0

                  Tin 0
  thresh          7Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        1.1ms
  av_delay         31us
  sp_delay          1us
  backlog            0b
  pkts           888408
  bytes        46220663
  way_inds         3545
  way_miss         2411
  way_cols            0
  drops              13
  marks               1
  ack_drop            0
  sp_flows           10
  bk_flows            1
  un_flows            0
  max_len          8501
  quantum           300

qdisc ingress ffff: dev tun0 parent ffff:fff1 ----------------
 Sent 2130748935 bytes 1597598 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 81ba: dev ifb4eth1 root refcnt 2 bandwidth 45Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 3070264953 bytes 2207552 pkt (dropped 409, overlimits 3917178 requeues 0)
 backlog 7570b 5p requeues 0
 memory used: 321Kb of 4Mb
 capacity estimate: 45Mbit
 min/max network layer size:           60 /    1514
 min/max overhead-adjusted size:       60 /    1514
 average network hdr offset:           14

                  Tin 0
  thresh         45Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        5.8ms
  av_delay        1.7ms
  sp_delay        304us
  backlog         7570b
  pkts          2207966
  bytes      3070870777
  way_inds           17
  way_miss          166
  way_cols            0
  drops             409
  marks               0
  ack_drop            0
  sp_flows            2
  bk_flows            2
  un_flows            0
  max_len         36336
  quantum          1373

qdisc cake 81be: dev ifb4tun0 root refcnt 2 bandwidth 40Mbit besteffort dual-dsthost nat wash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 2124029099 bytes 1592578 pkt (dropped 5020, overlimits 2688979 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 292608b of 4Mb
 capacity estimate: 40Mbit
 min/max network layer size:           30 /    1480
 min/max overhead-adjusted size:       30 /    1480
 average network hdr offset:            0

                  Tin 0
  thresh         40Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        206us
  av_delay         18us
  sp_delay          2us
  backlog            0b
  pkts          1597598
  bytes      2130748935
  way_inds           65
  way_miss         1943
  way_cols            0
  drops            5020
  marks               0
  ack_drop            0
  sp_flows           11
  bk_flows            1
  un_flows            0
  max_len          1480
  quantum          1220

With a 50/10 link, once eth1 becomes loaded, the OpenVPN traffic traversing it will be treated as a single flow and hence will get a smaller bandwidth share than you might expect. You already configured per-internal-IP fairness, and unfortunately that does not seem to help. But the stacked-shaper design is fickle. How about using a shaper on eth1 only (with your per-internal-IP fairness configuration); how does that work? That probably still has some issues, but you will only have one shaper to deal with and not the interference between two of them. And if that does not work well enough, try with the tun0 shaper set to 25/5 (for testing).
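
To put rough numbers on the single-flow point, here is a small sketch. It assumes a deliberately simplified model in which the eth1 shaper splits its 45000 kbit/s download rate equally among active flows and the whole OpenVPN tunnel counts as exactly one of them; it ignores the per-host fairness options in the posted config, and the bypass flow counts are hypothetical.

```shell
#!/bin/sh
# Rough per-flow fair share of a shaped link, in kbit/s.
# Assumption (simplified model): capacity is divided equally among active
# flows, and the whole OpenVPN tunnel is exactly one flow on eth1.
tunnel_share_kbit() {
    link_kbit=$1      # shaped eth1 download rate, e.g. 45000 from the config
    bypass_flows=$2   # hypothetical number of concurrent bypass flows on eth1
    echo $(( link_kbit / (bypass_flows + 1) ))
}

# Print the tunnel's share for a few hypothetical bypass flow counts.
for n in 0 1 4 9; do
    echo "bypass flows: $n -> tunnel share: $(tunnel_share_kbit 45000 "$n") kbit/s"
done
```

In this model, as soon as the tunnel's share drops below the 40000 kbit/s configured for tun0, the tun0 shaper is no longer the bottleneck, and whatever queueing happens inside the tunnel is governed by the eth1 shaper, which cannot tell the tunnelled flows apart.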

BTW, if you shape on eth1 only, you need to set "Per Packet Overhead (byte):" to 18, or to be safer 22 bytes, and set "Which link layer to account for:" to "Ethernet with overhead", so that your shaper deals better with your link. For tun0, "Per Packet Overhead (byte):" probably needs to be larger to account for the added OpenVPN encapsulation, but I really do not know the proper value (probably an additional IP header plus something OpenVPN-specific).

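In /etc/config/sqm terms, the eth1 overhead settings suggested above would look roughly like the fragment below. This is a sketch, assuming the sqm-scripts option names linklayer and overhead and taking the safer value of 22 bytes; only the two changed options are shown, and the rest of the 'eth1' section stays as posted.

```
# Sketch only: changed options inside the existing 'eth1' section.
config queue 'eth1'
        # GUI: "Which link layer to account for:" = "Ethernet with overhead"
        option linklayer 'ethernet'
        # GUI: "Per Packet Overhead (byte):" = 22 (18 is the lower suggestion)
        option overhead '22'
```

After restarting SQM, the cake lines for eth1 and ifb4eth1 in tc -s qdisc should report the new overhead instead of "raw overhead 0".
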
Using only the eth1 shaper is what I tried first. If I just disable the tun0 shaper, ping to 8.8.8.8 is between 25 and 60 ms, so it's not very stable, at least in this respect. There are spikes to 50-60 ms every 10 seconds or so, plus random packet loss. With tun0 set to 22.5/4000, things are much better, around 20-25 ms, but there are still spikes to around 30-50 ms every 5 to 10 seconds.