Home Hub 5A: IPTV streaming problem

Hi,
I'm having a problem with interactive/streaming traffic, in particular with IPTV streaming of HD (720p-1080p) content, for which a 5-10 Mbit/s download connection should be enough.
The video plays sluggishly and sometimes freezes for a second or less.
Since I already checked and solved some problems with the ADSL link status with my ISP (as reported here), I think it could be related to bufferbloat.
I'm trying to solve this problem with SQM QoS as described here and here.
I have the latest OpenWrt 18.06.1 (r7258-5eb055306f) with Linux kernel 4.9.120.
OpenWrt reports an ADSL data rate of 16.547 Mb/s down / 909 Kb/s up and a max. attainable data rate (ATTNDR) of 16.660 Mb/s / 909 Kb/s. My line parameters: G.992.5 (ADSL2+) with PPPoA and an MTU of 1478 (suggested by the ISP).
I set up layer_cake QoS with 14 Mbit/s download and 750 kbit/s upload (roughly 85% of the sync rates), as you can see:

cat /etc/config/sqm

config queue 'eth1'
	option qdisc_advanced '0'
	option debug_logging '0'
	option verbosity '5'
	option linklayer 'atm'
	option overhead '44'
	option interface 'pppoa-wan'
	option qdisc 'cake'
	option enabled '1'
	option script 'layer_cake.qos'
	option download '14000'
	option upload '750'
cat /etc/config/qos
cat: can't open '/etc/config/qos': No such file or directory
tc -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
qdisc noqueue 0: dev br-lan root refcnt 2 
qdisc noqueue 0: dev eth0.1 root refcnt 2 
qdisc noqueue 0: dev wlan1 root refcnt 2 
qdisc cake 8025: dev pppoa-wan root refcnt 2 bandwidth 750Kbit diffserv3 triple-isolate split-gso rtt 100.0ms atm overhead 44 
qdisc ingress ffff: dev pppoa-wan parent ffff:fff1 ---------------- 
qdisc cake 8026: dev ifb4pppoa-wan root refcnt 2 bandwidth 14Mbit besteffort triple-isolate wash split-gso rtt 100.0ms atm overhead 44
tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn 
 Sent 2703921609 bytes 2221528 pkt (dropped 0, overlimits 0 requeues 2) 
 backlog 0b 0p requeues 2
  maxpacket 1514 drop_overlimit 0 new_flow_count 236 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 8025: dev pppoa-wan root refcnt 2 bandwidth 750Kbit diffserv3 triple-isolate split-gso rtt 100.0ms atm overhead 44 
 Sent 41757371 bytes 298681 pkt (dropped 4472, overlimits 317914 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 374400b of 4Mb
 capacity estimate: 750Kbit
 min/max network layer size:           34 /    1470
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:            0

                   Bulk  Best Effort        Voice
  thresh       46872bit      750Kbit    187496bit
  target        379.4ms       23.7ms       94.8ms
  interval      758.8ms      118.7ms      189.8ms
  pk_delay          0us       34.1ms       13.4ms
  av_delay          0us        2.2ms        833us
  sp_delay          0us         18us         20us
  backlog            0b           0b           0b
  pkts                0       302794          359
  bytes               0     47128796        56904
  way_inds            0         7017            0
  way_miss            0         4257           33
  way_cols            0            0            0
  drops               0         4472            0
  marks               0            0            0
  ack_drop            0            0            0
  sp_flows            0            2            0
  bk_flows            0            2            0
  un_flows            0            0            0
  max_len             0         1470          471
  quantum           300          300          300

qdisc ingress ffff: dev pppoa-wan parent ffff:fff1 ---------------- 
 Sent 1231679586 bytes 907042 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc cake 8026: dev ifb4pppoa-wan root refcnt 2 bandwidth 14Mbit besteffort triple-isolate wash split-gso rtt 100.0ms atm overhead 44 
 Sent 1229003406 bytes 905124 pkt (dropped 1918, overlimits 1583373 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 275968b of 4Mb
 capacity estimate: 14Mbit
 min/max network layer size:           30 /    1470
 min/max overhead-adjusted size:      106 /    1696
 average network hdr offset:            0

                  Tin 0
  thresh         14Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay        2.0ms
  av_delay        432us
  sp_delay         16us
  backlog            0b
  pkts           907042
  bytes      1231679586
  way_inds        67661
  way_miss         4317
  way_cols            0
  drops            1918
  marks               0
  ack_drop            0
  sp_flows            3
  bk_flows            2
  un_flows            0
  max_len          1470
  quantum           427

This is a screenshot with the previous configuration and with DSLReports configured as described here by @moeller0 ([SQM/QOS] Recommended settings for the dslreports speedtest (bufferbloat testing)).

The strange thing is that a neighbour of mine (same street), with a nominal 8 Mbit/s connection (about 5 Mbit/s in practice) and a different ISP, can watch the same content perfectly well (even on two devices :frowning: ).
What could the problem be?
Thank you

Is there any WLAN link on your path? Temporarily replace it with a wired connection if possible.

Monitor the DSL link usage to check if you can see any reduced throughput. How long does it usually take until the video quality drops?
I would suggest running a speed test for a longer time to reproduce the issue. You could also double-check with a different video source to rule out a bottleneck at the server end.

Does your IPTV stream use TCP? If so, cake's ack-filter on egress could help. Since your down/up ratio of 18.2 (16547/909) is very asymmetric, the ACKs need a significant share of the upstream capacity, competing with other uploads, which can reduce the IPTV download throughput. Also make sure the TV/media player is the only device putting load on the DSL line, at least during the test.
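A rough back-of-the-envelope (assuming ~1478-byte segments and one ACK per two segments): 14 Mbit/s of download is about 1180 segments/s, hence roughly 590 ACKs/s upstream; on your PPPoA link each ACK occupies two ATM cells, i.e. 106 bytes (matching the "min overhead-adjusted size: 106" in your cake stats), so ACKs alone would need about 590 * 106 * 8 ≈ 500 kbit/s, two thirds of your 750 kbit/s upstream shaper, before any real upload traffic.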

No, there is no wireless link in the path. I have the modem/router Home Hub 5A (gigabit Ethernet) -> Netgear GS605v4 switch (gigabit Ethernet) -> Samsung smart TV (fast Ethernet) or smart box (fast Ethernet).
The video quality itself is good, but playback is slow and not in real time, e.g. it sometimes freezes for a second.
To perform the test, I used DSLReports with a 60-second duration (the maximum allowed). Do you know better software or a website?
I asked the content provider for more information, even though I do not think it is the problem, since it works well for my neighbour.
Then I installed tcpdump, captured the traffic via tcpdump -i pppoa-wan -v -U -s0 -w capture.cap and analysed it with Wireshark. To my big surprise, I found that the IPTV uses TCP, so it is a point-to-point connection instead of a smarter multicast one (RTP over UDP).
The use of TCP explains why the video quality is good but slow, and as you suggested it may be related to ACK transmission: in Wireshark I saw many duplicate ACKs and many TCP out-of-order segments and reassembled PDUs.
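For reference, the same counts can be obtained from the capture above on the command line with tshark display filters, e.g.:

tshark -r capture.cap -Y 'tcp.analysis.duplicate_ack' | wc -l
tshark -r capture.cap -Y 'tcp.analysis.out_of_order' | wc -l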
How can I set up the ack-filter? I found this interesting paper about Piece of CAKE and OpenWrt. It shows that the ack-filter in aggressive mode produces a great benefit, and that DiffServ4 achieves a great improvement over the BestEffort option.
Do you know how to enable them? I only found this on the bufferbloat mailing list:
option eqdisc_opts 'nat dual-srchost ack-filter'
and this comprehensive man page.
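If I understand that mailing-list post correctly, something like this in /etc/config/sqm should enable it (an untested guess on my part; the advanced flags appear to be needed for these option strings to take effect):

config queue 'eth1'
	option qdisc_advanced '1'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost'
	option eqdisc_opts 'nat dual-srchost ack-filter'
	# ...the remaining options as in my config above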
Finally, I found that during the evening and at night, when the Internet connection is crowded, the speed drops a lot, and this could be the source of the problem.
Morning: [DSLReports chart]

Evening: [DSLReports charts]

Note: the evening and night upload charts have big spikes at the beginning and at the end of the test.

I agree this is the source of the problem. Check your ADSL link status during such a time of congestion. Is the reported data rate lower? Do you see a higher error rate?

Yes, that should be right, add ack-filter to option eqdisc_opts.
For 4.83Mb/s down, 0.112 Mb/s up, the ack filter might help a bit. But at 1.83/0.009, IPTV over TCP is unlikely to work at all. Even web browsing will be very slow.


I made some further tests using the following settings:

ingress: nat dual-dsthost diffserv4 pppoa-vcmux
egress: nat dual-srchost diffserv4 ack-filter-aggressive pppoa-vcmux

First, according to the paper, diffserv4 is better than diffserv3 since it also takes a video tin into account, while ack-filter-aggressive is better than ack-filter.
Moreover, according to the documentation here, nat together with dual-dsthost and dual-srchost is better suited than triple-isolate. Finally, I also added pppoa-vcmux.
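After restarting SQM, the new keywords can be checked on the cake qdiscs; I expect the line for pppoa-wan to now show diffserv4, dual-srchost, ack-filter-aggressive and the overhead implied by pppoa-vcmux (10 bytes on ATM, if I read the man page correctly):

/etc/init.d/sqm restart
tc -d qdisc show dev pppoa-wan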
Some benchmarks seem to give better results.
Morning: [benchmark chart]

Afternoon: [benchmark chart]

I will report how it works during the evening and weekend.

In the meantime, I'm also trying to use multiple WAN connections with load balancing: the main ADSL2 connection (wan) and a backup 4G connection (wwan, via WiFi tethering from a smartphone) with a limited amount of monthly traffic.
I successfully installed the mwan3 package and set up load balancing between both connections according to the documentation.
I would like to use only the main wan for all traffic, except for traffic to/from a specific fixed IP belonging to the smart TV.

cat /etc/config/mwan3 

config rule 'IPTV_out'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'
	option src_ip '192.168.1.200'

config rule 'IPTV_in'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'
	option dest_ip '192.168.1.200'

config rule 'default_rule'
	option dest_ip '0.0.0.0/0'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_only'

config policy 'wan_only'
	list use_member 'wan_m1_w3'
	list use_member 'wan6_m1_w3'

config policy 'balanced'
	option last_resort 'unreachable'
	list use_member 'wan_m1_w3'
	list use_member 'wan6_m1_w3'
	list use_member 'wwan_m1_w2'

config policy 'wan_wwan'
	option last_resort 'unreachable'
	list use_member 'wan_m1_w3'
	list use_member 'wan6_m1_w3'
	list use_member 'wwan_m2_w3'

config member 'wan_m1_w3'
	option interface 'wan'
	option metric '1'
	option weight '3'

config member 'wan6_m1_w3'
	option interface 'wan6'
	option metric '1'
	option weight '3'

config globals 'globals'
	option mmx_mask '0x3F00'
	option local_source 'lan'

config interface 'wan'
	option enabled '1'
	option family 'ipv4'
	option reliability '2'
	option count '1'
	option timeout '2'
	option interval '5'
	option down '3'
	option up '8'
	option initial_state 'online'
	list track_ip '1.1.1.1'
	list track_ip '208.67.222.222'
	list track_ip '208.67.220.220'
	option track_method 'ping'
	option size '56'
	option check_quality '0'
	option failure_interval '5'
	option recovery_interval '5'
	option flush_conntrack 'never'

config interface 'wan6'
	option enabled '0'
	list track_ip '2001:4860:4860::8844'
	list track_ip '2001:4860:4860::8888'
	list track_ip '2620:0:ccd::2'
	list track_ip '2620:0:ccc::2'
	option family 'ipv6'
	option reliability '2'
	option count '1'
	option timeout '2'
	option interval '5'
	option down '3'
	option up '8'

config interface 'wwan'
	option enabled '1'
	option initial_state 'online'
	option family 'ipv4'
	option track_method 'ping'
	option count '1'
	option size '56'
	option check_quality '0'
	option timeout '2'
	option interval '5'
	option failure_interval '5'
	option recovery_interval '5'
	option flush_conntrack 'never'
	option down '3'
	option up '8'
	list track_ip '1.1.1.1'
	list track_ip '208.67.222.222'
	list track_ip '208.67.220.220'
	option reliability '2'

config member 'wwan_m1_w3'
	option interface 'wwan'
	option metric '1'
	option weight '3'

config member 'wwan_m1_w2'
	option interface 'wwan'
	option metric '1'
	option weight '2'

config member 'wwan_m2_w3'
	option interface 'wwan'
	option metric '2'
	option weight '3'

In this way, the smart TV should reach the Internet if either wan or wwan is up. However, the smart TV only works when wan is up.

mwan3 status
Interface status:
 interface wan is online and tracking is active
 interface wan6 is unknown and tracking is down
 interface wwan is online and tracking is active

Current ipv4 policies:
balanced:
 wwan (40%)
 wan (60%)

wan_only:
 wan (100%)

wan_wwan:
 wan (100%)


Current ipv6 policies:
balanced:
 unreachable

wan_only:
 unreachable

wan_wwan:
 unreachable


Directly connected ipv4 networks:
 127.0.0.0/8
 192.168.43.74
 224.0.0.0/3
 127.0.0.0
 192.168.1.1
 192.168.43.0/24
 94.38.203.143
 192.168.43.0
 192.168.43.255
 213.205.53.51
 127.255.255.255
 192.168.1.0/24
 127.0.0.1
 192.168.1.255
 192.168.1.0

Directly connected ipv6 networks:
 fe80::/64
 fd02:28a1:4744::/64

Active ipv4 user rules:
    0     0 - balanced  all  --  *      *       192.168.1.200        0.0.0.0/0            
    0     0 - balanced  all  --  *      *       0.0.0.0/0            192.168.1.200        
    7   500 - wan_only  all  --  *      *       0.0.0.0/0            0.0.0.0/0            

Active ipv6 user rules:
   23  1868 - wan_only  all      *      *       ::/0                 ::/0

There really is no objectively "better" diffserv scheme, only ones that fit your use case better or worse ;), and the same goes for the ack-filter. But since you tested that these work better for you, you can safely ignore my comment (which is mainly meant for others reading your post).

For benchmarking, the dual-xxxhost modes seem better suited, since their behaviour under typical benchmark loads is easier to understand and predict than triple-isolate's. For more realistic traffic patterns the differences between the two are less clear, but personally I also prefer the conceptual clarity of the dual-xxxhost keywords.

I made more tests during the most crowded time of the day (evening/night).




Unfortunately, at times both the upload and download speeds are far too low.
Moreover, all the QoS countermeasures I enabled cannot solve the problem, even though they improve the situation.
So the last possibility before giving up is to try load balancing.
I made some tests with this too, but I'm having two problems:

  1. Load balancing between ADSL (wan) and 4G (wwan) works only when both are online. In theory, I expected it to keep working even if one of the two goes offline, with OpenWrt moving the load to the online link accordingly.
  2. I would like to use load balancing only for one specific IP and the ADSL (wan) for the rest of the network, but it does not seem to work.

Do you have any suggestions?

Did you check the ADSL link status during this time?

Yes, it was more or less as usual. OpenWrt reported an ADSL data rate of 16.517 Mb/s / 912 Kb/s and a max. attainable data rate (ATTNDR) of 16.460 Mb/s / 909 Kb/s.

This seems to indicate congestion upstream of your access link, maybe at the uplink of the DSLAM/MSAN. Could you try running mtr against, say, the website of a local university (to get a reasonably close, well-connected ICMP responder that is not colocated with your ISP) during the peak hours in the evening? The hallmark of a congested link is that all RTTs after that hop show an increased RTT compared to off-peak hours. Please note that reading traceroute/mtr output is not as easy as it seems initially (see https://www.nanog.org/meetings/nanog47/presentations/Sunday/RAS_Traceroute_N47_Sun.pdf for more details).
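For example (hypothetical target hostname, typical options; run it once off-peak and once during the peak for comparison):

mtr --report --report-wide --report-cycles 100 www.example-university.edu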

I don't know the mwan3 package, but this is easily doable with policy routing in Linux. The more complicated part is if you want the IPTV host to use ADSL unless it's too congested and only then fail over, because you need to detect the congestion.

Thank you for the material, I found it very interesting.
To use mtr as you suggested, I first found the IP addresses of the video source via tcpdump + Wireshark (as explained in my second post in this thread). Then I ran mtr against my "local" university (about 40 km away, since I live in the countryside; I also tried mtr against some companies that are closer (10-15 km) and have fibre connections, without any great difference in the results).
Finally, I ran mtr against the two most used IP addresses of the video source (one should be for accounting and one for the video stream), and the results are very similar to those for my local university. In particular, in every case (even with different IPs) the third and fourth hops show high Wrst and StDev values even though the average is normal.
Could this be the source of the problem?

University: [mtr screenshot]
Account: [mtr screenshot, Cloudflare]
Stream: [mtr screenshot, Worldstream]

Note: I have to repeat the test during the most crowded time at the weekend.

P.S. After the test my IP was banned (HTTP 403 Forbidden) and I cannot watch the IPTV any more, even though I can still reach the server via ping, traceroute, etc. I have to use a VPN to access it. It is very strange behaviour...

From my understanding this is harmless; it mostly indicates that these hops are not optimised for responding to ICMP/UDP probes, which is quite typical behaviour for routers. If all Wrst RTTs after those hops were increased, that would be different...

Sure, do this; but I am pessimistic that this is the root cause of your problems...

You need to talk to the people managing the IPTV source, but it might indicate that the service did not like you probing it constantly and misdiagnosed your network debugging as a nefarious attack on its services...

According to the mwan3 documentation, it exploits the Linux kernel's policy routing. In particular (see the sketch after this list):

  1. mwan3 uses normal Linux policy routing to balance outgoing traffic over multiple WAN connections
  2. Linux outgoing network traffic load balancing is performed on a per-IP-connection basis
  3. As such, load balancing will help speed up multiple separate downloads, or traffic generated from a group of source PCs all accessing different sites, but it will not speed up a single download from one PC
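As far as I understand, under the hood this means per-connection firewall marks plus fwmark-based ip rules, roughly like the following (a simplified illustration with made-up table numbers, gateways and interface names, not the exact rules mwan3 generates):

# one routing table per WAN link
ip route add default dev pppoa-wan table 1
ip route add default via 192.168.43.1 dev wlan1 table 2
# the connection's fwmark (under mmx_mask 0x3F00, as in my config) selects the table
ip rule add fwmark 0x100/0x3F00 lookup 1
ip rule add fwmark 0x200/0x3F00 lookup 2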

In the meantime, I solved my first problem, the load balancing between ADSL (wan) and 4G (wwan). Now I can randomly connect and disconnect the two connections and it works well. I also added more 4G connections (wwan1 and wwan2) and the "hand-over" works well.
However, I'm still struggling with the second problem, i.e. I cannot understand why load balancing for one specific IP does not work.

cat /etc/config/mwan3 

config rule 'IPTV_out'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'
	option src_ip '192.168.1.100'

config rule 'IPTV_in'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'
	option dest_ip '192.168.1.100'

config rule 'default_out'
	option proto 'all'
	option sticky '0'
	option src_ip '0.0.0.0/0'
	option use_policy 'wan_only'

config rule 'default_in'
	option dest_ip '0.0.0.0/0'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_only'

These rules should work as described in the documentation (matched from top to bottom). To understand the behaviour better, I set a fixed IP (192.168.1.100) on my laptop and discovered that balanced mode works well only when it is applied to the whole network (only rule 4). Rules 1 and 3 do not produce any effect and can be removed.

In the LuCI interface -> Status -> Load Balancing -> Details, I saw this:

Active ipv4 user rules:
   66  4636 - balanced  all  --  *      *       192.168.1.100        0.0.0.0/0            
    0     0 - balanced  all  --  *      *       0.0.0.0/0            192.168.1.100        
  718 45413 - wan_only  all  --  *      *       0.0.0.0/0            0.0.0.0/0            
    0     0 - wan_only  all  --  *      *       0.0.0.0/0            0.0.0.0/0

It is very strange: rules 1 and 3 show matched traffic, but in practice they do not work.
Could it be a bug? A wrong configuration?
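For what it's worth, the rules and marks that mwan3 actually installs can be inspected with standard tools:

ip rule show
iptables -t mangle -S | grep -i mwan3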

I triple-checked all the configuration (metric, gateway, conntrack for the wan interface, etc.), but I cannot understand why load balancing for one specific IP does not work.
@feckert @ptpt52 do you have any idea about this?
Thank you

I repeated the test during the weekend. The results are very similar. In addition, there is some packet loss.
University: [mtr screenshot]
Account: [mtr screenshot]
Stream: [mtr screenshot]

I do not think that about 2-4% packet loss could be the source of the problem.

Well, the Stream results look noticeably worse, especially the last two hops, compared to your off-peak test above, but that still seems somewhat acceptable to me. The small packet loss should not be a reason for concern, but you could use Wireshark during peak-hour streaming and try to see how many missing packets there are in the real traffic.

@erotavlas
This is a stupid question, but do you really need the *_in rules?
Is it not enough to use only *_out?
Conntrack knows the way back to the host that started the connection.

@feckert thank you very much for your reply.
I do not know exactly how mwan3 works; what I know comes from the official documentation here.
I would like to use the balanced policy only for a single device and the wan_only policy for the rest of the network.
However, I found that the *_in rules are useless (no traffic). Can you confirm this? If so, what is the purpose of *_in, i.e. of the destination address in the documentation:
dest_ip: Match traffic directed to the specified destination IP address

What about the flush_conntrack option under the mwan3 interface sections?
The documentation only talks about conntrack under the firewall settings (option conntrack '1').

To understand the behaviour better, I changed my network configuration substantially.
I used VLANs to separate the LAN, LAN_IPTV and WLAN traffic (as shown in the diagram).

[network diagram]
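For reference, the interfaces are set up along these lines (a simplified sketch; VLAN IDs and names are illustrative rather than my exact /etc/config/network):

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'

config interface 'lan_iptv'
	option ifname 'eth0.2'
	option proto 'static'
	option ipaddr '192.168.2.1'
	option netmask '255.255.255.0'

config interface 'wlan'
	option ifname 'wlan1'
	option proto 'static'
	option ipaddr '192.168.3.1'
	option netmask '255.255.255.0'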
My new mwan3 configuration:

cat /etc/config/mwan3 

config rule 'lan_iptv_out'
	option src_ip '192.168.2.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'

config rule 'lan_iptv_in'
	option dest_ip '192.168.2.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'

config rule 'wlan_out'
	option src_ip '192.168.3.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_only'

config rule 'wlan_in'
	option dest_ip '192.168.3.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_only'

config rule 'lan_out'
	option src_ip '192.168.1.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'

config rule 'lan_in'
	option dest_ip '192.168.1.0/24'
	option proto 'all'
	option sticky '0'
	option use_policy 'balanced'

In this scenario, the traffic of the WLAN subnet is completely isolated from the others. However, there is still a problem between the two VLAN subnets. In particular, with the setup reported above everything works well, while if I set balanced only for LAN_IPTV (and wan_only for LAN) it does not work.
This behaviour is not normal in my opinion. Why is it necessary to have the LAN balanced as well? This resembles the previous case without subnets and VLANs, in which I needed the whole network balanced instead of a single IP.
P.S.
The *_in rules are useless in this scenario as well.

Active ipv4 user rules:
    1    56 - balanced  all  --  *      *       192.168.2.0/24       0.0.0.0/0            
    0     0 - balanced  all  --  *      *       0.0.0.0/0            192.168.2.0/24       
    1   118 - wan_only  all  --  *      *       192.168.3.0/24       0.0.0.0/0            
    0     0 - wan_only  all  --  *      *       0.0.0.0/0            192.168.3.0/24       
  497 33772 - balanced  all  --  *      *       192.168.1.0/24       0.0.0.0/0            
    0     0 - balanced  all  --  *      *       0.0.0.0/0            192.168.1.0/24

However, I found that the *_in rules are useless (no traffic). Can you confirm this? If so, what is the purpose of *_in, i.e. of the destination address in the documentation:

I think those rules are useless. Only add a rule which matches the source address and then applies the right policy. The returned packets should find their way back to the right client.
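For your VLAN setup that would be something like this (a sketch based on your config above):

config rule 'lan_iptv'
	option src_ip '192.168.2.0/24'
	option proto 'all'
	option use_policy 'balanced'

config rule 'default'
	option src_ip '0.0.0.0/0'
	option proto 'all'
	option use_policy 'wan_only'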

What about the flush_conntrack option under the mwan3 interface sections?

This flushes the kernel's conntrack table on interface up/down events from netifd.

Please also set local_source to none; there are a lot of problems with this configuration. This was a faulty decision in the last release, and I am working on a solution.
See
https://github.com/TDT-AG/packages/tree/pr/20180918-net-mwan3-fix-router-traffic
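In config terms that would be (sketch, keeping your existing mask):

config globals 'globals'
	option mmx_mask '0x3F00'
	option local_source 'none'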

What about the firewall? Is the lan_iptv zone set up correctly?