Here is something weird from your most recent tcpdump:
We see that you connect to 45.56.96.24 and send a GET request, and then we receive an ACK (packet 67, the 5th line) coming from source port 1, not port 80... huh? So your laptop retransmits, and we get another ACK from the wrong port! Then another retransmit, and another ACK from the wrong port!
Everything we get from the internet has the wrong source port!!!
Pretty sure this is our problem, but it does not seem like a normal kind of problem.
I now think the firewall is the issue. If we can stop the firewall from mangling the source port, we will have a working system!
EDIT: I notice that in the veth capture the source port is in fact still 80, and in the bridge capture as well. So somewhere between entering veth0 and leaving wlan1, the source port is being mangled!
EDIT AGAIN: this could be a firewall thing, but it could also be a kernel bug. Why should that source port ever change?
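One way to pin down exactly where the port changes is to capture the same flow on each hop (veth0, veth1, br-lan, wlan1) and compare only the source ports of packets coming from the server. Below is a minimal sketch, assuming tcpdump's default one-line IPv4 output with "-nn" (timestamp, "IP", then "addr.port > addr.port:"). The function name src_ports_from is hypothetical, and IPv6 or "-e"/"-v" output is not handled.

```shell
#!/bin/sh
# Print the source port of every packet sent by server IP $1, reading
# "tcpdump -nn" text output on stdin. Assumes the default one-line IPv4 format:
#   HH:MM:SS.frac IP <src-ip>.<src-port> > <dst-ip>.<dst-port>: ...
# (hypothetical helper; IPv6 and -e / -v output are not handled)
src_ports_from() {
    awk -v ip="$1" '
        $2 == "IP" {
            # "45.56.96.24.80" splits into four address octets plus the port
            n = split($3, a, ".")
            if (n == 5 && (a[1] "." a[2] "." a[3] "." a[4]) == ip) print a[5]
        }'
}
```

Usage, as root on the router while reloading the page on the laptop: "tcpdump -nn -l -c 20 -i veth0 host 45.56.96.24 | src_ports_from 45.56.96.24", then the same with "-i wlan1". If veth0 prints 80 and wlan1 prints 1, the rewrite happens somewhere between those two interfaces.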
Yes, it could be a NAT problem, and since we're not doing anything to the firewall here, it seems that some existing firewall rule is interfering, or some new rule is created when you create the QoS interface for veth0, or something like that. Perhaps remove the QoS interface, or change the LEDE settings for that interface somehow; anything that keeps the packets from being mangled. The whole thing will not work unless the packets are delivered with the right ports!
Thank you, hisham. Hopefully we will have success: packet captures suggest it is all working except that the packets hit wlan1 with an altered source port! If we can stop that, everything should work.
EDIT: here is the fireqos.conf script that I think should be used once you get the firewall to stop altering ports. It sets up the veth pair and the routes as we just did, and it classifies traffic, but it does not restart the firewall.
## set up a pair of veth devices to handle inbound and outbound traffic
ip link show veth0 >/dev/null 2>&1 || ip link add veth0 type veth peer name veth1  ## create the veth0/veth1 pair only if it doesn't exist yet
ip link set veth0 up
ip link set veth1 up
ip link set veth1 master br-lan ## place the veth1 into br-lan bridge
ip rule del priority 99 ## to be sure
ip rule del priority 100
ip route replace 192.168.1.0/24 dev br-lan table 99  ## "replace" so re-running the script doesn't fail
ip rule add from 192.168.1.0/24 to 192.168.1.0/24 table 99 priority 99
## non-local traffic to the LAN goes via table 100; LAN-to-LAN traffic
## was already matched by the priority 99 rule
ip route replace 192.168.1.1 dev br-lan table 100
ip route replace 192.168.1.0/24 dev veth0 table 100
ip rule add to 192.168.1.0/24 table 100 priority 100
## ipset for googlevideo etc.; it is filled by dnsmasq
ipset -exist create gvideostream4 hash:ip  ## -exist: don't fail if the set already exists
## set up some mangle rules... this is only an example
## clear any DSCP the ISP set on inbound packets, then mark the traffic we care about
iptables -t mangle -A PREROUTING -i pppoe-wan -j DSCP --set-dscp 0
iptables -t mangle -A PREROUTING -p udp --sport 5000:5500 -j DSCP --set-dscp 48  ## 48 = CS6
iptables -t mangle -A PREROUTING -p udp --dport 5000:5500 -j DSCP --set-dscp 48
iptables -t mangle -A PREROUTING -m set --match-set gvideostream4 src -j DSCP --set-dscp-class AF41
## etc... you need to mark similarly for google or facebook non-video traffic etc
## we send inbound packets from the internet through veth0 towards the br-lan bridge so this
## handles our inbound bandwidth, as output on a virtual ethernet device
interface veth0 lanin output rate 15500kbit qdisc fq_codel overhead 28
class group regular rate 1000kbit ceil 1000kbit
match tos 0x00
match tos 0x01
match tos 0x02
match tos 0x03
match dscp CS6
class highprio rate 300kbit ceil 500kbit
match dscp CS6
class default rate 100kbit
class torrent rate 100kbit
match dports 6881:6889 ### note this is a default range, please insert the range used by your actual torrent client
class group end
class group medprio rate 14000kbit ceil 14000kbit
match dscp AF41
match dscp AF21
class video rate 50%
match dscp AF41
class faststuff rate 20%
match dscp AF21
class group end
## traffic from the LAN gets sent via pppoe-wan so this handles our outbound traffic
interface pppoe-wan wanout output rate 5500kbit qdisc fq_codel overhead 28
class highprio rate 300kbit ceil 500kbit
match dscp CS6
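The numbers in the iptables part and the class names in the FireQOS part are two spellings of the same 6-bit DSCP field: "--set-dscp 48" is CS6, and the four "match tos 0x00" to "0x03" lines all carry DSCP 0 (best effort) and differ only in the 2 ECN bits at the bottom of the TOS byte (assuming FireQOS compares the whole byte). The small helpers below are hypothetical, written in plain POSIX sh arithmetic, and just make that conversion explicit: CSx = 8x, AFxy = 8x + 2y, and the TOS byte is the DSCP value shifted left by two, plus ECN.

```shell
#!/bin/sh
# DSCP code point for a class name (hypothetical helper):
#   CSx  = 8*x          e.g. CS6  = 48, the value set by "--set-dscp 48"
#   AFxy = 8*x + 2*y    e.g. AF41 = 34, AF21 = 18
#   EF   = 46
dscp_value() {
    case "$1" in
        CS[0-7]) echo $(( 8 * ${1#CS} )) ;;
        AF[1-4][1-3])
            xy=${1#AF}                          # e.g. "41"
            echo $(( 8 * ${xy%?} + 2 * ${xy#?} )) ;;
        EF) echo 46 ;;
        *) echo "unknown DSCP class: $1" >&2; return 1 ;;
    esac
}

# Split an 8-bit TOS byte (e.g. 0x03) into DSCP (upper 6 bits) and ECN (lower 2 bits).
# 0x00, 0x01, 0x02 and 0x03 all give dscp=0, i.e. best effort.
tos_split() {
    tos=$(( $1 ))
    echo "dscp=$(( tos >> 2 )) ecn=$(( tos & 3 ))"
}
```

For example, "for t in 0x00 0x01 0x02 0x03; do tos_split $t; done" shows DSCP 0 with ECN 0 to 3, and "tos_split 0x88" shows the byte an AF41-marked packet carries.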
You should be able to use this fireqos.conf by running fireqos normally, then adjusting your firewall for testing to see if you can make it work. If you need to get things back to normal, all that should be required is:
ip rule del priority 100
to remove the special rule, after which things should no longer be locked out. Then you can debug and try again with:
ip rule add to 192.168.1.0/24 table 100 priority 100
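When toggling that priority 100 rule during testing, it can help to check whether it is currently installed, and to make a test run undo itself so a broken setup can't keep the LAN locked out. A minimal sketch; the function names and the 120-second default are assumptions, and it expects "ip rule show" lines of the form "100:	from all to 192.168.1.0/24 lookup 100".

```shell
#!/bin/sh
# Succeeds if "ip rule show" output on stdin has a rule at priority $1.
# "^100:" does not match "1000:", so priorities are compared exactly.
rule_at_priority() {
    grep -q "^$1:"
}

# Install the LAN redirect rule, then remove it automatically after $1 seconds
# (default 120). Kill the printed background pid to keep the rule. Needs root.
try_redirect() {
    ip rule add to 192.168.1.0/24 table 100 priority 100
    ( sleep "${1:-120}"; ip rule del priority 100 ) &
    echo "rollback scheduled in ${1:-120}s (background pid $!)"
}
```

For example: "ip rule show | rule_at_priority 100 && echo active || echo inactive", or "try_redirect 60" to test for one minute before things go back to normal on their own.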
Hi, still no luck, no internet.
I played a lot with the firewall but had no luck.
But is this normal?
"Thu Dec 7 17:19:40 2017 kern.warn kernel: [ 4604.731926] MSSFIX(qos0): IN=pppoe-wan OUT=veth0 MAC= SRC=62.201.216.215 DST=192.168.1.156 LEN=52 TOS=0x08 PREC=0x60 TTL=57 ID=34513 PROTO=TCP SPT=443 DPT=17025 WINDOW=29200 RES=0x00 ACK SYN URGP=0"
I think veth0 is our problem.
The MSSFIX warning has to do with the fact that your pppoe-wan has a different MTU than veth0. I don't think it's a problem.
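For context, here is the usual arithmetic, assuming plain IPv4 and TCP headers with no options, and assuming the "MSSFIX(qos0)" prefix comes from the firewall logging its MSS-clamping rule for that zone (a LOG rule only records the packet, it does not drop it). PPPoE takes 8 bytes of the 1500-byte Ethernet payload, so pppoe-wan typically has MTU 1492 while veth0 defaults to 1500, and clamping lowers the MSS in SYN packets from 1460 to 1452. The helper name is hypothetical.

```shell
#!/bin/sh
# Largest TCP payload (MSS) that fits in one IPv4 packet on a link with MTU $1:
# subtract 20 bytes of IPv4 header and 20 bytes of TCP header (no options assumed).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# PPPoE inside a 1500-byte Ethernet payload costs 6 bytes PPPoE + 2 bytes PPP.
pppoe_mtu=$(( 1500 - 8 ))

echo "pppoe-wan: MTU $pppoe_mtu -> MSS $(mss_for_mtu "$pppoe_mtu")"
echo "veth0:     MTU 1500 -> MSS $(mss_for_mtu 1500)"
```

To use your real values instead of the typical ones, read them with "cat /sys/class/net/pppoe-wan/mtu /sys/class/net/veth0/mtu".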
Packet captures show that routing works and packets get where they're supposed to go, but the port numbers are changed, so the packets are useless and your laptop rejects them. To me this suggests either a firewall issue or a bug in some kernel component. The packet captures on veth0 and veth1 show the packets are fine there. It's only once they hit wlan1 that they all get their source port set to 1: not a random value, it seems to be always 1.
I think we should engage the rest of the LEDE community in a new topic about what might be causing that. I really do think that the setup we have here "works" in the sense that routing is correct, the QoS setup is great, and DSCP is being tagged correctly... it's all about this new question: why are ports being screwed with as the packets pass over br-lan into wlan1?
I'll post a new topic on this with details if you like. What is the make/model of your router? If there are driver issues it will be useful to know as much as possible about your hardware.
It is possible to do that, but one of two things will happen:
The problem is caused by a bug in the kernel bridge code, or by bridge requirements we aren't aware of, and so the problem will go away.
The problem is caused by something in your firewall, and it could get worse, because now we've got multiple separate networks with more firewall issues!
EDIT: also, you won't be able to have devices roam between wlan0 and wlan1. You'll have to give them separate SSIDs because they're on separate networks, so every time a device switches it will need a new DHCP lease for a new IP address... Basically this solution is not very good. But let's see; maybe someone knows exactly what is going on here and we can fix it easily.
Let's see if someone in the separate thread has an idea of what's going on. In the meantime you can put QoS on the incoming pppoe-wan interface using just a single default class with fq_codel, and you'll get some of the benefit of lower bufferbloat until we have a more comprehensive solution. Just comment out the "rule" and add an input interface definition for pppoe-wan like we had at the top of the thread.
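A sketch of that interim setup in fireqos.conf syntax (the file is sourced as a bash script). Assumptions: the veth and ip rule part of the full script is left out, the interface name "wanin" is made up, and the rates are copied from the script and should sit a little below your measured line speeds. With no classes under the input interface, my understanding is that FireQOS sends all of that traffic to its default class.

```shell
## Interim setup: no veth redirect, shape inbound traffic directly on pppoe-wan.
ip rule del priority 100 2>/dev/null   ## make sure no leftover LAN redirect rule is active
## ip rule add to 192.168.1.0/24 table 100 priority 100   ## commented out

## inbound: single default class with fq_codel (FireQOS handles input shaping itself)
interface pppoe-wan wanin input rate 15500kbit qdisc fq_codel overhead 28

## outbound: unchanged from the full script
interface pppoe-wan wanout output rate 5500kbit qdisc fq_codel overhead 28
  class highprio rate 300kbit ceil 500kbit
    match dscp CS6
```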
A strange thing happens:
When locked out, I tried to ping youtube.com from Windows and also from my PC, and I get "request timeout".
When I run "ip rule del to 192.168.1.0/24 table 100 priority 100", the internet becomes available and I visit youtube.com. Then I run "ip rule add to 192.168.1.0/24 table 100 priority 100" and the internet is gone: if I ping youtube.com from my PC or the router it pings normally, but I can't open youtube.com in the browser. If I ping, let's say, fastserver.me, the ping doesn't work ("request timeout").
This happens during the lockout.
If you still have some ideas to try, please tell me.
Hmm... well, one thing we know is that TCP doesn't fail because packets don't get through; it fails because the packets are mangled, so they are ignored. The mangling is that ports are changed. ICMP has no ports to change, so this might be why ping works. But then, why does the fastserver.me ping not work? It seems more like a firewall issue.
Can you copy and paste the output of:
iptables -t nat -L
It kind of seems like you're natting something that doesn't want to be natted.
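Since the full "-L" listing is long, a quicker first look is to pull out only the nat rules whose targets can rewrite addresses or ports (MASQUERADE, SNAT, DNAT, REDIRECT, NETMAP) from "iptables-save -t nat". A small sketch; the function name is hypothetical, and it only greps the saved rule text, so it won't reveal rewriting that happens outside iptables.

```shell
#!/bin/sh
# Print nat-table rules whose target can rewrite addresses and/or ports.
# Input: "iptables-save -t nat" output on stdin.
nat_rewriters() {
    grep -E -- '-j (MASQUERADE|SNAT|DNAT|REDIRECT|NETMAP)( |$)'
}
```

Usage: "iptables-save -t nat | nat_rewriters". Rules that mention "-o br-lan" or "-o veth0", or that carry "--to-ports" or a port in "--to-source", would be the first suspects for the changed source port.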
See this (posted in https://github.com/firehol/firehol/wiki/FireQOS-Use-Scenarios):
"Use the act_connmark kernel module. This has been developed by the openwrt devs and allows using MARKs on inbound traffic. To use it in FireQOS just add this line at the top of fireqos.conf:"
FIREQOS_CONNMARK_RESTORE="act_connmark"
My impression was that the act_connmark module was OpenWrt-only and out of date for modern kernels, but if it is available and works, give it a try. You will have to use firewall marks rather than DSCP, but you should still set DSCP for your Wi-Fi WMM queues.
Instead of bridging veth1, set a routing rule so all packets coming in on pppoe-wan are routed to veth0. Place veth0 in the qos0 network and veth1 in the qos1 network, and put all of them in the wan zone. I'm on my phone so I can't explain the details now, but do you understand what I mean?