[Solved] I can't get firehol to work on latest LEDE trunk

For now I think we just look at traffic over each interface. Don't worry about qos tagging and priority etc., just how much traffic flows over each interface during your "lockout" period. That will tell us something about where the traffic is going and hence where the problem is.

When you run the fireqos test, it puts all that stuff to your screen. And that comes after the line resetting the firewall, so it's apparently not the firewall locking you out. So evidently you can receive data from the router. But when you try to access the router, the router doesn't respond. That seems strange...

Suppose you create a new connection to the router to see the LuCI page... The packet comes in on eth0.1 headed for 192.168.1.1, hits the bridge, and is delivered to the router at br-lan 192.168.1.1. The router then tries to send a response... The packet is sent from 192.168.1.1 via br-lan, the bridge looks up your MAC address, sees it's on eth0.1 and sends it... I don't see why you should be locked out. So the fact you are locked out is what we need to figure out.

EDIT: if you are right that the router can send you information, then it might work to connect to netdata and watch it, then in a separate ssh window run the test, and watch the netdata dashboard while you try to surf and do whatever. See which interfaces have traffic, how much traffic, etc. Is there some big spike in traffic over certain interfaces? Does something saturate the router with packets in a loop? Does something else strange happen, such as lots of packets over pppoe-wan but no packets over br-lan?

Even if you are locked out of netdata during that period, when router comes back you should be able to see what happened during lockout.

I think it's a firewall problem, because when I surf to a site I get "Error code 101 (net::ERR_CONNECTION_RESET)" or a timeout.
This is the netdata output for veth0 and veth1:

veth0
veth1

Note: I can access the internet after the 120 sec period.

Is that new, or just something you didn't notice before?

Ok, how about data on br-lan and on the interface your computer is connected via (is it eth0.1 wired LAN or is it one of the wlans?). Either way, what is the traffic on br-lan and on whatever physical interface you connect via?

EDIT:
When you run your test script, during the time that the computer is doing "sleep 120" or "sleep 60" it is normal that the ssh session seems "locked out" since it's just waiting for the sleep to end. During that time period though, can you access LuCI in browser? perhaps you are not locked out of the router at all but simply have no traffic on ssh because that is normal while we wait for sleep to finish?

Yeah, I don't know why I couldn't access my router this morning after the 60 sec period, but now I can access it after the 120 sec period.

My laptop is connected via wlan0, and my phone via wlan1.
br-lan traffic:

Yeah, the terminal looks locked out, and no, I can't access the LuCI web interface or the internet.

I think we need to add some commands at the end of the test file to see the routing status, like "ip route show", "ip route show table 100", etc.

Hmmm.. It is suspicious that br-lan is sending a megabit continuously. I'm not sure if that's due to netdata sending a bunch of data for updating its display, or not. Was that graph for the interval when you were locked out? We want to see traffic during lockout period. Was lockout period when those spikes occurred at beginning of graph? let's see both br-lan and wlan0 during lockout period. Perhaps the bridge is delivering traffic to wrong place.

Note the good stuff here: we're seeing plenty of traffic go to the veth0 and be delivered to veth1 from your first graphs. That means our routing table is working. What we're not seeing is the traffic then getting delivered to your computer via wlan0. This makes me think we have a bridging problem or a firewall problem, not a routing problem. One thing is that veth1 is not yet part of any firewall group. Perhaps create a new network QOS2 that contains veth1 and put it in the LAN firewall group. This might be our problem. Perhaps data is arriving at veth1 and being dropped rather than bridged.
EDIT: but this still doesn't make sense for why we can't access the router LuCI itself. That seems like a bridging problem. I'm thinking it's possible a loop is being created and this is why br-lan saturates at a megabit, since veth0 has a megabit cap on its default qos!

Yes, some commands at the end of the script to echo information to a file where we can see it later would be helpful. echo it to a file and then you can look at it when you get reconnected to router. Seeing the routing tables, seeing the output of ifconfig which includes summaries of rx packets and tx packets etc would be useful. Perhaps run ifconfig then sleep for 10 seconds then ifconfig again, both times output to your debug info file and then we can see which interfaces have traffic all at once.
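A minimal sketch of that before/after comparison, assuming the snapshots are taken from /proc/net/dev rather than from ifconfig (the same kernel counters, but easier to parse). The helper name counter_delta and the file paths in the comments are made up for illustration:

```shell
#!/bin/sh
# Sketch: per-interface packet deltas between two /proc/net/dev snapshots.
# counter_delta is a hypothetical helper name, not part of any tool.
counter_delta() {
    # $1 = earlier snapshot file, $2 = later snapshot file
    awk '
        FNR <= 2 { next }          # skip the two header lines of each file
        { sub(/:/, " ") }          # "eth0:123" becomes "eth0 123" and fields are re-split
        NR == FNR { rx[$1] = $3; tx[$1] = $11; next }   # first file: remember counters
        ($1 in rx) {               # second file: print the difference
            printf "%s rx_packets=%d tx_packets=%d\n", $1, $3 - rx[$1], $11 - tx[$1]
        }
    ' "$1" "$2"
}

# Possible use at the end of the test script (paths are assumptions):
#   cat /proc/net/dev > /root/dev-before.txt
#   sleep 10
#   cat /proc/net/dev > /root/dev-after.txt
#   counter_delta /root/dev-before.txt /root/dev-after.txt > /root/dev-delta.txt
```

An interface whose counters do not move during the lockout window is probably not on the path the packets are taking.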

We can't see anything while locked out.
I will prepare some useful commands to debug the process.
EDIT:
These are my debug commands, is that enough:

 ##debug commands
iptables -vL -t nat &> /root/iptables-nat.txt
iptables -vL -t filter &> /root/iptables-filter.txt
iptables -vL -t mangle &> /root/iptables-mangle.txt
iptables -vL -t raw &> /root/iptables-raw.txt
iptables -S  &> /root/iptables-S.txt
ip route show &> /root/iproute-show.txt
ip route show table 100 &> /root/iproute-show-table100.txt
ip route show table 254 &> /root/iproute-show-table254.txt
ip route show table 255 &> /root/iproute-show-table255.txt
ifconfig &> /root/ifconfig.txt
sleep 15;ifconfig &> /root/ifconfig-sleep15.txt
ip link show &> /root/iplink-show.txt

It certainly seems like a good start. Set up one window to look at netdata, one window to do google search, one window to reload some other site. Once those are set up, run the script, then during the 15 second window try to do the google search and reload the other site etc so we can see where traffic is going.

EDIT: wait, I don't think &> does what you want, you want

ifconfig > /root/ifconfig.txt &
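For reference, a hedged note on the redirection itself. In bash, `&>` sends both stdout and stderr to the file. In a strictly POSIX shell such as dash, `cmd &> file` is parsed as `cmd &` followed by `> file`, so the file ends up empty. Whether the BusyBox ash on the router accepts `&>` depends on how it was built, which is an assumption to check. A trailing `&` only puts the command in the background and does not capture stderr. The form below should work in any POSIX shell; save_output is a hypothetical helper name:

```shell
#!/bin/sh
# save_output is a hypothetical helper: run a command and store both its
# stdout and stderr in the given file, waiting for it to finish.
save_output() {
    out=$1
    shift
    "$@" > "$out" 2>&1
}

# Example (path taken from the debug commands in this thread):
#   save_output /root/ifconfig.txt ifconfig
```

Capturing stderr as well means a missing tool or a bad option shows up in the log file instead of vanishing.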

Is it OK now:

##debug commands
iptables -vL -t nat > /root/iptables-nat.txt &
iptables -vL -t filter > /root/iptables-filter.txt &
iptables -vL -t mangle > /root/iptables-mangle.txt &
iptables -vL -t raw > /root/iptables-raw.txt &
iptables -S  > /root/iptables-S.txt &
ip route show > /root/iproute-show.txt &
ip route show table 100 > /root/iproute-show-table100.txt &
ip route show table 254 > /root/iproute-show-table254.txt &
ip route show table 255 > /root/iproute-show-table255.txt &
ifconfig > /root/ifconfig.txt &
sleep 15;ifconfig > /root/ifconfig-sleep15.txt &
ip link show > /root/iplink-show.txt &
netstat -nat -n > /root/netstat-nat-n.txt &
netstat -nat -a > /root/netstat-nat-a.txt &
netstat -a > /root/netstat-a.txt &
netstat -pt > /root/netstat-pt.txt &

What about also adding these commands at the first line of the script, so we have the data from both before and after running the script? I will add "-b" (meaning "before") to each output file name.
I added the new qos interface.

It seems good. Give it a try

Great news:
I can access the router during the lockout period. This happened once I created QOS2 for veth1.
The log files contain useful, juicy info.
These screenshots are from during the lockout:
veth-0, veth-1, br-lan2
This is the link to the logs: https://userscloud.com/6tgl3xoopb5j

Yikes that link you posted sent me to porn spam!! Upload your logs direct to forum please.

Otherwise, very good news regarding still connected to router with qos2 interface. But do you have internet connection?

No internet. But how do I upload to the forum? I can only upload images to the forum. BTW, userscloud is not spam, it is a file host.

Yes but where I am somehow userscloud fills my screen with ads for prostitutes or the like. Seriously. Perhaps it is different for you. I see an upload link here at bottom of text entry box. Try to use that.

I can't upload here, but this is a direct link to the log files: http://fastserver.me/p/user/hisham2630/files/logs.zip

Ok, so it seems likely that we mainly have a firewall issue. With the qos2 interface you now have full access to the router but no internet? I'm wondering what the firewall is doing to internet packets, keeping them from coming to veth0 perhaps. I'm on my phone so I can't see your log files side by side etc. Can you summarize? Between the first ifconfig and the second one, how many more packets on TX and RX for each interface?

Another thought I had: perhaps you need to put veth1 into the LAN interface in the LEDE physical configuration rather than into its own QOS2 interface. That might affect what firewall rules LEDE generates. When we run ip link set veth1 master br-lan it adds veth1 to the bridge, but the firewall doesn't know this.

I removed the qos2 interface and added veth1 to lan.
I see there are more packets in the second ifconfig. I will try to run a new test.
I tried it, but still no internet. I can't even access the ISP IPTV site on 10.6.6.8.

Ok, I looked at the logs and found this:

Screenshot from 2017-12-04 13-29-27

Clearly some packets went across the pppoe-wan, in 15 seconds on the order of 300 packets, so 20 packets a second or more, yet only 185 were sent into veth0.

Your iptables-filter log seems to show packets were being forwarded to veth0...

I believe the remaining problem is somehow firewall related. That seems particularly likely because adding veth1 to the QOS2 or LAN interface improved the situation with respect to receiving packets from the router. Are you still able to see the router when veth1 is in the LAN bridge?

Unfortunately the LEDE firewall script makes things very easy for simple situations, but a bit opaque for more complex situations. It might be time for a tcpdump to see which packets went where.

tcpdump -i any -w /root/packets.pcap -c 1500

Putting that at the end of your fireqos script will make tcpdump capture 1500 packets from all interfaces and dump them to a raw file. You should try to surf during the "lockout" period. You will probably need to install tcpdump. You can then grab the file and look at it in wireshark or upload it for me to look at.
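One caveat, as far as I know: a capture taken with -i any is written in Linux "cooked" format, and the older version of that header does not record which interface each packet crossed, which is exactly what we want to learn. A possible alternative is one capture per interface, run in parallel. This is a sketch only: capture_all is a hypothetical helper, the interface names come from this thread, and the output directory and packet count are assumptions. Setting DRY_RUN just prints the commands:

```shell
#!/bin/sh
# capture_all is a hypothetical helper: start one tcpdump per interface in
# the background and wait for all of them.
capture_all() {
    dir=$1
    count=$2
    shift 2
    for ifc in "$@"; do
        # -n: no name lookups, -i: interface, -c: stop after N packets,
        # -w: write raw packets to a pcap file
        ${DRY_RUN:+echo} tcpdump -n -i "$ifc" -c "$count" -w "$dir/$ifc.pcap" &
    done
    wait    # returns once every capture has collected its packets
}

# Possible use at the end of the fireqos script:
#   capture_all /root 300 wlan0 br-lan veth0 veth1 pppoe-wan
```

Note that a capture on an interface that sees no traffic never reaches its packet count, so that tcpdump has to be killed by hand. An empty file for one interface is itself useful information.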

After reloading the firewall, I think it is a good idea to reset the MAC table on br-lan, flush the ARP cache, and enable STP in case the bridge is confused.

ip neigh flush all
brctl stp br-lan on
brctl setageing br-lan 30

I honestly don't expect that to help, but I don't think it will hurt, and if it does help it will tell us something useful.
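To check whether the bridge is delivering traffic to the wrong place, it may also help to see which bridge port has learned the laptop's MAC address. A small sketch, assuming the usual brctl showmacs column layout (port number, MAC address, is local, ageing timer); port_for_mac is a hypothetical helper and the MAC in the comment is a placeholder:

```shell
#!/bin/sh
# port_for_mac is a hypothetical helper: read `brctl showmacs <bridge>`
# output on stdin and print the port number where the given MAC was learned.
port_for_mac() {
    awk -v mac="$1" 'tolower($2) == tolower(mac) { print $1 }'
}

# Possible use (replace the placeholder with the laptop MAC):
#   brctl showmacs br-lan | port_for_mac aa:bb:cc:dd:ee:ff
#   brctl showstp br-lan      # maps port numbers to interface names
```

If the laptop's MAC shows up on the veth1 port instead of the wlan0 port, the bridge has learned the wrong location for it.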

I think this is the question we need to answer: When you try to load a web page, where do the packets go?

I expect that it goes

Wlan0 ----> br-lan ---routed---> pppoe-wan ---> internet

and then coming back....

Wlan0 <---- bridged on br-lan <----- veth1 <------ veth0 <---routed--- pppoe-wan <----- internet

And if you request a page off the router itself:

Wlan0 ----> br-lan -----> httpd

Wlan0 <---- br-lan <----- httpd

which is why it's confusing that adding veth1 into LAN zone would affect this... it suggests that we're getting

Wlan0 <---- bridged br-lan <---- veth1 <---- veth0 <----- httpd

and I don't know why this routing rule would be involved since our rule only involves packets from a network other than the 192.168.1.0/24 to that network... but that's not what's involved in this.......UNLESS you have LuCI listening on some interface other than LAN?

possibly what's going on is:

Wlan0 -> br-lan bridged to ---->veth1 -----> veth0 --- routed br-lan ->httpd
Wlan0 <--- br-lan <--- httpd

and then you wouldn't be able to talk to httpd without veth1 allowing output via firewall. But this is not the path we want things to take. Perhaps we need an ebtables rule to prevent stuff being sent out veth1 as it shouldn't ever be necessary, it's just a way to receive packets from the internet through a queue.
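A rough, untested sketch of what such ebtables rules might look like, assuming veth1 should only ever receive frames into the bridge and never have frames sent out of it. block_bridge_egress is a hypothetical helper name, and setting DRY_RUN prints the commands instead of running them:

```shell
#!/bin/sh
# block_bridge_egress is a hypothetical helper: drop every frame the bridge
# would send out of the given port, leaving ingress from that port untouched.
block_bridge_egress() {
    port=$1
    # Frames arriving on another bridge port that would be forwarded out of $port.
    ${DRY_RUN:+echo} ebtables -A FORWARD -o "$port" -j DROP
    # Frames originated by the router itself that would leave via $port.
    ${DRY_RUN:+echo} ebtables -A OUTPUT -o "$port" -j DROP
}

# Possible use after veth1 has been added to br-lan:
#   block_bridge_egress veth1
#   ebtables -L --Lc      # list rules with packet counters
```

If the counters on these rules climb during the lockout, that would support the theory that LAN traffic is being bridged out through veth1.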

RESULT: I really think we need that tcpdump and this will tell us all that we need to know to see what packets go where.

Hi, still the same: no internet access. Have a look at the data captured with tcpdump:
This is the link for the pcap file: https://ufile.io/y9hxb
This is another link: https://expirebox.com/download/38924e86cc57dd362efb93d1f79f5945.html