TFTP get file transfer from server fail some time

I am trying to use tftp under OpenWrt 18.06.1 on x86_64 platform, to fetch file from a remote server from the lan interface.
The tftp tool is the one from busybox.
The file is quite huge about 70MB.

To be clear I am using this command :

tftp -g -l /tmp/bsa.bin -r file.img 192.168.93.254

In this case 192.168.93.254 is a server.
The problem is sometime the tftp is failing.

For testing this problem I use the following script and I included the logs:

root@OpenWrt:~# for i in 1 2 3 4 5 6 7 8 9 10
> do
> echo "#### loop $i ####" 
> rm -f /tmp/bsa.bin;date;tftp -g -l /tmp/bsa.bin -r file.img 192.168.93.254;echo $?;date
> done

#### loop 1 ####
Tue Mar 19 15:34:11 UTC 2019
0
Tue Mar 19 15:35:31 UTC 2019
#### loop 2 ####
Tue Mar 19 15:35:31 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:36:39 UTC 2019
#### loop 3 ####
Tue Mar 19 15:36:39 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:37:42 UTC 2019
#### loop 4 ####
Tue Mar 19 15:37:42 UTC 2019
0
Tue Mar 19 15:39:03 UTC 2019
#### loop 5 ####
Tue Mar 19 15:39:03 UTC 2019
0
Tue Mar 19 15:40:21 UTC 2019
#### loop 6 ####
Tue Mar 19 15:40:21 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:40:24 UTC 2019
#### loop 7 ####
Tue Mar 19 15:40:24 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:40:37 UTC 2019
#### loop 8 ####
Tue Mar 19 15:40:37 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:41:02 UTC 2019
#### loop 9 ####
Tue Mar 19 15:41:02 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:41:15 UTC 2019
#### loop 10 ####
Tue Mar 19 15:41:15 UTC 2019
tftp: sendto: Operation not permitted
1
Tue Mar 19 15:41:40 UTC 2019

when it return 0 the download is passed, when it is 1 is failing.
I used tcpdump to check out what is happening when the tftp transfer failed.
This is below a tcpdum ptrace :

13:12:57.993439 IP 192.168.93.1.47075 > 192.168.93.254.42711: UDP, length 4                                                          
13:12:57.993721 IP 192.168.93.254.42711 > 192.168.93.1.47075: UDP, length 516
13:12:57.993769 IP 192.168.93.1.47075 > 192.168.93.254.42711: UDP, length 4  
13:12:57.994044 IP 192.168.93.254.42711 > 192.168.93.1.47075: UDP, length 516
13:12:57.999113 IP 192.168.93.1.22 > 192.168.93.254.53072: Flags [P.], seq 737:1137, ack 192, win 294, options [nop,nop,TS val 1162783648 ecr 105089867], length 400
13:12:57.999814 IP 192.168.93.254.53072 > 192.168.93.1.22: Flags [.], ack 1137, win 510, options [nop,nop,TS val 105090868 ecr 1162783648], length 0
13:12:58.000050 IP 192.168.93.1.58858 > 192.168.93.254.69:  67 RRQ "file.img" octet
13:12:58.006707 IP 192.168.93.254.49097 > 192.168.93.1.58858: UDP, length 516
13:12:58.006900 IP 192.168.93.1.58858 > 192.168.93.254.49097: UDP, length 4  
13:12:58.007973 IP 192.168.93.254.49097 > 192.168.93.1.58858: UDP, length 516
13:12:58.008128 IP 192.168.93.1.58858 > 192.168.93.254.49097: UDP, length 4  
13:12:58.008836 IP 192.168.93.254.49097 > 192.168.93.1.58858: UDP, length 516

In this case the 192.168.93.1 is the openwrt system and 192.168.93.254 is the tftp server.
On the 4th line of the log the server send a 516 UDP packet but the openWrt never acknowledge the packet.
It is the consequence of the "tftp: sendto: Operation not permitted" failure message.

FYI,

  1. when I stop the firewall, no problem happened.
  2. when I stop the wan interface by "ifdown wan" and no problem happened too.
    The wan is a 4G modem in ncm protocol.
  3. When the tftp on the lan is failing the 4G interface is trying to connect to the 4G network without success because the APN and pincode are not correct.

It is very bizarre.

I don't really know when I need to look for.

On the same system, I used " CHAOS CALMER" using the 4G modem and so on, and the TFTP transfer works fine.

I used also curl with tftp support and the result is the same with the same error.

Thanks in advance for your help
Benoit

What I noticed also is when APN and pincode are correct and the 4G network is connected , the tftp transfer issue does not happened too.

tftp fail on large files happens....

gut feeling is that some dhcp client state logic in causing interrupt delay....

Yep, you are right.
Due to the fact that tftp use UDP packet with small size it could happen to fail tftp transfer.
But with the same target and different OpenWrt release I did not have the same behaviour.
I am going probably to check the packages differences I probably added or removed.

you mean....

the other releases don't have

lan is failing the 4G interface is trying to connect to the 4G network
fetch file from a remote server

this issue?

are you saying that the remote connection is over another wan link? your aware that interface state changes trigger the firewall? ( i think? wonder how the other releases did it? ).... any other addons?

I have a CHAOS CALMER release that does not present the tftp transfer issue.
I ported most of the feature needed for a product I had in the CHAOS CALMER to the 18.06.1 and this issue appeared.

Between the CHAOS CALMER release and the 18.06.1, the kernel release is different and other stuff, but I added the wifi interface support.

Don't really know what could happen in my configuration which is really simple.

I think it's more at an "interface handling" subsystem level..... beyond basic configurations anyway....

it's a huge version jump...... 4g modems wouldn't go through a high level of testing me thinks.....

Well! Not sure.
When the firewall is disable this problem does not happen.
So my guess is that it is probably related to the iptable.

what's in this file on your chaos chalmer?

cat /etc/hotplug.d/iface/20-firewall

This is the CHAOS CALMER contents of /etc/hotplug.d/iface/20-firewall

#!/bin/sh

[ "$ACTION" = ifup -o "$ACTION" = ifupdate ] || exit 0
[ "$ACTION" = ifupdate -a -z "$IFUPDATE_ADDRESSES" -a -z "$IFUPDATE_DATA" ] && exit 0

/etc/init.d/firewall enabled || exit 0

fw3 -q network "$INTERFACE" >/dev/null || exit 0

logger -t firewall "Reloading firewall due to $ACTION of $INTERFACE ($DEVICE)"
fw3 -q reload

identical....

not the firewall ( specifically )