Tuning TCP parameters in LEDE for maximum performance

I have a router running LEDE (v17.01) with kernel 4.4.61. The router is connected directly to a laptop with a 1 Gbit Ethernet cable. To get maximum throughput, I tuned the TCP parameters on both sides:

# Raise the socket buffer ceilings to 32 MB
sysctl -w net.core.rmem_max=33554432
sysctl -w net.core.wmem_max=33554432
# min / default / max for the TCP receive and send buffers
sysctl -w net.ipv4.tcp_rmem='4096 87380 33554432'
sysctl -w net.ipv4.tcp_wmem='4096 16384 33554432'
# Lengthen the transmit queue on the Ethernet interface
ifconfig eth0 txqueuelen 100000

I run iperf in server mode on the router and start iperf in client mode on the laptop. I can reach around 950 Mbps, and at the same time the iperf output shows some retries, which is an indication of packet drops. However, since the RTT is very small, this does not matter much, because TCP can recover quickly. Next I add artificial delay with the netem tool on the outgoing traffic from my laptop towards the router, and I make sure that the netem buffer is very large, around 100000 packets. When I run an iperf session now, I cannot reach more than 75 Mbps with a delay of 140 ms. The iperf output reports a very small window size and, from time to time, a lot of packet drops (high retries).
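For reference, the delay step described above can be sketched roughly like this (assuming eth0 is the laptop's wired interface; adjust to your setup):

```shell
# Add 140 ms of egress delay with a large queue limit so netem
# itself does not drop packets.
tc qdisc add dev eth0 root netem delay 140ms limit 100000

# Check the qdisc and its drop counters, then remove it when done:
tc -s qdisc show dev eth0
tc qdisc del dev eth0 root
```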

So my question: what could be the source of packet drops in my setup, and how can I find the bottleneck? As mentioned above, I increased both the TCP buffer sizes and the Ethernet interface queue lengths on both sides.
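One sanity check on the buffer sizes: at 140 ms RTT, a single flow needs a window of roughly the bandwidth-delay product to fill the link. A quick back-of-the-envelope calculation, using the numbers from the experiment above:

```shell
# Bandwidth-delay product: bytes in flight needed to sustain
# 950 Mbit/s over a 140 ms RTT.
RATE_BPS=950000000   # 950 Mbit/s
RTT_MS=140
BDP_BYTES=$(( RATE_BPS / 8 * RTT_MS / 1000 ))
echo "$BDP_BYTES"    # about 16.6 MB, well within the 32 MB tcp_rmem/tcp_wmem max
```

So the configured 32 MB maxima should, in principle, be enough for the window to grow to line rate.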

I looked at the statistics of the eth0 interface in /sys/class/net/eth0/statistics/ but could not find anything strange. In addition, the netstat tool in LEDE does not give much information about TCP sockets.
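Where BusyBox netstat is too limited, the kernel's own counters can still point at the drop source; these procfs files come from the stock kernel, so they should be available on LEDE as well:

```shell
# Basic TCP counters (RetransSegs, InSegs, OutSegs, ...) straight from the kernel:
grep '^Tcp:' /proc/net/snmp
# Extended counters (PruneCalled, TCPLostRetransmit, TCPTimeouts, ...):
grep '^TcpExt:' /proc/net/netstat
```

Sampling these before and after an iperf run and diffing the retransmit/prune counters shows whether the drops happen inside the TCP stack or further down.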

Note: I repeated the experiment with the router replaced by a laptop, and I was able to reach 900 Mbps even with the high delay. So the bottleneck is on the router side.

txqueuelen is useless here, by the way. LEDE uses fq_codel, not pfifo_fast.

I am aware that fq_codel is the default qdisc. I replaced it with plain pfifo and it made no difference:

tc -s qdisc show dev eth0
qdisc pfifo 800a: root refcnt 2 limit 100000p
 Sent 7517766 bytes 113726 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
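For reference, the swap from fq_codel to a plain FIFO shown above can be done with something like:

```shell
# Replace the root qdisc with a plain FIFO holding up to 100000 packets.
tc qdisc replace dev eth0 root pfifo limit 100000

# Restore the LEDE default afterwards:
tc qdisc replace dev eth0 root fq_codel
```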

I believe I heard that netem should only be used on a dedicated machine:

endhost ---> netem delay host --> router
instead of:
endhost(netem) --> router

That advice may be dated by now, but I would still treat netem with caution.

BTW, why do you want to use the router as an endpoint? Why not:

laptop --> netem host --> router --> second laptop?

Well, I thought the same about using netem on a dedicated host. But as I mentioned, the method works perfectly when I replace the router with a laptop, and in that test netem also ran on the end host.

Actually, I am not using the router as the endpoint; after the router there is a wireless segment. I was trying to understand the cause of the drop in TCP throughput, so I split the connection into two parts: the wired part, which gives me the problem, and the wireless part.