OpenWrt, Linux kernel and sockets: how long does a write() call block on a full buffer?

fr483 · November 8, 2018, 1:58pm

Hello,

I have a question about how OpenWrt and the Linux kernel manage UDP sockets. I'm using OpenWrt 18.06.1 with kernel version 4.14.63.
I have a C application, cross-compiled for OpenWrt on the target hardware, that uses sockets to send UDP packets.

To send data over the socket I can use either sendto() or write(). I noticed that when I write to a full UDP send buffer (for instance, when I try to send more data than the wireless network can deliver), these calls block my application.
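For reference, a minimal sketch of the kind of sender described above, assuming a blocking IPv4 UDP socket on Linux (OpenWrt or desktop). The helper names `get_sndbuf()` and `send_burst()` are hypothetical, and the destination, payload size and count are placeholders. On a blocking socket, each `sendto()` sleeps while the send buffer is full and returns once the datagram has been queued.

```c
/* Minimal sketch of a blocking UDP sender (hypothetical helpers). */
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the send buffer size reported for fd, or -1 on error.
 * If SO_SNDBUF was set with setsockopt(), Linux reports twice the
 * requested value (see socket(7)). */
int get_sndbuf(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &optlen) < 0)
        return -1;
    return size;
}

/* Sends `count` datagrams of `len` bytes to dst. On a blocking socket,
 * sendto() sleeps whenever the send buffer is full. Returns the number
 * of datagrams sent, or -1 on the first error. */
long send_burst(int fd, const struct sockaddr_in *dst,
                const char *payload, size_t len, long count)
{
    assert(payload != NULL && len > 0);
    for (long i = 0; i < count; i++) {
        ssize_t n = sendto(fd, payload, len, 0,
                           (const struct sockaddr *)dst, sizeof(*dst));
        if (n < 0) {
            perror("sendto");
            return -1;
        }
    }
    return count;
}
```

The destination address can be filled in with `inet_pton()` from `<arpa/inet.h>`, e.g. for a placeholder receiver at `192.168.1.2:5001`.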

The question is: how long does the blocking last? As far as I have been able to observe, these calls seem to block for much longer than is needed to free space for a single packet, so the buffer gets emptied much further. This seems to influence how many packets are dropped in the kernel, before they reach the WNIC, when I try to offer a large amount of data.

Does this depend on the Linux kernel scheduler? Or does it depend on internal operations that move packets from the UDP buffer to the WNIC, perhaps only in blocks?
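To put numbers on the blocking time, one option is to timestamp each `write()` with `CLOCK_MONOTONIC` and read the send-queue occupancy just before and after it. This sketch assumes a UDP socket that has already been `connect()`ed (plain `write()` needs a default destination) and uses the Linux `TIOCOUTQ` ioctl (also known as `SIOCOUTQ`, documented in `udp(7)`). The helper names are hypothetical. As far as I know, for UDP this value is the memory charged to the socket, including per-packet overhead, not just payload bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/ioctl.h>   /* ioctl(), TIOCOUTQ (== SIOCOUTQ on Linux) */
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds between two CLOCK_MONOTONIC timestamps. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e3 +
           (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Bytes currently charged to the socket's send queue, or -1 on error. */
int queued_bytes(int fd)
{
    int outq = 0;
    if (ioctl(fd, TIOCOUTQ, &outq) < 0)
        return -1;
    return outq;
}

/* Wraps one write() on a connected UDP socket and reports how long it
 * took and the send-queue occupancy right before and right after it. */
ssize_t timed_write(int fd, const void *buf, size_t len,
                    double *blocked_ms, int *outq_before, int *outq_after)
{
    struct timespec t0, t1;
    assert(blocked_ms && outq_before && outq_after);
    *outq_before = queued_bytes(fd);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = write(fd, buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *outq_after = queued_bytes(fd);
    *blocked_ms = elapsed_ms(&t0, &t1);
    return n;
}
```

Logging `blocked_ms` together with `outq_before - outq_after` for each slow call would show roughly how far the queue drains during each block (the difference also includes the charge for the datagram just queued).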

Thank you very much in advance.

twinkleLED · November 8, 2018, 8:27pm

I can't answer your question directly, but I am curious.

Do you see the same behaviour if you run the same program on a Linux laptop?

fr483 · November 10, 2018, 4:48pm

I just tried this on a Linux laptop: the behaviour is actually the same.

I was also able to test this behaviour more thoroughly with the network measurement tool iPerf in UDP mode, running a client on one Linux laptop and a server on another Linux laptop connected to the same network, and offering a large amount of traffic (with -b 100M) in order to saturate the buffer.

The result is reported in the following plot, obtained with a modified iPerf version patched to print, at each iteration, the number of bytes in the UDP socket buffer together with iPerf's internal delay. If I understood it correctly, this delay is the gap that should be kept between packets to respect the user-specified bandwidth (for instance, 100 Mbit/s), and it depends on the previous loop time: a negative value means that the program should run x ms faster in the next iteration because the previous iteration was slow.

If I understood iPerf's algorithm properly, the delay value is reset when it crosses a certain threshold, which in my case was a loop time above 50 ms (that is, a delay below -50 ms).

[Plot: iPerf internal delay and bytes in the UDP socket buffer at each iteration]
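A hypothetical model of the pacing behaviour described above, written only to make that description concrete. It is not iPerf's actual code: the accumulation of the delay across iterations, the names, and the reset to zero are all assumptions. With a 1470 B payload at 100 Mbit/s the target gap is 1470 × 8 / 10^8 s ≈ 117.6 µs, so a single `write()` that blocks for tens of milliseconds drives the delay far below zero in one iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Target gap between datagrams in microseconds for a payload size and
 * offered rate (e.g. 1470 B at 100 Mbit/s is about 117.6 us). */
double target_interval_us(uint32_t payload_bytes, double rate_bps)
{
    assert(rate_bps > 0);
    return (double)payload_bytes * 8.0 * 1e6 / rate_bps;
}

/* Hypothetical pacing state (NOT iPerf's code). delay_us is how long to
 * wait before the next send; it goes negative after slow iterations. */
struct pacer {
    double delay_us;
    double reset_threshold_us;   /* e.g. -50000 for -50 ms */
};

/* Updates the delay after one iteration that took loop_us and returns
 * it. Below the threshold, the backlog is abandoned and the delay reset. */
double pacer_update(struct pacer *p, double interval_us, double loop_us)
{
    p->delay_us += interval_us - loop_us;
    if (p->delay_us < p->reset_threshold_us)
        p->delay_us = 0.0;
    return p->delay_us;
}
```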

The write() call seems to block long enough for about 159.64 kB to be freed, which is much more than a single packet (1470 B of payload, as set by iPerf).
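As a quick sanity check of that figure, assuming kB means 1000 bytes and counting payload only: 159,640 B / 1470 B ≈ 108.6, i.e. the equivalent of about 108 full datagrams, or about 12.8 ms of traffic at the offered 100 Mbit/s. If the plotted value includes per-packet kernel overhead (as `SIOCOUTQ` does for UDP, in my understanding), the actual number of datagrams is lower. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole payloads of `payload` bytes contained in `bytes`. */
size_t whole_packets(size_t bytes, size_t payload)
{
    assert(payload > 0);
    return bytes / payload;
}

/* Time in milliseconds needed to send `bytes` at `rate_bps` bit/s. */
double drain_time_ms(size_t bytes, double rate_bps)
{
    assert(rate_bps > 0);
    return (double)bytes * 8.0 * 1e3 / rate_bps;
}
```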

Moreover, if a socket send timeout is set, it seems to expire properly (for instance, with a 20 ms timeout the loop time never seems to exceed 20 ms by more than a small amount), but without returning any error, because by then the buffer can accommodate much more than a single packet (thanks to write() blocking the application for a fairly long time).
The behaviour seems to be exactly the same on OpenWrt.
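A sketch of setting such a send timeout, assuming the 20 ms value used in the test; `ms_to_timeval()`, `set_send_timeout()` and `send_with_timeout()` are hypothetical helpers. According to `socket(7)`, when `SO_SNDTIMEO` expires before any data is transferred, the call fails with `EAGAIN`/`EWOULDBLOCK`. My (unverified) reading of the 4.14 sources is that, once the timeout fires, the kernel checks once more whether the socket is below its send-buffer limit and, if a transmission completed meanwhile, simply queues the datagram, which would be consistent with the missing errors.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Converts milliseconds to the struct timeval used by SO_SNDTIMEO. */
struct timeval ms_to_timeval(long ms)
{
    assert(ms >= 0);
    struct timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return tv;
}

/* Sets a send timeout on fd. Returns 0 on success, -1 on error. */
int set_send_timeout(int fd, long ms)
{
    struct timeval tv = ms_to_timeval(ms);
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* One send attempt on a connected socket: returns the bytes sent, 0 if
 * the timeout expired with the buffer still full, -1 on other errors. */
ssize_t send_with_timeout(int fd, const void *buf, size_t len)
{
    assert(len > 0);
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```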

eduperez · November 10, 2018, 6:19pm

In that case, I think you will probably get better support in a kernel forum than in this one.

fr483 · November 10, 2018, 7:17pm

Thanks! I will also try asking in a Linux kernel forum; if I learn anything, I will report it here too.

robhancock · November 10, 2018, 11:07pm

Your expectation that buffer space should be freed one packet at a time is probably incorrect, since packets are transmitted in larger groups due to MPDU aggregation etc. Also, it would be less efficient to wake up the application so frequently just to refill the socket buffer while it still holds a significant amount of data to send.
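To make this concrete, here is a simplified model of the checks I believe `net/core/sock.c` performs for datagram sockets in kernels around 4.14 (worth verifying against the actual source): a sender may allocate a new packet while the memory charged to the socket (`sk_wmem_alloc`, which counts each packet's `truesize`) is below `sk_sndbuf`, but once it has gone to sleep, `sock_def_write_space()` only wakes it when at most half of `sk_sndbuf` is in use. If that is right, a blocked `write()` returns only after roughly half of the send buffer has drained, and completions arriving in batches can push it further. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model: a new datagram may be queued while wmem < sndbuf. */
bool can_queue(size_t wmem, size_t sndbuf)
{
    return wmem < sndbuf;
}

/* Model: a sleeping sender is woken only once 2 * wmem <= sndbuf. */
bool wakes_writer(size_t wmem, size_t sndbuf)
{
    return 2 * wmem <= sndbuf;
}

/* Starting from a blocked sender (wmem >= sndbuf), frees `chunk` bytes
 * per transmit completion (e.g. one aggregate) until the writer would be
 * woken. Returns the bytes freed while the sender was asleep. */
size_t freed_before_wakeup(size_t wmem, size_t sndbuf, size_t chunk)
{
    assert(chunk > 0 && !can_queue(wmem, sndbuf));
    size_t start = wmem;
    while (!wakes_writer(wmem, sndbuf))
        wmem = (wmem > chunk) ? wmem - chunk : 0;
    return start - wmem;
}
```

For example, with a 1000-byte buffer that drains in 300-byte completions, the model frees 600 bytes (1000 to 700 to 400) before the writer is woken, rather than the 300 needed for one more packet.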

fr483 · November 17, 2018, 6:59pm

Thank you for your reply! This makes complete sense. So, as far as I understand, the blocking time, together with how much of the buffer is freed in the meantime, depends on both the OS and the driver/WNIC (?), and it is typically longer than the time needed to free space for a single packet.
To get more detailed data, though, the kernel code probably has to be analyzed in more detail...