Python3 ctypes nanosleep broken with musl?

s_2 · November 19, 2024, 10:53pm

Hi everyone,

I'm not sure if this belongs here or in the developer's section, but technically it's just from a user's perspective (did not compile anything yet).

To add a bit of context, I'm trying to use an FTDI RS485 adapter to control DMX stage lights using Python3 and pydmx (which uses pylibftdi installed via pip, which relies on libftdi1 etc.). The same code that runs perfectly on my Ubuntu system produces no output at all (the FTDI TX LED does not light up) on OpenWrt 23.05.5 on a ramips mt7621 device.

I eventually tracked it down to the nanosleep function called from the "FTDI driver" which is located here. The condensed version of that code to reproduce is:

from ctypes import cdll, c_long, byref, Structure

_LIBC = cdll.LoadLibrary("libc.so.6")

class timespec(Structure):
	_fields_ = [("tv_sec", c_long), ("tv_nsec", c_long)]

dummy = timespec()
sleeper = timespec()
sleeper.tv_sec = 0
sleeper.tv_nsec = 8000
_LIBC.nanosleep(byref(sleeper), byref(dummy))

This works perfectly on my host system, but on the router, every tv_nsec value greater than 0 seems to make it sleep forever.

Do I need to call this differently since OpenWrt uses musl?

I could try to build an image against glibc tomorrow, just curious if I missed anything here. Any input is appreciated

brada4 · November 19, 2024, 11:20pm

ubus call system board

Python's native sleep (time.sleep) should work well; the average 32-bit MIPS lacks a better timer anyway.

s_2 · November 19, 2024, 11:35pm

# ubus call system board
{
        "kernel": "5.15.167",
        "hostname": "OpenWrt",
        "system": "MediaTek MT7621 ver:1 eco:3",
        "model": "D-Link DIR-2660 A1",
        "board_name": "dlink,dir-2660-a1",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "23.05.5",
                "revision": "r24106-10cc5fcd00",
                "target": "ramips/mt7621",
                "description": "OpenWrt 23.05.5 r24106-10cc5fcd00"
        }
}

Indeed, nanosecond-level accuracy is never even needed in that code.
I patched the driver to replace the 3 occurrences of waiting with _LIBC.usleep, and that already did the trick.

So this is a "works for me" now, not sure whether to upstream this to the pyDMX project

But from looking at musl, it seems nanosleep should generally be supported? At least it works for 0 (and exits with an error for nanosecond values greater than one second).

brada4 · November 19, 2024, 11:48pm

Good idea:

-- https://en.m.wikipedia.org/wiki/DMX512#Timing

Did you build the modules yourself or install PyPI wheels? You need to recover the build logs and check whether time_t is 64-bit.
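
One way to look into the 64-bit time question from Python itself is to compare struct sizes in ctypes. The sketch below is an illustration, not the PyDMX code: the class and function names are hypothetical, and the second layout assumes a libc whose time_t is 64 bits wide even on 32-bit targets (believed to be the case for musl 1.2 and later) running on a little-endian CPU.

```python
from ctypes import Structure, c_int64, c_long, sizeof


class TimespecLong(Structure):
    """Layout used by the reproducer: both fields are C longs."""
    _fields_ = [("tv_sec", c_long), ("tv_nsec", c_long)]


class Timespec64(Structure):
    """Hypothetical layout for a libc with a 64-bit time_t (assumption)."""
    _fields_ = [("tv_sec", c_int64), ("tv_nsec", c_long)]


def layout_report():
    """Return the byte sizes that decide whether the two layouts agree."""
    return {
        "c_long": sizeof(c_long),
        "TimespecLong": sizeof(TimespecLong),
        "Timespec64": sizeof(Timespec64),
    }


if __name__ == "__main__":
    for name, size in layout_report().items():
        print(f"{name}: {size} bytes")
```

If c_long turns out to be 4 bytes while the C library expects an 8-byte tv_sec, the nanoseconds written by the reproducer would land in the upper half of tv_sec, which would be consistent with the very long sleeps reported above. This is a hypothesis to verify on the device, not a confirmed diagnosis.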

anon6639214 · November 20, 2024, 12:46am

Is there a reason you need ctypes for this? Pythonic way would be to use time.sleep.

Unix implementation:

Use clock_nanosleep() if available (resolution: 1 nanosecond);

Or use nanosleep() if available (resolution: 1 nanosecond);

Or use select() (resolution: 1 microsecond).
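
A minimal sketch of what the time.sleep approach could look like. The helper names wait_ms and wait_us mirror the ones called in the driver, but the bodies here are an assumption about a possible replacement, not the upstream implementation.

```python
import time


def wait_ms(milliseconds):
    """Sleep for the given number of milliseconds using the standard library."""
    time.sleep(milliseconds / 1000.0)


def wait_us(microseconds):
    """Sleep for the given number of microseconds using the standard library."""
    time.sleep(microseconds / 1_000_000.0)


if __name__ == "__main__":
    start = time.monotonic()
    wait_ms(10)
    wait_us(8)
    print(f"slept for {(time.monotonic() - start) * 1000:.3f} ms")
```

No libc name, struct layout, or platform check is involved, so the same code would run unchanged on glibc, musl, and Windows.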

brada4 · November 20, 2024, 6:00am

OP needs precise 10usec +/-2usec pulses, not infinite interpolated resolution

s_2 · November 20, 2024, 11:01am

I was wondering the same, maybe the original author just used nanosleep to reduce latency / jitter. But then again, the maximum achievable timing still seemed quite slow to me, i.e. when sweeping one color from 0-255 in a loop, it takes several seconds (which would correspond to 44Hz refresh rate, but I also tried sending much shorter packets instead of a full universe).

I installed everything via opkg and pip (then just patched the driver), nothing compiled by myself yet.

I guess I will have to check with an oscilloscope what the actual resulting timing on the line is. There is probably far more overhead from the USB stack and the buffer within the FTDI chip itself than any sleep precision in the application could make up for.

I may have another look by the start of next week (for this weekend it seems to be working well enough for its use case).

Thanks for your input so far!

brada4 · November 20, 2024, 2:10pm

Their code loads libc.so.6, while musl does not version its libc. It also does not check the return value.
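
For illustration, a hedged sketch of a ctypes call that avoids the versioned library name and checks the return value. It assumes a Linux system where ctypes.CDLL(None) exposes the C library's symbols, and a struct timespec with a 64-bit tv_sec (true on 64-bit Linux; an assumption for 32-bit targets). The names Timespec and checked_nanosleep are hypothetical.

```python
import ctypes
import errno
import os

# Symbols of the already-loaded C library; use_errno lets us read errno later.
_libc = ctypes.CDLL(None, use_errno=True)


class Timespec(ctypes.Structure):
    # Assumption: tv_sec is 64 bits wide on the target platform.
    _fields_ = [("tv_sec", ctypes.c_int64), ("tv_nsec", ctypes.c_long)]


_libc.nanosleep.argtypes = [ctypes.POINTER(Timespec), ctypes.POINTER(Timespec)]
_libc.nanosleep.restype = ctypes.c_int


def checked_nanosleep(seconds, nanoseconds):
    """Call nanosleep() and raise OSError instead of ignoring a failure."""
    request = Timespec(seconds, nanoseconds)
    remaining = Timespec()
    if _libc.nanosleep(ctypes.byref(request), ctypes.byref(remaining)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    checked_nanosleep(0, 8000)
    try:
        checked_nanosleep(0, 2_000_000_000)  # tv_nsec out of range
    except OSError as exc:
        print("nanosleep failed:", errno.errorcode.get(exc.errno), exc)
```

With the return value checked, an invalid request surfaces as an exception instead of passing silently.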

usleep has adequate precision for your application. From Python 3.11 on, time.sleep() uses nanosleep directly, so there is no need for dlopen.

anon6639214 · November 20, 2024, 3:19pm

No userspace sleep syscall will be precise in the way you're talking about due to OS scheduling (and possibly, signal interruptions).

https://www.man7.org/linux/man-pages/man2/nanosleep.2.html

If the duration is not an exact multiple of the granularity of the
underlying clock (see time(7)), then the interval will be rounded
up to the next multiple.  Furthermore, after the sleep completes,
there may still be a delay before the CPU becomes free to once
again execute the calling thread.
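
The rounding and wake-up delay described in the man page can be observed from Python. The following is a rough measurement sketch with a hypothetical function name; the numbers it produces depend entirely on the machine and its load.

```python
import time


def sleep_overshoot_us(requested_us, samples=100):
    """Return how much longer than requested each time.sleep() call took, in microseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(requested_us / 1_000_000)
        elapsed_us = (time.perf_counter_ns() - start) / 1000
        overshoots.append(elapsed_us - requested_us)
    return overshoots


if __name__ == "__main__":
    results = sleep_overshoot_us(8)
    print(f"min overshoot: {min(results):.1f} us")
    print(f"max overshoot: {max(results):.1f} us")
```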

The only significant difference in the ctypes code is that time.sleep is guaranteed to catch interruptions and sleep for the remaining time (since 3.5), while the ctypes code will not. This kind of delay generally wants to sleep more rather than less, so the time.sleep approach would still be the correct one.

Furthermore, select was the call used for microsecond sleeps before the introduction of usleep; see the select documentation.

   Emulating usleep(3)
       Before the advent of usleep(3), some code employed a call to
       select() with all three sets empty, nfds zero, and a non-NULL
       timeout as a fairly portable way to sleep with subsecond
       precision.
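
In Python, the same trick could be sketched as below. This assumes a Unix system (on Windows, select() with three empty sets raises an error), and the function name is hypothetical.

```python
import select
import time


def select_sleep(seconds):
    """Sleep by calling select() with empty sets and a timeout (Unix only)."""
    select.select([], [], [], seconds)


if __name__ == "__main__":
    start = time.monotonic()
    select_sleep(0.005)
    print(f"slept for {(time.monotonic() - start) * 1000:.3f} ms")
```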

brada4 · November 20, 2024, 3:44pm

The hack of dlopening libc was never good practice.
usleep -> matches the MIPS clock resolution, which suffices for the application (10 µs ± 2 µs).
nanosleep -> with Python >= 3.11 one can remove dlopen(libc) just like that.

s_2 · November 27, 2024, 12:38am

Finally, I found time for conducting some more extensive testing on this

The test script for this scenario was just a loop transmitting constant purple color for a 4-channel device at address 1:

from dmx import DMXInterface

# Open an interface
with DMXInterface("FT232R") as interface:
    interface.set_frame([255, 255, 0, 255])  # chinese 4-channel PAR: brightness, R, G, B
    while True:
        interface.send_update()

First, the original version of the ft232r.py driver from the PyDMX project on my ubuntu machine:

for reference, the code that produces it:

        self._set_break_on()
        wait_ms(10)
        # Mark after break
        self._set_break_off()
        wait_us(8)
        # Frame body
        Device.write(self, b"\x00" + byte_data)
        # Idle
        wait_ms(15)

Curiously, the first thing to notice is that wait_ms(15) only generates an idle period of less than 13ms here, maybe due to the FTDI data transmission being buffered, and the call to Device.write() only blocking until the last byte is written into the buffer, so the timer set for 15ms prematurely starts while the FTDI is actually still busy transmitting the remaining data in the buffer...

Besides, I cannot see anywhere in the DMX specifications where these 15ms come from. The only value I could find for MBB (mark before break) is 8µs. Maybe that did not work for the original author of the code: since any sleep below 2ms would be useless due to the FTDI buffering, this may have been the reason for choosing such an arbitrarily high value.

Break duration is somewhere near 12ms in most of the transmissions, resulting from wait_ms(10). Again, I don't see why such a long duration was needed in the first place.

Mark after break is somewhere between 200 and 300µs (resulting from the call to wait_us(8), but there may be additional delay from the FTDI before it actually starts pulling the line down for the first byte transmitted).
Also a full universe will always be transmitted here, which could be optimized for setups with a small count of fixtures.
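
As a rough cross-check of these numbers, the nominal frame period can be estimated from the waits in the driver snippet. The sketch assumes the standard DMX512 line rate of 250 kbit/s with 11 bits per slot (44 µs per slot) and ignores USB and FTDI buffering delays; the function name and defaults are hypothetical.

```python
SLOT_TIME_US = 44  # 11 bits per slot at 250 kbit/s (assumed standard DMX512 rate)


def frame_period_us(channels=512, break_us=10_000, mab_us=8, idle_us=15_000):
    """Nominal time for one frame: break, mark after break, start code plus data, idle."""
    slots = channels + 1  # one start code slot precedes the channel data
    return break_us + mab_us + slots * SLOT_TIME_US + idle_us


if __name__ == "__main__":
    for channels in (512, 4):
        period = frame_period_us(channels)
        print(f"{channels} channels: {period} us per frame, {1_000_000 / period:.1f} Hz")
```

Under these assumptions a full universe takes 47580 µs per frame (about 21 Hz), while a 4-channel frame takes 25228 µs (about 39.6 Hz), so the fixed 10 ms and 15 ms waits dominate the refresh rate more than the payload does.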

Now for the boring part: none of these timings change much depending on whether the original libc nanosleep, the libc usleep, or Python's time.sleep(microseconds / 1000000) is used.

On OpenWrt, the main difference compared to Ubuntu (for all three variants) is the increased MAB time, which is roughly 500µs, regardless of whether musl usleep or Python time.sleep() is used:

In conclusion, the driver could really just be switched to the native Python sleep function (which would also eliminate the need to distinguish between Windows and Linux in the code, though I haven't tested on Windows).

Now for the reality part: It actually did not work quite well... when started via ssh, the actual application under test would run fine for any amount of time (i.e. it would visualize the amount of wifi probe requests in the air by simply changing the value for the blue channel), however when started via init script, it would hang after 5-10 minutes with a DMX blackout - but that is for another night to debug
If anyone is interested though: https://github.com/s-2/pax2dmx/
Maybe switching to native python sleep helps here, though I believe the issue is rather related to the wifi capturing...

brada4 · November 27, 2024, 10:11pm

Probably libc call delays it wildly.
cat /proc/timer_list
likely says 1 usec resolution.
Unless you run a preempt kernel with a dedicated CPU, add task quota, IPI and other jitter sources on top.

s_2 · November 27, 2024, 10:24pm

Only since Python 3.11 does time.sleep() seem to use the underlying clock_nanosleep / nanosleep functions, so whether the driver code gets changed is now mostly a matter of compatibility with older versions:

github.com/JMAlego/PyDMX

FT232R: use native Python time.sleep() on Linux

JMAlego:master ← s-2:master, opened 09:46PM - 27 Nov 24 UTC by s-2 (+5 -21)

fixes compatibility with certain libc implementations, e.g. musl on OpenWrt …

-------------------------

When trying to use PyDMX with the FTDI driver on an OpenWrt device, I found it was not outputting any DMX signal, while the same code was working fine on a ubuntu desktop.

As it turned out, there seems to be an issue with the musl implementation of nanosleep as used by OpenWrt, which made me wonder about the advantage of calling a ctypes function here, rather than using native Python3 sleep().

Regarding performance of the DMX output, I made a few measurements on both desktop and an mt7621-based OpenWrt router, and it did not make any noticeable difference (actually CPython would use usleep or nanosleep anyways, when available), c.f.: https://forum.openwrt.org/t/python3-ctypes-nanosleep-broken-with-musl/216072/11

Could we just change it to use time.sleep() here? Besides being more Pythonic, at least it made it work on OpenWrt for me :slightly_smiling_face:

besides, the original code seemed wrong regarding the use of modulo to split the fractional part off the nanoseconds: in wait_ms there is

```
sleeper.tv_sec = int(milliseconds / 1000)
sleeper.tv_nsec = (milliseconds % 1000) * 1000000
```

but in wait_us:

```
sleeper.tv_sec = int(nanoseconds / 1000000)
sleeper.tv_nsec = (nanoseconds % 1000) * 1000
```

probably should have been

```
sleeper.tv_nsec = (nanoseconds % 1000000) * 1000
```

instead? For this driver it would not make any difference of course, since sleep is never called with full seconds.
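
To make the modulo issue from the pull request concrete, here is a small sketch that compares the quoted split with one based on divmod. Both function names are hypothetical; the first reproduces the arithmetic quoted for wait_us, the second is one possible correction.

```python
def split_us_as_quoted(microseconds):
    """Arithmetic as quoted from wait_us: the modulo drops whole milliseconds."""
    return int(microseconds / 1000000), (microseconds % 1000) * 1000


def split_us_divmod(microseconds):
    """Split microseconds into (whole seconds, remaining nanoseconds)."""
    seconds, remainder_us = divmod(int(microseconds), 1_000_000)
    return seconds, remainder_us * 1000


if __name__ == "__main__":
    for value in (8, 1_500_000):
        print(value, split_us_as_quoted(value), split_us_divmod(value))
```

For 8 µs both variants agree, which matches the observation that the driver is unaffected. For 1.5 s the quoted arithmetic loses the half second, while the divmod version keeps it as 500000000 ns.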

brada4 · November 27, 2024, 10:29pm

You need a logic analyzer for the signals. The local timer can reach the accuracy you need, but it cannot read back the result with confidence.

brada4 · November 27, 2024, 10:42pm

Measure. If you change the global preempt flag, you need your own kmods too.
https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/rt-tests
A non-compile option is to move OpenWrt to the first core (cpu0 and cpu1, mask 0x3) and run the RT stuff on cpu2 and cpu3 (mask 0xc).
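
From within a Python process, the masks mentioned here could be applied roughly as follows. This is a sketch assuming Linux, where os.sched_setaffinity is available; the helper names are hypothetical, and pinning only works if the listed CPUs exist on the device.

```python
import os


def mask_to_cpus(mask):
    """Convert an affinity bitmask such as 0xc into a set of CPU indices."""
    return {cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)}


def pin_current_process(mask):
    """Restrict the calling process to the CPUs selected by the mask (Linux only)."""
    os.sched_setaffinity(0, mask_to_cpus(mask))


if __name__ == "__main__":
    print("0x3 ->", sorted(mask_to_cpus(0x3)))
    print("0xc ->", sorted(mask_to_cpus(0xC)))
```

Calling pin_current_process(0xc) at the start of the DMX script would keep it on cpu2 and cpu3; moving the rest of the system to cpu0 and cpu1 has to be configured separately.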

brada4 · February 6, 2025, 12:46am

The 24.10 kernel is said to be better for realtime.