Not yet, that will be my next test. It is clearly stable with 24 ports, but who knows. I'll c&p the PoE config you posted later today and I'll provide results in a day or two.
It would be great to get this sorted out; the JG928A is meant to replace my GS1900-24HP as my daily driver.
I'll dig out my Zyxel GS1900-48. It doesn't have PoE, but it will be interesting to see whether this is purely an HP watchdog issue or an rtl839x issue.
This is a fairly active switch - 27 ports connected and 18 LLDP peers plus the Prometheus checks every 15s for netstat and PoE data. Maybe it's too much for a 700MHz CPU and only 128MB RAM?
Load is also fairly active:
There's definitely some of that going on. In fairness, I have not yet come across one that flat out did not work, but of course some behave weirdly or only support some bitrates... Either way, I now want to wait for the RTL PHYs: they should be cheap and cool while supposedly supporting even the odd NBASE-T rates.
Yeah, load is better on mine:
but it's still way more than on the GS1900-24HP (which is in active use, while the JG928A is just idling):
I enabled all 48 ports now, let's see if something changes.
Possibly. My GS1900-24HP has about 22 connected ports and maybe 10 LLDP peers. But there are no Prometheus checks, maybe collectd is less resource-demanding than the Prometheus exporter?
Beginning to look more like a memory leak in realtek-poe. After the last reboot I added a cron job to restart poe every hour and the memory usage has levelled out:
0 * * * * /bin/sh -c "service poe restart"
@hurricos any ideas what might be happening here? The switch is reporting PoE usage via the Prometheus node exporter every 15s, and RAM usage appears to grow over time, likely causing the reboots shown in the uptime/memory graph above.
Maybe add some debugging after each malloc(cmd) and free(cmd) location in the code. They could be "uneven". E.g. something like this (totally untested).
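/* Named debug_log so it does not clash with log() from <math.h>. */
void debug_log(const char *s)
{
    /* Append one marker per call; silently skip if the file can't be opened. */
    FILE *file = fopen("/tmp/realtek-poe.log", "a");
    if (!file)
        return;
    fputs(s, file);
    fclose(file);
}
...
malloc(cmd); debug_log("+");
...
free(cmd); debug_log("-");
...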
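To read the resulting log, something has to tally the markers. A minimal sketch, assuming the log contains only the "+" and "-" markers suggested above (the function name alloc_balance is made up for illustration and is not part of realtek-poe):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns (count of '+') minus (count of '-') read
 * from the stream. Any other byte is ignored. A result that keeps growing
 * between runs would hint at allocations that are never freed.
 */
long alloc_balance(FILE *in)
{
    assert(in != NULL);

    long balance = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '+')
            balance++;
        else if (c == '-')
            balance--;
    }
    return balance;
}
```

Opening /tmp/realtek-poe.log with fopen() and passing the handle in would give the current imbalance. A small positive number is expected while commands are in flight; a steadily rising one is the interesting case.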
Just to confirm this: It's definitely realtek-poe. While my switch hasn't run out of memory yet, it's just a matter of time until it does so.
Interestingly, the load has increased to 3.0 while the CPU is 99% idle.
and top:
Mem: 98560K used, 21808K free, 2008K shrd, 0K buff, 23336K cached
CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq
Load average: 3.00 3.00 3.00 3/96 23694
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
19734 1 root S 38404 32% 0% /usr/bin/realtek-poe
2145 1 root S 3936 3% 0% /sbin/rpcd -s /var/run/ubus/ubus.sock
I do have a theory, but at the moment I don't have the time to set up a development environment to test this. @howels can you try to increase the polling timeout (or even make it configurable): https://github.com/Hurricos/realtek-poe/blob/c09225b842b926a2086aab4e825d2ed6bf30dc95/src/main.c#L874
If I've understood the concept of the command queues correctly, then the command queue could pile up and keep growing if it takes too long to poll all ports. As an alternative, a check whether the queue is empty could be included in the timeout handler (list_empty(&mcu->pending_cmds)).
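The pile-up theory can be illustrated with a toy model. This is only a sketch under stated assumptions, not realtek-poe code: it assumes the poll timer queues one command per port on every tick and that the MCU link drains a fixed number of commands per tick. The function name and all numbers are hypothetical.

```c
#include <assert.h>

/*
 * Toy model of the command queue. Each tick the poll timer queues `ports`
 * commands, then up to `drained_per_tick` commands are completed.
 * If `skip_when_busy` is nonzero, the timer queues nothing while commands
 * are still pending (the queue-empty check idea).
 * Returns the number of pending commands after `ticks` ticks.
 */
long simulate_queue(int ports, int drained_per_tick, int ticks,
                    int skip_when_busy)
{
    assert(ports > 0 && drained_per_tick > 0 && ticks >= 0);

    long pending = 0;
    for (int t = 0; t < ticks; t++) {
        /* Timer fires: queue a full poll unless the guard blocks it. */
        if (!skip_when_busy || pending == 0)
            pending += ports;

        /* The link completes at most drained_per_tick commands. */
        pending -= (pending < drained_per_tick) ? pending : drained_per_tick;
    }
    return pending;
}
```

With made-up figures of 48 commands queued and 30 drained per tick, simulate_queue(48, 30, 100, 0) returns 1800, so the backlog grows without bound, while simulate_queue(48, 30, 100, 1) returns 0 because the guard lets the queue drain before the next poll. With 24 ports at the same drain rate the queue never backs up, which would fit the observation that the 24-port models are stable.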
I flashed this https://github.com/howels/realtek-poe/commit/45ddf88a15a41c8a9a2955c9cafdae6f36754f9d to test and will observe the behaviour. Figured if 2000 was ok on the 24 port models then 5000 should be plenty for 48 port.
@andyboeh unfortunately that didn't solve it, got another spike in load and a reboot
That's a pity, it was worth a try!
Maybe I'll have some time on the weekend to play around with realtek-poe.
Is there a way I can see if my SFP+ modules are supported by OpenWrt before installing?
If it is a single-speed fiber SFP+ module not from the cheapest seller, there is a good chance it will just work.
Check this thread for reports of SFP+ testing; several of us have experimented with different options and included links to the products.