I am not sure if this topic belongs in this category.
Anyway, the problem is that the DSL Line Uptime is lagging behind the System Uptime.
On the TP-Link W8970, when the unit started the difference between the two Uptimes was 2min which is the time it took the line to sync.
15Hrs later it looks like this.
System Uptime
DSL Line Uptime
After 15hrs system uptime, the DSL Line Uptime is lagging almost 22min behind the system uptime.
What could be causing this, and can it be fixed?
It could be due to ADSL sync being lost and the line then re-syncing without your knowledge. An additional 20 minutes of difference in uptime could be due to a loss of sync for 20 minutes.
You could try an experiment. Disconnect the ADSL line for a short time, reconnect it, and check the difference in uptime before and after.
Nope, the PPPoE connection uptime is the same as system uptime + 2min. If line sync had been lost, PPPoE would have dropped the connection and a new IP would have been assigned, which didn't happen.
Right now the gap has grown even more: system uptime is 20hrs 5min while DSL line uptime is 19hrs 36min, so the gap has increased from 22min in my previous post to 29min.
Seems like a linear relationship between uptime and drift.
The ADSL modem is probably using its own clock, which is drifting relative to the router CPU.
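The linear relationship can be checked with a little arithmetic. The following is a minimal sketch, assuming the 2min sync delay from the first post and the rounded uptimes quoted above; the function name drift_factor is made up for this example.

```shell
#!/bin/sh
# Estimate the drift factor from a single observation.
#   $1  system uptime in seconds
#   $2  DSL line uptime in seconds, as reported by the modem
#   $3  time the line needed to sync after boot, in seconds
drift_factor() {
    awk -v sys="$1" -v line="$2" -v sync="$3" \
        'BEGIN { printf "%.4f\n", (sys - sync) / line }'
}

# 15h system uptime, line uptime about 22min behind
drift_factor $((15 * 3600)) $((15 * 3600 - 22 * 60)) 120          # prints 1.0228

# 20h05m system uptime vs 19h36m line uptime
drift_factor $((20 * 3600 + 5 * 60)) $((19 * 3600 + 36 * 60)) 120  # prints 1.0230
```

Both observations give nearly the same factor, which is what a constant clock-rate mismatch would produce. The inputs are only accurate to the minute, so the last digits should not be taken too seriously.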
Exactly. The modem chipset firmware gets queried and returns uptime in seconds, which is then simply reformatted into days, hours, minutes and seconds for display. If that seconds value is off to begin with there is really not much you can do about it.
(In theory you could modify /lib/functions/lantiq_dsl.sh to incorporate a "drift factor" based on observation. Judging by your numbers it seems to be somewhere around 2% off, but I'm not sure it warrants the effort, and it will still not be accurate.)
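As an illustration of the reformatting step described above, here is a minimal sketch in POSIX shell. It is not the code from lantiq_dsl.sh; the function name fmt_uptime and the output format are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: NOT the actual code from lantiq_dsl.sh.
# Turns a seconds counter into days, hours, minutes and seconds.
fmt_uptime() {
    secs=${1:-0}    # treat a missing value as 0
    printf '%dd %dh %dm %ds\n' \
        $((secs / 86400)) \
        $((secs % 86400 / 3600)) \
        $((secs % 3600 / 60)) \
        $((secs % 60))
}

fmt_uptime 70560    # prints: 0d 19h 36m 0s
```

Whatever error the seconds counter carries passes straight through to the displayed value, since the formatting is pure integer arithmetic.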
Can the drift be fixed with a modem firmware update?
I don't know. Maybe. Maybe a different firmware blob is more precise. I'm not even sure what provides the seconds value to the firmware. It might even be the remote end of the DSL line, in which case you're shit out of luck whatever effort you make.
Honestly, I think it's not worth the effort. Yes, the value is off, that's a bummer, but you know it is reliably off and you can treat it accordingly. It's not like it has any influence on anything.
To fix the time drift between the router system uptime and line uptime, you need to do the following:
1. Edit the /lib/functions/lantiq_dsl.sh file.
2. Look for the line_uptime() function (usually on line #534).
3. Insert this line
et=$(awk -v et=$et 'BEGIN{printf "%d", et * 1.024062}')
between these two lines
et=$(dsl_val "$ccsg" nElapsedTime)
[ -z "$et" ] && et=0
to look like this
et=$(dsl_val "$ccsg" nElapsedTime)
et=$(awk -v et=$et 'BEGIN{printf "%d", et * 1.024062}')
[ -z "$et" ] && et=0
Note: for me the correction factor was 1.024062, which I found gives almost zero drift over a long testing period (and a very long Excel sheet). I am not sure whether the drift is constant across all Lantiq devices. If it is not, the factor will need to be adjusted from device to device. For now let's assume that it's constant.
4. Save the /lib/functions/lantiq_dsl.sh file.
And that's it. No need to reboot; check the overview page and the drift should be gone. The only difference between the System Uptime and the Line Uptime will be the time it takes to sync the line.
One final note: all these steps need to be redone after every firmware update. Reboots don't affect the changes made.
I hope one of the developers will implement this fix so we don't have to redo it with every firmware update.
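The scaling step from the guide above can be tried off the router before editing anything. This is a minimal sketch that reuses the same awk expression; the names scale_uptime and DRIFT_FACTOR are hypothetical and do not exist in lantiq_dsl.sh, and the input value is just an example seconds count.

```shell
#!/bin/sh
# Hypothetical test harness: scale_uptime and DRIFT_FACTOR are not
# names from lantiq_dsl.sh.
DRIFT_FACTOR=1.024062    # empirical value from the post above; may differ per device

scale_uptime() {
    raw=$1
    [ -z "$raw" ] && raw=0    # same guard as the original code, applied first
    awk -v et="$raw" -v f="$DRIFT_FACTOR" 'BEGIN { printf "%d\n", et * f }'
}

scale_uptime 49574    # prints 50766
scale_uptime ""       # prints 0
```

Changing DRIFT_FACTOR and re-running makes it easy to compare candidate factors against a known system uptime.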
Good work.
That said, the drift factor you calculated looks like a combination of a systematic drift, caused by a count of 1024 vs 1000 per second, and a separate clock drift.
A multiplier of 1.024 should be enough to correct the systematic drift.
The 1.024 multiplier was the first value I tested, but I was getting +1 second of drift every few hours.
So I started experimenting with other values.
Most people won't care about +1 second drift every few hours, so 1.024 should work fine for them.
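The gap between the two multipliers can be worked out directly. This is a rough sketch that assumes the 1.024062 figure is exact for that device; the function name residual_report is made up for the example.

```shell
#!/bin/sh
# What is left over if 1.024 is used where 1.024062 would be needed.
residual_report() {
    awk 'BEGIN {
        empirical  = 1.024062    # factor found by experiment
        systematic = 1.024       # 1024 vs 1000 explanation
        residual = empirical / systematic - 1    # fractional rate error that remains
        printf "residual: %.1f ppm\n", residual * 1000000
        printf "drift per hour: %.2f s\n", residual * 3600
        printf "hours per second of drift: %.1f\n", 1 / (residual * 3600)
    }'
}

residual_report
# residual: 60.5 ppm
# drift per hour: 0.22 s
# hours per second of drift: 4.6
```

About one second every four to five hours is consistent with the "+1 second every few hours" reported above, and a residual of a few tens of ppm is in the range of ordinary crystal oscillator tolerance.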
The only concern now is to have this fix committed so we don't have to edit the lantiq_dsl.sh file with every update.
I applaud the striving for accuracy, but honest question: what actually depends on this value being exact, down to the second no less?
Again, the uptime value is generated by the modem firmware, and why it drifts and for which firmware versions and for which modem chipsets and for which line conditions and under which modem load and at which room temperature is anyone's guess. If "multiplying by 1.024062" works for you, great, more power to you.
But this is not a fix, it is a workaround based on your private observations, and IMNSHO should not be picked up by OpenWrt. Even if your empirical approach is accurate, it is a "sensor value" and should be presented, not interpreted. If a fix is necessary, it needs to be done in the modem/DSLAM firmware (whichever entity calculates it wrong in the first place).
I concur. Unless, say, the division by 1024 is mandated by a DSL standard and hence theoretically correct, I would always also present the value actually returned...
Extract and display statistics and diagnostic data from Lantiq modem
This problem is specific to the Lantiq modem family as far as I know, and from what I can tell I am not the only one complaining. The fix is applied to the lantiq_dsl.sh file, not to every DSL modem out there. And like I said, the 1.024 multiplier should work for everyone. Don't worry about my number; my curiosity took me far, but 1.024 should be fine for the masses.
I totally agree with you that it should be fixed at the Lantiq modem firmware level. But we don't have the luxury to do that, do we? So we work with what we have, even if it's a patch job to keep things working nice and neat until, by some miracle, it gets fixed in the Lantiq modem firmware itself.
How can you possibly claim that? You have a sample size of one device, one chipset and one DSL line.
We don't know the reason for the drift, so how can you possibly know your "fix" will work for everyone? Different firmwares can be tested, yes. But if it's a modem chipset issue, how do you know different chipsets from the same family behave the same? If it's an issue with values from the DSLAM, how do you know all DSLAMs behave the same?
"Trust me, I play a doctor on TV."
Well then, play the devil's advocate and prove me wrong. It's as simple as that.
I left a step by step guide on how to apply the fix in one of my previous posts for others to test it as well and they are more than welcome to post feedback here.
It's a community forum people.
It's a group effort.
We share our knowledge and help each other. That's how progress is made.
Thanks. It worked! I tried it on a ZyXEL P2812 modem.
So we now know the fix works for
TP-Link TD-W8970 Lantiq XWAY VRX268
ZyXEL P-2812HNU-F1 Lantiq XWAY VRX288
I will keep updating the list with every confirmed report.
Thanks all for your support.
Alright, besides the fact that this is not how any of this works, I'll play along. I restarted my router this morning, so I can't show a huge uptime, but it's already enough:
Fritz!Box 3370 (Lantiq XWAY VRX288), Firmware 5.7.8.9.1.4
Uptime according to system clock: 14h08m55s = 50935 seconds
Uptime according to modem: 13h46m14s = 49574 secs
Which means a drift of 1.0274 for me.
The uptime corrected by your 1.024062, while admittedly an improvement, would still be off by 3 minutes.
Look, I'm not trying to be confrontational, or to argue that there is not an issue here. What I am saying is that we don't know the cause, so we don't know the solution. Trying to find a "correction factor" by experimentation/observation is a valiant effort, but it's only a band-aid, and apparently one that isn't sticking all too well.
Edit: In the end, though, I am not the arbiter here. Submit it to the devs, hear their feedback. Personally, I don't give enough expletives about the uptime the modem reports, and if it's off by 2 or 3% it's no skin off my back.
Thank you, you actually proved it for me.
Your calculations are wrong.
You expected the system uptime to equal the line uptime (hence the 1.0274 drift factor) and didn't factor in the time it takes to load other modules and the time it takes the line to sync, so 2-3 min between system uptime and line uptime is an acceptable variance.
That's why I said 1.024 is an acceptable median and should work for everyone.
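The disagreement over the Fritz!Box numbers comes down to how much sync time is subtracted. Here is a minimal sketch using the uptimes posted above; the 120 s and 180 s delays are assumptions taken from the "2-3 min" estimate, since no sync time was reported for that device, and the function names are made up for the example.

```shell
#!/bin/sh
# Convert hours, minutes, seconds to seconds.
# Pass plain integers without leading zeros (08 would be read as octal).
hms_to_s() { echo $(( $1 * 3600 + $2 * 60 + $3 )); }

# Drift factor for a given system uptime, line uptime and assumed sync delay.
factor_for_sync() {
    awk -v sys="$1" -v line="$2" -v sync="$3" \
        'BEGIN { printf "%.5f\n", (sys - sync) / line }'
}

sys=$(hms_to_s 14 8 55)      # 50935
line=$(hms_to_s 13 46 14)    # 49574

factor_for_sync "$sys" "$line" 0      # prints 1.02745 (no sync allowance)
factor_for_sync "$sys" "$line" 120    # prints 1.02503 (assumed 2 min sync)
factor_for_sync "$sys" "$line" 180    # prints 1.02382 (assumed 3 min sync)
```

With a single observation the factor moves by more than 0.001 per minute of assumed sync delay, so this sample alone cannot confirm or rule out 1.024. A second reading taken many hours later, as in the earlier posts, would separate the fixed sync offset from the proportional drift.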