[testers needed] [21.02, snapshot] collectd-exec scripts to collect Lantiq DSL values

Ah, I think tables 7.1 to 7.4 in ITU-T G.997.1 might be the best explanation we will find:

IBE: Count of idle cell payload bit errors in the bearer channel
CRC_P: Count of non-pre-emptive packets with CRC error in the bearer channel

but I am not sure how helpful these short definitions are, or what the difference between the pre-emptive and non-pre-emptive variants is...

Argh, I guess I know what is up: VDSL2 allows an optional mode in which higher-priority packets can pre-empt lower-priority ones, that is, the higher-priority data is interspersed between the parts of a lower-priority packet that is already in transfer. Now, I have no indication that any ISP actually uses that option, and I also have no idea whether it is compatible with G.INP. I assume the standard grew that capability to ease ISPs' reservations about switching from ADSL to VDSL2, since in ADSL, with its ATM/AAL5 encapsulation, the ATM cells of higher-priority packets can simply zip past the ATM cells of lower-priority data, e.g. to allow smoother VoIP performance over low-rate links.
If my assumption is correct, the counters for the pre-emptive variant should basically never increase but stay at zero.
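A minimal sketch for checking that assumption against the exec script's PUTVAL output: it only looks at the `errors-*_crcp` and `errors-*_cvp` values (the naming used by the scripts in this thread) and complains if any of them is non-zero. The function name `check_preemptive` is hypothetical, and it only parses text, it does not talk to ubus itself.

```shell
#!/bin/sh
# Hypothetical helper: scan PUTVAL lines as printed by exec-lantiqdsl.sh and
# report every pre-emptive counter (errors-*_crcp, errors-*_cvp) that is not 0.
# Assumes the value naming used in this thread:
#   PUTVAL "<host>/exec-lantiqdsl/errors-<near|far>_<counter>" N:<value>
# Exit status: 0 if all pre-emptive counters are zero, 1 otherwise.
check_preemptive() {
	awk '
		/^PUTVAL / && /errors-(near|far)_(crcp|cvp)"/ {
			# $2 is the quoted identifier, $3 is "N:<value>"
			split($3, v, ":")
			if (v[2] + 0 != 0) {
				print "non-zero pre-emptive counter:", $2, v[2]
				found = 1
			}
		}
		END { exit (found ? 1 : 0) }
	'
}

# Usage on the router (not run here):
#   /usr/lib/collectd/exec/exec-lantiqdsl.sh | check_preemptive
```

Non-pre-emptive counters such as `errors-near_crc` are deliberately ignored, so a line with plenty of ordinary CRC errors still passes as long as the pre-emptive ones stay at zero.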

root@OpenWrt:/tmp/rrd/OpenWrt/exec-lantiqdsl# /usr/lib/collectd/exec/exec-lantiqdsl.sh
PUTVAL "OpenWrt/exec-lantiqdsl/uptime" N:1655
PUTVAL "OpenWrt/exec-lantiqdsl/bitrate-downstream" N:39993000
PUTVAL "OpenWrt/exec-lantiqdsl/bitrate-downstream_max" N:51871744
PUTVAL "OpenWrt/exec-lantiqdsl/snr-downstream" N:9.400000
PUTVAL "OpenWrt/exec-lantiqdsl/bitrate-upstream" N:9585000
PUTVAL "OpenWrt/exec-lantiqdsl/bitrate-upstream_max" N:9734478
PUTVAL "OpenWrt/exec-lantiqdsl/snr-upstream" N:6.100000
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_es" N:1
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_ses" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_fecs" N:1
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_crc" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_crcp" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_cv" N:4
PUTVAL "OpenWrt/exec-lantiqdsl/errors-near_cvp" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_es" N:18000
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_ses" N:259
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_fecs" N:310694
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_crc" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_crcp" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_cv" N:0
PUTVAL "OpenWrt/exec-lantiqdsl/errors-far_cvp" N:0
**Line State:** Showtime with TC-Layer sync
**Line Mode:** G.993.2 (VDSL2, Profile 17a)
**Line Uptime:** 0h 29m 51s
**Annex:** B
**Data Rate:** 39.993 Mb/s / 9.585 Mb/s
**Max. Attainable Data Rate (ATTNDR):** 51.855 Mb/s / 9.694 Mb/s
**Latency:** 0.00 ms / 0.00 ms
**Line Attenuation (LATN):** 17.3 dB / 23.7 dB
**Signal Attenuation (SATN):** 17.3 dB / 23.6 dB
**Noise Margin (SNR):** 9.4 dB / 6.1 dB
**Aggregate Transmit Power (ACTATP):** 6.3 dB / 12.5 dB
**Forward Error Correction Seconds (FECS):** 1 / 310694
**Errored seconds (ES):** 1 / 18000
**Severely Errored Seconds (SES):** 0 / 259
**Loss of Signal Seconds (LOSS):** 0 / 0
**Unavailable Seconds (UAS):** 58 / 58
**Header Error Code Errors (HEC):** 0 / 0
**Non Pre-emptive CRC errors (CRC_P):** 0 / 0
**Pre-emptive CRC errors (CRCP_P):** 0 / 0
**ATU-C System Vendor ID:** Infineon 208.134
**Power Management Mode:** L0 - Synchronized

Seeing as my ISP kindly screwed around the other night and booted my line, I redid your scripts and added the FEC-extended JS. Fresh reboot, and now I have stats. My line sadly is on an ECI hub, so no vectoring, and a fairly shitty line tbh. It used to be 80/20, but in the past year it has dropped all the way down to its current rate (thankfully I'm only paying for 40/10).

So far it looks uncertain: few errors reported except a few FECS on the upstream, but one retrain/resync this afternoon that I did not initiate...

I will be a happy camper once the CuDA/copper wiring has been replaced with fiber. No, I have no idea when, or if, that is going to happen, in spite of living ~20-30 m away from a central-office location (VSt.). (Famous last words; it might be tempting to see whether one can run OpenWrt on the apparently unavoidable PON ONT :wink: )

The answer to that will be no. The 'good' part: from the outside it appears to be just a media converter (internally it will do more, of course), mediating between fibre and copper Ethernet.

--
/me: Deutsche Glasfaser, Nokia G-010G-P GPON ONT, plain copper Ethernet/DHCP to the OpenWrt router. No VLANs, no PPPoE, 'just works'.

And one more unforced retrain at ~23:30... no CRC errors, just a few FECS. I will need to display the ReTx values to see whether there is increased activity around the retrains... not sure I can continue this experiment much longer, my "users" will not appreciate multiple resyncs during the day...

Side note: even with 9.6 dB SNR margin on the upstream, the sync happens at 100% of capacity, so there is (almost) zero slack for transient issues with the bit loading; it could well be that the retrains happen when the modem cannot remap one or more bits between subcarriers... The Broadcom modem both synced considerably higher (37 instead of 35.5) and had a higher capacity (38 versus 35.5)... not sure I can do much on my side to increase the robustness of the upstream.

It is quite possible that this is the same error I saw in the past with the Lantiq modem; I might just have failed to see it because it was "masked" by a flurry of other errors/code violations.

EDIT: This does not actually work as intended, a working draft is below at https://forum.openwrt.org/t/testers-needed-21-02-snapshot-collectd-exec-scripts-to-collect-lantiq-dsl-values/105890/30?u=moeller0

This is quite ugly, but seems to work:
EDIT: no it does not...

#!/bin/sh

# source jshn shell library
. /usr/share/libubox/jshn.sh

HOSTNAME="${COLLECTD_HOSTNAME:-$(cat /proc/sys/kernel/hostname)}"

# retrieve DSL metrics through ubus
dsl_metrics_json=$(/bin/ubus call dsl metrics)

# initialize JSHN and load JSON
json_init
json_load "$dsl_metrics_json"

# get line state
json_get_var linestate up

# only continue if line is up
# collecting any line stats is pointless if the line is down, downtime will be reflected by the gap in statistics
[ "$linestate" = "1" ] || exit 0

# get basic line stats
json_get_var uptime uptime

# get downstream and upstream stats
json_select downstream
	json_get_var downstream_datarate     data_rate
	json_get_var downstream_datarate_max attndr
	json_get_var downstream_snr          snr
json_close_object
json_select upstream
	json_get_var upstream_datarate       data_rate
	json_get_var upstream_datarate_max   attndr
	json_get_var upstream_snr            snr
json_close_object

# get near and far errors
json_select errors
	json_select near
		json_get_var errors_near_es       es
		json_get_var errors_near_ses      ses
		json_get_var errors_near_fecs     fecs
		json_get_var errors_near_crc      crc_p
		json_get_var errors_near_crcp     crcp_p
		json_get_var errors_near_cv       cv_p
		json_get_var errors_near_cvp      cvp_p
		json_get_var errors_near_rtx_uc   rx_corrupted
		json_get_var errors_near_rtx_ucp  rx_uncorrected_protected
		json_get_var errors_near_rtx_rx   rx_retransmitted
		json_get_var errors_near_rtx_c    rx_corrected
		json_get_var errors_near_rtx_tx   tx_retransmitted
	json_close_object
	json_select far
		json_get_var errors_far_es        es
		json_get_var errors_far_ses       ses
		json_get_var errors_far_fecs      fecs
		json_get_var errors_far_crc       crc_p
		json_get_var errors_far_crcp      crcp_p
		json_get_var errors_far_cv        cv_p
		json_get_var errors_far_cvp       cvp_p
		json_get_var errors_far_rtx_uc    rx_corrupted
		json_get_var errors_far_rtx_ucp   rx_uncorrected_protected
		json_get_var errors_far_rtx_rx    rx_retransmitted
		json_get_var errors_far_rtx_c     rx_corrected
		json_get_var errors_far_rtx_tx    tx_retransmitted
	json_close_object
json_close_object

# present values to collectd
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/uptime\" N:$uptime"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-downstream\" N:$downstream_datarate"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-downstream_max\" N:$downstream_datarate_max"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/snr-downstream\" N:$downstream_snr"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-upstream\" N:$upstream_datarate"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-upstream_max\" N:$upstream_datarate_max"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/snr-upstream\" N:$upstream_snr"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_es\" N:$errors_near_es"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_ses\" N:$errors_near_ses"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_fecs\" N:$errors_near_fecs"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_crc\" N:$errors_near_crc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_crcp\" N:$errors_near_crcp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_cv\" N:$errors_near_cv"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_cvp\" N:$errors_near_cvp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_uc\" N:$errors_near_rtx_uc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_ucp\" N:$errors_near_rtx_ucp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_rx\" N:$errors_near_rtx_rx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_c\" N:$errors_near_rtx_c"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_tx\" N:$errors_near_rtx_tx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_es\" N:$errors_far_es"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_ses\" N:$errors_far_ses"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_fecs\" N:$errors_far_fecs"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_crc\" N:$errors_far_crc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_crcp\" N:$errors_far_crcp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_cv\" N:$errors_far_cv"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_cvp\" N:$errors_far_cvp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_uc\" N:$errors_far_rtx_uc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_c\" N:$errors_far_rtx_c"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_ucp\" N:$errors_far_rtx_ucp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_rx\" N:$errors_far_rtx_rx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_tx\" N:$errors_far_rtx_tx"

exit 0

and

'use strict';
'require baseclass';
return baseclass.extend({
	title: _('DSL'),
	rrdargs: function(graph, host, plugin, plugin_instance, dtype) {
		var uptime = {
			title: "%H: DSL line uptime",
			vlabel: "days",
			alt_autoscale: true,
			number_format: "%5.1lf days",
			rrdopts: ["-h 80"],
			data: {
				types: ["uptime"],
				options: {
					uptime: {
						title:         "Uptime",
						transform_rpn: "86400,/",
						noavg:         true,
						color:         "007700"
					}
				}
			}
		};
		var datarate = {
			title: "%H: DSL datarates",
			vlabel: "bit/s",
			number_format: "%5.1lf %sbit/s",
			alt_autoscale: true,
			data: {
				instances: {
					bitrate: ["downstream", "downstream_max", "upstream", "upstream_max"]
				},
				options: {
					bitrate_downstream: {
						title:         "Downstream",
						color:         "007700"
					},
					bitrate_downstream_max: {
						title:         "Downstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true
					},
					bitrate_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					},
					bitrate_upstream_max: {
						title:         "Upstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true,
						flip:          true
					}
				}
			}
		};
		var snr = {
			title: "%H: DSL SNR",
			vlabel: "dB",
			number_format: "%4.1lf dB",
			alt_autoscale: true,
			data: {
				instances: {
					snr: ["downstream", "upstream"]
				},
				options: {
					snr_downstream: {
						title:         "Downstream",
						color:         "007700",
						overlay:       true
					},
					snr_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var operational_counters = {
			title: "%H: DSL operational counters",
			vlabel: "events (per day)",
			y_min: -4000,
			y_max: 4000,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_fecs", "far_fecs"]
				},
				options: {
					errors_near_fecs: {
						title:         "FECs (near)",
						transform_rpn: "86400,*",
						color:         "007700"
					},
					errors_far_fecs: {
						title:         "FECs (far)",
						transform_rpn: "86400,*",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var error_seconds = {
			title: "%H: DSL errored seconds",
			vlabel: "seconds (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_ses", "near_es", "far_ses", "far_es" ]
				},
				options: {
					errors_near_ses: {
						title:         "SES/severely errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "ff0000"
					},
					errors_near_es: {
						title:         "ES/errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "777777"
					},
					errors_far_ses: {
						title:         "SES/severely errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						flip:          true
					},
					errors_far_es: {
						title:         "ES/errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "777777",
						flip:          true
					}
				}
			}
		};
		var error_counters = {
			title: "%H: DSL error counters",
			vlabel: "errors (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_crc", "near_crcp", "far_crc", "far_crcp"]
				},
				options: {
					errors_near_crc: {
						title:         "CRC errors (near)",
						transform_rpn: "86400,*",
						color:         "444444"
					},
					errors_near_crcp: {
						title:         "CRC errors (preemptive, near)",
						transform_rpn: "86400,*",
						color:         "888888"
					},
					errors_far_crc: {
						title:         "CRC errors (far)",
						transform_rpn: "86400,*",
						color:         "444444",
						flip:          true
					},
					errors_far_crcp: {
						title:         "CRC errors (preemptive, far)",
						transform_rpn: "86400,*",
						color:         "888888",
						flip:          true
					}
				}
			}
		};
		var retx_counters = {
			title: "%H: DSL G.INP(retx) retransmission counters",
			vlabel: "DTUs (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_rtx_uc", "near_rtx_ucp", "near_rtx_c", "near_rtx_tx", "far_rtx_uc", "far_rtx_ucp", "far_rtx_c", "far_rtx_tx"]
				},
				options: {
					errors_near_rtx_uc: {
						title:         "ReTx corrupted (near)",
						transform_rpn: "86400,*",
						color:         "880000"
					},
					errors_near_rtx_ucp: {
						title:         "ReTx corrupted protected (near)",
						transform_rpn: "86400,*",
						color:         "ff0000"
					},
					errors_near_rtx_c: {
						title:         "ReTx corrected (near)",
						transform_rpn: "86400,*",
						color:         "00ff00"
					},
					errors_near_rtx_tx: {
						title:         "ReTx tx-retransmitted (near)",
						transform_rpn: "86400,*",
						color:         "ff00ff"
					},
					errors_far_rtx_uc: {
						title:         "ReTx corrupted (far)",
						transform_rpn: "86400,*",
						color:         "880000",
						flip:          true
					},
					errors_far_rtx_ucp: {
						title:         "ReTx corrupted protected (far)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						flip:          true
					},
					errors_far_rtx_c: {
						title:         "ReTx corrected (far)",
						transform_rpn: "86400,*",
						color:         "00ff00",
						flip:          true
					},
					errors_far_rtx_tx: {
						title:         "ReTx tx-retransmitted (far)",
						transform_rpn: "86400,*",
						color:         "ff00ff",
						flip:          true
					},
				}
			}
		};

		return [uptime,datarate,snr,operational_counters,error_seconds,error_counters,retx_counters];
	}
});

But the issue still seems to be that in the upstream direction the HH5A syncs fully up to the ATTNDR, like:
Data Rate: 116.797 Mb/s / 35.491 Mb/s
Max. Attainable Data Rate (ATTNDR): 138.674 Mb/s / 35.491 Mb/s
leaving no slack for slight signal degradation (e.g. due to changing RF noise over the course of a day), since my ISP does not enable seamless rate adaptation (SRA). I guess I need to find a way to limit the upstream sync rate or to add an additional SNR margin to the sync requirements (which OpenWrt offers for the downstream direction, where I do not really need it :wink: #firstworldproblems )
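To put a number on "no slack": a hypothetical helper `sync_headroom` that compares the actual sync rate with ATTNDR. It is plain arithmetic on the figures from the status page, nothing is read from the modem.

```shell
# Hypothetical helper: share of the attainable rate (ATTNDR) used by the sync.
# Both arguments must use the same unit (e.g. Mb/s as shown on the status page).
sync_headroom() {
	# $1 = actual data rate, $2 = ATTNDR
	awk -v rate="$1" -v attndr="$2" 'BEGIN {
		if (attndr <= 0) exit 1
		used = 100 * rate / attndr
		# compute headroom from the difference so equal rates give exactly 0
		headroom = 100 * (attndr - rate) / attndr
		printf "used: %.1f%%  headroom: %.1f%%\n", used, headroom
	}'
}

sync_headroom 116.797 138.674   # downstream: used: 84.2%  headroom: 15.8%
sync_headroom 35.491 35.491     # upstream:   used: 100.0%  headroom: 0.0%
```

With the HH5A numbers quoted above, the downstream keeps roughly 16% in reserve while the upstream has none, which matches the observation that only the upstream is at risk when the noise changes.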

What is? I don't see anything I wouldn't have done myself. Of course this only creates numerical rtx values if the JSON value is available, so for those who don't have the current patches it would be sensible to return zero instead of an empty string, i.e.

echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_uc\" N:${errors_near_rtx_uc:-0}"

or something to that effect. But that would be for a "release version" of this script, if there is such a thing.

Edit: This might not even be necessary. If I read the jshn.sh source correctly, json_get_var has an undocumented third parameter for a fallback value. So, untested, for example

json_get_var errors_far_rtx_uc    rx_corrupted 0

should, if it can't find the value, set the variable to 0. No further conditional processing necessary.
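For the shell-side variant it matters which default expansion is used. If I read jshn.sh correctly, plain `json_get_var` (without the third argument) assigns an empty string when the key is missing, so the variable is set but empty; `${var:-0}` substitutes 0 in that case, while `${var-0}` would not. A small hypothetical demo (`show_default` is made up for illustration):

```shell
# Hypothetical demo of the two POSIX default-value expansions:
#   ${var:-0}  -> 0 if var is unset OR empty
#   ${var-0}   -> 0 only if var is unset
show_default() {
	# $1 = name of the variable to inspect; only pass plain variable names,
	# because the name is expanded through eval
	eval "colon=\${$1:-0} plain=\${$1-0}"
	printf '%s colon=%s plain=%s\n' "$1" "$colon" "$plain"
}

unset errors_far_rtx_uc    # counter never set
errors_near_rtx_uc=""      # counter set, but empty
errors_near_es=7           # ordinary value

show_default errors_far_rtx_uc    # errors_far_rtx_uc colon=0 plain=0
show_default errors_near_rtx_uc   # errors_near_rtx_uc colon=0 plain=
show_default errors_near_es       # errors_near_es colon=7 plain=7
```

So `N:${errors_near_rtx_uc:-0}` covers both the missing and the empty case, independent of whether the jshn fallback argument is used.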

Oh, currently the assembled plot is 'ugly', mostly because some colored lines are invisible, as they are plotted behind other area plots... I guess I want to play a bit with the plotting options...
Thanks for the pointer about how to deal with empty values, will incorporate and post, after some testing.

So here is a version that actually seems to work:
cat /usr/lib/collectd/exec/exec-lantiqdsl.sh

#!/bin/sh

# source jshn shell library
. /usr/share/libubox/jshn.sh

HOSTNAME="${COLLECTD_HOSTNAME:-$(cat /proc/sys/kernel/hostname)}"

# retrieve DSL metrics through ubus
dsl_metrics_json=$(/bin/ubus call dsl metrics)

# initialize JSHN and load JSON
json_init
json_load "$dsl_metrics_json"

# get line state
json_get_var linestate up

# only continue if line is up
# collecting any line stats is pointless if the line is down, downtime will be reflected by the gap in statistics
[ "$linestate" = "1" ] || exit 0

# get basic line stats
json_get_var uptime uptime

# get downstream and upstream stats
json_select downstream
	json_get_var downstream_datarate     data_rate
	json_get_var downstream_datarate_max attndr
	json_get_var downstream_snr          snr
json_close_object
json_select upstream
	json_get_var upstream_datarate       data_rate
	json_get_var upstream_datarate_max   attndr
	json_get_var upstream_snr            snr
json_close_object

# get near and far errors
json_select errors
	json_select near
		json_get_var errors_near_es       es
		json_get_var errors_near_ses      ses
		json_get_var errors_near_fecs     fecs
		json_get_var errors_near_crc      crc_p
		json_get_var errors_near_crcp     crcp_p
		json_get_var errors_near_cv       cv_p
		json_get_var errors_near_cvp      cvp_p
		json_get_var errors_near_rtx_uc   rx_corrupted
		json_get_var errors_near_rtx_ucp  rx_uncorrected_protected
		json_get_var errors_near_rtx_rx   rx_retransmitted
		json_get_var errors_near_rtx_c    rx_corrected
		json_get_var errors_near_rtx_tx   tx_retransmitted
	json_close_object
	json_select far
		json_get_var errors_far_es        es
		json_get_var errors_far_ses       ses
		json_get_var errors_far_fecs      fecs
		json_get_var errors_far_crc       crc_p
		json_get_var errors_far_crcp      crcp_p
		json_get_var errors_far_cv        cv_p
		json_get_var errors_far_cvp       cvp_p
		json_get_var errors_far_rtx_uc    rx_corrupted
		json_get_var errors_far_rtx_ucp   rx_uncorrected_protected
		json_get_var errors_far_rtx_rx    rx_retransmitted
		json_get_var errors_far_rtx_c     rx_corrected
		json_get_var errors_far_rtx_tx    tx_retransmitted
	json_close_object
json_close_object

# present values to collectd
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/uptime\" N:$uptime"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-downstream\" N:$downstream_datarate"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-downstream_max\" N:$downstream_datarate_max"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/snr-downstream\" N:$downstream_snr"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-upstream\" N:$upstream_datarate"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/bitrate-upstream_max\" N:$upstream_datarate_max"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/snr-upstream\" N:$upstream_snr"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_es\" N:$errors_near_es"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_ses\" N:$errors_near_ses"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_fecs\" N:$errors_near_fecs"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_crc\" N:$errors_near_crc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_crcp\" N:$errors_near_crcp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_cv\" N:$errors_near_cv"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_cvp\" N:$errors_near_cvp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_uc\" N:$errors_near_rtx_uc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_ucp\" N:$errors_near_rtx_ucp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_rx\" N:$errors_near_rtx_rx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_c\" N:$errors_near_rtx_c"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-near_rtx_tx\" N:$errors_near_rtx_tx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_es\" N:$errors_far_es"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_ses\" N:$errors_far_ses"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_fecs\" N:$errors_far_fecs"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_crc\" N:$errors_far_crc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_crcp\" N:$errors_far_crcp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_cv\" N:$errors_far_cv"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_cvp\" N:$errors_far_cvp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_uc\" N:$errors_far_rtx_uc"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_ucp\" N:$errors_far_rtx_ucp"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_rx\" N:$errors_far_rtx_rx"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_c\" N:$errors_far_rtx_c"
echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/errors-far_rtx_tx\" N:$errors_far_rtx_tx"

cat /www/luci-static/resources/statistics/rrdtool/definitions/exec.js

'use strict';
'require baseclass';
return baseclass.extend({
	title: _('DSL'),
	rrdargs: function(graph, host, plugin, plugin_instance, dtype) {
		var uptime = {
			title: "%H: DSL line uptime",
			vlabel: "days",
			alt_autoscale: true,
			number_format: "%5.1lf days",
			rrdopts: ["-h 80"],
			data: {
				types: ["uptime"],
				options: {
					uptime: {
						title:         "Uptime",
						transform_rpn: "86400,/",
						noavg:         true,
						color:         "007700"
					}
				}
			}
		};
		var datarate = {
			title: "%H: DSL datarates",
			vlabel: "bit/s",
			number_format: "%5.1lf %sbit/s",
			alt_autoscale: true,
			data: {
				instances: {
					bitrate: ["downstream", "downstream_max", "upstream", "upstream_max"]
				},
				options: {
					bitrate_downstream: {
						title:         "Downstream",
						color:         "007700"
					},
					bitrate_downstream_max: {
						title:         "Downstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true
					},
					bitrate_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					},
					bitrate_upstream_max: {
						title:         "Upstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true,
						flip:          true
					}
				}
			}
		};
		var snr = {
			title: "%H: DSL SNR",
			vlabel: "dB",
			number_format: "%4.1lf dB",
			alt_autoscale: true,
			data: {
				instances: {
					snr: ["downstream", "upstream"]
				},
				options: {
					snr_downstream: {
						title:         "Downstream",
						color:         "007700",
						overlay:       true
					},
					snr_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var operational_counters = {
			title: "%H: DSL operational counters",
			vlabel: "events (per day)",
			y_min: -4000,
			y_max: 4000,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_fecs", "far_fecs"]
				},
				options: {
					errors_near_fecs: {
						title:         "FECs (near)",
						transform_rpn: "86400,*",
						color:         "007700"
					},
					errors_far_fecs: {
						title:         "FECs (far)",
						transform_rpn: "86400,*",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var error_seconds = {
			title: "%H: DSL errored seconds",
			vlabel: "seconds (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_ses", "near_es", "far_ses", "far_es" ]
				},
				options: {
					errors_near_ses: {
						title:         "SES/severely errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "ff0000"
					},
					errors_near_es: {
						title:         "ES/errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "777777",
						overlay:       true,
						noarea:        true
					},
					errors_far_ses: {
						title:         "SES/severely errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						flip:          true
					},
					errors_far_es: {
						title:         "ES/errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "777777",
						flip:          true,
						overlay:       true,
						noarea:        true
					}
				}
			}
		};
		var error_counters = {
			title: "%H: DSL error counters",
			vlabel: "errors (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_crc", "near_crcp", "far_crc", "far_crcp"]
				},
				options: {
					errors_near_crc: {
						title:         "CRC errors (near)",
						transform_rpn: "86400,*",
						color:         "444444"
					},
					errors_near_crcp: {
						title:         "CRC errors (preemptive, near)",
						transform_rpn: "86400,*",
						color:         "888888",
						overlay:       true,
						noarea:        true
					},
					errors_far_crc: {
						title:         "CRC errors (far)",
						transform_rpn: "86400,*",
						color:         "444444",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_crcp: {
						title:         "CRC errors (preemptive, far)",
						transform_rpn: "86400,*",
						color:         "888888",
						flip:          true,
						overlay:       true,
						noarea:        true
					}
				}
			}
		};
		var retx_counters = {
			title: "%H: DSL G.INP(retx) retransmission counters",
			vlabel: "DTUs (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_rtx_uc", "near_rtx_ucp", "near_rtx_c", "near_rtx_tx", "far_rtx_uc", "far_rtx_ucp", "far_rtx_c", "far_rtx_tx"]
				},
				options: {
					errors_near_rtx_uc: {
						title:         "ReTx corrupted (near)",
						transform_rpn: "86400,*",
						color:         "880000",
						overlay:       true,
						noarea:        false
					},
					errors_near_rtx_ucp: {
						title:         "ReTx corrupted protected (near)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						overlay:       true,
						noarea:        true
					},
					errors_near_rtx_c: {
						title:         "ReTx corrected (near)",
						transform_rpn: "86400,*",
						color:         "00ff00",
						overlay:       true,
						noarea:        true
					},
					errors_near_rtx_tx: {
						title:         "ReTx tx-retransmitted (near)",
						transform_rpn: "86400,*",
						color:         "ff00ff",
						overlay:       true,
						noarea:        true
					},
					errors_far_rtx_uc: {
						title:         "ReTx corrupted (far)",
						transform_rpn: "86400,*",
						color:         "880000",
						flip:          true,
						overlay:       true,
						noarea:        false
					},
					errors_far_rtx_ucp: {
						title:         "ReTx corrupted protected (far)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_rtx_c: {
						title:         "ReTx corrected (far)",
						transform_rpn: "86400,*",
						color:         "00ff00",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_rtx_tx: {
						title:         "ReTx tx-retransmitted (far)",
						transform_rpn: "86400,*",
						color:         "ff00ff",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
				}
			}
		};

		return [uptime,datarate,snr,operational_counters,error_seconds,error_counters,retx_counters];
	}
});

I have not yet tried any of the default-to-zero approaches; will update after testing. I do wonder whether relying on an undocumented feature of json_get_var is a robust way forward, though, so I am more inclined to use the shell approach instead. If I do, would it make sense to do this for all variables, to reduce the dependency on the exact output of ubus dsl metrics?

Actually, I had a different thought. I believe there's a way to only create the graph if there's a corresponding RRD database, though I have yet to find out how it's done. If so, rather than zeroing out the values, it would be more sensible not to submit those values to collectd at all if they are not available.
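The exec-script half of that idea could look like the following sketch: a hypothetical `putval_if_set` that skips any metric whose variable jshn left empty, so collectd never receives (and never creates an RRD file for) it. How LuCI would then skip the graph is the open question from the post and is not addressed here.

```shell
# Hypothetical helper: only hand a value to collectd if jshn populated it, so no
# RRD file is created for counters a given modem/driver does not report.
# Host and plugin naming follow the exec-lantiqdsl.sh script in this thread.
HOSTNAME="${COLLECTD_HOSTNAME:-$(cat /proc/sys/kernel/hostname 2>/dev/null)}"

putval_if_set() {
	# $1 = collectd "type-instance" (e.g. errors-near_rtx_uc)
	# $2 = name of the shell variable holding the value (plain name, used with eval)
	eval "val=\${$2}"
	[ -n "$val" ] || return 0
	echo "PUTVAL \"$HOSTNAME/exec-lantiqdsl/$1\" N:$val"
}

# Usage inside exec-lantiqdsl.sh, after the json_get_var calls:
#   for m in near_rtx_uc near_rtx_ucp near_rtx_rx near_rtx_c near_rtx_tx \
#            far_rtx_uc far_rtx_ucp far_rtx_rx far_rtx_c far_rtx_tx; do
#   	putval_if_set "errors-$m" "errors_$m"
#   done
```

A genuine zero is still submitted (only empty values are skipped), so counters that exist but never move keep their RRD file.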

That sounds like a real plan forward :wink:
But at the moment I am more focused on sanity-checking the displayed data: I see events with more corrected DTUs than corrupted and retransmitted ones, which seems somewhat impossible, or (more likely) I have not yet fully understood what the counters are intended to tell me.

BUT, I do see that my upstream FECs correlate with ReTx events in the upstream, and somewhat more weakly in the downstream direction (but there are no FECs reported for the downstream on my link). I have my doubts that CPEs in general are reliable in reporting these counters...

I guess it really is quite simple: the retransmissions are simply accounted on the other side, that is, retransmissions for DTUs corrupted at the near end appear in the far side's counters, which on second thought is pretty much where they should be accounted :wink:
This pattern only became clear after I fixed the display of the counters. Now I am pondering whether to leave things as they are (and add a note to the description/legend), or whether I should flip the curves... or both...
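Taking that interpretation at face value (an assumption from this thread, not something checked against G.998.4): every DTU the near end reports as corrected must have been re-sent by the far end, so per sampling interval the increase of near `rx_corrected` should not exceed the increase of far `tx_retransmitted`. A hypothetical plausibility check on two consecutive counter readings:

```shell
# Hypothetical plausibility check, assuming near rx_corrected pairs with
# far tx_retransmitted (the cross-side accounting described above):
#   delta(near rx_corrected) <= delta(far tx_retransmitted)
# Exit status: 0 plausible, 1 implausible, 2 a counter went backwards.
retx_plausible() {
	# $1 $2 = near rx_corrected (previous, current)
	# $3 $4 = far tx_retransmitted (previous, current)
	awk -v c0="$1" -v c1="$2" -v t0="$3" -v t1="$4" 'BEGIN {
		dc = c1 - c0; dt = t1 - t0
		if (dc < 0 || dt < 0) { print "counter went backwards (retrain?)"; exit 2 }
		if (dc > dt) { printf "implausible: corrected %d > retransmitted %d\n", dc, dt; exit 1 }
		printf "ok: corrected %d <= retransmitted %d\n", dc, dt
	}'
}

retx_plausible 100 110 500 520   # ok: corrected 10 <= retransmitted 20
```

A DTU may be retransmitted more than once, or retransmitted and still lost, so "retransmitted" exceeding "corrected" is expected; only the reverse would point to a misunderstanding of the counters.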

@takimata I have a question about the scaling values:

transform_rpn: "86400,*",

is the number of seconds per day, but if I understand collectd's default configuration correctly, it samples every 30 seconds and calculates the difference to the previous sample, which would give counts per 30 seconds. So shouldn't the conversion factor to per-day values be 2 × 60 × 24 = 2880 instead? Or does collectd convert to counts per second when sampling or when storing to the RRD database?

It does. errors data sources (yes, data types are defined by the first word in the value name) are of the "DERIVE" data source type and end up as a "data rate" which is "calculated per second."

Sample frequency generally does not matter with collectd. It's been specifically designed with infrequent and/or inconsistent update intervals in mind.
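The arithmetic behind that answer, as a sketch: a DERIVE data source ends up as the counter difference divided by the elapsed time, so the sampling interval cancels out and `transform_rpn: "86400,*"` (or `"60,*"` for per minute) merely scales a per-second rate. The helper `derive_rate` is hypothetical, and rrdtool's real handling of heartbeats, counter resets and consolidation is not modeled.

```shell
# Hypothetical sketch of the DERIVE arithmetic: (v1 - v0) / (t1 - t0) gives a
# per-second rate; the graph then multiplies it by 60 or 86400.
derive_rate() {
	# $1 $2 = counter values, $3 $4 = timestamps in seconds
	awk -v v0="$1" -v v1="$2" -v t0="$3" -v t1="$4" 'BEGIN {
		if (t1 <= t0) exit 1
		rate = (v1 - v0) / (t1 - t0)
		# per second, per minute ("60,*"), per day ("86400,*")
		printf "%g/s %g/min %g/day\n", rate, rate * 60, rate * 86400
	}'
}

derive_rate 100 103 0 30   # 3 FECS in 30 s -> 0.1/s 6/min 8640/day
derive_rate 100 106 0 60   # 6 FECS in 60 s -> same rate, same output
```

Both calls give identical results, which is why 86400 (not 2880) is the right factor regardless of the 30 s interval.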


Thanks, quite helpful....

So here is a version that has some internal consistency: values affecting the near side are plotted up, values affecting the far side are plotted down; retransmission events are area plots, as these potentially cause increased jitter/burstiness; rx_uncorrected_protected is also an area plot, as these should result in CRC events as well and indicate true packet loss. I also reduced the scaling to per minute, because I believe that for absolute difference values only the hour plot is useful, and there "per day" leads to large values that are hard to interpret; for the larger time windows I guess only the ratio of the different areas matters....
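For anyone wondering what `transform_rpn` and `flip` actually do to a value, here is a small sketch. It assumes the data value is pushed onto the stack first and the comma-separated RPN tokens are then evaluated rrdtool-CDEF style (only `+ - * /` are supported here), and it models `flip` as a final negation; both are assumptions about how the LuCI graphs are built, not a copy of its code. The name `applyTransform` is made up.

```javascript
// Minimal RPN evaluator for strings like "60,*" or "86400,/".
function applyTransform(value, rpn, flip = false) {
  const stack = [value]; // the data value goes on the stack first
  for (const tok of rpn.split(",")) {
    if (["+", "-", "*", "/"].includes(tok)) {
      const b = stack.pop();
      const a = stack.pop();
      stack.push(tok === "+" ? a + b : tok === "-" ? a - b : tok === "*" ? a * b : a / b);
    } else {
      stack.push(Number(tok)); // numeric constant
    }
  }
  const result = stack.pop();
  return flip ? -result : result; // flipped (far side) series are drawn downwards
}

console.log(applyTransform(1 / 30, "60,*"));       // approximately 2 (one event per 30 s, per minute)
console.log(applyTransform(1 / 30, "60,*", true)); // approximately -2 (same, far side)
console.log(applyTransform(1655, "86400,/"));      // uptime seconds converted to days
```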

I have not even tried to tackle making the plotting depend on a metric being available on ubus.

cat exec.js:

'use strict';
'require baseclass';
return baseclass.extend({
	title: _('DSL'),
	rrdargs: function(graph, host, plugin, plugin_instance, dtype) {
		var uptime = {
			title: "%H: DSL line uptime",
			vlabel: "days",
			alt_autoscale: true,
			number_format: "%5.1lf days",
			rrdopts: ["-h 80"],
			data: {
				types: ["uptime"],
				options: {
					uptime: {
						title:         "Uptime",
						transform_rpn: "86400,/",
						noavg:         true,
						color:         "007700"
					}
				}
			}
		};
		var datarate = {
			title: "%H: DSL datarates",
			vlabel: "bit/s",
			number_format: "%5.1lf %sbit/s",
			alt_autoscale: true,
			data: {
				instances: {
					bitrate: ["downstream", "downstream_max", "upstream", "upstream_max"]
				},
				options: {
					bitrate_downstream: {
						title:         "Downstream",
						color:         "007700"
					},
					bitrate_downstream_max: {
						title:         "Downstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true
					},
					bitrate_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					},
					bitrate_upstream_max: {
						title:         "Upstream (max.)",
						color:         "aaaaaa",
						overlay:       true,
						noarea:        true,
						flip:          true
					}
				}
			}
		};
		var snr = {
			title: "%H: DSL SNR",
			vlabel: "dB",
			number_format: "%4.1lf dB",
			alt_autoscale: true,
			data: {
				instances: {
					snr: ["downstream", "upstream"]
				},
				options: {
					snr_downstream: {
						title:         "Downstream",
						color:         "007700",
						overlay:       true
					},
					snr_upstream: {
						title:         "Upstream",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var operational_counters = {
			title: "%H: DSL operational counters",
			vlabel: "events (per day)",
			y_min: -4000,
			y_max: 4000,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_fecs", "far_fecs"]
				},
				options: {
					errors_near_fecs: {
						title:         "FECs (near)",
						transform_rpn: "86400,*",
						color:         "007700"
					},
					errors_far_fecs: {
						title:         "FECs (far)",
						transform_rpn: "86400,*",
						color:         "000077",
						flip:          true
					}
				}
			}
		};
		var error_seconds = {
			title: "%H: DSL errored seconds",
			vlabel: "seconds (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_ses", "near_es", "far_ses", "far_es" ]
				},
				options: {
					errors_near_ses: {
						title:         "SES/severely errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "ff0000"
					},
					errors_near_es: {
						title:         "ES/errored seconds (near)",
						transform_rpn: "86400,*",
						color:         "777777",
						overlay:       true,
						noarea:        true
					},
					errors_far_ses: {
						title:         "SES/severely errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "ff0000",
						flip:          true
					},
					errors_far_es: {
						title:         "ES/errored seconds (far)",
						transform_rpn: "86400,*",
						color:         "777777",
						flip:          true,
						overlay:       true,
						noarea:        true
					}
				}
			}
		};
		var error_counters = {
			title: "%H: DSL error counters",
			vlabel: "errors (per day)",
			y_min: -200,
			y_max: 200,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["near_crc", "near_crcp", "far_crc", "far_crcp"]
				},
				options: {
					errors_near_crc: {
						title:         "CRC errors (near)",
						transform_rpn: "86400,*",
						color:         "444444"
					},
					errors_near_crcp: {
						title:         "CRC errors (preemptive, near)",
						transform_rpn: "86400,*",
						color:         "888888",
						overlay:       true,
						noarea:        true
					},
					errors_far_crc: {
						title:         "CRC errors (far)",
						transform_rpn: "86400,*",
						color:         "444444",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_crcp: {
						title:         "CRC errors (preemptive, far)",
						transform_rpn: "86400,*",
						color:         "888888",
						flip:          true,
						overlay:       true,
						noarea:        true
					}
				}
			}
		};
		var retx_counters = {
			title: "%H: DSL G.INP(retx) retransmission counters",
			vlabel: "DTUs (per minute)",
			y_min: -0.1,
			y_max: 0.1,
			alt_autoscale: true,
			data: {
				instances: {
					errors: ["far_rtx_tx", "near_rtx_uc", "near_rtx_ucp", "near_rtx_c", "near_rtx_tx", "far_rtx_uc", "far_rtx_ucp", "far_rtx_c"]
				},
				options: {
					errors_near_rtx_tx: {
						title:         "ReTx tx-retransmitted (far, accounted as near)",
						transform_rpn: "60,*",
						color:         "ff00ff",
						overlay:       false,
						flip:          true,
						noarea:        false
					},
					errors_near_rtx_uc: {
						title:         "ReTx corrupted (near)",
						transform_rpn: "60,*",
						color:         "ff0000",
						overlay:       true,
						noarea:        true
					},
					errors_near_rtx_c: {
						title:         "ReTx corrected (near)",
						transform_rpn: "60,*",
						color:         "00ff00",
						overlay:       true,
						noarea:        true
					},
					errors_near_rtx_ucp: {
						title:         "ReTx corrupted protected (near)",
						transform_rpn: "60,*",
						color:         "0000ff",
						overlay:       true,
						noarea:        false
					},
					errors_far_rtx_tx: {
						title:         "ReTx tx-retransmitted (near, accounted as far)",
						transform_rpn: "60,*",
						color:         "af00af",
						overlay:       false,
						noarea:        false
					},
					errors_far_rtx_uc: {
						title:         "ReTx corrupted (far)",
						transform_rpn: "60,*",
						color:         "af0000",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_rtx_c: {
						title:         "ReTx corrected (far)",
						transform_rpn: "60,*",
						color:         "00af00",
						flip:          true,
						overlay:       true,
						noarea:        true
					},
					errors_far_rtx_ucp: {
						title:         "ReTx corrupted protected (far)",
						transform_rpn: "60,*",
						color:         "0000af",
						flip:          true,
						overlay:       true,
						noarea:        false
					}
				}
			}
		};

		return [uptime,datarate,snr,operational_counters,error_seconds,error_counters,retx_counters];
	}
});
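With definitions this long it is easy to end up with an instance that has no matching style or a style key nobody uses. In the listing above, an instance `<type>: [<name>, ...]` is styled by the options key `<type>_<name>`, and the uptime graph uses `types` instead. The following hypothetical helper (not part of LuCI) just cross-checks those two lists:

```javascript
// Hypothetical consistency check for one graph definition.
function checkGraph(def) {
  const expected = [];
  for (const type of def.data.types || []) expected.push(type);
  for (const [type, names] of Object.entries(def.data.instances || {})) {
    for (const name of names) expected.push(`${type}_${name}`);
  }
  const defined = Object.keys(def.data.options || {});
  return {
    missingOptions: expected.filter(k => !defined.includes(k)), // instances without styling
    unusedOptions: defined.filter(k => !expected.includes(k)), // styling nobody uses
  };
}

// Trimmed copy of the SNR graph with a deliberate typo in one options key.
const snrGraph = {
  data: {
    instances: { snr: ["downstream", "upstream"] },
    options: { snr_downstream: {}, snr_upsteam: {} },
  },
};
// missingOptions lists snr_upstream, unusedOptions lists snr_upsteam.
console.log(checkGraph(snrGraph));
```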

I can't say much about the rtx graphs since I don't have them, so I trust you that they make sense. Honestly, I'm agnostic when it comes to the scaling factor; if "per hour" makes sense for debugging, then per hour it is. I just know that either you leave it per second to get meaningful totals, or you scale it per anything else to get meaningful averages.

Thank you though for looking into the meaning of the "preemptive CRC errors", I had the suspicion that they were part of the total CRC errors, but heck if I could find any definitive answer on that. (And empirical observation is hard for me, after a month of uptime I am still looking at a zero counter.) So overlaying those makes sense.

As for the additional graphs for rtx being there at all, I think the key is in per-instance definitions, but I have to wrap my head around rrdtool.js first and do some tests.
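One possible shape for that, as a hedged sketch only: prune each graph's instance list to the RRD files that actually exist and drop a graph entirely if nothing is left. How the `available` set would be filled (e.g. by listing the exec-lantiqdsl RRD directory) is not shown, and the `<type>-<instance>` file naming is an assumption based on the PUTVAL identifiers; `pruneGraph` is a made-up name.

```javascript
// Hypothetical: keep only instances whose RRD exists; null means "don't draw".
function pruneGraph(def, available) {
  if (!def.data.instances) return def; // e.g. the uptime graph uses `types`
  const instances = {};
  for (const [type, names] of Object.entries(def.data.instances)) {
    const kept = names.filter(name => available.has(`${type}-${name}`));
    if (kept.length > 0) instances[type] = kept;
  }
  if (Object.keys(instances).length === 0) return null;
  return { ...def, data: { ...def.data, instances } };
}

// Example: a line without G.INP, so no rtx RRD files exist.
const available = new Set(["errors-near_crc", "errors-far_crc"]);
const graphs = [
  { title: "CRC", data: { instances: { errors: ["near_crc", "near_crcp", "far_crc", "far_crcp"] } } },
  { title: "ReTx", data: { instances: { errors: ["near_rtx_tx", "far_rtx_tx"] } } },
];
const shown = graphs.map(g => pruneGraph(g, available)).filter(g => g !== null);
console.log(shown.map(g => g.title), shown[0].data.instances.errors);
```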

Also, I am moving the scripts to a proper GitHub repository; I'm a bit tired of the Gists that have outlived their usefulness. That would also make it possible for you to submit a PR directly. (Edit:) Or I'll pass the baton to someone else entirely; I am far from claiming to be the official authority for this project.


;), I think by iterative trial and error they should be okay now :wink:

Fair enough, that makes sense: with a sample interval of ~30 seconds, a single event will show up as 1/30 = 0.033 Hz, which I guess is okay, similar to per minute... "per day" just leads to a "quantization" with really large and unintuitive steps in the hour plot...
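The quantization is easy to put into numbers. Assuming a ~30 s collectd interval, the smallest non-zero value a DERIVE series can show is one event per interval, and each scaling factor just multiplies that step:

```javascript
// Smallest visible step for a single event per sample interval.
const intervalSeconds = 30;          // assumed collectd interval
const minRate = 1 / intervalSeconds; // events per second for one event
const scales = { "per second": 1, "per minute": 60, "per hour": 3600, "per day": 86400 };
const steps = {};
for (const [label, factor] of Object.entries(scales)) {
  steps[label] = minRate * factor;   // step height at this scaling
}
console.log(steps);
```

So a single CRC error shows up as roughly 2 "per minute", but as 2880 "per day", which is why the per-day scaling looks so odd in the hour plot.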

Well, that is pretty opaque. From what I see, Rec. ITU-T G.992.3 (04/2009) Annex N mentions the preemption option (Annex N.3.1.2 Support for preemption) and short packets (Annex N.3.1.3 Support for short packets), which are referenced in Rec. ITU-T G.993.2 (02/2019) in three places with terse references back to G.992.3. So my interpretation is just my best interpretation of the data I have found (it does not contradict the ITU standards as far as I can see), and not strictly backed up by explicit text....

This at least fits with the expectation that no one actually uses this... On ADSL2 (G.992.3) with low bitrates, preemption might have made sense (to allow VoIP packets to get real priority access to the link), but with VDSL2 and its typical bitrates it might simply not be used at all, in which case zero counters are expected, no?

Whatever you prefer. I can see the whole PR/GitHub thing as a helpful formalization, but I am not sure whether the DSL collectd module will actually see many more changes at a high enough rate (then again, I naively would probably try to package everything as a repository anyway ;))

I am considerably less qualified than you, and will in all likelihood switch back to the Zyxel after I have helped out a bit with the Lantiq* (as the Lantiq is still fickle as hell compared to the Zyxel, with multiple unforced retrains during my short testing period; yes, thanks to my wiring fix-up it seems to be better than in the past, but the Zyxel kept sync for months on end, while the HH5A only managed days).

*) I will give it a bit more time, because my wildest error type looked like the vectoring engine had stopped working, which might be fixed by actually sending error samples back to the VE, so that under changing conditions in the binder, and hence changing cross-talk patterns, vectoring might be able to adapt; but that error was only sporadic to begin with, so....

Maybe, but that's not what I mean. I'm literally looking at zero CRC errors on my line and therefore cannot make any meaningful observations:


I therefore need to rely on other people's input.

Tangent: I'm sorry to hear that the Lantiq modem is still temperamental on your line, which by extension means that it would probably not do much better on my home line either. I will probably not switch away from my Zyxel there. And once international travel is possible again, I'm planning to emigrate within the following few months to a country where the optic fibres grow plentiful. So there's an expiration date on my DSL adventures anyway.

No, you're not.