Snapshot crashes when using USB drive

farnwomt · February 21, 2020, 6:30pm

I am using an Armor Z2 with an external hard drive. It works fine for a while, but when I transfer a lot of data onto the drive the router eventually reboots.

I am using firmware compiled from the latest snapshot. I have tried using ext4, ext3 and now xfs, but the problem continues.

Today I moved the syslog to non-volatile storage so that I could see the message after a reboot, and this is what it shows:

Fri Feb 21 17:18:06 2020 kern.alert kernel: [ 4334.316895] Unable to handle kernel NULL pointer dereference at virtual address 0000000b
Fri Feb 21 17:18:06 2020 kern.alert kernel: [ 4334.316924] pgd = 7ab090fb
Fri Feb 21 17:18:06 2020 kern.alert kernel: [ 4334.324156] [0000000b] *pgd=00000000
Fri Feb 21 17:18:06 2020 kern.emerg kernel: [ 4334.326608] Internal error: Oops: 5 [#1] SMP ARM
Fri Feb 21 17:18:06 2020 kern.warn kernel: [ 4334.330290] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conncount iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat fuse sch_cake nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred
Fri Feb 21 17:18:07 2020 kern.warn kernel: [ 4334.383929]  ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 msdos ifb vfat fat hfsplus hfs autofs4 dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax nls_utf8 nls_iso8859_1 nls_cp437 usb_storage leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug xfs libcrc32c ext4 mbcache
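
For anyone wanting to reproduce the persistent syslog setup used to capture the log above, here is a minimal sketch. It assumes the stock OpenWrt logd and its `log_type` and `log_file` options in /etc/config/system. The log path is a hypothetical placeholder, and `run` and `DRY_RUN` are helper names made up for this example. By default it only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: make logd write to a file that survives a reboot.
# LOG_PATH is a hypothetical location; it has to be on storage that persists
# across reboots (not /tmp, which is RAM-backed on OpenWrt).
LOG_PATH="${LOG_PATH:-/mnt/persist/syslog.log}"

# Helper for this example only: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

persist_syslog() {
    run uci set "system.@system[0].log_type=file"
    run uci set "system.@system[0].log_file=$LOG_PATH"
    run uci commit system
    run /etc/init.d/log restart
}

persist_syslog
```

Run it with `DRY_RUN=0` on the router to apply the settings. Writing the log to the same USB drive that triggers the crash may lose the last lines, so a different storage location is probably the safer choice.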

Any thoughts or suggestions on what I should do to improve the debugging message?

LGA1150 · February 21, 2020, 7:33pm

Check /sys/kernel/debug/crashlog after a crash

farnwomt · February 21, 2020, 7:47pm

Would love to, but there is "No such file or directory" ...

I have tried compiling in various options in the hope that the file would appear, but it still doesn't exist. Any thoughts on the options I should be enabling?

LGA1150 · February 21, 2020, 7:49pm

You have to access that file immediately after a crash reboot; it's gone after a manual reboot or power cycle.

farnwomt · February 21, 2020, 7:58pm

I haven't done a manual reboot or power cycle since it last crashed.

farnwomt · February 21, 2020, 8:47pm

Pretty much the same error message yet again ...

Fri Feb 21 20:42:47 2020 kern.alert kernel: [12263.947616] Unable to handle kernel NULL pointer dereference at virtual address 0000000b
Fri Feb 21 20:42:47 2020 kern.alert kernel: [12263.947645] pgd = 4d6a3d22
Fri Feb 21 20:42:47 2020 kern.alert kernel: [12263.954941] [0000000b] *pgd=00000000
Fri Feb 21 20:42:47 2020 kern.emerg kernel: [12263.957282] Internal error: Oops: 5 [#1] SMP ARM
Fri Feb 21 20:42:48 2020 kern.warn kernel: [12263.961013] Modules linked in: pppoe ppp_async ath10k_pci ath10k_core ath pppox ppp_generic mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY slhc nf_reject_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conncount iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat fuse sch_cake nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred
Fri Feb 21 20:42:48 2020 kern.warn kernel: [12264.014651]  ledtrig_usbport xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 msdos ifb vfat fat hfsplus hfs autofs4 dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax nls_utf8 nls_iso8859_1 nls_cp437 usb_storage leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom ohci_platform ohci_hcd phy_qcom_dwc3 ahci ehci_platform sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug xfs libcrc32c ext4 mbcache

It just crashed again. I saw the error message in the log file and went straight to /sys/kernel/debug/crashlog, and it definitely doesn't exist.
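
A missing /sys/kernel/debug/crashlog can have more than one cause, so it may help to separate them. The sketch below first checks whether debugfs is mounted at all and only then looks for the file. The function name and the `MOUNTS` / `CRASHLOG` variables are made up for this example; they exist so the check can also be tried against test files.

```shell
#!/bin/sh
# Sketch: tell "debugfs is not mounted" apart from "crashlog file is absent".
MOUNTS="${MOUNTS:-/proc/mounts}"
CRASHLOG="${CRASHLOG:-/sys/kernel/debug/crashlog}"

crashlog_status() {
    # Field 3 of each /proc/mounts line is the filesystem type.
    if ! awk '$3 == "debugfs" { found = 1 } END { exit !found }' "$MOUNTS" 2>/dev/null; then
        echo "debugfs-not-mounted"
    elif [ -r "$CRASHLOG" ]; then
        echo "crashlog-present"
    else
        echo "crashlog-missing"
    fi
}

crashlog_status
```

If this reports `debugfs-not-mounted`, `mount -t debugfs none /sys/kernel/debug` should make the directory usable. If it reports `crashlog-missing` straight after a crash reboot, the running kernel was most likely built without crashlog support; in that case it is worth checking whether the Kconfig entry added by the crashlog patch can be enabled at all for this target, given that the oops above shows an ARM kernel.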

anomeome · February 21, 2020, 9:17pm

If it is supported, you can force a test crash:

echo c >/proc/sysrq-trigger

Support is added by this patch:

target/linux/generic/hack-4.19/930-crashlog.patch

farnwomt · February 21, 2020, 10:24pm

I did exactly as suggested. It reboots the system and puts an oops message into the syslog, but it doesn't create /sys/kernel/debug/crashlog.

Is there a particular option I need to compile in to make it work?

Thanks for any information.

anon50098793 · February 22, 2020, 2:17am

I may be seeing this too. If so, it was introduced roughly around the 4.19 bump, about 5 months ago. I will see if I can also get some debugging info.

For me it was a total hang, seen over the console, with no crash message, while running something like fgrep xyz /largedir. It is rare, though.

farnwomt · February 22, 2020, 9:06am

Somebody posted a comment, which was emailed to me at the time but appears to have since been deleted, suggesting that I explain in detail what hardware I am using, which I am happy to do.

It is an Armor Z2 and I am using an external USB hard drive connected to the USB3 port (which seemed the obvious choice because that is the faster port).

The specific hard drive is a brand new one of these:

In the now deleted post it was suggested I should perhaps try the USB2 port and/or another drive. I will start by trying USB2 and let you know how it goes.

farnwomt · February 22, 2020, 9:35am

I can confirm that connecting to the USB2 port doesn't fix the problem as it crashed within 14 minutes of heavy use of the drive.

For reference I am using rsync to transfer files over the network onto the drive. I am writing at a relatively consistent 11MB/s.

anon50098793 · February 22, 2020, 10:04am

And a different drive, e.g. a USB flash drive? We need to rule out regulator/power issues.

farnwomt · February 22, 2020, 10:29am

I will try to dig out another drive, but I need to find a flash drive that is big enough for it to run for a while. I had wondered if it might be related to power draw, although the Armor Z2 comes with a 12V/3A power supply which would appear to offer a fair bit of headroom.

I am currently running rsync with --bwlimit=8m to see whether slowing things down to 8MB/s stops it crashing. I should know within the next hour or so, because it rarely runs for long before crashing.
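
For reference, a rate-limited transfer of this kind could look like the sketch below. Only `--bwlimit=8m` comes from the post above; the source and destination paths, the use of rsync over SSH and the extra `-a --partial` flags are assumptions, and `run` / `DRY_RUN` are helper names made up for this example. By default the command is only printed.

```shell
#!/bin/sh
# Sketch: rsync onto the USB drive with a bandwidth cap.
# SRC and DEST are hypothetical placeholders.
SRC="${SRC:-/srv/files/}"
DEST="${DEST:-root@192.168.1.1:/mnt/usb/files/}"
# rsync 3.1 and later accept a unit suffix here; without a suffix the value
# is taken in units of 1024 bytes per second.
LIMIT="${LIMIT:-8m}"

# Helper for this example only: print the command unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

limited_copy() {
    # --partial keeps partially transferred files, so a crash does not
    # force the next attempt to start those files from scratch.
    run rsync -a --partial --bwlimit="$LIMIT" "$SRC" "$DEST"
}

limited_copy
```

Lowering `LIMIT` step by step and noting how long each run survives is one way to see whether the crash depends on throughput at all.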

At one point during this slower rsync I did notice that the filesystem on the hard drive became unresponsive, the sort of thing I might expect with a conventional hard drive that was struggling to read or write sectors, but there were no errors in either syslog or dmesg. During this issue the load average increased to over 5, but then it eventually seemed to get over it and carry on.
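
Stalls like this are easier to correlate with a later crash if the load average is recorded while the transfer runs. A minimal sketch, assuming the standard /proc/loadavg format; the function names, the `LOADAVG_FILE` and `THRESHOLD` variables and the output path in the usage comment are made up for this example, and the threshold of 5 is taken from the load seen above.

```shell
#!/bin/sh
# Sketch: timestamped load-average samples, flagging values above a threshold.
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"
THRESHOLD="${THRESHOLD:-5}"

# Succeeds when the 1-minute load average (first field) exceeds THRESHOLD.
load_is_high() {
    awk -v t="$THRESHOLD" '{ exit !($1 > t) }' "$LOADAVG_FILE" 2>/dev/null
}

# Prints one line: timestamp, HIGH or ok, and the 1/5/15-minute averages.
sample() {
    if load_is_high; then
        state="HIGH"
    else
        state="ok"
    fi
    echo "$(date '+%F %T') $state $(cut -d' ' -f1-3 "$LOADAVG_FILE" 2>/dev/null)"
}

# Usage on the router (output path is a placeholder on persistent storage):
#   while sleep 10; do sample >> /mnt/persist/load.log; done
```

The last lines of such a log before a reboot show whether the router was already struggling when it crashed, even when syslog and dmesg stay silent.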

farnwomt · February 22, 2020, 2:33pm

I can confirm that it still crashes even when trying to slow down the data transfer.