Edgerouter-X network breakdown

Hi,
For quite some time I have used an Edgerouter-X running OpenWrt with Adblock as the DNS server in my home network. I never had issues with this configuration.

As the Arris cable modem from my ISP has limited functionality, I recently put the modem into bridge mode and configured the Edgerouter-X as my main router (using the most recent stable build, 19.07.4). I also have a TP-Link Archer C7 and an Asrock G10, both running stock firmware but configured as APs. This setup generally works, but since I started using the Edgerouter-X as the main router (about a week ago), my network has gone down completely three or four times, and each time I had to power-cycle the Edgerouter-X to get it working again. It also happens when hardware acceleration is turned off. While the network is down, I can't reach the TP-Link or the Asrock either, even though they are configured with static IP addresses.

Has anyone else had similar experiences with an Edgerouter-X and OpenWRT?

No. My Edgerouter X has been very stable on 19.07.4 and earlier OpenWrt releases. It is our home gateway router and has two APs (also flashed with OpenWrt) connected to it.

More information about your configuration will be needed to help debug this.

Thank you for your reply!

I recently upgraded to OpenWrt 19.07.4 r11208 and reset the router to the OpenWrt defaults. The error occurs even with this default configuration.

The WAN port is connected to an Arris TG862s cable modem. Several devices are connected to the LAN, mostly clients, but there are also two Wi-Fi access points with static IP addresses (TP-Link Archer C7, Asrock G10).

The problem just occurred while I was typing this, so I could confirm that I can reach the devices with static IPs after assigning a static IP to my notebook. The Edgerouter itself, however, was not reachable.

As this is the default OpenWrt configuration, I am not sure what other information could help pin down the problem.

I found one post in this forum which sounds similar to the issues I experienced: EdgeRouter X Crash

In this case overheating seemed to be the problem. Is it possible to monitor the temperature on the EdgeRouter X?

I don't think there is any temperature sensor. In the other thread it was noted that external heat is the main problem; there's very little internal heating. Convective cooling would be improved by mounting the router on its end with port 4 facing down, but don't block the air holes on what is then the bottom.

Any time there seems to be hardware instability, try a different power supply. The input is built for 24 volts so a 24 volt power cube may be better, or supply Ubiquiti 24 volt PoE to port 0.

Is there anything in /sys/kernel/debug/crashlog ?

Thanks for the hint, but there is no crashlog in /sys/kernel/debug

Thank you @mk24 for mentioning the power supply. I found a post where a user reported problems after switching from PoE to 12 V. I could not try PoE because I do not have an injector, but I tried a more powerful 12 V power supply and the router still keeps freezing.

I also tried heating the router intentionally, but this did not trigger the issue, so I think I can rule out overheating.

What I find interesting is that the router seems to work without issues with either EdgeOS or OpenWrt 18.06.9 (which I am using at the moment).

I have a LAN setup similar to @bruno_l's and the same ER-X crash problem.
Hardware offloading is active, and sirq stays at almost 0%, rising to 1-5% only occasionally when my FTTC line (100 Mbps down / 30 Mbps up) is fully used.
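For context on the sirq figure: in `top`, sirq is the share of CPU time spent in software interrupts (softirqs), where the kernel does much of its network packet processing. It is derived from the cumulative counters on the aggregate `cpu` line of `/proc/stat`. The sketch below computes the same percentage from two samples. It assumes Python 3 is available wherever the samples are evaluated (it is usually not installed on the router, so two `cat /proc/stat` snapshots can be copied off instead); the function names are hypothetical.

```python
import os
import time
from typing import List

# Field order after "cpu": user nice system idle iowait irq softirq steal guest guest_nice
SOFTIRQ = 6
# guest/guest_nice are already counted in user/nice, so only the first 8 fields form the total
TOTAL_FIELDS = 8


def parse_cpu_line(line: str) -> List[int]:
    """Split the aggregate 'cpu ...' line from /proc/stat into integer jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:]]


def softirq_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in softirq between two /proc/stat samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    total_delta = sum(b[:TOTAL_FIELDS]) - sum(a[:TOTAL_FIELDS])
    if total_delta <= 0:
        return 0.0  # no time elapsed between samples (or counters reset)
    return 100.0 * (b[SOFTIRQ] - a[SOFTIRQ]) / total_delta


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate over all CPUs."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_line()
    time.sleep(1.0)  # sample interval; top uses a similar delta approach
    second = read_cpu_line()
    print(f"sirq: {softirq_percent(first, second):.1f}%")
```

Using the aggregate line averages over all hardware threads, like the summary row in `top`; a per-core view would read the `cpu0`, `cpu1`, ... lines instead.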
I had been using the ER-X with OpenWrt for years without any problem up to 19.07.3. Five days ago I upgraded to 19.07.5, the network problems began, and I have had around 4 crashes so far.
When the ER-X crashes I can't SSH into it or ping the router or 1.1.1.1, although once I was able to ping the AP.
The first time the problem happened I was using all of the download and upload bandwidth, but afterwards I couldn't reproduce the crash by forcing traffic load.
The only solution is to unplug the ER-X power supply and then power it on again.
Once I noticed from the uptime that the router had rebooted on its own.
Once, after a temporary network outage, the router apparently recovered on its own, and I could SSH in and see this log:

Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.204777] ------------[ cut here ]------------
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.213984] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 0x8038d150
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.228031] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.241898] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio gpio_button_hotplug
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.355878] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.209 #0
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.367994] Stack : 00000000 00000000 00000000 8fe67a40 00000000 00000000 00000000 00000000
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.384624]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0fd60 53261630
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.401250]         8fc0fdf8 00000000 00000000 00003db8 00000038 8049da98 00000008 00000000
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.417878]         00000000 80550000 00056e26 00000000 8fc0fd40 00000000 00000000 8050c4d4
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.434507]         8038d150 00000140 00000003 8fe67a40 00000000 802ae190 0000000c 806b000c
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.451136]         ...
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.455991] Call Trace:
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.456008] [<8049da98>] 0x8049da98
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.467780] [<8038d150>] 0x8038d150
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.474706] [<802ae190>] 0x802ae190
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.481632] [<8000c1a0>] 0x8000c1a0
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.488559] [<8000c1a8>] 0x8000c1a8
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.495486] [<804868d4>] 0x804868d4
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.502411] [<80071c80>] 0x80071c80
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.509338] [<8002e798>] 0x8002e798
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.516265] [<8038d150>] 0x8038d150
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.523191] [<8002e820>] 0x8002e820
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.530116] [<800552b8>] 0x800552b8
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.537045] [<8038d150>] 0x8038d150
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.543970] [<80099b90>] 0x80099b90
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.550895] [<8038cfa4>] 0x8038cfa4
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.557824] [<80088738>] 0x80088738
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.564749] [<8005f3e4>] 0x8005f3e4
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.571677] [<800889f4>] 0x800889f4
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.578605] [<80079328>] 0x80079328
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.585540] [<804a4898>] 0x804a4898
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.592470] [<80033164>] 0x80033164
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.599396] [<8025b4c0>] 0x8025b4c0
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.606324] [<80007488>] 0x80007488
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.613247]
Thu Dec 31 16:46:35 2020 kern.warn kernel: [22229.616344] ---[ end trace fe83a231f63769f3 ]---
Thu Dec 31 16:46:35 2020 kern.err kernel: [22229.625548] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.637869] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.649851] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e9c0000, max=0, ctx=940, dtx=940, fdx=939, next=940
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.670819] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f250000, max=0, calc=1774, drx=1775
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.692008] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.711794] mtk_soc_eth 1e100000.ethernet: PPE started

Any idea?
Thanks
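To check how often the `NETDEV WATCHDOG ... transmit queue 0 timed out` event from the log above recurs, a saved system log can be scanned for it. This is a minimal sketch, assuming the log was captured with `logread > syslog.txt` before power-cycling (OpenWrt keeps the system log in RAM by default, so it is lost on a reboot unless remote logging is set up). The regex targets the log line format shown above; the script name, file name and helper names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Matches e.g.
# "Thu Dec 31 16:46:35 2020 kern.info kernel: [22229.228031] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"^(?P<stamp>\w{3} \w{3} +\d+ [\d:]+ \d{4}) .*?"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


class WatchdogEvent(NamedTuple):
    stamp: str   # logread timestamp, e.g. "Thu Dec 31 16:46:35 2020"
    iface: str   # network interface, e.g. "eth0"
    driver: str  # driver name in parentheses, e.g. "mtk_soc_eth"
    queue: int   # transmit queue index that timed out


def find_watchdog_events(lines: Iterable[str]) -> List[WatchdogEvent]:
    """Collect every transmit-queue watchdog timeout found in the given log lines."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append(
                WatchdogEvent(m["stamp"], m["iface"], m["driver"], int(m["queue"]))
            )
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_watchdog.py syslog.txt
    with open(sys.argv[1]) as f:
        found = find_watchdog_events(f)
    for e in found:
        print(f"{e.stamp}: {e.iface} ({e.driver}) tx queue {e.queue} timed out")
    print(f"{len(found)} watchdog timeout(s) found")
```

Counting these events per firmware version would show whether the timeouts line up with the 19.07.x releases and not with 18.06.9 or the snapshot.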

As suggested somewhere, I did a clean reinstall of OpenWrt following the instructions on the device page (starting with a custom image for 19.07.2 and then flashing the official release). I used 19.07.2 until it froze (hardware flow offloading also did not work with this release) and also tried upgrading to 19.07.5, which kept freezing as well.

Next, I flashed the current snapshot (kernel version 5.4.86), which has now been running without an issue for 24 hours. A PoE power supply also arrived today, but I have not installed it yet.

I did not check the major differences between 19.07.5 and the snapshot, but I already noticed that the switch configuration has changed. It seems the DSA driver is used now.

This thread seems to be directly related to the issues I was experiencing: Mtk_soc_eth watchdog timeout after r11573
