[SOLVED] Kernel fails to boot: “BUG: scheduling while atomic” (ipq806x)

Hi, I am trying to use a low-latency kernel (from master) on a Netgear R7800, and it is not booting. The same type of kernel boots fine if built from stable. I also tried a voluntary-preemption kernel from both master and stable, and both work fine.

The boot log is here. Does anyone have any insight into this?

The very first error in the log is BUG: scheduling while atomic: swapper/0/1/0x00000003, followed by many instances of BUG: scheduling while atomic: kworker/0:1/26/0x00000002.

https://drive.google.com/file/d/1Bzyg3PiKyj4UApVkUnRFoeoxHWklxh50/view

UPDATE: the firmware with this kernel actually managed to boot once, but it kept flooding the log with the same message. It never booted normally again.

Try reverting lines 120 and 131 in patch 0038
https://github.com/openwrt/openwrt/commit/7903a9219c7eb5d14c62f79d4a70f5cab1c6294f#diff-de28ebb48ddcbe6ec583993e4bf900df

I will do that. I already have a pre-built workspace: is it enough to revert those lines and re-run make, or do I have to start from an empty workspace?

Rerunning make should be enough. Or you can run make target/linux/clean before make to be completely sure.

It booted fine, but only the LAN interfaces came up; wifi, uhttpd, etc. did not start. See the errors below. I upgraded while preserving the config. Reverting to the working firmware (again preserving the config) brought wifi, etc. back.

Mon Apr  2 01:23:05 2018 daemon.err hostapd: Configuration file: /var/run/hostapd-phy1.conf
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 34: invalid key_mgmt 'WPA-PSK-SHA256'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 37: unknown configuration item 'ieee80211w'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 38: unknown configuration item 'group_mgmt_cipher'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 58: invalid key_mgmt 'WPA-PSK-SHA256'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 61: unknown configuration item 'ieee80211w'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 62: unknown configuration item 'group_mgmt_cipher'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: 6 errors found in configuration file '/var/run/hostapd-phy1.conf'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Failed to set up interface with /var/run/hostapd-phy1.conf
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Failed to initialize interface
Mon Apr  2 01:23:05 2018 daemon.notice netifd: radio1 (792): cat: can't open '/var/run/wifi-phy1.pid': No such file or directory
Mon Apr  2 01:23:05 2018 daemon.notice netifd: radio1 (792): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process  path ()
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Configuration file: /var/run/hostapd-phy0.conf
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 37: invalid key_mgmt 'WPA-PSK-SHA256'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 40: unknown configuration item 'ieee80211w'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 41: unknown configuration item 'group_mgmt_cipher'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 59: invalid key_mgmt 'WPA-PSK-SHA256'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 62: unknown configuration item 'ieee80211w'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 63: unknown configuration item 'group_mgmt_cipher'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 83: invalid key_mgmt 'WPA-PSK-SHA256'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 86: unknown configuration item 'ieee80211w'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 87: unknown configuration item 'group_mgmt_cipher'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: 9 errors found in configuration file '/var/run/hostapd-phy0.conf'
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Failed to set up interface with /var/run/hostapd-phy0.conf
Mon Apr  2 01:23:05 2018 daemon.err hostapd: Failed to initialize interface

Mon Apr 2 08:28:06 2018 daemon.err dnsmasq[2173]: cannot read /tmp/adb_list.overall: No such file or directory

You are probably missing the wpad-full package.
You should replace wpad with wpad-full, or reset the wireless configuration.
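
To see at a glance which settings the installed hostapd rejects, the log can be reduced to its distinct complaints. Below is a minimal Python sketch, assuming the syslog format shown in the excerpt above; the function name rejected_items and the regular expression are illustrative only and are not part of OpenWrt or hostapd.

```python
import re

# Matches hostapd complaints such as:
#   Line 34: invalid key_mgmt 'WPA-PSK-SHA256'
#   Line 37: unknown configuration item 'ieee80211w'
# The pattern is an assumption based on the log excerpt in this thread.
COMPLAINT = re.compile(
    r"hostapd: Line \d+: (?:invalid|unknown configuration item) (.+)$"
)


def rejected_items(log_lines):
    """Return the distinct rejected settings, in order of first appearance."""
    seen = []
    for line in log_lines:
        match = COMPLAINT.search(line.strip())
        if match is None:
            continue  # not a per-line configuration complaint
        item = match.group(1).replace("'", "")
        if item not in seen:
            seen.append(item)
    return seen


# A few lines copied from the log in this thread.
sample = [
    "Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 34: invalid key_mgmt 'WPA-PSK-SHA256'",
    "Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 37: unknown configuration item 'ieee80211w'",
    "Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 38: unknown configuration item 'group_mgmt_cipher'",
    "Mon Apr  2 01:23:05 2018 daemon.err hostapd: Line 58: invalid key_mgmt 'WPA-PSK-SHA256'",
    "Mon Apr  2 01:23:05 2018 daemon.err hostapd: Failed to initialize interface",
]

print(rejected_items(sample))
```

On these lines it reports three distinct items: key_mgmt WPA-PSK-SHA256, ieee80211w and group_mgmt_cipher. As far as I can tell, all three relate to 802.11w management frame protection, which would be consistent with the installed hostapd build lacking that feature while the preserved wireless config still asks for it.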

I think this is because I used a workspace with the default master configuration, which likely does not include wpad or luci. I wanted a clean test with a single change only: the low-latency kernel. I will now apply your suggestion to my primary workspace and do a full rebuild.
Having said that, now that the image actually boots, will you be committing the change you suggested to master?

Thanks for the suggestion. It has now been running for 24 hours with no issues. Is there anything I can do to help get this into master soon? More testing?