Onhub TP-LINK TGR1900 future support?

@dadogroove, Great, dual core is working now after upgrading using your asus sysupgrade non-CT file (openwrt-ipq806x-chromium-asus_onhub-squashfs-sysupgrade.bin)

Are there any outstanding issues with this latest image?

Thanks again.

Thanks for the extra details. This highlights an extra problem I forgot: I didn't add kmod-ramoops to DEVICE_PACKAGES, so it doesn't get included by default. (I've enabled it for my own local builds.) If you are feeling adventurous, you could enable kmod-ramoops for your own builds too.

Separately, I noticed another problem: the kmod-ramoops package will log panics, but it won't log kernel messages for other occasions, like clean reboots, or the kernel messages leading up to a hardware watchdog event. I can fix that by enabling CONFIG_PSTORE_CONSOLE=y in the kernel -- I might send such a patch, to improve the ramoops logging in the future.

In the meantime, I've already been running 1 OnHub with these logging fixes, and it rebooted with no explanation recently. Unfortunately, that log shows nothing useful in the ramoops dump. This suggests the kernel didn't print anything useful, but there was likely a hardware watchdog event or similar that caused the reboot.

So, that doesn't give me very many leads at the moment.

One other data point: I have 3 OnHubs running (1 ASUS and 2 TP-Link), and only 1 of them (my TP-Link test device) has rebooted like this. The other 2 have been running without issue for ~18 days. There are two main differences between the good and bad:

  1. The good ones are running an image based off commit 895f38ca1efe. The bad one is running off commit 7396263680b9.
  2. The good ones are actively running a mesh network that I occasionally use. The bad one is just sitting idle on my desk most of the time, with no wireless active.

I'd guess that difference #2 (usage pattern) is more relevant than #1, since the ipq806x-related changes between the two are minimal. But I suppose there's always room for regression in there. For one, there are some 5.15.x kernel bumps in there.

So, still no great leads. It might help if others could get kmod-ramoops + CONFIG_PSTORE_CONSOLE=y builds running, and provide the contents of /sys/fs/pstore/ if/when there are failures. I pushed my latest work to my branch again. I don't expect any different result, but it could be good data anyway.
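For anyone willing to gather that data, here is a minimal sketch of saving the dumps after a reboot. It assumes kmod-ramoops is loaded and pstore is mounted at /sys/fs/pstore; the function name `collect_pstore` and the default output directory are made up for illustration. It only copies the records, so they stay in pstore until removed by hand.

```shell
#!/bin/sh
# Hypothetical helper: archive whatever is in the pstore directory.
# Arguments (both optional):
#   $1  pstore mount point (default: /sys/fs/pstore)
#   $2  directory to write the archive into (default: /tmp)
collect_pstore() {
    src="${1:-/sys/fs/pstore}"
    dest="${2:-/tmp}"

    # Nothing to do if the directory is missing or empty.
    if [ ! -d "$src" ] || [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "no pstore records in $src" >&2
        return 1
    fi

    out="$dest/pstore-$(date +%Y%m%d-%H%M%S).tar.gz"
    # Copy only; print the archive path on success.
    tar -czf "$out" -C "$src" . && echo "$out"
}

# Example, run on the router after an unexpected reboot:
#   collect_pstore            # writes /tmp/pstore-<timestamp>.tar.gz
```

The resulting archive can be copied off the device with scp and attached to a post.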

No problem. Thanks for letting me know that the ASUS image works. I am building new ones based off the recent ramoops additions from @bnorris. Once they are completed and I have tested them on my TP-Link version, I will upload them to the same place. Others above have mentioned issues with reboots. I only encountered that once, when the device was under heavy load.

I am wondering if that could be related to overloading the CPUs. These devices have additional processing cores that OpenWrt does not use but the stock software does. They are NSS cores, which take load off the two main CPUs. This can be seen in the OEM bootlog on the wiki page. They are common to the ipq806x processors.

I do not use mesh nor is this my main router. It is just a test device behind my main router.

Just saw that these devices are now in the master repo. Should be able to install the base sysupgrade and add packages as needed. I haven't tried that yet. Since these are master builds, you will have to add luci via opkg as it is not included.

https://downloads.openwrt.org/snapshots/targets/ipq806x/chromium/

Is there a working method to get serial access to these? I am attempting to make a build that has the NSS cores active. I finally have an image that seems to boot (the multicolored ring tells me OpenWrt should be running). The issue is that the network ports are not coming up and wifi is disabled by default, so I can't get in that way. If I could see the log, I might be able to see what needs adjusting in the dts files to get it functional.

I can get back to a working openwrt by doing the initial install process again after a recovery.

First time posting here. I just wanted to say thanks to those of you who have contributed to this effort. I moved on from Google WiFi last year to TP-Link Omadas because I needed a router with VPN enabled, but I still had 4 of these OnHubs collecting dust in the closet. Now I have repurposed one of them for use as a DNS server via the OpenWRT built in dnsmasq, and I plan to run various docker containers on the others.

So thanks to @bnorris @dadogroove and anyone else who has contributed here. Very cool stuff!

While I'm not familiar with the OnHub hardware, it isn't too difficult to hack in wireless support if you're building from source, be it by supplying a full config via files/ or via uci-defaults firstboot scripts. Obviously a no-go for normal operations, but this can be very valuable for development builds.
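A minimal sketch of the uci-defaults route, for development builds only. It assumes the image is built from source with a files/ overlay, and that the first radio and its interface section are named `radio0` and `default_radio0` (the names OpenWrt usually generates, but check your own /etc/config/wireless). The helper name `write_wifi_defaults`, the script name `99-dev-wifi`, the SSID and the key are all placeholders.

```shell
#!/bin/sh
# Hypothetical helper: drop a first-boot script into the build tree's
# files/ overlay so a development image comes up with Wi-Fi enabled.
#   $1  root of the files/ overlay (default: ./files)
write_wifi_defaults() {
    overlay="${1:-./files}"
    mkdir -p "$overlay/etc/uci-defaults"

    # Quoted heredoc delimiter: nothing below is expanded at build time.
    cat > "$overlay/etc/uci-defaults/99-dev-wifi" <<'EOF'
#!/bin/sh
# Assumption: /etc/config/wireless already has radio0/default_radio0.
# If not, exit non-zero; the script should then be kept and retried
# on the next boot instead of being deleted.
uci -q get wireless.radio0 >/dev/null || exit 1

uci set wireless.radio0.disabled='0'
uci set wireless.default_radio0.ssid='onhub-dev'           # placeholder
uci set wireless.default_radio0.encryption='psk2'
uci set wireless.default_radio0.key='change-me-please'     # placeholder
uci commit wireless
exit 0
EOF
}

# Example, run from the top of the OpenWrt source tree before `make`:
#   write_wifi_defaults ./files
```

If the wireless config turns out not to exist yet when the script first runs, shipping a complete /etc/config/wireless through files/ is the other option mentioned above.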

Cool. Thanks for the tip. I'll give it a try and see if I can get in.

I had some installation troubles with the router not being able to get to the recovery routine (purple flashing lights) - I've installed the Google recovery, managed to get the router working with the stock OS, and still couldn't get the lights to work. I've left the router without power for about an hour and then it worked??! Another nuisance is that "white" is something between cream and orange, the next set of lights is orange to red and the final is red.

All 4 routers are OpenWRT powered now and I'll be investigating soon (already looking at the BT audio).

The NSS image does boot and looks like the NSS driver loads but is not engaged.

[    4.965318] pstore: Registered ramoops as persistent store backend
[    4.969664] ramoops: using 0x100000@0x7ff00000, ecc: 0
[    4.978593] **********************************************************
[    4.980936] * Driver    :NSS GMAC Driver - RTL v(3.72a)
[    4.987368] * Version   :1.0
[    4.992577] * Copyright :Copyright (c) 2013-2018 The Linux Foundation. All rights reserved.
[    4.995619] **********************************************************
[    5.004582] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_FS(2)(0x3cf8): 0x8
[    5.010297] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_SRC_CTL(2)(0x3ce0): 0x2
[    5.023179] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_SRC0_MD(2)(0x3ce4): 0x7f0000
[    5.036224] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_SRC1_MD(2)(0x3ce8): 0x7f0000
[    5.050127] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_SRC0_NS(2)(0x3cec): 0x142
[    5.063746] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_SRC1_NS(2)(0x3cf0): 0x142
[    5.077024] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + CLK_HALT_NSSFAB0_NSSFAB1_STATEA(0x3c20): 0x5b00
[    5.090395] (unnamed net_device) (uninitialized): nss_gmac_clk_ctl_dev_init: ctx->clk_ctl_base(0xf0898000) + GMAC_COREn_CLK_CTL(2)(0x3cf4): 0x50
[    5.104453] (unnamed net_device) (uninitialized): nss_gmac_dev_init: nss_base(0xf0fb0000) + NSS_GMACn_CTL(2)(0x38): 0x80c0c
[    5.117385] (unnamed net_device) (uninitialized): nss_gmac_dev_init: nss_base(0xf0fb0000) + NSS_ETH_CLK_DIV0(0xc): 0x0
[    5.128259] (unnamed net_device) (uninitialized): nss_gmac_qsgmii_dev_init: QSGMII_PHY_SGMII_1_CTL(0x13c) - 0xc09c408f
[    5.139004] (unnamed net_device) (uninitialized): nss_gmac_qsgmii_dev_init: NSS_QSGMII_CLK_CTL(0x2c) - 0x0
[    5.149688] (unnamed net_device) (uninitialized): SGMII Specific Init for GMAC2 Done!
[    5.159326] (unnamed net_device) (uninitialized): ioremap OK. Size 0x4000. reg_base 0x37400000. mac_base 0x(ptrval).
[    5.167279] (unnamed net_device) (uninitialized): cannot find platform device from mdio node
[    5.179279] nss-gmac: probe of 37400000.ethernet failed with error -5
[    5.258085] nss_driver - fw of size 544712  bytes copied to load addr: 40000000, nss_id : 0
[    5.258868] nss_driver - Turbo Support 1
[    5.265355] Supported Frequencies - 
[    5.265361] 733Mhz 
[    5.269416] 733Mhz 
[    5.273084] 733Mhz 

There is no ethernet either. The wifi is working by passing config files when building the image (thanks @slh). I am able to connect and see the log but have not been successful in getting the lan up.

[    2.216631] switch0: Atheros AR8337 rev. 2 switch registered on gpio-0
[    3.037560] ipq806x-gmac-dwmac 37000000.ethernet: IRQ eth_wake_irq not found
[    3.037603] ipq806x-gmac-dwmac 37000000.ethernet: IRQ eth_lpi not found
[    3.043965] ipq806x-gmac-dwmac 37000000.ethernet: PTP uses main clock
[    3.050044] ipq806x-gmac-dwmac 37000000.ethernet: missing phy mode property
[    3.056711] ipq806x-gmac-dwmac 37000000.ethernet: device tree parsing error
[    3.063475] ipq806x-gmac-dwmac: probe of 37000000.ethernet failed with error -22

That's an interesting theory. I've now been trying to stress-test that by cycling through various CPU frequencies and running a high load (md5sum /dev/zero tasks), but haven't had any luck inducing a crash in ~24 hours so far.
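This is not the exact script used for that test, just a sketch of the same idea for anyone who wants to try to reproduce it. It assumes the cpufreq driver exposes `scaling_available_frequencies` and `scaling_max_freq` in a per-CPU policy directory such as /sys/devices/system/cpu/cpu0/cpufreq; the function names are made up. Capping the maximum frequency while the CPU is fully loaded should hold it at each step in turn.

```shell
#!/bin/sh
# Start N busy tasks (md5sum over /dev/zero never finishes on its own).
# Prints one PID per line so the caller can kill them later.
start_load() {
    n="${1:-2}"
    i=0
    while [ "$i" -lt "$n" ]; do
        md5sum /dev/zero >/dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
}

# Step the frequency cap through every advertised frequency.
#   $1  cpufreq policy directory
#   $2  number of passes over the list (default: 1)
#   $3  seconds to hold each step (default: 10)
cycle_max_freq() {
    dir="$1"
    rounds="${2:-1}"
    hold="${3:-10}"
    r=0
    while [ "$r" -lt "$rounds" ]; do
        for f in $(cat "$dir/scaling_available_frequencies"); do
            echo "$f" > "$dir/scaling_max_freq"
            sleep "$hold"
        done
        r=$((r + 1))
    done
}

# Example (as root, on the router):
#   pids=$(start_load 2)
#   cycle_max_freq /sys/devices/system/cpu/cpu0/cpufreq 100 30
#   kill $pids
```

On a dual-core device each core may have its own policy directory, in which case the cycle would need to be run for cpu1 as well.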

Regardless, NSS support might be nice. Mind starting a separate thread? I'd join you there in case there's anything I can co-debug on. And publishing your WIP would be helpful too. But I doubt most folks looking for OnHub support are interested in the details, despite this being a "for developers" forum.

By the way, a different OnHub went down the other day, and I see it got a proper kernel panic, appended below. I'm not sure yet, but the crash looks pretty random, and seems like it could be explained by CPU and/or memory corruption rather than a typical coding error.

Crash log
<1>[1767446.939350] 8<--- cut here ---
<1>[1767446.939394] Unable to handle kernel paging request at virtual address 7f7ec5fc
<1>[1767446.941664] pgd = 8dc96e3d
<1>[1767446.949032] [7f7ec5fc] *pgd=00000000
<0>[1767446.951904] Internal error: Oops: 5 [#1] SMP ARM
<4>[1767446.955460] Modules linked in: pppoe ppp_async iptable_nat ath10k_pci ath10k_core ath xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD snd_soc_lpass_ipq806x snd_soc_lpass_cpu pppox ppp_generic nf_nat nf_flow_table nf_conntrack mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG vpd_sysfs spidev snd_soc_storm snd_soc_max98357a snd_soc_lpass_platform slhc nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 memconsole_coreboot memconsole lcc_ipq806x iptable_mangle iptable_filter ip_tables crc_ccitt coreboot_table compat snd_soc_core ledtrig_usbport ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 oid_registry snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore input_core seqiv cmac leds_gpio ohci_platform ohci_hcd ahci fsl_mph_dr_of ehci_platform ehci_fsl tpm_i2c_infineon ahci_platform libahci_platform libahci libata
<4>[1767446.956248]  ehci_hcd ramoops reed_solomon pstore gpio_button_hotplug f2fs ext4 mbcache jbd2 tpm crc32c_generic crc32_generic
<4>[1767447.047660] CPU: 0 PID: 32678 Comm: sh Tainted: G        W         5.15.86 #0
<4>[1767447.058935] Hardware name: Generic DT based system
<4>[1767447.066226] PC is at __pagevec_lru_add+0x1e0/0x368
<4>[1767447.071431] LR is at __pagevec_lru_add+0x1d8/0x368
<4>[1767447.076379] pc : [<c0483744>]    lr : [<c048373c>]    psr: 20000093
<4>[1767447.081330] sp : c37b5d30  ip : c37b5d30  fp : c37b5d7c
<4>[1767447.087664] r10: c1422800  r9 : c37b4000  r8 : 00000001
<4>[1767447.093046] r7 : eff94a5c  r6 : 0000000a  r5 : 00000001  r4 : eff94be4
<4>[1767447.098430] r3 : 40080034  r2 : 40080034  r1 : eff94be4  r0 : 7f7ec5a0
<4>[1767447.105202] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
<4>[1767447.111889] Control: 10c5787d  Table: 4775006a  DAC: 00000051
<1>[1767447.119262] Register r0 information: non-paged memory
<1>[1767447.124903] Register r1 information: non-slab/vmalloc memory
<1>[1767447.130112] Register r2 information: non-paged memory
<1>[1767447.136014] Register r3 information: non-paged memory
<1>[1767447.141136] Register r4 information: non-slab/vmalloc memory
<1>[1767447.146347] Registe
<1>[1767447.152248] Register r6 information: non-paged memory
<1>[1767447.157370] Register r7 information: non-slab/vmalloc memory
<1>[1767447.162578] Register r8 information: non-paged memory
<1>[1767447.168481] Register r9 information: non-slab/vmalloc memory
<1>[1767447.173602] Register r10 information: slab kmalloc-2k start c1422800 pointer offset 0 size 2048
<1>[1767447.179510] Register r11 information: non-slab/vmalloc memory
<1>[1767447.188447] Register r12 information: non-slab/vmalloc memory
<0>[1767447.194175] Process sh (pid: 32678, stack limit = 0xae9ed0a0)
<0>[1767447.200081] Stack: (0xc37b5d30 to 0xc37b6000)
<0>[1767447.205984] 5d20:                                     00000018 c0f8009c c0fda740 c0fda740
<0>[1767447.210506] 5d40: c0fdde40 ef70c13c c37b5d6c 60000013 c05edbd4 c0f05448 00000000 c3a45800
<0>[1767447.218839] 5d60: c37acc00 c3729da0 c37b6600 c3729900 c37b5d9c c37b5d80 c0483b58 c0483570
<0>[1767447.227173] 5d80: ffffe000 c3b09780 c3a45800 c37acc00 c37b5db4 c37b5da0 c0483c18 c0483ab4
<0>[1767447.235505] 5da0: c3b09780 c3b09780 c37b5e24 c37b5db8 c04c080c c0483bec c03550d4 c0318020
<0>[1767447.243839] 5dc0: c37b5e3c c37b5dd0 c0318020 c0b21088 c23c2e98 c03550d4 000004a4 c031efdc
<0>[1767447.252173] 5de0: c37b5e04 c0be8ee0 c37b5e24 c37b5df8 c03550d4 c0354f00 00000000 c3a45800
<0>[1767447.260505] 5e00: 00000000 c3a45800 c37acc00 c3a45800 00000000 c3a45800 c37b5e3c c37b5e28
<0>[1767447.268839] 5e20: c031efe4 c04c07d0 00000000 c3729900 c37b5e74 c37b5e40 c0508f38 c031ef80
<0>[1767447.277173] 5e40: c37b5e74 c3a45800 c056e1dc 00000001 00000001 00000000 c37b6600 c37b665c
<0>[1767447.285505] 5e60: c7083e40 c1df33c0 c37b5f0c c37b5e78 c056f490 c0508ab8 c04fe834 c0555fa0
<0>[1767447.293839] 5e80: 00000100 c0398784 c3729900 c04e7f70 c37b665c 00000100 00000001 00000000
<0>[1767447.302171] 5ea0: 00000000 00000001 00000000 00000000 c1df3600 c3b0a180 00000001 c3b8c900
<0>[1767447.310504] 5ec0: 00000034 00000000 00000000 00000000 00000000 00000000 00000006 b3290597
<0>[1767447.318839] 5ee0: ffffff9c c0fa0fb0 fffffff8 c101dae0 c37b6600 c0f981f4 c37b4000 ffffe000
<0>[1767447.327171] 5f00: c37b5f5c c37b5f10 c0507e3c c056f2d8 00007fa6 00007fa6 00000000 00000006
<0>[1767447.335504] 5f20: c37b665c 00000002 00000100 00000000 c37b5f5c 00000000 c37b6600 c1d7a000
<0>[1767447.343838] 5f40: ffffff9c b6fd2b68 b6fd2b84 0000000b c37b5f84 c37b5f60 c05086e0 c0507b90
<0>[1767447.352173] 5f60: b6fd2b68 b6fd2b84 b6fd2b84 0000000b c03002a4 c37b4000 c37b5fa4 c37b5f88
<0>[1767447.360505] 5f80: c05097c8 c0508538 00000000 00031b20 b6fd2b9c b6fd2b68 00000000 c37b5fa8
<0>[1767447.368839] 5fa0: c0300060 c0509794 b6fd2b9c b6fd2b68 b6fd2b9c b6fd2b68 b6fd2b84 b6fe1020
<0>[1767447.377171] 5fc0: b6fd2b9c b6fd2b68 b6fd2b84 0000000b 00072820 000905f8 000906fc 000905f8
<0>[1767447.385506] 5fe0: 0008fb5c bea0555c 0002d7b8 b6f8fb04 60000010 b6fd2b9c 00000000 00000000
<0>[1767447.393832] Backtrace: 
<0>[1767447.402159] [<c0483564>] (__pagevec_lru_add) from [<c0483b58>] (lru_add_drain_cpu+0xb0/0x138)
<0>[1767447.404954]  r10:c3729900 r9:c37b6600 r8:c3729da0 r7:c37acc00 r6:c3a45800 r5:00000000
<0>[1767447.413452]  r4:c0f05448
<0>[1767447.421428] [<c0483aa8>] (lru_add_drain_cpu) from [<c0483c18>] (lru_add_drain+0x38/0x54)
<0>[1767447.424220]  r7:c37acc00 r6:c3a45800 r5:c3b09780 r4:ffffe000
<0>[1767447.432456] [<c0483be0>] (lru_add_drain) from [<c04c080c>] (exit_mmap+0x48/0x204)
<0>[1767447.438278]  r5:c3b09780 r4:c3b09780
<0>[1767447.445821] [<c04c07c4>] (exit_mmap) from [<c031efe4>] (mmput+0x70/0x168)
<0>[1767447.449652]  r6:c3a45800 r5:00000000 r4:c3a45800
<0>[1767447.456501] [<c031ef74>] (mmput) from [<c0508f38>] (begin_new_exec+0x48c/0xcdc)
<0>[1767447.461369]  r5:c3729900 r4:00000000
<0>[1767447.468912] [<c0508aac>] (begin_new_exec) from [<c056f490>] (load_elf_binary+0x1c4/0x1534)
<0>[1767447.472573]  r10:c1df33c0 r9:c7083e40 r8:c37b665c r7:c37b6600 r6:00000000 r5:00000001
<0>[1767447.481159]  r4:00000001
<0>[1767447.488876] [<c056f2cc>] (load_elf_binary) from [<c0507e3c>] (bprm_execve+0x2b8/0x6b4)
<0>[1767447.491670]  r10:ffffe000 r9:c37b4000 r8:c0f981f4 r7:c37b6600 r6:c101dae0 r5:fffffff8
<0>[1767447.499909]  r4:c0fa0fb0
<0>[1767447.507625] [<c0507b84>] (bprm_execve) from [<c05086e0>] (do_execveat_common+0x1b4/0x218)
<0>[1767447.510420]  r10:0000000b r9:b6fd2b84 r8:b6fd2b68 r7:ffffff9c r6:c1d7a000 r5:c37b6600
<0>[1767447.518660]  r4:00000000
<0>[1767447.526637] [<c050852c>] (do_execveat_common) from [<c05097c8>] (sys_execve+0x40/0x48)
<0>[1767447.529429]  r9:c37b4000 r8:c03002a4 r7:0000000b r6:b6fd2b84 r5:b6fd2b84 r4:b6fd2b68
<0>[1767447.537667] [<c0509788>] (sys_execve) from [<c0300060>] (ret_fast_syscall+0x0/0x48)
<0>[1767447.545393] Exception stack(0xc37b5fa8 to 0xc37b5ff0)
<0>[1767447.553295] 5fa0:                   b6fd2b9c b6fd2b68 b6fd2b9c b6fd2b68 b6fd2b84 b6fe1020
<0>[1767447.558336] 5fc0: b6fd2b9c b6fd2b68 b6fd2b84 0000000b 00072820 000905f8 000906fc 000905f8
<0>[1767447.566665] 5fe0: 0008fb5c bea0555c 0002d7b8 b6f8fb04
<0>[1767447.574994]  r5:b6fd2b68 r4:b6fd2b9c
<0>[1767447.580205] Code: e1a00004 eb005977 e3500000 0a00001b (e590305c) 
<4>[1767447.584025] ---[ end trace 778a7acdd7ca6b3a ]---

I'm not really a developer. I'm just cloning the master repo and adding some files from other repos to see what the build system spits out and see if it works. I have another router (Askey RT4230W) that is an ipq8065 device and got interested in the NSS through that. Not really sure how to create a repo with the different source files needed to build what I've done so far. My issue is I don't really know the code involved in the dts and dtsi files to really know what changes need to be made. It's really just copy / paste for me. I feel that I know just enough to be dangerous but luckily the OnHub is pretty easy to recover, so far.

As far as the CPU issue goes, I just noticed in htop that when running a speed test through the OnHub, cpu0 spikes to 100% until the test is done. That does not occur on the RT4230W, which has the NSS cores working: both CPUs usually stay below 10%. I realize that a speed test is not a true stress test of anything though.

If I can figure out how to put something up on github, I'll let you know. For now, I'll just keep hacking along.

With the latest commits I built the image myself, but every time I reboot the device, all settings are reverted to default. That does not happen with the snapshot build. Any idea? Is it because of this commit?

Edit: with the commits mentioned above, I can boot from USB, but after doing the 'dd' step I cannot boot directly from eMMC and have to go through recovery. If I revert those commits (fstools commit included), it boots normally from eMMC.

Can someone share their wifi speeds?
I only get around 150 Mbps on 5 GHz.

I got around the same speed, 130-140 Mbps, when testing with iperf3.

After moving the IRQ of the LAN port to CPU 2, I got iperf3 speeds of around 260 Mbps down and 170 Mbps up.
Running iperf3 to the OnHub over cable, I got 1 Gbps down and 200 Mbps up. Is this normal for a wired connection?
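For anyone wanting to repeat this, a sketch of pinning an interrupt to one core. /proc/irq/<n>/smp_affinity takes a hexadecimal CPU bitmask, so assuming "CPU 2" means the second core (index 1), the mask is 2. The helper names are made up, and the IRQ number has to be looked up in /proc/interrupts on your own device.

```shell
#!/bin/sh
# Convert a zero-based CPU index into the hex bitmask smp_affinity expects.
cpu_to_mask() {
    printf '%x\n' $((1 << $1))
}

# Pin one IRQ to one CPU.
#   $1  IRQ number (first column of /proc/interrupts)
#   $2  zero-based CPU index
#   $3  procfs root, overridable for dry runs (default: /proc)
set_irq_affinity() {
    irq="$1"
    cpu="$2"
    root="${3:-/proc}"
    cpu_to_mask "$cpu" > "$root/irq/$irq/smp_affinity"
}

# Example (run as root; 31 is a made-up IRQ number):
#   grep eth /proc/interrupts
#   set_irq_affinity 31 1      # second core, mask 2
```

The setting does not persist across reboots, so it would need to go into a startup script to stick.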

Do you have a link where I can access the recovery image? I have exactly the same issue that you have: the LED turns blue for a few seconds and the image is not loaded from the USB drive. I have two successfully installed OpenWrt OnHubs and am facing this problem with the third. I have a recovery image called chromeos_9334.41.3_whirlwind_recovery_stable-channel_mp.bin, but flashing it to a USB drive split the drive into 6 or 7 partitions, and I am hesitant to use it for fear that it will brick the OnHub.

That's fine. This is how my flash drive, which I have successfully used at least three times now, is partitioned:

❯❯❯ lsblk -o NAME,LABEL,FSTYPE,SIZE /dev/sdd
NAME    LABEL      FSTYPE  SIZE
sdd                       29.7G
├─sdd1             ext2      4M
├─sdd2                      16M
├─sdd3  ROOT-A     ext4    800M
├─sdd4                      16M
├─sdd5                     800M
├─sdd6                     512B
├─sdd7                     512B
├─sdd8  OEM        ext4     16M
├─sdd9                     512B
├─sdd10                    512B
├─sdd11                      8M
└─sdd12 EFI-SYSTEM vfat     16M

ok. That is how mine is partitioned too. Will try it out now. Thank you!

It looks like applying the snapshot sysupgrade via the WebUI wiped the configuration on a unit running one of the initial TP-Link builds. It's a spare unit, and I did a backup beforehand, so I can just restore that, but is there something I missed that would allow the configuration to persist across the sysupgrade?

Early versions of my patches had a different, incompatible storage layout from what was eventually committed to OpenWrt. So a one-time loss of configuration is expected if you were using an early image. But that shouldn't happen on future upgrades.

I'm sure the answer is no, but is there any chance that we would be able to make use of the Zigbee radio in the OnHub? Are there any drivers for the Silicon Labs EM3581?