Optimized build for IPQ40xx devices

Well, I don't think your device is dying. Something has changed in the kernel, because @dzid26 also reported flash-related problems right after upgrading. However, it looks like a one-time problem. Needless to say, I do not touch the kernel source code (except to remove the jitter, which is unrelated to the flash subsystem), so maybe a problematic patch from OpenWrt or some upstream change to the source code is causing the problems.

There are 2 possible reasons for the observed behavior: either you need the "real" ath10k driver (I changed it to the "small buffers" version to reduce memory usage) or the calibration is not working correctly and you need to replace it.

The idea behind providing the calibration facility is to let users test and use whichever calibration fits them best.

I have played around with the calibration files and, similar to your results, the pwr calibration seems to be the overall winner, though ah_Y9803 looks best for 2.4 GHz. I could not capture server-side metrics for 2.4 GHz because the device is in STA mode and I do not have access to the upstream device. hw_1_Y9803 is marked as unstable because the Wi-Fi would not stay up: I would connect and drop almost right away.

The client is a 2019 MacBook Pro, about 4 meters and two drywall walls away. I am located downtown in a major city, with hundreds of devices around.

I will see if the pwr calibration resolves my issues. If not, to switch to the "real" driver, all I would need to do is remove kmod-ath10k-ct-smallbuffers and install kmod-ath10k-ct?

5.1 GHz, server side
Calibration   Signal (dBm)   Noise (dBm)   SNR (dB)
ah_Y9803      -75            -107          32
au_Y9803      -76            -107          31
default       -76            -107          31
fcc_Y9803     -76            -107          31
hw_1_Y9803    unstable       -             -
ic_Y9803      -74            -105          31
neg_pwr       -75            -107          32
pwr           -74            -107          33

5.1 GHz, client side
Calibration   Signal (dBm)   Noise (dBm)   SNR (dB)
ah_Y9803      -60            -91           31
au_Y9803      -60            -91           31
default       -60            -91           31
fcc_Y9803     -59            -90           31
hw_1_Y9803    unstable       -             -
ic_Y9803      -61            -91           30
neg_pwr       -57            -90           33
pwr           -51            -91           40

2.4 GHz, STA
Calibration   Signal (dBm)   Noise (dBm)   SNR (dB)
ah_Y9803      -38            -97           59
au_Y9803      -62            -98           36
default       -63            -98           35
fcc_Y9803     -62            -96           34
hw_1_Y9803    unstable       -             -
ic_Y9803      -61            -97           36
neg_pwr       -73            -96           23
pwr           -45            -98           53
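To sanity-check tables like these, the SNR column is simply signal minus noise (for example, pwr on the 5.1 GHz client side: -51 - (-91) = 40 dB). Below is a minimal sketch, not the scoring formula used in the shared spreadsheet (which isn't shown here), that computes SNR from "name signal noise" lines and ranks the calibrations. The input format and the function name rank_by_snr are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: rank calibrations by SNR = signal - noise (both in dBm, result in dB).
# Assumed input: one "<calibration> <signal> <noise>" line per calibration.
# Rows whose signal or noise is not an integer (e.g. "unstable") are skipped.
rank_by_snr() {
    awk '$2 ~ /^-?[0-9]+$/ && $3 ~ /^-?[0-9]+$/ { print $1, $2 - $3 }' "$@" |
        sort -k2,2nr
}
```

For example, with the 2.4 GHz rows saved as calibration, signal and noise columns in a (hypothetical) file sta-24.txt, rank_by_snr sta-24.txt lists ah_Y9803 (59 dB) first and pwr (53 dB) second.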

Yes, that's the solution.

But currently, I do not provide that driver, so you cannot install it for now. I will be updating my build system to allow the installation of all of the ath10k drivers (ct, ct-smallbuffers and vanilla) and all of the ath10k firmwares (ct and vanilla).

About the wireless performance

I'll use this post to give users a quick overview of, and some guidelines for, the effort to find the correct wireless calibration for this device.

REPORTS ARE NEEDED to move this into OpenWrt's source code. If you can provide wireless reports, you will help all OpenWrt users who own this device.

Background

This device has an Atheros (ath10k) wireless module, which needs several pieces to work: the kernel module (the driver), the firmware that runs on the wireless chip's CPU, the precalibration, which is embedded in the device's ROM (where the MAC address and other device-specific information live), and the calibration, which the driver loads from the filesystem. The part we are researching is the calibration.

It looks like the Linksys firmware can detect where the device is being operated and somehow select the correct calibration, or maybe it's based on some ID in the ROM. Either way, OpenWrt has no way to pick a calibration automatically.

Linksys provides a bunch of calibration files for the 2.4 GHz and 5 GHz wireless modules. Because of how the ath10k driver works, you cannot load them directly in OpenWrt. To partially solve this, I've added a custom script and synthesized all the relevant calibrations into the ath10k format.

This allows users to test the calibrations whenever they feel that their wireless performance is not optimal. Still, one choice has to be made: the default file. This matters because my script and the files I've synthesized do not exist in "vanilla" OpenWrt.

The default file

The driver always loads the default file, so it has to be the best choice for most cases. Users can opt in to my ROM and test all the files if they feel that their device is not working correctly or could be improved, or just for the love of science.

How to do that?

Long story short, you only need to replace one file. Since I've synthesized many files, you can test them one by one. Note that I only provide the "relevant" files. This means that if your device works best with a different file for each radio, you will have to synthesize the combined file yourself. This is why picking a good default is important.

If you are using my ROM, just run testcalibration to list all available calibrations, and then run testcalibration [name], where [name] is the calibration you want to try.

BIG NOTE: This will not be true in future ROM releases, because I will stop providing crafted files and will instead provide all of the original, unmodified Linksys files plus a script to generate the calibration in place. That will completely change the names of the calibrations and the way the testcalibration script is used, but it will allow users to synthesize a custom file that uses the best calibration for both 2.4 and 5 GHz.

I want to get involved

If you want to get involved (as this is currently a work in progress) you can provide your results. Here is what you have to do:

  1. Install this custom ROM or otherwise extract the required files.
  2. Read the attached spreadsheet. It lists the relevant calibration files being tested and their results.
  3. Prepare your environment, considering the following:
  • Ensure that only one client (your testing client) is connected, and only to one of the wireless bands. If you are using the device as an AP, rename the wireless access points so that only you can connect, and only to one of the bands. If your device is an STA, the same applies, except that the client is the Linksys.
  • Ensure that neither device (the Linksys nor the testing client) moves and that your network is as quiet as possible.
  • Fix the wireless channel and transmit power: for 2.4 GHz, set the channel to 1 and the power to 30 dBm; for 5 GHz, set the channel to 36 and the power to 23 dBm.
  • Prepare a table to write down the values you observe. You can use the attached spreadsheet as a template, but note that only the signal and noise values are relevant. Only I have write access to the spreadsheet, and I have already written the formula that calculates the scores.
  4. Run the testcalibration script with every file in the spreadsheet, making sure to note all of the relevant information. After the device reboots with each calibration, you'll need to:
  • Go to LuCI or use the CLI to get the signal and noise of your client (in AP mode) or of your access point (in STA mode); you only need to provide these two values.
  • On your client, use one of the apps that show wireless performance and, as above, write down the signal and noise of the Linksys AP as reported by the client.
  • The values may fluctuate often. Write down the most representative value according to your judgment.
  5. Submit your results in this thread. You can see examples of reports from other users above this post.
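The channel/power setup and the signal/noise readout described above can also be done from the CLI. The sketch below assumes radio0 is the 2.4 GHz radio and radio1 the 5 GHz radio, and that iwinfo's assoclist station lines have the form shown in the comment; pin_radio and parse_assoclist are hypothetical helpers, not part of the ROM. Check your own radio and interface names with uci show wireless and iwinfo before using it.

```shell
#!/bin/sh
# Sketch, assuming radio0 = 2.4 GHz and radio1 = 5 GHz (check "uci show wireless").

# Pin the channel and transmit power (dBm) of one radio, then restart Wi-Fi.
pin_radio() {
    radio=$1 channel=$2 power=$3
    uci set "wireless.$radio.channel=$channel"
    uci set "wireless.$radio.txpower=$power"
    uci commit wireless
    wifi
}

# Turn "iwinfo <iface> assoclist" output into "<mac> <signal> <noise>" lines.
# Assumed station-line format (other lines, e.g. RX/TX rates, are ignored):
#   AA:BB:CC:DD:EE:FF  -51 dBm / -91 dBm (SNR 40)  10 ms ago
parse_assoclist() {
    awk 'length($1) == 17 && $1 ~ /^[0-9A-Fa-f:]+$/ && $3 == "dBm" && $4 == "/" {
        print $1, $2, $5
    }'
}
```

For example, pin_radio radio0 1 30 and pin_radio radio1 36 23, then, once the test client is connected, iwinfo wlan1 assoclist | parse_assoclist (the interface name is an assumption). In STA mode the single entry should be the upstream AP.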

How is everything going?

The results are compiled in the following spreadsheet: Google Spreadsheets.


Good afternoon, dear @NoTengoBattery
I have a couple of questions for you.
Tell me, please, what could error 255 be related to when I try to update installed packages?
For example, when updating the package "base-files", I get the error below:

umount: tmpfs busy - remounted read-only
umount: can't remount tmpfs read-only
umount: can't remount proc read-only
Collected errors:
 * copy_file: unable to open `/etc/group-opkg.backup': Read-only file system.
 * file_copy: Failed to copy file /etc/group to /etc/group-opkg.backup.
 * backup_make_backup: Failed to copy /etc/group to /etc/group-opkg.backup
 * opkg_install_cmd: Cannot install package base-files.
 * pkg_write_filelist: Failed to open //usr/lib/opkg/info/base-files.list: Read-only file system.

I also need your help: how can I use an external flash drive as the overlay? I configured everything correctly during the setup, but after rebooting, the overlay is still mounted from /dev/ubi15_0.

Hi, dear @starychenko
Thanks for your interest!

Let me make something clear: you should not update any package that you have not installed manually yourself. Packages in this build contain customizations that must not be overwritten by the packages from OpenWrt, because doing so will prevent the device from running.

The core packages, such as the kernel, all kernel modules (i.e. drivers), base-files and many others, must not be updated under any circumstances. You are getting errors because the base-files package is trying to make some "dangerous" changes and the system protects itself by remounting the filesystems read-only.

The external overlay can be changed in LuCI, but the change will not take effect because the overlay setting is embedded in the image. Currently there is no way to set it up dynamically, and since I've chosen to use the space for packages, the only option if you really need more space is to build your own image.

I'm researching a way to make this an opt-in option, but I have not found a solution yet. Sorry for the inconvenience.


The solution

If your configuration is correct but not taking effect, you can do this: mount the partition that OpenWrt searches for external overlay configuration and copy your file there.

  • Create a mount point: mkdir -p /tmp/ubifs_cfg
  • Mount the partition that OpenWrt searches for external overlays: mount /dev/ubi0_1 /tmp/ubifs_cfg
  • Recreate the path to the configuration file: mkdir -p /tmp/ubifs_cfg/etc/config
  • Copy the file that already contains your overlay: cp -a /etc/config/fstab /tmp/ubifs_cfg/etc/config/fstab
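The four steps above can be wrapped in one shell function, sketched here with the device and mount point from this post as defaults. The function name is hypothetical, and the final sync/umount is an addition not in the original steps, so that the copied file is flushed before you reboot.

```shell
#!/bin/sh
# Sketch: copy the current overlay fstab into the partition OpenWrt checks
# for external-overlay configuration. Defaults are the values from this post.
copy_fstab_to_cfg() {
    dev=${1:-/dev/ubi0_1}
    mnt=${2:-/tmp/ubifs_cfg}
    mkdir -p "$mnt" || return 1
    mount "$dev" "$mnt" || return 1
    # Recreate etc/config inside the partition and copy the file, keeping attributes.
    mkdir -p "$mnt/etc/config" && cp -a /etc/config/fstab "$mnt/etc/config/fstab"
    rc=$?
    # Flush and unmount before rebooting (an addition to the original steps).
    sync
    umount "$mnt"
    return "$rc"
}
```

Run copy_fstab_to_cfg with no arguments as root on the router, then reboot.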

Then reboot. It should work, but this needs to be redone every time it stops working (e.g. after an update). Ensure that your file only contains the overlay mount point, because otherwise the extra mount points may stop working unexpectedly.

I tested and confirmed it's working with a file that contains:

config global
        option anon_swap '0'
        option anon_mount '0'
        option auto_swap '1'
        option auto_mount '1'
        option delay_root '5'
        option check_fs '1'

config mount
        option enabled '1'
        option enabled_fsck '1'
        option options 'lazytime'
        option uuid '1f718a06-61d9-458c-a9fa-1d4e3b1a53fe'
        option target '/overlay'

Obviously, your mount point and overall configuration will look different; don't take mine as a template, but rather as an example.

Upon reboot, the /dev/ubi15_0 entry will be present, but you can safely delete it. Do not add the external overlay again, because it will not work anyway; if you need to change it, modify the file by repeating the steps above.


@NoTengoBattery
Do you know why the TX/RX statistics counters reset at 4G on the WAN interface but continue past it on the LAN?

Thanks

@NoTengoBattery

Thank you very much for the tip.
I will definitely use your method.

Tell me, please, could you consider replacing the standard DNS over TLS with the more anonymous DNS over HTTPS?

The fact is that when we use DNS over TLS, the Internet service provider can see, using a Deep Packet Inspection (DPI) traffic-filtering system, that we are encrypting our DNS traffic.

DNS over HTTPS uses port 443 and wraps the queries in ordinary HTTPS requests, while the traffic between us and the DNS provider that processes the requests remains encrypted. With this technology, our Internet service provider cannot see that we are using it, and a DPI system is powerless against it. This in turn makes browsing the Internet safer and more anonymous.

@sammo I think it's a bug, maybe in LuCI. A 32-bit counter wraps around to zero once it passes 2^32 - 1, that is, at 2^32 bytes, which is exactly 4 GiB. I don't think it's a bug in the kernel (the kernel is what keeps track of the counters). You can see the counters tracked by the kernel in /proc/net/dev. If those counters are fine, check whether ifconfig also shows them correctly. The first one to show the error is the culprit, and you should file a bug against it (I really doubt the kernel is buggy).
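One way to read the kernel's raw counters and account for the 32-bit wrap is sketched below. It assumes the usual /proc/net/dev layout (eight receive fields followed by eight transmit fields, bytes first in each group), that the shell does 64-bit arithmetic, and that the counter wraps at most once between two samples; read_counters and delta32 are hypothetical names.

```shell
#!/bin/sh
# Print "<rx_bytes> <tx_bytes>" for interface $1, read from /proc/net/dev
# (or from the file given as $2). Assumed layout after the "iface:" prefix:
# 8 receive fields then 8 transmit fields, with bytes first in each group.
read_counters() {
    # Replace the first colon so "eth0:123" cannot glue name and counter together.
    sed 's/:/ /' "${2:-/proc/net/dev}" | awk -v ifc="$1" '$1 == ifc { print $2, $10 }'
}

# Bytes transferred between two samples of a counter that wraps at 2^32.
# Assumes 64-bit shell arithmetic and at most one wrap between samples.
delta32() {
    echo $(( ($2 - $1 + 4294967296) % 4294967296 ))
}
```

For example, a counter sampled at 4294967000 and then at 200 has moved 496 bytes, which is what delta32 4294967000 200 prints. Running read_counters with your WAN interface name a few seconds apart shows whether the raw values drop back toward zero.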


@starychenko I understand your request, but let's think about this:

  1. DNS over HTTPS needs a working DNS first to resolve the name of the service provider, and that lookup probably will not be validated by DNSSEC nor encrypted (because you cannot use encryption before resolving the hostname).
  2. You don't need DPI to know somebody is using DNS over TLS, because its connections go to a well-known port (853).
  3. Since DNS over TLS uses TLS, the same technology behind HTTPS, it cannot be tampered with, spoofed or decrypted.
  4. DNS over TLS is already slow on a low-power CPU such as this router's, and DNS over HTTPS is even slower.
  5. Service providers can tamper with the initial unencrypted DNS request used to resolve the hostname, but they can also directly block the endpoint IP (making Tor the only privacy option).

For most of these reasons, and given that DNSSEC is also enforced, DNS over TLS looks better for the vast majority of the free world (which excludes, for example, China, Russia, Venezuela, Pakistan and other places). You can opt in to DNS over HTTPS by uninstalling Unbound and its LuCI app and replacing them with Stubby. It does not have a LuCI app, but it is easy to configure from the command line.

If your DNS over TLS is not being blocked, you can use it without any security concern, since it is encrypted the same way HTTPS is. Deep packet inspection is also useless against DNS over TLS, since the same technology is used by HTTPS. If the use of such services is somehow forbidden (as in China), DNS over HTTPS is also forbidden. I see no gain, except that Stubby is smaller than Unbound (though slower).

I have used ifconfig and cat /proc/net/dev, and both show the counter resetting once it reaches 4G.

I have to admit that I hadn't researched this before, but my guess was right. It's a known "feature" (they don't call it a "bug"). There's nothing to do about it, though. See RedHat Bugzilla.

Thank you for chasing this up. The feature seems to exist on the WAN side; the LAN side goes beyond 4G.

Hi, my router is a Linksys WRT1900ACS, which also has dual partitions. Is it possible to use the 'enable extroot for OEM partition' feature? Can you give some guidance? Or is it possible to make it a package that could benefit many users? :smiley:

Well, actually, it is enabled the same way as any external root or overlay. You just need to know where the partition is (and take care of cleaning everything up after upgrading). Generally speaking, it's the last MTD partition and it's a UBI partition. On this particular device it is partition 15 (16 if counting from 1), called syscfg.

[    0.652736] Creating 10 MTD partitions on "spi0.0":
[    0.659096] 0x000000000000-0x000000040000 : "SBL1"
[    0.664767] 0x000000040000-0x000000060000 : "MIBIB"
[    0.669552] 0x000000060000-0x0000000c0000 : "QSEE"
[    0.674360] 0x0000000c0000-0x0000000d0000 : "CDT"
[    0.679246] 0x0000000d0000-0x0000000e0000 : "APPSBLENV"
[    0.684049] 0x0000000e0000-0x000000160000 : "APPSBL"
[    0.689007] 0x000000160000-0x000000170000 : "ART"
[    0.694237] 0x000000170000-0x000000190000 : "u_env"
[    0.698758] 0x000000190000-0x0000001b0000 : "s_env"
[    0.703534] 0x0000001b0000-0x0000001c0000 : "devinfo"
[    0.709069] spi-nand spi0.1: Winbond SPI NAND was found.
[    0.712637] spi-nand spi0.1: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
[    0.718420] 6 fixed-partitions partitions found on MTD device spi0.1
[    0.726118] Creating 6 MTD partitions on "spi0.1":
[    0.732427] 0x000000000000-0x000002800000 : "kernel"
[    0.747459] random: fast init done
[    0.865576] 0x000000300000-0x000002800000 : "rootfs"
[    0.866887] mtd: device 11 (rootfs) set to be root filesystem
[    0.870556] mtdsplit: no squashfs found in "rootfs"
[    0.875337] 0x000002800000-0x000005000000 : "alt_kernel"
[    1.008846] 0x000002b00000-0x000005000000 : "alt_rootfs"
[    1.010096] 0x000005000000-0x000005100000 : "sysdiag"
[    1.017602] 0x000005100000-0x000008000000 : "syscfg"
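To find the index of a named partition without reading the whole boot log, /proc/mtd can be searched by name. This is a minimal sketch (mtd_index is a hypothetical helper). Unlike grep syscfg /proc/mtd | cut -c4, as used by the 81_linksys_syscfg preinit script, which reads a single character, it also works for two-digit indices such as mtd15 on this device.

```shell
#!/bin/sh
# Print the N of "mtdN" for the partition named $1, read from /proc/mtd
# (or from the file given as $2). Works for multi-digit indices like mtd15.
# Lines look like: mtd8: 02600000 00020000 "syscfg"
mtd_index() {
    awk -F'[: ]+' -v name="\"$1\"" \
        '$1 ~ /^mtd[0-9]+$/ && $NF == name { sub(/^mtd/, "", $1); print $1 }' \
        "${2:-/proc/mtd}"
}
```

For example, ubiattach -m "$(mtd_index syscfg)" attaches the syscfg partition by name rather than by a hard-coded number.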

As a custom configuration, it sadly cannot be made into a package. Anyway, if you have your partition map (from the kernel boot log), you can easily use it as an external root or overlay from LuCI (I recommend using it as an external overlay).

In my particular case, I have some scripts that use it in place of the OpenWrt overlay partition (the space left over after the firmware), and that ensure it's cleaned during updates, that it's enabled by default, and so on. Of course, it's a custom ROM, so I have full control over it.


Post the relevant section of your kernel boot log (if it has already disappeared, reboot the device to get it again) and the output of the ls -l /dev command, and I will gladly tell you how to enable it in LuCI; you can then share the knowledge if you wish.

Maybe it's because br-lan is a virtual interface? I don't know, actually. The kernel people are weird sometimes.

Thanks for your reply. Here's the information. If you need more, please let me know.

[root@WRT1900ACS:~]# dmesg
...
[    1.023846] Creating 10 MTD partitions on "pxa3xx_nand-0":
[    1.029355] 0x000000000000-0x000000200000 : "u-boot"
[    1.034573] 0x000000200000-0x000000240000 : "u_env"
[    1.039648] 0x000000240000-0x000000280000 : "s_env"
[    1.044727] 0x000000900000-0x000000a00000 : "devinfo"
[    1.049967] 0x000000a00000-0x000003200000 : "kernel1"
[    1.055301] 0x000001000000-0x000003200000 : "rootfs1"
[    1.060627] 0x000003200000-0x000005a00000 : "kernel2"
[    1.065975] 0x000003800000-0x000005a00000 : "ubi"
[    1.070939] 0x000005a00000-0x000008000000 : "syscfg"
[    1.076189] 0x000000280000-0x000000900000 : "unused_area"
...
[root@WRT1900ACS:~]# cat /proc/mtd
dev:    size   erasesize  name
mtd0: 00200000 00020000 "u-boot"
mtd1: 00040000 00020000 "u_env"
mtd2: 00040000 00020000 "s_env"
mtd3: 00100000 00020000 "devinfo"
mtd4: 02800000 00020000 "kernel1"
mtd5: 02200000 00020000 "rootfs1"
mtd6: 02800000 00020000 "kernel2"
mtd7: 02200000 00020000 "ubi"
mtd8: 02600000 00020000 "syscfg"
mtd9: 00680000 00020000 "unused_area"
[root@WRT1900ACS:~]# df -hT
Filesystem           Type            Size      Used Available Use% Mounted on
/dev/root            squashfs       24.8M     24.8M         0 100% /rom
tmpfs                tmpfs         249.5M      8.6M    240.8M   3% /tmp
/dev/ubi0_1          ubifs           3.1M   1016.0K      1.9M  34% /overlay
overlayfs:/overlay   overlay         3.1M   1016.0K      1.9M  34% /
ubi1:syscfg          ubifs          29.6M    524.0K     27.5M   2% /tmp/syscfg
tmpfs                tmpfs         512.0K         0    512.0K   0% /dev
/dev/ubi1_0          ubifs          29.6M    524.0K     27.5M   2% /mnt/ubi1_0
/dev/sda1            ext4          915.9G    606.8G    263.5G  70% /mnt/nas

Yes, thanks for the info!
As you can see, there is a mount point for the syscfg partition and there is another UBI partition; both are the same size, but they are different devices.

If you don't have an entry for them in your /etc/config/fstab, the system mounts them automatically (this differs from my device, which doesn't even generate the /dev nodes, so my script generates them during the "preboot" phase).

You will likely want to remove any /overlay and / entries from the fstab, if there are any, and add the syscfg partition. Leave the other partition alone, because we don't know the implications; the syscfg partition, on the other hand, is cleaned internally by the Linksys firmware when the canary file is not found, so it's safe to use. Take this as an example (it should work right away):

config mount
    option enabled '1'
    option options 'bulk_read,compr=zlib'
    option target '/overlay'
    option device '/dev/ubi1_0'

There are no other /overlay or / entries in /etc/config/fstab. I tried the following config and rebooted the router, but it's not working.

[root@WRT1900ACS:~]# cat /etc/config/fstab

config global
	option anon_swap '0'
	option auto_swap '1'
	option auto_mount '1'
	option delay_root '5'
	option check_fs '0'
	option anon_mount '1'

config mount
	option uuid '7ee22fac-df4b-482f-a776-825cea16f242'
	option enabled '1'
	option target '/mnt/nas'
	option options 'noatime'

config mount
	option device '/dev/ubi1_0'
	option options 'bulk_read,compr=zlib'
	option target '/overlay'
	option enabled '1'


There are no other entries in /etc/fstab either.

[root@WRT1900ACS:~]# cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>

Maybe the system mounts /overlay and syscfg without using these fstab config files. Since you mentioned preinit, here's what I found.

[root@WRT1900ACS:~]# ls /lib/preinit
00_preinit.conf              10_indicate_failsafe         50_indicate_regular_preinit  81_linksys_syscfg
02_default_set_state         10_indicate_preinit          70_initramfs_test            81_urandom_seed
02_sysinfo                   30_failsafe_wait             79_move_config               99_10_failsafe_login
06_set_iface_mac             40_run_failsafe_hook         80_mount_root                99_10_run_init
[root@WRT1900ACS:~]# cat /lib/preinit/81_linksys_syscfg
#
# Copyright (C) 2014-2016 OpenWrt.org
# Copyright (C) 2016 LEDE-Project.org
#

preinit_mount_syscfg() {
	. /lib/functions.sh
	. /lib/upgrade/common.sh

	case $(board_name) in
	linksys,caiman|linksys,cobra|linksys,mamba|linksys,rango|linksys,shelby|linksys,venom)
		needs_recovery=0
		syscfg_part=$(grep syscfg /proc/mtd |cut -c4)
		ubiattach -m $syscfg_part || needs_recovery=1
		if [ $needs_recovery -eq 1 ]
		then
			echo "ubifs syscfg partition is damaged, reformatting"
			ubidetach -m $syscfg_part
			ubiformat -y -O 2048 -q /dev/mtd$syscfg_part
			ubiattach -m $syscfg_part
			ubimkvol /dev/ubi1 -n 0 -N syscfg -t dynamic --maxavsize
		fi
		mkdir /tmp/syscfg
		mount -t ubifs ubi1:syscfg /tmp/syscfg
		[ -f "/tmp/syscfg/$BACKUP_FILE" ] && {
		echo "- config restore -"
		cd /
		mv "/tmp/syscfg/$BACKUP_FILE" /tmp
		tar xzf "/tmp/$BACKUP_FILE"
		rm -f "/tmp/$BACKUP_FILE"
		sync
		}
		;;
	esac
}

boot_hook_add preinit_main preinit_mount_syscfg

I see, they did something very similar to what I did, but they didn't allow it to be mounted as an external overlay.

Let me explain: when the device upgrades and there is a backup, it stores the backup as a compressed tar file in some persistent partition (on your device, in syscfg; on mine, inside the currently running partition, which in theory remains untouched because the firmware is written to the other partition).

What I've done on my device is a preboot script that creates the nodes for syscfg, mounts both the partition holding the compressed backup and syscfg, and, if the backup file is present, moves it from the other partition to syscfg. If a canary file is not found, the partition is wiped, because that means the new partition came from a system upgrade (and many packages would prevent the device from running correctly, so you would end up wiping it anyway).
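The canary logic described above can be sketched as follows. This is not the author's script: the directory layout, the canary location (passed in as a parameter), and the choice to wipe before moving the backup in, so that the backup survives the wipe, are all assumptions; prepare_syscfg is a hypothetical name.

```shell
#!/bin/sh
# $1: mounted syscfg directory used as the overlay
# $2: canary file path (hypothetical; must be absent on a freshly flashed image)
# $3: optional sysupgrade backup file to carry over into syscfg
prepare_syscfg() {
    cfg=$1 canary=$2 backup=$3
    if [ ! -e "$canary" ]; then
        # No canary: this boot follows a firmware upgrade, so the old overlay
        # contents are considered stale and are wiped.
        find "$cfg" -mindepth 1 -maxdepth 1 -exec rm -rf {} \;
        touch "$canary"
    fi
    # Move the backup in only after the wipe, so the wipe cannot delete it.
    if [ -n "$backup" ] && [ -f "$backup" ]; then
        mv "$backup" "$cfg/"
    fi
}
```

After this runs, the stock restore step (like the one in 81_linksys_syscfg) can find the backup in syscfg and extract it as usual.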

What's the catch? The preboot scripts run just after the root is mounted, but before any other mount is made. This means that, in most cases, the root is a read-only squashfs filesystem. Any change made to a preboot script while the device is running will not be "visible" to the boot process, since the change lives in the overlay, which is not mounted at that point. Without the overlay you only have a read-only filesystem, so the only solution is a custom firmware with a custom preboot script.

In my particular case, since the syscfg is not assumed to exist by the vanilla OpenWrt, my script only moves and checks some files, it does not mount the /overlay itself. It is done in a pre-made /etc/config/fstab embedded in the read-only boot image.

TL;DR: you need a custom firmware to use it as intended.


I don't know who was the genius that disallowed the usage of that partition since it's pretty easy to get it to work, but it requires changes to the firmware image (the image that users flash).

I built my own OpenWrt 19.07.3 image. I'll try it when I build next one. Thank you for all the help. :beers: