Topic: Report from failed attempt with TL-WR1043ND

I got myself a brand new TL-WR1043ND (rev. 1.7) for Christmas. The one (and only) reason was that my ASUS WL-500gP died on me after more than two years of uninterrupted service (just out of warranty).

I set up my build environment for the TL-WR1043ND and planned to build the same image as I did for the WL-500gPv1. I have been doing this for more than two years, from trunk, using the latest kernel and building the complete set for OpenWrt-Builder, so I thought I was not a complete newbie at this :). My image has a few differences from the "standard" one:
- I use neither X-Wrt nor LuCI.
- I do not use the "standard" firewall; I use shorewall-lite instead.
- The network configuration is almost the same as the standard one, except that wlan is on a separate segment (not visible from lan) and there is also an openvpn point-to-point tunnel to another machine.
- There are some other packages thrown into the mix (ntpd instead of ntpclient, snmpd, full-blown vim, usb-storage support, etc.).
Running the bleeding-edge 2.6.36.2 kernel with the ath5k driver (I had replaced the Broadcom miniPCI card in the WL-500gP with an Atheros one) without any problem gave me some confidence for the new platform, as did the fact that the image fit into ~7MB and all of this worked fine on the ASUS.

So I changed the config to the ar71xx target, chose the generic subtarget and the TL-WR1043ND flavor, and rebuilt OpenWrt-Builder. After flashing the virgin WR1043ND with the new OpenWrt image I got stuck: I could not connect to the router on any port. The router only sent DHCP broadcasts, and when served it did not ack and just repeated the broadcast. Failsafe mode saved me.

Then I rebuilt the image and this time it worked. Unfortunately I cannot say whether that was because:
a) I slightly changed the vlan config (I still do not understand the difference between '*' and 't' in a vlan's port list), or
b) I did a distclean this time before rebuilding the complete trunk.

Anyway, I had at least a more or less running image. At that point I faced two major problems:

1) Intermittent lockups of the ssh session. Sometimes I could reconnect by starting a new ssh session, sometimes I could not.

2) When rebooting the router, at a certain point in the boot process the wan and lan ports seemed to be completely open and switching, so my machine connected to a lan port got served by the DHCP server on the wan side.
On top of that, once a DHCP address had been obtained through the router in "switch" mode during boot, something seemed to get stuck and the router never served a new DHCP request.

Both problems made the router unusable in real life. So I started to experiment a bit, but with the omnipresent ssh lockups it was quite unstable. During one unstable session, while I was trying to flash a new image from a mounted usb flash drive, something happened: the session did not lock up, but I was disconnected from the router. Since then the router is dead; it does not even get into failsafe mode, nor does it broadcast anything.

I assume the flash process was interrupted, so the image is corrupted. I do not know whether such corruption affects the whole flash or whether u-boot might still be working, and I cannot verify it right now. I am waiting for my Nokia CA-42 cable :).

In my tests I was just trying to get a correctly running system, so I did not put the machine through any stress tests. I have not even used the wireless part much (except to check that it correctly associates my iPod with WPA2-PSK). I was surprised by the overall instability; I hope I will learn more once I get access to the serial console.

I am also not sure whether the instability may come from the toolchain, since I opted for this configuration:

      Binutils Version (binutils 2.19.1+20090205 with CodeSourcery enhancements)  --->
      GCC compiler Version (gcc 4.4.1 with CodeSourcery enhancements)  --->
  [*] Compile in support for the new Graphite framework in GCC 4.4+
  [*]   Use the system versions of PPL and CLooG
      C Library implementation (Use uClibc)  --->
      uClibc Version (uClibc 0.9.32 (with nptl support))  --->

During this adventure I also noticed some changes to the system that sometimes made me pull my hair out.

A) The new "rdate" option in /etc/config/system. The new rdate script seemed to wake up on every new interface (e.g. eth0.1, eth0.2, br-lan, wlan, vpn), stalling the boot for a significant time, because that is the default behavior when this option is not set. Why? Shouldn't "not setting" this option actually disable the feature?

B) I noticed the script /etc/rc.d/S99sysctl, which seems to set some net.ipv4 sysctls (from /etc/sysctl.conf), but I am pretty sure some (maybe many) of them are also configured by shorewall-lite, which runs before it. I would expect these to be applied when the network and/or firewall is initialized (depending on the actual setting, e.g. tcp settings at network time, conntrack settings at firewall time).
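Point B boils down to ordering: the boot script walks /etc/rc.d/S* in glob order, i.e. ascending by name, so whatever S99sysctl applies from /etc/sysctl.conf overrides the same key set earlier by shorewall-lite ("last writer wins"). The sketch below only simulates that effect; the script name S45shorewall-lite, the sysctl keys and all values are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

# One boot step: an /etc/rc.d script name and the sysctl keys it writes.
# Script names other than S99sysctl, and all keys/values, are hypothetical.
BootStep = Tuple[str, Dict[str, str]]


def apply_in_rc_order(steps: List[BootStep]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Apply each step's sysctl writes in ascending script-name order.

    Returns (final_values, last_writer): the value each key ends up with
    and the script that wrote it last. Later writes overwrite earlier ones.
    """
    final_values: Dict[str, str] = {}
    last_writer: Dict[str, str] = {}
    for script, settings in sorted(steps, key=lambda step: step[0]):
        for key, value in settings.items():
            final_values[key] = value
            last_writer[key] = script
    return final_values, last_writer


if __name__ == "__main__":
    steps: List[BootStep] = [
        # Hypothetical: a value loaded from /etc/sysctl.conf by S99sysctl.
        ("S99sysctl", {"net.ipv4.tcp_syncookies": "1"}),
        # Hypothetical: start priority and settings for shorewall-lite.
        ("S45shorewall-lite", {"net.ipv4.tcp_syncookies": "0",
                               "net.ipv4.ip_forward": "1"}),
    ]
    values, writers = apply_in_rc_order(steps)
    for key in sorted(values):
        print(f"{key} = {values[key]} (set last by {writers[key]})")
```

With these made-up inputs, net.ipv4.tcp_syncookies ends up as 1 (set by S99sysctl) even though shorewall-lite set it to 0, which is the kind of silent override the post is worried about.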

C) The /etc/config/wireless defaults use "radio0" as the name of the "wifi-device" and for "option device" in the wifi-iface section, but with that setting, running wifi down reported an error about a non-existent device. When I changed it (back) to wlan0, as I had it before on the WL-500gP, wifi down stopped complaining.

D) My original vlan configuration on the ASUS was:

config switch_vlan eth0_0
        option device   "eth0"
        option vlan     0
        option ports    "1 2 3 4 5*"

config switch_vlan eth0_1
        option device   "eth0"
        option vlan     1
        option ports    "0 5"

while on the TL-WR1043ND I had to change the vlan ids 0->1 and 1->2, because vlan id 0 did not seem to work. I also changed the 'ports' option by replacing '*' with 't' and adding 't' to the wan vlan, though I cannot confirm that this made a difference:

config 'switch_vlan'            'vlan1'
        option 'device'         'rtl8366rb'
        option 'vlan'           '1'
        option 'ports'          '1 2 3 4 5t'

config 'switch_vlan'            'vlan2'
        option 'device'         'rtl8366rb'
        option 'vlan'           '2'
        option 'ports'          '0 5t'

It is also not clear to me why option 'device' is set to 'rtl8366rb' when the system then reports it as eth0; changing 'rtl8366rb' to 'eth0' (which is what I used on the ASUS) does not work.
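Apart from the vlan ids and the device name, the 'ports' strings in the two configurations differ only in the suffix on port 5, which is also the only port present in both vlans in each config. The hypothetical helper below (not part of OpenWrt) splits a 'ports' string into port numbers and suffixes so such differences can be listed mechanically. It deliberately treats the suffix as an opaque marker: in swconfig-style configs like the TL-WR1043ND one a trailing 't' marks a tagged member, but what '*' means depends on the switch driver, which is exactly the open question in a) above.

```python
import re
from typing import Dict, List, Optional, Tuple

# A port token is a port number optionally followed by marker characters,
# e.g. "3", "5t" or "5*" as in the configs above.
_PORT_TOKEN = re.compile(r"^(\d+)(\D*)$")


def parse_ports(spec: str) -> List[Tuple[int, str]]:
    """Split a switch_vlan 'ports' string into (port, suffix) pairs."""
    members: List[Tuple[int, str]] = []
    for token in spec.split():
        match = _PORT_TOKEN.match(token)
        if match is None:
            raise ValueError(f"unexpected port token: {token!r}")
        members.append((int(match.group(1)), match.group(2)))
    return members


def suffix_changes(old: str, new: str) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Return {port: (old_suffix, new_suffix)} for ports whose suffix differs.

    None means the port is not a member in that version of the list.
    """
    old_map = dict(parse_ports(old))
    new_map = dict(parse_ports(new))
    return {
        port: (old_map.get(port), new_map.get(port))
        for port in sorted(old_map.keys() | new_map.keys())
        if old_map.get(port) != new_map.get(port)
    }


def shared_ports(specs: List[str]) -> List[int]:
    """Ports that appear in more than one vlan's port list."""
    counts: Dict[int, int] = {}
    for spec in specs:
        for port, _suffix in parse_ports(spec):
            counts[port] = counts.get(port, 0) + 1
    return sorted(port for port, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(suffix_changes("1 2 3 4 5*", "1 2 3 4 5t"))  # lan vlan: ASUS vs TL
    print(suffix_changes("0 5", "0 5t"))              # wan vlan: ASUS vs TL
    print(shared_ports(["1 2 3 4 5t", "0 5t"]))       # TL config
```

Running it prints {5: ('*', 't')}, {5: ('', 't')} and [5], confirming that the only change between the two setups is how port 5 is marked.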

I understand it is shooting in the dark, but if anyone sees something suspicious in my story and let me know, I will appreciate it.

Re: Report from failed attempt with TL-WR1043ND

Can't help you with the 1043, but your 500gP is likely not dead, just its power supply. Of the four 500gPs I distributed among my family, all four power adapters have died of capacitor plague, but the routers continue to work fine on new adapters.

Re: Report from failed attempt with TL-WR1043ND

I've had no problems building the TL-WR1043ND image, though I have not changed /etc/config/network or /etc/config/wireless. I suspect most of the problems are due to that.
However, I can confirm that I have had the same issue: the WAN and LAN ports seem bridged during the boot process, so a connected LAN PC gets a DHCP lease from the device on the WAN side rather than from the router.

I don't recall having that issue with the WRT54G or WL-500gP.

Snowyowlster

Re: Report from failed attempt with TL-WR1043ND

This is a hardware issue with this particular model (and a few others). The bootloader initializes the switch with wan and lan bridged together, and it stays that way until OpenWrt takes control of the switch during boot.

Re: Report from failed attempt with TL-WR1043ND

Thanks jow. Will try with an unmodified TP-LINK and see if the problem continues
Snowyowlster

(edited by KillaB 2011-01-05 00:55:18)

Re: Report from failed attempt with TL-WR1043ND

snowyowlster wrote:

Thanks jow. Will try with an unmodified TP-LINK and see if the problem continues
Snowyowlster

Hi Snowyowlster,
Please contribute your findings with the stock firmware to the following ticket:
https://dev.openwrt.org/ticket/6819

KillaB wrote:

Loaded the original firmware back onto the WR1043ND and confirmed that there is still switch leakage; however, its timing does not allow the PC to obtain an IP from the upstream DHCP server.

KillaB wrote:

Yes, there is leakage with the stock firmware, but it is very minimal in comparison with OpenWrt.

The stock firmware appears to do multiple up/down/up actions, which make my PC think the cable was disconnected before it can finish its first DHCP renew. By the time the switch is fully configured and the firewall is up, I get an IP in the proper subnet. Sadly, with OpenWrt this is not the case: I am able to obtain an IP through the switch during boot, which requires a manual DHCP release/renew to correct.

Re: Report from failed attempt with TL-WR1043ND

Guys, thanks for the responses. I am still waiting for the cables, so it may take some time before I can respond with new info.
So far I can only confirm stb's tip (thanks!): after trying a different power supply, it seems there may still be some life in the WL-500gP :).

snowyowlster:
Just for the record, what is your rev? Mine is v1.7, though I do not know whether that refers to the HW or SW revision; since there have already been so many revisions, it is a bit suspicious.
What configuration do you use when building your image? The one in svn? If not, do you modify the toolchain settings?

Re: Report from failed attempt with TL-WR1043ND

I have built for versions 1.6 and 1.7, I believe.
I build with a few extra packages, and I also have a files directory where I modify /etc/config/network and a few other files.

I would suggest that you start by building the stock version from svn. Once that works, you can make changes to the files on the running system (without rebuilding). Once everything works, copy the new files into the files directory.

Re: Report from failed attempt with TL-WR1043ND

Gabor has just checked in a fix for this (r24939). Please test.
Many thanks to Gabor for this.
Snowyowlster


Re: Report from failed attempt with TL-WR1043ND

I can add a new chapter to this "failure" story :). After successfully connecting through the serial interface (using a ripped-apart Nokia CA-42 cable), I was able to enter the boot prompt and flash the device. Unfortunately I made a mistake and flashed the image over the u-boot partition: I wrote it to 0xbf000000 instead of 0xbf020000. A pretty stupid mistake, and it would never have happened if I had had access to the wiki page at that moment, but since I had to change my IP address to match the TFTP settings in u-boot, I had no internet access and did it from memory... :(

So now I wonder whether it is possible to get a u-boot image somewhere (the wiki page is not working). If not, can I build it myself? And what about the configuration data, which is supposedly stored at the end of the first partition?

I read that JTAG cannot be used to write to the flash, but I did not understand whether that is a limitation of the Atheros SoC or of the tjtag utility. If the former, I will need to desolder the chip; if the latter, I might try a different JTAG device.

Thanks for any help.

Log from my u-boot (before I overwrote it):

U-Boot 1.1.4 (Feb  1 2010 - 10:11:24)

AP83 (ar9100) U-boot 0.0.11
DRAM:  
sri
32 MB
id read 0x100000ff
flash size 8MB, sector count = 128
Flash:  8 MB
Using default environment

In:    serial
Out:   serial
Err:   serial
Net:   ag7100_enet_initialize...
No valid address in Flash. Using fixed address
: cfg1 0xf cfg2 0x7114
eth0: 00:03:7f:09:0b:ad
eth0 up
eth0
Autobooting in 1 seconds
ar7100>
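The arithmetic behind the mistake is simple: the intended address 0xbf020000 lies 0x20000 bytes (128 KiB) above 0xbf000000, and the log's "flash size 8MB, sector count = 128" gives 64 KiB sectors (assuming uniform sectors), so the u-boot region in front of the firmware, which the misplaced write clobbered, spans two erase sectors. The sketch below reproduces that calculation and adds a hypothetical check_write() guard of the kind one could run on the numbers before typing a manual flash command; it is not a u-boot command, and treating 0xbf000000 as the start of the flash mapping is inferred from the post (writing there overwrote u-boot).

```python
from typing import Tuple

# Values from the post and the u-boot log above. FLASH_BASE being the start
# of the memory-mapped flash is an assumption inferred from the post.
FLASH_BASE = 0xBF000000       # start of flash, where u-boot lives
FIRMWARE_START = 0xBF020000   # address the image should have been written to
FLASH_SIZE = 8 * 1024 * 1024  # "flash size 8MB"
SECTOR_COUNT = 128            # "sector count = 128" (uniform sectors assumed)

SECTOR_SIZE = FLASH_SIZE // SECTOR_COUNT


def bootloader_region() -> Tuple[int, int]:
    """Return (bytes, sectors) between the flash base and the firmware start."""
    size = FIRMWARE_START - FLASH_BASE
    return size, size // SECTOR_SIZE


def check_write(dest: int, length: int) -> None:
    """Hypothetical pre-flight check for a manual flash write.

    Rejects writes that start below FIRMWARE_START (they would hit u-boot)
    or that run past the end of the flash.
    """
    if dest < FIRMWARE_START:
        raise ValueError(f"0x{dest:08x} is below 0x{FIRMWARE_START:08x}: would overwrite u-boot")
    if dest + length > FLASH_BASE + FLASH_SIZE:
        raise ValueError("write runs past the end of flash")


if __name__ == "__main__":
    size, sectors = bootloader_region()
    print(f"sector size: {SECTOR_SIZE // 1024} KiB")
    print(f"u-boot region: 0x{size:x} bytes = {size // 1024} KiB = {sectors} sectors")
    for dest in (FIRMWARE_START, FLASH_BASE):
        try:
            check_write(dest, 0x100000)
            print(f"0x{dest:08x}: ok")
        except ValueError as err:
            print(f"0x{dest:08x}: refused ({err})")
```

It reports 64 KiB sectors and a 0x20000-byte (128 KiB, two-sector) u-boot region, accepts a write at 0xbf020000 and refuses one at 0xbf000000.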


Re: Report from failed attempt with TL-WR1043ND

Do it at your own risk.

OpenWrt / Memory mod on Dlink DIR-825
DD-WRT Forum :: View topic - SOLVED - D-LINK DIR-825 - FLASH MEMORY ERASED BY ACCIDENT (photos attached)

RayeR's homepage/Programmer SPI FlashROM for parallel port
RayeR's homepage/Programming - SPIPGM.ZIP ver. 1.8 [79 kB]

SPI FlashROM supported:
***********************
ST Microelectronic:
M25P10 (128kB)
M25P20 (256kB)
M25P40 (512kB)
M25P80 (1MB)
M25P16 (2MB)
M25P32 (4MB)
M25P64 (8MB)
M25P128 (16MB)

Programming an ASUS P5B BIOS | Adventures in Home Computing