OpenWrt Forum Archive

Topic: Router hangs after reboot on atheros ar71xx using Engenius 1650

The content of this topic has been archived on 24 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

I have OpenWrt 8.09.(2) installed on an EnGenius 1650. It works fine and is very comparable to the DIR-300s, NanoStation2 and Fonera+ I also tested.

Until I issue a simple

reboot

on the machine.

I have a little script in init.d that is called by cron every morning at 4am local time and reboots the router. I have hooked the router up to a remote logging server to see what happens, and this is what I get:

Nov 20 04:00:02 60-234-221-116.bitstream.orcon.net.nz crond[1147]: USER root pid 20467 cmd /etc/init.d/chilli_startup start
Nov 20 04:00:25 60-234-221-116.bitstream.orcon.net.nz init: init: starting pid 20638, tty '': '/etc/init.d/rcS K stop'

/etc/init.d/chilli_startup is my little script that calculates local time and sets up the 4am cron job, and is itself called by cron to perform the reboot, which it does, as we can see in the next line. The machine responds to that command insofar as it 'closes up', that is, it is no longer reachable through ping or otherwise, but it does not go through a complete reboot routine, as I can tell by watching the LEDs.

To make sure it is not my script, I issued a 'reboot' on the command line and got the same effect. When I issue a command line reboot I get this through the remote logging:

Nov 20 07:15:01 60-234-221-116.bitstream.orcon.net.nz init: init: starting pid 2004, tty '': '/etc/init.d/rcS K stop'
Nov 20 07:15:03 60-234-221-116.bitstream.orcon.net.nz root: stopping ntpclient

The machine will recover fine if I force a hard reboot by pulling the power plug.

I notice the same effect when flashing a new firmware version using the mtd -r command. The flash happens all right, but the reboot at the end does not. When issuing a reboot command in RedBoot, the machine reboots fine.

The log would suggest there is an issue with the /etc/init.d/rcS K stop script, but looking at it, it does not actually do much.

Is there something I am doing wrong, or are we looking at a bug here? Any advice welcome.

(Last edited by chillifire on 19 Nov 2009, 19:26)

Can confirm this also happens in trunk.
Have submitted ticket.

And sorry if I confused anyone - the EnGenius EOC1650 runs on an Atheros AR231x/5312 platform, not ar71xx. It seems I cannot change the title of the thread though.

(Last edited by chillifire on 23 Nov 2009, 10:59)

Hi,

I've got exactly the same problem with the EAP-3660.

Will investigate further tomorrow for more clues.

Cheers

Ok,

We managed to attach a serial cable to the EnGenius EAP-3660 and I get the following output:

---
root@OpenWrt:/# br-lan: port 1(eth0) entering disabled state
device eth0 left promiscuous mode
br-lan: port 1(eth0) entering disabled state
Restarting system.
watchdog expired, rebooting system
---

I'm not sure if the 'watchdog expired' may be a clue.

Fixed mine :-)

Please check out this ticket:

https://dev.openwrt.org/ticket/3953

Reversing the patch fixed it and the unit now reboots without a problem every time!

So just replace the:
    emergency_restart();
With:
    sysRegWrite(AR5315_COLD_RESET, RESET_SYSTEM);

The message:
watchdog expired, rebooting system

Turned out indeed to point to the problem.

Cheers

Thanks for following up and solving the issue.
Well, it's nice that a fix was found, but this leaves me with two questions:

1) Why does that only create a problem for the EnGenius devices? All other Atheros devices I tested, including Nano2, Fon+, Fon2 and DIR-300, work fine with that patch and reboot as expected.

2) What does this now take away from the watchdog? Does that mean watchdog will not reboot the system, if it gets into trouble? What is the impact of reversing the patch, is I guess what I am asking.

The patch was signed in by 'cr'. Could maybe the developer leave a comment in this thread? Much appreciated.

Thanks

I'm totally new to this low level code hacking, and I may be wrong in my conclusions, but it seems the:

emergency_restart();

function is part of the gpio abstraction layer. This will in turn do the reset work depending on which chip it runs on.

More on the gpio abstraction patch:
https://dev.openwrt.org/ticket/1861

I've come across a Meraki patch which has the following:

---
if (started) {
        printk(KERN_CRIT "Watchdog rebooting...\n");
//        sysRegWrite(AR5315_COLD_RESET, RESET_SYSTEM);
        emergency_restart(); //2315 needs gpio based restart unlike 2317
       
    } else {
----

Somehow it then seems that the 2315 of EnGenius does not like the restart coming from the gpio system, but would rather like it the direct way.

It would still be nice to hear a developer's view on this issue, as this is all new ground to me.

Thank you for pointing this out. I had a quick look at the code, and nothing in there jumps out at me that would suggest the system is aware of whether it is handling a 2315, a 2317 or a 5315 processor, and so forth. And there is only one file in the code repository named ar2315-wtd.c; there is no ar2317-wtd.c or ar5315-wtd.c in addition to that.

So I am at a loss to suggest an easy patch. The developer who checked in the patch was nbd.
Maybe you (nbd) have some comments, advice, views or suggestions?

Thanks

The point is that the EOC 1650 reboots only through GPIO pin 0.

Have a look at the patch I have tested: https://dev.openwrt.org/ticket/6202

Now the problem is choosing the right pin number based on the board type, and finally porting all the GPIO reset stuff to the stable branch.
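
As a rough illustration of the per-board choice described above, the sketch below keeps the reset pin in a small lookup table keyed by board name. The table, the `board_reset_gpio` helper and the board name string are hypothetical and are not taken from the patch in ticket 6202; the only entry grounded in this thread is GPIO 0 for the EOC 1650.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: board name -> GPIO pin that triggers a reset.
 * Only the EOC 1650 entry comes from this thread. */
struct board_reset {
    const char *name;
    int gpio;
};

static const struct board_reset board_resets[] = {
    { "EOC-1650", 0 },
};

/* Return the reset GPIO for a board, or -1 if the board is unknown
 * and a GPIO-based reset should not be attempted. */
static int board_reset_gpio(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(board_resets) / sizeof(board_resets[0]); i++) {
        if (strcmp(board_resets[i].name, name) == 0)
            return board_resets[i].gpio;
    }
    return -1;
}
```

A restart routine could call `board_reset_gpio()` once at setup and skip the GPIO step when it returns -1, so boards that have not been tested keep their current behaviour.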

Thanks for this input albe.

I am unclear as to what you are saying. Are you saying the patch you provide solves the problem, or is it just a first step and other things still need to be done?
Is it a solution for trunk only, or does it solve the issue in 8.09 as well?

Please advise.

Thanks

Hi all,

It turns out my 'fix' for EAP-3660 in post No5 was no fix at all!
Here's what happened:

1.) Identified the reboot command problem when Wireless was already activated.
2.) Modified the code as specified in post No5, re-flash, and test.
3.) The unit reboots - WOW fixed! ...... NOT SO FAST!!!! :-(
4.) As soon as the Wireless is activated, the reboot command does not work.

Can it be that the EAP-3660 uses yet another GPIO reset pin (other than pin 0 as on the EOC 1650)?

Any comments welcome - it is such a small problem that stands between me and the completion of an enterprise-size auto configuration system.

I had the same observation for an EOC 1650. The proposed patch did not allow a software reboot at all.

Since there seems to be a bit of interest here, what is the best way to support the developer(s) in fixing the issue? Hardware donation, project donation, paid work? I also have an interest in this work being completed. Is there any way we can move this onto a 'priority list' through donations or otherwise?

Anyone got any advice/comments?

I'm sure I can tweak my employer's arm to do a payment for work done.

I'm also willing to sponsor the purchase of an EAP-3660.

The problem is that the Ubiquiti PicoStation2s we deployed are so unreliable that every time someone sneezes near a power outlet, a unit refuses to come back on - and it seems to be a general problem:

http://www.ubnt.com/forum/showthread.ph … amp;page=2

The EAP-3660 seems to be an ideal replacement.

The patch I posted works out-of-the-box for the EOC 1650. It can be easily backported to the stable 8.09.1.
However, from the OpenWrt developers' point of view, it HAS to be modified in order to be compatible with other devices, like the Ubiquiti NS for example.

dvdwalt, (btw are you the same dvdwalt who developed YFi?)
I suggest finding the correct GPIO by trying gpioctl with pins 0 to 20. Revert your patch and apply mine, inserting the correct GPIO number for the EAP3660.

(Last edited by albe on 10 Dec 2009, 00:11)

Hi,

I tried backporting the patch to 8.09, but the lines you are patching in the patch file do not exist in the latest 8.09 buildroot. Whatever the backport involves, it is not as trivial as applying the same patch.

Could you please point out how the patch can be ported to 8.09?

Thank You.

@albe

Thanks for the suggestions - did not even know about the gpioctl command, very handy.

I tried the gpioctl command this morning and the unit sure reboots on a clear of the following pins: 0,32,64.
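
One possible reading of the 0, 32, 64 result, offered as a guess and not confirmed anywhere in this thread: if the pin number ends up as a shift count for a 32-bit register mask and only the low five bits of that count are used, then 32 and 64 select the same bit as pin 0. The sketch below shows just that arithmetic; `gpio_mask` is a hypothetical helper, not the driver's code.

```c
#include <stdint.h>

/* Hypothetical helper: bit mask for a pin in a 32-bit GPIO register,
 * assuming only the low five bits of the pin number are significant. */
static uint32_t gpio_mask(unsigned int pin)
{
    return (uint32_t)1 << (pin & 31u);
}
```

Under that assumption `gpio_mask(32)` and `gpio_mask(64)` equal `gpio_mask(0)`, which would make the three pins a single hit on pin 0 and not three separate reset lines.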

When I edit the patch file (100-board.patch) and I run a make V=99 again, will it apply the changes I've made to the patch to the new build?

Somehow I've got a suspicion that the changes to the patch do not end up in the new build.

I've tried a make clean, but it leaves some files behind which make then complains about, and it refuses to build when I run make again.

I've now tried make distclean and will see how it goes - it's doing a lot of downloading again.

Yes I'm the YFi developer.

I'm now working on an add-on which can be used to centrally store OpenWrt devices' configuration.
The idea is to have a minimal firmware which causes the unit to boot, get a DHCP address and then contact the YFi server, giving its MAC as identification.
The YFi server then hands the unit its settings (i.e. network, wireless, OpenVPN tunnel etc.).
The unit then reconfigures itself according to the settings given and restarts its network.

This will enable us to pre-flash the units with a standard firmware; no additional set-up is then needed (our network uses DHCP).
Swap-outs will be a simple matter of replacing a MAC entry in the YFi web front-end.
New unit installations should also be a snap because we can pre-populate a list of units in YFi and assign the MACs to the settings as the deployment evolves.

But for all this to work - we need to be able to reset the unit through software!
And that's how I ended up on this problem!

The EAP-3660 is an ideal candidate since it is cheap, and supports POE out of the box.

Cheers

Here's a reply to the last post on the changes not getting to the firmware build:

When I do a find for 100-board.patch I get:

bash-3.2$ find . -name '100-board.patch'
./build_dir/linux-atheros/linux-2.6.30.9/.pc/platform/100-board.patch
./build_dir/linux-atheros/linux-2.6.30.9/patches/platform/100-board.patch
./target/linux/ifxmips/patches-2.6.30/100-board.patch
./target/linux/atheros/patches-2.6.31/100-board.patch
./target/linux/atheros/patches-2.6.30/100-board.patch
./target/linux/amazon/patches/100-board.patch

Changes I make to 100-board.patch under /target do NOT go into the kernel when I run make again.
Changes I make to 100-board.patch under /build_dir do NOT go into the kernel when I run make again.

However,
changing the files which the patches modify in the build_dir caused the changes to land in the firmware.

A way to confirm that the changes did make it into the kernel, is by adding a printk line in the source to inform you when a piece of code is executed.

printk(KERN_CRIT "This is experimental code\n");

Is there perhaps a more elegant way to force the re-application of kernel patches?

And yes the EnGenius EAP-3660 reboots now using GPIO pin 0 even with WiFi enabled.

Now I can finish the back-end :-D

Interesting, this would explain my experience. Patching the patches did nothing - just as you experienced.

So you are saying don't bother patching the patches, change the files directly? Sounds weird, as you would expect the patches to then overwrite the changes you made in the file, but obviously they do not.

So which are the files then that you changed directly? And did you do this in 8.09 or trunk?

Please advise.

(Last edited by chillifire on 10 Dec 2009, 18:10)

@Chillifire

I've edited the following files, and it was a trunk SVN checkout.

To change the GPIO reset pin number edit:
/build_dir/linux-atheros/linux-2.6.30.9/arch/mips/include/asm/mach-ar231x/ar2315_regs.h

#define AR2315_GPIO_INT_LVL_HIGH                        2   /* High Level Triggered */
#define AR2315_GPIO_INT_LVL_EDGE                        3   /* Edge Triggered */

#define AR2315_RESET_GPIO       0
#define AR2315_NUM_GPIO         22

/*
 *  PCI Clock Control

Then to get the kernel to output a message when it is rebooting - just to prove how cool this patch is working, you can edit the following file (inserting the printk line):
build_dir/linux-atheros/linux-2.6.30.9/arch/mips/ar231x/ar2315.c

static void
ar2315_restart(char *command)
{
        void (*mips_reset_vec)(void) = (void *) 0xbfc00000;

        /* debug marker, placed after the declaration to avoid a
         * mixed declarations and code warning */
        printk(KERN_CRIT "This patch is the best!\n");

        local_irq_disable();

        /* try reset the system via reset control */
        ar231x_write_reg(AR2315_COLD_RESET,AR2317_RESET_SYSTEM);

        /* Cold reset does not work on the AR2315/6, use the GPIO reset bits a workaround.
         * give it some time to attempt a gpio based hardware reset
         * (atheros reference design workaround) */
        gpio_direction_output(AR2315_RESET_GPIO, 0);
        mdelay(100);

        /* Some boards (e.g. Senao EOC-2610) don't implement the reset logic
         * workaround. Attempt to jump to the mips reset location -
         * the boot loader itself might be able to recover the system */
        mips_reset_vec();
}

Try this and let us know if yours also now reboots with Wifi active.
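
The restart routine above tries three things in order: the cold reset register, the GPIO reset line, and finally a jump to the MIPS reset vector. The user-space sketch below only models that ordering so it can be reasoned about off the device. The `board_caps` struct and `restart_path` function are made up for illustration, and which stages a given board honours is an assumption based on the comments in the code and on this thread.

```c
#include <stdbool.h>

/* Hypothetical description of which reset mechanisms a board honours. */
struct board_caps {
    bool cold_reset_works;  /* writing the cold reset register resets the SoC */
    bool gpio_reset_works;  /* the reset GPIO is wired to the reset logic */
};

enum restart_stage {
    RESTART_COLD_RESET,
    RESTART_GPIO,
    RESTART_RESET_VECTOR
};

/* Model of the fallback order: each stage is attempted in turn, and the
 * first one the board honours is where execution would stop. */
static enum restart_stage restart_path(const struct board_caps *caps)
{
    if (caps->cold_reset_works)
        return RESTART_COLD_RESET;
    if (caps->gpio_reset_works)
        return RESTART_GPIO;
    /* Last resort: jump to the reset vector and rely on the boot loader. */
    return RESTART_RESET_VECTOR;
}
```

For a board described as `{ false, true }`, which is how the EOC 1650 appears to behave according to this thread, the model ends at `RESTART_GPIO`; a board where neither mechanism is implemented falls through to the reset vector.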

I looked into the 8.09 build_dir directory and it appears 8.09 is based on the 2.6.26.8 kernel, not the 2.6.30 kernel. The directory structure is different and there is no include directory to start with. A

find -name "ar2315*"

shows no results, so the files you modified do not exist in 8.09. Back to square one for 8.09, I am afraid. Thank you for the info on trunk though. Much appreciated.

The discussion might have continued from here.