OpenWrt Forum Archive

Topic: ar9331's usb stability issue - [SOLVED]

The content of this topic has been archived between 23 Mar 2017 and 6 May 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

here are the register values when using a USB hub...


[ 2340.510000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c
[ 2341.520000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c
[ 2342.530000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c
[ 2343.350000] usb 1-1: new high-speed USB device number 6 using ehci-platform
[ 2343.500000] hub 1-1:1.0: USB hub found
[ 2343.500000] hub 1-1:1.0: 4 ports detected
[ 2343.540000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2344.550000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2345.560000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2346.570000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2347.580000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2348.590000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2349.600000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2350.610000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2351.620000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2352.630000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2353.640000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2354.650000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2355.110000] usb 1-1.2: new full-speed USB device number 7 using ehci-platform
[ 2355.230000] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
[ 2355.660000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2356.670000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2357.680000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2358.690000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[ 2359.700000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c

note that the values do not vary while the USB 1.1 device is in use. Also note that the values are different with the hub than with the USB 1.1 device connected directly.

And as always... the USB 1.1 device works perfectly when connected through the hub.
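One way to read these dumps, offered only as a sketch: if 0xbb000184 is the root port's EHCI PORTSC register (an assumption not confirmed anywhere in this thread; it would fit an EHCI core whose capability registers sit at offset 0x100 with a 0x40-byte capability area), then the standard EHCI connect/enable bits plus the TDI/ChipIdea-style port-speed field in bits 27:26 can be decoded as below. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: the value read from 0xbb000184 is EHCI PORTSC1 with the
 * TDI/ChipIdea port-speed extension. Nothing in this thread confirms it. */
#define PORTSC_CCS        (1u << 0)   /* current connect status (standard EHCI) */
#define PORTSC_PE         (1u << 2)   /* port enabled (standard EHCI) */
#define PORTSC_PSPD_SHIFT 26          /* port speed field, bits 27:26 (TDI extension) */

enum port_speed { PSPD_FULL = 0, PSPD_LOW = 1, PSPD_HIGH = 2, PSPD_UNDEF = 3 };

static int portsc_connected(uint32_t v)
{
    return (v & PORTSC_CCS) != 0;
}

static int portsc_enabled(uint32_t v)
{
    return (v & PORTSC_PE) != 0;
}

/* Extracts the 2-bit speed field. */
static enum port_speed portsc_speed(uint32_t v)
{
    return (enum port_speed)((v >> PORTSC_PSPD_SHIFT) & 0x3u);
}

static const char *port_speed_name(enum port_speed s)
{
    switch (s) {
    case PSPD_FULL: return "full-speed";
    case PSPD_LOW:  return "low-speed";
    case PSPD_HIGH: return "high-speed";
    default:        return "undefined";
    }
}
```

Under that assumption, the idle value 0x1c001000 decodes as nothing connected, the direct-connection value 0x10001805 as an enabled full-speed port, and the hub value 0x18001205 as an enabled high-speed port, which would at least be consistent with the values differing between the two setups.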

--luis

lsoltero wrote:

I will also mention that, like @Squonk, I only see ar933x_wmac_reset(void) called once at boot and when the WLAN is reset. Otherwise this function is not being called.

--luis

Thank you, Luis, for performing these tests while I am recovering.

However, this doesn't give us any clue about what is going on.

@lsoltero:
From your register dump, the values at 0xb8116c84 and 0xb8116c88 are not 0xdeadbeef as in Squonk's dump. A value of 0xdeadbeef would mean an invalid access, so it is good news that the dumped values seem to make sense.
Maybe watching the registers before and after the USB hang is the wrong approach. From the tips we have, some bits in these two registers must be set and some must be cleared. If the register values are not right, the USB will hang; that would be the cause of the hang, and it could hang without these registers changing at all...
So maybe we first need to make sure these two registers hold the right values (some bits must be set, some must be cleared). In your kernel thread you can add these checks (run them very often), restore the bits when they are wrong, and test whether the USB still hangs.

hello @mips,

here is my current kernel thread.


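#include <linux/delay.h>   /* msleep() */
#include <linux/io.h>      /* __raw_readl() */
#include <linux/kernel.h>  /* pr_err() */

/* Dumps 0xbb000184, 0xbb000188, 0xb8116c84 and 0xb8116c88 (uncached KSEG1
 * addresses) once per second; the loop never exits. */
static int ar933regmon(void *data)
{
    while ( 1 ) {
        pr_err("ar933x reg dump: 0x%x, 0x%x, 0x%x, 0x%x\n", __raw_readl((const volatile void *) 0xbb000184), __raw_readl((const volatile void *) 0xbb000188),
                __raw_readl((const volatile void *) 0xb8116c84), __raw_readl((const volatile void *) 0xb8116c88));
        msleep(1000);
    }
    return 0;
}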

What I will do is change the code to dump the values only when they change, and then do an msleep(10). This way we will only get print statements when something changes.

I will try to do that now.

OK...

here is the new monitor.


static int ar933regmon(void *data)
{
    /* 0xdeadbeef is just an initial marker so the first pass normally prints */
    unsigned long a = 0xdeadbeef;
    unsigned long b = 0xdeadbeef;
    unsigned long c = 0xdeadbeef;
    unsigned long d = 0xdeadbeef;

    while ( 1 ) {
        unsigned long aa = __raw_readl((const volatile void *) 0xbb000184);
        unsigned long bb = __raw_readl((const volatile void *) 0xbb000188);
        unsigned long cc = __raw_readl((const volatile void *) 0xb8116c84);
        unsigned long dd = __raw_readl((const volatile void *) 0xb8116c88);

        /* print the new values only when at least one register changed */
        if ( a != aa || b != bb || c != cc || d != dd ) {
                pr_err("ar933x reg dump: 0x%lx, 0x%lx, 0x%lx, 0x%lx\n", aa, bb, cc, dd);
                a = aa;
                b = bb;
                c = cc;
                d = dd;
        }
        msleep(1); /* sleeps at least 1 ms; the real period depends on the timer tick (HZ) */
    }
    return 0;
}

and here are the results.


root@Optimizer:/# dmesg | grep ar933
[    0.500000] ar933x: WMAC reset called
[    0.500000] ar933x reg dump: 0xdeadbeef, 0xdeadbeef
[    0.510000] ar933x reg dump: 0x1c000004, 0x0
[    0.510000] ar933x reg dump: 0x1c000004, 0x0, 0xdeadbeef, 0xdeadbeef
[    0.660000] ar933x-uart: ttyATH0 at MMIO 0x18020000 (irq = 11) is a AR933X UART
[   19.620000] ar933x reg dump: 0x1c000004, 0x0, 0x84100095, 0xf01b6c
[   20.570000] ar933x reg dump: 0x1c000000, 0x0, 0x84100095, 0xf01b6c
[   20.630000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c
[   28.810000] ar933x reg dump: 0x1c001000, 0x0, 0xdeadbeef, 0xdeadbeef
[   28.910000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c
root@Optimizer:/# [   66.270000] ar933x reg dump: 0x10001801, 0x0, 0x84100095, 0xf01b6c
[   66.290000] ar933x reg dump: 0x10001803, 0x0, 0x84100095, 0xf01b6c
[   66.310000] ar933x reg dump: 0x10001801, 0x0, 0x84100095, 0xf01b6c
[   66.470000] ar933x reg dump: 0x10001101, 0x0, 0x84100095, 0xf01b6c
[   66.530000] ar933x reg dump: 0x10001805, 0x0, 0x84100095, 0xf01b6c
[   66.580000] usb 1-1: new full-speed USB device number 2 using ehci-platform
[   66.590000] ar933x reg dump: 0x10001101, 0x0, 0x84100095, 0xf01b6c
[   66.650000] ar933x reg dump: 0x10001805, 0x0, 0x84100095, 0xf01b6c
[   66.740000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device
[  192.820000] usb 1-1: USB disconnect, device number 2
[  192.830000] cdc_acm 1-1:1.0: acm_ctrl_irq - usb_submit_urb failed: -19
[  192.840000] ar933x reg dump: 0x1c001000, 0x0, 0x84100095, 0xf01b6c

So... no changes could be detected in any of the 4 registers during the USB failure. The USB started up fine... started a PPP session... the phone dialed and connected, and during the PPP negotiation the USB interface stopped working.


here is the dump for a successful connection using a USB 2.0 passive HUB.


[  329.080000] ar933x reg dump: 0x10001801, 0x0, 0x84100095, 0xf01b6c
[  329.240000] ar933x reg dump: 0x10001501, 0x0, 0x84100095, 0xf01b6c
[  329.260000] ar933x reg dump: 0x10001901, 0x0, 0x84100095, 0xf01b6c
[  329.300000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[  329.350000] usb 1-1: new high-speed USB device number 3 using ehci-platform
[  329.360000] ar933x reg dump: 0x18001701, 0x0, 0x84100095, 0xf01b6c
[  329.380000] ar933x reg dump: 0x18001b01, 0x0, 0x84100095, 0xf01b6c
[  329.400000] ar933x reg dump: 0x18001701, 0x0, 0x84100095, 0xf01b6c
[  329.420000] ar933x reg dump: 0x18001205, 0x0, 0x84100095, 0xf01b6c
[  329.500000] hub 1-1:1.0: USB hub found
[  329.500000] hub 1-1:1.0: 4 ports detected
[  337.000000] usb 1-1.2: new full-speed USB device number 4 using ehci-platform
[  337.150000] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device

looking forward to comments.

--luis

@lsoltero:
In your kernel thread, don't monitor as before.
Check register 0xb8116c84 bit 20 and make sure it is always 0. If bit 20 is 1, print the register value and clear bit 20; the other bits in this register must not change.
Check register 0xb8116c88 and make sure bits 21, 22 and 23 are always 1 and bits 24 and 25 are always 0.
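The checks described above boil down to a few bit masks. A minimal sketch of just the bit logic, kept in plain C on uint32_t values so it can be tried outside the kernel; in the monitor thread the inputs would come from __raw_readl() and a corrected value would be written back with __raw_writel(). That read-check-write wiring is left out, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Bit constraints as described above (bit 0 = least significant bit). */
#define REG84_MUST_CLEAR (1u << 20)                             /* 0xb8116c84: bit 20 must be 0 */
#define REG88_MUST_SET   ((1u << 21) | (1u << 22) | (1u << 23)) /* 0xb8116c88: bits 21-23 must be 1 */
#define REG88_MUST_CLEAR ((1u << 24) | (1u << 25))              /* 0xb8116c88: bits 24-25 must be 0 */

static int reg84_ok(uint32_t v)
{
    return (v & REG84_MUST_CLEAR) == 0;
}

static int reg88_ok(uint32_t v)
{
    return (v & REG88_MUST_SET) == REG88_MUST_SET && (v & REG88_MUST_CLEAR) == 0;
}

/* Corrected values: only the constrained bits change, all other bits are kept. */
static uint32_t reg84_fix(uint32_t v)
{
    return v & ~REG84_MUST_CLEAR;
}

static uint32_t reg88_fix(uint32_t v)
{
    return (v | REG88_MUST_SET) & ~REG88_MUST_CLEAR;
}
```

Taken literally, the 0xb8116c84 value logged in this thread (0x84100095) has bit 20 set, so reg84_fix() would turn it into 0x84000095, while the logged 0xb8116c88 value (0xf01b6c) already satisfies its constraints.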

hello @mips,

the current monitor prints to the console when any of the 4 monitored registers changes.

        unsigned long aa = __raw_readl((const volatile void *) 0xbb000184);
        unsigned long bb = __raw_readl((const volatile void *) 0xbb000188);
        unsigned long cc = __raw_readl((const volatile void *) 0xb8116c84);
        unsigned long dd = __raw_readl((const volatile void *) 0xb8116c88);
        if ( a != aa || b != bb || c != cc || d != dd ) {
                pr_err("ar933x reg dump: 0x%lx, 0x%lx, 0x%lx, 0x%lx\n", aa, bb, cc, dd);
                a = aa;
                b = bb;
                c = cc;
                d = dd;
        }

it does this every millisecond.

so we know that the 0xb8116c84 and 0xb8116c88 registers are not changing; the only register that changes is 0xbb000184, and it only changes when a USB device is plugged into the unit.

[   66.740000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device

no register changes happened between these two lines during the test.

[  192.820000] usb 1-1: USB disconnect, device number 2

i will try to run the test with a finer resolution than 1 msec in the loop and report back...

--luis

ok. as it turns out msleep(1) is the smallest interruptible sleep that can be done in the kernel. udelay() does a busy wait and basically bricks the kernel when the thread is spawned...

BUT... doing a thread to monitor the register values is probably not the way to go anyway.  If the bits are changing rapidly then the thread can miss them... especially since < 20 ms sleeps are unreliable in linux and usually take longer than they are supposed to.
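To put numbers on that: msleep() rounds the request up to whole timer ticks (jiffies) and adds one more tick so it never returns early, so on a kernel built with HZ=100 even msleep(1) asks for two ticks. A minimal model of that arithmetic, assuming the timeout = msecs_to_jiffies(ms) + 1 logic of mainline kernels; HZ=100 is itself an assumption (the 10 ms granularity of the timestamps in the logs above is consistent with it but does not prove it), and the function names are hypothetical.

```c
/* Number of timer ticks msleep(ms) requests, modelled on
 * timeout = msecs_to_jiffies(ms) + 1 (msecs_to_jiffies rounds up). */
static unsigned long msleep_ticks(unsigned int ms, unsigned int hz)
{
    unsigned long ticks = ((unsigned long)ms * hz + 999) / 1000; /* round up to whole ticks */
    return ticks + 1;                                            /* +1 so the sleep is never shorter than asked */
}

/* Longest timed part of the sleep in ms. The first tick may arrive almost
 * immediately, so the real sleep lies between (ticks - 1) and ticks tick
 * periods, plus scheduling latency. */
static unsigned long msleep_upper_ms(unsigned int ms, unsigned int hz)
{
    return msleep_ticks(ms, hz) * 1000 / hz;
}
```

With HZ=100, msleep(1) therefore gives a polling period of roughly 10 to 20 ms rather than 1 ms, which is in line with the concern that short register glitches can slip between two samples.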

probably the thing to do is to add the register test in the EHCI USB driver, before and then again after a read/write. I am not familiar enough with the USB code to know where the best place for this is...

anyone have any ideas where best to place the register test/dump code?

--luis

Hi all,

This may be the final word on this whole story regarding the AR9331's USB stability issue...

Following a conversation that took place between Elektra from the Village Telco and nbd from this forum, it appeared that the "AR9331 chip only implements EHCI, so the USB port is only 2.0 compliant".

The "USB 2.0" term used by Atheros (and other manufacturers too) is rather confusing, since some "USB 2.0"-labeled devices actually use only the Full-Speed or even the Low-Speed protocol. The "USB 2.0" sticker only says that they passed the USB-IF compliance tests for that version, not at which speed...

In this light, I tried several configurations, and it looks like the AR9331 effectively implements only EHCI and not OHCI, so only High-Speed (480 Mbps) devices are supported by the AR9331 chip.

So, to summarize, I strongly suspect that:

  • REAL USB 2.0 High-Speed 480 Mbps devices such as 3G modems or USB drives are supported by the AR9331

  • SO-CALLED USB 2.0 or ACTUALLY USB 1.1 Full-Speed 12 Mbps devices, such as almost all cdc-acm devices (including GSM/GPRS modems, FTDI chip-based devices, POTS modems, GPS devices, Arduino boards, etc.), appear to work but in fact exhibit the strange bugs we found and do not work reliably

  • SO-CALLED USB 2.0 or ACTUALLY USB 1.1 Low-Speed 1.5 Mbps devices, such as mice, keyboards, joysticks and software-based USB AVR implementations, do not work at all
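To check which group a given device falls into, Linux exposes the negotiated speed of each USB device in sysfs (the speed attribute under /sys/bus/usb/devices/, e.g. 1-1/speed for the device shown as "usb 1-1" in the logs), as a string in Mbit/s such as "1.5", "12" or "480". A minimal sketch that applies the classification above to that string; the enum and function names are hypothetical, and the needs-a-hub rule encodes the hypothesis of this thread, not a confirmed Atheros specification.

```c
#include <stdio.h>
#include <string.h>

/* Speed groups from the summary above (names are illustrative). */
enum usb_speed_group { SPEED_LOW, SPEED_FULL, SPEED_HIGH, SPEED_UNKNOWN };

/* Maps the text of a sysfs "speed" attribute ("1.5", "12", "480", ...) to a group. */
static enum usb_speed_group classify_speed(const char *s)
{
    size_t n = strcspn(s, "\n");   /* sysfs values end with a newline */

    if (n == 3 && strncmp(s, "1.5", 3) == 0)
        return SPEED_LOW;
    if (n == 2 && strncmp(s, "12", 2) == 0)
        return SPEED_FULL;
    if (n == 3 && strncmp(s, "480", 3) == 0)
        return SPEED_HIGH;
    return SPEED_UNKNOWN;
}

/* Encodes this thread's hypothesis: low- and full-speed devices should sit
 * behind a high-speed hub on the AR9331; this is not a vendor statement. */
static int needs_hs_hub(enum usb_speed_group g)
{
    return g == SPEED_LOW || g == SPEED_FULL;
}

/* Reads e.g. "/sys/bus/usb/devices/1-1/speed"; SPEED_UNKNOWN if unreadable. */
static enum usb_speed_group read_speed(const char *path)
{
    char buf[16];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return SPEED_UNKNOWN;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return SPEED_UNKNOWN;
    }
    fclose(f);
    return classify_speed(buf);
}
```

For the directly connected cdc_acm device in the earlier logs, which the kernel announced as "new full-speed USB device", read_speed() on its speed attribute would be expected to return SPEED_FULL.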

In order for such USB Full-Speed 12 Mbps or Low-Speed 1.5 Mbps devices to work with the AR9331, you need to add a High-Speed hub in between (active or passive, it doesn't matter, provided you respect the 100 mA without negotiation / 500 mA maximum current from the USB spec).

It really looks like a low-level hardware limitation and not a software OHCI driver problem: with such a hub in between, these devices seem to work (at least for days) without problems.

(Last edited by Squonk on 13 Nov 2012, 19:34)

lsoltero wrote:

which begs the question...

how can TP-LINK sell TL-MR3040 routers, which support a plethora of USB 1.1 GSM 3G and 4G modems, on an AR9331-based processor?

http://www.tp-link.com/us/products/deta … =TL-MR3040

http://www.tp-link.com/us/support/3g-co … =TL-MR3040
http://www.tp-link.com/us/support/3g/

I personally own a USB760 modem (one supported by TP-LINK) and know for a fact that this is a USB 1.1 device.

As the USB760 integrates a removable microSD slot, are you sure it doesn't also contain an integrated High-Speed hub, such as an AU6350 chip?

How does it enumerate under OpenWrt? Directly, or do you also see a high-speed hub in-between?

Most modern 3G modems are High-Speed devices, and use /dev/HStty ports.

(Last edited by Squonk on 13 Nov 2012, 19:56)

no hub...

[50838.780000] usb 1-1: USB disconnect, device number 8
[50840.060000] usb 1-1: new full-speed USB device number 9 using ehci-platform

note that on OpenWrt I have to use usb_modeswitch to switch it from USB storage mode to USB modem mode.

but... there is no HUB. 

And BTW... I have not yet been able to get the USB760 working on OpenWrt. I am still trying to figure out how to get the virtual serial ports recognized by the system, although the important thing here is that this is a device supported by TP-LINK and it's a USB 1.1 device.

--luis

BTW... just in case anyone cares... here is the usb_modeswitch.conf file required to switch the mode on the U760

########################################################
# Novatel U760

DefaultVendor= 0x1410
DefaultProduct=0x5030

TargetVendor=  0x1410
TargetProduct= 0x6000

CheckSuccess=20

MessageContent="5553424312345678000000000000061b000000020000000000000000000000"
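The MessageContent string looks like a USB Mass Storage Bulk-Only Transport Command Block Wrapper (CBW): it starts with the "USBC" signature bytes 55 53 42 43. Assuming that layout (31 bytes, little-endian fields, SCSI command block at offset 15), a small decoder shows what the switch message actually asks the device to do; the struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: MessageContent is a 31-byte Bulk-Only Transport CBW. */
struct cbw_info {
    uint32_t signature; /* 0x43425355, i.e. "USBC" read little-endian */
    uint32_t tag;
    uint32_t data_len;  /* bytes the host expects to transfer */
    uint8_t  flags;     /* bit 7: direction (1 = device to host) */
    uint8_t  lun;
    uint8_t  cb_len;    /* valid bytes in cb[] */
    uint8_t  cb[16];    /* SCSI command block */
};

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 on bad hex or a length outside 20..31 bytes. */
static int decode_cbw(const char *hex, struct cbw_info *out)
{
    uint8_t b[31] = {0};           /* missing trailing bytes stay zero */
    size_t len = strlen(hex);

    if (len % 2 != 0 || len < 2 * 20 || len > 2 * 31)
        return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_nibble(hex[2 * i]);
        int lo = hex_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        b[i] = (uint8_t)((hi << 4) | lo);
    }
    out->signature = le32(b);
    out->tag = le32(b + 4);
    out->data_len = le32(b + 8);
    out->flags = b[12];
    out->lun = b[13];
    out->cb_len = b[14];
    memcpy(out->cb, b + 15, sizeof out->cb);
    return 0;
}
```

Decoded this way, the command block is 6 bytes long with opcode 0x1b (SCSI START STOP UNIT) and byte 4 = 0x02, i.e. LoEj set and Start clear: an eject request, which fits the goal of getting the device out of its storage mode.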

I don't know the exact details of this particular modem, but the fact that it has a microSD slot is a clue that it also contains some kind of hub, which TP-Link may be relying on when they have it working. I can't imagine they would provide only full-speed SD card support, which would be pretty useless (too slow), and if they have high speed, then why not use it for the modem too?

However, it will be difficult to check the list of supported 3G modems one by one and make sure they are full or high speed devices, and if they work reliably or not.

We should try, if possible, to get confirmation from Atheros or some reliable third party that the AR9331 USB only supports EHCI and not OHCI; this would settle the issue definitively.

But it seriously looks like it is the case, and as for me, I am now buying small/cheap passive hubs.

(Last edited by Squonk on 13 Nov 2012, 20:09)

we are doing the same thing...

we are designing passive hubs to be included in our boards as well... sigh....

--luis

Does anyone know anything about why a passive hub fixes this issue, or maybe hooked up a scope to see what is going on? I'm not familiar enough with the USB spec to make any logical guesses.

The issue I'm seeing is actually a little different, as some AR9331 devices of the same model have issues recognizing certain USB 2.0 high-speed devices. The capacitor-across-C113 trick mentioned in other threads did not help, but a passive hub did. So I'm wondering what exactly the hub does (timing, voltage levels?).

Thanks.

(Last edited by SharkAttack99 on 28 Nov 2012, 07:51)

SharkAttack99 wrote:

Does anyone know anything about why a passive hub fixes this issue, or maybe hooked up a scope to see what is going on? I'm not familiar enough with the USB spec to make any logical guesses.

With High-Speed USB, we speak of signals at 480Mbps... This would require at least a 10x oversampling to get the correct signal shape (Nyquist says 2x, but this is for sine waves...), so this requires in practice an expensive 5Gsps scope!!!
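Spelled out, the estimate above works as follows (the 10x oversampling factor is the post's rule of thumb, not a formal requirement):

```latex
T_{\text{bit}} = \frac{1}{480 \times 10^{6}\ \text{bit/s}} \approx 2.08\ \text{ns},
\qquad
f_s \approx 10 \times 480 \times 10^{6}\ \text{s}^{-1} = 4.8\ \text{GS/s} \approx 5\ \text{GS/s}
```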

And then, the problem arises after anywhere from a few minutes to an hour, so you would have to find a way to trigger when something unknown happens; you can't just press "record" and search through such a long period.

This is just to give you a hint on why it is so difficult, probably even for Atheros... It is like searching for a needle in a haystack (a needle that may not even be there).

SharkAttack99 wrote:

The issue I'm seeing is actually a little different, as some AR9331 devices of the same model have issues recognizing certain USB 2.0 high-speed devices. The capacitor-across-C113 trick mentioned in other threads did not help, but a passive hub did. So I'm wondering what exactly the hub does (timing, voltage levels?).

It is probably a timing issue, as I was able to connect two identical "loads" (Arduino boards) through a passive hub and have them work perfectly for days, as opposed to a single "load" crashing after a few minutes when connected directly to the USB port. So it is probably not a voltage or current overload problem here. This is confirmed by C113 not having any influence.

The problem may happen more often with some chips than with others, and there may also be a connection between wireless and the USB clock PLL losing sync, but this is purely hypothetical at the moment. You may confirm that by disconnecting wireless while performing your tests, if this is possible.

(Last edited by Squonk on 28 Nov 2012, 08:50)

Squonk wrote:

The problem may happen more often with some chips than with others, and there may also be a connection between wireless and the USB clock PLL losing sync, but this is purely hypothetical at the moment. You may confirm that by disconnecting wireless while performing your tests, if this is possible.

Thanks for the response. And yes, it would require a very pricey scope. It does not appear WiFi related to me.

Though because it is an issue of the device not being recognized (or more precisely, the driver recognizes a USB device, but then times out trying to communicate with it), it is theoretically much easier to diagnose than the intermittent problem you see with the USB 1.1 devices.

Your best bet would be to put a hardware USB analyzer in-between to see what is going on, but that's also a very expensive tool, not within reach for hobbyists.

Unless you can catch it using a cheap USB "spy" device or capture software, but I doubt it would give anything useful at High-Speed.

Yeah, I was thinking along the lines of a hardware USB analyzer, but it looks like the starting price for a high-speed one is about a grand.

Maybe rent one?

Well if I ever figure anything out I'll be sure to update this thread so others can perhaps benefit.

hello all,

here is a very interesting find... It's the complete datasheet for the AR9331

https://forum.openwrt.org/viewtopic.php … 17#p185817

I have looked at it and can't make much sense of it. Maybe someone with a better understanding of the OpenWrt USB implementation can use this document to tell us once and for all whether this chipset supports USB 1.1 devices.

--luis

Nice find, lsoltero. A quick glance through suggests the chip can do high-speed and full-speed (i.e. USB 1 and 2). So maybe it is the firmware incorrectly setting the select bits, or it is the circuit design.
This is beyond my knowledge though. Squonk??

Be careful, some parts of the datasheet (p 237) which seem to be generic tell about low/full speed hand-off, but in some other parts, it is explicitly told that there is no hand-off (pp 251-252).

It isn't the circuit design: this happens on several models from TP-Link and Alfa (at least...) which only have in common that they are AR9331-based.

From my experience with the TP-Link TL-WR703N, the routing is clean with controlled length and impedance traces, no stub, and only ESD protection devices in the middle.

It is not related to the chip revision either, but results may vary largely from one chip to the other.

Also, if you turn off the wireless, you have no longer problems... Same if you insert a passive hub.

It looks like wireless radio interferes with the USB clock PLL in some way, only in full-speed mode...

Any idea?