1 (edited by lsoltero 2013-12-16 23:16:08)

Topic: ar9331's usb stability issue - [SOLVED]

Hello All,

We are experiencing USB issues with CDC/ACM devices and USB-to-serial adapters connected to an Alfa Hornet-UB board. This board is based on the Atheros AR9331 chipset.

We are working with a very recent build of OpenWrt Attitude Adjustment (AA) from trunk (r33561).

The problem is very similar in nature to one reported for TP-LINK routers that are also based on the AR9331 chipset. Interestingly, if a passive (unpowered) USB 2.0 hub is used, CDC/ACM devices and USB-to-serial adapters work fine.

Here is a full description of the problem and symptoms.


When working with USB, we notice that when we plug in CDC/ACM devices (e.g. satellite terminals or GSM dongles), the USB port works for a while and then becomes unresponsive. Resetting the USB port by unplugging the cable and plugging it back in brings the device back online. However, none of the devices plugged into the port work long enough to send data. You can connect to them with a terminal emulator (cu in our case) and type AT commands, but as soon as you issue an ATDT, get a CONNECT, and PPP data starts moving through the interface, the serial port 'hangs'.

The interesting thing is that we only just noticed this: we have been working with the USB port extensively, but always through a USB hub. The passive (i.e. unpowered) hub has the CDC/ACM device plugged into it, as well as a C-Media USB sound card. What is strange is that the virtual serial port works perfectly in that setup: data connections over the satellite phones work without issue in this configuration.

Googling for this issue turns up a few results; it seems that this is a fairly common problem with Atheros processors.

The TL-MR3040 is known to have this issue:
http://wiki.openwrt.org/toh/tp-link/tl-mr3040

If you look at that OpenWrt wiki page and scroll down to the hardware description, you will find the following:

USB Issue

A problem has been detected with the USB interface on the device. When connected to different models of serial adaptor, it functions for a few minutes then fails. This can be resolved by putting a passive hub in-line with the device. Voltage and current testing will be performed to identify why this behaviour occurs.

This device is based on the AR9331, as is the Hornet.

Also
http://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg13693.html

If you look at the 5 entries in this thread, you will note that the AR9331 is not the only device having this issue.

This posting:
http://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg13690.html

suggests that the problem is related to hardware voltage levels and possibly other issues with the board implementation. A link is included to a hardware-mod discussion that presents the issue on Atheros-based boards.


Here is more detailed information on the USB problem we are having with CDC/ACM modems on the native USB port of the AP121U.

Here is a connection using a passive (unpowered) USB 2.0 hub and an Iridium phone. You will note that the connection works perfectly. After that, I will show logs for the same device connected directly to the Hornet-UB USB port, which fails.

------ USB HUB -- WORKING CONNECTION ----



These are the kernel messages when the hub is detected:

[  483.150000] usb 1-1: USB disconnect, device number 3
Oct 16 17:39:12 Optimizer kern.info kernel: [  483.150000] usb 1-1: USB disconnect, device number 3
[  493.450000] usb 1-1: new high-speed USB device number 4 using ehci-platform
[  493.600000] hub 1-1:1.0: USB hub found
[  493.600000] hub 1-1:1.0: 4 ports detected
Oct 16 17:39:22 Optimizer kern.info kernel: [  493.450000] usb 1-1: new high-speed USB device number 4 using ehci-platform
Oct 16 17:39:22 Optimizer kern.info kernel: [  493.600000] hub 1-1:1.0: USB hub found
Oct 16 17:39:22 Optimizer kern.info kernel: [  493.600000] hub 1-1:1.0: 4 ports detected


These are the kernel messages when the USB ACM device is detected:

[  528.330000] usb 1-1.2: new full-speed USB device number 5 using ehci-platform
[  528.450000] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
Oct 16 17:39:57 Optimizer kern.info kernel: [  528.330000] usb 1-1.2: new full-speed USB device number 5 using ehci-platform
Oct 16 17:39:57 Optimizer kern.info kernel: [  528.450000] cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device

Here is a successful PPP connection:

Oct 16 17:40:21 Optimizer daemon.notice pppd[2018]: pppd 2.4.5 started by root, uid 0
Oct 16 17:40:22 Optimizer local2.info chat[2028]: abort on (BUSY)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: abort on (VOICE)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: abort on (NO CARRIER)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: abort on (NO DIALTONE)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: abort on (NO DIAL TONE)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: timeout set to 10 seconds
Oct 16 17:40:22 Optimizer local2.info chat[2028]: send (AT^M)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: expect (OK)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:22 Optimizer local2.info chat[2028]: OK
Oct 16 17:40:22 Optimizer local2.info chat[2028]:  -- got it
Oct 16 17:40:22 Optimizer local2.info chat[2028]: send (AT&F^M)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: expect (OK)
Oct 16 17:40:22 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:22 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:22 Optimizer local2.info chat[2028]: OK
Oct 16 17:40:22 Optimizer local2.info chat[2028]:  -- got it
Oct 16 17:40:22 Optimizer local2.info chat[2028]: send (AT &F E0 V1 &D2 &C1 W2 S95=47 S0=0 +cbst=71,0,1^M)
Oct 16 17:40:23 Optimizer local2.info chat[2028]: timeout set to 60 seconds
Oct 16 17:40:23 Optimizer local2.info chat[2028]: expect (OK)
Oct 16 17:40:23 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:23 Optimizer local2.info chat[2028]: AT &F E0 V1 &D2 &C1 W2 S95=47 S0=0 +cbst=71,0,1^M^M
Oct 16 17:40:23 Optimizer local2.info chat[2028]: OK
Oct 16 17:40:23 Optimizer local2.info chat[2028]:  -- got it
Oct 16 17:40:23 Optimizer local2.info chat[2028]: send (ATDT008816000025^M)
Oct 16 17:40:23 Optimizer local2.info chat[2028]: expect (CONNECT)
Oct 16 17:40:23 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: CARRIER 9600^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: PROTOCOL: IRLP^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: COMPRESSION: NONE^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: ^M
Oct 16 17:40:33 Optimizer local2.info chat[2028]: CONNECT
Oct 16 17:40:33 Optimizer local2.info chat[2028]:  -- got it
Oct 16 17:40:33 Optimizer daemon.info pppd[2018]: Serial connection established.
Oct 16 17:40:33 Optimizer daemon.info pppd[2018]: Using interface ppp0
Oct 16 17:40:33 Optimizer daemon.notice pppd[2018]: Connect: ppp0 <--> /dev/ttyACM0
Oct 16 17:40:40 Optimizer daemon.info pppd[2018]: CHAP authentication succeeded
Oct 16 17:40:40 Optimizer daemon.notice pppd[2018]: CHAP authentication succeeded
Oct 16 17:40:46 Optimizer daemon.notice pppd[2018]: replacing old default route to eth1 [192.168.0.1]
Oct 16 17:40:46 Optimizer daemon.notice pppd[2018]: local  IP address 192.168.21.75
Oct 16 17:40:46 Optimizer daemon.notice pppd[2018]: remote IP address 192.168.21.254
Oct 16 17:40:46 Optimizer daemon.notice pppd[2018]: primary   DNS address 12.127.17.72
Oct 16 17:40:46 Optimizer daemon.notice pppd[2018]: secondary DNS address 204.97.212.10
Oct 16 17:40:48 Optimizer daemon.info pppd[2061]: CCP terminated by peer
Oct 16 17:40:48 Optimizer daemon.notice pppd[2061]: Compression disabled by peer.
Oct 16 17:40:56 Optimizer daemon.info pppd[2061]: Terminating on signal 15
Oct 16 17:40:56 Optimizer daemon.info pppd[2061]: Connect time 0.2 minutes.
Oct 16 17:40:56 Optimizer daemon.info pppd[2061]: Sent 12 bytes, received 12 bytes.
Oct 16 17:40:56 Optimizer daemon.notice pppd[2061]: restoring old default route to eth1 [192.168.0.1]
Oct 16 17:40:57 Optimizer daemon.notice pppd[2061]: Connection terminated.
Oct 16 17:40:58 Optimizer local2.info chat[2096]: abort on (BUSY)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: abort on (VOICE)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: abort on (NO CARRIER)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: abort on (NO DIALTONE)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: abort on (NO DIAL TONE)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: timeout set to 10 seconds
Oct 16 17:40:58 Optimizer local2.info chat[2096]: send (+++^M)
Oct 16 17:40:58 Optimizer local2.info chat[2096]: expect (OK)
Oct 16 17:41:00 Optimizer local2.info chat[2096]: TUUUUUUUUUUUU^M
Oct 16 17:41:00 Optimizer local2.info chat[2096]: NO CARRIER
Oct 16 17:41:00 Optimizer local2.info chat[2096]:  -- failed
Oct 16 17:41:00 Optimizer local2.info chat[2096]: Failed (NO CARRIER)
Oct 16 17:41:00 Optimizer daemon.warn pppd[2061]: disconnect script failed
Oct 16 17:41:01 Optimizer daemon.info pppd[2061]: Exit.


-----------



Logs for the CDC/ACM device connected directly to the AP121U, which fails:

----- DIRECT CONNECTION --- FAILS ---



The USB device is plugged into the system:
[  839.120000] usb 1-1: new full-speed USB device number 6 using ehci-platform
[  839.290000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device
Oct 16 17:45:08 Optimizer kern.info kernel: [  839.120000] usb 1-1: new full-speed USB device number 6 using ehci-platform
Oct 16 17:45:08 Optimizer kern.info kernel: [  839.290000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device


The following commands are run from the command line (deauthorizing and then reauthorizing the device through sysfs) to make sure the device is working properly:
root@Optimizer:/sys/devices/platform/ehci-platform/usb1/1-1# echo 0 > authorized
root@Optimizer:/sys/devices/platform/ehci-platform/usb1/1-1# echo 1 > authorized

A pppd session is then started:

[  881.310000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device
[  881.320000] usb 1-1: authorized to connect
Oct 16 17:45:50 Optimizer kern.info kernel: [  881.310000] cdc_acm 1-1:1.0: ttyACM0: USB ACM device
Oct 16 17:45:50 Optimizer kern.info kernel: [  881.320000] usb 1-1: authorized to connect
Oct 16 17:46:04 Optimizer daemon.notice pppd[2229]: pppd 2.4.5 started by root, uid 0
Oct 16 17:46:05 Optimizer local2.info chat[2240]: abort on (BUSY)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: abort on (VOICE)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: abort on (NO CARRIER)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: abort on (NO DIALTONE)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: abort on (NO DIAL TONE)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: timeout set to 10 seconds
Oct 16 17:46:05 Optimizer local2.info chat[2240]: send (AT^M)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: expect (OK)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: ^M
Oct 16 17:46:05 Optimizer local2.info chat[2240]: OK
Oct 16 17:46:05 Optimizer local2.info chat[2240]:  -- got it
Oct 16 17:46:05 Optimizer local2.info chat[2240]: send (AT&F^M)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: expect (OK)
Oct 16 17:46:05 Optimizer local2.info chat[2240]: ^M
Oct 16 17:46:05 Optimizer local2.info chat[2240]: ^M
Oct 16 17:46:05 Optimizer local2.info chat[2240]: OK
Oct 16 17:46:05 Optimizer local2.info chat[2240]:  -- got it
Oct 16 17:46:05 Optimizer local2.info chat[2240]: send (AT &F E0 V1 &D2 &C1 W2 S95=47 S0=0 +cbst=71,0,1^M)
Oct 16 17:46:06 Optimizer local2.info chat[2240]: timeout set to 60 seconds
Oct 16 17:46:06 Optimizer local2.info chat[2240]: expect (OK)
Oct 16 17:46:06 Optimizer local2.info chat[2240]: ^M
Oct 16 17:46:06 Optimizer local2.info chat[2240]: AT &F E0 V1 &D2 &C1 W2 S95=47 S0=0 +cbst=71,0,1^M^M
Oct 16 17:46:06 Optimizer local2.info chat[2240]: OK
Oct 16 17:46:06 Optimizer local2.info chat[2240]:  -- got it
Oct 16 17:46:06 Optimizer local2.info chat[2240]: send (ATDT008816000025^M)
Oct 16 17:46:06 Optimizer local2.info chat[2240]: expect (CONNECT)
Oct 16 17:46:06 Optimizer local2.info chat[2240]: ^M
Oct 16 17:47:06 Optimizer local2.info chat[2240]: alarm
Oct 16 17:47:06 Optimizer local2.info chat[2240]: Failed
Oct 16 17:47:06 Optimizer daemon.err pppd[2229]: Connect script failed
Oct 16 17:47:22 Optimizer daemon.info pppd[2229]: Exit.

The connect script failed and never got the CONNECT, yet the call was placed and airtime was accumulating on the phone. Data had simply stopped flowing through the interface.


Running the same commands again:
root@Optimizer:/sys/devices/platform/ehci-platform/usb1/1-1# echo 0 > authorized
root@Optimizer:/sys/devices/platform/ehci-platform/usb1/1-1# echo 1 > authorized

now results in this error:

[  996.220000] usb 1-1: can't re-read device descriptor for authorization: -145
Oct 16 17:47:45 Optimizer kern.err kernel: [  996.220000] usb 1-1: can't re-read device descriptor for authorization: -145

So the USB device is no longer responsive.
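The -145 in the message above is a negated errno value. The AR9331 is a MIPS-based SoC, and on Linux/MIPS several errno numbers differ from the x86 values most references list, so -145 is easy to misread: on MIPS it is -ETIMEDOUT, meaning the device simply stopped answering the descriptor request. The same table explains the -150 status that the usbmon capture later in this thread shows for pending URBs (-EINPROGRESS on MIPS). The helper below is a minimal sketch; its table is a partial, hand-copied assumption that should be checked against arch/mips/include/uapi/asm/errno.h, not an exhaustive mapping.

```python
# Partial Linux errno table for MIPS targets such as the AR9331.
# Assumption: values hand-copied from arch/mips/include/uapi/asm/errno.h.
# Codes 1-34 come from asm-generic/errno-base.h and match x86, but many
# higher codes differ (ETIMEDOUT is 145 on MIPS versus 110 on x86).
MIPS_ERRNO = {
    2: "ENOENT",
    32: "EPIPE",
    143: "ESHUTDOWN",
    145: "ETIMEDOUT",
    150: "EINPROGRESS",
}


def describe_status(status: int) -> str:
    """Translate a (usually negative) kernel status code into an errno name."""
    if status == 0:
        return "success"
    name = MIPS_ERRNO.get(abs(status))
    if name is None:
        return f"unknown ({status})"
    # Kernel USB code reports errors as negated errno values.
    return f"-{name}" if status < 0 else name


if __name__ == "__main__":
    # Status from "can't re-read device descriptor for authorization: -145"
    print(describe_status(-145))
```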


--------


Note that exactly the same thing happens with a PL2303HX USB-to-serial adapter. So it seems that whatever USB protocol all the serial adapters use is at fault: we observe the same symptoms with every CDC/ACM device and USB-to-serial adapter we have tested.

However, when connected through a hub, everything works perfectly.

Any insight into the problem would be greatly appreciated.

--luis

2 (edited by Squonk 2012-10-17 07:01:05)

Re: ar9331's usb stability issue - [SOLVED]

I don't really buy the board-implementation explanation, at least not for the TL-WR703N router, which experiences this kind of problem: see my reverse-engineered schematic and layout.

Concerning the USB physical implementation, it only consists of a 0 ohm series resistor (R101/R102), a footprint for an unmounted decoupling capacitor (R103/R104) and a dual-rail clamping diode (D1/D2) for each data line. The USB power comes straight from an unidentified USB power switch (U6) with current-limiting capabilities, and looks properly decoupled, with 2 capacitors (C113/C115) placed close to the USB connector itself. Coupling between shield and ground is also pretty standard (C114). Traces seem to be properly routed as differential pairs with matched lengths and impedances.

Another problem that may be related to this one is the inability to use USB low-speed devices, such as a low-end keyboard or mouse: the device does not even enumerate, and repeated messages show up in the logs. If you connect it through a hub, you don't get the problem. However, I am not sure that the AR9331 USB hardware IP supports low-speed devices at all.

I guess that the only way to continue investigating is to stick a USB analyzer in between and see what is going on.

2x TP-Link TL-MR3020, 2x TP-Link TL-WR703N, 1x TP-Link TL-MR11U, 1x Hame MPR-L8

Re: ar9331's usb stability issue - [SOLVED]

It would be nice to get a definitive answer on this.

Another interesting fact is that the only issue we have seen is with full-speed USB serial devices. A full-speed USB sound card works fine: we can record and play audio at 48 kHz all day long with a number of different CM119-based devices without issue.

I am not a USB expert, but I understand that there are several different types of endpoints. It could be that only some USB 1.1 (USB 2.0 full-speed) endpoint types don't work with the AR9331 chipset. This would explain why USB sound works but USB serial ports don't.

It would be nice to know if anyone out there is using AR9331-based systems with working full-speed USB devices. I know that many of the TP-LINK TL routers are advertised as 3G routers, and most 3G modems use CDC/ACM interfaces. It would be good to know if users with AR9331-based TL routers are able to use 3G modems while running OpenWrt AA.

Although I agree that it is unlikely that the issue is with the board implementation, there could still be a problem with the chipset implementation. I am hoping that the problem is in the EHCI USB driver, but I am not sure at this point.

--luis

4 (edited by Squonk 2012-10-29 08:47:37)

Re: ar9331's usb stability issue - [SOLVED]

There is definitely a bug in the hardware USB block.

I ran a simple test with an Arduino Duemilanove or a Vinciduino running Bitlash, with the following simple Bitlash script on the Arduino:

function toggle13 {d13 = !d13;print i;i = i + 1;}
function startup {pinmode(13,1); run toggle13,1000;}

...that is, blink the LED every second while incrementing and printing a monotonic counter at the same rate on the UART via USB.

On the TP-LINK TL-WR703N side, I only load the driver modules matching the USB chip on the Arduino (ftdi_sio + usbserial as /dev/ttyUSB0 for the Duemilanove, cdc_acm as /dev/ttyACM0 for the Vinciduino) and use BusyBox's simple "microcom" terminal emulator ("microcom -s 57600 /dev/ttyUSB0" for the Duemilanove, "microcom -s 57600 /dev/ttyACM0" for the Vinciduino).

The communication hangs after a variable amount of time (ranging from a few minutes up to an hour), as shown by the Arduino's TX LED staying permanently on. The Arduino is still running, however, as the d13 LED continues to flash steadily every second.
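To timestamp the hang rather than watch the TX LED, the host side can check the counter stream itself. The sketch below is hypothetical and not part of the original test: it takes (arrival time, line) pairs, for instance read from /dev/ttyUSB0 or /dev/ttyACM0 with whatever serial library you prefer, and reports skipped counter values, garbled lines and silences. The 1 s period matches the `run toggle13,1000` schedule; the 5x stall threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Anomaly:
    index: int   # position in the sample list where the problem was seen
    kind: str    # "gap", "garbled" or "stall"
    detail: str


def check_counter_stream(
    samples: Sequence[Tuple[float, str]],
    period_s: float = 1.0,
    stall_factor: float = 5.0,
    now: Optional[float] = None,
) -> List[Anomaly]:
    """Check (arrival_time_s, line) pairs produced by the Bitlash counter test.

    Flags skipped or out-of-order counter values, lines that are not a plain
    integer, and silences longer than stall_factor * period_s. Pass `now` to
    also flag a stream that has gone silent after its last line.
    """
    limit = stall_factor * period_s
    anomalies: List[Anomaly] = []
    expected: Optional[int] = None
    last_time: Optional[float] = None
    for idx, (t, line) in enumerate(samples):
        # Silence between two received lines longer than the threshold.
        if last_time is not None and t - last_time > limit:
            anomalies.append(Anomaly(idx, "stall", f"{t - last_time:.1f}s without data"))
        last_time = t
        text = line.strip()  # drop the CR/LF sent after each value
        if not text.isdigit():
            anomalies.append(Anomaly(idx, "garbled", repr(line)))
            continue
        value = int(text)
        if expected is not None and value != expected:
            anomalies.append(Anomaly(idx, "gap", f"expected {expected}, got {value}"))
        expected = value + 1
    # A hung port simply stops delivering lines, so check the tail too.
    if now is not None and last_time is not None and now - last_time > limit:
        anomalies.append(Anomaly(len(samples), "stall", f"{now - last_time:.1f}s without data"))
    return anomalies
```

Because a hung port just stops delivering data, it is the `now` argument that catches the failure mode described above; without it, only interruptions that later recover are reported.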

Whether you exit microcom, log out of your session, or remove and reinsert the driver modules, communication with the Arduino board does not recover. The only way to restore things to a working condition is to power-cycle the Arduino. If it is powered from USB, this is achieved by issuing the commands:

echo 0 > /sys/class/gpio/gpio8/value
echo 1 > /sys/class/gpio/gpio8/value
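The same power cycle can be scripted, for example from a watchdog that notices the hang. This is a minimal sketch under several assumptions: a Python interpreter is available on the router (on 4 MB flash devices it often is not, and the two echo commands remain the practical route), GPIO 8 has already been exported under the sysfs class directory /sys/class/gpio, and a 2-second off time is enough, although the original commands use no delay at all.

```python
import time
from pathlib import Path
from typing import Callable

# Assumption: GPIO 8 switches USB VBUS on the TL-WR703N, as in the two
# echo commands, and is already exported under /sys/class/gpio.
USB_POWER_VALUE = Path("/sys/class/gpio/gpio8/value")


def power_cycle_usb(
    value_path: Path = USB_POWER_VALUE,
    off_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drop USB power, wait, then restore it (same effect as the echo commands)."""
    value_path.write_text("0")  # VBUS off: the USB-powered board loses power
    sleep(off_seconds)          # hypothetical delay to let it fully power down
    value_path.write_text("1")  # VBUS back on: the device can enumerate again
```

The `sleep` parameter exists only so the function can be exercised without real hardware or real delays; on the router you would simply call `power_cycle_usb()`.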

I found out that if you insert a passive USB hub like the one described in this topic, you can run the exact same test for more than 10 hours without a glitch... I can even run BOTH Arduino boards at the same time without problem!

From this, we can draw the following conclusions:

  • the Arduino is working like a Swiss cuckoo clock

  • this is not a power supply problem, as I was able to run the USB passive hub + the 2 Arduino boards at the same time

  • this is not an application software problem on the TP-LINK TL-WR703N side, as the software is exactly the same in both cases

  • if this is a software problem, it is a very low-level one related to timing/protocol differences between the direct and through-hub configurations, but not in the USB class driver itself, as the same bug occurs with both the ftdi_sio + usbserial and the cdc_acm drivers; so it would be in the very low-level interrupt handling, register access, DMA, etc.

  • more probably, this is a bug in the hardware USB block

This last conclusion only confirms that the USB block within the AR9331 is not very good, as I had already found that the AR9331 cannot handle USB low-speed devices like a simple keyboard or mouse directly, although they work OK through a passive hub.

The solution: ALWAYS insert a passive hub, like the "Octopus", which includes a GL850G USB hub chip in an SSOP28 package. Beware that some items look exactly the same externally but contain a bare chip die bonded directly to the PCB and encapsulated under a blob instead of the SSOP28 package. That does not matter if you just use the hub as is, but it is a pain if you want to desolder the chip for a hack. The hub costs about the same as the chip alone, but you also get an extra USB cable with hanging plugs, a reusable plastic case, and a PCB with a crystal.

The fact that USB sound works and USB serial ports don't is not related to different endpoints, but it may be related to the USB transfer type: serial communications use "bulk" transfers, which must be error-free and therefore rely on error detection and retry, while sound (or video) data use "isochronous" transfers, which do not guarantee error-free delivery, since this kind of data can usually suffer some glitches without being noticed.

It is just a guess, but given the random nature of the bug, it looks like there is a problem in the retry mechanism when a transmission error occurs, a mechanism that isochronous transfers do not have. This could only be confirmed by transparently inserting a USB protocol analyzer between an Arduino board and the TP-LINK TL-WR703N router.
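Which transfer type a given adapter actually uses can be read from its endpoint descriptors: bits 1..0 of each endpoint's bmAttributes encode control, isochronous, bulk or interrupt. As a sketch (our addition, not a tool from this thread), the decoder below walks a raw configuration descriptor in the hex form usbmon prints. Applied to the 32-byte configuration descriptor that the Duemilanove's FTDI FT232R returns in the usbmon capture further down this thread, both data endpoints decode as bulk, which is consistent with the hypothesis above; it says nothing about where in the bulk path the hang occurs.

```python
# bmAttributes bits 1..0 give the endpoint transfer type (USB 2.0 spec, 9.6.6).
TRANSFER_TYPES = {0: "control", 1: "isochronous", 2: "bulk", 3: "interrupt"}
ENDPOINT_DESCRIPTOR = 0x05


def decode_endpoints(config_hex: str):
    """Return (address, direction, transfer type, max packet size) for every
    endpoint descriptor found in a raw configuration descriptor (hex text)."""
    raw = bytes.fromhex("".join(config_hex.split()))
    endpoints = []
    offset = 0
    while offset + 2 <= len(raw):
        length, dtype = raw[offset], raw[offset + 1]
        if length < 2 or offset + length > len(raw):
            break  # malformed or truncated descriptor: stop walking
        if dtype == ENDPOINT_DESCRIPTOR and length >= 7:
            address = raw[offset + 2]
            attributes = raw[offset + 3]
            # wMaxPacketSize is little-endian; bits 10..0 hold the size.
            max_packet = int.from_bytes(raw[offset + 4:offset + 6], "little") & 0x7FF
            direction = "IN" if address & 0x80 else "OUT"
            endpoints.append((address, direction, TRANSFER_TYPES[attributes & 0x03], max_packet))
        offset += length
    return endpoints


# Configuration descriptor of the Duemilanove's FT232R, as returned to the
# GET_DESCRIPTOR(wValue 0x0200) request in the usbmon capture.
FT232R_CONFIG = "09022000 010100a0 2d090400 0002ffff ff020705 81024000 00070502 02400000"

if __name__ == "__main__":
    for address, direction, kind, size in decode_endpoints(FT232R_CONFIG):
        print(f"0x{address:02x} {direction} {kind} {size}")
```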


Re: ar9331's usb stability issue - [SOLVED]

I am currently working with boards from TP-LINK and Alfa: a TP-LINK MR3040 and Alfa Hornet-UB boards, all based on the AR9331 chipset. I just refreshed the OpenWrt AA sources and am currently running r3397. The same image has been installed on all the boards.

What is interesting is that the CDC/ACM and USB to serial adapters (PL2303hx based) worked fine for the MR3040.  On the Alfa boards I have one that consistently fails for all the devices tested but works fine if the devices are plugged into a USB 2.0 hub.  The other Alfa board works fine. 

So I am now wondering whether the issue has been addressed in newer releases of the Atheros chipsets, and whether the problem is hit or miss depending on the age of the board. However, the processor on both Alfa boards is identical. The chip is identified as:

Alpha-Hornet-UB
AR9331-AL1A
PFH104.006B
1153
Taiwan

So I wonder if it's an issue with the supporting circuitry? However, I can't see any discernible differences between the boards, and both boards' part numbers and revision levels are identical. As far as I can tell, the two boards are identical. I have sent the serial and batch numbers to Alfa and am currently waiting to hear from them.

Interestingly, the TP-LINK processor was made in Korea, while the one used in the Hornet was manufactured in Taiwan:
AR9331-AL1A
PFU609.001C
1205
Korea

Bottom line: I have one Alfa board that fails consistently and only works if USB 1.1 devices are plugged into a hub, a second Alfa board that works fine with USB 1.1 devices, and a TL-MR3040 that also works fine with USB 1.1 devices.

Any thoughts are greatly appreciated.

--luis

Re: ar9331's usb stability issue - [SOLVED]

I tested with a TP-LINK TL-WR703N Rev 1.3 (PCB Rev 1.1, but there is only one revision anyway, except for a few 1.0 PCB boards that are hard to find).

My AR9331 chip is identified as:
AR9331-AL1A
PBA262.003B
1126
Taiwan

So it is even older than your Alpha-Hornet-UB ones.

I have a second TP-LINK TL-WR703N Rev 1.6 (PCB Rev 1.1 too), marked as:
AR9331-AL1A
PHP569.003C
1221
Korea

Let me perform the same tests with this one, and I will let you know the results.

But you are right: Atheros switched its manufacturing from Taiwan to Korea between week 53 of 2011 and week 5 of 2012, probably from TSMC to Samsung.
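The week numbers above come from the fourth line of each chip marking, read as a YYWW date code. A small sketch that turns such codes into approximate dates, assuming years 20YY and plain 7-day weeks counted from 1 January. This is not ISO 8601 week numbering: ISO year 2011 has no week 53, yet "1153" appears on the Hornet-UB chips, so the vendor convention is an assumption here.

```python
from datetime import date, timedelta


def date_code_to_date(code: str) -> date:
    """Approximate first day of a YYWW package date code (assumed convention)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a YYWW code, got {code!r}")
    year, week = 2000 + int(code[:2]), int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {code!r}")
    # Simple week counting from 1 January, not ISO weeks.
    return date(year, 1, 1) + timedelta(weeks=week - 1)


# AR9331-AL1A markings quoted in this thread: (lot code, date code, origin).
MARKINGS = [
    ("PBA262.003B", "1126", "Taiwan"),
    ("PFH104.006B", "1153", "Taiwan"),
    ("PFU609.001C", "1205", "Korea"),
    ("PHP569.003C", "1221", "Korea"),
]

if __name__ == "__main__":
    # Four-digit YYWW codes sort chronologically as plain strings.
    for lot, code, origin in sorted(MARKINGS, key=lambda m: m[1]):
        print(code, date_code_to_date(code), origin)
```

Sorted this way, the change of origin falls between codes 1153 (around the end of December 2011) and 1205 (late January 2012), matching the window given above.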


Re: ar9331's usb stability issue - [SOLVED]

Do any of you think this could be the problem behind my router dropping its internet connection? Topic here: https://forum.openwrt.org/viewtopic.php?id=40178

Re: ar9331's usb stability issue - [SOLVED]

TheRaster wrote:

Do any of you think this could be the problem behind my router dropping its internet connection? Topic here: https://forum.openwrt.org/viewtopic.php?id=40178

Possibly...

Can you tell us the marking on the AR9331 main chip (of course, you will have to open the case for that...)?

It may also be interesting to perform the same test using the original firmware on the same device.


Re: ar9331's usb stability issue - [SOLVED]

The original firmware works fine in every respect. I switched to OpenWrt because the original firmware took ages to find the modem and connect to it: after a reboot it took around 4-6 minutes to get online. With OpenWrt it is up as soon as the router boots.
http://img441.imageshack.us/img441/3808/dsc0033wp.th.jpg
http://img43.imageshack.us/img43/3788/dsc0034lm.th.jpg

Not sure it's the same chip as the one you have, but it seems to be the same problem.

Sorry for the crappy images, shaky hands.

The router is a TP-LINK TL-MR3420.

Hope this helps.

10 (edited by Squonk 2012-10-29 17:27:00)

Re: ar9331's usb stability issue - [SOLVED]

Ok, thanks!

Unfortunately, the chip we are interested in is under the big heatsink.

But it is good to know that the original firmware worked correctly: they certainly applied a software workaround that is not in the open-source kernel...

At least, there is still a hope to correct this bug by software!


Re: ar9331's usb stability issue - [SOLVED]

I have been wondering if it is a USB power problem. Do you know of any good powered USB hubs that are supported by OpenWrt? I want to get one anyhow for other things, and it is worth a test.

Re: ar9331's usb stability issue - [SOLVED]

TheRaster wrote:

I have been wondering if it is a USB power problem. Do you know of any good powered USB hubs that are supported by OpenWrt? I want to get one anyhow for other things, and it is worth a test.

It is not a power problem, see my post above: it works with a passive USB hub and 2 Arduino boards, whereas it hangs with a single Arduino board.

I tried my second TP-LINK TL-WR703N Rev 1.6 (PCB Rev 1.1) without the passive USB hub: it crashed after 1h32, so the bug is still present in chips manufactured in Korea as late as the end of May 2012.


Re: ar9331's usb stability issue - [SOLVED]

Shame it was not so simple. What hub do you use? I want to get one to mount an HDD and a second dongle, to see if I can get a faster connection.

Re: ar9331's usb stability issue - [SOLVED]

Following Squonk's advice seems to work: router -> passive hub -> powered hub. YMMV.

Re: ar9331's usb stability issue - [SOLVED]

TheRaster wrote:

Shame it was not so simple. What hub do you use? I want to get one to mount an HDD and a second dongle, to see if I can get a faster connection.

Please check the links I provide in my post above.

I use the "Octopus", that can be mounted into the WR703N case like described in this topic: https://forum.openwrt.org/viewtopic.php?id=34188.


16 (edited by lsoltero 2012-10-29 23:50:52)

Re: ar9331's usb stability issue - [SOLVED]

Here is more info...

I went through our pile of brand-new Alfa AP121U boards and programmed and tested every single one of them. One unit is the original unit that I got earlier, which fails; the other 7 are brand new, from the same batch received 5 days ago.

4 work perfectly with USB 1.1 devices plugged into the native port
4 fail immediately when USB 1.1 devices are plugged into the native port.

all work perfectly when using a USB 2 HUB.

All units were tested back to back using the same power supply, the same USB 1.1 modem, the same firmware and the same computer. BTW, I am using a 30 A power supply running at 13.8 V, so I know it's not a power issue.

All the AP121U boards were manufactured in Taiwan. Looking at the boards, all the part numbers and batch numbers are the same. I can't physically distinguish one from another; they are all identical as far as I can tell.

I have ordered a few MR3040 to see if I can get any of these routers to fail.

One thing I did notice is that the TP-LINK has an AR9331 manufactured in Korea. TP-Link sells the MR3040 as a mobile 3G router supporting many CDC/ACM devices. So either they have a software patch that makes the AR9331 work correctly, OR the Korean-made AR9331 chips work properly, OR they have figured out some hardware fix for the issue.

more soon...

--luis

17 (edited by Squonk 2012-10-30 00:01:39)

Re: ar9331's usb stability issue - [SOLVED]

Thanks Luis for performing all these tests!

No: I tested one WR703N router with a new Korean chip, and it failed the same as the older Taiwanese one.

And hardware-wise, the USB lines go straight to the connector through 0 ohm series resistors, with unpopulated resistors to ground and dual clamping diodes, all using impedance-controlled traces, so nothing suspicious on that side; see my reverse engineering of the TL-WR703N:
http://squonk42.github.com/TL-WR703N/

So this must be a hardware bug inside the AR9331, that is corrected by a workaround patch not available in the open source Linux kernel.


18 (edited by Squonk 2012-10-30 00:09:54)

Re: ar9331's usb stability issue - [SOLVED]

Actually, if you build your own OpenWrt image, you can see in the kernel source tree (build_dir/linux-ar71xx_generic/linux-3.3.8/arch/mips/ath79/dev-usb.c) that there are indeed workarounds for some Atheros chip USB bugs:

For example in ath79_usb_setup():

    /* Turning on the Buff and Desc swap bits */
    __raw_writel(0xf0000, usb_ctrl_base + AR71XX_USB_CTRL_REG_CONFIG);

    /* WAR for HW bug. Here it adjusts the duration between two SOFS */
    __raw_writel(0x20c00, usb_ctrl_base + AR71XX_USB_CTRL_REG_FLADJ);

... But nothing for the AR933x chip.


Re: ar9331's usb stability issue - [SOLVED]

Well, the fact that there are bug workarounds for other Atheros-based processors gives me hope that a software workaround might be possible. Has anyone examined the TP-LINK source code carefully? It's available here:
http://www.tp-link.com/en/support/gpl/?categoryid=547

It could be that the required patch is buried in the kernel source tree.

What is killing me is that, for our application, routers that fail do so immediately, while routers that work seem to work fine.

--luis

20 (edited by Squonk 2012-10-30 09:44:19)

Re: ar9331's usb stability issue - [SOLVED]

Let me check the kernel sources from TP-LINK and possibly from other vendors using the AR9331.

EDIT: I checked the kernel sources provided by TP-LINK from the link above and also from the Fritzbox 7270, but there is nothing regarding USB.


Re: ar9331's usb stability issue - [SOLVED]

I managed to compile "usbmon" on the TL-WR703N in order to track the bug.

The format for the usbmon output is documented here: http://www.kernel.org/doc/Documentation/usb/usbmon.txt
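To go through the capture below without decoding it by hand, a small parser for the text format helps. This is a sketch of ours based on the field order described in usbmon.txt (URB tag, timestamp, event type, address word, then either a setup packet or a status word, then data length and data), not part of usbmon itself. It covers the control, interrupt and bulk lines seen in this capture; isochronous lines carry extra frame fields and are not handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# First letter of the usbmon "address word" gives the transfer type.
XFER_TYPES = {"C": "control", "Z": "isochronous", "I": "interrupt", "B": "bulk"}


@dataclass
class UsbmonEvent:
    tag: str                 # URB tag
    timestamp_us: int        # timestamp in microseconds
    event: str               # 'S' submission, 'C' callback, 'E' submission error
    xfer_type: str           # control / isochronous / interrupt / bulk
    direction: str           # 'i' (to host) or 'o' (to device)
    bus: int
    device: int
    endpoint: int
    setup: Optional[Tuple[int, ...]]  # bmRequestType, bRequest, wValue, wIndex, wLength
    status: Optional[int]    # URB status (negative errno); None when a setup tag is printed
    interval: Optional[int]  # polling interval, interrupt/isochronous only
    length: int              # data length
    data: str                # '<', '>' or '= <hex words>', empty if absent


def parse_usbmon_line(line: str) -> UsbmonEvent:
    """Parse one usbmon text line for a control, interrupt or bulk URB."""
    tokens = line.split()
    tag, timestamp, event, address = tokens[:4]
    kind, bus, device, endpoint = address.split(":")
    rest = tokens[4:]
    setup = status = interval = None
    head = rest[0]
    if head == "s":                                  # setup packet: 5 hex words
        setup = tuple(int(word, 16) for word in rest[1:6])
        rest = rest[6:]
    elif head.split(":")[0].lstrip("-").isdigit():   # status[:interval...]
        fields = head.split(":")
        status = int(fields[0])
        if len(fields) > 1:
            interval = int(fields[1])
        rest = rest[1:]
    else:                                            # other letter: setup not captured
        rest = rest[1:]
    return UsbmonEvent(
        tag, int(timestamp), event, XFER_TYPES[kind[0]], kind[1],
        int(bus), int(device), int(endpoint),
        setup, status, interval, int(rest[0]), " ".join(rest[1:]),
    )


def failed_callbacks(lines):
    """Callbacks that completed with a non-zero status (e.g. -32, a STALL)."""
    events = (parse_usbmon_line(l) for l in lines if l.strip())
    return [e for e in events if e.event == "C" and e.status]
```

For example, the three -32 (EPIPE, i.e. STALL) completions for device 006 in the excerpt answer GET_DESCRIPTOR requests with wValue 0x0600 (device qualifier), which a full-speed-only device such as this FT232R is required by the USB 2.0 specification to reject, so they are not by themselves a sign of the bug.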

The full compressed log is here: http://dl.free.fr/iokffINVE

But here are the useful excerpts from it:

### echo 1 > /sys/class/gpio/gpio8/value

80dac680 2948836581 C Ii:1:001:1 0:2048 1 = 02
80dac680 2948836624 S Ii:1:001:1 -150:2048 4 <
819cc300 2948836817 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948836830 C Ci:1:001:0 0 4 = 01010100
819cc300 2948836840 S Co:1:001:0 s 23 01 0010 0001 0000 0
819cc300 2948836844 C Co:1:001:0 0 0
819cc300 2948836851 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948836854 C Ci:1:001:0 0 4 = 01010000
819cc300 2948873603 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948873626 C Ci:1:001:0 0 4 = 01010000
819cc300 2948913601 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948913624 C Ci:1:001:0 0 4 = 01010000
819cc300 2948953620 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948953644 C Ci:1:001:0 0 4 = 01010000
819cc300 2948993601 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819cc300 2948993624 C Ci:1:001:0 0 4 = 01010000
819ccd00 2948993678 S Co:1:001:0 s 23 03 0004 0001 0000 0
819ccd00 2948993682 C Co:1:001:0 0 0
80dac680 2949048700 C Ii:1:001:1 0:2048 1 = 02
80dac680 2949048719 S Ii:1:001:1 -150:2048 4 <
819ccd00 2949053680 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819ccd00 2949053703 C Ci:1:001:0 0 4 = 03010000
819ccd00 2949113600 S Co:1:001:0 s 23 01 0014 0001 0000 0
819ccd00 2949113620 C Co:1:001:0 0 0
819cca00 2949119182 S Ci:1:000:0 s 80 06 0100 0000 0040 64 <
819cca00 2949120705 C Ci:1:000:0 0 8 = 12010002 00000008
819ccd00 2949120826 S Co:1:001:0 s 23 03 0004 0001 0000 0
819ccd00 2949120836 C Co:1:001:0 0 0
819ccd00 2949173612 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
819ccd00 2949173647 C Ci:1:001:0 0 4 = 03011000
819ccd00 2949233629 S Co:1:001:0 s 23 01 0014 0001 0000 0
819ccd00 2949233649 C Co:1:001:0 0 0
819ccd00 2949233663 S Co:1:000:0 s 00 05 0006 0000 0000 0
819ccd00 2949235703 C Co:1:000:0 0 0
819ccd00 2949263609 S Ci:1:006:0 s 80 06 0100 0000 0012 18 <
819ccd00 2949264705 C Ci:1:006:0 0 18 = 12010002 00000008 03040160 00060102 0301
819ccd00 2949264854 S Ci:1:006:0 s 80 06 0600 0000 000a 10 <
819ccd00 2949265707 C Ci:1:006:0 -32 0
819ccd00 2949265965 S Ci:1:006:0 s 80 06 0600 0000 000a 10 <
819ccd00 2949266710 C Ci:1:006:0 -32 0
819ccd00 2949267146 S Ci:1:006:0 s 80 06 0600 0000 000a 10 <
819ccd00 2949267709 C Ci:1:006:0 -32 0
81b1a980 2949268024 S Ci:1:006:0 s 80 06 0200 0000 0009 9 <
81b1a980 2949268708 C Ci:1:006:0 0 9 = 09022000 010100a0 2d
81aab200 2949268871 S Ci:1:006:0 s 80 06 0200 0000 0020 32 <
81aab200 2949269703 C Ci:1:006:0 0 32 = 09022000 010100a0 2d090400 0002ffff ff020705 81024000 00070502 02400000
819ccd00 2949270335 S Ci:1:006:0 s 80 06 0300 0000 00ff 255 <
819ccd00 2949270708 C Ci:1:006:0 0 4 = 04030904
819ccd00 2949271237 S Ci:1:006:0 s 80 06 0302 0409 00ff 255 <
819ccd00 2949273703 C Ci:1:006:0 0 32 = 20034600 54003200 33003200 52002000 55005300 42002000 55004100 52005400
81aaa900 2949274117 S Ci:1:006:0 s 80 06 0301 0409 00ff 255 <
81aaa900 2949275705 C Ci:1:006:0 0 10 = 0a034600 54004400 4900
81aaad80 2949275845 S Ci:1:006:0 s 80 06 0303 0409 00ff 255 <
81aaad80 2949277705 C Ci:1:006:0 0 18 = 12034100 36003000 30006100 65005800 5000
819ccd00 2949278266 S Co:1:006:0 s 00 09 0001 0000 0000 0
819ccd00 2949278705 C Co:1:006:0 0 0
80febc00 2949280425 S Ci:1:006:0 s 80 06 0302 0409 00ff 255 <
80febc00 2949282710 C Ci:1:006:0 0 32 = 20034600 54003200 33003200 52002000 55005300 42002000 55004100 52005400
81aaad00 2949310171 S Ci:1:006:0 s c0 0a 0000 0000 0001 1 <
81aaad00 2949310706 C Ci:1:006:0 0 1 = 10
81a92980 2949313673 S Co:1:006:0 s 40 09 0001 0000 0000 0
81a92980 2949314707 C Co:1:006:0 0 0
81b10b80 2949322106 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81b10b80 2949322133 C Ci:1:001:0 0 4 = 03010000

...

### microcom -s 57600 /dev/ttyUSB0
### bitlash here! v2.0 RC5pre (c) 2011 Bill Roy -type HELP- 953 bytes free
### > 0
### 1
### 2
### 3
### 4
### 5
### 6
### 7
### 8

81b3db80 2953804840 S Co:1:006:0 s 40 00 0000 0000 0000 0
81b3db80 2953806722 C Co:1:006:0 0 0
81b3db80 2953807442 S Co:1:006:0 s 40 04 0008 0000 0000 0
81b3db80 2953808727 C Co:1:006:0 0 0
81b3db80 2953809570 S Co:1:006:0 s 40 03 0034 0000 0000 0
81b3db80 2953810731 C Co:1:006:0 0 0
81b3db80 2953811799 S Co:1:006:0 s 40 01 0303 0000 0000 0
81b3db80 2953812742 C Co:1:006:0 0 0
81b3db80 2953813773 S Co:1:006:0 s 40 02 0000 0000 0000 0
81b3db80 2953814725 C Co:1:006:0 0 0
80feb100 2953815074 S Bi:1:006:1 -150 512 <
80febb00 2953815111 S Bi:1:006:1 -150 512 <
81b3d980 2953815135 S Co:1:006:0 s 40 01 0303 0000 0000 0
80feb100 2953815725 C Bi:1:006:1 0 2 = 0160
80feb100 2953815745 S Bi:1:006:1 -150 512 <
80febb00 2953815764 C Bi:1:006:1 0 2 = 0160
80febb00 2953815768 S Bi:1:006:1 -150 512 <
81b3d980 2953815780 C Co:1:006:0 0 0
81b3d980 2953816260 S Co:1:006:0 s 40 02 0000 0000 0000 0
80feb100 2953816725 C Bi:1:006:1 0 2 = 0160
80feb100 2953816745 S Bi:1:006:1 -150 512 <
80febb00 2953816767 C Bi:1:006:1 0 2 = 0160
80febb00 2953816771 S Bi:1:006:1 -150 512 <
81b3d980 2953816784 C Co:1:006:0 0 0
80feb100 2953818725 C Bi:1:006:1 0 2 = 0160
80feb100 2953818751 S Bi:1:006:1 -150 512 <
80febb00 2953818776 C Bi:1:006:1 0 2 = 0160
80febb00 2953818779 S Bi:1:006:1 -150 512 <
80feb100 2953820719 C Bi:1:006:1 0 2 = 0160
80feb100 2953820745 S Bi:1:006:1 -150 512 <

...

80feb100 2955266733 C Bi:1:006:1 0 2 = 0160
80feb100 2955266763 S Bi:1:006:1 -150 512 <
80febb00 2955267728 C Bi:1:006:1 0 2 = 0160
80febb00 2955267759 S Bi:1:006:1 -150 512 <
80feb100 2955268731 C Bi:1:006:1 0 5 = 01606269 74
80feb100 2955268772 S Bi:1:006:1 -150 512 <
80febb00 2955269753 C Bi:1:006:1 0 8 = 01606c61 73682068
80febb00 2955269791 S Bi:1:006:1 -150 512 <
80feb100 2955270734 C Bi:1:006:1 0 8 = 01606572 65212076
80feb100 2955270764 S Bi:1:006:1 -150 512 <
80febb00 2955271732 C Bi:1:006:1 0 8 = 0160322e 30205243
80febb00 2955271757 S Bi:1:006:1 -150 512 <
80feb100 2955272728 C Bi:1:006:1 0 7 = 01603570 726520
80feb100 2955272750 S Bi:1:006:1 -150 512 <
80febb00 2955273728 C Bi:1:006:1 0 8 = 01602863 29203230
80febb00 2955273766 S Bi:1:006:1 -150 512 <
80feb100 2955274734 C Bi:1:006:1 0 8 = 01603131 2042696c
80feb100 2955274766 S Bi:1:006:1 -150 512 <
80febb00 2955275734 C Bi:1:006:1 0 8 = 01606c20 526f7920
80febb00 2955275775 S Bi:1:006:1 -150 512 <
80feb100 2955276734 C Bi:1:006:1 0 8 = 01602d74 79706520
80feb100 2955276775 S Bi:1:006:1 -150 512 <
80febb00 2955277734 C Bi:1:006:1 0 8 = 01604845 4c502d20
80febb00 2955277775 S Bi:1:006:1 -150 512 <
80feb100 2955278731 C Bi:1:006:1 0 8 = 01603935 33206279
80feb100 2955278764 S Bi:1:006:1 -150 512 <
80febb00 2955279728 C Bi:1:006:1 0 8 = 01607465 73206672
80febb00 2955279769 S Bi:1:006:1 -150 512 <
80feb100 2955280733 C Bi:1:006:1 0 7 = 01606565 0d0a3e
80feb100 2955280769 S Bi:1:006:1 -150 512 <
80febb00 2955281734 C Bi:1:006:1 0 3 = 016020
80febb00 2955281765 S Bi:1:006:1 -150 512 <
80feb100 2955282734 C Bi:1:006:1 0 2 = 0160
80feb100 2955282766 S Bi:1:006:1 -150 512 <
80febb00 2955283726 C Bi:1:006:1 0 2 = 0160
80febb00 2955283749 S Bi:1:006:1 -150 512 <
80feb100 2955284730 C Bi:1:006:1 0 2 = 0160
80feb100 2955284760 S Bi:1:006:1 -150 512 <
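The descriptors read during enumeration identify the adapter as an FTDI FT232R (idVendor 0403, idProduct 6001, product string "FT232R USB UART"). FTDI chips prepend two status bytes (modem status, then line status) to every bulk-IN packet, which is why each completion above starts with 01 60 and the idle polls return only those two bytes. A minimal sketch that strips them and reassembles the serial data, assuming 64-byte full-speed packets (the endpoint descriptors report wMaxPacketSize 0x0040); the helper and variable names are hypothetical:

```python
def ftdi_payload(chunks, max_packet=64):
    """Rebuild serial data from FTDI bulk-IN completions.

    Assumes each packet of up to max_packet bytes begins with two status
    bytes (modem status, line status) that are not part of the serial data.
    """
    out = bytearray()
    for chunk in chunks:
        raw = bytes.fromhex("".join(chunk.split()))
        for start in range(0, len(raw), max_packet):
            out += raw[start + 2:start + max_packet]  # drop the 2 status bytes
    return bytes(out)


# Data words copied from the 'C Bi:1:006:1' lines of the excerpt above.
trace_chunks = [
    "01606269 74", "01606c61 73682068", "01606572 65212076",
    "0160322e 30205243", "01603570 726520", "01602863 29203230",
    "01603131 2042696c", "01606c20 526f7920", "01602d74 79706520",
    "01604845 4c502d20", "01603935 33206279", "01607465 73206672",
    "01606565 0d0a3e", "016020", "0160",
]
print(ftdi_payload(trace_chunks))
```

Decoded this way, the completions reproduce the banner microcom displayed, followed by CR LF and the "> " prompt.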

...

80febb00 2964611786 C Bi:1:006:1 0 2 = 0160
80febb00 2964611816 S Bi:1:006:1 -150 512 <
80feb100 2964612787 C Bi:1:006:1 0 2 = 0160
80feb100 2964612816 S Bi:1:006:1 -150 512 <
80febb00 2964613781 C Bi:1:006:1 0 2 = 0160
80febb00 2964613810 S Bi:1:006:1 -150 512 <
80feb100 2964614787 C Bi:1:006:1 0 2 = 0160
80feb100 2964614815 S Bi:1:006:1 -150 512 <
80febb00 2964615784 C Bi:1:006:1 0 2 = 0160
80febb00 2964615811 S Bi:1:006:1 -150 512 <
80feb100 2964616785 C Bi:1:006:1 0 2 = 0160
80feb100 2964616815 S Bi:1:006:1 -150 512 <

### CRASH

### echo 0 > /sys/class/gpio/gpio8/value

80febb00 2987549051 C Bi:1:006:1 -79 0
80dac680 2987549075 C Ii:1:001:1 0:2048 1 = 02
80dac680 2987549088 S Ii:1:001:1 -150:2048 4 <
81b3d980 2987549253 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81b3d980 2987549266 C Ci:1:001:0 0 4 = 00010100
81b3d980 2987549276 S Co:1:001:0 s 23 01 0010 0001 0000 0
81b3d980 2987549280 C Co:1:001:0 0 0
80feb100 2987553001 C Bi:1:006:1 -143 0
81b3de80 2987566352 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81b3de80 2987566366 C Ci:1:001:0 0 4 = 00010000
81aaad80 2987603613 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81aaad80 2987603634 C Ci:1:001:0 0 4 = 00010000
81aaad80 2987643602 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81aaad80 2987643625 C Ci:1:001:0 0 4 = 00010000
81b3d000 2987683698 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81b3d000 2987683721 C Ci:1:001:0 0 4 = 00010000
81aab200 2987723605 S Ci:1:001:0 s a3 00 0000 0001 0004 4 <
81aab200 2987723627 C Ci:1:001:0 0 4 = 00010000
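The repeated control requests to device 001 (the root hub) with setup bytes a3 00 0000 0001 0004 are hub GET_STATUS requests for port 1; the four reply bytes are wPortStatus followed by wPortChange, each little-endian. A minimal sketch that decodes them, using the port status and change bit names from the hub chapter (chapter 11) of the USB 2.0 specification; the table and function names are hypothetical:

```python
# Bit names from the USB 2.0 hub class definition (wPortStatus / wPortChange).
PORT_STATUS_BITS = {
    0: "CONNECTION", 1: "ENABLE", 2: "SUSPEND", 3: "OVER_CURRENT", 4: "RESET",
    8: "POWER", 9: "LOW_SPEED", 10: "HIGH_SPEED", 11: "TEST", 12: "INDICATOR",
}
PORT_CHANGE_BITS = {
    0: "C_CONNECTION", 1: "C_ENABLE", 2: "C_SUSPEND",
    3: "C_OVER_CURRENT", 4: "C_RESET",
}


def set_bits(value, names):
    """Return the names of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


def decode_port_status(data_hex):
    """Decode the 4-byte reply to a hub GET_STATUS(port) request."""
    raw = bytes.fromhex(data_hex)
    status = int.from_bytes(raw[0:2], "little")  # wPortStatus
    change = int.from_bytes(raw[2:4], "little")  # wPortChange
    return set_bits(status, PORT_STATUS_BITS), set_bits(change, PORT_CHANGE_BITS)


for reply in ("01010100", "03010000", "03011000", "00010100"):
    print(reply, decode_port_status(reply))
```

Read this way, 01010100 is a device connected on a powered port with the connection-change flag set, 03011000 is the port finishing reset while connected and enabled, and the first reply after the crash, 00010100, shows only POWER plus C_CONNECTION: the root hub no longer sees a device on port 1.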

Does anybody have a clue regarding the "-79" status value on the callback line after the crash?
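A possible lead, offered as an assumption rather than a diagnosis: the AR9331 is a MIPS SoC, and Linux on MIPS numbers many errno values differently from x86. If the kernel uses the standard MIPS numbering, the -150 on every "S" line is -EINPROGRESS (115 on x86), the -143 just after the crash is -ESHUTDOWN, and -79 is -EOVERFLOW, which the kernel's USB error-code documentation describes as "babble" (the endpoint returned more data than its max packet size or the remaining buffer). This only names the error; it does not explain why it happened. The lookup below is a small sketch covering just the codes seen in this trace; the names are hypothetical.

```python
# ASSUMPTION: standard Linux errno numbering for each architecture.
# MIPS renumbers many codes, so the same name has a different value on x86.
ERRNO_NUMBERS = {
    "mips": {32: "EPIPE", 79: "EOVERFLOW", 143: "ESHUTDOWN", 150: "EINPROGRESS"},
    "x86": {32: "EPIPE", 75: "EOVERFLOW", 108: "ESHUTDOWN", 115: "EINPROGRESS"},
}

# Short paraphrases of the kernel's USB error-codes documentation.
USB_MEANING = {
    "EPIPE": "endpoint stalled; on a control request, the device rejected it",
    "EOVERFLOW": "babble: more data than the max packet size or remaining buffer",
    "ESHUTDOWN": "device or host controller disabled, e.g. physical disconnect",
    "EINPROGRESS": "URB still pending (normal on usbmon 'S' lines)",
}


def decode_status(status: int, arch: str = "mips") -> str:
    """Translate a usbmon URB status (0 or -errno) into a readable string."""
    if status == 0:
        return "0: success"
    name = ERRNO_NUMBERS[arch].get(-status)
    if name is None:
        return f"{status}: not in this small table"
    return f"{status}: -{name} ({USB_MEANING[name]})"


for code in (-150, -32, -79, -143):
    print(decode_status(code))
```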


Re: ar9331's usb stability issue - [SOLVED]

Hello all,

Here is more info. I acquired two brand-new TP-Link MR3040s, for a total of three. I installed AA on the new units (same build as before) and all three routers work perfectly with our USB 1.1 devices. So we are currently at 100% with these routers, although admittedly we are working with a small pool.

Could it be that the Koreans fixed their version of the AR9331 and you are using an older chip? Can you post your chip markings here? Here is what we have:

AR9331-AL1A
PFU609.001C
1205
Korea

look forward to your thoughts

--luis

Re: ar9331's usb stability issue - [SOLVED]

My first TP-LINK TL-WR703N, Rev 1.3 (PCB Rev 1.1), has an AR9331 chip marked:
AR9331-AL1A
PBA262.003B
1126
Taiwan

My second TP-LINK TL-WR703N, Rev 1.6 (also PCB Rev 1.1), has a chip marked:
AR9331-AL1A
PHP569.003C
1221
Korea

Both units, running the latest AA beta2 branch from SVN, show the same problem.


Re: ar9331's usb stability issue - [SOLVED]

This problem was first described as a lack of USB power. It seems you have hit the point. I really hope it gets solved :)

Re: ar9331's usb stability issue - [SOLVED]

sigh....

Well, so much for that theory. Although I am not sure exactly what the chip markings mean, it seems that one of your chips may be older and the other newer than the ones in my MR3040s (going by the four-digit codes 1126 and 1221 versus our 1205). I have two more MR3040s on order and will report my findings here.

--luis