OpenWrt Forum Archive

Topic: Severe usb-ohci error

The content of this topic has been archived on 22 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Hi Guys,

I'm trying to make use of a USB to serial adapter PL2303 on an Asus WL-520GU. (BCM5354) I checked out yesterday's trunk build, and it appeared to have the patches applied from bug 3803. When the router boots it sees the serial adapter and and assigns it to /dev/usb/tts/0. Things start off fine until I blast it with some traffic to test stability. After 5-10 minutes I receive the following message:


usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)
__alloc_pages: 0-order allocation failed (gfp=0x20/0)


When this happens I lose the serial port and all network connectivity, luckily I have a console connected to /dev/tts/0 so that I am able to record the error message. A reboot is required to get things working again. Please let me know if you have any trouble reproducing this one, I can reliably reproduce it on my end.

Also, I have noticed that this problem is causing some sort of memory leak. The system memory is slowly depleted from boot up until the problem occurs. Typing 'free' reveals that the system has gone from having 3mb of free ram to approximately 1mb of free ram. The only app I am running is ser2net, and my test script simply sends data via telnet to the input port which is forwarded to /dev/usb/tts/0. Memory is not reallocated when ser2net is killed.

Comments or suggestions welcome, attached is the log output after a fresh reboot. Any help would be greatly appreciated.

Thanks,
Shawn






CFE version 1.0.37 for BCM947XX (32bit,SP,LE)
Build Date: Mon Apr 16 14:41:05 CST 2007 (root@localhost.localdomain)
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.

Initializing Arena
Initializing Devices.
Boot partition size = 131072(0x20000)
et0: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 4.130.23.0
CPU type 0x29029: 240MHz
Total memory: 16384 KBytes

Total memory used by CFE:  0x80400000 - 0x8049A790 (632720)
Initialized Data:          0x80430720 - 0x80432D10 (9712)
BSS Area:                  0x80432D10 - 0x80434790 (6784)
Local Heap:                0x80434790 - 0x80498790 (409600)
Stack Area:                0x80498790 - 0x8049A790 (8192)
Text (code) segment:       0x80400000 - 0x80430720 (198432)
Boot area (physical):      0x0049B000 - 0x004DB000
Relocation Factor:         I:00000000 - D:00000000

Null Rescue Flag.
Loader:raw Filesys:tftp Dev:(null) File:: Options:(null)
Loading: TFTP Server.
Failed.
Could not load :: Error
Loader:raw Filesys:raw Dev:flash0.os File: Options:(null)
Loading: .. 3760 bytes read
Entry at 0x80001000
Starting program at 0x80001000
CPU revision is: 00029029
Primary instruction cache 16kB, physically tagged, 4-way, linesize 16 bytes.
Primary data cache 16kB, 2-way, linesize 16 bytes.
Linux version 2.4.35.4 (shawn@localhost.localdomain) (gcc version 3.4.6 (OpenWrt-2.0)) #43 Thu Sep 18 12:10:52 EDT 2008
Setting the PFC to its default value
Determined physical RAM map:
memory: 01000000 @ 00000000 (usable)
On node 0 totalpages: 4096
zone(0): 4096 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/mtdblock2 rootfstype=squashfs,jffs2 init=/etc/preinit noinitrd console=ttyS0,115200
CPU: BCM5354 rev 2 at 240 MHz
Using 120.000 MHz high precision timer.
Calibrating delay loop... 238.38 BogoMIPS
Memory: 14208k/16384k available (1474k kernel code, 2176k reserved, 100k data, 88k init, 0k highmem)
Dentry cache hash table entries: 2048 (order: 2, 16384 bytes)
Inode cache hash table entries: 1024 (order: 1, 8192 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
Checking for 'wait' instruction...  unavailable.
POSIX conformance testing by UNIFIX
PCI: no core
PCI: Fixing up bus 0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
Registering mini_fo version $Id$
devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
JFFS2 version 2.1. (C) 2001 Red Hat, Inc., designed by Axis Communications AB.
squashfs: version 3.0 (2006/03/15) Phillip Lougher
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0xb8000300 (irq = 3) is a 16550A
ttyS01 at 0xb8000400 (irq = 3) is a 16550A
b44.c:v0.93 (Mar, 2004)
eth0: Broadcom 47xx 10/100BaseT Ethernet 00:1f:c6:44:36:7e
Amd/Fujitsu Extended Query Table v1.1 at 0x0040
number of CFI chips: 1
cfi_cmdset_0002: Disabling fast programming due to code brokenness.
Flash device: 0x400000 at 0x1c000000
bootloader size: 131072
Physically mapped flash: Filesystem type: squashfs, size=0x1bd134
Creating 5 MTD partitions on "Physically mapped flash":
0x00000000-0x00020000 : "cfe"
0x00020000-0x003f0000 : "linux"
0x0009ec00-0x00260000 : "rootfs"
mtd: partition "rootfs" doesn't start on an erase block boundary -- force read-only
0x003f0000-0x00400000 : "nvram"
0x00260000-0x003f0000 : "rootfs_data"
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 1024 bind 2048)
ip_conntrack version 2.1 (5953 buckets, 5953 max) - 344 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Ethernet Bridge 008 for NET4.0
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
VFS: Mounted root (squashfs filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 88k freed
Algorithmics/MIPS FPU Emulator v1.5
mount: mounting sysfs on /sys failed: No such device
mount: mounting devfs on /dev failed: Device or resource busy
- preinit -
Press CTRL-C for failsafe
diag: Detected 'ASUS WL-520gU'
b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.
roboswitch: Probing device eth0: found!
switch-robo             5084   0 (unused)
switching to jffs2
mini_fo: using base directory: /
mini_fo: using storage directory: /jffs
- init -

Please press Enter to activate this console. jffs2.bbc: SIZE compression mode activated.
b44: eth0: Link is up at 100 Mbps, full duplex.
b44: eth0: Flow control is off for TX and off for RX.
BFL_ENETADM not set in boardflags. Use force=1 to ignore.
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
device eth0.0 entered promiscuous mode
eth0.0: dev_set_promiscuity(master, 1)
device eth0 entered promiscuous mode
br-lan: port 1(eth0.0) entering learning state
br-lan: port 1(eth0.0) entering forwarding state
br-lan: topology change detected, propagating
PCI/DMA
wl0: wlc_attach: chiprev 2 coreunit 0 corerev 13 cccap 0x104007ea maccap 0x30482205 band 2.4G, phy_type 5 phy_rev 0 ana_rev 6
wl0: Broadcom BCM4318 802.11 Wireless Controller 4.150.10.5
CSLIP: code copyright 1989 Regents of the University of California
PPP generic driver version 2.4.2
ipt_time loading
lp: driver loaded but no devices found
Software Watchdog Timer: 0.05, timer margin: 60 sec
SB USB20H init
SB COREREV: 2
SB USB20H resetting
USB20H fcr: 0x64
USB20H shim cr: 0x8f7
usb-ohci.c: USB OHCI at membase 0xb8003000, IRQ 6
usb-ohci.c: usb-00:03.0, PCI device 14e4:471a
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb.c: registered new driver serial
usbserial.c: USB Serial support registered for Generic
usbserial.c: USB Serial Driver core v1.4
usbserial.c: USB Serial support registered for PL-2303
pl2303.c: Prolific PL2303 USB to serial adaptor driver v0.10.1
hub.c: new USB device 00:03.0-1, assigned address 2
usbserial.c: PL-2303 converter detected
usbserial.c: PL-2303 converter now attached to ttyUSB0 (or usb/tts/0 for devfs)



BusyBox v1.11.1 (2008-09-18 01:26:33 EDT) built-in shell (ash)
Enter 'help' for a list of built-in commands.

  _______                     ________        __
|       |.-----.-----.-----.|  |  |  |.----.|  |_
|   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
|_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
KAMIKAZE (bleeding edge, r12620) -------------------
  * 10 oz Vodka       Shake well with ice and strain
  * 10 oz Triple sec  mixture into 10 shot glasses.
  * 10 oz lime juice  Salute!
---------------------------------------------------
root@OpenWrt:/#

(Last edited by shawnperkins on 19 Sep 2008, 15:51)

Any luck with this?  I know a lot of others have had issues with USB on this box.

I'm trying to decide if the 520 would make a good streaming audio box but not if the USB support is wonky.

No luck so far. It looks like things are close, perhaps it is something simple, I"m really not sure. I would even give a donation of $100.00 via Paypal to anyone who could provide a patch for this one if it's fixed within a week or so.

If the follow lines are added to the pcibios_enable_device function (pcibios.c) , it probably works better in the wl520gu (Sbrown's hint)....

                       tmp = 0x00fe00fe;
                        writel(tmp, (ulong)regs + 0x894);
                        tmp = readl((uintptr)regs + 0x894);
                        printk("USB20H syn01 register : 0x%x\n", tmp);

                        tmp = 0x1;
                        writel(tmp, (ulong)regs + 0x89c);
                        tmp = readl((uintptr)regs + 0x89c);
                        printk("USB20H syn03 register : 0x%x\n", tmp);

But, in the same way...

usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled
usbserial.c: generic_write - port 0: failed submitting write urb (-143)
usbserial.c: generic_write - port 0: failed submitting write urb (-143)
usbserial.c: generic_write - port 0: failed submitting write urb (-143)
usbserial.c: generic_write - port 0: failed submitting write urb (-143)
.
.
.

Hi Guys,

First I want to thank you for getting back to me so quickly on this, I appreciate it.

I tried sbrowns hint with not very much success. I added the code to pcibios.c see below. (lines 343-351...I think this is the correct placement)

I also tried replacing lines 328-340 with lines 343-351 and it crashed a lot sooner when I did this.

This time I have more debug information....when the problem occurred, the information below was displayed on the local console.

Thanks again for the assistance on this one.
Shawn





usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled
pl2303.c: pl2303_write - failed submitting write urb, error -143
pl2303.c: pl2303_open - failed submitting read urb, error -143
usbserial.c: __serial_close - port 0: not open
pl2303.c: pl2303_open - failed submitting read urb, error -143
usbserial.c: __serial_close - port 0: not open
pl2303.c: pl2303_open - failed submitting read urb, error -143
Unable to handle kernel paging request at virtual address 00000024, epc == c01d2                                                          374, ra == c01d2300
Oops in fault.c::do_page_fault, line 206:
$0 : 00000000 1000fc00 00000000 18e4bbe9 8082b998 8094dd10 8094dd18 ffff00ff
$8 : 8094dfe0 0000fc00 00000000 8018b000 801ad2a4 03ffffff 801ad4a8 000fffff
$16: 80779020 80779080 8082b000 80779000 80788b40 c01d0000 00000000 80788b40
$24: 801ad6ac 8001bd0c                   8094c000 8094dd38 00409fb4 c01d2300
Hi : 00000000
Lo : 00000006
epc   : c01d2374    Tainted: P
Status: 1000fc03
Cause : 00000008
PrId  : 00029029
Process ser2net (pid: 650, stackpage=8094c000)
Stack:    8094a660 800985fc ffffff71 8023f0c8 8082b000 00000000 80788b40
00000000 8094a660 801b0730 800988f8 8009873c 00000000 80030664 00001000
8094dda8 8082b000 8082b000 c01d4b28 8082b000 80a78740 00000000 80779000
80779020 80779000 8082b000 00000001 00000000 c01d0000 80788b40 80779000
c01d20b8 00000000 00000000 00000000 1000fc00 ffffffb9 00000800 80788b40
00000882 ...
Call Trace:   [<800985fc>] [<800988f8>] [<8009873c>] [<80030664>] [<c01d4b28>]
[<c01d20b8>] [<8009a788>] [<8009a6f8>] [<80043524>] [<800442f8>] [<80043c60>]
[<800cc794>] [<8006bb4c>] [<8004478c>] [<80044774>] [<80037120>] [<80037280>]
[<8004a698>] [<800c4eec>] [<80037518>] [<80008a60>] [<80008bc0>] [<8005bb0c>]

Code: ae020058  8e020004  8c420008 <8c420024> 10400005  02802821  0040f809  0200                                                          2021  080748f6
root@OpenWrt:/#






pcibios.c


324                 /* PRxxxx: War for 5354 failures. */
325                 if (corerev == 1 || corerev == 2) {
326                         uint32 tmp;
327
328                         /* Change Flush control reg */
329                         tmp = readl((uintptr)regs + 0x400);
330                         tmp &= ~8;
331                         writel(tmp, (uintptr)regs + 0x400);
332                         tmp = readl((uintptr)regs + 0x400);
333                         printk(KERN_INFO "USB20H fcr: 0x%x\n", tmp);
334
335                         /* Change Shim control reg */
336                         tmp = readl((uintptr)regs + 0x304);
337                         tmp &= ~0x100;
338                         writel(tmp, (uintptr)regs + 0x304);
339                         tmp = readl((uintptr)regs + 0x304);
340                         printk(KERN_INFO "USB20H shim cr: 0x%x\n", tmp);
341
342
343                         tmp = 0x00fe00fe;
344                         writel(tmp, (ulong)regs + 0x894);
345                         tmp = readl((uintptr)regs + 0x894);
346                         printk("USB20H syn01 register : 0x%x\n", tmp);
347
348                         tmp = 0x1;
349                         writel(tmp, (ulong)regs + 0x89c);
350                         tmp = readl((uintptr)regs + 0x89c);
351                         printk("USB20H syn03 register : 0x%x\n", tmp);

Dear fmay,

if you or sbrown need a wl-520gu, I have an extra one with a direct serial port wired up....say the word and I'll throw it in the mail tomorrow....yours to keep of course smile

I'm not so sure that the problem is with the usb subsystem and not the usbserial and/or pl2303. I've tried several disk drives on my 'gu using both the ohci and ehci drivers and haven't had problems. They have a lot higher data rate than a serial adapter.

I just looked at the history of usbserial and pl2303 in kernel.org's 2.4 git tree (2.4.37-rc1). The version of pl2303 has substantial changes and usbserial has added a few missing lock/unlocks. The comments by the author of the pl2303 patch are really interesting.

I experienced major major data loss on a PL-2303 USB-serial converter
under 2.4.36, which I remedied by back-porting the pl2303.c from the
latest 2.6 kernel tree.

Here's a new patch, which is more complete than my previous one.
It's based on the 2.6.24.1.

There's a lot of trivial white-space changes and some things that have
been moved, which make the patch rather larger than it could be.  I
didn't include those changes before, but have now in order that the
driver be closer to the 2.6 driver.  It'll never be identical, of course.

Note, too, that the 2.6 driver (and thus the patched 2.4) includes a 1k
circular buffer which rather duplicates a buffer in the 2.4 usbserial.c;
2.6's usb-serial has had that buffer removed.  As the buffer resolves
loss of the occasional putchar (e.g. from n_tty's opost), it is
important and correct, even in 2.4.

Speaking as a user, I no longer see any problems with PL2303, and I
think this is okay to release.

Somebody with a pl2303 adapter might want to see if this solves the problem.
Steve

Shawn

try it:

                       tmp = 0x00fe00fe;
                        writel(tmp, (uintptr)regs + 0x894);
                        tmp = readl((uintptr)regs + 0x894);
                        printk("USB20H syn01 register : 0x%x\n", tmp);

                        tmp = readl((uintptr)regs + 0x89c);
                        tmp |= 0x1;
                        writel(tmp, (uintptr)regs + 0x89c);
                        tmp = readl((uintptr)regs + 0x89c);
                        printk("USB20H syn03 register : 0x%x\n", tmp);

let us to know your results.

It doesn't resolve my problem yet, but it does work better with the EHCI.



-------------------------------------------------------------------------------------
usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled
usb-ohci.c: hcca branch int  0( 0): ed: 11082;
-------------------------------------------------------------------------------------

(Last edited by fmay on 24 Sep 2008, 22:13)

Hi Guys,

I tried applying the patch for pl2303.c and pl2303.h and received a bunch of errors. So I obtained the files with the patch already applied from kernel 2.4.37-rc1 and built a new image...and got the same results.

usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled

The files modified were located in:

./build_dir/linux-brcm-2.4/linux-2.4.35.4/drivers/usb/serial/


I then tried fmays suggestion, and again got the same results.

usb-ohci.c: OHCI Unrecoverable Error, controller usb-00:03.0 disabled



Just to verify, I'm applying this changes to:

/build_dir/linux-brcm-2.4/linux-2.4.35.4/arch/mips/bcm947xx/pcibios.c

and I'm also deleting the .o files in these directories just in case.

Thanks,
Shawn

I found a Prolific usb dongle (067b/2303) in the junk drawer and ran several experiments comparing the gpv2 and the gu. Both use the same chip, except the gpv2 has a hub. On the gu, if I cat even a small file to the usb dongle, I get the well known unrecoverable usb error. On the gpv2, the usb dongle fails to enumerate (error -32). I'm running the same image on both. Disks enumerate, mount and function on both.

I tried 2.6 (svn 12660) w/ the ehci/ohci patch I posted on the openwrt-dev list last summer with the added initializations from fmay's earlier post. The serial adapter works with both the gpv2 and gu. Disks also work.

The disk controller and the prolific dongle both have two bulk endpoints. The only difference is that dongle also has an interrupt endpoint.

I'm starting to think that this is a bug in either the usb or usb serial code that has been fixed and not back ported to 2.4.

If somebody can suggested a more definitive torture test, I'll try that.

Steve

Hi Steve,

Here are the steps to reproduce:

1. Make sure ser2net is selected when you do a 'make menuconfig' this is located within the 'network' menu

2. load the new image, telnet to the system and replace /etc/ser2net.conf with the following line:

4500:raw:0:/dev/usb/tts/0:19200 NONE 1STOPBIT 8DATABITS

3. Start ser2net from the command line: 'ser2net -u -n -p 4200 &'

4. Create a file called '/jffs/test.sh' and place the following within the file:


#!/bin/sh

while [ 1 ]

do
telnet localhost 4500 </etc/passwd
sleep 1
done

4. chmod 755 /jffs/test.sh

5. Sent data to ser2net (port 4500) by starting the test script './test.sh &'

6. To really beat things up, start another three sessions of test.sh './test.sh &' './test.sh&' './test.sh'

7. To verify that data is actually being sent to ser2net, log on to the monitor port and watch the data come through: 'telnet <router IP> 4200' and from the -> prompt type 'monitor tcp 4500' you will then be able to verify the data that the test scripts are sending

8. While this is happening, you can type 'free' to watch the available memory disappear


Within five to ten minutes or so you should see the crash. If it does not happen type 'killall test.sh' then restart the test scripts, this will give it a good kick

Please note: when the OHCI error occurs, you will lose all network connectivity, the only was to see the errors is with a local console attached to /dev/tts/0


Let me know if you have any problems reproducing.


Also a couple of notes, may or may not be relevant: I was able to get the USB 2.0 storage working perfectly with Kamikaze 2.4 kernel a month or so ago. I beat the heck out of it and it didn't even miss a beat. This however that does not use OHCI if I remember correctly. Also on Whiterussian 0.9 with an Asus 500GPv1, the Pl2303 has been working perfectly for months.


Thanks,
Shawn

Shawn,

Thanks. I'll give it a try and report.

FWIW: If you rmmod ehci-hcd, the disk will use the ohci driver. You can look at /proc/bus/usb/devices to confirm that it enumerates at 12Mhz.

Steve

Hi Steve,

I did a rmmod on my other router running the usb disk....that system is running v12230 with some patches.

cat /proc/bus/usb/devices provides the following information:

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc=  0/900 us ( 0%), #Int=  0, #Iso=  0
D:  Ver= 1.10 Cls=09(hub  ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 0.00
S:  Product=USB OHCI Root Hub
S:  SerialNumber=b8003000
C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=255ms


I also tried my test on my other 520gu running build 12230. This system has a solid USB storage system. When running the serial test with OHCI it fails the same way build 12632 does.


Thanks,
Shawn

Paydirt. I get a continuous output of the following sequence.

handlers:                                                                       
[<c00857d8>] (usb_hcd_irq+0x0/0x154 [usbcore])                                  
[<c00857d8>] (usb_hcd_irq+0x0/0x154 [usbcore])                                  
Disabling IRQ #5                                                                
irq 5: nobody cared (try booting with the "irqpoll" option)                     
Call Trace:                                                                     
[<800094c4>] dump_stack+0x8/0x34                                                
[<8005108c>] __report_bad_irq+0x4c/0xac                                         
[<80051354>] note_interrupt+0x268/0x2d4                                         
[<8005184c>] handle_percpu_irq+0x74/0xac                                        
[<80001e10>] plat_irq_dispatch+0x168/0x1d8                                      
[<80002ca4>] ret_from_irq+0x0/0x4                                               
[<8002b454>] __do_softirq+0x4c/0x100                                            
[<8002b564>] do_softirq+0x5c/0x94                                               
[<80001e28>] plat_irq_dispatch+0x180/0x1d8                                      
[<80002ca4>] ret_from_irq+0x0/0x4                                               
[<8006ed3c>] do_mmap_pgoff+0x14/0x334                                           
[<8000834c>] old_mmap+0x9c/0x150                                                
[<8000b4b0>] stack_done+0x20/0x3c

At least this might be easier to track down.

The test ran on the gpv2 for 20min w/o problems.

Next, I put the 4 port hub in my monitor between the dongle and the gu. It's been running for an hour w/o a problem. The free memory is stable at 1960k. It explains why the gpv2 doesn't have the problem.

Steve

Very sensible problem:

/ # cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  14639104 13848576   790528        0  1372160  3702784
Swap:        0        0        0
MemTotal:        14296 kB
MemFree:           772 kB
MemShared:           0 kB
Buffers:          1340 kB
Cached:           3616 kB
SwapCached:          0 kB
Active:           2424 kB
Inactive:         2564 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        14296 kB
LowFree:           772 kB
SwapTotal:           0 kB
SwapFree:            0 kB


After two seconds and before it freeze:

/ # cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  14639104 14376960   262144        0  1478656  4091904
Swap:        0        0        0
MemTotal:        14296 kB
MemFree:           256 kB
MemShared:           0 kB
Buffers:          1444 kB
Cached:           3996 kB
SwapCached:          0 kB
Active:           2644 kB
Inactive:         2832 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        14296 kB
LowFree:           256 kB
SwapTotal:           0 kB
SwapFree:            0 kB

(Last edited by fmay on 25 Sep 2008, 13:16)

Here's some more detail on the failure.

So far,

- The UE is claimed to most often be caused a dma failure.

- Both the hub I used as well as the one in the gpv2 has a "transaction translator" that converts the 12meg transaction to a 480meg transaction. Thus, with the hub, the serial dongle is handled by the ehci, not the ohci driver. This is confirmed by the debug output when using either hub. So, the problem seems specific to the ohci part of the core.

- I searched the histories in the kernel.org git repository and cannot find anything that might bare on this problem subsequent to the 2.6.25.6 kernel used by Openwrt. I suppose I could have missed something.

- I doubt I can get much interest on the linux-usb list unless the failure is demonstrated on a current kernel. Updating Openwrt to a 2.6.27-rcx tree is, at least for me, non-trivial.

- Lastly, I don't understand where the irq 5 complaint is coming from. It looks like this is an unhandled usb interrupt. I put a printk in the ohci interrupt handler where it exits with a  IRQ_NOTMINE. It never went off.

Anybody have an idea of where to go from here?

Steve


ohci_hcd ssb0:1: OHCI Unrecoverable Error, disabled
ohci_hcd ssb0:1: OHCI controller state
ohci_hcd ssb0:1: OHCI 1.0, NO legacy support registers
ohci_hcd ssb0:1: control 0x0af HCFS=operational BLE IE PLE CBSR=3
ohci_hcd ssb0:1: cmdstatus 0x00004 SOC=0 BLF
ohci_hcd ssb0:1: intrstatus 0x00000030 FNO UE
ohci_hcd ssb0:1: intrenable 0x8000005e MIE RHSC UE RD SF WDH
ohci_hcd ssb0:1: ed_controlhead 00bbe000
ohci_hcd ssb0:1: ed_bulkhead 00bbe040
ohci_hcd ssb0:1: ed_bulkcurrent 3c1f8000
ohci_hcd ssb0:1: hcca frame #9de4
ohci_hcd ssb0:1: roothub.a 02001202 POTPGT=2 NOCP NPS NDP=2(2)
ohci_hcd ssb0:1: roothub.b 00000000 PPCM=0000 DR=0000
ohci_hcd ssb0:1: roothub.status 00008000 DRWE
ohci_hcd ssb0:1: roothub.portstatus [0] 0x00000103 PPS PES CCS
ohci_hcd ssb0:1: roothub.portstatus [1] 0x00000100 PPS
ohci_hcd ssb0:1: HC died; cleaning up
irq 5: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[<80009524>] dump_stack+0x8/0x34
[<8004f79c>] __report_bad_irq+0x4c/0xac
[<8004fa64>] note_interrupt+0x268/0x2d4
[<8004ff5c>] handle_percpu_irq+0x74/0xac
[<80001e10>] plat_irq_dispatch+0x168/0x1d8
[<80002cc4>] ret_from_irq+0x0/0x4
[<8007b51c>] cache_alloc_debugcheck_after+0x190/0x230
[<8007d678>] kmem_cache_alloc+0x1d8/0x220
[<8016dd30>] __alloc_skb+0x58/0x148
[<801b7a14>] tcp_send_ack+0x38/0x104
[<801bb078>] tcp_delack_timer+0x180/0x284
[<8002e8f8>] run_timer_softirq+0x158/0x1fc
[<80029ae4>] __do_softirq+0x78/0x100
[<80029bc8>] do_softirq+0x5c/0x94
[<80001e28>] plat_irq_dispatch+0x180/0x1d8
[<80002cc4>] ret_from_irq+0x0/0x4
[<8007ad50>] check_poison_obj+0x50/0x214
[<8007b3cc>] cache_alloc_debugcheck_after+0x40/0x230
[<8007d678>] kmem_cache_alloc+0x1d8/0x220
[<8008b008>] getname+0x20/0xd4
[<8007f294>] do_sys_open+0x30/0xcc
[<8000b510>] stack_done+0x20/0x3c

handlers:
[<c0087230>] (usb_hcd_irq+0x0/0x214 [usbcore])
[<c0087230>] (usb_hcd_irq+0x0/0x214 [usbcore])
Disabling IRQ #5

Hi Steve,

From what I understand, the radio doesn't work with the 2.6 kernel on the BCM5354. This would make it very difficult for me to upgrade as well.

I heard a report of someone getting the radio to run in client mode, although I tried with no success. Client mode is all I currently need for my project, if someone could shed some light on this it would be greatly appreciated.

In the mean time, hopefully someone can point us in the right direction to getting this one resolved.

Thanks,
Shawn

Shawn,

The idea is to use a more current kernel to track down the problem and then backport it to 2.4. I think a more current one will provide a better debugging environment and the likelihood of getting help from the linux-usb gang.

Also, I tested an Alix board (x86) with both 2.6.25.16 and 2.6.27-rc6. It also has an OHCI interface. Neither version exhibited the usb problem or presumed memory leak. At this point, the problem is probably either a) ohci hardware, b) ssb driver code, c) ssb-specific parts of the usb code or d) some corner case in the usb code. My suspicion is the last one. The ohci code has lots of workarounds for specific controller bugs. Maybe it needs one more. I've extensively tested several disk controllers using the ohci (not ehci) driver and can't get them to fail.  One explanation is that something in the handling of interrupt endpoints plus a bug in the ohci hardware is causing this. The disk controllers don't have an interrupt endpoint.

I'm going to revisit getting a current kernel to run on my gu. Getting help on linux-usb for an old kernel and weird hardware is unlikely.

I'm sure open to suggestions,

Steve

An update.

I'm using a 2.6.25.26 kernel with the ehci driver not configured.

The problem seems to be caused by a bogus value in the ohci controller's ed_bulkcurrent reg. In the above dump it was 3c1f8000. There was never anything allocated at that location, so the controller goes looking for td's, probably gets a bus error and reports a UE. It's downhill from there. All the failures are similar in that there is a random bad value in the ed_bulkcurrent reg. There are routines in ohci-dbg to dump the bulk queue. I'm trying to get them hooked up to see if the bad address is in the data structures or just the controller reg.

If anybody is familiar with this stuff, I could sure use some help. I still don't have anything worth posting to linux-usb.

Steve

Hi Steve,

I'm willing to hire someone to fix this problem if possible. I will in return make sure the fix get's incorporated with SVN for others to make use of. Any suggestions as to who may be willing to spend some time on this one?

Thanks,
Shawn

I reproduced the problem on 2.6.27 and posted the result to linux-usb on 10/15. I got no reply.

It appears to me that the problem is a bug in the ohci core. A dump of the queue at the time of the error shows a reasonable looking ed, but the ed_bulkcurrent register in the controller is bogus. The UE is caused when the controller tries to load data referenced by that reg. The address isn't in the address space, the controller gets a bus error and sets the UE interrupt. My reading the spec is that the loading of this reg is under control of the hardware. A zero is written to it by ohci-q and ohci-hcd to initialize the controller.  Otherwise, it's untouched. There is only one ed in the queue. I've got no idea how the reg gets screwed up.

Other circumstantial evidence.
  - The ohci controller on my Alix will run the test for hours without problem. This would seem to eliminate the ohci  and serial drivers.
  - The ehci core and the hub w/ a transaction translator on the 500gpv2 will also run the test without problem. This would seem to also suggest that the serial driver is not the problem.

I'm not sure where to go.

Steve

Hi Steve,

Do you think I'd have better luck using a FTDI based USB to serial adapter?

There are some available with the FT232R chip, they claim that they are USB 2.0, and EHCI host controller compatible. If this is the case, from the way I understand this issue, they should load up the EHCI driver and work fine. Any thoughts on this?

I've attached a bit more info on the chip below, let me know what you think.

Thanks,
Shawn

http://www.ftdichip.com/Products/FT232R.htm



- Single chip USB to asynchronous serial data transfer interface.
- Entire USB protocol handled on the chip - No USB-specific firmware programming required.
- UART interface support for 7 or 8 data bits, 1 or 2 stop bits and odd / even / mark / space / no parity.
- Fully assisted hardware or X-On / X-Off software handshaking.
- Data transfer rates from 300 baud to 3 Megabaud (RS422 / RS485 and at TTL levels) and 300 baud to 1 Megabaud (RS232).
- In-built support for event characters and line break condition.
- New USB FTDIChip-IDâ„¢ feature.
- New configurable CBUS I/O pins.
- Auto transmit buffer control for RS485 applications.
- Transmit and receive LED drive signals.
- New 48MHz, 24MHz,12MHz, and 6MHz clock output signal Options for driving external MCU or FPGA.
- FIFO receive and transmit buffers for high data throughput.
- 256 Byte receive buffer and 128 Byte transmit buffer utilising buffer smoothing technology to allow for high data throughput.
- Adjustable receive buffer timeout.
- Synchronous and asynchronous bit bang mode interface options with RD# and WR# strobes.
- New CBUS bit bang mode option.
- Integrated 1024 bit internal EEPROM for I/O configuration and storing USB VID, PID, serial number and product description strings.
- Device supplied preprogrammed with unique USB serial number.
- Support for USB suspend / resume.
- Support for bus powered, self powered, and high-power bus powered USB configurations.
- Integrated 3.3V level converter for USB I/O .
- Integrated level converter on UART and CBUS for interfacing to 5V - 1.8V Logic.
- True 5V / 3.3V / 2.8V / 1.8V CMOS drive output and TTL input.
- High I/O pin output drive option.
- Integrated USB resistors.
- Integrated power-on-reset circuit.
- Fully integrated clock - no external crystal, oscillator, or resonator required.
- Fully integrated AVCC supply filtering - No separate AVCC pin and no external R-C filter required.
- UART signal inversion option.
- USB bulk transfer mode.
- 3.3V to 5.25V Single Supply Operation.
- Low operating and USB suspend current.
- Low USB bandwidth consumption.
- UHCI / OHCI / EHCI host controller compatible.
- USB 2.0 Full Speed compatible.
- -40°C to 85°C extended operating temperature range.
- Available in compact Pb-free 28 Pin SSOP and QFN-32 packages (both RoHS compliant).

I have one of these and use it as the console interface for my 520gu. It has 3.3v I/O. It doesn't have rs232 level converters. Also, it only enumerates as a 12MHz device. I unloaded my pc's uhci driver and plugged it in. It won't enumerate w/ just the ehci driver.

I'm still convinced that there is a bug in the usb2 core in the 5354. I just don't know how to identify it. The "memory leak" is probably a clue and is probably caused by the core bug. I have an Alix 3c2 that also has a ohci controller. It and the ohci and pl2303 drivers run without any problem with the same usb serial dongle. I appreciate one is mips and the other i386, but can't imagine that has anything to do with the problem. When I have a time, I'll see if I can figure out if the memory disappearing in the ohci driver, the pl23203 driver or the ser/net program.

Steve

Steve,

Thanks for the info.

Again, I appreciate all of the time you put in to this one.

Shawn

Steve,

Here is a strange one...I'm now using a Belkin F5U109 USB to serial adapter with OHCI and guess what....it works fine....no problems, no memory leaks, and I beat the daylights out of it for three days straight! I'll attach the DMESG log below.

I also tried a FTDI converter with the FT232BL chip. No luck, runs about 20 seconds, the quits. Although they claim that you can set it to run in 2.0 mode, it always grabs the OHCI driver and won't run with the EHCI.

I wonder why the Belkin runs ok? I just wish this little guy wasn't twenty something bucks and hard so hard to find. sad

Thanks,
Shawn



SB USB20H init
SB COREREV: 2
SB USB20H resetting
USB20H fcr: 0x64
USB20H shim cr: 0x8f7
PCI: Setting latency timer of device 00:03.0 to 64
usb-ohci.c: USB OHCI at membase 0xb8003000, IRQ 6
usb-ohci.c: usb-00:03.0, PCI device 14e4:471a
usb.c: new USB bus registered, assigned bus number 1
usb.c: kmalloc IF 80de7c60, numif 1
usb.c: new device strings: Mfr=0, Product=2, SerialNumber=1
usb.c: USB device number 1 default language ID 0x0
Product: USB OHCI Root Hub
SerialNumber: b8003000
hub.c: USB hub found
hub.c: 2 ports detected
hub.c: standalone hub
hub.c: ganged power switching
hub.c: no over-current protection
hub.c: Port indicators are not supported
hub.c: power on to power good time: 4ms
hub.c: hub controller current requirement: 0mA
hub.c: port removable status: RR
hub.c: local power source is good
hub.c: no over-current condition exists
hub.c: enabling power on all ports
usb.c: hub driver claimed interface 80de7c60
usb.c: kusbd: /sbin/hotplug-call add 1
hub.c: port 1, portstatus 101, change 1, 12 Mb/s
hub.c: port 1 connection change
hub.c: port 1, portstatus 101, change 1, 12 Mb/s
usb.c: registered new driver serial
usbserial.c: USB Serial support registered for Generic
usbserial.c: USB Serial Driver core v1.4
hub.c: port 1, portstatus 101, change 0, 12 Mb/s
hub.c: port 1, portstatus 101, change 0, 12 Mb/s
usbserial.c: USB Serial support registered for MCT U232
mct_u232.c: Magic Control Technology USB-RS232 converter driver z2.0
hub.c: port 1, portstatus 101, change 0, 12 Mb/s
hub.c: port 1, portstatus 101, change 0, 12 Mb/s
hub.c: port 1, portstatus 103, change 10, 12 Mb/s
hub.c: new USB device 00:03.0-1, assigned address 2
usb.c: kmalloc IF 80de73e0, numif 1
usb.c: new device strings: Mfr=1, Product=2, SerialNumber=3
usb.c: USB device number 2 default language ID 0x409
Manufacturer: Belkin USB PDA Adapter
Product: Belkin Components
SerialNumber: 762841
usbserial.c: MCT U232 converter detected
usbserial.c: MCT U232 converter now attached to ttyUSB0 (or usb/tts/0 for devfs)
usb.c: serial driver claimed interface 80de73e0
usb.c: kusbd: /sbin/hotplug-call add 2
hub.c: port 2, portstatus 100, change 0, 12 Mb/s
usbserial.c: USB Serial support registered for PL-2303
pl2303.c: Prolific PL2303 USB to serial adaptor driver v0.10.1

(Last edited by shawnperkins on 23 Nov 2008, 23:21)

The discussion might have continued from here.