Bringing up BCM3380 - extremely slow instruction execution?

I'm trying to bring up the BCM3380 on a VMDG480. Things mostly look OK: the mainline kernel already has bcm63xx support, so the work mostly involves tweaking the register definitions by cross-referencing open source router firmware for the same SoC. The one thing blocking me so far is slow execution speed: the kernel reports a BogoMIPS of 3.69 (lpj=7392), and it takes more than 5 minutes to execute a simple loop of 333,000,000 integer additions (the processor is supposed to be running at 333MHz).
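
To put rough numbers on this, here is a small C sketch of how the reported figures relate. The BogoMIPS arithmetic mirrors how the kernel derives the printed value from lpj. HZ=250 is an assumption on my part: it is simply the value that makes lpj=7392 come out as 3.69, not something read from the actual config. The function names are made up for illustration, and the slowdown estimate assumes an ideal of one addition per clock cycle.

```c
#include <assert.h>

/* Assumption: HZ=250, inferred from lpj=7392 <-> 3.69 BogoMIPS. */
#define ASSUMED_HZ 250UL

/* Integer part of the printed BogoMIPS value: lpj / (500000 / HZ). */
unsigned long bogomips_int(unsigned long lpj, unsigned long hz)
{
    assert(hz > 0 && hz <= 5000);
    return lpj / (500000UL / hz);
}

/* Two-digit fractional part: (lpj / (5000 / HZ)) % 100. */
unsigned long bogomips_frac(unsigned long lpj, unsigned long hz)
{
    assert(hz > 0 && hz <= 5000);
    return (lpj / (5000UL / hz)) % 100UL;
}

/*
 * Rough slowdown factor: observed run time divided by the ideal run time,
 * where the ideal is one loop iteration per clock cycle (hypothetical
 * best case, ignoring loop overhead).
 */
double slowdown_factor(double iterations, double clock_hz, double observed_s)
{
    assert(iterations > 0.0 && clock_hz > 0.0);
    return observed_s / (iterations / clock_hz);
}
```

With lpj=7392 and ASSUMED_HZ, bogomips_int() and bogomips_frac() give 3 and 69, matching the boot log. slowdown_factor(333e6, 333e6, 300.0) gives 300, so the loop is running a few hundred times slower than the clock rate alone would suggest.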

A couple of things I tried:

  • I noticed the stock bootloader disables cacheability on kseg0 before jumping to Linux, so I turned it back on with change_c0_config(CONF_CM_CMASK, CONF_CM_CACHABLE_NONCOHERENT), but that didn't help. I also checked that the kernel is running from 0x80010000, which is indeed in kseg0.
  • Disabled all hardware block clocks except the main processor clock.

Does anyone have any suggestions I could try out?

Figured it out myself. It looks like I need to turn on caching for kseg0 in the CP0 Config register, as well as in a Broadcom-specific config register, $22 (CP0_BCM_CFG_ICSHEN | CP0_BCM_CFG_DCSHEN).
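
For anyone hitting the same thing, here is a minimal sketch of the two register updates, written as plain functions on 32-bit values so the bit manipulation can be checked off-target. The CONF_CM_* values match the kernel's asm/mipsregs.h (K0 field, bits 2:0 of Config). The bit positions for CP0_BCM_CFG_ICSHEN and CP0_BCM_CFG_DCSHEN (31 and 30) are an assumption based on Broadcom bootloader headers and should be checked against the firmware sources for your SoC. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* K0 (kseg0 cache attribute) field of CP0 Config, as in asm/mipsregs.h. */
#define CONF_CM_CACHABLE_NONCOHERENT 3u
#define CONF_CM_CMASK                7u

/* ASSUMED bit positions in Broadcom config register $22; verify them. */
#define CP0_BCM_CFG_ICSHEN (UINT32_C(1) << 31)  /* I-cache enable */
#define CP0_BCM_CFG_DCSHEN (UINT32_C(1) << 30)  /* D-cache enable */

/* Return a Config value with kseg0 set to cacheable, non-coherent. */
uint32_t config_with_kseg0_cached(uint32_t config)
{
    uint32_t updated = (config & ~CONF_CM_CMASK) | CONF_CM_CACHABLE_NONCOHERENT;

    /* Only the K0 field may change; all other bits must be preserved. */
    assert((updated & ~CONF_CM_CMASK) == (config & ~CONF_CM_CMASK));
    return updated;
}

/* Return a $22 value with both cache-enable bits set. */
uint32_t bcm_cfg_with_caches_enabled(uint32_t bcm_cfg)
{
    return bcm_cfg | CP0_BCM_CFG_ICSHEN | CP0_BCM_CFG_DCSHEN;
}
```

On the target, the first step is what change_c0_config(CONF_CM_CMASK, CONF_CM_CACHABLE_NONCOHERENT) already does. The second would be a read-modify-write of $22 using whatever CP0 accessor your tree provides for that register. As described above, the Config change by itself was not enough; the speedup only came once both were set.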