Why are router SoCs plagued with outdated CPU architectures?

Respectfully, I believe that is not true. I can't tell you how many times I have seen China-only versions of routers, some crippled with way less flash memory or RAM than their counterparts designed for capitalist markets. If what you say were true, then routers made and sold exclusively in communist China would have MORE flash, RAM, etc., not less... because, you know, those darn capitalists demand only the cheapest...

I was thinking more of clothes and cars and so on…

Memory chips never become expendable and "cheap" like CPUs, since they are built into everything in huge numbers.
In the end, if someone finds unused memory chips lying around, they get sold and built into USB flash drives or SSDs.

Memory is actually one of the very few products in the world sold at a daily spot price based on availability instead of a fixed price.

The CPUs found in the Raspberry Pi are leftovers from old cell phone designs that were never manufactured.

Is the router CPU strong enough to do the job while also cheap enough for a consumer device? That's all that matters.

NAS functionality in a router or network device in general is secondary. Low power consumption and overall cost are primary concerns for the designers.

Yeah no.
Chips are like tiny circuit boards, with small traces for electric current to move between the small components in them.

Manufacturing chips is like CNC machining: there is a laser "drill" that etches the electronic circuit, the transistors, and whatever else onto a special silicon surface.

This laser CNC can be more or less precise, so it can make the circuits smaller or bigger.

Due to how electronics works, what is OK for a particular size of circuit is NOT OK for another. So each CPU design is bound to a specific CNC laser size.

The "size" is called "process node". So chips designed to be manufactured at 14nm (nanometers) process node can only be built by factories that have a laser CNC device that can do that.
Note that this is also not exactly like a CNC, if your factory equipment can make a chip at 14nm, that's all it can do. You cannot take that and use it to make a 28nm or 60nm or whatever just because it is "bigger". So it's very inflexible.

For obvious reasons, older designs are using older (and bigger) process nodes.

The costs of manufacture are related to how many factories there are for a particular process node and how much worldwide demand there is.

In general, older stuff gets cheaper, but then there is a cut-off point where it becomes more expensive, as the bigger factories for that process node are shut down and only smaller ones remain.

While I'm calling the factory's tooling "laser CNC" for simplicity, this is VERY high tech and precise machinery, and this means it is VERY expensive.
This is laser machinery that can etch a silicon transistor just a few nanometers in size. It's VERY small.

Making a new factory for even an existing process node requires more than 5 billion dollars of investment just to buy the machines, and 3-4 years of setup time before the factory is ready to do anything useful. It's a very slow process that lives off future predictions, because you can't afford to invest that much money and time only to realize 2 years later that nobody wants to use it.

This is one of the reasons why there is a so-called "chip shortage" right now and GPUs are so hard to find. There are only two factories in the world that can build the latest GPU chips, and they have been at 100% capacity for at least a year now. There are simply no more available factories to make more of those chips, and "making more factories" requires 2-3 years and stupid amounts of money.

Because they would hurt the sales of their newer CPUs.

But this is economics, and there is an important difference between Intel/AMD and most of the ARM/MIPS/Power/whatever else used outside of the PC world.

Intel and AMD manufacture and sell a CPU (and chipset and whatnot), a physical chip, while ARM/MIPS/Power/whatever sell the license to use the CPU design in a custom chip made by someone else (Qualcomm, Broadcom, MediaTek, Realtek and so on).

As I said above, the costs to design and manufacture actual hardware are VERY high, so you cannot afford to keep making older chips that are still pretty decent and will significantly decrease the sales of your newer stuff.

But ARM/MIPS/Power/whatever don't have these costs. They have made a blueprint, a circuit design for a CPU. So for them it is still profitable to keep selling even very old CPU designs to other companies.

And the companies that actually make the chip don't care about CPU design age or anything like that. They are making a chip for a specific use case.

For example a WiFi router, or a media center, or a smart TV, or a business firewall appliance, or anything else that requires a digital CPU.

The CPU is only one of the components in that chip, which is called an "SoC" or "System on a Chip". There will also be WiFi, Ethernet switching, PCIe and USB controllers, and other components for specific applications, which are also a very important selling point of that product.

As long as their design is good enough to do the job they are selling it for, it is good for them.

That was partly true for the first gen. The Raspberry Pi Foundation got a sweet deal on older media-center/TV-recorder chips from Broadcom, which was trying to get rid of old inventory.

But once Broadcom realized that these mad lads were able to sell even crappy garbage like hot cakes, they got much better treatment.

Some SoCs used in the newer Raspberry Pi 2 and 3 devices were a custom design that Broadcom more or less made just for them, and with the Raspberry Pi 4 they got the full "serious business partner treatment", i.e. they got access to the latest designs for the segment they are in (media center) before official release.

Because yeah, they keep selling like hot cakes, and Broadcom likes that.

Decent power, enough grunt to be a NAS, and dual LAN ports for routing. Or get a decent-ish router with WiFi and snap up a Raspberry Pi to do your NAS work.

There was a time when picking up an all-in-one device to do your home network was a no-brainer.

However, with the advance of tech and the ease of use of things like the Pi, you might just find that separate networking components are easier. Get a router like the R4S, then maybe a WiFi unit like a Ubiquiti AC-Lite (because it will be miles better than most ISP routers' WiFi), throw in a cheap 8-port switch, or even an ex-corporate rackmount switch if you have lots of devices to wire up, and you end up with a much better solution. However, it comes down to how much of a solution you want versus how much cash you want to spend.

If only the CPU were the weak spot of that SoC. The real trick is finding the internal bandwidth to move data to and from it. The thing just about manages to route 1 Gbit total across all peripherals (not counting switched data, which doesn't touch the CPU). 2013 notwithstanding, the host should be able to process 4 times that without getting warm, but it'll never get the chance.

Well, the IPQ40xx can't route any better than 1 Gbit, as the internal switch has a single 1 Gbit upstream port to the Ethernet controller, hence the bottleneck.
But you have to understand that this thing was awesome when it came out: it had a quad Cortex-A7 CPU with NEON/VFPv4-D32, a 5-port gigabit switch, and, more importantly, dual-band concurrent 802.11ac Wave 2 radios, and it was cheap.

That's what I was saying: the CPU is not the problem with that SoC; it's capable of processing more data than you can actually deliver to it. That's not to say you can't max it out if you throw a lot of encryption demands at it, but for most general networking purposes, that quad A7 is not the problem.

Ok, I misunderstood you then.
Yeah, encryption is not its strong point. Despite it having an encryption engine, the engine itself is often actually slower than using the CPU, as it only handles one block size fast.
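
If you want to see how much chunk size matters for crypto throughput, a quick way is to time the same encryption fed in different chunk sizes. Below is a minimal Python sketch of that idea (assuming the `cryptography` package is installed; a hardware engine would normally be driven through the kernel, so this only illustrates the measurement method, not the engine itself):

```python
import os
import time

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
iv = os.urandom(16)
data = os.urandom(1 << 22)  # 4 MiB of random plaintext

# Feed the same data in different chunk sizes; per-call overhead (and, on a
# hardware engine, per-request setup cost) shows up as lower throughput for
# small chunks.
for chunk in (64, 256, 1504, 16384):  # bytes, multiples of the AES block size
    enc = Cipher(algorithms.AES(key), modes.CBC(iv),
                 backend=default_backend()).encryptor()
    start = time.perf_counter()
    for i in range(0, len(data), chunk):
        enc.update(data[i:i + chunk])
    enc.finalize()
    elapsed = time.perf_counter() - start
    print(f"{chunk:6d}-byte chunks: {len(data) / elapsed / 1e6:6.1f} MB/s")
```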

This note is in part derived from what I hope will be remembered as a hilarious short-term bug in Starlink's beta firmware: ( https://www.youtube.com/watch?v=c9gLo6Xrwgw ). But, in poking further into that chipset, it is looking like the codel portion of the offload is broken, and I'm not sure how well the WiFi performs. While my fervent wish is to see embedded CPUs with more memory bandwidth and fewer offloads, most of the market forces on the chipmakers have leaned the other way for a decade: trying to reduce power, trying to improve throughput (if not latency), trying to get vendor lock-in with proprietary APIs, trying to save cents on the BOM in an era of bad benchmarks and inherited IP.

It would be my hope that router makers have learned from the chipmakers' treatment of "software" as if it were transistors, and the painful abandonment at the end of each 18-month cycle, in light of ISPs wanting support for CPE of 8 years or more, and have begun to re-insist on full sources, but that's just a hope. It has always been better for smart users to go x86, but with complicated interfaces like VDSL and cable, people are increasingly stuck with the box the ISP supplies them.

I'm still unsure if the dishy can be fixed, and unsure if the ip8014 can be either (still looking for confirmation on the codel issue, btw).

Maybe in the coming years RISC-V and other open hardware stacks like openwifi will reopen the edge to innovation.

Actually, I don't quite see those performance numbers on the IPQ4019; the best I can achieve without software flow offloading is around 360 MBit/s (no VLAN tagging, no PPPoE, plain DHCP on WAN, routing, NAT). It's still a pretty nice chipset (and software flow offloading can extend that by quite a bit), apart from the pesky switch driver.
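
For context on why flow offloading helps so much: routing cost on these SoCs is mostly per packet, so it's useful to translate a Mbit/s figure into packets per second. A rough back-of-envelope sketch in Python (the packet sizes are just assumptions for illustration):

```python
# Convert a measured routing throughput into an approximate packet rate.
# The CPU's NAT/routing work is roughly per packet, so the same kpps
# budget yields very different Mbit/s depending on packet size.
measured_bps = 360e6  # the ~360 MBit/s figure above

for pkt_bytes in (64, 512, 1500):  # assumed average packet sizes
    pps = measured_bps / (pkt_bytes * 8)
    print(f"{pkt_bytes:5d}-byte packets: ~{pps / 1e3:6.0f} kpps")
```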

In short, you need lots of smart people in order to make working and usable products and both time and costs add up at each level of design and manufacturing.

Generally it takes time to develop the IP blocks and software around the SoC, and there's also power efficiency, factory time for a given process node, and various licensing costs meant to cover the R&D.

Also, demand for newer processes relative to old ones keeps rising, and there are just too many different manufacturers competing against each other and driving to differentiate themselves from the competition in some unique and useful way. This is driving costs up and introducing delays.

For example, if you look at the Cortex-X1, with its wide decoders, deep speculative pipelines and fancy branch caches that give it that nice high IPC, it needs a lot of transistors, and all of them are doing lots of work. It's mostly made on relatively fancy Samsung 7nm FinFET and TSMC 7nm and 5nm processes, which are in very high demand and end up relatively expensive per chip. That's what it takes to hit the power targets and clocks.

I think most A53 cores are built on various 28nm processes, for which capacity is relatively plentiful these days.

You could probably synthesize a few X1 cores at 28nm and get 500 MHz clock rates out of it, maybe more with active cooling... at that point an A53 might actually be faster.
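
The intuition there is just performance ≈ IPC × clock. A toy comparison in Python (both IPC figures below are made-up assumptions for illustration, not measurements):

```python
# Back-of-envelope: sustained performance ~ IPC * clock.
# The IPC values are illustrative guesses, not measured numbers.
cores = {
    "Cortex-X1, hypothetical 28nm port": (3.3, 0.5),  # (IPC, GHz)
    "Cortex-A53, typical 28nm part":     (1.1, 1.8),
}
for name, (ipc, ghz) in cores.items():
    print(f"{name}: ~{ipc * ghz:.2f} relative performance")
```

With numbers like these, the downclocked X1 loses despite its much higher IPC, which is the point: the fancy core only pays off on a node that lets it clock high.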

The biggest driver of cost for newer processes is that you need to fund the salaries of the R&D people, who are always working on new stuff, not the cost of materials. Your ROI also needs to cover the initial deployment costs of manufacturing (when you're installing equipment at a factory, it'll sit idle for days or even months because the stars won't align for whatever reason; there are both real and opportunity costs to that, in addition to time delays). So, if Samsung or TSMC think they can charge X dollars per chip based on factory-floor time and chip design, and reinvest the money into hiring and growth, they probably will.

And so, when it comes to home routers, they're probably going to be based on left-over IP from phone SoCs, married to recycled networking IP blocks updated for a newer process, and designed to reuse whatever factory capacity was freed up once phones and TVs no longer needed it.

And so, simply because most folks won't pay $500+ for their home router, and some folks can probably still sell your 2010-era 700 MHz MIPS 74Kc routers in fancy new triangular plastic boxes somewhere in the world where income is low and internet is slow, you end up with old and cheap stuff.

...

You could totally make a "home router" today with 25Gbps SFP ports, and a couple of those connected to an M1 Mac mini.

I don't think there is a bottleneck, because hardware NAT is done within the switch.

To be clear, that is not how ICs are made. The circuits and micro-devices aren't made by anything like a laser drill. They are made using a multi-stage photolithography process, and the circuits are built up in layers. A pattern is printed onto a big wafer, one processor at a time. Once the entire wafer has been printed, it's processed to etch away some of the silicon, alter its properties, or lay down a layer of metal. This process is repeated multiple times to achieve a finished wafer, which is then diced into individual chips.

It is absolutely the case that chip designs and process nodes move pretty much hand in hand. Newer designs are faster because they find ways to use more transistors, transistors that are made available by new process nodes. New process nodes are expensive, and get even more expensive as the technology progresses.

Home router CPUs don't advance quickly, for multiple reasons. The price points are very low, $100-150/unit, which doesn't fund a lot of innovation. Most of the work is done by specialized ICs or functional blocks, and the work being asked of them does not increase quickly: home wired networks have been gigabit for years, and WiFi standards advance slowly.

The most expensive part of a home router is the colored carton it arrives in anyway.

The next most expensive part is the designed plastic chassis.

Do you remember in the '90s and '00s when we could buy PC sound cards and graphics cards in one of these forms:
Retail = colorful carton with drivers and papers and the card, for 100€.
Bulk = the same as retail but in a brown carton, for 50€.
OEM = just the card in an ESD bag, for 25€.

And everyone still made their profit, even at the OEM price.

Not really; the switch inside is a modified QCA8337N, which has some HNAT capabilities, but they are not used at all by QSDK or OpenWrt.

Yeah, I know. I think I said that the "laser CNC" is an oversimplification (photolithography as a whole is still a process where you are removing material to etch something, and there are lasers involved at some point, so it's "close enough" for a quick explanation).
I wanted to focus on the parts that actually answer the question. My post got pretty big already.

Actually, every router I have worked on has a bottleneck. It's a hardware implementation problem: HWNAT is usually done in the SoC, not in the switch. The switch is just a normal layer 2 switch chip, while HWNAT is a layer 3 (IP) implementation. You only get full switching speed in bridge mode.

Take the MT7621, for example. It connects the switch to the HWNAT engine in the SoC with a 2 Gbit port. The MT7620 has a 1 Gbit port only. The Realtek 8197F/G has a 1 Gbit RGMII port for an external switch. I don't know about QSDK, but I think the design is the same...
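
The arithmetic behind that bottleneck is easy to sketch. In a simplified model (which ignores the HWNAT engine's own limits), every routed packet crosses the full-duplex uplink twice, once switch-to-CPU and once CPU-to-switch, so WAN download plus upload together must fit in a single direction's capacity:

```python
# Simplified model: each routed (NAT) packet traverses the switch-to-SoC
# uplink once in each direction, so download + upload must fit within one
# direction of the full-duplex link.
def uplink_fits(uplink_gbps: float, down_gbps: float, up_gbps: float) -> bool:
    return down_gbps + up_gbps <= uplink_gbps

print(uplink_fits(2.0, 1.0, 1.0))  # MT7621-style 2 Gbit uplink: True
print(uplink_fits(1.0, 1.0, 1.0))  # MT7620-style 1 Gbit uplink: False
```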

This is true most of the time, but do take a look at the QCA8337N datasheet.
