[Proxmox] Unable to reach ISP speed of 1gbps

I'm currently using an HP ProDesk G3 Mini (i5-6500T) as an OpenWrt router, and the max speed I can get from it is 800/550. On my old ISP's router I was able to reach 1000/500 without any problem, and my mini PC should definitely be more powerful than a router from 2015. Does anyone know what is wrong with my mini PC, or do I need to enable some setting in OpenWrt?

Software flow offloading enabled
Packet steering enabled
Not using SQM

It should, easily, even with SQM (and even more so without it).

Have a look at htop during a speed test for the CPU utilization nevertheless (I'd suggest setting show_cpu_frequency=1, e.g. by hitting F2 (Setup) and toggling Display options, Also show CPU frequency). dmesg may also have complaints about something on your system.

Are you using two dedicated ethernet cards (which) or a "one-armed router (on a stick)" setup with only a single NIC and an external switch?

In htop, when running the speedtest, it seems that only one core is running at 100% while the others are at 20-25%. And I'm using a single NIC + external switch.

There is a reason why I asked about the frequency, as the PC might not have any reason to fully clock up to achieve the necessary speed (the i5-6500T can clock between 2.5-3.1 GHz, 4 cores, no HT).

I rather suspect the one-armed router setup here. Keep in mind that your single Ethernet card has to act as WAN and LAN at the same time - 1 GBit/s in, 1 GBit/s out, which is twice what it can do. A second Ethernet card is required to get full speed on a 1 GBit/s connection; a Realtek RTL815x USB3 adapter should already do, while PCIe is obviously the better solution if you have a slot available.


Alright, I'll try to get my hands on the Realtek RTL815x and will update with the result.

I do need some kind of USB-to-Ethernet adapter, right? I've got no PCIe slot.

This https://www.tp-link.com/en/home-networking/usb-hub-and-converter/ue300/ (rtl8152) seems to be quite popular among RPi4 users (which also comes with a single network card and relies on USB3 for adding a second one), I have (almost) no personal experience with USB3 network cards myself (so feel free to follow up with the OpenWrt on RPi4 crowd).

I just got back home. I took a look at the CPU frequency, and just as you said, the frequency is frozen at 2496 MHz both when idle and when speedtesting.

A single gigabit network interface should be just enough to provide more than 800 MBit/s - even when shared for WAN and LAN, and even with the VLAN overhead (which is negligible for normal-length Ethernet packets) - as it can provide its bandwidth in full duplex.
Also, a 6500T core should normally not be utilized at 100% when packets are just being forwarded.

What's your configuration, is there any SQM or filtering involved? Normally, you don't need/want SQM on a very fast downstream link.

Can you get better stats from htop/top? The top section shows "interrupt" and "waiting" percentages - do these go up when you utilize the downstream? Which processes use the CPU? 2.5 GHz is fine for the 6500T, although it neither powers down nor turbos up - you would get 25% more compute (which would even bring you close to the 1000 MBit/s...) if turbo mode worked. I have no experience with OpenWrt on x86, so I don't know where to configure the CPU governor - still, I would expect this only to mask the actual problem of generating too much load in the first place.
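If you want to check the governor and current clock from the shell rather than htop, a small sketch like this could help. It assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/...`); on systems without cpufreq support, or inside some VMs, those files simply don't exist, which the function treats as "unavailable":

```python
# Sketch: read the current cpufreq governor and frequency from sysfs.
# Paths follow the standard Linux cpufreq sysfs layout; on systems
# without cpufreq support (or inside some VMs) they may not exist.
import os

def read_cpufreq(cpu=0):
    """Return (governor, cur_freq_khz) for a CPU, or (None, None) if unavailable."""
    base = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq"
    try:
        with open(os.path.join(base, "scaling_governor")) as f:
            governor = f.read().strip()
        with open(os.path.join(base, "scaling_cur_freq")) as f:
            freq_khz = int(f.read().strip())
        return governor, freq_khz
    except OSError:
        return None, None

gov, khz = read_cpufreq(0)
if gov is not None:
    print(f"governor={gov}, current frequency={khz / 1000:.0f} MHz")
else:
    print("cpufreq sysfs interface not available (common inside VMs)")
```

If the governor reads `powersave` pinned at base clock, switching it (e.g. to `ondemand`/`schedutil`, where available) is the usual knob - but as said above, that would only mask the real source of the load.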

There is no SQM or filtering configured. Here is a screenshot of htop while running a speedtest.

Here is another htop screenshot when idling.

Yes and no.
Yes, the card can do 1 GBit/s in full-duplex mode - obviously.
But that doesn't directly translate into being able to route between WAN and LAN at 1 GBit/s in the router-on-a-stick configuration on a single interface, as no downloading is possible without uploading (ACKs - and more) and vice versa, depending on the actual speedtest (modern ones - rightfully so - do test concurrent down- and upload). So while the card is certainly operating in full-duplex mode itself, it effectively becomes half-duplex at full utilization.
2*(1+1) would be required, but with one card, you can only get 1*(1+1) GBit/s from it.
…and just over 800-850 MBit/s is exactly within the bracket you would expect to get from a one-armed-router setup.

That is true - and surprising (thermal throttling? It really shouldn't throttle while doing this little, but if the fan in that USFF chassis isn't working properly or there are dust-clogged heatsinks, this may easily happen).

Just for comparison:

  • an Ivy Bridge C1037U (much slower) with two igb interfaces will route 1 GBit/s at wirespeed at its lowest clock rate (800 MHz) and ~15% core usage
    • it will do the same with SQM/cake at full line rate at ~30% core usage, without having to clock up (so also at 800 MHz)
  • a Bay Trail-D Atom J1900 (much slower still) with four igb interfaces will route 1 GBit/s at wirespeed at ~25-30% core usage, but with a more oscillating (~1.3-2.0 GHz) clock rate
    • the same board cannot achieve wirespeed with SQM+cake (~830 MBit/s at 100% core usage)

While I agree that the speedtest COULD do this concurrently, I am certain we don't see it here, as the sum of down + up is quite a bit above 1000. Also, you certainly need some ACK traffic and such, so I'd be fine with a limit somewhere between 900-950 MBit/s, but NOT with 800 MBit/s at 100% CPU. If it weren't for the 100% CPU, I would have been 100% fine with your point 🙂

So: htop shows that almost all the time is spent in the kernel (red CPU bar). I am no htop expert, but according to https://simonfredsted.com/1622 you can go to Setup and enable "Detailed CPU time" in Display options: press F2, Tab (or cursor right), cursor down to "Detailed CPU time", and F10 to exit. You could also uncheck "Hide kernel threads" in the same menu.

Edit: Thermal throttling would be seen in the clock frequency. Also, it is quite unlikely to thermally throttle a 6500t with a bit more than 1 core loaded at base frequency - it should barely use 10W (I guess more around 5W) in this scenario.

Let's run the numbers, shall we?
gigabit ethernet, VLAN/PPPoE/IPv4/TCP goodput (~ what "speedtests"* measure) [Mbps]:

1000 * ((1500 - 8 - 20 - 20) / (1500 + 4 + 38)) = 941.63 Mbps

On a single ethernet interface router running a speedtest on a computer on the LAN side of the router we will saturate both the up- and down-load side of that interface (aka egress and ingress).
But wait, TCP requires reverse ACK traffic (for TCP Reno, roughly 1/40 the volume of the forward traffic), so we need to account for that as well:

(1000 * ((1500 - 8 - 20 - 20) / (1500 + 4 + 38)) ) / 40 = 23.54 Mbps

So we end up with a rough estimate for the upper goodput limit for TCP traffic of

941.63 - 23.54 = 918.09 Mbps

So yes, even with the unavoidable additional cross traffic and inefficiencies, it seems clear that >> 800 Mbps throughput should be achievable.

htop: why are we only seeing 3 cores? I thought that i5 is supposed to be a quad-core CPU?

During the speedtest you are CPU limited, please do the following:
0) optionally reboot the router to get smaller numbers

  1. post the output of:
    a) cat /proc/interrupts
    b) cat /proc/softirqs

  2. run a speedtest, e.g. https://www.waveform.com/tools/bufferbloat or https://speed.cloudflare.com and post a screenshot of the results

  3. post the output of:
    a) cat /proc/interrupts
    b) cat /proc/softirqs

This is aimed at figuring out what processing bunches up on that poor CPU2 to make it hit saturation. Also, please configure htop for detailed CPU stats (F2 Setup, then toggle Display options, Detailed CPU time (System/IO-Wait/Hard-IRQ/Soft-IRQ/Steal/Guest)) - but I bet we will see mostly Soft-IRQ (shown in magenta/pink)...
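The point of the before/after `/proc/softirqs` snapshots is the delta, not the absolute counters. A minimal sketch of the comparison (parser names and the abbreviated snapshot text are made up for illustration; the real file has more rows and one column per CPU):

```python
# Sketch: diff two /proc/softirqs snapshots to see which softirq class
# (NET_RX, NET_TX, ...) grew during the speedtest, and on which CPU.
def parse_softirqs(text):
    """Map softirq name -> per-CPU counts, skipping the CPU header line."""
    counts = {}
    for line in text.strip().splitlines()[1:]:
        name, *values = line.split()
        counts[name.rstrip(":")] = [int(v) for v in values]
    return counts

def diff_softirqs(before, after):
    """Per-CPU increase between two snapshots (after - before)."""
    return {name: [a - b for b, a in zip(before[name], after[name])]
            for name in after if name in before}

# Abbreviated (made-up) example snapshots:
before = parse_softirqs("        CPU0   CPU1\nNET_RX:  100    200\nNET_TX:   10     20\n")
after  = parse_softirqs("        CPU0   CPU1\nNET_RX:  900    250\nNET_TX:   40     25\n")
print(diff_softirqs(before, after))  # NET_RX piling up on CPU0
```

A heavily lopsided NET_RX column would point at all receive processing landing on a single core (IRQ affinity / missing receive steering).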

*) Speed is distance/time; "speed tests" return data_volume/time, which clearly does not measure a speed, and hence we should stop calling these things "speedtests"... That ship, however, is already so far out to sea that some might claim it "has sailed already".


Indeed, that's extremely weird - all variants of the i5-6500{,T,TE} are 4c4t. I don't know of any Intel CPU that would ever bring up 3 cores (under any circumstances) - unless this would be from within a(n accordingly configured) VM (and there I could accept this CPU utilization as well)…


Yeah, I forgot to mention that I run this in a Proxmox VM, and this is the only VM running. I thought I had assigned 4 cores to the VM - I must've messed something up.

These are the speedtest results after I properly assigned another core to OpenWrt.

Here is the new htop screenshot during speedtest

Output of /proc/interrupts

Output of /proc/softirqs

What network driver are you using? Please use the virtio network driver in proxmox.

Edit: Hm, to me it looks like you already are. Much higher throughput should be possible, though.
In a VM setting it is reasonable to leave (at least) 1 core to the host, so the 3-core setup is fine.
Unfortunately, I am no Proxmox user. Anyway, this info should have been in the first post...
Almost all time is spent in soft-IRQ (htop: magenta), so there we go. Something seems odd, likely related to some VM overhead.

As mentioned, this changes the situation completely; we don't even know how you've set up the VM - with PCIe passthrough, virtio network drivers, or some emulation (and there are a ton of other questions).

While I personally can't reproduce any slow behaviour under kvm (virtio, no proxmox, plain qemu/ kvm on Debian) on much older hardware (sandy-bridge i7-2600k) at all, we've gotten similar reports before - with proxmox in particular (which went away when running on the bare iron).

Aside from this, if running in an emulation (on a 4c4t CPU), I wouldn't assign more than 2 cores to the VM (it won't do you much good, and the hypervisor host also needs some resources).


Do yourself a favour, write OpenWrt to a small USB stick and boot it from the bare iron, without proxmox - that should give you quite different figures.


Alright I'll try that

Booted up OpenWrt without Proxmox - the speed is still around 800/500.