IQrouter V3/pro automatic latency managment reverse engineering

Hi everyone

considering that the IQrouters run openwrt, and that the evenroute company is dead,

has anyone tried to reverse-engineer that system so that it could be included in mainstream openwrt? considering that it should not really be anything that could be truly count as propetary...

According to New OpenWRT - Owner of orphaned IQRouter V2 v3 is ZBT, Pro is x86.

not what i was asking. what i mean is, does anyone own these devices and tried to reverse engineer their sauce that allowed them to have adaptive SQM so that it might be included upstream?

Well, cake-autorate is as close as we got, it solves the same issue with a different method.
Evenroute, as far as I know, operated their own speedtest servers in the cloud so that they could measure the true achievable throughput cyclically and adjust the shaper rate to avoid bufferbloat.
I think evenroute has a couple of patents related to that stuff, so I never looked too closely. Also, I miss them, in my view they were part of the "good team"* offering a service to make decent responsiveness something available to everybody.

*) Just look how they organized what must have been a quite unpleasant exercise shutting down their company, and they still managed to make sure their users where not just abandoned with nowhere to go.

Just to be clear, I am not up to participating in such a reverse engineering* unless the principals of evenroute would give us their permission. I also believe that with cake-autorate and libreqos there are alternatives around that allow similar results without having to irritate the remainder of evenroute.

*) Given that the remote side infrastructur is shut down, I also believe:
a) reverse engeneering things just from the client side will be tricky
b) without anybody stepping up to fund the remote measuring servers the whole approach will likely not be useable by OpenWrt.

Sound like a full adaptive loop like YouTube’s ABR.

I've cooked this up for myself and still testing...

  • requires packages: lua, luaposix
  • jiggle parameters and test
  • chmod +x adaptive_sqm_ingress.lua
  • lua adaptive_sqm_ingress.lua
#!/usr/bin/env lua
-- adaptive_sqm.lua
--
-- Adaptive SQM Controller (Lua)
-- Monitors the receive bytes on the ifb4eth1 interface (used for download shaping),
-- computes throughput using EWMA and a PID controller, and then updates the SQM
-- shaping rate (sqm.eth1.download) via UCI.
--
-- This version uses LuaPosix to handle Ctrl‑C (SIGINT) gracefully and uses posix.sleep
-- instead of os.execute("sleep ...") for better signal handling.
--
-- Assumptions:
--   - Target shaping rate is 600,000 bps (600 Kbps).
--   - SQM is configured via UCI on eth1 (with ifb4eth1 used for ingress shaping).
--   - SQM restarts take ~2-3 seconds.
--   - The script updates SQM only if the computed change exceeds 10% of the setpoint.
--   - The script monitors ifb4eth1.
--
-- Tunable parameters (PID constants, smoothing alpha, update interval, etc.) are provided below.

local posix = require("posix")
-- Register SIGINT handler for graceful termination
local function handle_sigint(signum)
  print("\nSIGINT received, exiting gracefully.")
  os.exit(0)
end
posix.signal(posix.SIGINT, handle_sigint)

-- Configuration
local interface = "ifb4eth1"        -- Interface for download shaping (ingress)
local update_interval = 5           -- seconds between measurements
local setpoint = 600000             -- target shaping rate in bps (600 Kbps)
local min_rate = setpoint * 0.5     -- minimum shaping rate allowed (50% of setpoint)
local update_threshold = 0.1 * setpoint  -- update only if rate change exceeds 10%
local interface_timeout = 10        -- seconds to wait for interface after SQM restart

-- PID parameters (tune these empirically for your network)
local Kp = 0.1
local Ki = 0.01
local Kd = 0.05

-- EWMA smoothing parameter (alpha between 0 and 1)
local alpha = 0.2

-- State variables
local previous_smoothed = 0
local previous_error = 0
local integral = 0
local current_sqm_rate = setpoint  -- start at the setpoint

-- Utility: sleep using posix.sleep (which can be interrupted by signals)
local function sleep(n)
  posix.sleep(n)
end

-- Wait for a specified interface to appear in /proc/net/dev.
local function wait_for_interface(iface, timeout)
  local start_time = os.time()
  while os.difftime(os.time(), start_time) < timeout do
    local file = io.open("/proc/net/dev", "r")
    if file then
      for line in file:lines() do
        if line:match(iface .. ":") then
          file:close()
          return true
        end
      end
      file:close()
    end
    sleep(1)
  end
  return false
end

-- Read rx bytes for a given interface from /proc/net/dev.
local function read_rx_bytes(iface)
  local max_attempts = 5
  local attempt = 1
  while attempt <= max_attempts do
    local file = io.open("/proc/net/dev", "r")
    if not file then
      error("Failed to open /proc/net/dev")
    end
    for line in file:lines() do
      if line:match(iface .. ":") then
        local parts = {}
        for token in line:gmatch("%S+") do
          table.insert(parts, token)
        end
        file:close()
        return tonumber(parts[2])
      end
    end
    file:close()
    print("Interface " .. iface .. " not found, attempt " .. attempt .. " of " .. max_attempts)
    sleep(2)
    attempt = attempt + 1
  end
  error("Interface " .. iface .. " not found after " .. max_attempts .. " attempts")
end

-- Update the SQM shaping rate via UCI.
local function update_sqm_rate(new_rate)
  local diff = math.abs(new_rate - current_sqm_rate)
  if diff < update_threshold then
    return  -- Do not update if change is below threshold
  end
  local cmd = string.format("uci set sqm.eth1.download=%d && uci commit sqm && /etc/init.d/sqm restart", new_rate)
  print("Running command: " .. cmd)
  local res = os.execute(cmd)
  if res ~= 0 then
    print("Error updating SQM rate with command: " .. cmd)
  else
    print(string.format("SQM rate updated to %d bps", new_rate))
    sleep(3)  -- Wait for SQM restart to complete
    if not wait_for_interface(interface, interface_timeout) then
      io.write("Warning: Interface " .. interface .. " did not reappear within " .. interface_timeout .. " seconds.\n")
    end
  end
  current_sqm_rate = new_rate
end

-- Clear the screen and print a header.
io.write("\27[2J\27[H")
io.write("Adaptive SQM Controller - Press Ctrl-C to exit\n")
io.flush()

-- Main control loop
local last_bytes = read_rx_bytes(interface)
while true do
  sleep(update_interval)
  local new_bytes = read_rx_bytes(interface)
  local diff_bytes = new_bytes - last_bytes
  last_bytes = new_bytes

  -- Calculate instantaneous throughput in bps.
  local throughput = (diff_bytes * 8) / update_interval
  
  -- Apply EWMA smoothing.
  local smoothed = alpha * throughput + (1 - alpha) * previous_smoothed
  previous_smoothed = smoothed
  
  -- Compute error (setpoint minus smoothed throughput).
  local error = setpoint - smoothed
  
  -- Update integral and derivative for PID control.
  integral = integral + error * update_interval
  local derivative = (error - previous_error) / update_interval
  
  -- Compute PID correction (in bps).
  local correction = Kp * error + Ki * integral + Kd * derivative
  
  -- Calculate new SQM rate, bounded between min_rate and setpoint.
  local new_rate = current_sqm_rate + correction
  if new_rate < min_rate then new_rate = min_rate end
  if new_rate > setpoint then new_rate = setpoint end
  
  update_sqm_rate(new_rate)
  previous_error = error
  
  io.write(string.format("\rMeasured: %.2f bps | Smoothed: %.2f bps | Error: %.2f | Correction: %.2f | New Rate: %.2f bps   ",
      throughput, smoothed, error, correction, new_rate))
  io.flush()
end

I only just wrote this recently, so it's in beta.

shows:

  • The measured throughput (current speed in bps).
  • The smoothed throughput.
  • The error (difference from the target).
  • The correction computed by the PID controller.
  • The new shaping rate.

FINAL FOR NOW:

UPDATES:

  • Dynamically reads rates even if changed in sqm configs/luci
  • Reduce by 5% (jiggle it)
  • Ignores max spikes (jiggle the period)
#!/usr/bin/env lua
-- adaptive_sqm.lua
--
-- Adaptive SQM Controller (Lua)
-- Monitors the ifb4eth1 interface for download traffic, applies a PID controller,
-- and updates sqm.eth1.download via UCI. This version imposes:
--   1) a strict max step of 5% from the current rate each update,
--   2) a min rate of 70% of the setpoint,
--   3) outlier detection to reset the PID state, and
--   4) dynamic setpoint updates from UCI.

local posix = require("posix")

-- Register SIGINT handler for graceful termination
local function handle_sigint(signum)
  print("\nSIGINT received, exiting gracefully.")
  os.exit(0)
end
posix.signal(posix.SIGINT, handle_sigint)

-- Helper: run cmd, capture output
local function capture(cmd)
  local f = assert(io.popen(cmd, "r"))
  local s = f:read("*a")
  f:close()
  return s
end

-- Retrieve current SQM download rate (setpoint) from UCI
local function get_setpoint()
  local output = capture("uci get sqm.eth1.download")
  if output then
    local rate = tonumber(output:match("(%d+)"))
    if rate then
      return rate
    end
  end
  error("Failed to retrieve sqm.eth1.download from UCI")
end

-- Script constants
local interface = "ifb4eth1"
local update_interval = 5

-- We'll define variables that can be updated in main loop
local setpoint = get_setpoint()        -- in bps
local min_frac = 0.70                 -- 70% as minimum fraction
local step_frac = 0.05                -- 5% step limit
local outlier_mult = 5.0             -- outlier threshold multiple

local Kp = 0.05
local Ki = 0.005
local Kd = 0.01
local alpha = 0.2

local previous_smoothed = 0
local previous_error = 0
local integral = 0
local current_sqm_rate = setpoint

-- We recalc these dynamically each loop
local min_rate = math.floor(setpoint * min_frac)
local outlier_threshold = math.floor(setpoint * outlier_mult)
local update_threshold = math.floor(0.05 * setpoint)  -- 5% update threshold

-- Sleep using posix.sleep
local function sleep(n) posix.sleep(n) end

-- Wait for interface
local function wait_for_interface(iface, timeout)
  local start_time = os.time()
  while os.difftime(os.time(), start_time) < timeout do
    local file = io.open("/proc/net/dev", "r")
    if file then
      for line in file:lines() do
        if line:match(iface .. ":") then
          file:close()
          return true
        end
      end
      file:close()
    end
    sleep(1)
  end
  return false
end

-- Read rx bytes
local function read_rx_bytes(iface)
  local max_attempts = 5
  for attempt=1,max_attempts do
    local file = io.open("/proc/net/dev", "r")
    if file then
      for line in file:lines() do
        if line:match(iface .. ":") then
          local parts = {}
          for token in line:gmatch("%S+") do
            table.insert(parts, token)
          end
          file:close()
          return tonumber(parts[2])
        end
      end
      file:close()
    end
    print("Interface " .. iface .. " not found, attempt " .. attempt .. " of " .. max_attempts)
    sleep(2)
  end
  error("Interface "..iface.." not found after "..max_attempts.." attempts")
end

-- Hard-limit a new rate to only 5% change from old rate
-- We'll do that after computing new_rate from the PID
local function limit_rate_change(old_rate, raw_new_rate)
  local max_step = old_rate * step_frac   -- 5% of the current rate
  local desired_change = raw_new_rate - old_rate
  
  if desired_change > max_step then
    return old_rate + max_step
  elseif desired_change < -max_step then
    return old_rate - max_step
  else
    return raw_new_rate
  end
end

-- Update SQM
local function update_sqm_rate(new_rate)
  local diff = math.abs(new_rate - current_sqm_rate)
  if diff < update_threshold then
    return
  end
  
  local cmd = string.format("uci set sqm.eth1.download=%d && uci commit sqm && /etc/init.d/sqm restart", math.floor(new_rate))
  print("Running command: "..cmd)
  local res = os.execute(cmd)
  if res ~= 0 then
    print("Error updating SQM rate with command: "..cmd)
  else
    print(string.format("SQM rate updated to %d bps", math.floor(new_rate)))
    sleep(3)
    if not wait_for_interface(interface, 10) then
      io.write("Warning: Interface "..interface.." did not reappear.\n")
    end
  end
  current_sqm_rate = new_rate
end

-- Clear screen
io.write("\27[2J\27[H")
io.write("Adaptive SQM Controller - Press Ctrl-C to exit\n")
io.flush()

local last_bytes = read_rx_bytes(interface)

while true do
  sleep(update_interval)
  
  -- Re-read setpoint from UCI
  local new_setpoint = get_setpoint()
  if new_setpoint ~= setpoint then
    print("\nSetpoint changed from "..setpoint.." bps to "..new_setpoint.." bps.")
    setpoint = new_setpoint
    current_sqm_rate = setpoint
    min_rate = math.floor(setpoint * min_frac)
    outlier_threshold = math.floor(setpoint * outlier_mult)
    update_threshold = math.floor(0.05 * setpoint)
    previous_smoothed = setpoint
    previous_error = 0
    integral = 0
  end

  local new_bytes = read_rx_bytes(interface)
  local diff_bytes = new_bytes - last_bytes
  last_bytes = new_bytes
  local throughput = (diff_bytes * 8)/update_interval
  
  -- Outlier check
  if throughput > outlier_threshold then
    print("\nOutlier detected ("..throughput.." bps). Resetting PID state.")
    previous_smoothed = throughput
    previous_error = 0
    integral = 0
  else
    -- EWMA
    local smoothed = alpha*throughput + (1-alpha)*previous_smoothed
    previous_smoothed = smoothed
    
    -- PID error
    local err = setpoint - smoothed
    
    integral = integral + err*update_interval
    local derivative = (err - previous_error)/update_interval
    
    local correction = Kp*err + Ki*integral + Kd*derivative
    
    local raw_new_rate = current_sqm_rate + correction
    
    -- clamp to [min_rate, setpoint]
    if raw_new_rate < min_rate then raw_new_rate = min_rate end
    if raw_new_rate > setpoint then raw_new_rate = setpoint end
    
    -- limit to 5% step from current_sqm_rate
    local limited_rate = limit_rate_change(current_sqm_rate, raw_new_rate)
    update_sqm_rate(limited_rate)
    
    previous_error = err
    
    io.write(string.format("\rMeasured: %.2f bps | Smoothed: %.2f bps | Err: %.2f | Corr: %.2f | New Rate: %d bps   ",
      throughput, smoothed, err, correction, math.floor(limited_rate)))
    io.flush()
  end
end


@mindwolf - This is an interesting bit of code. The comments say:


-- Adaptive SQM Controller (Lua)
-- Monitors the ifb4eth1 interface for download traffic, applies a PID controller,
-- and updates sqm.eth1.download via UCI...

I'm not sure how this affects the performance of the router. It appears to look at the traffic on the download link (computed from /proc/net/dev) then adjusts the SQM Download parameter as needed, using a PID controller to avoid sudden changes.

Does it change things based on some measure of latency? Thanks

No, it's a test of a simpler form of managing it. Since we would really need to measure the difference between the syn-ack and ack to get a proper measurement of latency.

I wonder what the goal here is? And what theory sits behind the control loop design?
I might be misunderstanding this, but that really seems to vary the ingress shaper in the range of 0.7 to 1.0 of a given reference shaper setting based on how the recent ingress traffic compares to some longer term average. Now, organic traffic often is not actually saturating the link so the actually achieved throughput can be limited either by not enough organic traffic being present or by the traffic hitting a bottleneck. In the not enough traffic case, arguably the traffic shaper should stay at 1.0 of reference, and only in the bottleneck along the path condition the shaper should throttle down. I am too daft to see how this is achieved?

While this is one way to measure a latency, it is not really ideal, as it a) only applies to TCP, b) will only give one sample per flow, which is a bit too sparse to catch the latency dynamics, no?

Also, the actual latency of a flow depends quite a lot on the distance/path length between the endpoints (if we want to split hairs, the sum of forward and reverse paths), but that distance dependent latency is not informative about whether there is a bottleneck on the path.

I agree—it’s a neat idea in theory, but there are some clear limitations. Using just the syn-ack to ack measurement only works for TCP, and you really only get one data point per connection. That single sample might not capture the full story of how latency fluctuates over time. Plus, since latency is heavily influenced by the physical distance and path between endpoints, that measurement could be more about the fixed delay of the route rather than any real bottleneck or congestion issue.

Sidenote, as far as I understood the iqrouter approach, it was different in that they cyclically ran a capacity test so could assume to actually measure the actual achievable capacity, which in turn can be used as reference to set a shaper (somewhat below that achievable capacity). So they did not have the challenge of application limited traffic that much. But there the challenge becomes, that to perform these capacity measurements essentially sqm needs to be disabled (or the shaper set so that detectable queueuing delay occurs again) and that means during those measurements users would experience bufferbloat. Given that most users are not active all of the time might not be all that noticeable, but that is why e.g. cake-autorate uses active latency probes instead of active capacity probes (cake-autorate will also look at the organic traffic rates, but more as a hint than a ground truth).
Cake's own built in autorate-ingress is different in that it tries to deduce the ingress capacity by looking at inter packet spacing IIRC, clever, but not perfect and only applicable for ingress where packets have already experienced the bottleneck's transmission delay.... (also this is susceptible to bursty from the near side of the bottleneck...) (that said. autorate-ingress IMHO mostly suffers from not allowing to actually configure the true maximum and minimum rate limits)