SQM codel and sfq

Two new disciplines have been added to luci-sqm. I would like to know if they are better than cake and fq_codel. From what I have tested, they allow for greater bandwidth capacity.

Define better? Bandwidth is not the be-all and end-all.
If the new disciplines work better than cake for you, just use them?

1 Like

Where might I find a comparative list of all the SQM methods?

Neither codel nor sfq is new...
CoDel (controlled delay) is a single-queue AQM that Jacobson/Nichols created, operating on sojourn time, and SFQ (stochastic fair queueing) is a scheduler that first hashes packets into its sub-queues and then round-robin schedules those queues.
Both are pretty fundamental to fq_codel and cake, in that fq_codel combined and extended both, and cake is in essence a variant of fq_codel that extends it in directions really useful for leaf networks (like residential home networks).
Whether one of them is better really depends on your measure of optimality.
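
If you want to poke at the plain versions, both still ship with the kernel and stock tc; a minimal sketch, assuming eth1 is the interface under test:

# plain CoDel: one queue, drops based on sojourn time
tc qdisc replace dev eth1 root codel

# plain SFQ: hash flows into sub-queues, round-robin between them
tc qdisc replace dev eth1 root sfq

tc -s qdisc show dev eth1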

Not sure one exists... but here is a very short rundown:

CoDel:
The first sojourn-time-based (i.e. experienced queueing delay) single-queue AQM; it has far fewer configuration issues than RED and works pretty well, but like all single-queue AQMs it does not deal well with flows that do not respond to its signaling

SFQ:
The first effective flow-queueing qdisc: it (stochastically) separates flows into separate queues, so under-responsive flows will mostly hurt themselves... (it had some issues IIRC, in that it performed tail drop instead of head drop, and it tended to reshuffle the hash too often), and it had no meaningful AQM for each queue

FQ_CoDel:
Combines CoDel and SFQ, extends the default number of queues to 1024, implements head drop as well as batch dropping on massive overload, and gives each queue its own CoDel instance. All around a great default qdisc
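
To make that concrete, here is a sketch spelling out the stock defaults explicitly (these are the values fq_codel uses anyway when given no arguments):

tc qdisc replace dev eth1 root fq_codel flows 1024 target 5ms interval 100ms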

Cake:
Takes fq_codel and adds features that are really useful for a home network (a sketch of an invocation using these follows the list), like:
a) adds a traffic shaper with overhead accounting, so users do not need to configure an additional traffic shaper and overhead
b) makes the flow hashing set-associative, which results in a lower probability of collisions
c) adds isolation modes that allow per-internal-IP-address fairness
d) adds a few easy-to-configure priority tiers
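
A minimal sketch exercising (a), (c), and (d); the bandwidth and overhead numbers are placeholders, and (b) needs no keyword since the set-associative hashing is built in:

tc qdisc replace dev eth1 root cake bandwidth 20Mbit overhead 44 nat dual-srchost diffserv4
# bandwidth/overhead -> (a) built-in shaper with overhead accounting
# nat dual-srchost   -> (c) per-internal-host fairness, looking through NAT
# diffserv4          -> (d) four priority tiers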

4 Likes

Queue Management Algorithms:

| Feature | SFQ | CoDel | FQ-CoDel | CAKE | pfifo_fast |
|---|---|---|---|---|---|
| Algorithm Type | Queueing | Active Queue Management (AQM) | Hybrid AQM and Scheduling | Comprehensive Queue Management | Simple Queue Management |
| Primary Goal | Fairness | Latency Reduction | Latency Reduction + Fairness | Latency, Fairness, Throughput Optimization | Basic FIFO Queueing |
| Flow Isolation | Yes | No | Yes | Yes | No |
| Latency Control | No | Yes | Yes | Yes | No |
| Overhead | Low | Low | Moderate | Moderate | Low |
| Configuration Complexity | Low | Low | Moderate | Moderate | Very Low |
| Typical Use Case | General purpose fair queueing | Simplifying bufferbloat management | Improved fairness and latency management | Advanced traffic management and optimization | Simple environments with minimal queue management needs |

Traffic Shaping and Rate Limiting:

| Feature | HTB | HFSC | CBQ | TBF | CAKE |
|---|---|---|---|---|---|
| Algorithm Type | Queue Management | Queue Management | Queue Management | Rate Limiting | Comprehensive Queue Management |
| Primary Goal | Bandwidth Management | Bandwidth and Latency Management | Bandwidth Management with Class-based Differentiation | Rate Limiting and Traffic Shaping | Latency, Fairness, Throughput Optimization |
| Flow Isolation | Yes | Yes | Yes | No | Yes |
| Latency Control | No | Yes | No | No | Yes |
| Overhead | Moderate | High | Moderate | Low | Moderate |
| Configuration Complexity | High | Very High | High | Low | Moderate |
| Typical Use Case | Traffic shaping and prioritization | Networks needing strict QoS, real-time applications | Complex organizational structure and prioritization policies | Simple bandwidth throttling and basic traffic shaping | Advanced traffic management and optimization for diverse networks |

CAKE does both, and does them well. You can get cake up and running simply on your upload side by issuing these two lines via ssh and copy & paste:

tc qdisc add dev "$(ifstatus wan | grep 'l3_device' | awk -F\" '{print $4}')" root cake nat overhead 18
tc qdisc show dev "$(ifstatus wan | grep 'l3_device' | awk -F\" '{print $4}')"

This is only icing on Moeller's post and is not meant to overshadow his advice. Consider me the guy holding the cue cards.

10 Likes

Interesting. Where did you get that table from?

So cake is the most advanced of all, right? Can you tell me if all these queuing mechanisms are still being updated?

Great summary, thanks.

It depends: cake has the most bells and whistles, but it also seems to be the most CPU-intensive one.
Most of these qdiscs in the Linux kernel change rarely, but occasionally there are updates (e.g. if kernel APIs change), bug fixes, and occasionally even new features. I guess git should be able to get you a list of changes for each qdisc source code file, but I do not know the exact command off the top of my head; something along the lines of the sketch below should do it.
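
Run from a kernel source checkout, the per-qdisc history would be along these lines (the net/sched paths are the mainline ones):

git log --oneline -- net/sched/sch_cake.c
git log --oneline -- net/sched/sch_fq_codel.c
git log --oneline -- net/sched/sch_sfq.c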

As I said, great tables! Maybe we can all brainstorm and make these even better?

Nitpicking: modern sfq contains a variant of RED (and might have since forever, I have no motivation to research since when), so that could be classified as AQM Yes (it will not be so hot for latency control, but probably leaps and bounds above a FIFO)

This one actually has 3 parallel queues with different priorities and steers packets into these based on DSCP values (too lazy to look up which by default)
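
From memory of the tc-pfifo_fast man page (treat as a hedged footnote): the band is picked via the priomap over the TOS/priority field, and the default looks like this in tc output:

tc qdisc show dev eth1
# qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1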

Generally seems to be much less computationally demanding than cake... if you capture that in the overhead row, maybe set this to Moderate or even Low and set cake to High?

as typical use case maybe add "home network wan interface"...

1 Like

I created it.

Great summaries. The author of sfq was involved in fq_codel's development. By default SFQ has a short (128 packet) tail-drop queue, which works pretty well up until you start cracking 100mbit; as best as I recall, at about 200mbit and a 20ms RTT it started falling apart. SFQ is pretty ideal below 4mbit. If you add more depth to support higher bandwidths, you get into a situation where with two flows you have a huge queue, or with dozens, one that is too short, which is why an AQM was also needed.
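
A hedged sketch of what adding depth looks like in tc terms (numbers purely illustrative; stock sfq caps depth at 127 packets per flow):

tc qdisc replace dev eth1 root sfq limit 300 depth 127 perturb 10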

SFQRED was my pride and joy and was the first successful combination of an FQ system and an AQM... It was marvelous! wonderful! world-shaking! for about 2 months before codel came out. I still wish I'd written a paper on it. We also added drop head to sfq, but it proved unstable and could starve; that was fixed by fq_codel's "once 'round the empty flows" approach to guarantee service to fatter flows.

fq_codel is a derivative of DRR (byte based) rather than SFQ (packet based), because in general it seems to be more fair, and "SQF" was patented. By default it scales from 4mbit to 40Gbit without tuning, although OpenWrt is trimming it down to 4MB where, in an age of 2.5 and 10gbit, it should be at least double that. (For x86 platforms it's set to 32MB.)
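
That 4MB is fq_codel's memory_limit; a sketch of raising it by hand (8Mb being the "at least double" from above):

tc qdisc replace dev eth1 root fq_codel memory_limit 8Mb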

To this day I do not understand how hfsc works, and I have never seen a benchmark on it that made a measurable difference vs htb. I long for someone to prove to me why hfsc feels better to some.

HTB and CAKE are easier to reason about. HFSC was also very crashy, which is why we dropped it from SQM back in 2015, although that bug was fixed a few years later.

Anyone who says SFQ is better than fq_codel is not measuring the right things. Most people don't try to measure RTT-fairness, for example, when you have one high-RTT flow vs one short one, where the fq and aqm interact. There are other metrics, like page load time and interactions with VoIP and videoconferencing traffic, worth testing when trialing a new qdisc - especially while emulating real RTTs! At short RTTs everything looks OK....
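
A sketch of that kind of test rig, assuming flent and netperf are installed; the netem delay should live on a separate box or interface so it does not replace the qdisc under test (hostname and 50ms are placeholders):

tc qdisc add dev eth2 root netem delay 50ms
flent rrul -l 60 -H netperf.example.com -t 'fq_codel-50ms'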

Aside from adding an orgy of additional complexity and subtle new bugs, I am not sure why y'all are fiddling with this?

3 Likes

This thread was ongoing as more of a roundabout educational thread.

Here's a simple hfsc_calculator.py script for someone to test...
(final)

def calculate_bandwidth_allocations(total_bandwidth, proportions):
    """
    Calculate the bandwidth allocations for each class based on proportions.
    """
    return {
        cls: int(total_bandwidth * proportion)
        for cls, proportion in proportions.items()
    }

def generate_tc_commands(download_bw_kbit, upload_bw_kbit, tickrate):
    """
    Generates HFSC `tc` configuration commands with symmetrical filtering
    for egress (eth1) and ingress (ifb4eth1).
    """
    # Bandwidth proportions for classes
    proportions = {
        'Prio': 0.3,    # 30% of total bandwidth
        'Normal': 0.5,  # 50% of total bandwidth
        'Bulk': 0.2     # 20% of total bandwidth
    }

    # Calculate bandwidth allocations
    upload_allocations = calculate_bandwidth_allocations(upload_bw_kbit, proportions)
    download_allocations = calculate_bandwidth_allocations(download_bw_kbit, proportions)

    # Class IDs
    class_ids = {'Prio': '10', 'Normal': '20', 'Bulk': '30'}

    # Tick interval in milliseconds
    tick_interval = 1000 / tickrate

    # Start generating commands
    commands = [
        "# HFSC Setup Configuration",
        f"# Total Upload Bandwidth: {upload_bw_kbit} kbit",
        f"# Total Download Bandwidth: {download_bw_kbit} kbit",
        f"# Tick Rate: {tickrate} Hz (Tick Interval: {tick_interval:.2f} ms)"
    ]

    # --------------------------------------------------
    # 1. Egress (Upload) Configuration on eth1
    # --------------------------------------------------
    commands.append("\n# Upload Configuration (egress on eth1)")
    # default = 20 means unclassified traffic → class 1:20
    commands.append("tc qdisc add dev eth1 root handle 1: hfsc default 20")
    commands.append(
        f"    tc class add dev eth1 parent 1: classid 1:1 hfsc ls m2 {upload_bw_kbit}kbit ul m2 {upload_bw_kbit}kbit"
    )

    for cls, bw in upload_allocations.items():
        if cls == "Prio":
            # Prio remains real-time
            commands.append(
                f"    tc class add dev eth1 parent 1:1 classid 1:{class_ids[cls]} "
                f"hfsc rt m1 {bw}kbit d {int(tick_interval)}ms m2 {bw}kbit"
            )
        else:
            # Normal and Bulk are both ls
            commands.append(
                f"    tc class add dev eth1 parent 1:1 classid 1:{class_ids[cls]} "
                f"hfsc ls m2 {bw}kbit ul m2 {upload_bw_kbit}kbit"
            )
        commands.append(
            f"    tc qdisc add dev eth1 parent 1:{class_ids[cls]} handle {class_ids[cls]}: fq_codel"
        )

    # --------------------------------------------------
    # 2. Ingress (Download) Configuration on ifb4eth1
    # --------------------------------------------------
    commands.append("\n# Download Configuration (ingress on ifb4eth1)")
    # default = 20 means unclassified traffic → class 1:20
    commands.append("tc qdisc add dev ifb4eth1 root handle 1: hfsc default 20")
    commands.append(
        f"    tc class add dev ifb4eth1 parent 1: classid 1:1 hfsc ls m2 {download_bw_kbit}kbit ul m2 {download_bw_kbit}kbit"
    )

    for cls, bw in download_allocations.items():
        if cls == "Prio":
            # Prio remains real-time
            commands.append(
                f"    tc class add dev ifb4eth1 parent 1:1 classid 1:{class_ids[cls]} "
                f"hfsc rt m1 {bw}kbit d {int(tick_interval)}ms m2 {bw}kbit"
            )
        else:
            # Normal and Bulk are both ls
            commands.append(
                f"    tc class add dev ifb4eth1 parent 1:1 classid 1:{class_ids[cls]} "
                f"hfsc ls m2 {bw}kbit ul m2 {download_bw_kbit}kbit"
            )
        commands.append(
            f"    tc qdisc add dev ifb4eth1 parent 1:{class_ids[cls]} handle {class_ids[cls]}: fq_codel"
        )

    # --------------------------------------------------
    # 3. Ingress Catch-All Filter (redirect inbound on eth1 -> ifb4eth1)
    # --------------------------------------------------
    commands.append("\n# Ingress Catch-All Filter for eth1")
    commands.append("tc qdisc add dev eth1 ingress")
    commands.append(
        "    tc filter add dev eth1 parent ffff: protocol all pref 10 u32 match u32 0 0 action mirred egress redirect dev ifb4eth1"
    )

    # --------------------------------------------------
    # 4. Egress Filters on eth1
    # --------------------------------------------------
    commands.append("\n# Egress Filters (Assigning Outbound Traffic to Classes)")
    commands.append(
        "    tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 match ip dport 5060 0xffff flowid 1:10"
    )
    # The following line is removed, because default=20 catches everything else
    # commands.append(
    #     "    tc filter add dev eth1 protocol all parent 1:0 prio 2 flowid 1:20"
    # )
    commands.append(
        "    tc filter add dev eth1 protocol ip parent 1:0 prio 3 u32 match ip dport 8080 0xffff flowid 1:30"
    )

    # --------------------------------------------------
    # 5. Inbound Filters on ifb4eth1
    # --------------------------------------------------
    commands.append("\n# Ingress Filters (Assigning Inbound Traffic to Classes)")
    commands.append(
        "    tc filter add dev ifb4eth1 protocol ip parent 1:0 prio 1 u32 match ip sport 5060 0xffff flowid 1:10"
    )
    # Remove the default line for 20, since default=20 does this
    # commands.append(
    #     "    tc filter add dev ifb4eth1 protocol all parent 1:0 prio 2 flowid 1:20"
    # )
    commands.append(
        "    tc filter add dev ifb4eth1 protocol ip parent 1:0 prio 3 u32 match ip sport 8080 0xffff flowid 1:30"
    )

    return "\n".join(commands)

def generate_ascii_hierarchy(download_bw_kbit, upload_bw_kbit, tickrate):
    """
    Generates an ASCII hierarchy representation of the HFSC classes with integrated commands.
    """
    hierarchy = [
        "HFSC Configuration Hierarchy",
        "==============================\n",
        "Upload (eth1)",
        "├── Parent Class (1:1) [LS, 100% Bandwidth]",
        "│   ├── Prio Class (1:10) [30% Bandwidth, RT]",
        "│   │   └── fq_codel applied",
        "│   ├── Normal Class (1:20) [50% Bandwidth, LS, Default]",
        "│   │   └── fq_codel applied",
        "│   └── Bulk Class (1:30) [20% Bandwidth, LS]",
        "│       └── fq_codel applied\n",
        "Download (ifb4eth1)",
        "├── Parent Class (1:1) [LS, 100% Bandwidth]",
        "│   ├── Prio Class (1:10) [30% Bandwidth, RT]",
        "│   │   └── fq_codel applied",
        "│   ├── Normal Class (1:20) [50% Bandwidth, LS, Default]",
        "│   │   └── fq_codel applied",
        "│   └── Bulk Class (1:30) [20% Bandwidth, LS]",
        "│       └── fq_codel applied\n",
        "Ingress Catch-All Filter for eth1",
        "└── Redirect to ifb4eth1",
        "\nEgress Filters (outbound) on eth1:",
        " ├── Port 5060 => Prio",
        " ├── Port 8080 => Bulk",
        " └── Others => Default=Normal (1:20)",
        "\nIngress Filters (inbound) on ifb4eth1:",
        " ├── Source Port 5060 => Prio",
        " ├── Source Port 8080 => Bulk",
        " └── Others => Default=Normal (1:20)"
    ]
    return "\n".join(hierarchy)

def construct_filename(download_bw_kbit, upload_bw_kbit, tickrate):
    """
    Constructs a filename based on the input parameters.
    """
    return f"HFSC_D{download_bw_kbit}_U{upload_bw_kbit}_T{tickrate}.txt"

def main():
    print("This script calculates HFSC parameters and generates `tc` commands.")
    print("It does NOT apply configurations directly. Use the output commands manually.\n")

    try:
        download_bw_kbit = int(input("Enter the total download bandwidth (in kbit): ").strip())
        upload_bw_kbit = int(input("Enter the total upload bandwidth (in kbit): ").strip())
        tickrate = int(input("Enter the tick rate (in Hz): ").strip())
    except ValueError:
        print("Invalid input. Please enter numeric values for bandwidth and tick rate.")
        return

    if download_bw_kbit <= 0 or upload_bw_kbit <= 0 or tickrate <= 0:
        print("Bandwidth and tick rate must be positive integers.")
        return

    commands = generate_tc_commands(download_bw_kbit, upload_bw_kbit, tickrate)
    hierarchy = generate_ascii_hierarchy(download_bw_kbit, upload_bw_kbit, tickrate)
    filename = construct_filename(download_bw_kbit, upload_bw_kbit, tickrate)

    try:
        with open(filename, "w", encoding="utf-8") as file:
            file.write(hierarchy + "\n\n" + commands)
        print(f"\nCommands and hierarchy successfully saved to {filename}.")
    except IOError as e:
        print(f"Failed to save file: {e}")
        return

    print("\n" + hierarchy + "\n\n" + commands)

if __name__ == "__main__":
    main()
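
An abridged example run (values illustrative):

$ python3 hfsc_calculator.py
Enter the total download bandwidth (in kbit): 50000
Enter the total upload bandwidth (in kbit): 10000
Enter the tick rate (in Hz): 100

Commands and hierarchy successfully saved to HFSC_D50000_U10000_T100.txt.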

On the default OpenWrt kernel (built with HZ=100, so each scheduling tick is 10ms), the smallest truly meaningful value would be d 10ms. You can technically configure a smaller value, but the actual HFSC transition can only happen at the 10ms boundaries. So if you want a practical minimum that the kernel will enforce accurately, 10ms is it.
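
Concretely, d is the burst duration in the rt service curves the script emits; a line like the following (rates illustrative) allows up to m1 for the first 10ms of backlog, then falls back to m2:

tc class add dev eth1 parent 1:1 classid 1:10 hfsc rt m1 3000kbit d 10ms m2 1500kbit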

I'm sorry if I came across as a bit testy. CAKE schedules its tiers on a ns basis. I no longer remember how htb or tbf do it.

I keep hoping we will find a way to make cake work better on multicores. https://docs.google.com/document/d/1tTYBPeaRdCO9AGTGQCpoiuLORQzN_bG3TAkEolJPh28/edit?tab=t.0

1 Like

Oh no problem Dave, no offense taken.

Because I wanted to feel like the cool bloat kids... waveform CSV to Streamlit graph

2 Likes

Messing around with AI to see what it could do with dynamic_queue_limits.c

// SPDX-License-Identifier: GPL-2.0
/*
 * Dynamic byte queue limits.  See include/linux/dynamic_queue_limits.h
 *
 * Copyright (c) 2011, Tom Herbert <therbert@google.com>
 */
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/jiffies.h>
#include <linux/ktime.h>   /* ktime_get_ns() */
#include <linux/math64.h>  /* div_u64() */
#include <linux/dynamic_queue_limits.h>
#include <linux/compiler.h>
#include <linux/export.h>
#include <trace/events/napi.h>

#define POSDIFF(A, B) ((int)((A) - (B)) > 0 ? (A) - (B) : 0)
#define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0)

/* Window sizes for rate tracking */
#define SHORT_WINDOW_SIZE 8
#define LONG_WINDOW_SIZE 32

/* AIMD parameters as integer ratios, since the kernel cannot use
 * floating point: 5/4 = 1.25x increase, 4/5 = 0.8x decrease */
#define INCREASE_NUM 5
#define INCREASE_DEN 4
#define DECREASE_NUM 4
#define DECREASE_DEN 5

/* RTT tracking */
struct rate_stats
{
    u64 timestamps[LONG_WINDOW_SIZE];
    u32 bytes[LONG_WINDOW_SIZE];
    u32 head;
};

/* Extended DQL structure (this layout would have to replace the one in
 * include/linux/dynamic_queue_limits.h) */
struct dql
{
    /* Original fields */
    unsigned int limit;
    unsigned int min_limit;
    unsigned int max_limit;
    unsigned int num_queued;
    unsigned int num_completed;
    unsigned int last_obj_cnt;
    unsigned int prev_num_queued;
    unsigned int prev_last_obj_cnt;
    unsigned int prev_ovlimit;
    unsigned int lowest_slack;
    unsigned int adj_limit;
    unsigned long slack_start_time;
    unsigned short slack_hold_time;
    unsigned short stall_thrs;

    /* New rate/RTT tracking fields */
    u64 base_rtt;
    u64 last_rtt;
    u32 bandwidth_est;
    u32 completion_rate;
    struct rate_stats short_window;
    struct rate_stats long_window;

    /* Original stall detection fields */
    unsigned int stall_cnt;
    unsigned short stall_max;
    unsigned long last_reap;
    unsigned long history_head;
    unsigned long history[DQL_HIST_LEN];
};

/* Original jiffies/history based stall detection; kept for reference but
 * no longer called now that dql_completed() uses rate-based detection */
static void __maybe_unused dql_check_stall(struct dql *dql, unsigned short stall_thrs)
{
    unsigned long now;

    if (!stall_thrs)
        return;

    now = jiffies;
    /* Check for a potential stall */
    if (time_after_eq(now, dql->last_reap + stall_thrs))
    {
        unsigned long hist_head, t, start, end;

        /* We are trying to detect a period of at least @stall_thrs
         * jiffies without any Tx completions, but during first half
         * of which some Tx was posted.
         */
    dqs_again:
        hist_head = READ_ONCE(dql->history_head);
        /* pairs with smp_wmb() in dql_queued() */
        smp_rmb();

        /* Get the previous entry in the ring buffer, which is the
         * oldest sample.
         */
        start = (hist_head - DQL_HIST_LEN + 1) * BITS_PER_LONG;

        /* Advance start to continue from the last reap time */
        if (time_before(start, dql->last_reap + 1))
            start = dql->last_reap + 1;

        /* Newest sample we should have already seen a completion for */
        end = hist_head * BITS_PER_LONG + (BITS_PER_LONG - 1);

        /* Shrink the search space to [start, (now - start_thrs/2)] if
         * `end` is beyond the stall zone
         */
        if (time_before(now, end + stall_thrs / 2))
            end = now - stall_thrs / 2;

        /* Search for the queued time in [t, end] */
        for (t = start; time_before_eq(t, end); t++)
            if (test_bit(t % (DQL_HIST_LEN * BITS_PER_LONG),
                         dql->history))
                break;

        /* Variable t contains the time of the queue */
        if (!time_before_eq(t, end))
            goto no_stall;

        /* The ring buffer was modified in the meantime, retry */
        if (hist_head != READ_ONCE(dql->history_head))
            goto dqs_again;

        dql->stall_cnt++;
        dql->stall_max = max_t(unsigned short, dql->stall_max, now - t);

        trace_dql_stall_detected(dql->stall_thrs, now - t,
                                 dql->last_reap, dql->history_head,
                                 now, dql->history);
    }
no_stall:
    dql->last_reap = now;
}

/* Records completed count and recalculates the queue limit */
void dql_completed(struct dql *dql, unsigned int count)
{
    unsigned int completed, num_queued;
    u64 now = ktime_get_ns();

    num_queued = READ_ONCE(dql->num_queued);

    /* Can't complete more than what's in queue */
    BUG_ON(count > num_queued - dql->num_completed);

    completed = dql->num_completed + count;

    /* Update rate tracking with completion data */
    dql_update_rates(dql, count, now);

    /* Check for stalls using rate-based detection */
    if (dql_is_stalled(dql, now))
    {
        /* If stalled, adjust parameters more aggressively */
        dql_adjust_params(dql);
        trace_dql_stall_detected(dql->stall_thrs, now - dql->last_rtt,
                                 dql->last_reap, dql->history_head,
                                 now, dql->history);
    }
    else
    {
        /* Not stalled - normal parameter adjustment */
        u32 new_limit = dql_get_limit(dql);
        if (new_limit != dql->limit)
        {
            dql->limit = new_limit;
            /* Update tracking variables */
            dql->slack_start_time = jiffies;
            dql->lowest_slack = UINT_MAX;
        }
    }

    /* Update completion tracking */
    dql->num_completed = completed;
    dql->prev_num_queued = num_queued;
    dql->prev_last_obj_cnt = READ_ONCE(dql->last_obj_cnt);

    /* Calculate adjusted limit based on completions */
    dql->adj_limit = dql->limit + completed;

    /* Maintain stall detection history */
    dql->last_reap = jiffies;
}
EXPORT_SYMBOL(dql_completed);

void dql_reset(struct dql *dql)
{
    /* Reset original dynamic values */
    dql->limit = 0;
    dql->num_queued = 0;
    dql->num_completed = 0;
    dql->last_obj_cnt = 0;
    dql->prev_num_queued = 0;
    dql->prev_last_obj_cnt = 0;
    dql->prev_ovlimit = 0;
    dql->lowest_slack = UINT_MAX;
    dql->slack_start_time = jiffies;

    /* Reset rate tracking values */
    memset(&dql->short_window, 0, sizeof(dql->short_window));
    memset(&dql->long_window, 0, sizeof(dql->long_window));
    dql->base_rtt = 0;
    dql->last_rtt = 0;
    dql->bandwidth_est = 0;
    dql->completion_rate = 0;

    /* Reset stall detection */
    dql->last_reap = jiffies;
    dql->history_head = jiffies / BITS_PER_LONG;
    memset(dql->history, 0, sizeof(dql->history));
}
EXPORT_SYMBOL(dql_reset);

/* Update rate statistics in the given window */
static void update_rate_window(struct rate_stats *stats, u32 bytes, u64 now)
{
    stats->timestamps[stats->head] = now;
    stats->bytes[stats->head] = bytes;
    stats->head = (stats->head + 1) % LONG_WINDOW_SIZE;
}

/* Calculate average rate over the window */
static u32 get_window_rate(struct rate_stats *stats, u64 now, u32 window_size)
{
    u32 total_bytes = 0;
    u64 oldest_time = now;
    int i, count = 0;

    for (i = 0; i < window_size && i < LONG_WINDOW_SIZE; i++)
    {
        int idx = (stats->head - i - 1 + LONG_WINDOW_SIZE) % LONG_WINDOW_SIZE;
        if (stats->timestamps[idx] == 0)
            break;

        total_bytes += stats->bytes[idx];
        oldest_time = min(oldest_time, stats->timestamps[idx]);
        count++;
    }

    if (count == 0 || now <= oldest_time)
        return 0;

    /* Timestamps come from ktime_get_ns(), so scale by NSEC_PER_SEC
     * to get bytes per second */
    return div_u64((u64)total_bytes * NSEC_PER_SEC, now - oldest_time);
}

/* Update rate tracking with new completion data */
void dql_update_rates(struct dql *dql, u32 bytes, u64 now)
{
    /* Update both short and long windows */
    update_rate_window(&dql->short_window, bytes, now);
    update_rate_window(&dql->long_window, bytes, now);

    /* Calculate rates */
    dql->completion_rate = get_window_rate(&dql->short_window, now, SHORT_WINDOW_SIZE);
    dql->bandwidth_est = get_window_rate(&dql->long_window, now, LONG_WINDOW_SIZE);

    /* Update RTT tracking (now is in ns, so one second is NSEC_PER_SEC) */
    if (dql->base_rtt == 0 || now - dql->last_rtt > NSEC_PER_SEC)
    {
        if (now > dql->last_rtt)
            dql->base_rtt = now - dql->last_rtt;
        dql->last_rtt = now;
    }
}
EXPORT_SYMBOL(dql_update_rates);

/* Enhanced stall detection using rate metrics */
bool dql_is_stalled(struct dql *dql, u64 now)
{
    u32 short_rate, long_rate;

    short_rate = get_window_rate(&dql->short_window, now, SHORT_WINDOW_SIZE);
    long_rate = get_window_rate(&dql->long_window, now, LONG_WINDOW_SIZE);

    /* Consider stalled if short-term rate drops significantly below long-term rate */
    return short_rate < (long_rate >> 2); /* Less than 25% of long-term rate */
}
EXPORT_SYMBOL(dql_is_stalled);

/* Link stability classification */
enum link_state
{
    LINK_STABLE,   /* Low rate variance, predictable */
    LINK_VARIABLE, /* Moderate rate changes */
    LINK_UNSTABLE  /* High variance, needs aggressive control */
};

static enum link_state classify_link(struct dql *dql)
{
    u32 short_rate = get_window_rate(&dql->short_window, ktime_get_ns(), SHORT_WINDOW_SIZE);
    u32 long_rate = get_window_rate(&dql->long_window, ktime_get_ns(), LONG_WINDOW_SIZE);

    /* Calculate rate variance */
    if (long_rate == 0)
        return LINK_UNSTABLE;

    /* u32 subtraction would wrap for short_rate < long_rate, so take
     * the absolute difference explicitly */
    u32 rate_diff = short_rate > long_rate ? short_rate - long_rate
                                           : long_rate - short_rate;
    u32 variance = (rate_diff * 100) / long_rate;

    if (variance < 10)
        return LINK_STABLE;
    else if (variance < 25)
        return LINK_VARIABLE;
    else
        return LINK_UNSTABLE;
}

/* Calculate dynamic limit based on network conditions and link stability */
u32 dql_get_limit(struct dql *dql)
{
    u32 limit;
    enum link_state state;

    /* Base the limit on the bandwidth-delay product:
     * (bytes/s * ns) / (ns per s) = bytes */
    if (dql->bandwidth_est && dql->base_rtt)
    {
        limit = (u32)div_u64((u64)dql->bandwidth_est * dql->base_rtt,
                             NSEC_PER_SEC);

        /* Adjust based on link stability */
        state = classify_link(dql);
        switch (state)
        {
        case LINK_STABLE:
            /* Allow higher limits for stable links */
            limit = limit * 12 / 10; /* 1.2x multiplier */
            break;
        case LINK_VARIABLE:
            /* Use base BDP */
            break;
        case LINK_UNSTABLE:
            /* More conservative limits */
            limit = limit * 8 / 10; /* 0.8x multiplier */
            break;
        }

        limit = max(limit, dql->min_limit);
        limit = min(limit, dql->max_limit);
    }
    else
    {
        limit = dql->limit;
    }

    return limit;
}
EXPORT_SYMBOL(dql_get_limit);

/* Adjust parameters based on network stability */
void dql_adjust_params(struct dql *dql)
{
    u32 new_limit = dql_get_limit(dql);
    enum link_state state = classify_link(dql);
    u64 now = ktime_get_ns();

    /* AIMD control with stability-based adaptation; factors are applied
     * as integer ratios (divide first to avoid u32 overflow near max_limit) */
    if (!dql_is_stalled(dql, now))
    {
        /* Multiplicative increase based on link stability */
        switch (state)
        {
        case LINK_STABLE:
            /* More aggressive increase for stable links (~1.35x) */
            new_limit = new_limit / 20 * 27;
            break;
        case LINK_VARIABLE:
            /* Standard increase (1.25x) */
            new_limit = new_limit / INCREASE_DEN * INCREASE_NUM;
            break;
        case LINK_UNSTABLE:
            /* Conservative increase (~1.15x) */
            new_limit = new_limit / 20 * 23;
            break;
        }
    }
    else
    {
        /* Multiplicative decrease based on link stability */
        switch (state)
        {
        case LINK_STABLE:
            /* Gentler decrease for stable links (0.9x) */
            new_limit = new_limit / 10 * 9;
            break;
        case LINK_VARIABLE:
            /* Standard decrease (0.8x) */
            new_limit = new_limit / DECREASE_DEN * DECREASE_NUM;
            break;
        case LINK_UNSTABLE:
            /* Aggressive decrease (0.7x) */
            new_limit = new_limit / 10 * 7;
            break;
        }
    }

    dql->limit = clamp(new_limit, dql->min_limit, dql->max_limit);
}
EXPORT_SYMBOL(dql_adjust_params);

void dql_init(struct dql *dql, unsigned int hold_time)
{
    dql->max_limit = DQL_MAX_LIMIT;
    dql->min_limit = 0;
    dql->slack_hold_time = hold_time;
    dql->stall_thrs = 0;

    /* Initialize rate tracking */
    memset(&dql->short_window, 0, sizeof(dql->short_window));
    memset(&dql->long_window, 0, sizeof(dql->long_window));
    dql->base_rtt = 0;
    dql->last_rtt = 0;
    dql->bandwidth_est = 0;
    dql->completion_rate = 0;

    dql_reset(dql);
}
EXPORT_SYMBOL(dql_init);
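
For anyone wanting to compare this experiment against what stock BQL converges to: the live DQL state of a real NIC queue is exposed in sysfs (interface and queue names are examples):

cat /sys/class/net/eth1/queues/tx-0/byte_queue_limits/limit
cat /sys/class/net/eth1/queues/tx-0/byte_queue_limits/limit_max
cat /sys/class/net/eth1/queues/tx-0/byte_queue_limits/inflight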