Beyond fq_codel & cake

If a NIC implements a traffic shaper in hardware plus BQL or similar backpressure, and the kernel already supports that, no kernel changes are required... the biggest cost in SQM is invariably the traffic-shaper component. It is not that AQM and the fq scheduler are free, but they appear considerably less demanding.
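(To make that cost concrete: on every dequeue a software shaper has to read a clock, refill credit and possibly arm a timer. A minimal token-bucket sketch, with made-up names and units rather than the actual SQM/HTB/cake code:)

/* Illustrative token bucket, not the actual shaper implementation: the
 * per-packet clock read, credit refill and possible timer arming is the
 * bookkeeping cost being discussed. Names and units are assumptions. */
#include <stdbool.h>
#include <stdint.h>

struct tbf {
	uint64_t rate_bps;	/* shaping rate, bits per second */
	uint64_t tokens;	/* accumulated credit, in bits */
	uint64_t burst;		/* maximum credit, in bits */
	uint64_t last_ns;	/* timestamp of the last refill */
};

static bool tbf_may_send(struct tbf *t, uint32_t pkt_len_bytes, uint64_t now_ns)
{
	uint64_t need = (uint64_t)pkt_len_bytes * 8;

	/* refill credit proportionally to the elapsed time */
	t->tokens += t->rate_bps * (now_ns - t->last_ns) / 1000000000ull;
	if (t->tokens > t->burst)
		t->tokens = t->burst;
	t->last_ns = now_ns;

	if (t->tokens < need)
		return false;	/* caller must wait for a timer before sending */
	t->tokens -= need;
	return true;
}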

Not sure more queues will help; as I tried to show above, more queues come with their own cost. We can argue, however, about how many queues are enough :wink:

There has not been a halfway competitive MIPS core announced in the last decade, as far as I know, that would be suitable for a router. That decline is probably not unrelated to the slow and steady rise of ARM, but probably not caused by it alone.

Sure, one option is simply to use these devices inside their capability envelope. I used my MIPS wndr3700v2 for, I think, two years on a 100/40 Mbps link with the shaper set to ~49/32, as that was the maximum it could reliably traffic-shape at. It turned out the network was considerably more responsive with the lowered capacity but competent scheduling and AQM than at the higher, un-shaped capacity without SQM. It would not have made much difference if a clever trick had allowed me to shape 10 Mbps higher...

Maybe, maybe not; the mipsen had a good run, and it might be time to stop targeting new development at them :wink:

Cake offers per-internal-IP fairness, which would restrict most of the fall-out to the machine running the torrent client in our example; no qosify required. Someone on the forum tested that with ~500 greedy TCP flows and reported acceptable performance on different machines...

Probably.


Thanks! My post lacks many details, tests and diagrams. I need to set up a proper environment for testing. The goal is to use two hosts and one router for testing. That should avoid any ISP bottlenecks or any transient behaviour.

I'm not familiar with XDP or the Intel NIC chips with such facilities. Perhaps you know of one such chip? I've seen RISC-V SoCs with networking implemented in an FPGA. Would it be far-fetched to grab one such design and add similar features to it? It's most likely non-trivial work. We'd still need a cheap FPGA board with two network ports.

Do you happen to know some good pseudocode implementations of any of those algorithms? I'd start looking at those. We might be able to come up with some better implementations. I can imagine there are a few people who'd want to try that, not just me.

My impression right now is that CAKE's COBALT is good enough to control flows. The DRR part of it is something which could be improved, both in terms of cache misses and fairness.
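For readers following along, the scheduling step in question is essentially classic deficit round robin; a simplified, hedged sketch (flow_head_len and flow_send_head are invented helpers, and this is not cake's actual dequeue path):

/* Simplified deficit round robin, illustrative only: each active flow
 * earns 'quantum' bytes of credit per round and sends while its deficit
 * covers the head packet. */
#include <linux/list.h>

struct drr_flow {
	struct list_head node;	/* position on the active-flows list */
	int deficit;		/* bytes this flow may still send this round */
};

extern unsigned int flow_head_len(struct drr_flow *f);	/* 0 if the flow is empty */
extern void flow_send_head(struct drr_flow *f);

static void drr_round(struct list_head *active, int quantum)
{
	struct drr_flow *f, *tmp;

	list_for_each_entry_safe(f, tmp, active, node) {
		unsigned int len;

		f->deficit += quantum;
		while ((len = flow_head_len(f)) > 0 && f->deficit >= (int)len) {
			f->deficit -= len;
			flow_send_head(f);
		}
		if (flow_head_len(f) == 0) {
			f->deficit = 0;			/* emptied flows keep no credit */
			list_del_init(&f->node);	/* drop out of the rotation */
		}
	}
}

Every pass touches each active flow's struct, which is where the cache misses come from on CPUs with small caches.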

Cache misses I have nothing to say about, but what fairness deficiencies do you observe? The last time this came up, it turned out that with the ingress keyword cake actually tends to equalize its ingress rate, but that can result in a more unequal egress rate.


The cache misses are certainly a factor for the MIPS devices with gigabit NICs that are used on Internet connections faster than 100-300 Mbps. Cake in diffserv4 mode has 4096 total queues (flows), 1024 per tin. The queue's flow struct was 72 or 80 bytes the last time I checked on 64-bit architectures, so the flow state alone is roughly 4096 × 80 ≈ 320 KB, far more than these CPUs' caches. Since backlogged flows are dequeued and appended to the tail (assuming backlogged flows), this has an impact on the cache. A different implementation might avoid that.

Regarding the ingress keyword, it's something to test with and without to compare. Again, I need to prepare a proper environment to use for testing.

As for timing: Microchip is adding patches for their LAN966x to add MQPRIO support. It seems more people are adding (parts of) HW QoS to their drivers: lan966x mqprio and taprio patch


Strict priority queues are a basically broken idea. It looks like people are reimplementing ideas that didn't work all that well in the 80s, because they are easy: 8 levels of service based on the TOS precedence bits (CS0-CS7), a scheme that was obsoleted in the late 90s...

In looking over that patch...

IEEE 802.1Qbv, in itself, is less broken, but...


There's actual code for PIE and fq_codel for P4. Hit Google Scholar...


Is that because you want CAKE to be able to be used in non-private, commercial contexts?

In any case, let me know if you need any help in this area as I specialize in invalidating (albeit European) patents, and I'm entirely familiar with ascertaining what does and does not fall within the scope of a claim. Many poorly drafted patents present straightforward workaround opportunities.


I guess if there is either admission control and/or a decreasing capacity share with increasing priority, one might end up with a workable system, but pure strict priority is truly harsh and unforgiving.

I wanted the fq_codel-related algorithms to be shipped in hardware.


That is why I am hoping that for some of our limited (low-end) targets we could use the WRR and rate-limiting features from the hardware (QoS) and complement them with more active software queue management.

I appreciate that SP queues are "broken", and yet now we finally see some development to actually use those hardware features. For sure those people are much better aware of the limitations and possibilities than I am.

I am hoping they can be used sanely as well! Thing is, my own ability to carry things forward is in decline; I don't have the mental energy or financial resources I had pre-covid, and at the moment I'm mostly just trying to pass essential knowledge and analysis along so others can take up the flag. The love, energy and enthusiasm others have for our outputs to date IS encouraging, but at one level I often think it best that others go ahead, try new things, and ignore the past, as that's the best way to learn! So many of the best things we've done came from papers that couldn't be published.

One of the key papers for me came from discovering the extended paper on SFQ, vigorously implementing all that was impossible (on a 68020 processor in 1992), and then observing the results: http://www2.rdrop.com/users/paulmck/scalability/paper/sfq.2002.06.04.pdf

SFQRED was a huge stepping stone towards fq_codel that everyone's forgotten now. I wish I'd written it up... and "RED in a Different Light" is key also: http://mirrors.bufferbloat.net/~jg/RelevantPapers/Red_in_a_different_light.pdf

Hysterically (@tohojo), I just discovered we cited the wrong paper for SFQ in RFC 8290! It's actually the SQF paper: https://perso.telecom-paristech.fr/bonald/Publications_files/BMO2011.pdf

In all things in life, I recommend a road not taken, so long as you can keep a roof over your head.


There are a lot of good ideas in this: https://dl.acm.org/doi/pdf/10.1145/3387514.3406591


Also, it seems likely that many of the new generation of Ethernet devices missed putting "BQL" in there. There's a mini-tutorial over here:

http://www.taht.net/~d/broadcom_aug9_2018.pdf

Your contributions have an impact. More people should be aware and appreciate that.

The SQF paper is missing many details. The presented algorithm seems to state only that the shortest flow is chosen. The paper seems to propose a packet scheduler, similar to FQ, which selects the shortest flow to dequeue from and the longest flow to drop from.

An implementation of SQF with codel could probably work just fine without the longest-queue drop. Codel should control the queues. The only scenario where packets would be dropped outside of codel is when running out of memory, just like in fq_codel and cake. I'm not sure what the algorithm's loop would look like. Would it just go through the queues from shortest to longest, let codel do its work, send a packet and continue?
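A hedged sketch of what such a loop could look like; every type and helper here (my_flow, flow_dequeue, codel_should_drop) is a stand-in invented for illustration, not an existing kernel API:

/* Illustrative SQF-with-codel dequeue, not the paper's algorithm verbatim:
 * pick the flow with the least backlog and let codel vet its head packet. */
#include <linux/list.h>
#include <linux/skbuff.h>

struct my_flow {
	struct list_head node;	/* on the active-flows list */
	u32 backlog;		/* queued bytes */
};

extern struct sk_buff *flow_dequeue(struct my_flow *f);
extern bool codel_should_drop(struct my_flow *f, struct sk_buff *skb);

static struct sk_buff *sqf_codel_dequeue(struct list_head *active)
{
	struct my_flow *f, *shortest = NULL;
	struct sk_buff *skb;

	list_for_each_entry(f, active, node)
		if (!shortest || f->backlog < shortest->backlog)
			shortest = f;	/* O(n) scan; see the bucket idea below */
	if (!shortest)
		return NULL;

	while ((skb = flow_dequeue(shortest)) != NULL) {
		if (!codel_should_drop(shortest, skb))
			return skb;	/* deliver the first packet codel accepts */
		kfree_skb(skb);		/* codel drop, try the next head packet */
	}
	return NULL;
}

The linear scan for the shortest flow is the obvious weak point; a heap, or the bucket scheme below, would replace it.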

We're not forced to implement SQF exactly as it is described there. Buckets could be used if SQF must be avoided because of the patent: flows would be put into buckets based on the number of bytes they have in the backlog. The selected flow might or might not be the shortest one, but it would be close.
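One hedged way to express that bucket idea (illustrative only, reusing struct my_flow from the sketch above): index flows by the log2 of their backlog and dequeue from the lowest non-empty bucket, which picks a near-shortest flow in O(number of buckets):

/* Approximate shortest-flow selection by backlog bucket, illustrative only:
 * flows whose backlog falls in [2^k, 2^(k+1)) live on bucket k, so the
 * lowest non-empty bucket holds a near-shortest flow. */
#include <linux/list.h>
#include <linux/log2.h>

#define NUM_BUCKETS 20	/* covers backlogs up to about 1 MB; an assumption */

static inline unsigned int backlog_bucket(u32 backlog)
{
	unsigned int b = backlog ? ilog2(backlog) : 0;

	return b < NUM_BUCKETS ? b : NUM_BUCKETS - 1;
}

/* After every enqueue/dequeue, move the flow to the bucket matching its
 * new backlog; the dequeue path then scans buckets[0..NUM_BUCKETS-1] for
 * the first non-empty list. */
static inline void rebucket_flow(struct list_head buckets[NUM_BUCKETS],
				 struct my_flow *flow)
{
	list_move_tail(&flow->node, &buckets[backlog_bucket(flow->backlog)]);
}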

It's difficult to understand why they'd patent this. It's difficult enough to convince people to implement proper fair schedulers with AQM and proper shaping where it makes sense.

There are a few places in the kernel where the sizes of the data structures can be reduced. A reduction of 8-16 bytes in a flow's data structure could lead to a performance improvement on embedded hardware with limited memory bandwidth and really small CPU caches. These changes could even allow a flow's struct to fit inside a single cache line.

The fq code from the kernel (include/net/fq.h and include/net/fq_impl.h) is used for mac80211. This is the struct:

struct fq_flow {
	struct fq_tin *tin;		/* the tin (traffic class) this flow belongs to */
	struct list_head flowchain;	/* links the flow into the tin's new/old flow lists */
	struct sk_buff_head queue;	/* per-flow packet queue */
	u32 backlog;			/* queued bytes */
	int deficit;			/* DRR deficit counter */
};

The sk_buff_head queue wastes 8 bytes on two fields: a spinlock and a queue length (in packets). Both of these are unused by the mac80211 tx.c code and by fq_impl.h; these queues are handled only privately by mac80211 and the fq code and are not touched by anything else. There are other spinlocks in place which are actually used.

This field could be replaced.

-       struct sk_buff_head queue;
+       struct sk_buff    *head;
+       struct sk_buff    *tail;

fq is configured to use 8192 queues for mac80211. The change would lower the total size of these flows by 64 KB.

A few macros and functions used by fq_impl.h and mac80211/tx.c need to be rewritten: __skb_dequeue, __skb_queue_tail, skb_queue_walk_safe, __skb_unlink, skb_peek_tail. I'll try to get these changes to work properly. More changes are required.

Others may be able to come up with other improvements for parts of the kernel's code. There are people out there with more kernel hacking experience.

Perhaps adding ACK-thinning code to mac80211 is worth a shot. Having it on the client side would also help. Wi-Fi stations might see an improvement in throughput, since Wi-Fi is only half-duplex.
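As a rough illustration of the classification step such ACK thinning needs (this is not mac80211 code; it parses a raw IPv4 packet and ignores IPv6, IP options beyond ihl, and SACK, so treat it purely as a sketch):

/* Sketch only: identify a "pure" TCP ACK that an ACK filter could
 * consider dropping when a newer ACK for the same flow is queued. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool is_pure_tcp_ack(const uint8_t *ip, size_t len)
{
	if (len < 40 || (ip[0] >> 4) != 4)
		return false;				/* too short / not IPv4 */

	size_t ihl = (ip[0] & 0x0f) * 4;
	size_t tot = ((size_t)ip[2] << 8) | ip[3];	/* IP total length */
	if (ihl < 20 || ip[9] != 6 || tot > len || ihl + 20 > tot)
		return false;				/* not TCP or truncated */

	const uint8_t *tcp = ip + ihl;
	size_t doff = (tcp[12] >> 4) * 4;		/* TCP header length */
	uint8_t flags = tcp[13];

	/* no payload, ACK set, and none of URG/PSH/RST/SYN/FIN */
	return doff >= 20 && ihl + doff == tot && (flags & 0x3f) == 0x10;
}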

This hack appears to work.

diff --git a/include/net/fq.h b/include/net/fq.h
index 2eccbbd..5ca3f91 100644
--- a/include/net/fq.h
+++ b/include/net/fq.h
@@ -7,6 +7,8 @@
 #ifndef __NET_SCHED_FQ_H
 #define __NET_SCHED_FQ_H
 
+#include <net/skbuff_light.h>
+
 struct fq_tin;
 
 /**
@@ -27,7 +29,7 @@ struct fq_tin;
 struct fq_flow {
 	struct fq_tin *tin;
 	struct list_head flowchain;
-	struct sk_buff_head queue;
+	struct sk_buff_light_head queue;
 	u32 backlog;
 	int deficit;
 };
diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index a5f67a2..1ba0f48 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -51,7 +51,7 @@ static struct sk_buff *fq_flow_dequeue(struct fq *fq,
 
 	lockdep_assert_held(&fq->lock);
 
-	skb = __skb_dequeue(&flow->queue);
+	skb = __skb_light_dequeue(&flow->queue);
 	if (!skb)
 		return NULL;
 
@@ -66,13 +66,12 @@ static int fq_flow_drop(struct fq *fq, struct fq_flow *flow,
 	unsigned int packets = 0, bytes = 0, truesize = 0;
 	struct fq_tin *tin = flow->tin;
 	struct sk_buff *skb;
-	int pending;
+	u32 pending;
 
 	lockdep_assert_held(&fq->lock);
-
-	pending = min_t(int, 32, skb_queue_len(&flow->queue) / 2);
+	pending = min_t(u32, 32*1514, flow->backlog >> 1);
 	do {
-		skb = __skb_dequeue(&flow->queue);
+		skb = __skb_light_dequeue(&flow->queue);
 		if (!skb)
 			break;
 
@@ -80,7 +79,7 @@ static int fq_flow_drop(struct fq *fq, struct fq_flow *flow,
 		bytes += skb->len;
 		truesize += skb->truesize;
 		free_func(fq, tin, flow, skb);
-	} while (packets < pending);
+	} while (bytes < pending);
 
 	__fq_adjust_removal(fq, flow, packets, bytes, truesize);
 
@@ -226,7 +225,7 @@ static void fq_tin_enqueue(struct fq *fq,
 			      &tin->new_flows);
 	}
 
-	__skb_queue_tail(&flow->queue, skb);
+	__skb_light_queue_tail(&flow->queue, skb);
 	oom = (fq->memory_usage > fq->memory_limit);
 	while (fq->backlog > fq->limit || oom) {
 		flow = fq_find_fattest_flow(fq);
@@ -256,11 +255,11 @@ static void fq_flow_filter(struct fq *fq,
 
 	lockdep_assert_held(&fq->lock);
 
-	skb_queue_walk_safe(&flow->queue, skb, tmp) {
+	skb_light_queue_walk_safe(&flow->queue, skb, tmp) {
 		if (!filter_func(fq, tin, flow, skb, filter_data))
 			continue;
 
-		__skb_unlink(skb, &flow->queue);
+		__skb_light_unlink(skb, &flow->queue);
 		fq_adjust_removal(fq, flow, skb);
 		free_func(fq, tin, flow, skb);
 	}
@@ -331,7 +330,7 @@ static void fq_tin_reset(struct fq *fq,
 static void fq_flow_init(struct fq_flow *flow)
 {
 	INIT_LIST_HEAD(&flow->flowchain);
-	__skb_queue_head_init(&flow->queue);
+	__skb_light_queue_head_init(&flow->queue);
 }
 
 static void fq_tin_init(struct fq_tin *tin)
diff --git a/include/net/skbuff_light.h b/include/net/skbuff_light.h
new file mode 100644
index 0000000..4076326
--- /dev/null
+++ b/include/net/skbuff_light.h
@@ -0,0 +1,455 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ *	Definitions for the 'struct sk_buff' memory handlers.
+ *
+ *	Authors:
+ *		Alan Cox, <gw4pts@gw4pts.ampr.org>
+ *		Florian La Roche, <rzsfl@rz.uni-sb.de>
+ */
+
+#ifndef _LINUX_LIGHT_SKBUFF_H
+#define _LINUX_LIGHT_SKBUFF_H
+
+#include <linux/skbuff.h>
+
+struct sk_buff_light_head {
+	/* These two members must be first. */
+	struct sk_buff	*next;
+	struct sk_buff	*prev;
+};
+
+/**
+ *	skb_queue_empty - check if a queue is empty
+ *	@list: queue head
+ *
+ *	Returns true if the queue is empty, false otherwise.
+ */
+static inline int skb_light_queue_empty(const struct sk_buff_light_head *list)
+{
+	return list->next == (const struct sk_buff *) list;
+}
+
+/**
+ *	skb_queue_empty_lockless - check if a queue is empty
+ *	@list: queue head
+ *
+ *	Returns true if the queue is empty, false otherwise.
+ *	This variant can be used in lockless contexts.
+ */
+static inline bool skb_light_queue_empty_lockless(const struct sk_buff_light_head *list)
+{
+	return READ_ONCE(list->next) == (const struct sk_buff *) list;
+}
+
+
+/**
+ *	skb_queue_is_last - check if skb is the last entry in the queue
+ *	@list: queue head
+ *	@skb: buffer
+ *
+ *	Returns true if @skb is the last buffer on the list.
+ */
+static inline bool skb_light_queue_is_last(const struct sk_buff_light_head *list,
+				     const struct sk_buff *skb)
+{
+	return skb->next == (const struct sk_buff *) list;
+}
+
+/**
+ *	skb_queue_is_first - check if skb is the first entry in the queue
+ *	@list: queue head
+ *	@skb: buffer
+ *
+ *	Returns true if @skb is the first buffer on the list.
+ */
+static inline bool skb_light_queue_is_first(const struct sk_buff_light_head *list,
+				      const struct sk_buff *skb)
+{
+	return skb->prev == (const struct sk_buff *) list;
+}
+
+/**
+ *	skb_queue_next - return the next packet in the queue
+ *	@list: queue head
+ *	@skb: current buffer
+ *
+ *	Return the next packet in @list after @skb.  It is only valid to
+ *	call this if skb_queue_is_last() evaluates to false.
+ */
+static inline struct sk_buff *skb_light_queue_next(const struct sk_buff_light_head *list,
+					     const struct sk_buff *skb)
+{
+	/* This BUG_ON may seem severe, but if we just return then we
+	 * are going to dereference garbage.
+	 */
+	BUG_ON(skb_light_queue_is_last(list, skb));
+	return skb->next;
+}
+
+/**
+ *	skb_queue_prev - return the prev packet in the queue
+ *	@list: queue head
+ *	@skb: current buffer
+ *
+ *	Return the prev packet in @list before @skb.  It is only valid to
+ *	call this if skb_queue_is_first() evaluates to false.
+ */
+static inline struct sk_buff *skb_light_queue_prev(const struct sk_buff_light_head *list,
+					     const struct sk_buff *skb)
+{
+	/* This BUG_ON may seem severe, but if we just return then we
+	 * are going to dereference garbage.
+	 */
+	BUG_ON(skb_light_queue_is_first(list, skb));
+	return skb->prev;
+}
+
+/**
+ *	skb_peek - peek at the head of an &sk_buff_head
+ *	@list_: list to peek at
+ *
+ *	Peek an &sk_buff. Unlike most other operations you _MUST_
+ *	be careful with this one. A peek leaves the buffer on the
+ *	list and someone else may run off with it. You must hold
+ *	the appropriate locks or have a private queue to do this.
+ *
+ *	Returns %NULL for an empty list or a pointer to the head element.
+ *	The reference count is not incremented and the reference is therefore
+ *	volatile. Use with caution.
+ */
+static inline struct sk_buff *skb_light_peek(const struct sk_buff_light_head *list_)
+{
+	struct sk_buff *skb = list_->next;
+
+	if (skb == (struct sk_buff *)list_)
+		skb = NULL;
+	return skb;
+}
+
+/**
+ *	__skb_peek - peek at the head of a non-empty &sk_buff_head
+ *	@list_: list to peek at
+ *
+ *	Like skb_peek(), but the caller knows that the list is not empty.
+ */
+static inline struct sk_buff *__skb_light_peek(const struct sk_buff_light_head *list_)
+{
+	return list_->next;
+}
+
+/**
+ *	skb_peek_next - peek skb following the given one from a queue
+ *	@skb: skb to start from
+ *	@list_: list to peek at
+ *
+ *	Returns %NULL when the end of the list is met or a pointer to the
+ *	next element. The reference count is not incremented and the
+ *	reference is therefore volatile. Use with caution.
+ */
+static inline struct sk_buff *skb_light_peek_next(struct sk_buff *skb,
+		const struct sk_buff_light_head *list_)
+{
+	struct sk_buff *next = skb->next;
+
+	if (next == (struct sk_buff *)list_)
+		next = NULL;
+	return next;
+}
+
+/**
+ *	skb_peek_tail - peek at the tail of an &sk_buff_head
+ *	@list_: list to peek at
+ *
+ *	Peek an &sk_buff. Unlike most other operations you _MUST_
+ *	be careful with this one. A peek leaves the buffer on the
+ *	list and someone else may run off with it. You must hold
+ *	the appropriate locks or have a private queue to do this.
+ *
+ *	Returns %NULL for an empty list or a pointer to the tail element.
+ *	The reference count is not incremented and the reference is therefore
+ *	volatile. Use with caution.
+ */
+static inline struct sk_buff *skb_light_peek_tail(const struct sk_buff_light_head *list_)
+{
+	struct sk_buff *skb = READ_ONCE(list_->prev);
+
+	if (skb == (struct sk_buff *)list_)
+		skb = NULL;
+	return skb;
+
+}
+
+/**
+ *	__skb_queue_head_init - initialize non-spinlock portions of sk_buff_head
+ *	@list: queue to initialize
+ *
+ *	This initializes only the list and queue length aspects of
+ *	an sk_buff_head object.  This allows to initialize the list
+ *	aspects of an sk_buff_head without reinitializing things like
+ *	the spinlock.  It can also be used for on-stack sk_buff_head
+ *	objects where the spinlock is known to not be used.
+ */
+static inline void __skb_light_queue_head_init(struct sk_buff_light_head *list)
+{
+	list->prev = list->next = (struct sk_buff *)list;
+}
+
+/*
+ * This function creates a split out lock class for each invocation;
+ * this is needed for now since a whole lot of users of the skb-queue
+ * infrastructure in drivers have different locking usage (in hardirq)
+ * than the networking core (in softirq only). In the long run either the
+ * network layer or drivers should need annotation to consolidate the
+ * main types of usage into 3 classes.
+ */
+static inline void skb_light_queue_head_init(struct sk_buff_light_head *list)
+{
+	__skb_light_queue_head_init(list);
+}
+
+static inline void skb_light_queue_head_init_class(struct sk_buff_light_head *list,
+		struct lock_class_key *class)
+{
+	skb_light_queue_head_init(list);
+}
+
+/*
+ *	Insert an sk_buff on a list.
+ *
+ *	The "__skb_xxxx()" functions are the non-atomic ones that
+ *	can only be called with interrupts disabled.
+ */
+static inline void __skb_light_insert(struct sk_buff *newsk,
+				struct sk_buff *prev, struct sk_buff *next,
+				struct sk_buff_light_head *list)
+{
+	/* See skb_queue_empty_lockless() and skb_peek_tail()
+	 * for the opposite READ_ONCE()
+	 */
+	WRITE_ONCE(newsk->next, next);
+	WRITE_ONCE(newsk->prev, prev);
+	WRITE_ONCE(next->prev, newsk);
+	WRITE_ONCE(prev->next, newsk);
+}
+
+static inline void __skb_light_queue_splice(const struct sk_buff_light_head *list,
+				      struct sk_buff *prev,
+				      struct sk_buff *next)
+{
+	struct sk_buff *first = list->next;
+	struct sk_buff *last = list->prev;
+
+	WRITE_ONCE(first->prev, prev);
+	WRITE_ONCE(prev->next, first);
+
+	WRITE_ONCE(last->next, next);
+	WRITE_ONCE(next->prev, last);
+}
+
+/**
+ *	skb_queue_splice - join two skb lists, this is designed for stacks
+ *	@list: the new list to add
+ *	@head: the place to add it in the first list
+ */
+static inline void skb_light_queue_splice(const struct sk_buff_light_head *list,
+				    struct sk_buff_head *head)
+{
+	if (!skb_light_queue_empty(list)) {
+		__skb_light_queue_splice(list, (struct sk_buff *) head, head->next);
+	}
+}
+
+/**
+ *	skb_queue_splice_init - join two skb lists and reinitialise the emptied list
+ *	@list: the new list to add
+ *	@head: the place to add it in the first list
+ *
+ *	The list at @list is reinitialised
+ */
+static inline void skb_light_queue_splice_init(struct sk_buff_light_head *list,
+					 struct sk_buff_head *head)
+{
+	if (!skb_light_queue_empty(list)) {
+		__skb_light_queue_splice(list, (struct sk_buff *) head, head->next);
+		__skb_light_queue_head_init(list);
+	}
+}
+
+/**
+ *	skb_queue_splice_tail - join two skb lists, each list being a queue
+ *	@list: the new list to add
+ *	@head: the place to add it in the first list
+ */
+static inline void skb_light_queue_splice_tail(const struct sk_buff_light_head *list,
+					 struct sk_buff_head *head)
+{
+	if (!skb_light_queue_empty(list)) {
+		__skb_light_queue_splice(list, head->prev, (struct sk_buff *) head);
+	}
+}
+
+/**
+ *	skb_queue_splice_tail_init - join two skb lists and reinitialise the emptied list
+ *	@list: the new list to add
+ *	@head: the place to add it in the first list
+ *
+ *	Each of the lists is a queue.
+ *	The list at @list is reinitialised
+ */
+static inline void skb_light_queue_splice_tail_init(struct sk_buff_light_head *list,
+					      struct sk_buff_light_head *head)
+{
+	if (!skb_light_queue_empty(list)) {
+		__skb_light_queue_splice(list, head->prev, (struct sk_buff *) head);
+		__skb_light_queue_head_init(list);
+	}
+}
+
+/**
+ *	__skb_queue_after - queue a buffer at the list head
+ *	@list: list to use
+ *	@prev: place after this buffer
+ *	@newsk: buffer to queue
+ *
+ *	Queue a buffer int the middle of a list. This function takes no locks
+ *	and you must therefore hold required locks before calling it.
+ *
+ *	A buffer cannot be placed on two lists at the same time.
+ */
+static inline void __skb_light_queue_after(struct sk_buff_light_head *list,
+				     struct sk_buff *prev,
+				     struct sk_buff *newsk)
+{
+	__skb_light_insert(newsk, prev, prev->next, list);
+}
+
+void skb_light_append(struct sk_buff *old, struct sk_buff *newsk,
+		struct sk_buff_light_head *list);
+
+static inline void __skb_light_queue_before(struct sk_buff_light_head *list,
+				      struct sk_buff *next,
+				      struct sk_buff *newsk)
+{
+	__skb_light_insert(newsk, next->prev, next, list);
+}
+
+/**
+ *	__skb_queue_head - queue a buffer at the list head
+ *	@list: list to use
+ *	@newsk: buffer to queue
+ *
+ *	Queue a buffer at the start of a list. This function takes no locks
+ *	and you must therefore hold required locks before calling it.
+ *
+ *	A buffer cannot be placed on two lists at the same time.
+ */
+static inline void __skb_light_queue_head(struct sk_buff_light_head *list,
+				    struct sk_buff *newsk)
+{
+	__skb_light_queue_after(list, (struct sk_buff *)list, newsk);
+}
+void skb_light_queue_head(struct sk_buff_light_head *list, struct sk_buff *newsk);
+
+/**
+ *	__skb_queue_tail - queue a buffer at the list tail
+ *	@list: list to use
+ *	@newsk: buffer to queue
+ *
+ *	Queue a buffer at the end of a list. This function takes no locks
+ *	and you must therefore hold required locks before calling it.
+ *
+ *	A buffer cannot be placed on two lists at the same time.
+ */
+static inline void __skb_light_queue_tail(struct sk_buff_light_head *list,
+				   struct sk_buff *newsk)
+{
+	__skb_light_queue_before(list, (struct sk_buff *)list, newsk);
+}
+void skb_light_queue_tail(struct sk_buff_light_head *list, struct sk_buff *newsk);
+
+/*
+ * remove sk_buff from list. _Must_ be called atomically, and with
+ * the list known..
+ */
+void skb_light_unlink(struct sk_buff *skb, struct sk_buff_light_head *list);
+static inline void __skb_light_unlink(struct sk_buff *skb, struct sk_buff_light_head *list)
+{
+	struct sk_buff *next, *prev;
+
+	next	   = skb->next;
+	prev	   = skb->prev;
+	skb->next  = skb->prev = NULL;
+	WRITE_ONCE(next->prev, prev);
+	WRITE_ONCE(prev->next, next);
+}
+
+/**
+ *	__skb_dequeue - remove from the head of the queue
+ *	@list: list to dequeue from
+ *
+ *	Remove the head of the list. This function does not take any locks
+ *	so must be used with appropriate locks held only. The head item is
+ *	returned or %NULL if the list is empty.
+ */
+static inline struct sk_buff *__skb_light_dequeue(struct sk_buff_light_head *list)
+{
+	struct sk_buff *skb = skb_light_peek(list);
+	if (skb)
+		__skb_light_unlink(skb, list);
+	return skb;
+}
+struct sk_buff *skb_light_dequeue(struct sk_buff_light_head *list);
+
+/**
+ *	__skb_dequeue_tail - remove from the tail of the queue
+ *	@list: list to dequeue from
+ *
+ *	Remove the tail of the list. This function does not take any locks
+ *	so must be used with appropriate locks held only. The tail item is
+ *	returned or %NULL if the list is empty.
+ */
+static inline struct sk_buff *__skb_light_dequeue_tail(struct sk_buff_light_head *list)
+{
+	struct sk_buff *skb = skb_light_peek_tail(list);
+	if (skb)
+		__skb_light_unlink(skb, list);
+	return skb;
+}
+struct sk_buff *skb_light_dequeue_tail(struct sk_buff_light_head *list);
+
+#define skb_light_queue_walk(queue, skb) \
+		for (skb = (queue)->next;					\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = skb->next)
+
+#define skb_light_queue_walk_safe(queue, skb, tmp)					\
+		for (skb = (queue)->next, tmp = skb->next;			\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = tmp, tmp = skb->next)
+
+#define skb_light_queue_walk_from(queue, skb)						\
+		for (; skb != (struct sk_buff *)(queue);			\
+		     skb = skb->next)
+
+#define skb_light_queue_walk_from_safe(queue, skb, tmp)				\
+		for (tmp = skb->next;						\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = tmp, tmp = skb->next)
+
+#define skb_light_queue_reverse_walk(queue, skb) \
+		for (skb = (queue)->prev;					\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = skb->prev)
+
+#define skb_light_queue_reverse_walk_safe(queue, skb, tmp)				\
+		for (skb = (queue)->prev, tmp = skb->prev;			\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = tmp, tmp = skb->prev)
+
+#define skb_light_queue_reverse_walk_from_safe(queue, skb, tmp)			\
+		for (tmp = skb->prev;						\
+		     skb != (struct sk_buff *)(queue);				\
+		     skb = tmp, tmp = skb->prev)
+
+
+#endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 50025ff..7f52580 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3347,7 +3347,7 @@ static bool ieee80211_amsdu_aggregate(struct ieee80211_sub_if_data *sdata,
 
 	tin = &txqi->tin;
 	flow = fq_flow_classify(fq, tin, flow_idx, skb);
-	head = skb_peek_tail(&flow->queue);
+	head = skb_light_peek_tail(&flow->queue);
 	if (!head || skb_is_gso(head))
 		goto out;
 

I've done some basic tests with https://fast.com. The latency under load was around 16-18 ms with this patch on the 802.11ac 5 GHz radio of an Archer C7. The router, as an AP, has been stable so far with these changes.

It might be a good idea to make this scheduler an O(1) scheduler with some changes.

I've tested fq_pie to see how it behaves on the upload side (on the eth0 interface of the router in AP mode, not on the WAN port). The upload barely reached 100 Mbps. The default fq_codel allowed the AP to go as high as 300 Mbps on the upload side.

I ran your skb_light idea past Eric Dumazet. His reply:

Just as a reminder David Miller added in 'struct sk_buff' a 'struct list_head list;' in 2018 :/

So we can already use standard list operators (from include/linux/list.h)
No need to add yet another set of helpers (skb_light...)

Simply use 'struct list_head' for this kind of stuff.
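A hedged sketch of what that suggestion could look like, chaining packets through skb->list (the member added in 2018) with the standard <linux/list.h> helpers; the fq_flow_light name and fields are illustrative, not upstream code:

/* Illustrative only: same 16-byte head/tail footprint as the
 * sk_buff_light_head approach, but using existing list helpers. */
#include <linux/list.h>
#include <linux/skbuff.h>

struct fq_flow_light {
	struct list_head queue;	/* anchors the skb->list entries */
	u32 backlog;		/* queued bytes */
	int deficit;		/* DRR deficit */
};

static inline void fq_flow_light_enqueue(struct fq_flow_light *flow,
					 struct sk_buff *skb)
{
	list_add_tail(&skb->list, &flow->queue);
	flow->backlog += skb->len;
}

static inline struct sk_buff *fq_flow_light_dequeue(struct fq_flow_light *flow)
{
	struct sk_buff *skb;

	if (list_empty(&flow->queue))
		return NULL;
	skb = list_first_entry(&flow->queue, struct sk_buff, list);
	list_del_init(&skb->list);
	flow->backlog -= skb->len;
	return skb;
}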

Elsewhere in the OpenWrt universe there are some patches floating about that try to halve that: reducing NAPI_POLL_WAIT, killing interrupt coalescing, using shorter TXOPs.

Also, I am a big fan of flent's rrul, rtt_fair, tcp_nup, tcp_ndown, and square-wave tests. Ideally these run against a local server, but I do maintain a cloud you can target for tests: {de, london, ontario, atlanta, newark, fremont, dallas, singapore, mumbai}.starlink.taht.net

I am glad to hear of someone testing fq_pie at rates other than those the original developers tested it at. I too found it had difficulty scaling properly past 100 Mbit, and against wildly variable wifi rates... and the principal reasons why I worked on it (algorithmic diversity, O(1) on dequeue) have faded over time. There is a large group of people dead set on using PIE no matter what, and more power to 'em...

I am not sure what you mean by O(1) for fq_codel. Codel itself is not O(1) on dequeue, as it can drop tons of packets before delivering one. The approach I took for a hardware implementation with strict deadlines was to limit the loops.
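Something like this, as a sketch only (flow_dequeue and codel_should_drop are stand-ins, and the bound of 8 drops is an arbitrary illustration, not the value used in any shipping implementation):

/* Sketch of "limit the loops": cap how many packets codel may drop per
 * dequeue so the worst case fits a hardware deadline. */
#include <linux/skbuff.h>

struct my_flow;		/* opaque stand-in for a flow */
extern struct sk_buff *flow_dequeue(struct my_flow *flow);
extern bool codel_should_drop(struct my_flow *flow, struct sk_buff *skb);

#define MAX_DROPS_PER_DEQUEUE 8

static struct sk_buff *bounded_codel_dequeue(struct my_flow *flow)
{
	struct sk_buff *skb;
	int drops = 0;

	while ((skb = flow_dequeue(flow)) != NULL) {
		if (drops >= MAX_DROPS_PER_DEQUEUE ||
		    !codel_should_drop(flow, skb))
			return skb;	/* deliver; per-dequeue work is now bounded */
		kfree_skb(skb);		/* AQM drop */
		drops++;
	}
	return NULL;
}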

In fact the ECN code in codel is, in my opinion, wrong, as it should start dropping even in ECN mode in the rtt-seeking portion of the code.

Others disagree.

There are a few other ideas in fq_codel_fast... new_flows was a useless stat, and I felt there was no need to track both a memory limit and a per-packet limit; memory alone was enough...

From RFC 8289:
"Control theory provides a wealth of approaches to the design of
control loops. Most of classical control theory deals with the
control of linear, time-invariant, Single-Input-Single-Output (SISO)
systems. Control loops for these systems generally come from a well-
understood class known as Proportional-Integral-Derivative (PID)
controllers. Unfortunately, a queue is not a linear system, and an
AQM operates at the point of maximum non-linearity (where the output
link bandwidth saturates, so increased demand creates delay rather
than higher utilization).
Output queues are also not time invariant
since traffic is generally a mix of connections that start and stop
at arbitrary times and that can have radically different behaviors
ranging from "open-loop" UDP audio/video to "closed-loop" congestion-
avoiding TCP. Finally, the constantly changing mix of connections
(which can't be converted to a single "lumped parameter" model
because of their transfer function differences) makes the system
Multi-Input-Multi-Output (MIMO), not SISO."

Just saying... I'm not sure why PIE still has as many hard-core supporters as it does... (sure, a linear approximation often goes a long way even for non-linear problems, but I smell a hefty portion of NIH syndrome).
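For context, the PI core of PIE is just a linear update of the drop probability from the queuing delay; this simplified sketch follows RFC 8033 but omits the auto-tuning of alpha/beta, the burst allowance and the departure-rate estimator the real algorithm layers on top:

/* Simplified PIE probability update (after RFC 8033), illustrative only. */
struct pie_vars {
	double prob;		/* current drop probability, 0..1 */
	double qdelay_old;	/* queuing delay at the previous update, seconds */
};

static void pie_update(struct pie_vars *v, double qdelay, double target,
		       double alpha, double beta)
{
	v->prob += alpha * (qdelay - target) + beta * (qdelay - v->qdelay_old);
	if (v->prob < 0.0)
		v->prob = 0.0;
	if (v->prob > 1.0)
		v->prob = 1.0;
	v->qdelay_old = qdelay;
}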
