Yeah, I think we could show early on that "pinning" large amounts of kernel memory, say with a source-port-randomizing UDP attack, will quickly drive a router into OOM, and with 64 MB of RAM even 4 MB was on the high side....
Let's run the numbers for LAN traffic. Say at 4 ms RTT (to allow for WiFi in both directions, or would that be 8 ms?) the BDP for 1 Gbps would be (dividing by 8 to get from bits to bytes):
(1000^3) * (4 / 1000) / 8 / (1024^2) = 0.48 MB
Which is comfortably below our 4 MB limit, even though that limit applies to truesize, with the actually queued data being closer to 50% of truesize*, so for single-stream throughput 4 MB should not limit TCP transfers inside the LAN (not even for a classic Reno TCP). According to Appenzeller et al., for a router we might get away with dividing by the square root of N (N being the number of parallel flows), but that only helps if we say goodbye to single-flow throughput (from, say, 4 parallel flows on, sqrt(4) = 2 would counter the truesize-to-queued-data difference).
But by the same logic 4 MB for WAN traffic is likely too tight; for maximizing single-flow throughput over a 100 ms internet-scale connection we would need:
(1000^3) * (100 / 1000) / 8 / (1024^2) = 11.92 MB
of queued data, so roughly twice that as truesize, which seems excessive... indicating that the formulation is probably too sloppy/approximate.
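To make the interplay of the two factors concrete, here is a rough sketch of the sizing rule discussed above: queue BDP/sqrt(N) of packet data, then inflate by the truesize overhead. The function name, the 50% payload fraction and the 1 MB example BDP are illustrative assumptions, not anything cake computes itself.

```python
import math

def memlimit_needed(bdp_bytes, flows=1, payload_fraction=0.5):
    """Truesize budget (bytes) needed to queue bdp/sqrt(flows) of packet data.

    payload_fraction is the assumed share of truesize that is actual
    packet data (0.5 is the rough rule of thumb from the text).
    """
    queue_bytes = bdp_bytes / math.sqrt(flows)
    return queue_bytes / payload_fraction

bdp = 1_000_000  # hypothetical BDP in bytes, purely for illustration
for n in (1, 4, 16):
    print(n, memlimit_needed(bdp, flows=n))
```

With a 50% payload fraction the two factors cancel at exactly 4 flows, where the needed truesize budget equals the plain BDP.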
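For anyone who wants to plug in their own rates and RTTs, a small sketch of the BDP arithmetic follows. The link rate is in bits per second, so getting a byte figure needs a division by 8; the helper name is made up for this example.

```python
def bdp_mib(rate_bps, rtt_ms):
    """Bandwidth-delay product in MiB for a rate in bit/s and an RTT in ms."""
    bits_in_flight = rate_bps * rtt_ms / 1000
    return bits_in_flight / 8 / 1024**2  # bits -> bytes -> MiB

print(round(bdp_mib(1e9, 4), 2))    # LAN case: 1 Gbps at 4 ms
print(round(bdp_mib(1e9, 100), 2))  # WAN case: 1 Gbps at 100 ms
```

This prints 0.48 and 11.92; without the division by 8 the same inputs give 3.81 and 95.37, which are megabits rather than megabytes.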
The fun thing with cake is that we can actually see the instantaneous size used... (I have only a ~105/36 Mbps link and even with the default 4 MB I see full saturation for the typical short RTTs of speedtest servers, but I note almost all speedtests by now default to using a few parallel flows)
Doing a single-flow speedtest (speedtest.net) against a server in Ashburn, VA, I see nothing getting close to the expected throughput (but I have little insight into my ISP's peering with those speedtest.net nodes in Ashburn...), and this is with cake configured with memlimit 32Mb.
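To watch the memory usage over time rather than eyeballing it, something like the following sketch could pull the numbers out of cake's statistics. It assumes the stats text contains a line of the form "memory used: X of Y" with tc-style size suffixes (b, Kb, Mb, Gb as powers of 1024); the sample text is made up and the exact format may differ between iproute2 versions.

```python
import re

# Assumed tc-style size suffixes, interpreted as powers of 1024.
UNITS = {"b": 1, "Kb": 1024, "Mb": 1024**2, "Gb": 1024**3}

def parse_size(text):
    """Convert a size such as '1312b' or '4Mb' into bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(b|Kb|Mb|Gb)", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(m.group(1)) * UNITS[m.group(2)])

def cake_memory(stats):
    """Return (used_bytes, limit_bytes), or None if no such line is found."""
    m = re.search(r"memory used:\s*(\S+)\s+of\s+(\S+)", stats)
    if m is None:
        return None
    return parse_size(m.group(1)), parse_size(m.group(2))

# Made-up sample in the assumed format.
sample = """ backlog 0b 0p requeues 0
 memory used: 1312b of 4Mb
 capacity estimate: 105Mbit
"""
print(cake_memory(sample))
```

In practice the text would come from `tc -s qdisc show dev <iface>`, sampled repeatedly during a transfer to see how close the queue gets to the limit.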
*) This is only a rough approximation, however. I do not want to dive into the kernel, but I would guess that SKBs are sized in powers of two, so for a 1514-byte ethernet packet we will need at least a 2K (2048 B) SKB. The expansion factor might then be more like 2048/1514 = 1.35, or the inverse 1514/2048 = 0.74, so from 4 MB
4 * 1514/2048 = 2.96 MB
would queue packet data, but 50% is easier to think about.
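The footnote's estimate as a quick sketch, for trying other packet sizes. It bakes in the same guess as above (buffers rounded up to the next power of two) and ignores any per-packet metadata overhead, so treat it as a rough upper bound on the payload share rather than what the kernel actually accounts.

```python
def skb_data_size(frame_bytes):
    """Smallest power of two that fits the frame (assumed allocation size)."""
    size = 1
    while size < frame_bytes:
        size *= 2
    return size

frame = 1514                            # full-size ethernet frame
alloc = skb_data_size(frame)            # 2048 under the power-of-two guess
payload_fraction = frame / alloc        # share of memory that is packet data
queued_mb = 4 * payload_fraction        # packet data that fits into 4 MB

print(alloc, round(payload_fraction, 3), round(queued_mb, 2))
```

Small packets fare much worse under this model: a 64-byte frame would fill its buffer completely, but once fixed per-packet overhead is added the payload share drops well below 50%.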