The bonding driver, when used in round-robin (balance-rr) mode, has roughly a 10% efficiency loss when bonding two links, and efficiency drops further with each link you add. The practical limit is about 4 to 5 interfaces, beyond which performance degrades enough to make the extra links not worthwhile.
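For reference, a minimal balance-rr setup with iproute2 looks something like the sketch below; the interface names (eth0, eth1) and the address are placeholders for your actual hardware:

```
# Create a bond in round-robin (balance-rr) mode; miimon polls link state every 100 ms
ip link add bond0 type bond mode balance-rr miimon 100

# Slave interfaces must be down before they can be enslaved
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0

# Bring the bond up and address it as a single logical interface
ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0
```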
The driver is fairly efficient because it runs in kernel mode, so there's no context switching to hurt performance. The packets are going through the TCP/IP stack anyway; the only difference is that the kernel round-robins them onto the output interfaces, so there is only a little additional overhead (performance analyses are available on the net).
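If you want to verify the kernel-side setup, the bonding driver exposes its state under /proc; this should report the mode as "load balancing (round-robin)" and list each slave interface:

```
cat /proc/net/bonding/bond0
```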
I benchmarked the efficiency on my setup at a 10% loss (i.e., combined bandwidth = 2× the per-link bandwidth, less 10%), and this included two VPN tunnels. The crypto was handled by hardware acceleration, so it probably accounted for a relatively small share of that 10% loss, although with faster links than I had, its share may well be higher. The 10% figure does, however, accord with other analyses I've seen.
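If you want to reproduce that kind of measurement, the usual approach is iperf3 across the bond (the address below is a placeholder). Several parallel streams give a more robust read of aggregate capacity, since the packet reordering that round-robin introduces can throttle a single TCP stream:

```
# On the receiver
iperf3 -s

# On the sender: 4 parallel streams for 30 seconds
iperf3 -c 192.168.1.10 -P 4 -t 30
```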
If your CPU and device can handle, say, a real-world throughput of 1 Gbps on one interface, you'll get a minimum of 1 Gbps minus 10% when bonding two interfaces, assuming the CPU is the bottleneck. If the CPU has additional headroom to process more packets (independent of the bonding), then you'll get correspondingly higher throughput.
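To put numbers on that: with two 1 Gbps links and the ~10% round-robin overhead, the best case is roughly 2 × 1 Gbps × 0.9 ≈ 1.8 Gbps aggregate, while in the CPU-bound case you'd still expect at least 1 Gbps × 0.9 = 900 Mbps.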
The point is valid, though, that a lot of consumer router devices, even though they have a 1 Gbps interface, cannot sustain much more than a few hundred Mbps of throughput. So you'll need capable router hardware.