How to avoid a routing loop?

This question is not really related to OpenWrt, but I plan to rebuild my network using devices running OpenWrt.

The current state of my network is poor. I have ~300 concurrent users on multiple VLANs. All VLANs are connected to a single gateway, so if the server room goes down (power outage, etc.), the whole network goes down. I would like to split the network into 3 smaller physical areas, with 1 OpenWrt router in each area, and implement routing and load balancing between them. I already have plenty of cabling connecting the areas to each other.

Assume that I have 3 routers, namely A, B and C. Each router has a link to each of the other two, with metrics of 2, 2 and 11 as shown below:

[Diagram: routers A, B and C connected in a triangle; link metrics A-B = 2, B-C = 2, A-C = 11]

Now assume that the link between Router B and Router C is broken. A user behind Router A sends a packet to another user behind Router C.

  1. Router A forwards the packet to Router B because that path has the lower metric (2 < 11).
  2. Router B has no choice but to forward the packet back to Router A, because its link to Router C is broken.
  3. The loop then continues indefinitely (until the packet's TTL expires), because Router A doesn't know that the B-to-C link is down.
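To make steps 1-3 concrete, here is a small Python simulation of the failure case. It is only a sketch under assumptions: the next-hop tables are hypothetical static entries (not taken from any real router), and the starting TTL of 8 is chosen just to keep the output short (real hosts use a larger initial TTL, e.g. 64 on Linux). The IP TTL is what eventually ends the loop: the packet is dropped, never delivered.

```python
# Hypothetical next-hop tables for the failure case in the question:
# destination -> next hop. Router A still prefers B for C (2 + 2 < 11),
# and with the B-C link down, B can only hand the packet back to A.
STATIC_ROUTES = {
    "A": {"C": "B"},
    "B": {"C": "A"},
}


def forward(src: str, dst: str, ttl: int = 8) -> list[str]:
    """Follow next hops from src; every hop consumes one unit of TTL."""
    path = [src]
    node = src
    while node != dst and ttl > 0:
        node = STATIC_ROUTES[node][dst]
        path.append(node)
        ttl -= 1
    return path


path = forward("A", "C")
print(" -> ".join(path))  # A -> B -> A -> B -> A -> B -> A -> B -> A
print("delivered" if path[-1] == "C" else "dropped: TTL expired")
```

Nothing in these tables changes when the B-C link fails, which is exactly the problem: something has to tell A about the failure.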

How do I solve this problem?

this is the domain of routing protocols...

which one you choose typically comes down to:

  • learning / healing time
  • link type / speeds (propagation / interoperability considerations)
  • network complexity

for such a topology... typically RIP or OSPF for wired links, or batman / the other one (name escapes me right now... OLSR?) for mesh
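To illustrate what the distance-vector family (RIP) does when a link fails, here is a minimal Python sketch of synchronous Bellman-Ford updates on the question's triangle after the B-C link breaks. Assumptions: the link costs are the question's metrics (A-B = 2, A-C = 11; real RIP counts hops instead), the starting tables are hypothetical "stale" tables from before the failure, and the split horizon, poison reverse and triggered updates that real RIP uses are deliberately left out. For the first three rounds A and B still point at each other for C; after that the tables settle on the direct A-C link.

```python
import math

INF = math.inf

# Link costs after the B-C link has failed.
LINKS_AFTER_FAILURE = {("A", "B"): 2, ("A", "C"): 11}

# Hypothetical converged tables from *before* the failure:
# destination -> (cost, next_hop).
STALE_TABLES = {
    "A": {"A": (0, "A"), "B": (2, "B"), "C": (4, "B")},
    "B": {"B": (0, "B"), "A": (2, "A"), "C": (2, "C")},
    "C": {"C": (0, "C"), "B": (2, "B"), "A": (4, "B")},
}


def neighbours(links, node):
    """Yield (neighbour, cost) for every link attached to node."""
    for (u, v), cost in links.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def distance_vector(links, tables):
    """Synchronous Bellman-Ford rounds until no table changes.

    Each round, every router rebuilds its table only from its
    neighbours' tables of the previous round (no split horizon).
    """
    rounds = 0
    while True:
        new_tables = {}
        for node in tables:
            best = {node: (0, node)}
            for nb, link_cost in neighbours(links, node):
                for dst, (cost, _) in tables[nb].items():
                    if dst == node:
                        continue
                    total = link_cost + cost
                    if total < best.get(dst, (INF, None))[0]:
                        best[dst] = (total, nb)
            new_tables[node] = best
        rounds += 1
        if new_tables == tables:
            return new_tables, rounds
        tables = new_tables


final, rounds = distance_vector(LINKS_AFTER_FAILURE, STALE_TABLES)
print("rounds:", rounds)            # rounds: 6
print("A -> C:", final["A"]["C"])   # A -> C: (11, 'C')
print("B -> C:", final["B"]["C"])   # B -> C: (13, 'A')
```

The temporary A/B loop in the early rounds is the weakness that RIP's split horizon and poison reverse are meant to reduce.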

for a user base of this size... you really need to reduce single points of failure and look at link-state routing protocols, or CARP/failover-style link resilience... and 'mesh' wiring (multiple redundant links)... which you have with 3 routers (each one links to both others), but with 4 that may no longer be the case...


You need a dynamic routing protocol. Check out the quagga package and study OSPF.
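With a link-state protocol such as OSPF, each router floods advertisements describing its own links, so every router ends up with the same topology map and computes its routes locally with Dijkstra's shortest-path-first (SPF) algorithm. When the B-C link goes down, B and C advertise the change and A simply reruns SPF. The Python sketch below covers only that last step, under simplifying assumptions (no flooding, areas or timers; the link dictionaries are hypothetical); it is not how Quagga's ospfd is implemented.

```python
import heapq

LINKS_BEFORE = {("A", "B"): 2, ("B", "C"): 2, ("A", "C"): 11}
LINKS_AFTER = {("A", "B"): 2, ("A", "C"): 11}  # B-C link removed


def spf(links, root):
    """Dijkstra shortest-path-first from root.

    Returns {destination: (cost, first_hop)}, where first_hop is the
    neighbour of root that packets for that destination are sent to.
    """
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    result = {}
    heap = [(0, root, root)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in result:
            continue  # already settled with a lower (or equal) cost
        result[node] = (cost, first_hop)
        for nb, link_cost in graph.get(node, []):
            if nb not in result:
                # Leaving the root, the first hop is the neighbour itself.
                hop = nb if node == root else first_hop
                heapq.heappush(heap, (cost + link_cost, nb, hop))
    return result


for name, links in (("before", LINKS_BEFORE), ("after", LINKS_AFTER)):
    print(name, "A -> C:", spf(links, "A")["C"])
# before A -> C: (4, 'B')
# after A -> C: (11, 'C')
```

Because every router computes from the same map, A and B cannot disagree about the failed link for long, which is why a link-state protocol converges without the A/B ping-pong from the question.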


Thanks for the direction. I am pretty good at learning new stuff. However, a detailed article on the subject would be handy. Does the OpenWrt User Guide have anything on this topic?

@anon50098793 I am limited neither by the number of routers nor by money. The only thing limiting me is knowing how to do it. :frowning_face:

i'd recommend reading a few Cisco documents (start with RIP and OSPF)... even if you never use actual Cisco IOS...

there are plenty of docs available, and the config is quite clean and straightforward...

from there... picking up quagga or any other OpenWrt guide is pretty straightforward...

the challenges in a LAN environment (especially a smaller one) are:

  • choosing and planning around service resilience vs. redundancy (where do you put your servers, etc.; DHCP is a good example)
  • switching to a routed mentality from a client perspective...

get set up with quagga, read a guide or two, start with the simplest topologies possible... then come back and post your configs if you run into issues...

There is plenty of documentation on the internet about OSPF. Quagga uses a CLI very similar to Cisco IOS, so if you know the latter's commands, Quagga is very easy to configure.
That being said, there is no OpenWrt-specific guide for Quagga, since it uses its standard configuration files, or it can be configured from the built-in vtysh.

Even a larger installation, think university campus with thousands of concurrent users, would typically try to avoid having to deal with multiple gateways to the outside and would rather improve the failsafe/failover (but not load-balancing) aspects first (UPS, generator, etc.).

Can you tell me more about the pros and cons of such a design? And is it a good idea if only 1 out of 3 routers is the gateway to the outside, while the other 2 simply do internal routing?

Any network design with multiple uplinks adds a lot of complexity - and new hard-to-debug failure cases. There are reasons to do this, but if you do, you do need to go full enterprise - and in a lot of cases this is mostly painful, with little reward.

--
I do distinguish between failover (good, and necessary beyond a certain number of concurrent users, or rather beyond a certain cost of an outage) and load-balancing or 'mesh-like' features here, which have their place, but only if you really understand the consequences and have prepared the network to mitigate them.