Some weeks ago, I ran into serious problems trying to activate a DHCP server (not masking!) as standalone ("authoritative") on a dumb WAP, to no avail.
Today I returned afresh and found the following explanation in the settings:
"Lowest leased address as offset from the network address."
Wow! If I read this correctly, I'd know why it might not have worked. -> For the last 25 years, and in all my various networks, the first WAP was ...200. DHCP range has been 101-199; with system devices below 100, and fixed IPs (Printer, Scanner) above 200.
The DHCP server in OpenWRT is the first one that I ever seem to have encountered, where the DHCP address range is an offset above the network address of the OpenWRT device.
Can someone please confirm this? So, setting "21" "30" would mean that the server dishes out addresses in the range ...221 to ...250; correct? (192.168.3.200 -> 200 plus offset 21)
And there is no way to dish out 101-199, like I used throughout my professional life? Or would I have to enter a negative offset?
No, the offset is applied to the base network address.
For example, if the subnet in question is 192.168.1.0/24, the offset is added to 0. Therefore, a start value of 100 (default) puts the first address in the pool at 192.168.1.100. The limit is the size of the pool, so a limit of 150 (default) means that there are 150 addresses. Therefore, the range would be 100-249.
The address of the host itself is not relevant, although it must be outside the total dhcp pool range.
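To make the arithmetic concrete, here is a minimal Python sketch of how start and limit map to an address range. The function name `dhcp_pool` is made up for illustration, and raising an error when the pool does not fit into the subnet is a choice of this sketch, not necessarily what OpenWrt does.

```python
import ipaddress


def dhcp_pool(interface_cidr, start, limit):
    """Return (first, last) address of a pool given as offset and size.

    interface_cidr: address of the device with prefix, e.g. "192.168.3.200/24".
    start:          offset from the network address (NOT from the device address).
    limit:          number of addresses in the pool.
    """
    network = ipaddress.ip_interface(interface_cidr).network
    first = network.network_address + start
    last = first + (limit - 1)
    # Assumption of this sketch: reject pools that reach the broadcast address.
    if last >= network.broadcast_address:
        raise ValueError("pool does not fit into " + str(network))
    return first, last


# Defaults from the explanation above: start 100, limit 150.
default_pool = dhcp_pool("192.168.1.1/24", 100, 150)
# The settings from this thread: device at .200, start 101, limit 99.
thread_pool = dhcp_pool("192.168.3.200/24", 101, 99)
# The "21" / "30" question: the offset counts from .0, not from .200.
question_pool = dhcp_pool("192.168.3.200/24", 21, 30)

for name, (first, last) in [("default", default_pool),
                            ("thread", thread_pool),
                            ("question", question_pool)]:
    print(name, first, "-", last)
```

So 101-199 is reachable with start 101 and limit 99, and no negative offset is needed; the device address .200 plays no part in the calculation.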
Okay, sounds reasonable. Then I still don't understand why it doesn't dish out any single IP address to any client, unfortunately.
It is the AP between all clients and the router (with a DHCP that currently works; but less than perfect: the MAC address reservation doesn't work properly). So all DHCP requests pass the OpenWRT device, it is not disabled, and the parameters are set to "start 101" and "limit 99".
The very moment I turn off the DHCP in that router, all clients get disconnected and never reconnect until I restart the DHCP in that router.
How could I debug this further, then? The syslog never shows any DHCP request. It does show the handshakes for authentications.
Do you have another DHCP server active on the subnet?
If there is another DHCP server active on the network (typically the main router), the OpenWrt one will not be running.
Until the DHCP leases expire (or a client needs to renew, such as when it wakes up from sleep, reconnects to the network, etc.), clients will not be disconnected when the DHCP server is disabled.
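To illustrate the timing: per RFC 2131, a client by default first tries to renew at half the lease time (T1) and switches to broadcasting for any server at 87.5% of it (T2). A rough sketch, assuming a 12-hour lease (an assumption; check what your router actually hands out) and a made-up helper name `lease_timeline`:

```python
from datetime import datetime, timedelta


def lease_timeline(obtained, lease_seconds):
    """Return the default RFC 2131 milestones for a lease obtained at `obtained`."""
    lease = timedelta(seconds=lease_seconds)
    return {
        "renew (T1)": obtained + lease * 0.5,     # unicast renewal to the leasing server
        "rebind (T2)": obtained + lease * 0.875,  # broadcast to any DHCP server
        "expires": obtained + lease,              # client must stop using the address
    }


# Hypothetical example: lease obtained at 08:00 with a 12 h lease time.
timeline = lease_timeline(datetime(2024, 1, 1, 8, 0), 12 * 3600)
for label, moment in timeline.items():
    print(label, moment.strftime("%H:%M"))
```

A client that already holds a lease will therefore keep working for hours after the server is switched off; only clients that (re)join the network need a server right away.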
Let's first look at your configuration.
Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:
(Been there, done this some weeks ago. I wouldn't mind doing it again; but let's get clear about the philosophy first, for understanding:)
There is an option "authoritative" and the documentation recommends ticking this, if it is the only server on the network. It is written, this would speed up the process.
It says "caching DHCP". W.r.t. reliability: though it never makes sense to have more than one DHCP server online, wouldn't one expect the one on OpenWRT to take over when the one on the router is shut off? I tried this a number of times (of course disconnecting only a SINGLE client, in order to still have access to that router, and reconnecting it). I'd expect that client to try to obtain a DHCP lease from the only available server: the one on OpenWRT. But to no avail. It keeps rotating and rotating until I restart the DHCP on that router.
The default state has this disabled such that it will not clash with another DHCP server on the network.
The speed up by enabling authoritative mode is that it will not run a check to see if there is another DHCP server, so it will start up a bit faster. That is normally not a big deal though -- we're talking just a few seconds when the system starts.
There's actually a lot to unpack here.... but simply stated
In a typical home environment, the DHCP server exists on the main router -- if it goes down, there are usually bigger issues than just the DHCP server. It is a single point of failure, but from a practical sense, it's usually the one that matters the most and you'll rarely see fail over configurations with multiple routers in high-availability mode in a home setting (this is complex).
A single DHCP server can always be run on an outboard device, of course. But, it must be configured properly. This actually adds a second single-point-of-failure, so you could have two independent failure points (the router and the outboard DHCP server -- either could fail and would cause network problems).
Redundant/caching DHCP servers are complex to set up and typically only found in business/enterprise environments. This is very different from simply having 2 or more DHCP servers on a network -- they need to be configured so that they share the DHCP lease table(s) and load balance or fail over based on the desired goals.
Simply stated, redundant/caching DHCP is way more complex than any normal home network will ever need.
So, unless you've got a lot of time and energy to spend learning the complexities of this type of redundant DHCP server setup, no, one server will not be able to 'take over' for another. And, as stated above, if it is the main router that goes down, the lack of a DHCP server is rarely the biggest concern.
Unfortunately, it missed out on one thing: the discrepancy between the documentation and explanation of the option "authoritative": "This is the only DHCP server in the local network". In the light of your explanation, this is illogical, since OpenWRT would never start its own server (like what I seem to have noticed in my setup). But then, the option "authoritative" makes no sense; does it? Especially in conjunction with the 'faster DHCP setup if ticked'.
I had activated the "authoritative" option; so OpenWRT ought to have issued DHCP offers while the DHCP in the router was shut down, didn't it?
While I fully agree about the enormous difficulties of a complete failover setup for DHCP, it would help connectivity considerably if a repeated discovery did not go unanswered. DHCP in particular is a protocol that repeats its requests with increasing intervals in between.
Basic connectivity would come back if a repeatedly issued unsuccessful request would be met by a second server kicking in.
Back to the practical side here: OpenWRT doesn't 'kick in'; even after closing down the other DHCP server in the router, and a discovery request not resulting in any offer.
What would be the correct method of starting up the DHCP server on OpenWRT so that it actually dishes out IP addresses, aside from closing down the other server on the fibre modem? For the last months, OpenWRT hasn't dished out a single address here, with or without another server on the network.
(Edit:) Just saw another discrepancy: There is an option "Force DHCP on this network even if another server is detected" under 'Interfaces'.
It looks like the documentation, the description here, and the settings options diverge w.r.t. this topic, don't they?
I also saw the option "Read /etc/ethers to configure the DHCP server." Looking at this file, it is basically empty. The option is ticked in LuCI.
Could this be part of the problem?
There's no discrepancy here... The default DHCP configuration with authoritative disabled does not mean that there is another DHCP server on the network, it simply prevents a clash by performing a test to see if another one exists before starting the server.
I'm not sure why this is illogical. The process is as follows:
If authoritative is disabled, check to see if there is an existing DHCP server on the network.
Determine if there is another DHCP server on the network by requesting a DHCP lease (using the standard DORA process) and then listening for a response.
If a DHCP server responds to the request, it means there is a DHCP server already active on the network.
Result: do not start the DHCP server (on OpenWrt).
If there is no response to the DHCP discovery request, we can assume that there is no other DHCP server on the network.
Result: start the DHCP server -- we are the only one.
If authoritative is enabled, we assert that we are the only server running.
Do not test to see if there is another DHCP server running on the network (this saves some time during the bring-up process).
Start the server.
If there is another DHCP server on the network, we will clash or we will hope that the other one shuts down; but we are asserting that we are the one that should be running.
Hopefully the above logic helps explain it.
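The same decision flow as a small Python sketch. This only models the logic as described in this post; it is not OpenWrt's actual startup code, and the function and parameter names are made up for illustration.

```python
def should_start_dhcp_server(authoritative, probe_for_other_server):
    """Decide whether the local DHCP server should start.

    authoritative:          True if we assert that we are the only server.
    probe_for_other_server: callable that sends a discovery and returns True
                            if some other DHCP server answered with an offer.
    """
    if authoritative:
        # No probe at all: start immediately (this is the time saved at boot).
        return True
    # Not authoritative: start only if nobody answered the discovery.
    return not probe_for_other_server()


def other_server_answers():
    return True


def nobody_answers():
    return False


print(should_start_dhcp_server(False, other_server_answers))  # other server found
print(should_start_dhcp_server(False, nobody_answers))        # we are alone
print(should_start_dhcp_server(True, other_server_answers))   # probe is skipped
```

Note that in this model the probe happens once, at startup. Nothing in it re-checks later, which is why a server that stood down at boot does not 'kick in' when the other one disappears afterwards.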
Yes, but only if there were clients that needed to obtain a new/renewed lease. This depends on the DHCP lease time and how long each client has remaining on the lease, and also other events such as (re-)joining the network or interface bounces.
Define basic connectivity? And at what cost are you willing to restore such connectivity?
If the main router is down, anything physically connected to it would likely not be online, and obviously the internet would not be available.
If a 'backup' DHCP server starts issuing leases to any machines that have connectivity despite the main router being down, they still won't have access to the internet (as described above). But critically, the main router's DHCP server, when it comes back up, will not be aware of the DHCP leases that were issued by the 'backup' server. This means that the main router might issue leases for the same addresses as those that the 'backup' server had already given out. This would knock both of the DHCP client devices offline because they'd have an address conflict. Or, if a device needs a renewal and reaches the opposite server... this will result in NACKs (not acknowledged) when the client requests a specific address be renewed.
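A toy simulation of that failure mode, with two servers that serve the same pool but do not share a lease table. Everything here is hypothetical (class name, MAC addresses, pool), and real servers may probe an address before offering it, which this model ignores.

```python
import ipaddress


class ToyDhcpServer:
    """Very reduced DHCP server: a pool of free addresses plus a private lease table."""

    def __init__(self, name, first, count):
        self.name = name
        self.free = [ipaddress.ip_address(first) + i for i in range(count)]
        self.leases = {}  # client MAC -> leased address

    def offer(self, mac):
        """Hand out the known lease for this client, or the next free address."""
        if mac in self.leases:
            return self.leases[mac]
        address = self.free.pop(0)
        self.leases[mac] = address
        return address

    def renew(self, mac, address):
        """ACK only if this server itself leased that address to that client."""
        return "ACK" if self.leases.get(mac) == address else "NAK"


# Both servers are configured with the same pool but know nothing of each other.
main = ToyDhcpServer("main router", "192.168.1.100", 150)
backup = ToyDhcpServer("backup", "192.168.1.100", 150)

laptop_ip = main.offer("aa:aa:aa:aa:aa:01")    # main router is up
phone_ip = backup.offer("aa:aa:aa:aa:aa:02")   # main router down, backup answers

print("laptop:", laptop_ip, "phone:", phone_ip)  # same address twice -> conflict
# Main router is back; the phone tries to renew with the 'wrong' server.
phone_renewal = main.renew("aa:aa:aa:aa:aa:02", phone_ip)
print("phone renewal at main router:", phone_renewal)
```

Giving the two servers non-overlapping pools would avoid the duplicate address, but not the NAKs on renewal, since each server still only knows its own leases.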
In short, if you enable 2 DHCP servers on a single network, you are very likely to experience problems at some point. Therefore, it is highly recommended that you stick with a single DHCP server. Do not overcomplicate things.
... on this, earlier I noted that if the main router is down, you don't have internet connectivity, so "basic connectivity" would purely be related to the local network. So... with that in mind, another thing I should mention...
If your local connectivity (i.e. 2 or more devices within your LAN that must be able to communicate at all times) is critical and must not be interrupted if the DHCP server (and internet) is down, you should assign those devices static IP addresses (on the host itself in its network configuration) rather than using DHCP. This, of course, assumes that the physical connection is still alive (WiFi and/or Ethernet).
There is no correct way to do this. The only acceptable method is to disable one of the servers and use the other. Period. (I can state with reasonable certainty that your fiber modem/ONT + router device does not have the ability to be configured to properly operate in a multi-DHCP server environment where the servers are redundant/caching and forming a cooperative/synchronized lease table -- this is a very advanced and niche feature).
I do want to note a change regarding DHCP that came with the introduction of IPv6.
This even includes problems when not using the lowest value of .1 for OpenWrt.
This causes problems with the allocation of IPv6 addresses,
as you equally have to change the IPv6 allocation, since it defaults to .1 as well.
So things did change a little.
If you change DHCP for "ipv4", don't forget to match the changes in odhcpd for "ipv6".
If we are talking about a typical home environment, including home office and smaller forms of SOHO environments, it is generally acceptable that the internet (including internal network connectivity, as most 'normal' use cases these days require internet access anyway) may go down briefly, as long as you can get it up again within a 'reasonable' time and with reasonable effort. Usually a second preconfigured router as a cold spare, with clearly marked ports/cables, is sufficient for these needs. The incident procedure would be:
notice 'a problem with the internet'
try to access the router
if that works, your internal network is likely fine (and you 'only' have an issue with your uplink)
phone your ISP and open a ticket
if it doesn't work, there is an issue between your PC and the router
bad cabling, which happens very often, be it
'just' unplugged (the clasp on RJ45 connectors likes to break off), debris in the ports, bent pins, or a broken cable
If you are confident that your router is the culprit, just replace it with your cold spare (3 cables to switch over, power, WAN, LAN), this shouldn't take more than 5 minutes (the debugging steps beforehand will take longer (at least 20-30 minutes), but aren't optional).
Where exactly the cutover is between SOHO and (formal) high-availability requirements depends a lot on your specific circumstances.
Very personally, I'd suggest mwan failover (a single router, still a single point of failure, but with at least two independent ISP technologies, e.g. fibre, VDSL, cable and 4G/5G) as soon as you earn money with your internet connection (as in, you will lose money and contracts if you aren't reachable during office hours), in combination with a cold standby as a preconfigured 1:1 replacement. In more private use cases I'd still recommend a cold standby, but there you can stretch the 'cold' aspect quite a bit: an old router that doesn't reach full speed suffices, and it might not be fully configured, as long as you can bring it up within ~10 minutes.
For me, the real hard-HA requirements start somewhere around 20-50 employees behind your router, depending on how critical the downtime really is - and how likely it ends up that one of them really can't be expected to switch over 3 cables.
Obviously there are exceptions in either direction, if lives are on the line (medical devices, elevated security/surveillance requirements, high-frequency trading or similar, or machinery that will cause huge losses in case of any interruption), but here we move quite far from SOHO needs.
While I'm not necessarily advocating for it[0], mwan3 may be useful - the above is meant to cover enterprise'y high availability needs beyond mwan3 (the real HA stuff):
HA failover between 2+ routers (~= hot-standby)
backup DHCPd
redundant power supplies for router, switches, etc.
UPS (might be necessary in some regions with unstable electricity networks)
If you want to make maintaining this your mission, all the more power to you - but in a normal household (including SOHO environments), the costs (electricity, devices, initial setup and constant maintenance) quickly exceed their value - and you easily introduce new hidden failure modes that are more likely to hit at home than a hardware failure of your router (unless you reeeaaalllllllyyyyyy go all in on enterprise). It's a numbers game: what is the financial loss, in cold hard cash or human life, of 10-30 minutes of downtime - and what else do you pay for in front of your router to avoid that (redundant ISP connections, SLAs, etc.)? In practice you are likely hit more often by ISP failure/downtime than by your own hardware failing (competent maintenance assumed, i.e. not negligently making breaking configuration changes during office hours).
--
[0] at home, just plugging cables from e.g. ftth modem to 4g modem(router) may be a valid strategy, if it's not expected to be a regular occurrence and not an immediate financial loss, just to keep closer tabs on the situation (really notice downtime and avoid going over quota for the metered connection). This is not a recommendation against mwan3, more a consideration if the immediate -automatic- and potentially unnoticed failover is desired/ necessary.
If you are willing to pay for a second WAN uplink of comparable (sufficient) speed and for the costs of using it full time (e.g. if mwan3 switches to the backup unnoticed; have non-internet-based WAN monitoring and notifications set up; unmetered traffic is preferable), there's no reason not to use it. But in a home environment, without a direct business case to cover the cost, there may be reasons for more ad-hoc failure control (e.g. tethering your smartphone by USB if, and only when, necessary, which also reduces traffic usage in the backup condition).
@udippel In case you want to nerd out and understand the behavior and deployment of DHCP, I highly recommend studying the RFC and its updates: https://www.rfc-editor.org/rfc/rfc2131
We're discussing this item back and forth, but I don't seem to get through. Maybe I added too many side paths, sorry.
The core is and remains, that OpenWRT does NOT dish out any address; and it does not dish out any address when it is the only DHCP server on the network, and topology-wise ALL traffic has to pass it. And there is no filtering (firewall) active.
How do I solve this? I had asked some two weeks ago and had submitted all the required settings, but no solution was found. The only useful thing had been the "Dynamic DHCP" option that has to be activated. I did that, but still no address.
Nobody wants "to nerd out" when he expects a DHCP address from a sole server on the network. An IP address should rather come out of the box. Here it doesn't.
The talk about fail over was on the side, since nobody seemed to be able or willing to explain options that according to @psherman wouldn't work anyways ("authoritative", "Force DHCP on this network even if another server is detected.").
Ok then. Then just continue to dream and imagine, and avoid the hard work of actually understanding how DHCP is intended to work, and avoid reading and understanding how DHCP is implemented by dnsmasq and, for instance, by the de facto reference implementation isc-dhcp and its follow-up implementation Kea.
And continue to ignore the constraints and the chosen use case of the limited devices that OpenWrt usually runs on.
And btw yes, of course all the core devs of OpenWrt are incompetent and have no clue what they are doing.
Nobody talks about incompetence. Things would just be easier if my various questions were answered directly, instead of lecturing about something that is not on my mind. I thank @Psherman for the extra explanation of the difficulties of failover, something I learned in this thread.
Alas, the rest of my questions were simply not answered. Do I have to repeat them? I don't think so. Someone in the know could quickly answer them, about the options as well as further debugging why a single DHCP server does not dish out addresses.
"RTFM" or "Read the respective RFC" don't help. Anybody can write that. Even me myself.