DNS - Stumped Again

Much better than the earlier description.

Now is it correct to say that the issue is occurring with the 1900acs device?

When you made a change and said that it was working fine without the 1900acs, how were things connected?


The diagram shows what has worked - well, forever till recently.

To gain access, I simply remove 1900ACS wan cable at the ISP modem and replace it with SW1 cable - 1900ACS/OpenWrt gone!

I then just reconnect devices to ISP connection on 192.168.x.x that provides DHCP addresses for everyone - ie I am currently on 100.114.15.120 atm.

So this means that the ISP modem must be operating as a router?

And the ISP modem (router) is providing addresses in the 192.168.x.0/24 subnet?

Is that correct?

You are better placed than I am to discern that, but yes. If you want to add additional devices, "you need to rent a router from us."

Yes. CGNAT.

This is contradictory... so I'm still confused.
There are 2 ways this can work (unless you get multiple IP addresses from your ISP):

  1. a modem only device operates essentially transparently, converting signals from one medium to another, but just passing them through and presenting the next device with the ISP provided IP address. For example, a DSL modem or a cable modem-only device just converts the signal into standard ethernet (typically with copper/RJ45 connections). Most residential connections provide the customer with a single IP address from the ISP. That means only one device can be 'directly' online at any time. If you connect a router, that router will enable you to share your connection with (and protect from the unfiltered internet connection) multiple devices.

  2. a modem+router combo unit physically combines the features of the two devices (as the description implies). Some such devices can also be configured in modem-only/bridge mode and will pass the ISP connection directly to the next device, basically reducing it to the same idea as #1 above.

Home routers are usually configured (and should be configured) to use RFC1918 address schemes (i.e. anything within the ranges of: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).

If the modem is providing addresses to your downstream devices in the 192.168.x.x/24 network range, the modem is a modem+router unit and it is probably not CGNAT to your devices -- it is just regular RFC1918 NAT.

CGNAT technically uses this range: 100.64.0.0/10. Typically speaking, CGNAT will be visible as the WAN address of your main router (if your ISP is using it, of course).
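To make the ranges above concrete, here is a minimal sketch that sorts an IPv4 address into one of those blocks using Python's standard `ipaddress` module. The function name `classify` and the labels it returns are just illustrative choices, and "other" only means "neither RFC1918 nor CGNAT", not necessarily a public address.

```python
import ipaddress

# Address blocks named in the discussion above.
RFC1918_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")


def classify(addr: str) -> str:
    """Return 'cgnat', 'rfc1918', or 'other' for an IPv4 address string.

    'classify' is a hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_NET:
        return "cgnat"
    if any(ip in net for net in RFC1918_NETS):
        return "rfc1918"
    return "other"  # neither block; not necessarily publicly routable


if __name__ == "__main__":
    for a in ("100.114.15.120", "192.168.1.10", "172.32.0.1"):
        print(a, classify(a))
```

Running it against the address a device actually receives in each cabling arrangement should show whether that device is being handed a CGNAT address or a regular private one.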

So maybe you can clarify exactly what addresses you are seeing under each circumstance.

ifconfig on this connection direct to ISP device returns:
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.114.15.120  netmask 255.255.128.0  broadcast 100.114.127.255

So that is CGNAT.
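As a quick cross-check of that ifconfig output, the address and netmask can be run through Python's standard `ipaddress` module. This is only a sketch; `describe` is a made-up helper name, and the values passed in are the ones quoted in the post above.

```python
import ipaddress

CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")


def describe(addr: str, netmask: str) -> dict:
    """Derive prefix length, network and broadcast from addr + netmask.

    'describe' is a hypothetical helper for illustration only.
    """
    iface = ipaddress.ip_interface(f"{addr}/{netmask}")
    net = iface.network
    return {
        "prefixlen": net.prefixlen,
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "in_cgnat": iface.ip in CGNAT_NET,
    }


if __name__ == "__main__":
    # Values taken from the ifconfig output quoted above.
    print(describe("100.114.15.120", "255.255.128.0"))
```

The derived broadcast address agrees with the one ifconfig reported, and the address falls inside 100.64.0.0/10.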

What device were you using when you ran ifconfig? It sounds like it is directly connected to the ISP modem? Is this the only device connected/online?

My brain is addled - you are correct and obviously confused. I'm missing the forest for the trees.

My main linux box. It's the only thing left on the switch other than the trunk to modem.

Ok... so that means you need a router in the system if you want to connect multiple devices.

There are 2 remaining questions:

  1. Is this a problem with the ISP's DHCP servers? One way to answer this is to keep your linux box connected and online (don't turn it off, don't let the system sleep) for the duration of the DHCP lease (or maybe several lease durations) and see if it is successfully able to renew the lease. This would help determine if the ISP is at issue because you would then have info to say that it is not the router, but that it actually happens on other connected devices, too.

  2. Is there an issue with your router and/or some incompatibility between your router and the DHCP servers at the ISP that is causing the DHCP lease renewal to fail?

With respect to your router configuration -- I would recommend that you install an official release (stable) image and only configure the bare minimum that is necessary for your network to function (i.e. change the LAN IP address if you want, enable wifi, etc., but don't install any new packages or change any other configurations other than the very basics). OpenWrt is quite stable in general, but things can go wrong when running master/snapshots and/or doing advanced configuration or package installations. If you still run into issues with this most basic configuration, it will be more likely that the issue is with the ISP than anything else you've done or OpenWrt in general (not ruling out bugs or some incompatibility between your installation and the ISP, but it just becomes lower probability).

That has been my problem - I call ISP, they say connect direct to modem using DHCP, it works, they say no problem. I don't think so!

Interestingly, by way of note, I have found lease renewals lagging of late: the renewal is not granted right away, even at the last chance. I see the timer count down from 14400 - 7200 - 3500 . . . to the last chance to renew, and then at times it just sits there, and seconds/minutes later I'm back - along with race condition issues due to time.
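For reference, a rough sketch of the timing behind a countdown like that. RFC 2131 sets the default renewal timer T1 at 50% of the lease and the rebinding timer T2 at 87.5%. The halving list is only a simplified model of the countdown described in the post; real DHCP clients differ in the details, and both function names here are made up for illustration.

```python
from typing import List, Tuple


def dhcp_timers(lease_seconds: int) -> Tuple[int, int]:
    """Default RFC 2131 timers: T1 at 50% of the lease, T2 at 87.5%."""
    t1 = lease_seconds // 2
    t2 = lease_seconds * 7 // 8
    return t1, t2


def halving_countdown(lease_seconds: int, floor: int = 60) -> List[int]:
    """Simplified model: keep halving the remaining time down to a floor.

    This is an assumption-based illustration, not the exact schedule
    of any particular DHCP client.
    """
    remaining = lease_seconds
    steps = []
    while remaining >= floor:
        steps.append(remaining)
        remaining //= 2
    return steps


if __name__ == "__main__":
    print(dhcp_timers(14400))
    print(halving_countdown(14400))
```

If the server only answers near the end of that sequence, the client spends most of the lease retrying, which would look like the lag described.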

Is this true on your Linux box as well as openwrt? Or only openwrt?

It's only on OpenWrt as far as I've observed. I lose my VPN to timeouts when it happens and have to restart it.

OpenWrt absolutely follows the script - half life/acquire - half life/acquire . . ., the ISP takes its time, right down to the Oh,Oh better give a new lease - wait a minute.

Because you believe that openwrt is the issue, take my earlier suggestion and run a (near) default configuration and see if the problem recurs.


I've pared down to lan/wan at this point - henet, vpn, WG, Guest, IOT - gone.

I think it goes to the ISP - they've changed something.

I'll try a stable release per your suggestion, but no, I really don't think OpenWrt is an issue. I've used the same .config other than kernel update patches without change for 3-4 months with nary an issue. They've all been plug 'n' play til the next build. This one stopped working after 23 days of Uptime - no changes - doesn't compute!

Thanks folks, I've appreciated all of your thoughts/suggestions. I'll advise if I can wrangle this one out eventually.

When you are using snapshot/master, things can break unexpectedly. A complex configuration could affect things, too (if you have a mistake or even just an interaction). But your ISP could also be part of the issue. It is really important to approach these types of issues with methodical analysis and experimentation. Isolating/eliminating variables is critical. And communicating precisely is also important if you are asking for help.

Let us know what happens with the simplified config.

I get that - I managed a help desk for an ISP at one time, so I know that challenge.

I, unfortunately, have never had that gift. I have a bad habit of 'self assurance' as in 'WTF? - this has been stable for weeks? - I haven't touched it' and then run on about - well . . . everything except 'communicating precisely'.

I'll put 19.x latest stable on and let you know the outcome.

I'm putting my money here at this point. I already know they are over leveraged. I get a call a week to move from sat to 3G wifi. (Despite the fact that there is no service here).

Just to update - I’m pretty sure I have tracked this down to a combination of the extended signal outage and ntpd. Without an RTC, the router's clock was behind when WAN got its lease; ntpd then updates the time (Edit: ntpd crashes) and WAN never issues a renew. Tried this:
(Edit: Flash original sysupgrade.bin/n)

ntpdate pool.ntp.org
touch banner
reboot

Restored my backup and working fine the last 24+ hrs. ntpd working - no crashes.

Did you turn off the router while it had no internet connection, so that ntpd could not update the time?

3h isn’t a long time, and the clock should just keep going as long as the router stays on. Once synced, ntpd only checks the time every 32 min (max interval) anyway. When you first install OpenWrt, the clock starts at the date and time of the build, but the network doesn’t stop working because of that, and after a reboot the router will always ask the DHCP server for an address.
And the lease time is a seconds counter, not an absolute time.
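To illustrate the counter-versus-absolute-time point, here is a small sketch. It assumes a lease deadline tracked against whatever clock you hand it: with a monotonic clock, a wall-clock step (for example when ntpd first syncs) does not move the deadline, while a deadline computed from wall-clock time would appear to expire at once. `LeaseTimer` is a made-up class for illustration and is not how any particular DHCP client is implemented.

```python
import time


class LeaseTimer:
    """Track a lease as 'seconds from now' on a given clock (hypothetical)."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + lease_seconds

    def remaining(self):
        """Seconds left on the lease, never negative."""
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() == 0.0


if __name__ == "__main__":
    # Two fake clocks: one monotonic, one wall clock that gets stepped.
    mono = [100.0]
    wall = [1000.0]
    by_counter = LeaseTimer(14400, clock=lambda: mono[0])
    by_wall = LeaseTimer(14400, clock=lambda: wall[0])

    mono[0] += 1.0      # one second really passes
    wall[0] += 1.0e8    # wall clock jumps forward when the time is synced

    print("counter-based remaining:", by_counter.remaining())
    print("wall-clock-based expired:", by_wall.expired())
```

A wall-clock jump in the other direction would make the wall-clock-based timer overestimate the remaining lease instead.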

Since the clock doesn’t even work right on the EdgeRouter, I must say it can’t be that important that the time is correct, at least not correct relative to the real world…
It only seems to mess up the timestamps in the log data, but everything else works.

Glad things are working again.

As I had suggested 4 days ago (in response #6), you need to debug the basic connectivity first. You spent a lot of time spinning your wheels chasing red herrings and probably added a lot of frustration to your days of troubleshooting. In the future, remember to test the fundamentals and don't make assumptions about the root cause until you have performed methodical testing and isolated the variables.
