Caching for offline access

Does anyone know if OpenWrt could function in an educational setting where a classroom needs to browse the web when the internet is available, but if it is not available, browse whatever cached pages we have stored on the box?

So we would need to cache pages in real time while the user can reach the web. This is for intermittent WAN connections, which in this scenario happen a lot.

This is a real world need and I was wondering if anyone had done it. Thank you for your help.

The short answer would be "no". Thanks to HTTPS everywhere and dynamically generated content, proxies have lost most of their usefulness.

5 Likes

If general purpose file sharing is all that is necessary (say PDF files, etc.), it would certainly be possible to use OpenWrt as a file sharing system. It's not ideal for this purpose, but it could certainly work.

1 Like

Thanks. I would have thought someone, somewhere was caching for offline use. If you think of anything, please let me know. Thank you.

Do you mind if I ask... if we wanted to try it anyway, would we use OpenWrt to hand off users to some other program that checks whether the cached page is on our server? Or how would you do it? Thanks.

What server? What cached page? As @slh stated earlier, most websites now use https, so you cannot cache pages in any meaningful way.

What you are probably looking for is a caching proxy server... those have historically been where page caching happened, not on routers. With HTTPS, though, this just isn't a thing anymore, and OpenWrt would not be a good choice for a caching proxy server in any case.

1 Like

Only if you are willing and able to install a special certificate on the users' devices will you be able to cache quite a few, though not all, HTTPS sites. However, this setup is not trivial.
It might be done using OpenWrt or a full Linux distribution such as Ubuntu, which is preferred because of better documentation. A reasonably powerful processor and a fair amount of RAM are required for acceptable response times, and sufficient mass storage for the cache (i.e. an SSD) is assumed.

1 Like

How does archive.is do it? It must be possible somehow.

Dive into "squid cache intercept". Be prepared for a steep learning curve.
This is a good, official entrance: https://wiki.squid-cache.org/ConfigExamples/Intercept/SslBumpExplicit
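To give a rough idea of what that page covers, here is a minimal squid.conf sketch for an explicit proxy with SSL bumping. The paths, cache size, and helper location are assumptions and vary by distribution and Squid version; every client device must trust the CA in myCA.pem, which is the non-trivial part mentioned above.

```
# squid.conf sketch: explicit proxy with SSL bumping (paths and sizes are assumptions)
# Clients must be configured to use this proxy AND must trust the CA in myCA.pem.
# Older Squid versions use cert= instead of tls-cert= on http_port.
http_port 3128 ssl-bump tls-cert=/etc/squid/myCA.pem generate-host-certificates=on dynamic_cert_mem_cache_size=4MB

# Helper that mints per-site certificates signed by your CA
# (binary name/location differs per distro and Squid version)
sslcrtd_program /usr/lib/squid/security_file_certgen -s /var/lib/squid/ssl_db -M 4MB

# Peek at the TLS ClientHello first, then bump (decrypt) the rest
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all

# On-disk cache: ~10 GB, allow fairly large objects so media gets cached too
cache_dir ufs /var/spool/squid 10000 16 256
maximum_object_size 256 MB
```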

As an alternative, you may be able to clone the websites in question onto your local server.
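If the sites are mostly static, a crude way to clone them is a recursive mirror, for example with wget (the URL below is a placeholder):

```
# Mirror a mostly-static site into the current directory for local serving.
# --mirror        : recursive download with timestamping
# --convert-links : rewrite links so pages work when browsed locally
# --adjust-extension --page-requisites : also fetch CSS/JS/images and name files sensibly
wget --mirror --convert-links --adjust-extension --page-requisites \
     --no-parent https://example.com/lessons/
```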

2 Likes

Answer: it depends.

It depends on what site you visit, how the site is built, what level of security exists on the site ...

Back in the day, a caching proxy like Squid was used to, as the name implies, cache the web content you visited, which worked nicely because plain HTTP was in use. Nowadays, decent modern websites use HTTPS to secure communication between client and server, plus other bells and whistles like HSTS and OCSP on top of TLS 1.3, all of which prevent traditional caching proxies from doing their job, i.e. they cannot sit in the middle between client and server to capture traffic and cache/filter it.

Another pain point is when the site you are visiting works with dynamic content, generated on the fly.

So your possibilities are limited:

  1. Try to save the website content to local files. This could work if the content is mostly static, but you need to work out how to share the files.
  2. Use an HTTPS caching proxy. These exist, but as others already said, they require HTTPS interception: you must have a good understanding of how TLS works in order to present your proxy's certificate to clients in place of the destination site's. Basically, your proxy will act as a man-in-the-middle.
  3. Clone the web content to a local web server (as suggested already) and then use a local DNS entry for the real, public web server's hostname alongside your local web server's address; this way, if the public server is not reachable, the local copy still works (see the DNS sketch after this list).
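As a sketch of the DNS part of option 3: one simple variant on OpenWrt (which always sends clients to the local mirror, rather than offering both addresses) is a dnsmasq address override. The hostname and IP below are placeholders, and the line would be added to the existing dnsmasq section of the config:

```
# /etc/config/dhcp on OpenWrt (placeholders: hostname and local mirror IP)
config dnsmasq
        # Resolve lessons.example.org to the local mirror instead of the public IP
        list address '/lessons.example.org/192.168.1.10'
```

Note that if the original site is served over HTTPS, the local mirror would also need a certificate for that hostname that the clients trust, which brings back the certificate problem discussed above.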

Obviously, mimicking the real web server is less likely to work if the content is dynamically generated, for example from data in a database, in which case it is unlikely you can replicate the site properly.

A better option is to add a second WAN link on a different medium if possible, for example mobile internet, as a contingency.

1 Like

Wow. Thank you for your kind explanation. Are you by any chance interested in discussing a project we're working on? Possibly to hire you to build it?

Sorry, but I am not that skilled. If I were you, I'd go with the second WAN link option. That's the simplest, cleanest, and quickest.

What does the 2nd WAN link do?

What would you use to write the caching piece?

My understanding is that you have intermittent WAN connection problems. If that's the real problem, I think that is what should be fixed.

With a second WAN link, if your primary goes down you can still reach the internet, so you would not need any caching solution; just fail over to the second link with all traffic and you can continue working.

There are plenty of wiki and forum posts about multi-WAN setups (e.g. https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3).
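For reference, a minimal failover setup with mwan3 looks roughly like this; the interface names, tracking IPs, and metrics are assumptions for illustration, not a drop-in config:

```
# /etc/config/mwan3 sketch: 'wan' is the primary link, 'wanb' the backup
config interface 'wan'
        option enabled '1'
        option family 'ipv4'
        list track_ip '8.8.8.8'

config interface 'wanb'
        option enabled '1'
        option family 'ipv4'
        list track_ip '1.1.1.1'

config member 'wan_m1'
        option interface 'wan'
        option metric '1'      # lower metric = preferred

config member 'wanb_m2'
        option interface 'wanb'
        option metric '2'

config policy 'failover'
        list use_member 'wan_m1'
        list use_member 'wanb_m2'

config rule 'default_rule'
        option dest_ip '0.0.0.0/0'
        option use_policy 'failover'
```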

No, not at all. These units will have one internet connection, often in poor areas. If money were no object I'd have 10 Ethernet connections. :slight_smile:

It must cache for when the Linux box is not able to see the internet. And it must also cache a lot of classroom material (but still regular web content).

(@devcon1 - It's quite difficult to follow your thread when you quote random sections of text.)

Since money is a consideration (and reasonably so), what OpenWrt devices are you planning to deploy?

Is the classroom material served on a site via HTTP or HTTPS? Is the data available for you to download as an entire site package (i.e. could you run your own local web server for this content)? Does it change frequently? Is it personalized content (i.e. per-student uploads and/or downloads), or is it just a virtual version of classroom textbooks and activities?

1.) It's regular webpages on the public internet, so HTTPS almost always. I would go with that statement even though, yes, there will be proprietary content as well.

2.) On devices, I'm not sure yet. Possibly a Raspberry Pi because they're cheap, but we're evaluating three others.

So that basically means the answer here is going to be "no." As stated by others, it's not impossible, but this is not trivial. Especially if the content is not static and may be personalized on a per-student basis.

The ideal case would be for you to work with the content provider to enable a mirrored solution on some local web server that you can run... but that means the content can't be that dynamic, and personalized logins are not going to work.

1 Like

Perhaps you would receive some solace from installing a local Rachel server.

'A collection of the world's best Open Educational Resources.'

I believe it is also able to host local copies of Wikipedia and maybe the Project Gutenberg book repository.

1 Like