Hi. I am new to the forum. Hello.
As many people have likely experienced, the full documentation at https://openwrt.org/docs/ cannot be read offline through the usual routes: from the device's web server page, via a software package (opkg install 'documentation'), or as a standalone PDF. The OpenWrt documentation is excellent. For most OpenWrt users and developers, it not only describes how to operate our OpenWrt devices but also serves as a reference for a good number of networking and security topics.
I would like to know if we can programmatically download the documentation's HTML pages and convert them into a PDF. Obviously the resulting PDF would be for personal use only. I know I can do this, at least in a technical sense, but do the maintainers and owners of OpenWrt allow it?
This query is directed to other users on the forum, and especially the forum administrators.
The documentation won't fit in the device's flash memory; that's why it's not included.
Yes, you can freely download the OpenWrt wiki contents and make a PDF. The information there is shared under this Creative Commons license (there is a link to it at the bottom of each page): https://creativecommons.org/licenses/by-sa/4.0/deed.en
You know, this is an interesting thing to consider - I’m not sure how much value it would have, but if someone could download basically the entire site documentation (or subsections) as a package to their computer, that could be good for when they are planning to be offline due to router swaps / maintenance / configuration.
Probably not all that necessary given that many users will have at least a mobile device for access while their home connection is down, but I wonder if anyone would use such a feature and how it would be implemented.
For Windows there are applications that do that, and it can also be done from a Linux/Mac/Windows system with the wget command line:
wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --reject-regex "/search|/rss" --no-check-certificate --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" --wait 1s --restrict-file-names=windows openwrt.org
I'm doing some tests to see if it works and whether I can implement this on the wiki server itself, to provide a package users can download and browse offline with their web browser.
Take care that wget doesn't choke on the ToH (Table of Hardware) views. They contain links that only change the sorting or filtering, so dumb crawlers like search engines or wget can easily get stuck following them forever. That should be avoided, since it wastes time and puts needless load on the server.
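One way to keep wget out of those sort/filter links might be to extend the reject pattern from the command above. This is only a sketch: the parameter names datasrt, dataflt and dataofs are an assumption (they are what the DokuWiki data plugin normally uses), so check them against real ToH links before relying on it. The helper names REJECT_REGEX, is_rejected and mirror_wiki are made up for illustration.

```shell
#!/bin/sh
# Hypothetical extension of the reject pattern from the wget command above.
# ASSUMPTION: ToH sort/filter/paging links carry query parameters named
# datasrt, dataflt and dataofs. Verify this against the live wiki first.
REJECT_REGEX='/search|/rss|[?&](datasrt|dataflt|dataofs)'

# Returns 0 if the URL would be skipped by the pattern, so the pattern
# can be checked offline before starting a long crawl.
is_rejected() {
    printf '%s\n' "$1" | grep -Eq "$REJECT_REGEX"
}

# Mirror a site with the extended pattern (a trimmed-down variant of the
# original command). Defined only; nothing is downloaded until it is called.
mirror_wiki() {
    wget --mirror --page-requisites --convert-links --adjust-extension \
         --reject-regex "$REJECT_REGEX" --wait 1 \
         --restrict-file-names=windows "$1"
}
```

After sourcing the file, `is_rejected 'https://openwrt.org/toh/start?datasrt=brand' && echo skipped` lets you test a URL, and `mirror_wiki openwrt.org` would start the actual crawl.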
If anyone ends up scraping the wiki into HTML files, I have used a simple pipeline to generate PDFs using pandoc and tectonic.
Basic steps are:
- gather markup-friendly source files
- for each file:
  - convert from markup to PDF (using tectonic or another PDF generator)
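The steps above could look roughly like this as a shell script. It is a minimal sketch, not the exact pipeline described: it assumes the scraped pages are .html files under one directory, that pandoc and tectonic are both installed, and the function names pdf_path_for and convert_all are invented for this example.

```shell
#!/bin/sh
# Build a flat output name from a file's path relative to the source dir,
# e.g. mirror/docs/guide/start.html -> pdf/docs_guide_start.pdf
# (flattening avoids collisions between the many pages named "start").
pdf_path_for() {
    src_dir=$1
    file=$2
    out_dir=$3
    rel=${file#"$src_dir"/}
    name=$(printf '%s' "${rel%.*}" | tr '/' '_')
    printf '%s/%s.pdf\n' "$out_dir" "$name"
}

# Convert every .html file under the source dir into one PDF each.
# Defined only; nothing runs until it is called.
convert_all() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    find "$src_dir" -type f -name '*.html' | while IFS= read -r file; do
        out=$(pdf_path_for "$src_dir" "$file" "$out_dir")
        # pandoc reads the HTML and hands the generated LaTeX to tectonic
        pandoc "$file" --from=html --pdf-engine=tectonic --output="$out" \
            || echo "failed: $file" >&2
    done
}
```

Calling `convert_all openwrt.org pdf` would then write one PDF per page into the pdf directory; pages that pandoc cannot convert are reported on stderr and skipped.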