Do we have someone in the forum who is experienced in data collection from websites?
Idea: Scrape prices for OpenWrt-supported devices (those listed as "available") from e.g. Amazon and/or eBay, and integrate them into the dataentries (or create a separate table for selecting a router by its price).
I was going to say that this was a "five minute job", until I saw that Amazon or Newegg now build their pages in the browser using lots of JS and almost no HTML... so there is no content to be scraped until you execute the JS, and that is no longer a "five minute job". I should have kept my mouth closed on the other thread...
Turns out Amazon does use a lot of JS, but not as much as I had initially thought (I guess the page had not finished loading when I was looking at the source code), and all the information can be scraped from the HTML.
For Amazon, I've been using https://camelcamelcamel.com for a long time now, though I'm not exactly sure about their policies regarding use of that data on other websites.
It's amazing, though, how often prices change on Amazon: for example, the D-Link DIR-842 (QCA9563 + QCA9888, AC1200) has been available for under EUR 40 in Germany several times (and you can sign up for e-mail reminders when the price drops below a certain value).
I write web scrapers in Python as a freelancer, and I've already written one for Amazon.
As eduperez mentioned, the usual approach is to drive a real browser with Selenium WebDriver, but if you issue a GET request with the proper headers, a plain HTTP request works too. No need for WebDriver.
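To illustrate the "GET request with proper headers" idea, here is a minimal sketch using only the Python standard library. The header values and the helper names (`build_request`, `fetch_html`) are my own assumptions for illustration, not the code mentioned in this thread, and a site may still block or throttle such requests.

```python
import urllib.request

# Assumption: these header values are only illustrative. Any realistic
# browser-like set may work, and the site may still refuse the request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url):
    """Return a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def fetch_html(url, timeout=10):
    """Download the page and decode it as text (performs network I/O)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html(product_url)` would then return the raw HTML for further parsing; without the extra headers the same request often gets a bot-check page instead.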
I'd like to contribute if python is ok.
How are you going to run the code?
Are you going to run the code yourself, or do you need a simple API, so that the front-end can POST a request and GET the data back?
I wrote the code, but there is a problem: none of the devices in the TOH table has an ASIN.
Searching for "{brand} {model}" on Amazon is not a reliable option, because Amazon's search results include similar products.
I tried searching for "{brand} {model} site:amazon.com ASIN" with Google search. It worked until Google returned a 429 (Too Many Requests) error.
If anyone knows a better way of getting the ASIN from "{brand} {model}", let me know.
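On the 429: one common way to cope with rate limiting is to slow down and retry with exponential backoff, honouring the `Retry-After` header when the server sends one. Below is a rough sketch under that assumption; `backoff_delay` and `get_with_backoff` are hypothetical helper names, the delays are arbitrary, and backing off does not mean the search engine permits automated queries at all.

```python
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=2.0, cap=300.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back below
    return min(base * (2 ** attempt), cap)


def get_with_backoff(url, max_attempts=5, sleep=time.sleep,
                     opener=urllib.request.urlopen):
    """GET `url`, retrying only on HTTP 429 with growing delays."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or when the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
```

With 1591 devices to look up, spacing the queries out from the start is probably kinder than waiting for the first 429.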
The code is already done. The problem is finding the correct product on Amazon from device data like brand and model. Searching by ASIN is the best way to find something on Amazon.
For instance, when you search for "raspberry pi 2", Amazon returns accessories for the Raspberry Pi.
I'll try finding ASINs by filtering on product category tonight. Otherwise all 1591 devices will have to be checked manually.
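A rough sketch of what filtering by category could look like, assuming the search results have already been parsed into dicts with `title`, `category` and `asin` keys. That structure, the category names, the sample rows and the placeholder ASIN strings are all made up for illustration.

```python
import re


def tokens(text):
    """Lower-case alphanumeric tokens, e.g. 'DIR-842' -> {'dir', '842'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_candidates(brand, model, results, allowed_categories):
    """Keep results in an allowed category whose title has every brand/model token."""
    wanted = tokens(f"{brand} {model}")
    return [
        r for r in results
        if r["category"] in allowed_categories and wanted <= tokens(r["title"])
    ]


# Hypothetical, hand-written search results (not real Amazon data).
RESULTS = [
    {"title": "D-Link DIR-842 AC1200 Router", "category": "Routers", "asin": "PLACEHOLDER1"},
    {"title": "Power supply for D-Link DIR-842", "category": "Accessories", "asin": "PLACEHOLDER2"},
    {"title": "D-Link DIR-825 AC1200 Router", "category": "Routers", "asin": "PLACEHOLDER3"},
]

matches = pick_candidates("D-Link", "DIR-842", RESULTS, {"Routers"})
print([m["asin"] for m in matches])
```

Note that the accessory title also contains every brand and model token, so title matching alone would not be enough; the category check is what drops it. Devices that end up with zero or several candidates would still need a manual look.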
Scraping the HTML page is the wrong approach, most of the time.
The data mostly comes from a separate POST request made by some JS on the page. You can find it easily with Chrome DevTools > Network, Charles or Wireshark. I've reverse-engineered a lot of things like that in the past.
Maybe I'll give it a go later.
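For what it's worth, once such a request has been spotted in the Network tab, replaying it is usually just a matter of sending the same body to the same URL. A minimal Python sketch, assuming a JSON endpoint; the URL, the payload and the helper names are placeholders, since the real ones depend entirely on the site being inspected.

```python
import json
import urllib.request


def build_api_request(endpoint, payload):
    """Build a POST request with a JSON body, as seen in the Network tab."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def fetch_json(endpoint, payload, timeout=10):
    """Send the request and decode the JSON reply (performs network I/O)."""
    req = build_api_request(endpoint, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Some endpoints also expect cookies or tokens copied from the original page load, so a bare replay like this may need extra headers before it works.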
EDIT: Giving eBay a quick look: no, they're still old-fashioned.
The prices and titles are inside the HTML which is initially fetched. This may complicate things a bit, but a regex may solve all of this...
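As a rough idea of the regex route, here is a small Python sketch. The markup, class names and prices in `SAMPLE` are invented for the example and are not eBay's actual HTML; a pattern like this is brittle and breaks whenever the markup changes, so an HTML parser is usually the sturdier choice.

```python
import re

# Assumption: hypothetical markup with a title element followed by a price element.
ITEM_RE = re.compile(
    r'<h3 class="item-title">(?P<title>.*?)</h3>.*?'
    r'<span class="item-price">(?P<currency>[A-Z]{3})\s*(?P<amount>\d+(?:[.,]\d{2})?)</span>',
    re.DOTALL,
)


def extract_items(html):
    """Return (title, currency, amount) tuples found in the HTML."""
    return [
        (m["title"].strip(), m["currency"], float(m["amount"].replace(",", ".")))
        for m in ITEM_RE.finditer(html)
    ]


# Made-up sample listing, only to exercise the pattern.
SAMPLE = """
<li><h3 class="item-title">D-Link DIR-842 AC1200</h3>
    <span class="item-price">EUR 39,90</span></li>
<li><h3 class="item-title">Raspberry Pi 2 Model B</h3>
    <span class="item-price">EUR 35.00</span></li>
"""

for title, currency, amount in extract_items(SAMPLE):
    print(f"{title}: {amount:.2f} {currency}")
```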
I have some 'expertise' in this, I make scrapers here and there and it's quite fun (sometimes).
More details?
My focus is JavaScript (Node, Vue, Express, ...). The JS language is also quite reliable in scraping (other languages I've tried in the past aren't so... *cough* php...python... *cough*)