Collecting router prices from websites

Continuing the discussion from Lowest cost (max. $10.00) OpenWrt device?:

Do we have someone in the forum who is experienced in data collection from websites?

Idea: Scrape prices for OpenWrt-supported devices that are listed as "available" (e.g. from Amazon and/or eBay) and integrate them into the dataentries (or create a separate table for selecting a router by its price).

FYI, I know this can be done with the UPC code on eBay:

https://www.ebay.com/sch/i.html?_nkw=<UPC>

Example - Archer C7 V1: 845973070601 (Per: http://en.techinfodepot.shoutwiki.com/wiki/TP-LINK_Archer_C7_v1.x)

eBay URL: https://www.ebay.com/sch/i.html?_nkw=845973070601
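In case someone wants to script this, here is a minimal sketch of building that search URL from a UPC. The function name `ebay_search_url` is made up for illustration; only the `_nkw` parameter and the base URL come from the link above.

```python
from urllib.parse import urlencode

EBAY_SEARCH = "https://www.ebay.com/sch/i.html"

def ebay_search_url(upc: str) -> str:
    """Return the eBay search URL for a numeric UPC (hypothetical helper)."""
    upc = upc.strip()
    if not upc.isdigit():
        raise ValueError(f"not a numeric UPC: {upc!r}")
    # urlencode takes care of escaping the query parameter
    return f"{EBAY_SEARCH}?{urlencode({'_nkw': upc})}"

# Archer C7 v1, UPC taken from the example above
print(ebay_search_url("845973070601"))
```

For the Archer C7 v1 this reproduces the eBay URL shown above.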

I was going to say that this was a "five-minute job", until I saw that Amazon and Newegg now build their pages in the browser using lots of JS and almost no HTML... so there is no content to be scraped until you execute the JS, and that is no longer a "five-minute job". I should have kept my mouth shut on the other thread...

I will give this another go anyway, tomorrow.

I do not know how to do this, but some websites do this already. Just search for "best <something>" and there are always Amazon prices and links to buy.

Turns out Amazon does use a lot of JS, but not as much as I had initially thought (I guess the page had not finished loading when I was looking at the source code), and all the information can be scraped from the HTML.

All in all, this seems doable again.

Cool, great idea!

Shouldn't it be possible anyway to use a browser API to get the result of the executed code?

In theory, yes; in practice, "it's complicated".

For Amazon, I've been using https://camelcamelcamel.com for a long time now, though I'm not exactly sure about their policies regarding use of that data on other websites.

It's amazing, though, how often the prices change on Amazon. For example, the D-Link DIR-842 (QCA9563 + QCA9888, AC1200) has been available for under EUR 40 in Germany several times (and you can sign up for e-mail reminders when the price drops below a certain value).

I write web scrapers in Python as a freelancer, and I've already written one for Amazon.
As eduperez mentioned, the browser method is normally done with Selenium WebDriver, but if you issue a GET request with proper headers, a plain HTTP request will work too. No need for WebDriver.
I'd like to contribute if Python is OK.
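To illustrate what "a GET request with proper headers" could look like, here is a rough sketch using only the standard library. Which headers count as "proper" is an assumption on my side: the values below are just a typical browser-like set, and `build_request` / `fetch_html` are hypothetical helper names.

```python
import urllib.request

# Assumed header values: a typical browser-like set, not a known-good list.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries the browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html(product_url)` would then return the page source without starting a browser; whether a given site accepts such a request is something to verify per site.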

Python should be OK.

How are you going to run the code?
Are you going to use the code yourself, or do you need a simple API so that the front end can POST a request and GET the data via an API call?

Besides, there is the Amazon Product Advertising API, which you may wish to check.

Unknown.

For the time being, I would be happy to get a table of all devices in the ToH with a price next to each one.

I wrote the code, but there is a problem: none of the devices in the ToH table has an ASIN.
Searching for "{brand} {model}" on Amazon is not a reliable option because Amazon's search results include similar products.

I tried searching for "{brand} {model} site:amazon.com ASIN" using Google search. It worked until Google returned a 429 error.

If anyone knows a better way of getting the ASIN from "{brand} {model}", let me know.
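On the 429 side: that status means "Too Many Requests", so one thing that might help is spacing the requests out and retrying with an increasing delay. A minimal sketch, assuming the requests are made through some `fetch(url)` callable that raises `urllib.error.HTTPError` on failure; the name `fetch_with_backoff` and the default delays are illustrative, and this may well not be enough for Google.

```python
import time
import urllib.error

def fetch_with_backoff(fetch, url, retries=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url); on HTTP 429, wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Only 429 is retried; other errors and the last attempt propagate.
            if err.code != 429 or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults this waits 2, 4, 8 and 16 seconds between attempts before giving up.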

Use the WikiDevi database?

I would grab the ASINs first, then the prices, so the ASIN codes can be edited manually.
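Roughly what I have in mind, as a sketch: step 1 writes a CSV with an empty ASIN column that anyone can fill in or correct by hand, step 2 reads it back and only looks up prices for rows that have an ASIN. The column names, the function names and the `lookup_price` callable are all assumptions for illustration.

```python
import csv
import io

def asin_template(devices):
    """Step 1: write brand,model,asin rows with the ASIN column left blank."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["brand", "model", "asin"])
    for brand, model in devices:
        writer.writerow([brand, model, ""])
    return out.getvalue()

def price_rows(asin_csv, lookup_price):
    """Step 2: look up a price for every row that has an ASIN filled in."""
    rows = []
    for row in csv.DictReader(io.StringIO(asin_csv)):
        asin = row["asin"].strip()
        # Rows without an ASIN are kept, with None as the price.
        price = lookup_price(asin) if asin else None
        rows.append((row["brand"], row["model"], asin, price))
    return rows
```

The CSV between the two steps is the part that gets edited manually, so a wrong automatic match only has to be fixed once.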

Not all devices have a WikiDevi page, and far fewer have an ASIN listed on that page.

The code is already done. The problem is finding the correct product on Amazon using device data such as brand and model. Searching by ASIN is the best way to find something on Amazon.

For instance, when you search for "raspberry pi 2", Amazon returns accessories for the Raspberry Pi.

I'll try finding ASINs by filtering on product category tonight. Otherwise all 1591 devices have to be checked manually.
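The category filter could look something like the sketch below. It assumes the search results have already been parsed into `(title, category)` pairs; that shape, the category names and the function name `filter_results` are made up for illustration, not what Amazon actually returns.

```python
def filter_results(results, brand, model, allowed_categories):
    """Keep results in an allowed category whose title mentions brand and model."""
    wanted = f"{brand} {model}".lower().split()
    kept = []
    for title, category in results:
        lowered = title.lower()
        # Plain substring match per token: crude, but it drops obvious mismatches.
        if category in allowed_categories and all(tok in lowered for tok in wanted):
            kept.append((title, category))
    return kept

results = [
    ("Raspberry Pi 2 Model B board", "Single Board Computers"),
    ("Case for Raspberry Pi 2", "Computer Cases"),
    ("Raspberry Pi 3 Model B", "Single Board Computers"),
]
print(filter_results(results, "Raspberry", "Pi 2", {"Single Board Computers"}))
```

Anything that survives the filter would still need a manual look, but the list to check should be much shorter.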

@tmomas there are many discontinued devices. Can we at least filter them out?

Scraping the HTML page is the wrong approach, most of the time.

The data mostly comes from a separate POST request made by some JS on the page. You can find that easily with the Chrome DevTools Network tab, Charles, or Wireshark. I've reverse-engineered a lot of things like that in the past.

Maybe I'll give it a go later :)

EDIT: Giving eBay a quick look, no, they're still old fashioned.

The prices and titles are inside the HTML that is fetched initially. This may complicate things a bit, but a regex may solve all of this...
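A quick sketch of the regex idea. The markup in `SAMPLE_HTML` is invented for the example and is NOT eBay's real class names or structure; the pattern would have to be adapted to whatever the actual page contains.

```python
import re

# Hypothetical markup, only to show the technique.
SAMPLE_HTML = """
<li class="item"><h3 class="title">TP-Link Archer C7 AC1750</h3><span class="price">$39.99</span></li>
<li class="item"><h3 class="title">TP-Link Archer C7 (used)</h3><span class="price">$1,024.50</span></li>
"""

ITEM_RE = re.compile(
    r'<h3 class="title">(?P<title>.*?)</h3>\s*'
    r'<span class="price">\$(?P<price>[\d,]+\.\d{2})</span>',
    re.DOTALL,
)

def extract_items(html):
    """Return (title, price) pairs, with the price converted to a float."""
    return [
        (m["title"].strip(), float(m["price"].replace(",", "")))
        for m in ITEM_RE.finditer(html)
    ]

for title, price in extract_items(SAMPLE_HTML):
    print(f"{price:8.2f}  {title}")
```

A regex like this breaks as soon as the markup changes, so an HTML parser would probably hold up better in the long run.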


I have some 'expertise' in this, I make scrapers here and there and it's quite fun (sometimes).

More details?

My focus is JavaScript (Node, Vue, Express, ...). The JS language is also quite reliable for scraping (other languages I've tried in the past aren't so... *cough* PHP... Python... *cough*)
