Collecting router prices from websites

tmomas · January 22, 2020, 9:15pm

Continuing the discussion from Lowest cost (max. $10.00) OpenWrt device?:

Do we have someone in the forum who is experienced in data collection from websites?

Idea: Scrape prices for OpenWrt-supported devices (those listed as "available") from e.g. Amazon and/or eBay, and integrate them into the dataentries (or create a separate table for selecting a router by its price).

lleachii · January 22, 2020, 11:43pm

FYI, I know this can be done with the UPC code on eBay:

https://www.ebay.com/sch/i.html?_nkw=<UPC>

Example - Archer C7 V1: 845973070601 (Per: http://en.techinfodepot.shoutwiki.com/wiki/TP-LINK_Archer_C7_v1.x)

eBay URL: https://www.ebay.com/sch/i.html?_nkw=845973070601
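The UPC lookup above is easy to script. A minimal Python sketch, assuming the device's code is a 12-digit UPC-A (European devices often carry a 13-digit EAN instead, which this does not handle); the function names are made up for illustration:

```python
from urllib.parse import urlencode

EBAY_SEARCH = "https://www.ebay.com/sch/i.html"


def upc_a_is_valid(upc: str) -> bool:
    """Check length and the UPC-A check digit (the last of the 12 digits).

    Assumption: only 12-digit UPC-A codes are accepted; EAN-13 is rejected.
    """
    if len(upc) != 12 or not (upc.isascii() and upc.isdigit()):
        return False
    digits = [int(c) for c in upc]
    # Digits in odd positions (1st, 3rd, ..., 11th) are weighted 3, even ones 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


def ebay_search_url(upc: str) -> str:
    """Build the eBay search URL that uses the UPC as the keyword (_nkw)."""
    upc = upc.strip()
    if not upc_a_is_valid(upc):
        raise ValueError(f"not a valid UPC-A code: {upc!r}")
    return f"{EBAY_SEARCH}?{urlencode({'_nkw': upc})}"


# Archer C7 v1, UPC taken from the post above
print(ebay_search_url("845973070601"))
# prints https://www.ebay.com/sch/i.html?_nkw=845973070601
```

Validating the check digit first catches typos in UPCs copied from wiki pages before they turn into useless searches.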

eduperez · January 22, 2020, 11:47pm

I was going to say that this was a "five minute job", until I saw that Amazon and Newegg now build their pages in the browser using lots of JS and almost no HTML... so there is no content to be scraped until you execute the JS, and that is no longer a "five minute job". I should have kept my mouth shut in the other thread...

I will give this another go anyway, tomorrow.

fantom-x · January 23, 2020, 1:00am

I do not know how to do this, but some websites already do it. Just search for "best <something>" and there are always Amazon prices and links to buy.

eduperez · January 23, 2020, 7:17am

Turns out Amazon does use a lot of JS, but not as much as I had initially thought (I guess the page had not finished loading when I was looking at the source code), and all the information can be scraped from the HTML.

All in all, this seems doable again.
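To confirm that a value really is in the server-rendered HTML, one can parse the raw page without running any JS. A minimal sketch using only Python's standard `html.parser`; the element id `productTitle` in the demo is a hypothetical example, not something confirmed in this thread, so check the real id in the page source:

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # > 0 while we are inside the target element
        self.found = False
        self.done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <img/> carry no text

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def text_by_id(html: str, element_id: str) -> Optional[str]:
    """Return the whitespace-collapsed text of the element, or None if absent."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.parts).split())


# Made-up HTML; "productTitle" is a hypothetical id for illustration only.
sample = '<span id="productTitle"> TP-Link <b>Archer</b> C7 </span>'
print(text_by_id(sample, "productTitle"))
```

If this returns `None` for a value you can see in the browser, that value is probably inserted by JS after the page loads.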

tapper · January 23, 2020, 8:03am

Cool, great idea!

Hegabo · January 23, 2020, 8:39am

Shouldn't it be possible anyway to use a browser API to get the result of the executed code?

eduperez · January 23, 2020, 12:09pm

In theory, yes; in practice, "it's complicated".

s_2 · January 25, 2020, 12:08am

For Amazon, I've been using https://camelcamelcamel.com for a long time now, though I'm not exactly sure about their policies regarding use of that data on other websites.

It's amazing though how often the prices change on amazon, for example the D-Link DIR-842 (QCA9563 + QCA9888, AC1200) has been available for under EUR 40 in Germany several times (and you can sign up for e-mail reminders when the price drops below a certain value).
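The "e-mail me when the price drops below X" idea is easy to replicate once prices are collected over time. A minimal sketch, assuming a chronological list of observed prices for one device (the numbers in the demo are made up); it only reports the moments the price crosses the threshold, so a price that stays low does not trigger repeated alerts:

```python
from typing import List, Sequence


def threshold_crossings(prices: Sequence[float], threshold: float) -> List[int]:
    """Return the indices where the price moves from >= threshold to < threshold.

    The first observation never triggers, because there is no earlier price
    to compare it with.
    """
    return [
        i
        for i in range(1, len(prices))
        if prices[i - 1] >= threshold and prices[i] < threshold
    ]


# Made-up example series in EUR, threshold EUR 40
print(threshold_crossings([45.0, 42.5, 39.99, 38.0, 41.0, 39.5], 40.0))  # [2, 5]
```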

bahtsiz_bedevi · January 28, 2020, 11:06pm

I do write webscrapers in python as a freelancer and I've already written one for amazon.
As eduperez mentioned, the usual approach is to drive a browser with Selenium WebDriver, but if you issue a GET request with the proper headers, a plain HTTP request works too. No need for WebDriver.
I'd like to contribute if python is ok.
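For the "plain GET request with proper headers" approach, a minimal standard-library sketch follows. The header values are assumptions (a typical desktop Firefox set); the thread does not say which headers Amazon actually checks, and that can change at any time:

```python
import urllib.request
from typing import Dict, Optional

# Assumed browser-like headers; compare with a real browser request and adjust.
# No Accept-Encoding header: urllib does not decompress gzip responses itself.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url: str, extra: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Create a GET request carrying the default headers plus any overrides."""
    headers = {**DEFAULT_HEADERS, **(extra or {})}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_html(url: str, timeout: float = 15.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Keeping request construction separate from the download makes it easy to tweak headers per site (e.g. `Accept-Language` for amazon.de) without touching the fetch logic.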

tmomas · January 28, 2020, 11:41pm

python should be ok.

bahtsiz_bedevi · January 29, 2020, 12:33am

How are you going to run the code?
Are you going to run the code yourself, or do you need a simple API that the front-end can call (POST a request, then GET the data)?

Besides, there is the Amazon Product Advertising API, which you may wish to check.

tmomas · January 29, 2020, 1:38pm

Unknown.

For the time being, I would be happy to get a table of all devices in the ToH with a price behind.

bahtsiz_bedevi · January 30, 2020, 12:48am

I wrote the code, but there is a problem: none of the devices in the ToH table have an ASIN.
Searching Amazon for "{brand} {model}" is not a reliable option, because Amazon's search results include similar products.

I tried searching Google for "{brand} {model} site:amazon.com ASIN". It worked until Google started returning HTTP 429 (Too Many Requests) errors.
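HTTP 429 means the server is rate-limiting the client, so the usual response is to slow down rather than retry immediately (this will not necessarily make Google keep answering automated queries). A minimal sketch, assuming a numeric `Retry-After` header is honoured when present and exponential backoff is used otherwise; the base delay and cap are arbitrary choices:

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before retrying; attempt is 0 for the first retry.

    A Retry-After value given in seconds wins; the HTTP-date form is not
    parsed here and falls back to exponential backoff.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    return min(base * 2 ** attempt, cap)


def get_with_backoff(req: urllib.request.Request, max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch req, sleeping and retrying whenever the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(req, timeout=15) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Other HTTP errors, or a 429 on the final attempt, propagate.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the retry loop can be tested without actually waiting.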

If anyone knows a better way of getting ASIN number from "{brand} {model}", let me know.
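Whichever source ends up supplying product links (search results, WikiDevi, manual edits), the ASIN can be pulled out of an Amazon product URL, where it usually follows `/dp/` or `/gp/product/`. A minimal regex sketch, assuming the common 10-character uppercase alphanumeric ASIN format; `B0EXAMPLE1` is a placeholder, not a real product:

```python
import re
from typing import Optional

# Assumption: ASINs are 10 uppercase letters/digits and appear after
# /dp/ or /gp/product/ in product URLs.
ASIN_IN_URL = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?#]|$)")


def asin_from_url(url: str) -> Optional[str]:
    """Return the ASIN embedded in an Amazon product URL, or None."""
    match = ASIN_IN_URL.search(url)
    return match.group(1) if match else None


print(asin_from_url("https://www.amazon.com/Some-Router/dp/B0EXAMPLE1/ref=sr_1_1"))
```

The lookahead rejects 11+ character runs, so a longer path segment after `/dp/` is not mistaken for an ASIN.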

Ansuel · January 30, 2020, 1:52am

Use the WikiDevi database?

eduperez · January 30, 2020, 6:58am

I would grab the ASINs first, then the prices, so the ASIN codes can be edited manually.
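That two-phase idea maps naturally onto a spreadsheet-style file: phase 1 writes the best-guess ASIN per device, a human fixes or blanks wrong ones, and phase 2 only prices rows that still have an ASIN. A minimal sketch with Python's `csv` module; the column names are hypothetical, not the ToH's actual field names:

```python
import csv
import io
from typing import Dict, Iterable, Mapping, TextIO, Tuple

FIELDS = ["brand", "model", "asin"]  # hypothetical column names


def write_asin_sheet(rows: Iterable[Mapping[str, str]], fh: TextIO) -> None:
    """Phase 1: dump device -> guessed ASIN so it can be corrected by hand."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: row.get(key, "") for key in FIELDS})


def read_confirmed_asins(fh: TextIO) -> Dict[Tuple[str, str], str]:
    """Phase 2 input: keep only rows whose ASIN cell is non-empty after review."""
    return {
        (row["brand"], row["model"]): row["asin"].strip()
        for row in csv.DictReader(fh)
        if row["asin"].strip()
    }


# Round trip in memory; with real files, open them with newline="".
buf = io.StringIO()
write_asin_sheet([
    {"brand": "TP-Link", "model": "Archer C7 v2", "asin": "B0EXAMPLE1"},
    {"brand": "D-Link", "model": "DIR-842", "asin": ""},
], buf)
buf.seek(0)
print(read_confirmed_asins(buf))  # {('TP-Link', 'Archer C7 v2'): 'B0EXAMPLE1'}
```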

bahtsiz_bedevi · January 31, 2020, 3:56pm

Not all devices have a WikiDevi page, and far fewer of them list an ASIN on their WikiDevi page.

bahtsiz_bedevi · January 31, 2020, 4:00pm

Code is ready already. The problem is finding correct product on Amazon by using device data like brand, model. Searching by ASIN is the best way to find something on Amazon.

For instance, when you search for "raspberry pi 2", Amazon returns Raspberry Pi accessories.

I'll try finding ASINs filtering by product category tonight. Otherwise all 1591 devices have to be checked manually.
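One cheap way to reduce the "similar products" problem, before or alongside category filtering, is to keep only results whose title contains every token of the brand and model and none of a list of accessory words. A rough heuristic sketch; the word list is a hypothetical starting point that will need tuning, and short model tokens like "2" can still match unrelated titles:

```python
import re
from typing import Set

# Hypothetical list of words that mark accessories rather than the device itself.
ACCESSORY_WORDS = {"case", "cable", "antenna", "mount", "charger",
                   "heatsink", "cover", "replacement"}


def tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens, so "TP-Link" becomes {"tp", "link"}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def looks_like_device(title: str, brand: str, model: str) -> bool:
    """True if the title names the brand and model and no accessory word."""
    title_tokens = tokens(title)
    if not (tokens(brand) | tokens(model)) <= title_tokens:
        return False
    return not (title_tokens & ACCESSORY_WORDS)


print(looks_like_device("Official Raspberry Pi 2 Case - Red/White", "Raspberry", "Pi 2"))
```

Anything rejected here could go to the manual-review list rather than being dropped silently.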

bahtsiz_bedevi · January 31, 2020, 4:09pm

@tmomas there are many discontinued devices. Can we at least filter them out?
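Assuming the device list is available as records with an availability field (the field name `availability` and the "Discontinued" value are assumptions about the ToH data, not confirmed here), the filter is short; devices with no availability info are kept so they can still be checked by hand:

```python
from typing import Dict, List


def drop_discontinued(devices: List[Dict[str, str]],
                      field: str = "availability") -> List[Dict[str, str]]:
    """Keep devices whose availability does not start with "discontinued".

    Assumption: the ToH export has a text field named `field` that reads
    "Discontinued" (any case) for devices no longer sold.
    """
    return [
        d for d in devices
        if not (d.get(field) or "").strip().lower().startswith("discontinued")
    ]
```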

Fusseldieb · January 31, 2020, 5:10pm

Scraping the HTML page is the wrong approach, most of the time.

It usually comes from a separate POST request made by some JS on the page. You can find that easily with the Chrome DevTools Network tab, Charles or Wireshark. I've reverse-engineered a lot of things like that in the past.

Maybe I'll give it a go later.
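Once the Network tab reveals the request a page's JS makes, it can usually be replayed directly and its JSON read without touching the HTML. A minimal sketch; the endpoint URL and payload are hypothetical placeholders, since the real ones depend on what the site's JS sends:

```python
import json
import urllib.request
from typing import Any


def build_json_post(url: str, payload: Any) -> urllib.request.Request:
    """POST request with a JSON body, mimicking what a page's JS might send."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def call_json_endpoint(url: str, payload: Any, timeout: float = 15.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_json_post(url, payload), timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical endpoint and payload, standing in for whatever the Network tab shows
req = build_json_post("https://example.com/api/search", {"query": "Archer C7"})
```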

EDIT: Giving eBay a quick look: no, they're still old-fashioned.

The prices and titles are inside the HTML which is initially fetched. This may complicate things a bit, but Regex may solve all of this...
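For prices sitting in the initially fetched HTML, a regex over the price text is often enough. A minimal sketch that handles a currency marker in front of the amount (`$`, `US $`, `EUR`, `€`) and both "39.99" and "39,99" decimal styles; it assumes `$` means USD and does not handle a currency sign placed after the amount:

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumption: the currency marker precedes the amount, as in "US $39.99" or "EUR 39,99".
PRICE_RE = re.compile(
    r"(?P<cur>US \$|\$|EUR|€)\s*(?P<amount>\d+(?:[.,]\d{3})*(?:[.,]\d{1,2})?)"
)
CURRENCY = {"US $": "USD", "$": "USD", "EUR": "EUR", "€": "EUR"}


def normalise_amount(raw: str) -> Decimal:
    """Turn "1.234,56", "1,234.56" or "39,99" into a Decimal."""
    last = max(raw.rfind("."), raw.rfind(","))
    # A final separator followed by 1-2 digits is the decimal separator;
    # every other "." or "," is treated as a thousands separator.
    if last != -1 and len(raw) - last - 1 in (1, 2):
        whole = re.sub(r"[.,]", "", raw[:last])
        return Decimal(f"{whole}.{raw[last + 1:]}")
    return Decimal(re.sub(r"[.,]", "", raw))


def parse_price(text: str) -> Optional[Tuple[str, Decimal]]:
    """Return (currency, amount) for the first price found in text, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return CURRENCY[match.group("cur")], normalise_amount(match.group("amount"))
```

`Decimal` avoids floating-point rounding surprises when prices are later compared or averaged.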

I have some 'expertise' in this, I make scrapers here and there and it's quite fun (sometimes).

More details?

My focus is JavaScript (Node, Vue, Express, ...). The JS language is also quite reliable in scraping (other languages I've tried in the past aren't so... *cough* php...python... *cough*)