l0gid commented Jul 7, 2025

Fix for #131

Thumbs / Titles were not being fetched due to changes in the DOM structure for product ratings. Rating extraction came before Thumbs / Titles, so when it failed it triggered the catch of the try / catch block it shared with them, and the later extractions never ran. Each extraction is now wrapped in its own try / catch to fail safe, and each catch logs where the error occurred so failures are easier to identify in future (may spam the console, but better than failing silently IMO).

The code that extracts ratings has been updated to account for the new DOM structure. No idea how robust it is, but it currently works, so it is an improvement on before.
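
For illustration, the per-extraction try / catch idea can be sketched roughly as below. This is not the code from the PR: `safeExtract`, `parseListing` and the shape of the `node` object are hypothetical stand-ins, and the real scraper reads DOM nodes rather than a plain object.

```javascript
// Hypothetical helper: run one extraction step in isolation so that a
// failure in one field cannot abort the extraction of the others.
function safeExtract(label, extractFn, fallback) {
  try {
    return extractFn();
  } catch (err) {
    // Log which step failed so DOM changes are easy to spot later.
    console.error(`[extract:${label}] ${err.message}`);
    return fallback;
  }
}

// Hypothetical parser: each field gets its own guarded extraction.
function parseListing(node) {
  return {
    rating: safeExtract('rating', () => node.ratings.score, 0),
    title: safeExtract('title', () => node.title, ''),
    thumbnail: safeExtract('thumbnail', () => node.thumbnail, ''),
  };
}

// Stand-in for a search result whose ratings markup has gone missing,
// as it would after a DOM change.
const fakeListing = {
  title: 'Example product',
  thumbnail: 'https://example.com/thumb.jpg',
};

const parsed = parseListing(fakeListing);
console.log(parsed);
```

Here the rating lookup throws, is logged, and falls back to its default, while the title and thumbnail are still returned.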

l0gid added 12 commits July 7, 2025 20:31
Titles were previously extracted from the same source as the thumbnail, but those titles were truncated after a certain length, so the source was changed to obtain full titles.
Search results now have a hasCoupon field. It returns true / false for the existence of a coupon but does not parse what the coupon is. Some products may show coupons only on the product page and not in search results; this change does not find those.
Prices > 999.99 were not scraped correctly; changed how the price is obtained.
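
The commit does not say why prices above 999.99 broke. One plausible cause, assumed here purely for illustration, is a thousands separator: in JavaScript, `parseFloat('1,299.99')` stops at the comma and yields `1`. A minimal sketch of a parser that tolerates this, with the hypothetical name `parsePrice` and assuming US-style formatting ("," for thousands, "." for decimals):

```javascript
// Hypothetical parser; assumes US-style number formatting.
function parsePrice(text) {
  // Keep only digits and separators (drops "$", spaces, etc.).
  const cleaned = String(text).replace(/[^0-9.,]/g, '');
  // Remove thousands separators before converting to a number.
  const value = Number.parseFloat(cleaned.replace(/,/g, ''));
  return Number.isNaN(value) ? null : value;
}

console.log(Number.parseFloat('1,299.99')); // 1 (stops at the comma)
console.log(parsePrice('$1,299.99'));       // 1299.99
console.log(parsePrice('$19.99'));          // 19.99
console.log(parsePrice('n/a'));             // null
```

Regional sites that use "." for thousands and "," for decimals would need a different rule; this sketch does not cover them.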
Fixed an issue where individual ASIN lookup was not correctly getting prices.
Added the output of the "compare with similar items" > "this item" table to the parser output as compareThisItem, as it has info that doesn't seem to be included elsewhere on the page.
Added product information > technical details (all rows controlled via the collapse all toggle) to the ASIN output as techDetails.
output being a giant string and unreadable was getting to me ^=w=^
The selectors used for grabbing ratings & stars for each listing had a value changed, meaning:

scraper fails to pull rating data
sets data to 0 for all listings
filter looking for < 1 rated listings filters out all listings
no output!

Changes are as follows:
fixed selector to find data
in future, if data is not found, values are set to -1 or -2 (so less than zero) to indicate there's a problem
filter now ignores values < 0 to allow results even after DOM changes, better than failing completely silently IMO as negative values stand out more =w=
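
The sentinel-and-filter behaviour described in this commit could look roughly like the sketch below. The names are hypothetical, and the meanings assigned to -1 and -2 are assumptions, since the commit message only says the values are negative.

```javascript
// Assumed meanings; the commit only states that the values are < 0.
const RATING_NOT_FOUND = -1;  // selector matched nothing
const RATING_UNPARSABLE = -2; // text found but not numeric

// Hypothetical reader: turn rating text into a number or a sentinel.
function readRating(text) {
  if (text === null || text === undefined) return RATING_NOT_FOUND;
  const value = Number.parseFloat(text);
  return Number.isNaN(value) ? RATING_UNPARSABLE : value;
}

// Drop listings rated below minRating, but keep listings whose rating is
// a negative sentinel so a DOM change cannot empty the whole result set.
function filterByRating(listings, minRating) {
  return listings.filter((l) => l.rating < 0 || l.rating >= minRating);
}

const listings = [
  { asin: 'A1', rating: readRating('4.5 out of 5 stars') },
  { asin: 'A2', rating: readRating('0.5 out of 5 stars') },
  { asin: 'A3', rating: readRating(null) },
  { asin: 'A4', rating: readRating('no rating') },
];

const kept = filterByRating(listings, 1);
console.log(kept.map((l) => `${l.asin}:${l.rating}`));
```

Only the genuinely low-rated listing (A2) is dropped; the two listings with missing or unreadable ratings pass through with negative values that are easy to spot in the output.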
Added save to path, implementation by AYehia0 in main repo
Thread failures can now be retried X times before triggering a failure. Such a failure is picked up as an AggregateError and causes the scrape to fail. It seems to happen seldom and re-scraping fixes the issue, so being able to just retry the thread may make this error occur much less often.
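
A retry wrapper along these lines is one way to picture the change. It is a minimal sketch, not the PR's implementation: `withRetry` and its parameters are hypothetical, and how the scraper actually schedules its threads is not shown in this conversation.

```javascript
// Hypothetical wrapper: run an async task, retrying up to maxRetries
// extra times. Only after every attempt has failed is the last error
// rethrown, so the caller's existing failure handling still applies.
async function withRetry(task, maxRetries) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      console.warn(`attempt ${attempt + 1} failed: ${err.message}`);
    }
  }
  throw lastError;
}

// Usage sketch: a stand-in task that fails twice, then succeeds.
let demoCalls = 0;
const flakyTask = async () => {
  demoCalls += 1;
  if (demoCalls < 3) throw new Error('transient failure');
  return 'page scraped';
};

withRetry(flakyTask, 3).then((result) => console.log(result));
```

A real version would probably also want a short delay between attempts; that is left out here to keep the sketch small.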
Had an issue with Amazon US returning strange pricing values that running with these changes seems to have fixed.
For whatever reason, some requests (only observed for Amazon US so far) were fetching prices in the currency of the requesting IP (while the API listed them as USD). A cookie specifying the currency of the region site being requested has been added (this may make requests easier to fingerprint).
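
As a rough sketch of the currency-cookie idea: the commit message does not name the cookie or show the region table, so the cookie name `i18n-prefs`, the `REGION_CURRENCY` map and both function names below are assumptions made for illustration only.

```javascript
// Hypothetical mapping from region code to that site's currency.
const REGION_CURRENCY = { US: 'USD', UK: 'GBP', DE: 'EUR', JP: 'JPY' };

// Assumption: the cookie name is illustrative; the PR does not state it.
function currencyCookie(region, cookieName = 'i18n-prefs') {
  const currency = REGION_CURRENCY[region];
  if (!currency) throw new Error(`unknown region: ${region}`);
  return `${cookieName}=${currency}`;
}

// Return a copy of the request headers with the currency cookie appended
// to any cookies that are already present.
function withCurrencyCookie(headers, region) {
  const existing = headers.cookie ? `${headers.cookie}; ` : '';
  return { ...headers, cookie: `${existing}${currencyCookie(region)}` };
}

console.log(withCurrencyCookie({ 'user-agent': 'example' }, 'US'));
console.log(withCurrencyCookie({ cookie: 'session=abc' }, 'DE'));
```

Pinning the currency this way makes every request carry the same extra cookie, which is the fingerprinting trade-off the commit message mentions.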