Reddit - How to scrape data from Detail Pages

Detail Pages
title: Scraping Google Search, or Maps, at scale

content:
I'm eager to crawl Google at a considerable scale. I see tools like Ahrefs and SEMRush do this 10s, if not 100s, of millions of times a day to gather their data. SERP checkers are able to do this too, and it seems that some proxy providers now offer SERP API feeds in JSON.

From my understanding, it is very unlikely they're rendering the pages, due to efficiency, resources, and so on. So I'm nearly convinced it's done through web requests. What I also know is that it's 100% not via Google's API, which has extremely low quotas.

I can scrape Google, but it's INCREDIBLY slow (used Selenium with pauses). Maybe 2/3 of hits get presented with a captcha.

What I have done to unsuccessfully combat their detection:
- Randomizing user agents
- Using a proxy rotator after any request (https://www.proxyrack.com/datacenter-proxies/). I have also tried many providers; ProxyRack seemed to have the highest success rate...
- Randomizing sleep times
- Tried HtmlAgilityPack (web requests, C#), Puppeteer and Selenium
- Used Puppeteer's 'stealth' plugin: https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
- Accepting cookies
- Storing cache

Has anybody been able to successfully crawl Google at scale, and if so, is there any secret sauce?

upvote: 6
author: u/MattH1966
author_url: https://www.reddit.com/user/MattH1966/
posted_at: 4 days ago
comments_num: 6 comments
voted_percentage: 81% Upvoted
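The eight fields of the record above (title, content, upvote, author, author_url, posted_at, comments_num, voted_percentage) come out of the page as plain strings, so values such as "6 comments" and "81% Upvoted" usually need a cleanup pass before analysis. The following is a minimal sketch in Python, assuming the scraped row is available as a dictionary keyed by those field names; the names `RedditPost`, `first_int`, and `parse_row` are hypothetical and are not part of AnyPicker or Reddit.

```python
import re
from dataclasses import dataclass


@dataclass
class RedditPost:
    """One scraped detail page, using the field names from the record above."""
    title: str
    content: str
    upvote: int
    author: str
    author_url: str
    posted_at: str          # kept as scraped, e.g. "4 days ago" (relative time)
    comments_num: int
    voted_percentage: int   # whole percent, e.g. 81


def first_int(text: str) -> int:
    """Return the first run of digits in text as an int (hypothetical helper)."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"no integer found in {text!r}")
    return int(match.group())


def parse_row(row: dict) -> RedditPost:
    """Convert a dict of scraped strings into a typed RedditPost."""
    return RedditPost(
        title=row["title"].strip(),
        content=row["content"].strip(),
        upvote=first_int(row["upvote"]),
        author=row["author"].strip(),
        author_url=row["author_url"].strip(),
        posted_at=row["posted_at"].strip(),
        comments_num=first_int(row["comments_num"]),          # "6 comments" -> 6
        voted_percentage=first_int(row["voted_percentage"]),  # "81% Upvoted" -> 81
    )


# Example row, copied from the scraped values shown above (content shortened).
example_row = {
    "title": "Scraping Google Search, or Maps, at scale",
    "content": "I'm eager to crawl Google at a considerable scale.",
    "upvote": "6",
    "author": "u/MattH1966",
    "author_url": "https://www.reddit.com/user/MattH1966/",
    "posted_at": "4 days ago",
    "comments_num": "6 comments",
    "voted_percentage": "81% Upvoted",
}

post = parse_row(example_row)
print(post.comments_num, post.voted_percentage)
```

The `posted_at` value is left as a string on purpose: "4 days ago" is relative to the time of scraping, so converting it to a date would require recording the scrape time as well.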
