Scraping Google Search, or Maps, at scale

I'm eager to crawl Google at considerable scale. Tools like Ahrefs and SEMrush appear to do this tens, if not hundreds, of millions of times a day to gather their data. SERP checkers manage it too, and some proxy providers now offer SERP API feeds in JSON. From my understanding, it's very unlikely they're rendering the pages, for efficiency and resource reasons, so I'm nearly convinced it's done through plain web requests. What I also know is that it's definitely not via Google's official API, which has extremely low quotas.

I can scrape Google, but it's INCREDIBLY slow (I used Selenium with pauses), and roughly 2 out of 3 requests are met with a captcha.

What I have done to (unsuccessfully) combat their detection:

- Randomizing user agents
- Rotating proxies after every request (https://www.proxyrack.com/datacenter-proxies/); I have tried many providers, and ProxyRack seemed to have the highest success rate
- Randomizing sleep times
- Trying HtmlAgilityPack (web requests, C#), Puppeteer, and Selenium
- Using Puppeteer's stealth plugin: https://www.npmjs.com/package/puppeteer-extra-plugin-stealth
- Accepting cookies
- Storing cache

Has anybody been able to successfully crawl Google at scale, and if so, is there any secret sauce?
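For concreteness, the per-request rotation I'm describing boils down to something like the sketch below (JavaScript, since I was driving this from Puppeteer). The user-agent strings, proxy URLs, and delay bounds are illustrative placeholders, not values I've validated against Google's detection:

```javascript
// Sketch of the per-request rotation: fresh user agent, fresh proxy,
// and a randomized pause before each hit. All values are placeholders.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
];

// Pick a uniformly random element from a non-empty array.
function pickRandom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Random delay in [minMs, maxMs) to avoid a fixed request cadence.
function jitterMs(minMs, maxMs) {
  return minMs + Math.random() * (maxMs - minMs);
}

// Build the settings for one request: which identity to present
// and how long to sleep beforehand.
function buildRequestPlan(proxies) {
  return {
    userAgent: pickRandom(USER_AGENTS),
    proxy: pickRandom(proxies),
    delayMs: jitterMs(2000, 8000),
  };
}
```

Each plan then gets handed to the actual fetch (a Puppeteer page with the chosen user agent, or an HTTP client pointed at the chosen proxy). Even with all of this in place, the captcha rate stayed around 2 in 3.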