Nice work, thanks for this library. I'm hitting a performance issue: pages take forever to load in Selenium / Edge, though they do load eventually. Here are my stats after nearly an hour of running:
4%|███████▉ | 16/387 [43:24<18:02:09, 175.01s/it]
Line 434 (self.driver.get(url)) is what takes forever: the automated Edge tab just shows a spinning wheel, although the page apparently finishes loading eventually.
Do you think Substack has implemented anti-bot measures? I've tested the network connection, which is very fast, and pages load fine in BeautifulSoup. Apologies if someone has raised this already.
Hey @reidben - are you running a MacBook with Apple silicon? It seems like this might be a slowdown caused by using an Intel/x64 version of Edge.
I've had a bit of success switching it out for the Chrome (arm64) driver in substack_scraper.py. To do this, you need Chrome installed and the corresponding chromedriver binary in /usr/local/bin (download from: https://developer.chrome.com/docs/chromedriver/downloads).
Just in case it helps - I only started using this project yesterday and am not familiar enough to offer this up as a proper solution!
-        options = EdgeOptions()
-        if headless:
-            options.add_argument("--headless")
-        if edge_path:
-            options.binary_location = edge_path
-        if user_agent:
-            options.add_argument(f'user-agent={user_agent}')  # Pass this if running headless and blocked by captcha
-
-        if edge_driver_path:
-            service = Service(executable_path=edge_driver_path)
-        else:
-            service = Service(EdgeChromiumDriverManager().install())
+        # options = EdgeOptions()
+        # if headless:
+        #     options.add_argument("--headless")
+        # if edge_path:
+        #     options.binary_location = edge_path
+        # if user_agent:
+        #     options.add_argument(f'user-agent={user_agent}')  # Pass this if running headless and blocked by captcha
+        #
+        # if edge_driver_path:
+        #
+        # if edge_driver_path:
+        #     service = Service(executable_path=edge_driver_path)
+        # else:
+        #     service = Service(EdgeChromiumDriverManager().install())
+        #
+        # self.driver = webdriver.Edge(service=service, options=options)
+
+        options = webdriver.ChromeOptions()
+        self.driver = webdriver.Chrome(options=options)
-        self.driver = webdriver.Edge(service=service, options=options)
         self.login()
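If it's useful, here is a rough sketch of how the same switch could be made conditional instead of commenting the Edge code out. It is not part of the project: `pick_browser` and `build_chrome_driver` are hypothetical helper names, and the check assumes Python itself is running natively (under Rosetta, `platform.machine()` reports `x86_64`, so the check would fall back to Edge).

```python
import platform


def pick_browser(system: str, machine: str) -> str:
    """Return "chrome" on Apple silicon macOS, otherwise "edge" (hypothetical helper)."""
    if system == "Darwin" and machine == "arm64":
        return "chrome"
    return "edge"


def build_chrome_driver(chromedriver_path=None, headless=False):
    """Create a Chrome WebDriver, optionally from an explicit chromedriver path (hypothetical helper)."""
    # Imported lazily so pick_browser() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    if chromedriver_path:
        # e.g. a chromedriver binary placed in /usr/local/bin, as described above
        service = Service(executable_path=chromedriver_path)
        return webdriver.Chrome(service=service, options=options)
    # With no explicit path, Selenium looks for a suitable driver itself.
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Show which browser this machine would be routed to.
    print(pick_browser(platform.system(), platform.machine()))
```

In the scraper's constructor, the idea would be to call `build_chrome_driver()` only when `pick_browser(...)` returns `"chrome"` and keep the existing Edge setup otherwise, so Intel and Windows users are unaffected.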