Page.goto: net::ERR_INVALID_ARGUMENT #327
I'm sorry, I cannot reproduce:

```python
# test.py
import scrapy


class TestSpider(scrapy.Spider):
    name = "test"
    custom_settings = {
        # scrapy-playwright requires the asyncio-based Twisted reactor
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        },
        "PLAYWRIGHT_LAUNCH_OPTIONS": {
            "headless": False,
        },
        "USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
        "LOG_LEVEL": "INFO",
    }

    def start_requests(self):
        yield scrapy.Request(
            url="URL",  # placeholder for the target page
            meta={
                "playwright": True,
                # expose the Playwright page object to the callback
                "playwright_include_page": True,
            },
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        await page.screenshot(path="croma.png")
        # close the page to free browser resources
        await page.close()
        print("Response parsing")
        print(response.xpath("//h1/text()").get())
```
Note that I had to set a custom User-Agent, otherwise I was getting 403 status responses. Versions used:
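To check whether the 403 responses depend on the User-Agent, on headless mode, or on both, it can help to build the spider settings from one place and flip each factor independently. A minimal sketch, assuming the same scrapy-playwright settings as the spider above; `build_custom_settings` and `PLAYWRIGHT_HANDLER` are hypothetical names introduced for illustration only.

```python
from typing import Any, Dict, Optional

# Hypothetical constant: the download handler path used in the spider above.
PLAYWRIGHT_HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"


def build_custom_settings(
    headless: bool = False, user_agent: Optional[str] = None
) -> Dict[str, Any]:
    """Return scrapy-playwright custom_settings with a switchable headless flag.

    Hypothetical helper: it only assembles the dict, it does not talk to Scrapy.
    """
    settings: Dict[str, Any] = {
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        # Route both schemes through the Playwright download handler.
        "DOWNLOAD_HANDLERS": {"https": PLAYWRIGHT_HANDLER, "http": PLAYWRIGHT_HANDLER},
        # Passed as keyword arguments when the browser is launched.
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": headless},
        "LOG_LEVEL": "INFO",
    }
    if user_agent is not None:
        # Only override Scrapy's default User-Agent when one is supplied.
        settings["USER_AGENT"] = user_agent
    return settings
```

In the spider, `custom_settings = build_custom_settings(headless=True, user_agent="Mozilla/5.0 ...")` would then reproduce the configuration above with headless mode switched on, so each combination can be tried by changing a single argument.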
@elacuesta Thanks for the quick reply. Yes, it works when I run the project separately, but there is one more issue when I make
the prints
I see. I suppose the site could be detecting and blocking headless browsers; I'm seeing the same behavior with standalone Playwright:

```python
import asyncio

from playwright.async_api import async_playwright

URL = "URL"  # placeholder for the page being tested


async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(URL)
        await page.screenshot(path="page.png")
        # a selector starting with "//" is treated as XPath
        print(await page.locator("//h1").text_content())
        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

prints
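To test the headless-detection hypothesis directly, the standalone check can be run once headless and once headed and the two results compared. A minimal sketch, assuming Playwright for Python with Chromium installed; `fetch_h1` and `compare_modes` are hypothetical helper names, and a missing `<h1>` is only a hint of blocking, not proof of it.

```python
from typing import Dict, Optional


async def fetch_h1(url: str, headless: bool) -> Optional[str]:
    """Open `url` in Chromium and return the first <h1> text, or None if absent."""
    # Imported lazily so this module can be loaded without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=headless)
        try:
            page = await browser.new_page()
            await page.goto(url)
            h1 = page.locator("h1").first
            # count() is 0 when the page has no <h1>, e.g. a block or challenge page.
            if await h1.count() == 0:
                return None
            return await h1.text_content()
        finally:
            await browser.close()


async def compare_modes(url: str) -> Dict[bool, Optional[str]]:
    """Map headless=True/False to the <h1> text seen in each mode (hypothetical helper)."""
    return {headless: await fetch_h1(url, headless) for headless in (True, False)}
```

Run it with `asyncio.run(compare_modes(URL))`, where `URL` is the page under test. If the headed run finds the heading while the headless run returns `None`, headless detection is a likely explanation.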
I am getting the following error with my basic Scrapy + Playwright spider:
I am following this guide: https://scrapeops.io/python-scrapy-playbook/scrapy-playwright/
Why am I getting this error?
(edited to adjust formatting)