Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/browser (BrowserCrawler)
Issue description
I've been sporadically getting the error message "requestHandler timed out after 130 seconds" on some crawls.
It turns out that my requestHandler is not timing out at all. In fact, my requestHandler code is not even being called.
I'm using PlaywrightCrawler, but the issue is that BrowserCrawler's `_runRequestHandler` is delegated to BasicCrawler's `_runTaskFunction`. This function adds a timeout to the `_runRequestHandler` call, which would normally be the user's requestHandler, but in the case of PlaywrightCrawler, it's the wrapped BrowserCrawler's `_runRequestHandler`.
So in my case, something is periodically going wrong with the creation of the page on the browser pool, or possibly with the navigation or cookies, but I had no way of knowing that from the logs. The requestHandler timeout should be reserved for user-level code and should not also cover issues with the browser pool, as that is rather confusing.
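To illustrate the misattribution described above, here is a minimal, self-contained sketch. It does not use Crawlee; `runTaskWithSingleTimeout`, `wrapUserHandler`, and `openPage` are hypothetical stand-ins I made up to model the behaviour, not the library's actual internals.

```typescript
// Simplified model of the reported behaviour. All names here are
// illustrative stand-ins, not Crawlee's actual internals.
type Handler = () => Promise<void>;

// One timeout around the whole task, worded as if only user code ran inside it.
async function runTaskWithSingleTimeout(task: Handler, timeoutMs: number): Promise<void> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`requestHandler timed out after ${timeoutMs / 1000} seconds`)),
            timeoutMs,
        );
    });
    try {
        await Promise.race([task(), timeout]);
    } finally {
        clearTimeout(timer); // stop the timer once either side has settled
    }
}

// The wrapped task: browser-side setup first, then the user's handler.
function wrapUserHandler(openPage: () => Promise<void>, userHandler: Handler): Handler {
    return async () => {
        await openPage(); // if this hangs, user code is never reached
        await userHandler();
    };
}
```

If `openPage` never resolves, the rejection still says "requestHandler timed out", even though the user's handler was never invoked.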
As a separate issue, periodically (and seemingly at random) one of the calls in BrowserCrawler's `_runRequestHandler` that runs before the user's requestHandler hangs indefinitely. I suspect it might be the call to open the Playwright page, but I haven't been able to verify that. I think it needs a separate timeout covering the creation of the page and the subsequent page loading activities. This could be a much shorter timeout, as I expect these calls are usually relatively fast. And I think that if one of these timeouts occurs, it needs to reset the browser, abandon the session, or similar before retrying the request.
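A rough sketch of what I have in mind, under stated assumptions: this is plain TypeScript, not a patch against Crawlee. `withPhaseTimeout`, `runWithSeparateTimeouts`, `RequestSteps`, and the `resetSession` hook are hypothetical names, and the real fix would need to hook into the browser pool and session pool in whatever way fits the library.

```typescript
// Hypothetical sketch: each phase gets its own timeout and its own label.
interface PhaseTimeout extends Error {
    phase: string;
}

function isPhaseTimeout(error: unknown, phase: string): boolean {
    return error instanceof Error && (error as Partial<PhaseTimeout>).phase === phase;
}

async function withPhaseTimeout<T>(phase: string, timeoutMs: number, work: () => Promise<T>): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => {
            const error = new Error(`${phase} timed out after ${timeoutMs} ms`);
            reject(Object.assign(error, { phase }));
        }, timeoutMs);
    });
    try {
        return await Promise.race([work(), timeout]);
    } finally {
        clearTimeout(timer);
    }
}

interface RequestSteps<Page> {
    openPage: () => Promise<Page>;               // page creation and loading
    userHandler: (page: Page) => Promise<void>;  // the user's requestHandler
    resetSession: () => Promise<void>;           // assumed hook: reset browser / abandon session
}

async function runWithSeparateTimeouts<Page>(
    steps: RequestSteps<Page>,
    limits: { pageSetupMs: number; handlerMs: number; maxAttempts: number },
): Promise<void> {
    for (let attempt = 1; ; attempt++) {
        try {
            // Short timeout for setup, separate (longer) timeout for user code.
            const page = await withPhaseTimeout("page setup", limits.pageSetupMs, steps.openPage);
            await withPhaseTimeout("requestHandler", limits.handlerMs, () => steps.userHandler(page));
            return;
        } catch (error) {
            // Only a setup timeout triggers a reset and retry; everything else propagates.
            if (!isPhaseTimeout(error, "page setup") || attempt >= limits.maxAttempts) throw error;
            await steps.resetSession();
        }
    }
}
```

With this shape, a hang while opening the page surfaces as "page setup timed out" rather than as a requestHandler timeout, and the reset runs before the next attempt.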
Code sample
No response
Package version
3.12.1
Node.js version
20
Operating system
No response
Apify platform
Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response