Does not shutdown cleanly on SIGINT (cmd+c) #62
Comments
So to answer my own question, I have solved it. This is a problem upstream with Playwright: currently there isn't a way to prevent a SIGINT from being passed down to Playwright. The other thing I have done is monkey patch Playwright to stop passing down the SIGINT.
This is inspired by https://stackoverflow.com/a/5446982/18244376 and only works on POSIX. I'm going to file a bug with Playwright asking for an API to enable this.
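The monkey patch itself is not reproduced in this thread. As a rough illustration of the general POSIX idea only (not the actual patch, and not touching Playwright at all), a child process started in its own session is no longer in the terminal's foreground process group, so a `cmd+c` typed in the terminal is not delivered to it. The helper name `spawn_detached` is hypothetical.

```python
import os
import subprocess
import sys


def spawn_detached(cmd):
    """Start cmd in a new session (POSIX only).

    start_new_session=True makes the child call setsid(), so it leaves the
    parent's process group and does not receive the terminal's SIGINT.
    """
    return subprocess.Popen(cmd, start_new_session=True)


if __name__ == "__main__":
    # Hypothetical stand-in for a long-running child such as a browser driver.
    child = spawn_detached([sys.executable, "-c", "import time; time.sleep(30)"])
    print("parent pgid:", os.getpgrp(), "child pgid:", os.getpgid(child.pid))
    # The parent now decides when the child stops, e.g. after a graceful shutdown.
    child.terminate()
    child.wait()
```

With this arrangement the parent is responsible for stopping the child explicitly, which is what allows it to finish its own shutdown first.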
Thank you for the research and for opening the upstream feature request. Let's wait and see what they suggest.
It seems that when using scrapy-playwright, Scrapy does not shut down cleanly on SIGINT (`cmd+c`), and you have to force a shutdown with a second `cmd+c`. If you use the telnet client to run `engine.stop()`, it does seem to shut down cleanly. A clean shutdown is needed in order to save the current state for resuming; an unclean shutdown does not save the current state and the crawl cannot be resumed. On SIGINT, it stops scraping but continues to log its stats every minute.

On some investigation, Playwright has a launch arg `handle_sigint`, which I believe indicates whether to forward a SIGINT to the browser process. I thought it may have been this forwarding of the SIGINT that was causing the hang in the handling of shutdown. When I set this to `false` (it defaults to `true`) with `PLAYWRIGHT_LAUNCH_OPTIONS`, it still doesn't shut down cleanly: all Chromium processes are stopped, but Scrapy does not shut down and there are no errors, even during a forced shutdown. It just continues to report its stats every minute, showing 0 more pages scraped.

I'm happy to continue investigating how to fix this but would appreciate any pointers in the right direction as to where the issue may be arising.
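For reference, the setting described above would look roughly like this in a Scrapy project's settings module. This is a sketch that assumes `PLAYWRIGHT_LAUNCH_OPTIONS` is handed to Playwright's browser launch call as keyword arguments; as noted, it did not resolve the hang on its own.

```python
# settings.py (Scrapy project settings)

# Assumption: scrapy-playwright passes this dict as keyword arguments
# to Playwright's browser launch call.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "handle_sigint": False,  # Playwright's default is True
}
```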
Edit:
Just want to add that I am confident that what is happening is that the SIGINT is still passed on to the browser. With `engine.stop()` via telnet I see a very graceful shutdown (up to a couple of minutes long), with the pages currently being processed in my `parse` callbacks all finishing. SIGINT/cmd+c just immediately kills the browser processes.