Accelerate urls checking using concurrency (asyncio + aiohttp) #50
Comments
I think this is a great idea! If there is some reason that we'd want to keep the original implementation, then perhaps instead of doing Also, let's put in this change after the work I'm going to do this weekend to update the url arguments and spelling, just so it's easier to manage the different PRs.
Those plots are beautiful, by the way! If we keep both fast / slow we could include a page in the docs that shows this difference.
Oh yes that's another thing to consider - if we do retry, we can't have it running in parallel (it needs to honor the timeout / delay).
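To make the retry concern concrete, here is a minimal sketch of one way a retry delay might coexist with concurrent checks: each URL gets its own task, and only that task sleeps between attempts. This is an assumption for discussion, not the project's implementation. The names `check_with_retry`, `check_all`, and the `check` coroutine (expected to return True when a URL is reachable) are hypothetical.

```python
import asyncio


async def check_with_retry(check, url, retries=2, delay=0.5):
    """Await the hypothetical coroutine `check(url)` until it returns True."""
    for attempt in range(retries + 1):
        if await check(url):
            return True
        if attempt < retries:
            # Only this URL's task sleeps; tasks for other URLs keep running.
            await asyncio.sleep(delay)
    return False


async def check_all(check, urls, retries=2, delay=0.5):
    """Run one retrying task per URL and collect the results by URL."""
    results = await asyncio.gather(
        *(check_with_retry(check, url, retries, delay) for url in urls)
    )
    return dict(zip(urls, results))
```

Whether this honors the delay in the way the current retry logic intends would still need to be checked against the existing behavior.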
Okay @SuperKogito take it away! I won't do any more work / changes until you have had time to work on this. And no rush! I just started a new job so I'm a bit under 💧 😆
I will try to do my best 😜 Congratulations on the new job 🎉
URLs are currently checked in a loop that sends each request and waits for its response before moving on to the next one, which becomes slow for large websites.
Alternatively, we can use concurrency to process requests and responses asynchronously and speed up the checks.
I have already integrated this concept in my local repo using the asyncio and aiohttp libraries, and the results look promising. Various blog posts (Python and fast HTTP clients, HTTP in Python: aiohttp vs. Requests, Making 1 million requests with python-aiohttp) report a notable speed difference, and so far my tests confirm that.
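For readers who have not used these libraries, a minimal sketch of the general pattern might look like the following. It is not the code from my local repo; the function names (`check_url`, `check_many`, `check_site`) and the `max_concurrent` and `timeout` defaults are assumptions made for illustration. A semaphore caps the number of requests in flight, and any failure is recorded as `None`.

```python
import asyncio

try:
    import aiohttp  # third-party dependency: pip install aiohttp
except ImportError:
    aiohttp = None


async def check_url(session, url, limiter):
    """Return (url, HTTP status), or (url, None) if the request failed."""
    async with limiter:  # cap the number of requests in flight
        try:
            async with session.get(url) as response:
                return url, response.status
        except Exception:  # connection errors and timeouts count as failures
            return url, None


async def check_many(session, urls, max_concurrent=10):
    """Check all URLs concurrently and map each URL to its status."""
    limiter = asyncio.Semaphore(max_concurrent)
    pairs = await asyncio.gather(
        *(check_url(session, url, limiter) for url in urls)
    )
    return dict(pairs)


async def check_site(urls, timeout=5):
    """Open one shared aiohttp session and check every URL with it."""
    client_timeout = aiohttp.ClientTimeout(total=timeout)
    async with aiohttp.ClientSession(timeout=client_timeout) as session:
        return await check_many(session, urls)
```

It could be run with something like `asyncio.run(check_site(["https://example.com"]))`. Sharing a single session across all requests lets aiohttp reuse connections, which is part of where the speedup comes from.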
The new libraries are slightly different from requests and so the following is true:
I managed to replicate almost all of the features we have in the current version, but I will definitely need your feedback. Anyway, these differences bring me to my major question, @vsoch: do you think that we should add this feature as an option
--accelerated-run
or replace the current implementation with it?
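If we go with the opt-in route, the flag could be wired up roughly as sketched below. This is only an illustration using argparse; the parser name, the `urls` positional argument, and the `pick_checker` helper are hypothetical and do not reflect the project's actual command-line interface.

```python
import argparse


def build_parser():
    """Build a toy parser exposing the proposed --accelerated-run flag."""
    parser = argparse.ArgumentParser(prog="urlchecker-sketch")
    parser.add_argument("urls", nargs="*", help="URLs to check (hypothetical)")
    parser.add_argument(
        "--accelerated-run",
        action="store_true",
        help="check URLs concurrently instead of one at a time",
    )
    return parser


def pick_checker(args):
    """Name the implementation to use; sequential stays the default."""
    return "concurrent" if args.accelerated_run else "sequential"
```

With this layout the existing behavior is unchanged unless the flag is passed, which would also make it easy to document the fast and slow modes side by side.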