crawler
is a web crawler for the 20th Birdhouse project. It scans
the Web (or, in theory, any protocol supported by
reqwest) for URLs and parses HTML
with Servo's html5ever.
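To illustrate the general idea of scanning pages for URLs, here is a minimal sketch that uses only the Rust standard library. It is not this project's actual code. The function names extract_links and crawl are hypothetical, the in-memory map of pages stands in for fetching with reqwest, and the naive href scanner stands in for parsing with html5ever.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Naive stand-in for an HTML parser (assumption: the real crawler uses
/// html5ever). Collects the values of double-quoted `href="..."` attributes.
fn extract_links(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("href=\"") {
        // Skip past the 6 bytes of `href="` to the start of the value.
        rest = &rest[start + 6..];
        match rest.find('"') {
            Some(end) => {
                links.push(rest[..end].to_string());
                rest = &rest[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    links
}

/// Breadth-first crawl over an in-memory "web" (URL -> HTML body), which
/// stands in for real network requests. Returns every URL discovered,
/// in the order it was first seen.
fn crawl(start: &str, pages: &HashMap<&str, &str>) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut found = Vec::new();

    seen.insert(start.to_string());
    queue.push_back(start.to_string());
    found.push(start.to_string());

    while let Some(url) = queue.pop_front() {
        // A missing entry models a failed or unsupported fetch.
        let Some(body) = pages.get(url.as_str()) else { continue };
        for link in extract_links(body) {
            // Only record and enqueue URLs that have not been seen before.
            if seen.insert(link.clone()) {
                found.push(link.clone());
                queue.push_back(link);
            }
        }
    }
    found
}
```

Calling crawl with a start URL and a map of pages returns the list of discovered URLs. The set of seen URLs keeps the loop from revisiting pages that link back to each other.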
Run with cargo. Setting RUST_LOG=crawler=info is recommended so that the
crawler reports its status as it crawls. Pass the URL to start from as an
argument, and redirect stdout to a file:
RUST_LOG=crawler=info cargo run https://github.com >urls