This project is a parser for a novel website, built to make searching for and retrieving novels easier. The website's native tools are insufficient for efficient searching, so this parser offers an improved solution. It supports both synchronous and asynchronous parsing; the async parser is approximately 8 times faster.
- Synchronous Parsing: Basic parsing of the novel website.
- Asynchronous Parsing: Enhanced performance with async parsing, approximately 8 times faster than synchronous.
- Data Storage: Extracted data is saved in data/novels_data.json.
- Error Handling: Robust error handling for fetching and parsing novel data.
- Logging: Comprehensive logging for monitoring and debugging.
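The storage, error handling, and logging features listed above can be illustrated together. The following is a minimal sketch using only the standard library, not the project's actual code: the function name `save_novels`, the logger name, and the record fields (`title`, `chapters`) are assumptions made for illustration. Only the output path `data/novels_data.json` comes from this README.

```python
import json
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("novel_parser")  # hypothetical logger name


def save_novels(novels, path="data/novels_data.json"):
    """Write a list of novel records to a JSON file; return True on success."""
    target = Path(path)
    try:
        # Serialize first, so a bad record cannot leave a half-written file.
        # ensure_ascii=False keeps non-Latin titles readable in the output.
        text = json.dumps(novels, ensure_ascii=False, indent=2)
        # Create the parent directory (e.g. data/) if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
    except (OSError, TypeError) as exc:
        # OSError: file system problem; TypeError: record is not JSON-serializable.
        logger.error("Could not save %s: %s", target, exc)
        return False
    logger.info("Saved %d novels to %s", len(novels), target)
    return True
```

A call such as `save_novels([{"title": "Example", "chapters": 3}])` would write the file and log one info line. Returning a boolean instead of raising is one possible design choice that lets a long parsing run continue after a failed save.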
This project uses the following libraries:
- requests
- beautifulsoup4
- lxml
- aiohttp
- Clone the repository:
git clone git@github.com:RomaP13/animestuff-parser.git
cd animestuff-parser/
- Install dependencies using pipenv:
pipenv install
To run the synchronous parser:
make parse
To run the asynchronous parser:
make async_parse
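For intuition on why the asynchronous run tends to be faster, the sketch below compares sequential and concurrent waiting. It is an illustration under stated assumptions, not the project's parser: `time.sleep` and `asyncio.sleep` stand in for real network requests (no `requests` or `aiohttp` calls are made), and the `DELAY` and `PAGES` values are made up. The ratio it prints depends on those values and is not a reproduction of the 8x figure quoted above.

```python
import asyncio
import time

DELAY = 0.05  # simulated network latency per page in seconds (made-up value)
PAGES = 8     # number of pages to "fetch" (made-up value)


def fetch_sync(page):
    # Blocks the whole program while "waiting" for the response.
    time.sleep(DELAY)
    return f"page {page}"


async def fetch_async(page):
    # Yields control while "waiting", so other fetches can make progress.
    await asyncio.sleep(DELAY)
    return f"page {page}"


def run_sync():
    # One page after another: total time grows with the number of pages.
    return [fetch_sync(p) for p in range(PAGES)]


async def run_async():
    # gather() runs all fetches concurrently and preserves input order.
    return await asyncio.gather(*(fetch_async(p) for p in range(PAGES)))


def timed(fn):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


sync_result, sync_time = timed(run_sync)
async_result, async_time = timed(lambda: asyncio.run(run_async()))
print(f"sync: {sync_time:.2f}s, async: {async_time:.2f}s")
```

Both runs return the same pages in the same order; only the waiting overlaps. In a real parser the gain also depends on server response times and on any limit placed on concurrent requests.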