Scrapy dappy doo crawler for proxy sites

  • Crawls proxy sites for working proxies
  • Scrapyd server to initiate crawl and get results
  • Retain jobs and logs for recent crawls


# Copy the example environment file to .env
cp .env.example .env

# Build the docker image and run the container
docker-compose up --build --detach

# Run a Scrapy crawl job via the CLI
# docker-compose exec -it scrapyd scrapy crawl <spider_name>
docker-compose exec -it scrapyd scrapy crawl freeproxylist

# Run a Scrapy crawl job via the Scrapyd API
# Scrapyd documentation:
curl http://localhost:6800/schedule.json -d project=scrapydoo -d spider=freeproxylist

Scrapyd API is now available at http://localhost:6800.
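
For scripted use, the same schedule.json call can be made from Python. The following is a minimal sketch using only the standard library, not part of this project: the helper names build_schedule_request and schedule_spider are hypothetical, and the host, project, and spider names are taken from the curl example above. It assumes the container is running and that Scrapyd answers with its usual JSON reply containing "status" and "jobid" fields.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the README; change it if Scrapyd is published on another port.
SCRAPYD_URL = "http://localhost:6800"


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build the POST request equivalent to the curl call (hypothetical helper)."""
    # Form-encode the fields, matching curl's -d project=... -d spider=...
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


def schedule_spider(project, spider, base_url=SCRAPYD_URL):
    """Send the request and return Scrapyd's decoded JSON reply."""
    request = build_schedule_request(project, spider, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage (requires the running container):
#   reply = schedule_spider("scrapydoo", "freeproxylist")
#   print(reply.get("status"), reply.get("jobid"))
```

Building the request separately from sending it keeps the URL and form body easy to inspect without a live server.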


  • root: / - Scrapyd server
  • jobs: /jobs - crawl jobs
  • items: /items - scraped items
  • logs: /logs - spider logs


API endpoints provided by the Scrapyd server:

  • daemonstatus: /daemonstatus.json - to check the load status of a service
  • addversion: /addversion.json - to add a new version of a project
  • schedule: /schedule.json - to schedule a spider run
  • cancel: /cancel.json - to cancel a spider run
  • listprojects: /listprojects.json - to list all projects
  • listversions: /listversions.json - to list all versions of a project
  • listspiders: /listspiders.json - to list all spiders of a project
  • listjobs: /listjobs.json - to list all pending, running and finished jobs
  • delversion: /delversion.json - to delete a version of a project
  • delproject: /delproject.json - to delete a project
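
The JSON endpoints listed above can also be queried from a script, for example to check how many jobs are pending, running, or finished. This is a rough sketch under stated assumptions rather than code from this repository: endpoint_url, summarize_jobs, and fetch_json are hypothetical helper names, the base URL is the one given earlier, and the reply shape (lists under "pending", "running", and "finished") follows the listjobs.json description above.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the README; adjust if Scrapyd is exposed elsewhere.
SCRAPYD_URL = "http://localhost:6800"


def endpoint_url(name, base_url=SCRAPYD_URL, **params):
    """Return the URL of a JSON endpoint such as 'listjobs' (hypothetical helper)."""
    url = f"{base_url}/{name}.json"
    if params:
        # Append query parameters, e.g. ?project=scrapydoo
        url += "?" + urllib.parse.urlencode(params)
    return url


def summarize_jobs(listjobs_reply):
    """Count the jobs in each state of a listjobs.json reply."""
    states = ("pending", "running", "finished")
    return {state: len(listjobs_reply.get(state, [])) for state in states}


def fetch_json(url):
    """GET a URL and decode the JSON body (needs a running Scrapyd server)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# Usage (requires the running container):
#   jobs = fetch_json(endpoint_url("listjobs", project="scrapydoo"))
#   print(summarize_jobs(jobs))
```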


# Poetry is required for installing and managing dependencies
poetry install

# Run the crawlers
# poetry run scrapy crawl <spider_name>
poetry run scrapy crawl freeproxylist

# Install pre-commit hooks
poetry run pre-commit install

# Formatting (formats code in place)
poetry run black .

# Linting (the second command also applies automatic fixes)
poetry run ruff .
poetry run ruff --fix .

# Type checking
poetry run mypy .

Configuration details can be found in pyproject.toml.

