Skip to content

Aniruddhsinh03/proxy_rotation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Proxy management for web scraping

This project is a Scrapy-based web scraper designed to scrape quotes from quotes.toscrape.com with proxy rotation support using the scrapy-rotating-proxies middleware. The proxy rotation helps in avoiding IP bans and throttling by the target website.

Getting Started

Prerequisites

Make sure you have Python 3.x installed along with Scrapy. To install Scrapy, use:

pip install scrapy

Additionally, install the scrapy-rotating-proxies middleware:

pip install scrapy-rotating-proxies

Project Structure

proxy_rotation/
├── proxy_rotation/
│   ├── __init__.py
│   ├── middlewares.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders/
│       ├── __init__.py
│       └── quotes_proxy_rotation_spider.py
├── scrapy.cfg
└── README.md

Installing

Clone the repository and navigate to the project directory:

git clone https://github.com/yourusername/proxy_rotation.git
cd proxy_rotation

Configuration

  1. Proxies Setup:

    Update the list of proxies in the settings.py file:

    ROTATING_PROXY_LIST = [
     # if proxy need authentication then list will be in below format
     'http://username:password@proxy1.com:8000',
     # list without authentication
     # 'ip:port'
     'proxy1.com:8000'
     # Add more proxies as needed
    ]
    • If your proxies require authentication, use the format http://username:password@proxy:port.
    • If your proxies do not require authentication, use the format ip:port.
  2. Spider Setup:

    The spider is located in spiders/quotes_proxy_rotation_spider.py. It scrapes quotes and author details from the website.

  3. Run the Spider:

    Run the spider using the following command:

    scrapy crawl quotes_proxy_rotation_spider

Features

  • Proxy Rotation: Uses multiple proxies to avoid IP bans and throttling by rotating through a list of proxies.

  • Robust Scraping: Handles pagination and extracts detailed information about quotes and authors.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages