IMDb Scraper

This project scrapes the IMDb Top 250 movies list, extracts details about each movie, and exports the data to JSON, Excel, XML, CSV, and a SQLite database. It can also be scheduled to run daily so the exported data stays up to date.

Overview


The IMDb scraper is implemented in Python and utilizes the following libraries:

selenium for web scraping
BeautifulSoup for HTML parsing
pandas for data manipulation and exporting to Excel
xml.etree.ElementTree for exporting data to XML
sqlite3 for SQLite database operations

For each movie in the IMDb Top 250 list, the scraper extracts details including title, year, length, image URL, rating, votes, movie URL, story, genres, directors, writers, stars, and popularity.
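As a rough illustration of how one scraped record could be held in memory, the sketch below uses a dataclass. The class name `Movie`, the field names, and the types are assumptions chosen to mirror the list above; the actual scraper may store its data differently.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Movie:
    """One entry of the Top 250 list; names and types are illustrative."""
    title: str
    year: int
    length: str            # runtime kept as text, since the source format is unknown
    image_url: str
    rating: float
    votes: int
    url: str
    story: str = ""
    genres: List[str] = field(default_factory=list)
    directors: List[str] = field(default_factory=list)
    writers: List[str] = field(default_factory=list)
    stars: List[str] = field(default_factory=list)
    popularity: Optional[int] = None  # assumed optional; may be absent on a page


# Placeholder values only, not scraped data.
example = Movie(
    title="Example Movie",
    year=2000,
    length="2h 0m",
    image_url="https://example.com/poster.jpg",
    rating=8.5,
    votes=1000,
    url="https://example.com/title/0000000/",
    genres=["Drama"],
)

# asdict() gives a plain dict, which is convenient for JSON or a DataFrame.
record = asdict(example)
```

Keeping the multi-valued fields (genres, directors, writers, stars) as lists is one option; flat formats such as CSV or SQLite would need them joined into a single string first.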

Dependencies

Before running the IMDb scraper, ensure you have the required dependencies installed. You can install them using pip with the provided requirements.txt file.

Clone this repository to your local machine.
Navigate to the project directory.
Install dependencies using the following command:
```
pip install -r requirements.txt
```
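To confirm the installation worked before a long scraping run, a small check like the following could be used. It is only a sketch: the import names in `REQUIRED_MODULES` are assumed from the libraries listed in the Overview (for example, BeautifulSoup is normally imported as `bs4`), and requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec

# Assumed import names; adjust to match requirements.txt.
REQUIRED_MODULES = ("selenium", "bs4", "pandas", "schedule")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is importable; any names it returns point to packages that still need to be installed.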

Usage

From the project directory, with the dependencies installed, run the scraper using the following command:
```
python scraper.py
```

The scraper extracts data from the IMDb Top 250 list, processes it, and writes JSON, Excel, XML, CSV, and SQLite database files to the output_data folder.
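The export step can be sketched with the standard library alone. The function name `export_all`, the file names (`movies.json`, `movies.csv`, `movies.xml`, `movies.db`), and the table name `movies` are all hypothetical; the real scraper may name and structure its output differently. The sketch assumes each movie is a flat dict whose keys are valid XML tag names and whose values are strings or numbers.

```python
import csv
import json
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path


def export_all(movies, out_dir="output_data"):
    """Write a non-empty list of flat dicts to JSON, CSV, XML and SQLite files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(movies[0].keys())

    # JSON: the whole list as one document.
    with open(out / "movies.json", "w", encoding="utf-8") as f:
        json.dump(movies, f, ensure_ascii=False, indent=2)

    # CSV: a header row, then one row per movie.
    with open(out / "movies.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(movies)

    # XML: <movies><movie><title>...</title>...</movie></movies>
    root = ET.Element("movies")
    for movie in movies:
        node = ET.SubElement(root, "movie")
        for key, value in movie.items():
            ET.SubElement(node, key).text = str(value)
    ET.ElementTree(root).write(
        str(out / "movies.xml"), encoding="utf-8", xml_declaration=True
    )

    # SQLite: recreate the table on each run so reruns do not duplicate rows.
    conn = sqlite3.connect(str(out / "movies.db"))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS movies")
            column_sql = ", ".join('"{}"'.format(c) for c in columns)
            conn.execute("CREATE TABLE movies ({})".format(column_sql))
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(
                "INSERT INTO movies VALUES ({})".format(placeholders),
                [tuple(m[c] for c in columns) for m in movies],
            )
    finally:
        conn.close()
    return out
```

Excel is left out because it needs a third-party writer. Since the project lists pandas for this, something like `pandas.DataFrame(movies).to_excel(path, index=False)` would be the likely route, provided an Excel engine such as openpyxl is installed.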

Automating the Scraper

The scraper can be automated to run daily at a specified time using the schedule library. The automate.py script schedules the scraper to run every day at 6:30 AM. Adjust the schedule timing in the script if needed.

To run the automation script:

python automate.py
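The contents of automate.py are not shown here, so the following is only a standard-library sketch of the "every day at 6:30 AM" rule it is described as implementing, not the project's actual code. The function names `next_run` and `seconds_until_next_run` are hypothetical.

```python
from datetime import datetime, time, timedelta


def next_run(now, at=time(6, 30)):
    """Return the next datetime at which a daily job due at `at` should fire."""
    candidate = datetime.combine(now.date(), at)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def seconds_until_next_run(now, at=time(6, 30)):
    """Number of seconds to sleep before the next scheduled run."""
    return (next_run(now, at) - now).total_seconds()


# With the schedule library, the same rule is typically registered as
#     schedule.every().day.at("06:30").do(job)
# followed by a loop that calls schedule.run_pending() and sleeps briefly.
```

Changing the run time then amounts to passing a different `time(...)` value here, or a different "HH:MM" string to `at()` when using schedule.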
