
Releases: DEENUU1/tvtime-scraper

v0.1-22.03.2024

22 Mar 14:49

TV Time Scraper

Script for gathering detailed information about movies and shows from the tvtime.com website


Features

  1. Comprehensive Movie/Show Collection: Users can collect movies/shows from all available categories or only from selected categories.

  2. Automated Cookie Modal Handling: The application automatically closes cookie consent pop-ups so that they do not interrupt scraping.

  3. Detailed Movie/Show Information: Users can retrieve detailed information about each movie or show, including ratings, descriptions, cast members, and keywords.

  4. Data Export: The collected data can be exported to JSON files with pagination support, which keeps the output organized for storage and further analysis.

  5. Dynamic Page Scrolling: A dedicated function automatically scrolls through webpages and executes an associated callback function, so data retrieval continues as the page is scrolled.
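
The scrolling behaviour described in item 5 can be sketched roughly as below. This is only an illustration, not the project's actual function: the name `scroll_page` and the parameters `callback`, `pause` and `max_scrolls` are assumptions. The only Selenium call it relies on is `execute_script`, so any object exposing that method can stand in for the driver.

```python
import time


def scroll_page(driver, callback=None, pause=1.0, max_scrolls=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver (hypothetical usage; any
    object with `execute_script` works). `callback`, if given, is called with
    the driver after every scroll step. Returns the number of scroll steps.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give lazily loaded content time to appear
        if callback is not None:
            callback(driver)  # e.g. collect the items that are now visible
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the bottom was reached
        last_height = new_height
    return scrolls
```

With a real browser this might be called as `scroll_page(driver, callback=collect_items)`, where `collect_items` is a hypothetical function that reads the currently loaded entries. The `max_scrolls` cap is a safeguard against pages that never stop loading.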

Commands & Examples

Scrape all movies/shows from each genre

python main.py list-scraper

Scrape all movies/shows from the given genre

python main.py list-scraper-url <url_here>

For example:

python main.py list-scraper-url https://www.tvtime.com/pl/genres/action
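
Before passing a URL to list-scraper-url, it can help to check that it really points at a genre page. The helper below is a minimal sketch using only the standard library. The function name `genre_from_url` is hypothetical and not part of the project, and the expected path shape (`/genres/<slug>`, optionally preceded by a locale such as `pl`) is inferred from the single example URL above.

```python
from urllib.parse import urlparse


def genre_from_url(url):
    """Return the genre slug from a tvtime.com genre URL, or None if absent."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    if parsed.hostname not in ("tvtime.com", "www.tvtime.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed shape: ".../genres/<slug>", e.g. "/pl/genres/action".
    if "genres" in parts:
        idx = parts.index("genres")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return None
```

For the example URL above this returns `action`. For a show URL such as https://www.tvtime.com/show/94121 it returns `None`.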

Scrape details for each movie/show in the database

python main.py details-scraper

Export data to JSON

python main.py export-to-json --start_page 1 --page-limit 12

Example output (truncated):

[
    {
        "id": "df22da8c-db97-4c27-86a0-e440b2d67414",
        "title": "Wakfu",
        "genre": "AKCJA",
        "production_year": null,
        "image": "https://www.tvtime.com/_next/image?url=https%3A%2F%2Fartworks.thetvdb.com%2Fbanners%2Fv4%2Fseries%2F94121%2Fposters%2F65d88839b6802_t.jpg&w=750&q=75",
        "hours": null,
        "minutes": null,
        "url": "https://www.tvtime.com/show/94121",
        "type": "Show",
        "details": true,
        "rating": null,
        "description": "Follow Yugo and his friends Amalia, Evangelyne, Tristepin, Ruel and Az as they try to rescue the World of Twelve from destruction.",
        "keywords": "FANTASY,,RODZINNY,,ANIMACJA,,PRZYGODOWY,,AKCJA",
        "actors": [
            {
                "full_name": "G\u00e9rard Surugue",
                "image": "https://www.tvtime.com/_next/image?url=https%3A%2F%2Fartworks.thetvdb.com%2Fbanners%2Fv4%2Factor%2F621366%2Fphoto%2F65324fa2daf1c_t.jpg&w=256&q=75",
                "url": "https://www.tvtime.com/people/621366-gerard-surugue",
                "id": "f16ff459-8ccd-451e-a04b-e3bbb90d6294"
            },
            {
                "full_name": "Thomas Guitard",
                "image": "https://www.tvtime.com/_next/image?url=https%3A%2F%2Fartworks.thetvdb.com%2Fbanners%2Fv4%2Factor%2F7950847%2Fphoto%2F65c62c81bac17_t.jpg&w=256&q=75",
                "url": "https://www.tvtime.com/people/7950847-thomas-guitard",
                "id": "b6813e27-f892-4b24-bb05-023e0538856b"
            }
        ]
    },
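
In the sample record, `keywords` is a single string whose entries are separated by `,,`, and `actors` is a list of objects. A minimal sketch of post-processing one exported record is shown below. The helper name `split_keywords` is an assumption, and the record is a shortened copy of the sample above.

```python
def split_keywords(raw):
    """Split a ',,'-separated keyword string into a list, dropping empties."""
    if not raw:
        return []
    return [k.strip() for k in raw.split(",,") if k.strip()]


# Shortened copy of the sample record shown above.
record = {
    "title": "Wakfu",
    "keywords": "FANTASY,,RODZINNY,,ANIMACJA,,PRZYGODOWY,,AKCJA",
    "actors": [
        {"full_name": "G\u00e9rard Surugue"},
        {"full_name": "Thomas Guitard"},
    ],
}

keywords = split_keywords(record["keywords"])
actor_names = [actor["full_name"] for actor in record["actors"]]
```

Fields such as `rating` or `production_year` can be `null` in the export, so code that consumes these records should handle `None` values.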

By default, start_page is set to 1 and page_limit to 50, so you don't need to pass these options to the export command. For example:

python main.py export-to-json
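
How paginated export could work is sketched below. This is an illustration under stated assumptions, not the project's implementation. It assumes that `page_limit` means the number of records per file and that pages are 1-indexed; the project may interpret these options differently. The function name `export_pages` and the `page_<n>.json` file names are hypothetical.

```python
import json
from pathlib import Path


def export_pages(records, out_dir, start_page=1, page_limit=50):
    """Write `records` to numbered JSON files, `page_limit` records per file.

    Pages are 1-indexed and pages before `start_page` are skipped.
    Returns the list of written file paths.
    """
    if start_page < 1 or page_limit < 1:
        raise ValueError("start_page and page_limit must be >= 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    offset = (start_page - 1) * page_limit  # index of the first record to export
    page = start_page
    while offset < len(records):
        chunk = records[offset:offset + page_limit]
        path = out_dir / f"page_{page}.json"  # hypothetical naming scheme
        path.write_text(json.dumps(chunk, indent=4), encoding="utf-8")
        written.append(path)
        offset += page_limit
        page += 1
    return written
```

Under these assumptions, 120 records exported with `start_page=2` and `page_limit=50` would produce two files, one holding records 51 to 100 and one holding the remaining 20.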

Technologies:

  • Python
    • Selenium
    • Typer
  • SQLite
  • Docker

Installation

Clone repository

git clone https://github.com/DEENUU1/tvtime-scraper.git

Without Docker

Install requirements

pip install -r requirements.txt

Run specified command

python main.py <command_here>

With Docker

Build image

docker build -t scraper .

Run specified command

docker run scraper <command_here>

Authors

License

See LICENSE.txt for more information.