A Python script to scrape popular movie reviews from Letterboxd.com. This tool extracts review data including reviewer names, movie titles, ratings, review content, and like counts, then saves the data to a CSV file and stores it in BigQuery.

fajri-yanti/LetterBox-Popular-Review

Letterboxd Review Scraper

Features

  • Scrapes Letterboxd's popular reviews for the current year
  • Extracts key information:
    • Reviewer username
    • Movie title
    • Review date
    • Rating (including half stars)
    • Review content
    • Number of likes
  • Automatically formats dates for consistency
  • Exports data to CSV format
  • Stores the data in BigQuery
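
As an illustration of the half-star rating feature, here is a minimal sketch of converting a star string to a number. It assumes ratings arrive as text made of "★" and "½" glyphs; the function name `stars_to_rating` is hypothetical and is not taken from the script.

```python
from typing import Optional


def stars_to_rating(stars: str) -> Optional[float]:
    """Convert a star string such as '★★★½' to a number such as 3.5.

    Assumption: the rating is rendered with '★' and '½' glyphs.
    """
    stars = stars.strip()
    if not stars:
        return None  # the review was posted without a rating
    # Each full star counts 1.0 and each half glyph counts 0.5.
    return stars.count("★") + 0.5 * stars.count("½")


print(stars_to_rating("★★★½"))
```

Returning `None` for an empty string keeps unrated reviews distinguishable from very low ratings.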

Prerequisites

Make sure you have Python 3.6+ installed and the following packages:

pip install requests
pip install beautifulsoup4
pip install pandas
pip install numpy
pip install python-dateutil
pip install psycopg2
pip install sqlalchemy

Installation

  1. Clone this repository or download the script:
git clone https://github.com/fajri-yanti/LetterBox-Popular-Review
cd LetterBox-Popular-Review
  2. Install the required packages:
pip install -r requirements.txt

Usage

  1. Basic usage:
python webscrapingletterbox.py

This will:

  • Scrape the most recent popular reviews from Letterboxd
  • Process and clean the data
  • Save the results to popular_reviews.csv

Data Format

The script creates a CSV file with the following columns:

  • Reviewer: Username of the review author
  • Title: Name of the movie
  • Date: Date the review was posted (YYYY-MM-DD format)
  • Rating: Movie rating (0-5 stars, including half stars)
  • Review: Full text of the review
  • Likes: Number of likes the review received
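
The column layout above can be sketched with the standard library. The script itself presumably uses pandas for this step; `csv` is used here only to keep the example dependency-free. The helper `write_reviews` and the sample row are placeholders, not real data from the script.

```python
import csv
import io

# Column order as listed in the Data Format section.
COLUMNS = ["Reviewer", "Title", "Date", "Rating", "Review", "Likes"]


def write_reviews(rows, handle):
    """Write review dictionaries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder row for illustration only.
sample = [{
    "Reviewer": "example_user",
    "Title": "Example Movie",
    "Date": "2025-01-19",
    "Rating": 3.5,
    "Review": "Placeholder review text, with a comma.",
    "Likes": 120,
}]

buffer = io.StringIO()
write_reviews(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("popular_reviews.csv", "w", newline="", encoding="utf-8")` as `handle`.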

Code Structure

# Main components:
1. Web scraping setup (requests & BeautifulSoup)
2. Data extraction from HTML
3. Date parsing and formatting
4. DataFrame creation and cleaning
5. CSV export
6. BigQuery storage

Error Handling

The script includes error handling for:

  • Missing webpage elements
  • Invalid dates
  • Missing review content
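
A minimal sketch of what this kind of handling could look like for dates and review text. The function names and the input format "19 Jan 2025" are assumptions, and `datetime.strptime` stands in for whatever the script does with python-dateutil.

```python
from datetime import datetime
from typing import Optional


def normalize_date(raw: Optional[str], fmt: str = "%d %b %Y") -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it is missing or invalid.

    Assumption: dates look like '19 Jan 2025' with English month names.
    """
    if not raw:
        return None  # the element was missing from the page
    try:
        return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
    except ValueError:
        return None  # the text did not match the expected format


def review_text(raw: Optional[str], default: str = "") -> str:
    """Return stripped review text, or a default when content is missing."""
    return raw.strip() if raw else default


print(normalize_date("19 Jan 2025"))
print(normalize_date("not a date"))
```

Returning `None` instead of raising lets one malformed review be skipped or flagged without stopping the whole run.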

Database

Table: letterbox

Limitations

  • Only scrapes the first page of popular reviews
  • Depends on Letterboxd's current HTML structure (may need updates if the site changes)
  • Date parsing assumes English-language dates

Future Improvements

Potential enhancements:

  1. Multi-page scraping
  2. Rate limiting for respectful scraping
  3. Additional metadata extraction
  4. Error logging
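
Items 1 and 2 could be combined roughly as follows. This is a sketch under stated assumptions, not part of the current script: `fetch_pages` is a hypothetical helper, and the `fetch` callable stands in for whatever function downloads one page of reviews, since the site's paging URL scheme is not described here.

```python
import time
from typing import Callable, Iterator, Tuple


def fetch_pages(
    fetch: Callable[[int], str],
    pages: int,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, content) pairs, pausing between requests."""
    for page in range(1, pages + 1):
        if page > 1:
            sleep(delay_seconds)  # wait before every request after the first
        yield page, fetch(page)


def fake_fetch(page: int) -> str:
    """Stand-in for a real download; returns placeholder content."""
    return f"page-{page}"


for number, content in fetch_pages(fake_fetch, pages=3, delay_seconds=0.0):
    print(number, content)
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access or real delays.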

Contributing

Feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

Disclaimer

This tool is for educational purposes only. Be responsible when scraping websites and respect the site's terms of service.
