A Python script that scrapes popular movie reviews from Letterboxd.com. It extracts reviewer names, movie titles, review dates, ratings, review content, and like counts, then saves them to a CSV file and stores them in BigQuery.
- Scrapes Letterboxd's popular reviews for the current year
- Extracts key information:
- Reviewer username
- Movie title
- Review date
- Rating (including half stars)
- Review content
- Number of likes
- Automatically formats dates for consistency
- Exports data to CSV format
- Stores the data in BigQuery
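As an illustration of the half-star handling listed above, here is a minimal sketch. It assumes the rating arrives as a string of star glyphs such as `★★★½`; the function name `parse_star_rating` is hypothetical and the script's actual logic may differ.

```python
def parse_star_rating(stars: str):
    """Convert a star string such as '★★★½' to a number (3.5).

    Returns None when the string is empty or contains unexpected
    characters, which is how an unrated review is represented here.
    """
    stars = stars.strip()
    if not stars or any(ch not in "★½" for ch in stars):
        return None
    # Each full star counts as 1, each half-star glyph as 0.5.
    return stars.count("★") + 0.5 * stars.count("½")


print(parse_star_rating("★★★½"))  # 3.5
```

Returning `None` for an unrated review keeps a missing rating distinct from a genuinely low one.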
Make sure you have Python 3.6+ installed and the following packages:
pip install requests
pip install beautifulsoup4
pip install pandas
pip install numpy
pip install python-dateutil
pip install psycopg2
pip install sqlalchemy
- Clone this repository or download the script:
git clone https://github.com/fajri-yanti/LetterBox-Popular-Review
cd LetterBox-Popular-Review
- Install the required packages:
pip install -r requirements.txt
- Basic usage:
python webscrapingletterbox.py
This will:
- Scrape the most recent popular reviews from Letterboxd
- Process and clean the data
- Save the results to popular_reviews.csv
The script creates a CSV file with the following columns:
- Reviewer: Username of the review author
- Title: Name of the movie
- Date: Date the review was posted (YYYY-MM-DD format)
- Rating: Movie rating (0-5 stars, including half stars)
- Review: Full text of the review
- Likes: Number of likes the review received
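To make the column layout concrete, the sketch below writes one row with these six columns. It uses the standard-library `csv` module rather than pandas so that it runs without extra packages; `rows_to_csv` and the sample row are made up for illustration.

```python
import csv
import io

COLUMNS = ["Reviewer", "Title", "Date", "Rating", "Review", "Likes"]


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with the documented columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample row, for illustration only.
sample = [{
    "Reviewer": "example_user",
    "Title": "Example Film",
    "Date": "2024-01-15",
    "Rating": 3.5,
    "Review": "Great, if a little long.",
    "Likes": 120,
}]
csv_text = rows_to_csv(sample)
print(csv_text)
```

Review text often contains commas and quotes, so it is worth leaving the quoting to a CSV writer instead of joining fields by hand.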
# Main components:
1. Web scraping setup (requests & BeautifulSoup)
2. Data extraction from HTML
3. Date parsing and formatting
4. DataFrame creation and cleaning
5. CSV export
6. Store to BigQuery
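This README does not say how the BigQuery step is implemented. One possible approach, shown here purely as an assumption, is to stream the cleaned rows with `insert_rows_json` from the `google-cloud-bigquery` client library. The helper name `store_rows` is hypothetical.

```python
def store_rows(client, table_id, rows):
    """Stream row dicts into a BigQuery table and return how many were sent.

    `client` is expected to be a google.cloud.bigquery.Client (an assumption,
    not confirmed by this README). insert_rows_json returns a list of
    per-row errors, which is empty when every row was accepted.
    """
    if not rows:
        return 0
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
    return len(rows)
```

With the library installed and credentials configured, a call might look like `store_rows(bigquery.Client(), "my-project.my_dataset.popular_reviews", rows)`, where the table ID is a placeholder.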
The script includes error handling for:
- Missing webpage elements
- Invalid dates
- Missing review content
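The three cases above can be sketched as a single row-cleaning function. This is an illustrative example, not the script's actual code: the name `clean_row`, the lowercase input keys, and the `15 Jan 2024` input date format are all assumptions, and it uses the standard-library `datetime` in place of `python-dateutil`.

```python
from datetime import datetime


def clean_row(raw):
    """Build one output row from a dict of raw strings scraped from the page.

    Any key may be missing or None, mirroring an HTML element that
    was not found.
    """
    # Invalid or missing dates become None rather than stopping the run.
    # The "%d %b %Y" input format is an assumption for this sketch.
    try:
        parsed = datetime.strptime(raw.get("date") or "", "%d %b %Y")
        date = parsed.strftime("%Y-%m-%d")
    except ValueError:
        date = None

    # Like counts may carry thousands separators, e.g. "1,204".
    try:
        likes = int((raw.get("likes") or "0").replace(",", ""))
    except ValueError:
        likes = 0

    return {
        "Reviewer": raw.get("reviewer") or None,
        "Title": raw.get("title") or None,
        "Date": date,
        "Rating": raw.get("rating"),
        # Missing review content becomes an empty string.
        "Review": (raw.get("review") or "").strip(),
        "Likes": likes,
    }


print(clean_row({"reviewer": "example_user", "date": "15 Jan 2024", "likes": "1,204"}))
```

Falling back to `None`, `0`, or an empty string means one malformed review does not abort the whole scrape.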
- Only scrapes the first page of popular reviews
- Depends on Letterboxd's current HTML structure (may need updates if the site changes)
- Date parsing assumes English-language dates
Potential enhancements:
- Multi-page scraping
- Rate limiting for respectful scraping
- Additional metadata extraction
- Error logging
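As a rough sketch of how multi-page scraping, rate limiting, and error logging could fit together, the loop below fetches pages one at a time with a pause between requests. The `page/{n}/` URL pattern, the function name `scrape_pages`, and the two-second default delay are hypothetical, not taken from the script or from Letterboxd.

```python
import logging
import time

logger = logging.getLogger("letterboxd_scraper")


def scrape_pages(fetch, base_url, pages=3, delay_seconds=2.0):
    """Fetch several listing pages in turn, pausing between requests.

    `fetch` is any callable that takes a URL and returns the page HTML.
    Pages that fail are logged and skipped.
    """
    results = []
    for page in range(1, pages + 1):
        url = f"{base_url}page/{page}/"  # hypothetical URL pattern
        try:
            results.append(fetch(url))
        except Exception:
            # Record the traceback but keep going with the next page.
            logger.exception("Failed to fetch %s", url)
        if page < pages:
            time.sleep(delay_seconds)  # simple fixed delay between requests
    return results
```

Passing the fetcher in as an argument keeps the loop independent of any HTTP library; with `requests` installed it could be `lambda url: requests.get(url, timeout=10).text`.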
Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
This tool is for educational purposes only. Be responsible when scraping websites and respect the site's terms of service.