The aim of the project is to extract the view, like and comment counts of a set of Smash Ultimate tournament playlists from YouTube via YouTube's API, and to compute statistics and graphs per character and per player.
- Download virtualenv if you do not already have it
pip install virtualenv
- Create virtual environment in your folder of interest
virtualenv yt-extractor
- Activate the virtual environment
source yt-extractor/bin/activate
- Install the libraries of interest in the virtual environment based on the requirements.txt file
pip install -r requirements.txt
deactivate
source yt-extractor/bin/activate
A large number of playlist IDs and titles of the tournaments of interest were stored in the JSON file input_jsons/playlists.json in the following format:
{
"PLcMdMmtHkPpR5epLsLfAT9OgVkAHGJgat": "Splendors and Contenders 2 - Smash Ultimate",
"PLcMdMmtHkPpQpWKUm-ieB58c4pShNU0-G": "LACS Rivals - Smash Ultimate",
"PLcMdMmtHkPpQP8nOfrhf1rUb-K-lkfgx_": "The Throne 2 - Smash Ultimate"
}
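As a minimal sketch of how this file could be read and sanity-checked before any API call is made, the helper below loads the mapping and rejects anything that is not a JSON object of playlist ID to title. The function name `load_playlists` is hypothetical and is not taken from the project's scripts.

```python
import json
from pathlib import Path


def load_playlists(path):
    """Return a dict mapping playlist ID -> tournament title.

    Hypothetical helper: the validation rules are assumptions based on
    the format shown above, not on the project's actual code.
    """
    with Path(path).open(encoding="utf-8") as handle:
        playlists = json.load(handle)
    if not isinstance(playlists, dict):
        raise ValueError("expected a JSON object of playlist ID -> title")
    for playlist_id, title in playlists.items():
        if not isinstance(title, str) or not title.strip():
            raise ValueError(f"playlist {playlist_id!r} has no title")
    return playlists
```

It would be called as `load_playlists("input_jsons/playlists.json")`.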
python3 get_youtube_data.py
The output file is raw_video_stats.tsv.
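For readers unfamiliar with the API, the following is a rough sketch of how view, like and comment counts can be collected for one playlist with the YouTube Data API v3 `playlistItems` and `videos` endpoints. It is not the code of get_youtube_data.py, which may well use a client library instead. The function names, the injectable `fetch` parameter and the row keys (`views`, `likes`, `comments`) are assumptions made for illustration, and the API key is one you would have to supply yourself.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://www.googleapis.com/youtube/v3"


def fetch_json(endpoint, params):
    """Perform one GET request against the API and decode the JSON body."""
    url = f"{API_ROOT}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def playlist_video_ids(playlist_id, api_key, fetch=fetch_json):
    """Collect every video ID of a playlist, following pagination tokens."""
    video_ids, page_token = [], None
    while True:
        params = {"part": "contentDetails", "playlistId": playlist_id,
                  "maxResults": 50, "key": api_key}
        if page_token:
            params["pageToken"] = page_token
        page = fetch("playlistItems", params)
        video_ids += [item["contentDetails"]["videoId"]
                      for item in page.get("items", [])]
        page_token = page.get("nextPageToken")
        if not page_token:
            return video_ids


def video_stats(video_ids, api_key, fetch=fetch_json):
    """Return one row per video; IDs are requested in batches of 50."""
    rows = []
    for start in range(0, len(video_ids), 50):
        batch = video_ids[start:start + 50]
        page = fetch("videos", {"part": "snippet,statistics",
                                "id": ",".join(batch), "key": api_key})
        for item in page.get("items", []):
            stats = item.get("statistics", {})
            rows.append({
                "video_id": item["id"],
                "title": item["snippet"]["title"],
                # Counts arrive as strings and may be absent (e.g. hidden likes).
                "views": int(stats.get("viewCount", 0)),
                "likes": int(stats.get("likeCount", 0)),
                "comments": int(stats.get("commentCount", 0)),
            })
    return rows
```

Passing `fetch` as a parameter keeps the pagination and batching logic testable without network access or quota usage.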
Based on pattern recognition, the doubles, squadstrike, team matches and interviews were filtered out.
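A filter of this kind can be sketched with a single regular expression applied to each video title. The keyword list below is an assumption derived from the categories named above. The patterns actually used by the project are not shown here and are probably more specific.

```python
import re

# Hypothetical keyword list; whole-word matching avoids hits inside
# longer words (for example "Steam" does not match "team").
EXCLUDED = re.compile(
    r"\b(doubles|squad\s*strike|teams?|interview)\b", re.IGNORECASE
)


def is_singles_match(title):
    """Return True when the title matches none of the excluded keywords."""
    return EXCLUDED.search(title) is None
```

Note that a broad keyword such as `team` would also discard titles mentioning a sponsor like "Team Liquid", so any real pattern needs to be checked against the data.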
The file input_jsons/characters.json can be re-used for character annotation.
python3 get_character_names.py # based on input_jsons/characters.json
The output file is character_video_stats.tsv and now contains the character annotation for each video.
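To illustrate the idea, here is a small sketch of annotating a video title with the characters it mentions. The structure assumed for the character data (a mapping from canonical character name to a list of aliases) is hypothetical, since the layout of input_jsons/characters.json is not described here. The function name `find_characters` is likewise an assumption.

```python
import re


def find_characters(title, aliases_by_character):
    """Return the canonical names of all characters mentioned in a title.

    `aliases_by_character` maps a canonical name to a list of aliases
    (assumed layout, for illustration only).
    """
    found = []
    for character, aliases in aliases_by_character.items():
        for alias in [character, *aliases]:
            # Match the alias only as a whole word, ignoring case.
            pattern = rf"(?<!\w){re.escape(alias)}(?!\w)"
            if re.search(pattern, title, re.IGNORECASE):
                found.append(character)
                break
    return found
```

With `{"Joker": [], "Diddy Kong": ["Diddy"], "Donkey Kong": ["DK"]}` as input, the title "MkLeo (Joker) vs Tweek (Diddy Kong)" would be annotated with Joker and Diddy Kong.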
First annotate a few player names manually and propagate them to all of their other games with the following script:
python3 get_player_names.py
Repeat the process until you have annotated as many player names as possible. Make sure to start with the longer player names and to finish with the shorter ones, as short names may otherwise produce unwanted substring matches.
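The ordering issue can be made concrete with a short sketch: names are tried from longest to shortest, and each matched name is blanked out of the title so that a shorter name contained in it cannot match again. This masking step is an illustrative assumption and not necessarily what get_player_names.py does. The function name is hypothetical.

```python
def propagate_player_names(titles, known_players):
    """Map each title to the known player names it contains.

    Longer names are tried first; matched text is replaced by spaces so
    that "Leo" is not reported for a title that only contains "MkLeo".
    """
    ordered = sorted(known_players, key=len, reverse=True)
    annotations = {}
    for title in titles:
        remaining = title.lower()
        matched = []
        for player in ordered:
            needle = player.lower()
            if needle in remaining:
                matched.append(player)
                remaining = remaining.replace(needle, " " * len(needle))
        annotations[title] = matched
    return annotations
```

For the title "Leon vs MkLeo - Winners Finals" and the known names Leo, MkLeo and Leon, only MkLeo and Leon are reported, whereas a plain substring test would also report Leo.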
Many corrections were necessary due to the inconsistent player name annotation within the video titles and substring matching issues.
python3 compute_stats.py
The output files containing the different statistics of interest are stored in the folder output_statistics.
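As an example of the kind of aggregation involved, the sketch below groups annotated rows by a key such as character or player and reports the number of videos, the total and the mean of one metric. The column names (`character`, `views`) and the chosen statistics are assumptions. The statistics actually produced by compute_stats.py are not listed here.

```python
from collections import defaultdict
from statistics import mean


def stats_per_group(rows, group_key, metric="views"):
    """Aggregate one numeric metric per group (e.g. per character)."""
    values = defaultdict(list)
    for row in rows:
        # Values read from a TSV file are strings, hence the conversion.
        values[row[group_key]].append(int(row[metric]))
    return {
        group: {"videos": len(v), "total": sum(v), "mean": mean(v)}
        for group, v in values.items()
    }
```

Rows could come from `csv.DictReader(handle, delimiter="\t")` applied to one of the TSV files, provided the columns carry the names assumed above.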
python3 compute_graphs.py
The script compute_graphs.py uses the statistics previously generated in the folder output_statistics to build graphs. The output files containing the different graphs are stored in the folder output_graphs.
As shown right below, the graphs are purposely tall so that they can be displayed with a constant scroll rate in a presentation video.
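One way to obtain such tall figures is to make the figure height grow linearly with the number of bars, so that every bar occupies the same vertical space and the scroll rate stays constant. The sketch below assumes matplotlib is available. The function names and the sizing constants are hypothetical and are not taken from compute_graphs.py.

```python
def figure_size(n_bars, width_in=8.0, inches_per_bar=0.4, margin_in=1.5):
    """Return (width, height) in inches, with height linear in n_bars."""
    return (width_in, margin_in + inches_per_bar * n_bars)


def plot_tall_barh(labels, values, out_path):
    """Save a horizontal bar chart whose height scales with len(labels)."""
    # Imported here so that figure_size can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=figure_size(len(labels)))
    ax.barh(labels, values)
    ax.invert_yaxis()  # first label at the top
    ax.set_xlabel("Views")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

With these constants, a chart of 100 bars would be 8 inches wide and about 41.5 inches tall.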
- Start Date: November 27, 2024
- Completion Date: November 29, 2024
- Maintenance status: Inactive