Manual to extract YouTube statistics from a playlist

Aim of the project

The aim of the project is to extract the view, like and comment counts of the videos in a set of Smash Ultimate tournament playlists on YouTube via the YouTube Data API, and to use them to compute statistics and graphs per character and per player.

Set up a Python virtual environment

First set-up

Install virtualenv if you do not already have it

pip install virtualenv

Create a virtual environment in your folder of interest

virtualenv yt-extractor

Activate the virtual environment

source yt-extractor/bin/activate

Install the required libraries in the virtual environment from the requirements.txt file

pip install -r requirements.txt

Deactivate the virtual environment

deactivate

Re-activate the virtual environment every time you need to extract data from YouTube

source yt-extractor/bin/activate
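Before running any of the scripts, it can help to confirm that the interpreter in use really belongs to the virtual environment. A minimal sketch: inside a venv or a virtualenv (version 20 or later), `sys.prefix` differs from `sys.base_prefix`, while older virtualenv releases set `sys.real_prefix` instead. The helper name `in_virtualenv` is hypothetical and not part of the repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment folder while
    # sys.base_prefix keeps pointing at the base Python installation.
    # Legacy virtualenv (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    # Hypothetical check, not a script shipped with the repository.
    if not in_virtualenv():
        sys.exit("No virtual environment active: run `source yt-extractor/bin/activate` first.")
    print(f"Using the virtual environment at {sys.prefix}")
```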

Perform the statistics extraction

Extract the playlist IDs and titles of the tournaments of interest

A large number of playlist IDs and titles of the tournaments of interest were stored in the JSON file input_jsons/playlists.json, which maps each playlist ID to its title:

{
    "PLcMdMmtHkPpR5epLsLfAT9OgVkAHGJgat": "Splendors and Contenders 2 - Smash Ultimate",
    "PLcMdMmtHkPpQpWKUm-ieB58c4pShNU0-G": "LACS Rivals - Smash Ultimate",
    "PLcMdMmtHkPpQP8nOfrhf1rUb-K-lkfgx_": "The Throne 2 - Smash Ultimate"
}

Store your Google API key, the YouTube service name and the YouTube API version in a local `.env` file
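The variable names the scripts actually read from `.env` are not documented here, so the sketch below assumes the hypothetical keys `API_KEY`, `YOUTUBE_SERVICE_NAME` and `YOUTUBE_API_VERSION`. For the YouTube Data API, the service name and version passed to the Google API client are `youtube` and `v3`. The parser uses only the standard library; a package such as python-dotenv does the same job if it is available in the environment.

```python
from pathlib import Path

# Example .env content (hypothetical key names; keep the file out of version control):
# API_KEY=your-google-api-key
# YOUTUBE_SERVICE_NAME=youtube
# YOUTUBE_API_VERSION=v3
REQUIRED_KEYS = ("API_KEY", "YOUTUBE_SERVICE_NAME", "YOUTUBE_API_VERSION")


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding quotes, e.g. API_KEY="abc123"
        env[key.strip()] = value.strip().strip('"').strip("'")
    # Fail early with a clear message instead of an authentication error later on.
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return env
```

Checking for missing keys up front gives a clearer error than a failed API call halfway through the extraction.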

Extract all the statistics per YouTube video

python3 get_youtube_data.py

The output file is raw_video_stats.tsv.
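get_youtube_data.py itself is not reproduced here. The following is a minimal sketch of how such an extraction can work with the google-api-python-client package: `playlistItems().list` is paged through with `nextPageToken` to collect the video IDs, then `videos().list` fetches titles and statistics in batches of up to 50 IDs. The function names and the TSV column layout are assumptions, not the repository's actual format.

```python
import csv
import json

# Hypothetical column layout; the real raw_video_stats.tsv may differ.
FIELDS = ["playlist_title", "video_id", "title", "views", "likes", "comments"]


def playlist_video_ids(youtube, playlist_id):
    """Yield every video ID of a playlist, following nextPageToken pagination."""
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50,
            pageToken=page_token,
        ).execute()
        for item in response.get("items", []):
            yield item["contentDetails"]["videoId"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break


def video_rows(youtube, playlist_title, video_ids):
    """Fetch title and statistics, 50 IDs per videos.list call (the API maximum)."""
    for start in range(0, len(video_ids), 50):
        batch = video_ids[start:start + 50]
        response = youtube.videos().list(part="snippet,statistics", id=",".join(batch)).execute()
        for item in response.get("items", []):
            stats = item.get("statistics", {})
            # Counts come back as strings; likeCount/commentCount can be absent
            # when likes are hidden or comments are disabled.
            yield {
                "playlist_title": playlist_title,
                "video_id": item["id"],
                "title": item["snippet"]["title"],
                "views": int(stats.get("viewCount", 0)),
                "likes": int(stats.get("likeCount", 0)),
                "comments": int(stats.get("commentCount", 0)),
            }


def extract(youtube, playlists, out_path="raw_video_stats.tsv"):
    """Write one TSV row per video for every playlist in {playlist_id: title}."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        for playlist_id, playlist_title in playlists.items():
            ids = list(playlist_video_ids(youtube, playlist_id))
            writer.writerows(video_rows(youtube, playlist_title, ids))


if __name__ == "__main__":
    from googleapiclient.discovery import build  # pip install google-api-python-client

    with open("input_jsons/playlists.json", encoding="utf-8") as f:
        playlists = json.load(f)
    # Placeholder key: use the value stored in your .env file instead.
    youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")
    extract(youtube, playlists)
```

Because the service object is passed in as a parameter, the functions can be exercised with a stub instead of live API calls, which also avoids spending API quota while testing.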

Curate the extracted data and prepare the data for processing

Manually curate the file with all the statistics

Based on recurring patterns in the video titles, doubles, squad strike and team matches, as well as interviews, were filtered out.
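The exact patterns used during curation are not documented. As an illustration only, a case-insensitive keyword filter on video titles could look like this; the keyword list and the `title` field name are hypothetical and should be adjusted to the naming conventions of each tournament channel.

```python
import re

# Hypothetical exclusion patterns; the ones actually used during curation are not documented.
EXCLUDE_PATTERNS = [
    r"\bdoubles\b",
    r"\bsquad\s*strike\b",                    # "Squad Strike" and "Squadstrike"
    r"\bteam\s+(?:match(?:es)?|battles?)\b",  # team matches, not sponsor tags such as "Team X"
    r"\binterviews?\b",
]
EXCLUDE_RE = re.compile("|".join(EXCLUDE_PATTERNS), re.IGNORECASE)


def is_singles_set(title: str) -> bool:
    """Return False when the title looks like doubles, squad strike, a team match or an interview."""
    return EXCLUDE_RE.search(title) is None


def filter_singles(rows):
    """Keep only rows (dicts with an assumed 'title' key) that look like singles sets."""
    return [row for row in rows if is_singles_set(row["title"])]
```

Matching "team" only when it is followed by "match" or "battle" avoids discarding singles sets whose titles carry a sponsor prefix starting with "Team".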

Identify the characters used in each set

Define all the characters' possible names in a JSON file

The file input_jsons/characters.json can be re-used for this purpose.

Assign a character list for each video

python3 get_character_names.py  # based on input_jsons/characters.json

The output file, character_video_stats.tsv, contains the character annotation for each video.
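The structure of input_jsons/characters.json is not shown in this manual. Assuming it maps each canonical character name to a list of alternative spellings (for example `{"Dr. Mario": ["Doc"]}`), a matcher could work as follows. The function names are hypothetical; this is a sketch of one way to do the matching, not the logic of get_character_names.py.

```python
import json
import re


def build_alias_patterns(characters):
    """Compile one case-insensitive pattern per alias, longest aliases first.

    characters: {canonical_name: [alias, ...]} (assumed layout of characters.json).
    """
    pairs = [(alias, name) for name, aliases in characters.items() for alias in [name, *aliases]]
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    # Lookarounds instead of \b so aliases starting or ending with "." or "&" still match.
    return [
        (re.compile(rf"(?<![A-Za-z0-9]){re.escape(alias)}(?![A-Za-z0-9])", re.IGNORECASE), name)
        for alias, name in pairs
    ]


def characters_in_title(title, patterns):
    """Return the sorted canonical character names found in a video title."""
    remaining = title
    found = set()
    for pattern, name in patterns:
        if pattern.search(remaining):
            found.add(name)
            # Blank out the matched text so shorter aliases cannot match inside it.
            remaining = pattern.sub(lambda m: " " * len(m.group()), remaining)
    return sorted(found)


if __name__ == "__main__":
    with open("input_jsons/characters.json", encoding="utf-8") as f:
        patterns = build_alias_patterns(json.load(f))
    print(characters_in_title("Player A (Dr. Mario) vs Player B (Pikachu)", patterns))
```

Trying the longest aliases first and masking each match prevents a title containing "Dr. Mario" from also being tagged with "Mario".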

Annotate the player names

Use a manual iterative approach to annotate all the player names

First annotate a few player names manually, then propagate them to all their other games with the following script:

python3 get_player_names.py

Repeat the process until you have annotated as many player names as possible. Start with the longest player names and finish with the shortest ones, since short names are more likely to produce unwanted substring matches.
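The propagation step can be sketched as follows, assuming each row holds a video title and a (possibly empty) list of player names; the field names, the function name and the example player names are hypothetical, and get_player_names.py may work differently. Besides trying the longest names first, the sketch requires that a name not be directly preceded or followed by a letter or digit, which reduces, but does not eliminate, the substring problem mentioned in this step.

```python
import re


def propagate_player_names(rows):
    """Fill empty 'players' lists using names already annotated on other rows.

    rows: list of dicts with a 'title' string and a 'players' list (assumed layout).
    """
    known = {name for row in rows for name in row["players"]}
    # Longest names first, so a name like "Big Ben" claims its span before "Ben".
    ordered = sorted(known, key=len, reverse=True)
    patterns = [(name, re.compile(rf"(?<!\w){re.escape(name)}(?!\w)", re.IGNORECASE)) for name in ordered]
    for row in rows:
        if row["players"]:
            continue  # keep manual annotations untouched
        remaining = row["title"]
        hits = []
        for name, pattern in patterns:
            match = pattern.search(remaining)
            if match:
                hits.append((match.start(), name))
                # Mask the match so shorter names cannot match inside it.
                remaining = remaining[:match.start()] + " " * len(match.group()) + remaining[match.end():]
        # Keep players in the order they appear in the title.
        row["players"] = [name for _, name in sorted(hits)]
    return rows
```

Each run only uses names that are already annotated somewhere, so adding a few names by hand and re-running the propagation matches the iterative approach described above.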

Carefully review the player annotation

Many corrections were necessary because player names are written inconsistently across video titles and because short names produced unwanted substring matches.

Compute all the statistics based on the curated data

python3 compute-stats.py

The output files containing the different statistics of interest are stored in the folder output_statistics.
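The exact statistics written to output_statistics are not listed here. As an example of the kind of aggregation involved, the sketch below summarizes views per character from rows that already carry a character list; the field names, the choice of metrics and the output columns are assumptions.

```python
import csv
from collections import defaultdict
from statistics import mean, median

# Hypothetical output columns for a per-character summary.
FIELDS = ["character", "videos", "total_views", "mean_views", "median_views"]


def views_per_character(rows):
    """Aggregate view counts per character, most-viewed characters first.

    rows: iterable of dicts with 'characters' (list of names) and 'views' (int).
    A video counts once for every distinct character appearing in it.
    """
    views_by_char = defaultdict(list)
    for row in rows:
        for character in set(row["characters"]):
            views_by_char[character].append(row["views"])
    summary = [
        {
            "character": character,
            "videos": len(views),
            "total_views": sum(views),
            "mean_views": round(mean(views), 1),
            "median_views": median(views),
        }
        for character, views in views_by_char.items()
    ]
    return sorted(summary, key=lambda record: record["total_views"], reverse=True)


def write_tsv(records, path):
    """Write the summary as a tab-separated file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(records)
```

A call such as `write_tsv(views_per_character(rows), "output_statistics/views_per_character.tsv")` would produce one such file; that filename is hypothetical. The same grouping works per player by swapping the character list for the player list.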

Compute all the graphs based on the newly computed statistics

python3 compute-graphs.py

The script compute-graphs.py builds the graphs from the statistics previously generated in the output_statistics folder and stores the resulting files in the output_graphs folder.

The graphs are purposely tall so that they can be scrolled at a constant rate in a presentation video.
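A simple way to obtain tall graphs is to scale the figure height with the number of bars. The sketch below does this with matplotlib, which is an assumption since the plotting library used by the graph script is not stated; the function names and height parameters are hypothetical starting values.

```python
def figure_height(n_bars, inches_per_bar=0.3, min_height=4.0):
    """Return a figure height in inches that grows linearly with the number of bars."""
    return max(min_height, n_bars * inches_per_bar)


def plot_tall_barh(labels, values, path, width=8.0, xlabel="Total views"):
    """Save a horizontal bar chart whose height scales with len(labels)."""
    # matplotlib is imported here so figure_height stays usable without it installed.
    import matplotlib
    matplotlib.use("Agg")  # file output only, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(width, figure_height(len(labels))))
    ax.barh(labels, values)
    ax.invert_yaxis()  # first label at the top, matching a descending sort
    ax.set_xlabel(xlabel)
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

Keeping a fixed height per bar means every bar scrolls past at the same speed, whatever the number of characters or players in the chart.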

Project Timeline

Start Date: November 27, 2024
Completion Date: November 29, 2024
Maintenance status: Inactive

SchniderB/youtube-smash-stats
