Skip to content

This project aims to extract the view, like and comment counts of Smash Ultimate tournaments to compute statistics and graphs for each character and player

Notifications You must be signed in to change notification settings

SchniderB/youtube-smash-stats

Repository files navigation

Manual to extract Youtube statistics from a playlist

Aim of the project

The aim of the project is to extract the view, like and comment counts of a set of Smash Ultimate tournament playlists from Youtube via Youtube's API to compute statistics and graphs per character and per player.

Set up Python virtual environment

First set-up

  1. Download virtualenv if you do not alread have it
pip install virtualenv
  1. Create virtual environment in your folder of interest
virtualenv yt-extractor
  1. Activate the virtual environment
source yt-extractor/bin/activate
  1. Install the libraries of interest in the virtual environment based on the requirements.txt file
pip install -r requirements.txt

Deactivate the virtual environment

deactivate

Re-activate virtual environment every time you need to extract data from youtube

source yt-extractor/bin/activate

Perform the statistic extraction

Extract the playlist IDs and titles of the tournaments of interest

A large number of playlist IDs and titles of the tournaments of interest were stored in the JSON file input_jsons/playlists.json in the following format:

{
    "PLcMdMmtHkPpR5epLsLfAT9OgVkAHGJgat": "Splendors and Contenders 2 - Smash Ultimate",
    "PLcMdMmtHkPpQpWKUm-ieB58c4pShNU0-G": "LACS Rivals - Smash Ultimate",
    "PLcMdMmtHkPpQP8nOfrhf1rUb-K-lkfgx_": "The Throne 2 - Smash Ultimate"
}

Store your Google API key, Youtube service name and Youtube api version into a local .env file

Extract all the statistics per Youtube video

python3 get_youtube_data.py

The output file is raw_video_stats.tsv.

Curate the extracted data and prepare the data for processing

Manually curate the file with all the statistics

Based on pattern recognition, the doubles, squadstrike, team matches and interviews were filtered out.

Identify the characters used in each set

Define all the characters' possible names in a JSON file

The file input_jsons/characters.json can be re-used for this purpose.

Assign a character list for each video

python3 get_character_names.py  # based on input_jsons/characters.json

The output file is character_video_stats.tsv and now contains the character annotation for each video.

Annotate the player names

Use a manual iterative approach to annotate all the player names

First annotate a few player names manually and propagate it to all their other games with the following script:

python3 get_player_names.py

Repeat the process until you have annotated as many player names as possible. Make sure to start with the larger player names first and the shorter player names at last (as you may get unwanted substring matches with short names).

Carefully review the player annotation

Many corrections were necessary due to the inconsistent player name annotation within the video titles and substring matching issues.

Compute all the statistics based on the curated data

python3 compute_stats.py

The output files containing the different statistics of interest are stored in the folder output_statistics

Compute all the graphs based on the newly computed statistics

python3 compute_graphs.py

The script compute_graphs.py will use the statistics previously generated in the folder output_statistics to build graphs. The output files containing the different graphs are stored in the folder output_graphs

As shown right below, the graphs are purposely long vertically so that they can be displayed with a constant scroll rate in a presentation video.

Average views per character

Project Timeline

  • Start Date: November 27, 2024
  • Completion Date: November 29, 2024
  • Maintenance status: Inactive

About

This project aims to extract the view, like and comment counts of Smash Ultimate tournaments to compute statistics and graphs for each character and player

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages