Before collecting data from YouTube, we need to be clear about what data we need. Here, we collect data about trending videos and analyze it to find out what makes a video trend on YouTube.
To collect data from YouTube, you need an API key for the YouTube Data API. Here are the steps you can follow to set one up:
- Go to Google Cloud Console.
- Click on the project drop-down at the top, then “New Project”.
- Enter a project name and click “Create”.
- In the Google Cloud Console, navigate to “APIs & Services” > “Library”.
- Search for “YouTube Data API v3” and click on it.
- Click “Enable”.
- Go to “APIs & Services” > “Credentials”.
- Click “+ CREATE CREDENTIALS” and select “API key”.
- Copy the generated API key.
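Once the key exists, it is usually safer to read it from the environment than to paste it into a script. The following is a minimal sketch, assuming the key is stored in an environment variable named `YOUTUBE_API_KEY` (a hypothetical name chosen for illustration) and that the `google-api-python-client` package is installed:

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Read the API key from an environment variable (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


def make_client(api_key: str):
    """Build a YouTube Data API v3 client; requires google-api-python-client."""
    # Imported lazily so load_api_key() works even without the package installed.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)
```

With the variable set, `youtube = make_client(load_api_key())` gives a client object for the requests that follow.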
We collected data about the top 200 trending videos on YouTube in the US using the YouTube Data API. The script iterates through the API’s paginated responses to collect video details such as title, description, published date, channel information, tags, duration, definition, captions, and engagement metrics like views, likes, and comments. It compiles this information into a list, converts it into a pandas DataFrame, and saves the data to a CSV file named trending_videos.csv, allowing us to analyze trends and patterns in the collected video data.
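The collection step could look roughly like the sketch below. It is not the original script: the function name, the output column names other than those mentioned in this article, and the `youtube` argument (a client built with `googleapiclient.discovery.build`) are assumptions. It requests the `mostPopular` chart 50 videos at a time and follows `nextPageToken` until enough rows are collected.

```python
def fetch_trending_videos(youtube, region_code="US", max_videos=200):
    """Collect up to max_videos rows from the mostPopular chart, one page at a time."""
    rows = []
    page_token = None
    while len(rows) < max_videos:
        response = youtube.videos().list(
            part="snippet,contentDetails,statistics",
            chart="mostPopular",
            regionCode=region_code,
            maxResults=50,  # the per-page maximum for videos.list
            pageToken=page_token,
        ).execute()

        for item in response.get("items", []):
            snippet = item.get("snippet", {})
            details = item.get("contentDetails", {})
            stats = item.get("statistics", {})
            rows.append({
                "video_id": item.get("id"),
                "title": snippet.get("title"),
                "description": snippet.get("description"),
                "published_at": snippet.get("publishedAt"),
                "channel_id": snippet.get("channelId"),
                "channel_title": snippet.get("channelTitle"),
                "category_id": snippet.get("categoryId"),
                "tags": snippet.get("tags", []),
                "duration": details.get("duration"),
                "definition": details.get("definition"),
                "caption": details.get("caption"),
                # Counts arrive as strings and may be missing, so default to 0.
                "view_count": int(stats.get("viewCount", 0)),
                "like_count": int(stats.get("likeCount", 0)),
                "dislike_count": int(stats.get("dislikeCount", 0)),
                "comment_count": int(stats.get("commentCount", 0)),
            })

        page_token = response.get("nextPageToken")
        if not page_token:  # no further pages
            break
    return rows[:max_videos]
```

The returned list of dictionaries can then be saved with `pd.DataFrame(rows).to_csv("trending_videos.csv", index=False)`.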
We used pandas to load and describe the data collected through the API, i.e., trending_videos.csv.
We then checked for missing values and the data types of the columns. The description column has 4 missing values, which is minor and can be handled as needed. The data types seem appropriate for most columns, but the published_at column should be converted to a datetime format, and the tags column might need further processing.
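These checks and fixes can be sketched as follows. The placeholder text for missing descriptions and the assumption that tags come back from the CSV as stringified Python lists are illustrative choices, not details confirmed by the original analysis.

```python
import ast

import pandas as pd


def parse_tags(value):
    """Turn a stringified list such as "['a', 'b']" back into a real list."""
    if isinstance(value, list):
        return value
    if isinstance(value, str) and value.strip():
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return []
    return []


def clean_trending(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing descriptions, parse timestamps, and restore tag lists."""
    out = df.copy()
    out["description"] = out["description"].fillna("No description")
    out["published_at"] = pd.to_datetime(out["published_at"], utc=True)
    out["tags"] = out["tags"].apply(parse_tags)
    return out


# Tiny made-up sample standing in for trending_videos.csv
sample = pd.DataFrame({
    "description": ["A video", None],
    "published_at": ["2024-06-01T14:00:05Z", "2024-06-02T09:30:00Z"],
    "tags": ["['x', 'y']", "[]"],
})
print(sample.isnull().sum())  # missing values per column
print(sample.dtypes)          # column data types before cleaning
cleaned = clean_trending(sample)
```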
Parameters: view_count, like_count, dislike_count, comment_count
We plotted the distribution of views, likes, and comments across all videos in the data using the matplotlib and Seaborn libraries.
The histograms show that the distributions of view counts, like counts, and comment counts are right-skewed, with most videos having lower counts and a few videos having very high counts.
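One way to reproduce this, sketched under the assumption that the DataFrame has the three count columns named as above, is to draw one histogram per metric and to back up the visual impression with the sample skewness (a positive value indicates a right-skewed distribution). The helper names are hypothetical.

```python
import pandas as pd

COUNT_COLUMNS = ["view_count", "like_count", "comment_count"]


def skewness(df: pd.DataFrame) -> pd.Series:
    """Sample skewness per metric; values above 0 mean a longer right tail."""
    return df[COUNT_COLUMNS].skew()


def plot_distributions(df: pd.DataFrame):
    """Draw one histogram per metric (needs matplotlib and seaborn installed)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, len(COUNT_COLUMNS), figsize=(18, 5))
    for ax, col in zip(axes, COUNT_COLUMNS):
        sns.histplot(df[col], bins=30, kde=True, ax=ax)
        ax.set_title(f"{col} distribution")
    fig.tight_layout()
    return fig


# Made-up numbers: many small counts and one very large one
demo = pd.DataFrame({
    "view_count": [1, 2, 3, 4, 100],
    "like_count": [1, 1, 2, 2, 50],
    "comment_count": [0, 1, 1, 2, 30],
})
print(skewness(demo))
```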
Look at the correlation between likes, views, and comments.
The heatmap confirms strong positive correlations between views, likes, and comments. Understanding the correlation between these metrics can provide insights into how engagement on videos is related. For example:
- A high positive correlation between view_count and like_count suggests that videos with more views also tend to receive more likes.
- A high positive correlation between view_count and comment_count suggests that videos with more views also tend to receive more comments.
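The correlation matrix behind such a heatmap can be computed with pandas. This is a minimal sketch with made-up numbers; the helper names are hypothetical, and the heatmap function assumes matplotlib and seaborn are installed.

```python
import pandas as pd

METRICS = ["view_count", "like_count", "comment_count"]


def engagement_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of the three engagement metrics."""
    return df[METRICS].corr()


def plot_correlation_heatmap(corr: pd.DataFrame):
    """Draw the matrix as an annotated heatmap (needs matplotlib and seaborn)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation of engagement metrics")
    return fig


# Made-up data: likes rise with views, comments fall with views
demo = pd.DataFrame({
    "view_count": [100, 200, 300, 400],
    "like_count": [10, 20, 30, 40],
    "comment_count": [4, 3, 2, 1],
})
corr = engagement_correlation(demo)
print(corr)
```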
Since we have collected only category IDs, let’s also collect the category names from the API.
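A possible way to do this is sketched below, assuming `youtube` is a client built with `googleapiclient.discovery.build` and that the IDs are stored in a column called `category_id` (an assumed name). The function name is hypothetical.

```python
def fetch_category_names(youtube, region_code="US"):
    """Return a {category_id: category_name} mapping for one region."""
    response = youtube.videoCategories().list(
        part="snippet",
        regionCode=region_code,
    ).execute()
    # The API returns IDs as strings; convert to int to match IDs read from a CSV.
    return {
        int(item["id"]): item["snippet"]["title"]
        for item in response.get("items", [])
    }
```

The mapping can then be applied with `df["category_name"] = df["category_id"].astype(int).map(mapping)`.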
Analyze the number of trending videos by category.
The bar chart shows that the Gaming, Entertainment, Sports, and Music categories have the highest number of trending videos.
Look at the average engagement metrics by category.
The Music and People & Blogs categories have the highest average view counts, likes, and comments. Film & Animation also shows high engagement, especially in view counts and like counts.
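Both category summaries, the number of videos and the average engagement, come down to a count and a group-by. A minimal sketch with made-up rows, assuming a `category_name` column has already been added (the column and function names are assumptions):

```python
import pandas as pd

METRICS = ["view_count", "like_count", "comment_count"]


def videos_per_category(df: pd.DataFrame) -> pd.Series:
    """Number of trending videos in each category, largest first."""
    return df["category_name"].value_counts()


def average_engagement_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Mean views, likes, and comments per category, sorted by mean views."""
    return (
        df.groupby("category_name")[METRICS]
        .mean()
        .sort_values("view_count", ascending=False)
    )


demo = pd.DataFrame({
    "category_name": ["Music", "Music", "Gaming"],
    "view_count": [100, 300, 50],
    "like_count": [10, 30, 5],
    "comment_count": [1, 3, 2],
})
counts = videos_per_category(demo)
averages = average_engagement_by_category(demo)
```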
We use the isodate library to convert the duration of each video from the ISO 8601 format to seconds, which allows for numerical analysis. After converting the durations, we categorize the videos into different duration ranges (0-5 minutes, 5-10 minutes, 10-20 minutes, 20-60 minutes, and 60-120 minutes) by creating a new column called duration_range. This categorization enables us to analyze and compare the engagement metrics of videos within specific length intervals, providing insights into how video length relates to viewer behavior and video performance.
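The conversion and binning can be sketched as below. To keep the example dependency-free, it swaps in a small regular-expression parser for the isodate call; with isodate installed, `isodate.parse_duration(value).total_seconds()` plays the same role. The parser only handles day, hour, minute, and second components, and the column name `duration_seconds`, the function names, and the bin labels are assumptions.

```python
import re

import pandas as pd

# Matches durations such as PT4M13S, PT1H2M3S, or P1DT2H (no weeks, months, or years).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_to_seconds(value: str) -> int:
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION.match(value)
    if not match:
        raise ValueError(f"Unsupported duration: {value!r}")
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


def add_duration_range(df: pd.DataFrame) -> pd.DataFrame:
    """Add duration_seconds and duration_range columns."""
    out = df.copy()
    out["duration_seconds"] = out["duration"].apply(iso8601_to_seconds)
    out["duration_range"] = pd.cut(
        out["duration_seconds"],
        bins=[0, 300, 600, 1200, 3600, 7200],
        labels=["0-5 min", "5-10 min", "10-20 min", "20-60 min", "60-120 min"],
        include_lowest=True,  # keep zero-length durations in the first bin
    )
    return out  # videos longer than 120 minutes get a missing duration_range


demo = add_duration_range(
    pd.DataFrame({"duration": ["PT4M13S", "PT7M", "PT15M", "PT45M", "PT1H30M"]})
)
```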
Analyze the content and duration of the videos.
The scatter plot shows a slight negative correlation between video length and view count, indicating shorter videos tend to have higher view counts. Videos in the 0-5 minute range have the highest average view counts, likes, and comments. Engagement decreases as video length increases.
Analyze the relationship between views and number of tags.
The scatter plot shows a very weak relationship between the number of tags and view count, suggesting that the number of tags has minimal impact on a video’s view count.
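The strength of this relationship can also be checked numerically. A minimal sketch with made-up rows, assuming the `tags` column already holds Python lists and using a hypothetical `tag_count` column:

```python
import pandas as pd


def add_tag_count(df: pd.DataFrame) -> pd.DataFrame:
    """Add a tag_count column holding the number of tags per video."""
    out = df.copy()
    out["tag_count"] = out["tags"].apply(len)
    return out


demo = add_tag_count(pd.DataFrame({
    "tags": [[], ["music"], ["music", "live"]],
    "view_count": [10, 20, 30],
}))
# Pearson correlation between number of tags and views; values near 0 mean a weak link.
tag_view_corr = demo["tag_count"].corr(demo["view_count"])
```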
The distribution shows that most trending videos were published between 14:00 and 20:00 hours (2 PM – 8 PM), which suggests this is the most common upload window. However, there is only a very weak negative relationship between publish hour and view count, so the hour of publication appears to have minimal impact on engagement metrics.
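The publish-hour analysis can be sketched as follows, assuming `published_at` holds the timestamps returned by the API. Those timestamps are in UTC, so the hours below are UTC hours unless they are converted to a local time zone first. The `publish_hour` column and the function names are assumptions.

```python
import pandas as pd


def add_publish_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Add a publish_hour column (0-23, in UTC) derived from published_at."""
    out = df.copy()
    out["publish_hour"] = pd.to_datetime(out["published_at"], utc=True).dt.hour
    return out


def uploads_per_hour(df: pd.DataFrame) -> pd.Series:
    """Number of videos published in each hour of the day."""
    return df["publish_hour"].value_counts().sort_index()


demo = add_publish_hour(pd.DataFrame({
    "published_at": [
        "2024-06-01T14:05:00Z",
        "2024-06-01T14:45:00Z",
        "2024-06-02T20:00:00Z",
    ],
    "view_count": [500, 300, 100],
}))
per_hour = uploads_per_hour(demo)
# Correlation between publish hour and views; values near 0 mean a weak link.
hour_view_corr = demo["publish_hour"].corr(demo["view_count"])
```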
So, here’s my conclusion on what makes a video trend on YouTube:
- Encourage viewers to like and comment on videos to boost engagement metrics.
- Aim to create shorter videos (under 5 minutes) for higher engagement, especially for categories like Music and Entertainment.
- Schedule video uploads around peak times (2 PM – 8 PM) to maximize initial views and engagement.