This project analyzes the popularity of YouTube content across regions using datasets sourced from Kaggle. It preprocesses, cleans, and analyzes the data with AWS (Amazon Web Services) services, including S3, Lambda, and Glue, among others, to build an automated ETL pipeline.
The primary objective of this project is to provide insights into the most popular YouTube content in each region by processing the data and analyzing it in Microsoft Power BI.
The Kaggle dataset linked below contains statistics (CSV files) on daily popular YouTube videos over the course of many months: https://www.kaggle.com/datasets/datasnaek/youtube-new
1. Data Collection: Uses datasets sourced from Kaggle, which provide comprehensive and relevant YouTube data.
2. Automated Data Cleaning: Implements Lambda functions that clean and preprocess the data so that it is ready for analysis.
3. ETL Pipeline: Builds an ETL pipeline on AWS Glue for data extraction, transformation, and loading.
4. Analysis: Provides detailed reports and visualizations for analyzing the popularity of YouTube content across content types and regions.
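The cleaning step in item 2 could look roughly like the sketch below. It is not the project's actual Lambda code: the column names (`video_id`, `views`, `likes`, `comment_count`), the `CLEAN_BUCKET` name, the `cleaned/region=XX/` output prefix, and the assumption that file names start with a two-letter region code are all illustrative placeholders.

```python
import csv
import io
from urllib.parse import unquote_plus

# Hypothetical names; adjust to the real CSV header and bucket layout.
NUMERIC_COLUMNS = ("views", "likes", "comment_count")
CLEAN_BUCKET = "example-cleaned-bucket"


def clean_rows(csv_text, region):
    """Drop incomplete rows, cast counters to int, and tag rows with a region."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not (row.get("video_id") or "").strip():
            continue  # skip rows without an identifier
        try:
            for column in NUMERIC_COLUMNS:
                row[column] = int(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows whose counters are missing or non-numeric
        row["region"] = region
        cleaned.append(row)
    return cleaned


def lambda_handler(event, context):
    """Hypothetical S3-triggered entry point: read a raw CSV, write a cleaned one."""
    import boto3  # imported lazily so clean_rows can be tested without AWS

    s3 = boto3.client("s3")
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded
    file_name = key.rsplit("/", 1)[-1]
    region = file_name[:2].upper()  # assumes names such as "USvideos.csv"

    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    rows = clean_rows(raw.decode("utf-8", errors="replace"), region)
    if not rows:
        return {"rows_written": 0}

    out = io.StringIO()
    fieldnames = [name for name in rows[0] if name is not None]
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(
        Bucket=CLEAN_BUCKET,
        Key=f"cleaned/region={region}/{file_name}",
        Body=out.getvalue().encode("utf-8"),
    )
    return {"rows_written": len(rows)}
```

Keeping the row logic in `clean_rows`, separate from the S3 calls, lets the cleaning rules be checked locally before the function is deployed.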
Amazon S3: Storage for raw and processed data.
AWS Lambda: Serverless computing for data preprocessing tasks.
AWS Glue: Managed ETL service for data integration and transformation.
Amazon Athena: Interactive query service for analyzing data in S3 using standard SQL.
Microsoft Power BI: Business intelligence platform for data visualization and analytics.
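As an illustration of the SQL query step over data in S3, the sketch below builds a "top videos per region" query and submits it with boto3. The table name, the `title`, `views`, and `region` columns, the database name, and the results location are assumptions made for the example, not names taken from this project.

```python
import re


def build_top_videos_query(table, region, limit=10):
    """Return SQL listing the most-viewed titles for one region."""
    # Validate inputs because they are interpolated into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[A-Z]{2}", region):
        raise ValueError(f"unexpected region code: {region!r}")
    limit = int(limit)
    return (
        f"SELECT title, MAX(views) AS peak_views "
        f"FROM {table} "
        f"WHERE region = '{region}' "
        f"GROUP BY title "
        f"ORDER BY peak_views DESC "
        f"LIMIT {limit}"
    )


def submit_query(sql, database, output_location):
    """Start an Athena query and return its execution id."""
    import boto3  # imported lazily so the query builder works without AWS

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

A call such as `submit_query(build_top_videos_query("cleaned_videos", "US"), "youtube_db", "s3://example-athena-results/")` would start the query; all three arguments here are placeholders.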
1. Identify Trends: Discover popular content based on view counts.
2. Regional Comparisons: Compare the popularity of YouTube content among different regions to understand regional preferences and trends.
3. Audience Engagement: Analyze audience engagement metrics such as views, likes, and comments to gauge the impact and reception of different types of content.
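The regional comparison and engagement goals above can be sketched as a small aggregation. This is one possible way to define the metrics, not the project's reporting logic: the row keys (`region`, `views`, `likes`, `comment_count`) and the choice of likes per view and comments per view as engagement rates are assumptions.

```python
from collections import defaultdict


def engagement_by_region(rows):
    """Sum views, likes, and comments per region and derive per-view rates."""
    totals = defaultdict(lambda: {"views": 0, "likes": 0, "comments": 0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["views"] += row["views"]
        bucket["likes"] += row["likes"]
        bucket["comments"] += row["comment_count"]

    summary = {}
    for region, bucket in totals.items():
        views = bucket["views"]
        summary[region] = {
            "views": views,
            # Guard against division by zero for regions with no views.
            "like_rate": bucket["likes"] / views if views else 0.0,
            "comment_rate": bucket["comments"] / views if views else 0.0,
        }
    return summary
```

Normalizing by views makes regions of very different audience sizes comparable, which raw like or comment counts would not.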