Youtube_data_analytics_project

This project analyzes the popularity of YouTube content across different regions using datasets sourced from Kaggle. It preprocesses, cleans, and analyzes the data with AWS (Amazon Web Services) services, including S3, Lambda, Glue, and Athena, to build an automated ETL pipeline.

Objective:

The primary objective of this project is to provide insights into the most popular YouTube content in different regions by processing the data through the pipeline and analyzing it in Microsoft Power BI.

Solution Approach:

Dataset:

The Kaggle dataset linked below contains statistics (CSV files) on daily popular YouTube videos over the course of many months. https://www.kaggle.com/datasets/datasnaek/youtube-new

1. Data Collection: Uses datasets sourced from Kaggle that cover daily popular YouTube videos.

2. Automated Data Cleaning: Implements Lambda functions that clean and preprocess the data before analysis.

3. ETL Pipeline: Builds an ETL pipeline with AWS Glue for data extraction, transformation, and loading.

4. Analysis: Provides reports and visualizations for analyzing the popularity of YouTube content across content types and regions.
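As a rough illustration of the cleaning step, the sketch below flattens a category JSON document into flat records, the kind of transformation a Lambda function could apply before the data is written back to S3. It is not the code in lambda_function.py: the function name `flatten_categories` and the output field names are assumptions made for this example, and the input layout (a top-level `items` list with a nested `snippet`) is assumed to match the category files shipped with the Kaggle dataset.

```python
import json
from typing import Dict, List


def flatten_categories(raw_json: str) -> List[Dict]:
    """Flatten a category JSON document into one flat record per category.

    Assumption: the document has a top-level "items" list whose entries
    carry an "id" and a nested "snippet" object.
    """
    document = json.loads(raw_json)
    records = []
    for item in document.get("items", []):
        snippet = item.get("snippet", {})
        records.append(
            {
                # Cast the id so it can be joined against numeric category columns.
                "id": int(item["id"]),
                # Trim stray whitespace around the category title.
                "title": snippet.get("title", "").strip(),
                "assignable": bool(snippet.get("assignable", False)),
            }
        )
    return records


# Illustrative input only; not taken from the real dataset files.
sample = json.dumps(
    {
        "kind": "youtube#videoCategoryListResponse",
        "items": [
            {"id": "1", "snippet": {"title": "Film & Animation", "assignable": True}},
            {"id": "10", "snippet": {"title": " Music ", "assignable": True}},
        ],
    }
)
rows = flatten_categories(sample)
```

Flattening the nested structure up front keeps the later Glue and Athena steps working on simple tabular records instead of nested JSON.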

Technologies Used:

Amazon S3:

Storage for raw and processed data.
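One possible way to lay out the raw files in S3 is a Hive-style `region=xx` folder per country, which Glue and Athena can treat as a partition column. The sketch below only builds the object key. The prefix `youtube/raw_statistics` and the helper name `partitioned_key` are hypothetical, and the code assumes the file names start with a two-letter region code (as in `CAvideos.csv`); the project's actual bucket layout may differ.

```python
from pathlib import PurePosixPath

# Hypothetical prefix, chosen for illustration only.
RAW_PREFIX = "youtube/raw_statistics"


def partitioned_key(filename: str, prefix: str = RAW_PREFIX) -> str:
    """Build a Hive-style partitioned S3 key from a regional file name.

    Assumption: the first two characters of the file name are the
    region code, e.g. "CAvideos.csv" -> region "ca".
    """
    name = PurePosixPath(filename).name
    region = name[:2].lower()
    if len(region) != 2 or not region.isalpha():
        raise ValueError(f"cannot derive a region code from {filename!r}")
    return f"{prefix}/region={region}/{name}"


key = partitioned_key("data/CAvideos.csv")
```

With keys shaped like this, a query that filters on `region` only has to scan the matching folder.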

AWS Lambda:

Serverless computing for data preprocessing tasks.

AWS Glue:

Managed ETL service for data integration and transformation.

Amazon Athena:

Interactive query service for analyzing data in S3 using standard SQL.

Visualization (Microsoft Power BI):

Business intelligence platform for data visualization and analytics.

Expected Outcome:

1. Identify Trends: Discover popular content based on view counts.

2. Regional Comparisons: Compare the popularity of YouTube content among different regions to understand regional preferences and trends.

3. Audience Engagement: Analyze audience engagement metrics such as views, likes, and comments to gauge the impact and reception of different types of content.
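The engagement comparison can be sketched in a few lines of Python. The example below computes, per region, the ratio of likes plus comments to views. The metric definition, the `region` column (assumed to be added during ingestion), and the sample rows are assumptions for illustration; the column names `views`, `likes`, and `comment_count` are assumed to match the dataset's CSV headers. The dashboard itself may define engagement differently.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable


def engagement_by_region(rows: Iterable[Dict[str, str]]) -> Dict[str, float]:
    """Return (likes + comments) / views for each region."""
    totals = defaultdict(lambda: {"views": 0, "interactions": 0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["views"] += int(row["views"])
        bucket["interactions"] += int(row["likes"]) + int(row["comment_count"])
    return {
        region: (t["interactions"] / t["views"] if t["views"] else 0.0)
        for region, t in totals.items()
    }


# Made-up numbers, used only to show the calculation.
sample_csv = """region,views,likes,comment_count
ca,1000,50,10
ca,3000,120,20
us,2000,150,50
"""
rates = engagement_by_region(csv.DictReader(io.StringIO(sample_csv)))
```

Summing views and interactions before dividing weights each region by its total traffic, so a few low-view videos do not dominate the regional rate.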

NSVpriya/Youtube_Data_ETL_Project