Module 16 Big Data
Cover what constitutes big data and how it's handled.
Use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a PostgreSQL database managed through pgAdmin.
Use PySpark, Pandas, or SQL to determine if there is any bias toward favorable reviews from Vine members in your dataset.
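The ETL step described above could look roughly like the following. This is a minimal sketch, not the code used for this module: the column names (taken from the usual layout of Amazon review files), the tab-separated input, the table name `vine_table`, the driver version, and the `jdbc_url` / `run_etl` helpers are all assumptions made for illustration.

```python
# Hypothetical column subset; the real dataset's columns may differ.
VINE_COLUMNS = [
    "review_id",
    "star_rating",
    "helpful_votes",
    "total_votes",
    "vine",
    "verified_purchase",
]


def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL for an RDS endpoint."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(source_path: str, url: str, user: str, password: str) -> int:
    """Extract a review file, keep the Vine columns, load them into PostgreSQL."""
    # Imported here so jdbc_url can be used on machines without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("vine-etl")
        # Assumed driver version; any PostgreSQL JDBC driver on the classpath works.
        .config("spark.jars.packages", "org.postgresql:postgresql:42.2.16")
        .getOrCreate()
    )

    # Extract: assumes a tab-separated file with a header row.
    reviews = (
        spark.read.option("header", True)
        .option("sep", "\t")
        .option("inferSchema", True)
        .csv(source_path)
    )

    # Transform: keep only the columns needed later and drop rows without an id.
    vine_df = reviews.select(VINE_COLUMNS).dropna(subset=["review_id"])

    # Load: append the rows to a table on the RDS instance.
    vine_df.write.jdbc(
        url=url,
        table="vine_table",  # assumed table name
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "org.postgresql.Driver",
        },
    )
    return vine_df.count()
```

A call would pass the RDS endpoint and credentials, for example `run_etl(path, jdbc_url(host, 5432, database), user, password)`, where every argument is a placeholder to be filled in with your own values. Once the load finishes, the table can be inspected in pgAdmin.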
1- How many Vine reviews and non-Vine reviews were there?
There were 104,975 reviews in total.
2- How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?
There were 52,255 five-star reviews in total.
3- What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?
Around 50% of Vine reviews were 5 stars.
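The three questions above can be answered with a single grouped query. The sketch below runs the SQL against an in-memory SQLite table so that it is self-contained; the table name `vine_table`, the column names `vine` and `star_rating`, and the six sample rows are assumptions for illustration only and are not the figures from this dataset. The same `SELECT` should also run in pgAdmin against PostgreSQL.

```python
import sqlite3

# Counts and five-star share per Vine status ('Y' or 'N').
VINE_SUMMARY_SQL = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews,
       ROUND(100.0 * SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS five_star_pct
FROM vine_table
GROUP BY vine
ORDER BY vine;
"""


def vine_summary(connection: sqlite3.Connection) -> list:
    """Return one (vine, total, five_star, five_star_pct) row per Vine status."""
    return connection.execute(VINE_SUMMARY_SQL).fetchall()


# Made-up sample rows, only to show the shape of the result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vine_table (vine TEXT, star_rating INTEGER)")
conn.executemany(
    "INSERT INTO vine_table VALUES (?, ?)",
    [("Y", 5), ("Y", 4), ("N", 5), ("N", 5), ("N", 3), ("N", 1)],
)

summary = vine_summary(conn)
for row in summary:
    print(row)
```

Each output row gives, for one Vine status, the number of reviews, the number of 5-star reviews, and the 5-star percentage, which maps directly onto questions 1 to 3.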
1- Based on the results, there seems to be no bias.
2- There does not appear to be any evident positive bias for reviews in the Vine program. Statistical tests, for example in R, could be run to support this statement.
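One test that could support the statement above is a two-proportion z-test comparing the 5-star share of Vine and non-Vine reviews. The sketch below uses Python rather than R to stay consistent with the rest of the module, and the function name `two_proportion_z_test` and the counts in the example call are hypothetical, not results from this dataset.

```python
import math


def two_proportion_z_test(success_a: int, total_a: int,
                          success_b: int, total_b: int) -> tuple:
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Uses the pooled standard error, which is the
    usual choice under the null hypothesis of equal proportions.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("proportions are all 0 or all 1; the test is undefined")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 Vine reviews and 40 of 100 non-Vine
# reviews with 5 stars.
z, p_value = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A large p-value would be consistent with the conclusion that Vine reviews are not skewed toward 5 stars; a small one would call that conclusion into question. With the actual counts from the dataset substituted in, this gives a quick check before moving to a fuller analysis.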