Used MapReduce, NLP in relation to big data, Google Colab, AWS, and Visual Studio Code to analyze Amazon Vine reviews.

klaudio07/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Module 16 Big Data

Overview of the analysis.

  • Cover what constitutes big data and how it's handled.

  • Use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into the PostgreSQL database managed through pgAdmin.

  • Use PySpark, Pandas, or SQL to determine whether there is any bias toward favorable reviews from Vine members in the dataset.
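
The bias check described in the last bullet comes down to counting reviews and 5-star reviews separately for Vine and non-Vine members, then comparing the two percentages. The following is a minimal sketch in plain Python rather than the PySpark, Pandas, or SQL code used in this repository. It assumes each review is a record with a `vine` flag (`"Y"` or `"N"`) and a `star_rating` field. The function name `summarize_by_vine` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def summarize_by_vine(rows):
    """Count total and 5-star reviews for each value of the vine flag."""
    totals = Counter()
    five_star = Counter()
    for row in rows:
        flag = row["vine"]
        totals[flag] += 1
        if int(row["star_rating"]) == 5:
            five_star[flag] += 1
    # Build one summary entry per flag value seen in the data.
    return {
        flag: {
            "total": totals[flag],
            "five_star": five_star[flag],
            "pct_five_star": 100.0 * five_star[flag] / totals[flag],
        }
        for flag in totals
    }


# Hypothetical sample rows for illustration only, not real review data.
sample_rows = [
    {"vine": "Y", "star_rating": "5"},
    {"vine": "Y", "star_rating": "3"},
    {"vine": "N", "star_rating": "5"},
    {"vine": "N", "star_rating": "5"},
    {"vine": "N", "star_rating": "1"},
    {"vine": "N", "star_rating": "4"},
]
summary = summarize_by_vine(sample_rows)
```

Comparing `summary["Y"]["pct_five_star"]` with `summary["N"]["pct_five_star"]` gives the two percentages that the third question in the Results section asks for.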

Results:

1- How many Vine reviews and non-Vine reviews were there?

There were 104,975 reviews in total.

2- How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?

There were 52,255 5-star reviews.

3- What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?

Around 50% of Vine reviews were 5 stars.

Summary:

1- Based on the results, there seems to be no bias.

2- There does not appear to be an evident positive bias for reviews in the Vine program. Using R, we could run additional statistical tests to support this statement.
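
One candidate for such a follow-up test is a two-proportion z-test on the share of 5-star reviews in the Vine and non-Vine groups. The sketch below uses Python instead of R and relies only on the standard library. The function name `two_proportion_z_test` and the counts passed to it are hypothetical placeholders and are not results from this analysis.

```python
import math


def two_proportion_z_test(successes_a, total_a, successes_b, total_b):
    """Return the z statistic and two-sided p-value for H0: p_a == p_b."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only: 5-star and total reviews per group.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
```

A large p-value would be consistent with the "no evident bias" reading above, while a small one would suggest that the 5-star rates of the two groups differ.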
