To run the code, the following libraries are required:
- numpy
- seaborn
- pandas
- matplotlib
- spacy
- nltk
- wordcloud
- pickle
- PIL
- gensim
- pyLDAvis
- tabulate
- re
- sklearn
- vaderSentiment
- langdetect
- deep_translator
- textblob
The analysis was run in JupyterLab using Python 3.9.
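Before opening the notebooks, it can help to confirm that the environment is set up. The sketch below checks the Python version and whether each library in the list above can be imported, and suggests pip package names for anything missing. The import-name-to-pip-name mapping (for example `sklearn` is installed as `scikit-learn` and `PIL` as `Pillow`) is an assumption added here, not part of the original project; `pickle` and `re` ship with Python and need no installation.

```python
import importlib.util
import sys

# Import names from the requirements list, mapped to the pip distribution
# that usually provides them (assumed mapping; None = Python standard library).
REQUIRED = {
    "numpy": "numpy",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "spacy": "spacy",
    "nltk": "nltk",
    "wordcloud": "wordcloud",
    "pickle": None,
    "PIL": "Pillow",
    "gensim": "gensim",
    "pyLDAvis": "pyLDAvis",
    "tabulate": "tabulate",
    "re": None,
    "sklearn": "scikit-learn",
    "vaderSentiment": "vaderSentiment",
    "langdetect": "langdetect",
    "deep_translator": "deep-translator",
    "textblob": "textblob",
}


def missing_modules(names):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README states the analysis was run with Python 3.9.
    if sys.version_info < (3, 9):
        print(f"Warning: found Python {sys.version.split()[0]}, analysis used 3.9")
    missing = missing_modules(REQUIRED)
    if missing:
        packages = sorted({REQUIRED[name] for name in missing if REQUIRED[name]})
        print("Missing:", ", ".join(missing))
        print("Try: pip install " + " ".join(packages))
    else:
        print("All required libraries are importable.")
```

`importlib.util.find_spec` locates a module without importing it, so the check stays fast even for heavy libraries such as spaCy or gensim.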
For this project, I chose the Airbnb dataset for the city of Florence. Specifically, I attempted to answer the following questions by applying popular Natural Language Processing (NLP) techniques to the review data:
- How do guests experience their stays in Airbnb listings in Florence?
- What are the main topics in guest reviews?
- How can the topic of a new review best be predicted?
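One common way to approach the first question is to score the sentiment of each review; `vaderSentiment` appears in the requirements for this kind of task. The following is a minimal sketch, not necessarily what the sentiment notebook does: it maps VADER's compound score to a label using the conventional thresholds from the VADER documentation (at least 0.05 is positive, at most -0.05 is negative, anything in between is neutral). The two example reviews are made up for illustration. VADER's lexicon is English, so non-English reviews would need to be detected or translated first (langdetect and deep_translator in the list above could serve that purpose).

```python
# Conventional compound-score thresholds from the VADER documentation.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_sentiment(compound):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def main():
    try:
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    except ImportError:
        print("vaderSentiment is not installed; try: pip install vaderSentiment")
        return

    analyzer = SentimentIntensityAnalyzer()
    # Hypothetical example reviews, for illustration only.
    reviews = [
        "Lovely apartment, five minutes from the Duomo. The host was very helpful!",
        "The room was noisy and the check-in was a mess.",
    ]
    for text in reviews:
        # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
        compound = analyzer.polarity_scores(text)["compound"]
        print(f"{label_sentiment(compound):8} {compound:+.3f}  {text}")


if __name__ == "__main__":
    main()
```

Keeping the thresholding in a separate function makes it easy to try other cut-offs, or to swap in TextBlob's polarity score (also in the requirements) for comparison.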
Five notebooks address the questions above. Each notebook explores the data and applies the machine learning techniques named in its title, and Markdown cells guide you through each step of the process.
- Data_Cleaning_Reviews_AirbnbFlorence.ipynb
- Data_Cleaning_Listings_AirbnbFlorence.ipynb
- Sentiment_Analysis_AirbnbFlorence.ipynb
- Topic_Modeling_AirbnbFlorence.ipynb
- Topic_Classification_Airbnb_Florence.ipynb
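The notebooks are meant to be explored interactively in JupyterLab, but they can also be executed headless. The sketch below runs them in the order listed above with `jupyter nbconvert --execute`. The order is an assumption: it presumes the two cleaning notebooks produce the data that the analysis notebooks read. The helper names `nbconvert_command` and `run_all` are hypothetical.

```python
import subprocess

# Assumed run order: cleaning notebooks first, then the analysis notebooks.
NOTEBOOKS = [
    "Data_Cleaning_Reviews_AirbnbFlorence.ipynb",
    "Data_Cleaning_Listings_AirbnbFlorence.ipynb",
    "Sentiment_Analysis_AirbnbFlorence.ipynb",
    "Topic_Modeling_AirbnbFlorence.ipynb",
    "Topic_Classification_Airbnb_Florence.ipynb",
]


def nbconvert_command(notebook):
    """Build a command that executes a notebook and saves outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in turn (requires jupyter on the PATH)."""
    for notebook in notebooks:
        # check=True raises CalledProcessError at the first failing notebook,
        # so later notebooks never run on missing or stale inputs.
        subprocess.run(nbconvert_command(notebook), check=True)
```

Call `run_all()` from the folder that contains the notebooks. Because `--inplace` overwrites each notebook with its executed version, consider working on a copy.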
The main findings of the analysis are discussed in the blog post "How To Get Useful Insights From Airbnb Reviews" available here.
Publicly available data from http://insideairbnb.com/get-the-data.html (the data used was compiled on 12 July 2021).
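The review data on that page is distributed as compressed CSV files. As a minimal sketch, assuming the detailed reviews file (for example `reviews.csv.gz`) with `listing_id` and `comments` columns, the comments can be streamed with only the standard library; `load_review_comments` is a hypothetical helper, not a function from the notebooks. With pandas, `pandas.read_csv` also reads `.gz` files directly, since it infers compression from the file extension by default.

```python
import csv
import gzip


def load_review_comments(path):
    """Yield (listing_id, comment) pairs from an Inside Airbnb reviews file.

    Assumes 'listing_id' and 'comments' columns; empty comments are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # newline="" lets the csv module handle line breaks inside quoted comments.
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            comment = (row.get("comments") or "").strip()
            if comment:
                yield row["listing_id"], comment
```

Streaming rows this way keeps memory use low, which can matter for cities with hundreds of thousands of reviews.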