Collect OpenTable reviews with Web Scraping
This repo provides a Python script for scraping OpenTable reviews.
Each review contains four user ratings. The main script collects every review of the target restaurants together with its overall rating.
Sample format:
       review                                              overall rating
0      Great ambiance and service. Lots of menu choic...   3
1      Exceptional service, cuisine, ambience. Windo...    4
2      Our server Darcy was wonderful! She accommoda...    2
3      Great food choices for lunch and excellent ser...   3
4      Always reliable and great place to go for lunc...   4
...    ...                                                 ...
13438  Our first visit to Chophouse. We will not go b...   3
13439  Friendly and attentive service and the food an...   4
13440  My family and I had an amazing time! Not only ...   4
13441  Great food                                          4
13442  Great food and excellent service. We’ll be back!!   4
- Find the target restaurant's page on OpenTable and go to page 2 of its reviews.
- Copy the URL of review page 2, e.g. https://www.opentable.ca/r/chez-mal-manchester?page=2&sortBy=newestReview
- Place the URLs in url_list for the training dataset and in eval_url for the validation dataset.
- Run the main script. It saves the resulting DataFrame as a CSV file with two columns: review and overall rating.
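
The steps above ask for the page-2 URL, presumably because that URL carries an explicit page query parameter. As an illustration only (this is not the repo's main script), the sketch below shows how such a URL could be turned into URLs for other review pages using the standard library. The helper name page_urls and the last_page argument are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_urls(page2_url, last_page):
    """Build review-page URLs for pages 1..last_page from a copied page-2 URL.

    Hypothetical helper: assumes the page number lives in the "page"
    query parameter, as in the example URL from the steps above.
    """
    parts = urlsplit(page2_url)
    query = parse_qs(parts.query)
    urls = []
    for page in range(1, last_page + 1):
        # Overwrite only the page number; other parameters (e.g. sortBy) are kept.
        query["page"] = [str(page)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


urls = page_urls(
    "https://www.opentable.ca/r/chez-mal-manchester?page=2&sortBy=newestReview",
    3,
)
for url in urls:
    print(url)
```

Only the page value changes between the generated URLs, so the sort order chosen in the copied URL is preserved.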
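You can use the function below to read the .csv file into a Hugging Face Dataset:

    import pandas as pd
    from datasets import Dataset

    def load_data(path, name):
        # Read the scraped CSV and rename its columns to the usual text/label names
        df = pd.read_csv(path)
        df = df.rename(columns={'review': 'text', 'overall rating': 'label'})
        # Convert the DataFrame to a Hugging Face Dataset tagged with the split name
        dataset = Dataset.from_pandas(df, split=name)
        return dataset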
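
Before loading the file into a Dataset, it can be useful to check how the overall ratings are distributed. The following is a minimal sketch using only the standard library. It assumes the CSV header uses the column names review and overall rating described above. The helper name rating_counts is hypothetical, and the in-memory sample reuses two rows from the sample output shown earlier.

```python
import csv
import io
from collections import Counter


def rating_counts(csv_file):
    """Count how often each overall rating occurs in an open CSV file.

    Hypothetical helper: assumes a header row with an "overall rating" column.
    """
    reader = csv.DictReader(csv_file)
    return Counter(int(row["overall rating"]) for row in reader)


# In-memory stand-in for the saved CSV, using two rows from the sample output.
sample = io.StringIO(
    "review,overall rating\n"
    "Great food,4\n"
    "Great food and excellent service. We’ll be back!!,4\n"
)
counts = rating_counts(sample)
print(counts)
```

For the real file, open it with open(path, newline="", encoding="utf-8") and pass the file object to rating_counts, where path is wherever the main script saved the CSV.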
For more examples of loading data into a Dataset, please refer to:
https://huggingface.co/docs/datasets/main/en/loading#pandas-dataframe
https://huggingface.co/docs/datasets/main/en/tabular_load