Our research dataset comes from Zomato and contains food reviews for restaurants in the Bengaluru (Bangalore) area. The data contain approximately 50,000 reviews of over 12,000 restaurants. According to the dataset description, a market this crowded has led to more competitive costs and to shortages. We examine what customers in Bengaluru are looking for in their food and propose three main questions to better understand the preferences of restaurant-goers.
Our main research questions are:
Are ratings affected by restaurant types or by accessibility of food, and how are ratings and votes distributed in our dataset?
How does cost affect ratings in highly rated restaurants and in poorly rated restaurants?
What words are commonly used in positive reviews? What words are commonly used in negative reviews? Does rating affect the positive or negative words used in reviews?
Question: How are ratings and votes distributed in the data, and what does that tell us about customer preferences in Bangalore for restaurant type and accessibility?
First, we examine histograms of the variables rating and log(votes) to check their distributions before investigating relationships with those variables graphically. The histogram of rating looks roughly normal, with a median around 3.7. The histogram of votes is strongly right-skewed, so we apply a log transformation to bring it closer to normal. The transformed distribution still departs somewhat from normality, but it is much closer than the raw counts. We take away that these variables are suitable for further analysis of customer preferences based on ratings.
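As a rough illustration of why the log transformation helps, the sketch below compares sample skewness before and after the transform. The vote counts are simulated (a lognormal draw with arbitrary parameters), not taken from the Zomato data, and `log1p` is used instead of a plain log only so that a count of zero stays defined; both choices are assumptions made for this example.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated, right-skewed vote counts (hypothetical, NOT the real data).
rng = random.Random(0)
votes = [int(rng.lognormvariate(4.0, 1.5)) for _ in range(2000)]

# log1p(v) = log(1 + v), so restaurants with 0 votes do not break the transform.
log_votes = [math.log1p(v) for v in votes]

skew_raw = skewness(votes)
skew_log = skewness(log_votes)
print(f"skewness of votes:      {skew_raw:.2f}")
print(f"skewness of log(votes): {skew_log:.2f}")
```

A skewness near zero after the transform is consistent with a more symmetric histogram, although it does not by itself establish normality.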
To better understand what drives customer ratings, we looked at the binary variable online_order and the categorical variable listed_in.type. The graphs above use these variables to compare highly rated restaurants with lower rated ones. Among places that offer online ordering, dine-out and delivery places generally account for the highest percentage of ratings, but the share of pubs and bars increases noticeably at higher ratings. The other graph plots ratings against votes: the places with the most votes had relatively high ratings and fell in the drinks & nightlife category. Dine-out and pubs and bars were also heavily reviewed categories.
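One way to tabulate this kind of comparison is to compute, for each combination of online_order and listed_in.type, the share of restaurants rated above 4. The sketch below is a minimal version of that idea; the rows are made up for illustration, and the column name `rating` and the helper `share_high_rated` are assumptions rather than names from our analysis.

```python
from collections import defaultdict

def share_high_rated(rows, threshold=4.0):
    """Share of restaurants rated above `threshold` per (online_order, type)."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        key = (row["online_order"], row["listed_in.type"])
        totals[key] += 1
        if row["rating"] > threshold:
            high[key] += 1
    return {key: high[key] / totals[key] for key in totals}

# Hypothetical rows, NOT taken from the Zomato data.
rows = [
    {"online_order": "Yes", "listed_in.type": "Delivery", "rating": 4.2},
    {"online_order": "Yes", "listed_in.type": "Delivery", "rating": 3.5},
    {"online_order": "Yes", "listed_in.type": "Pubs and bars", "rating": 4.4},
    {"online_order": "No", "listed_in.type": "Dine-out", "rating": 3.9},
    {"online_order": "No", "listed_in.type": "Dine-out", "rating": 4.1},
    {"online_order": "No", "listed_in.type": "Dine-out", "rating": 4.3},
    {"online_order": "No", "listed_in.type": "Dine-out", "rating": 3.0},
]

shares = share_high_rated(rows)
for key, share in sorted(shares.items()):
    print(key, f"{share:.0%}")
```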
There seems to be a preference in Bangalore for relatively higher cost experiences such as dine-out and drinks & nightlife. We investigate these relationships further in the research questions that follow.
How do restaurants in Bangalore with higher ratings differ with respect to approximate cost for two people and number of rating votes as compared to restaurants with lower ratings?
In order to answer this question, we produced a contour plot between log(votes) and log(cost) that is colored by rating. The contour lines show the joint distribution between log(votes) and log(cost), and darker shaded points mean that there are more observations of data at that point in the scatterplot.
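The joint distribution that the contour lines summarize can be approximated by counting observations in a grid over log(votes) and log(cost). The sketch below does this with unit-width square bins; the (votes, cost) pairs are invented for illustration, natural logs are assumed, and `bin_2d` is a hypothetical helper, not part of our plotting code.

```python
import math
from collections import Counter

def bin_2d(points, width=1.0):
    """Count points falling in each square bin of side `width`."""
    counts = Counter()
    for x, y in points:
        counts[(math.floor(x / width), math.floor(y / width))] += 1
    return counts

# Hypothetical (votes, cost for two) pairs, NOT taken from the Zomato data.
restaurants = [(60, 450), (70, 500), (80, 420), (5, 300), (900, 1500)]

# Natural log is an assumption of this example.
log_points = [(math.log(v), math.log(c)) for v, c in restaurants]

density = bin_2d(log_points)
densest_bin, n_obs = density.most_common(1)[0]
print("densest bin (log votes, log cost):", densest_bin, "with", n_obs, "points")
```

Bins with higher counts correspond to the regions that a contour plot encloses with its innermost lines.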
There are a few interesting takeaways from this visualization. The first is that lower rated restaurants (below 3) have a higher approximate cost for two people than the majority of medium rated restaurants (between 3 and 4). The data are centered around log(votes) = 4 and log(cost) = 6, yet many of the red and orange points on the plot appear above this cost threshold. Interestingly, this is also true for higher rated restaurants (above 4); almost every blue point appears above the threshold of log(cost) = 6.
On top of this, higher rated restaurants received more votes than both medium and low rated restaurants. Furthermore, these higher rated restaurants always have log(votes) above the average log(votes) in the data. It is also intriguing to note that the restaurants with very few votes (log(votes) <= 2, which is about 7 votes or fewer if the log is natural) are very similar in terms of ratings but vary greatly in terms of log(cost).
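To read these log-scale thresholds in original units, they can be exponentiated. The short sketch below assumes the plot uses natural logarithms, which the text does not state explicitly; if base 10 were used, the values would differ.

```python
import math

# Thresholds mentioned in the text, on the log scale.
thresholds = {"log(votes) = 2": 2, "log(votes) = 4": 4, "log(cost) = 6": 6}

# Assuming natural log, exp() maps each threshold back to raw units.
raw_values = {label: math.exp(value) for label, value in thresholds.items()}

for label, raw in raw_values.items():
    print(f"{label}  ->  about {raw:.1f}")
```

Under this assumption, the center of the data corresponds to roughly 55 votes and an approximate cost for two of roughly 400.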
What words are used to describe positive reviews and what words are used to describe negative reviews? What words are used to describe lower rated reviews and what words are used to describe higher rated reviews?
We produced four word clouds to answer these questions: positive words in higher rated reviews, positive words in lower rated reviews, negative words in higher rated reviews, and negative words in lower rated reviews. In each word cloud, larger words near the middle are the ones that appear most often in that grouping of the dataset.
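The word sizes in such clouds come from word counts within each group. The sketch below shows one way to produce the four sets of counts; the tiny word lists, the example reviews, and the rating cutoffs (above 4 for higher rated, below 3 for lower rated) are assumptions for illustration and do not reproduce the lexicon or grouping used for our figures.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (hypothetical, not the ones used for the figures).
POSITIVE = {"good", "great", "tasty"}
NEGATIVE = {"bad", "worst", "cold"}

def word_counts(texts, lexicon):
    """Count how often each lexicon word occurs across the given texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in lexicon:
                counts[word] += 1
    return counts

# Hypothetical (rating, review text) pairs, NOT taken from the Zomato data.
reviews = [
    (4.5, "Good food, good service, great view"),
    (4.2, "Tasty but the soup was cold"),
    (2.1, "Worst biryani ever, bad service, cold food"),
    (2.8, "Good starters but bad mains"),
]

high = [text for rating, text in reviews if rating > 4]
low = [text for rating, text in reviews if rating < 3]

clouds = {
    ("positive", "high"): word_counts(high, POSITIVE),
    ("positive", "low"): word_counts(low, POSITIVE),
    ("negative", "high"): word_counts(high, NEGATIVE),
    ("negative", "low"): word_counts(low, NEGATIVE),
}

for group, counts in clouds.items():
    print(group, counts.most_common(3))
```

Each `Counter` could then be passed to a word cloud renderer, where a higher count means a larger word.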
The main takeaway from the positive words is that ‘good’ dominates, and the main takeaway from the negative words is that ‘bad’ dominates. However, among the positive words, ‘good’ has much higher representation than the next most used words, whereas the negative words are spread across a wider vocabulary. This may mean that positive reviews are typically written with less thought and simply describe the restaurant as ‘good’, while negative reviews are written more thoughtfully, using more varied words to describe the restaurant.
There is very little difference between higher rated reviews and lower rated reviews. One difference is that the word ‘worst’ appears more often among the negative words in lower rated reviews. This is plausible: calling a low rated restaurant the ‘worst’ is a stronger judgment than calling it ‘bad’.
From our research, there appears to be strong competition among the restaurants in Bengaluru. As the dataset description notes, the restaurant market is oversaturated, and labor shortages and high competition are major challenges for the industry. This appears to have driven up the average cost at highly rated places. In addition, customers have pronounced preferences, and the types of reviews they give differ in many ways. Overall, there are visually apparent differences in restaurant preferences throughout Bangalore.