Research Questions

For our final project, we chose to work with a 2019 NYC Airbnb dataset, which provides extensive pricing, location, and listing data for Airbnb rentals throughout the popular areas of New York City. The dataset allows us to examine listing activity by host, geographical availability, and pricing history, all of which we use in our analysis.
Our main goal was to understand how these listing metrics impact the pricing and popularity of Airbnb rentals and how they differ across areas of New York City. More specifically, we aimed to answer how price is impacted by listing metrics such as neighborhood and room type. We also aimed to understand how the availability of these listings is impacted by the same metrics (location and room type) and, on a more abstract level, which hosts are the busiest and why.

Analysis

We start with a few exploratory plots of our key variables: the univariate distributions of price, room type, and neighborhood group. This lets us understand each variable on its own before we move on to the more involved analyses.

The above visualization shows that our price variable is extremely right skewed, with the vast majority of Airbnbs having a rental price below $1000 and a few outliers priced above $1000. Next, we see that the most prevalent listings are private rooms and entire homes or apartments. This makes sense contextually: Airbnb allows homeowners to rent out their homes to visitors and tourists, and a shared room may not appeal to most visitors. Lastly, we see that Brooklyn and Manhattan are the neighborhood groups with the most listings, which again makes sense contextually, as these are the most visited areas of New York City.
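
The direction of skew in a price distribution can be checked numerically by comparing the mean with the median and computing a skewness coefficient. The sketch below is only an illustration: the prices are made-up values, not rows from the dataset, and the `skewness` helper is our own. A mean above the median together with a positive coefficient indicates a long right tail, which is the shape produced when most prices are low and a few are very high.

```python
from statistics import mean, median, pstdev

# Hypothetical nightly prices in dollars (illustrative only, not from the dataset):
# most values are modest, with one very expensive outlier.
prices = [60, 75, 90, 100, 120, 150, 180, 200, 250, 2500]


def skewness(values):
    """Population skewness: the average cubed z-score of the values."""
    mu = mean(values)
    sigma = pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


price_mean = mean(prices)
price_median = median(prices)
price_skew = skewness(prices)

# A mean well above the median and a positive skewness both point to a right tail.
print(f"mean={price_mean:.1f}, median={price_median:.1f}, skewness={price_skew:.2f}")
```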

We are now interested in the neighborhood groups in particular and, more specifically, in which features of each neighborhood group appeal to a potential customer. This may relate to how price varies across neighborhood groups. One way to explore this is to analyze the listing names on Airbnb, which often contain descriptive features. Below are word clouds that show the most common terms for each neighborhood group.

Wordclouds of Listing Names per Neighbourhood Group

We can see that these listing names include descriptive adjectives, promised features, and other information, which makes them particularly useful for seeing what guests value in each neighborhood group. One notable observation is that many groups have Manhattan and NYC as frequent terms, suggesting that proximity to Manhattan and the main city is a desirable trait for many guests. In addition, each group has distinctive words, such as Yankee and Stadium for the Bronx, airport for Queens, park for Manhattan, garden for Brooklyn, and beach for Staten Island. Manhattan has luxury and sunny as popular adjectives for its listings, while Staten Island has private and cozy as more prominent words.
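
The term frequencies behind a word cloud can be sketched as a per-group word count. The example below is a minimal illustration, not the code used to produce the word clouds: the listing names are invented, and the stop-word list and the `tokenize` helper are assumptions of ours.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (neighbourhood group, listing name) pairs; illustrative only.
listings = [
    ("Manhattan", "Sunny luxury studio near Central Park"),
    ("Manhattan", "Luxury loft, sunny and close to the park"),
    ("Brooklyn", "Cozy garden apartment in Brooklyn"),
    ("Brooklyn", "Private room with garden view"),
    ("Queens", "Quiet room near airport"),
]

# Small assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "and", "in", "near", "of", "the", "to", "with"}


def tokenize(name):
    """Lowercase a listing name and keep its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", name.lower()) if w not in STOP_WORDS]


# One Counter of term frequencies per neighbourhood group.
term_counts = defaultdict(Counter)
for group, name in listings:
    term_counts[group].update(tokenize(name))

for group, counts in term_counts.items():
    print(group, counts.most_common(3))
```

A word cloud then simply scales each term by its count within the group.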

Another key variable to explore is the number of reviews a listing has, which may help indicate the popularity and demand for that property. Below is a histogram that plots the frequency of the number of reviews by neighborhood group, conditional on the room type. This may tell us more about the popularity and demand for listings in different neighborhood groups and of different room types.

Frequency of Number of Reviews by Neighbourhood Group on Room Type

Based on this visualization, we can see that most listings have between 0 and 100 reviews in all 5 neighborhood groups. Manhattan and Brooklyn have the highest number of reviews, which is expected since they are the largest neighborhood groups, while Staten Island and the Bronx have the fewest. Finally, entire homes and apartments have the highest number of reviews in Manhattan, but private rooms have the highest number of reviews in all other neighborhood groups. This suggests that Manhattan and Brooklyn are the most popular neighborhood groups, and that private rooms and entire homes and apartments are the room types most in demand.
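
The counts underlying such a conditional histogram can be sketched by binning the number of reviews and counting listings per neighborhood group, room type, and bin. This is an illustration under stated assumptions: the rows are invented, and the bin width of 100 is our choice, made to line up with the 0 to 100 range discussed above.

```python
from collections import Counter

# Hypothetical rows: (neighbourhood group, room type, number of reviews).
rows = [
    ("Manhattan", "Entire home/apt", 12),
    ("Manhattan", "Entire home/apt", 140),
    ("Manhattan", "Private room", 35),
    ("Brooklyn", "Private room", 8),
    ("Brooklyn", "Private room", 210),
    ("Brooklyn", "Entire home/apt", 99),
    ("Bronx", "Private room", 3),
]

BIN_WIDTH = 100  # assumed bin width


def review_bin(n_reviews):
    """Return the lower edge of the bin that contains n_reviews."""
    return (n_reviews // BIN_WIDTH) * BIN_WIDTH


# Count listings per (group, room type, bin), as a conditional histogram would.
histogram = Counter((group, room, review_bin(n)) for group, room, n in rows)

for (group, room, low), count in sorted(histogram.items()):
    print(f"{group:10s} {room:16s} {low:3d}-{low + BIN_WIDTH - 1}: {count}")
```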

Moving on to our research questions, we want to understand how pricing is affected by listing metrics such as room type and neighborhood group, which we explored earlier. In this visualization, we looked at a conditional histogram of price by room type, and we faceted the histograms so that we could examine each neighborhood group separately.

Price vs room type and neighborhood

In the above plot, we examined the conditional distribution of price given room type and neighborhood, restricted to listings with lower prices (removing outliers that would otherwise compress the visualization). We can see that entire homes tend to be more expensive per night than private rooms, and this trend is consistent across each neighborhood that we looked at. As noted earlier, shared rooms are very uncommon, but when they do appear they are usually the cheapest option, with prices under $100. For each neighborhood, the most common listings fall in the range of $100 to $300 per night, with private rooms being somewhat cheaper (in the range of $0 to $200) and entire homes being pricier and more luxurious (in the range of $100 to $1000).
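
The same comparison can be summarized numerically by dropping high-priced outliers and taking the median price within each combination of neighborhood and room type. The sketch below uses invented rows, and the cutoff of $1000 is an assumption of ours, since the text only states that lower-priced listings were kept.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (neighbourhood group, room type, price in dollars per night).
rows = [
    ("Manhattan", "Entire home/apt", 220),
    ("Manhattan", "Entire home/apt", 300),
    ("Manhattan", "Entire home/apt", 5000),  # outlier, dropped by the cutoff below
    ("Manhattan", "Private room", 110),
    ("Manhattan", "Shared room", 60),
    ("Brooklyn", "Entire home/apt", 160),
    ("Brooklyn", "Private room", 70),
    ("Brooklyn", "Private room", 90),
]

PRICE_CUTOFF = 1000  # assumed cutoff for "lower-priced" listings

# Collect the prices that survive the cutoff for each (group, room type) cell.
prices_by_cell = defaultdict(list)
for group, room, price in rows:
    if price < PRICE_CUTOFF:
        prices_by_cell[(group, room)].append(price)

# The median is used because it is less sensitive to any remaining high prices.
median_price = {cell: median(values) for cell, values in prices_by_cell.items()}

for (group, room), value in sorted(median_price.items()):
    print(f"{group:10s} {room:16s} median price: ${value}")
```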

Next, we wanted to understand how geographic location impacts the price of these Airbnb listings. To do this, we created a map of New York City on which each listing is plotted as a point.

Expensive listings by neighborhood

In this plot, we looked specifically at the listings on the more expensive side (>= $1000 per night). Location does seem to influence the pricing of these listings, as the majority of the very expensive listings are located in Manhattan and Brooklyn. Looking more closely at the geography, these listings cluster around the main, most popular areas of New York City. This makes sense contextually: the most expensive listings are located in the areas with the most visitors and the most appealing locations.
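
The selection step for such a map can be sketched as a filter on price followed by a count per neighborhood group. The rows and coordinates below are invented for illustration; only the threshold of $1000 per night comes from the text.

```python
from collections import Counter

# Hypothetical rows: (neighbourhood group, latitude, longitude, price per night).
rows = [
    ("Manhattan", 40.758, -73.985, 1500),
    ("Manhattan", 40.731, -73.997, 1000),
    ("Brooklyn", 40.678, -73.944, 2500),
    ("Queens", 40.742, -73.769, 95),
    ("Bronx", 40.837, -73.865, 60),
]

EXPENSIVE_THRESHOLD = 1000  # dollars per night; listings at exactly $1000 are included

# Keep only the expensive listings, then count them per neighbourhood group.
expensive = [row for row in rows if row[3] >= EXPENSIVE_THRESHOLD]
expensive_by_group = Counter(group for group, _lat, _lon, _price in expensive)

# (longitude, latitude) pairs, in the x/y order a plotting library would expect.
points = [(lon, lat) for _group, lat, lon, _price in expensive]

print(expensive_by_group.most_common())
```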

Finally, we want to investigate the relationship between the number of Airbnb listings an owner has and the price per night of a listing. To examine this relationship, we created a scatter plot of the two variables, with points colored by neighborhood group.

Relationship between number of listings owned and listing price

When examining the dataset, we found Airbnb owners with upwards of 100 listings, so we considered that the number of listings an owner has could affect the price of a listing. From the scatter plot, we see a clear presence of price outliers where the number of listings owned is small (< 10). Moreover, the spread of listing prices grows as the number of listings owned decreases. As the number of listings owned increases, the price of a listing does not exceed $1,250. An Airbnb owner with multiple properties is more likely to own several smaller apartments than many large, expensive ones. The negative relationship between the number of listings owned and listing price that we see in the scatter plot therefore aligns with the context of the data as well.
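
A visual impression like this one can be cross-checked with a correlation coefficient. The sketch below computes Pearson's r by hand on invented pairs; it is not part of the original analysis, which relied on the scatter plot alone. Because Pearson's r is sensitive to outliers, a rank-based measure such as Spearman's correlation may be a better fit for data with extreme prices.

```python
from math import sqrt

# Hypothetical (listings owned by the host, nightly price) pairs; illustrative only.
pairs = [(1, 900), (1, 150), (2, 400), (3, 250), (10, 180), (50, 150), (100, 120)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


owned = [n for n, _price in pairs]
price = [p for _n, p in pairs]
r = pearson(owned, price)

# A negative r means price tends to fall as the number of listings owned rises.
print(f"Pearson r = {r:.2f}")
```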

Conclusion

From our investigation of the Airbnb NYC dataset, we found that the neighborhood, room type, and number of listings owned all have a relationship with the price of a listing. We also saw that the number of reviews a listing has reflects demand across neighborhood groups and room types, which may imply an indirect relationship with price. In addition, our analysis of listing names found different desirable traits for each neighborhood group, which may also impact the price. Looking further at neighborhoods, we found that Airbnb listings in Manhattan and Brooklyn tend to be more expensive than listings in the other boroughs. Moreover, entire-home listings are generally more expensive than the other room types (private room or shared room). Interestingly, we also found a negative relationship between the number of listings an owner has and price: the more listings an owner has, the cheaper each listing tends to be.