Introduction to the Data Set
The dataset we are looking at describes Airbnb listings in New York City. It has about 48.9k rows and 16 columns, and each observation is an individual Airbnb listing in NYC. The categorical variables are the neighborhood group (the borough, such as Manhattan or Brooklyn), the specific neighborhood, and the room type (entire home/apartment, private room, or shared room). The quantitative variables are the latitude/longitude coordinates, the price, the minimum night stay, the number of reviews, the number of listings per host, the date of the last review, and the number of days out of 365 that the listing is available for booking. Other variables include the host and listing IDs and the listing and host names.
Main Research Questions
For this project, we (Akshara Ramakrishnan, Grace Cui, Iris Pei) decided to investigate three main questions:
How do the prices of New York Airbnbs depend on their location? Are Airbnbs generally more expensive in certain geographical areas?
What factors affect the availability of NYC Airbnbs? Do price, room type, and the types of stays offered affect how often NYC Airbnbs are available?
How does the target customer affect the way owners advertise their rentals? Do listings use specific keywords to attract certain types of customers?
Graph One (Question One)

Graph One: For our first research question, we made a faceted, stacked histogram. It is faceted by room type and colored by neighborhood group, and we added a new variable, "price_ranges", that groups nightly prices into ranges so we could see more clearly where Airbnb prices generally fall. The graph shows that most Airbnbs in this dataset are entire homes/apartments and very few are shared rooms. Most Airbnbs fall into the $0-$200 per night range, and most are located in Manhattan and Brooklyn. For entire homes/apartments, the distribution is roughly bimodal. For private rooms, the distribution is right skewed and peaks in the $50-$100 per night range. Overall, the graph shows that (a) most Airbnbs are located in Manhattan or Brooklyn and (b) many of the more expensive Airbnbs are located in Manhattan.
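The binning behind a variable like "price_ranges", and the counts a stacked, faceted histogram draws, can be sketched as follows. This is a minimal sketch, not the code we used: the $50-wide bins and the single open-ended "$500+" bin are assumptions, as are the hypothetical names `price_range`, `stacked_counts`, `BIN_WIDTH`, and `PRICE_CAP`.

```python
from collections import Counter

# Assumed bin scheme: $50-wide bins, with every price at or above the cap
# lumped into one open-ended bin. The exact boundaries of "price_ranges"
# are not stated in the report.
BIN_WIDTH = 50
PRICE_CAP = 500


def price_range(price: float) -> str:
    """Return a label such as "$50-$100" for a nightly price."""
    if price < 0:
        raise ValueError("price must be non-negative")
    if price >= PRICE_CAP:
        return f"${PRICE_CAP}+"
    low = int(price // BIN_WIDTH) * BIN_WIDTH
    return f"${low}-${low + BIN_WIDTH}"


def stacked_counts(listings):
    """Count listings per (room_type, price_range, neighbourhood_group).

    Each facet is one room type, each bar is one price range, and each
    stacked segment within a bar is one neighbourhood group.
    `listings` is an iterable of (room_type, neighbourhood_group, price).
    """
    return Counter(
        (room, price_range(price), group) for room, group, price in listings
    )


# Toy rows for illustration only; they are not taken from the dataset.
toy = [
    ("Private room", "Brooklyn", 75),
    ("Private room", "Manhattan", 80),
    ("Entire home/apt", "Manhattan", 225),
]
print(stacked_counts(toy))
```

Using string labels rather than numeric bin edges keeps the bars readable on the chart's axis; a plotting library would then draw one bar per label and stack the neighborhood-group counts.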
Graph Two (Question One)

Graph Two: This graph gives a geographical view of our first question, which asks whether NYC Airbnb prices depend on location. Most Airbnbs are priced in the $0-$500 per night range, and listings are spread fairly evenly across New York City. Very few Airbnbs anywhere in NYC fall into the $500+ per night range. There are, however, some slightly redder (that is, more expensive) clusters in and around Manhattan.
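The map answers the location question visually; a numeric cross-check is to compare a typical nightly price per neighborhood group. The sketch below is an illustration under stated assumptions, not part of our analysis: it assumes rows of (neighbourhood_group, price) and uses a hypothetical helper, `median_price_by_group`. The median is used rather than the mean so that the few $500+ listings do not dominate a group's summary.

```python
from collections import defaultdict
from statistics import median


def median_price_by_group(rows):
    """Return {neighbourhood_group: median nightly price}.

    `rows` is an iterable of (neighbourhood_group, price) pairs. The median
    is robust to a handful of very expensive listings, unlike the mean.
    """
    prices = defaultdict(list)
    for group, price in rows:
        prices[group].append(price)
    return {group: median(values) for group, values in prices.items()}


# Toy rows for illustration only; they are not real listings.
toy = [
    ("Manhattan", 150), ("Manhattan", 900), ("Manhattan", 120),
    ("Brooklyn", 90), ("Brooklyn", 110),
]
print(median_price_by_group(toy))
```

In the toy rows, the single $900 listing would pull Manhattan's mean well above its median, which is the distortion the median avoids.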
Graph Three (Question Two)

Graph Three: According to this graph, most shared-room Airbnbs have a low minimum night requirement, while entire homes/apartments can have much higher minimum night requirements. It appears that more private Airbnbs set higher minimum night requirements, possibly to ensure they are booked for most of their available days. Shared rooms may be able to keep lower minimum night requirements because guests turn over more frequently. It's interesting that some Airbnbs require minimum stays of multiple years; isn't that essentially a long-term rental?
Graph Four (Question Two)

Graph Four: This graph shows that most shared-room Airbnbs have either high availability (more than 300 days per year) or low availability (fewer than 100 days per year). Overall, shared rooms are available for booking on more days of the year than the other room types, while entire homes/apartments and private rooms are available for similar numbers of days. The availability distributions of all three room types are right skewed.
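The high/low availability split and the right-skew observation can both be checked numerically. The sketch below is an assumption-laden illustration, not our method: `availability_band` uses the 300-day and 100-day thresholds from the text, and `skewness` computes the population (Fisher-Pearson) moment coefficient, where a positive value indicates a right skew. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def availability_band(days: int) -> str:
    """Bucket an availability_365 value using the thresholds in the text."""
    if days > 300:
        return "high"
    if days < 100:
        return "low"
    return "medium"


def skewness(values):
    """Population skewness: mean cubed deviation divided by sd**3.

    Positive values indicate a right-skewed distribution. Note that this
    single number does not capture a split into high and low groups.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0
    return sum((x - m) ** 3 for x in values) / (len(values) * sd ** 3)


# Toy availability values for illustration only.
toy_days = [0, 0, 10, 45, 365]
print([availability_band(d) for d in toy_days])
print(skewness(toy_days))
```

Because a distribution with many low values and a few high ones has a long right tail, its skewness comes out positive, which is the pattern the graph describes.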
Graph Five (Question Three)

Graph Five: First, we created a word cloud of all the words that owners used in their listings. Apart from expected words such as "room", "bedroom", and "private", listers tended to use many adjectives. Some that stood out to us were "cozy", "spacious", and "sunny". Other common words describe the location, nearby attractions, and amenities. Overall, we find that listers try to grab renters' attention with descriptors that make their place sound as homey as possible.
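A word cloud sizes each word by how often it appears, so the step underneath it is a word-frequency count. This is a minimal sketch, assuming the listing text is available as plain strings; the tiny `STOPWORDS` set and the `word_frequencies` helper are hypothetical and not the ones used for the graph.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; the report does not say
# which stopwords were removed before drawing the word cloud.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "near"}


def word_frequencies(texts):
    """Count lowercase words across listing texts, skipping stopwords.

    A word cloud draws each word at a size proportional to counts like these.
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


# Toy listing texts for illustration only.
toy = ["Cozy sunny room in Brooklyn", "Spacious private room near the park"]
print(word_frequencies(toy).most_common(3))
```

Removing stopwords matters here: without it, filler words such as "in" and "the" would be among the largest words in the cloud.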
Graph Six (Question Three)

Graph Six: We then created side-by-side word clouds for private rooms, shared rooms, and entire homes, respectively. The descriptions all seem to state the type of housing quite explicitly: for example, the most common words for private rooms are "private" and "room". One interesting feature of these side-by-side word clouds is that private and shared rooms are much more likely to use the word "cozy", while entire apartments are more likely to use "spacious". In terms of location, many shared-room listings mention Manhattan or Brooklyn, while many entire-apartment listings mention Brooklyn or Williamsburg.
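Because the room types have very different numbers of listings (there are far more entire homes than shared rooms), one way to compare "cozy" and "spacious" across them is by the share of listings that use each word rather than by raw counts. This is a complementary check we sketch here under stated assumptions, not how the word clouds were made: it assumes (room_type, text) pairs, and `keyword_share` is a hypothetical helper.

```python
import re
from collections import defaultdict


def keyword_share(listings, keywords=("cozy", "spacious")):
    """For each room type, return the fraction of listings mentioning each keyword.

    `listings` is an iterable of (room_type, text) pairs. Fractions are
    used instead of raw counts because room types differ greatly in size.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for room, text in listings:
        words = set(re.findall(r"[a-z]+", text.lower()))
        totals[room] += 1
        for kw in keywords:
            if kw in words:
                hits[room][kw] += 1
    return {
        room: {kw: hits[room][kw] / totals[room] for kw in keywords}
        for room in totals
    }


# Toy listings for illustration only.
toy = [
    ("Private room", "Cozy room in Brooklyn"),
    ("Private room", "Sunny room"),
    ("Entire home/apt", "Spacious loft in Williamsburg"),
]
print(keyword_share(toy))
```

Each listing is counted at most once per keyword (the words are collected into a set), so the result reads as "what fraction of listings of this type say cozy".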
Conclusion
Overall, we were able to draw some conclusive answers for each of our three research questions. For the first question, how price differs by location, we found that prices vary considerably by neighborhood, with many of the more expensive listings in Manhattan, and that room type also has a large effect on price. For the question of when the rentals were available, we found that shared rooms were available on more days of the year than private rooms or entire apartments, but availability varied widely around the mean for every room type. For our last research question, about how rentals were described, we found that many of the same words appear across all three types of rentals, but the descriptions of rooms emphasize coziness and comfort, while those of entire apartments emphasize spaciousness.