Introduction

Globalization has led to more frequent international travel and an increasing demand for hotel services. It has therefore become increasingly important for the hotel industry to understand the characteristics of its guests and their booking decisions and habits. In this report, we focus on a dataset that records booking demand for two hotels in Portugal: a city hotel in Lisbon and a resort hotel in the beach region of the Algarve. The dataset consists of booking records from July 1, 2015 to August 31, 2017. The objective of this study is to analyze the data to answer the following research questions: 1. What are some of the characteristics of the customers in this dataset, and how, if at all, do they correlate with hotel booking habits? 2. What factors lead customers to prefer one type of hotel over the other (resort versus city)?



Method

We retrieved our dataset, “Hotel Booking Demand”, from kaggle.com. The dataset describes bookings at two hotels in Portugal: a city hotel in Lisbon and a resort hotel in the Algarve. There are 119,390 rows and 32 columns in our dataset: 40,060 rows represent booking records of the resort hotel, 79,330 rows represent booking records of the city hotel, and each column represents a variable. Some of the major variables of interest that we examined for this report include country (country of origin of the guests), distribution_channel (booking distribution channel), is_canceled (whether or not the booking was canceled), stays_in_week_nights (number of weeknights stayed), deposit_type (type of deposit), hotel (hotel type), total_of_special_requests (number of special requests from the guest), and is_repeated_guest (whether the customer was a repeat guest), among others. The analysis was conducted in R using various available packages.
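
The per-hotel row counts above can be checked with a simple tally over the hotel column. The report's analysis was done in R; the sketch below uses Python's standard library purely as an illustration. The three sample rows are made up, and the category labels “Resort Hotel” and “City Hotel” are assumed spellings of the values in the hotel column.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row excerpt, made up for illustration only.
# The real dataset has far more rows and 32 columns.
SAMPLE_CSV = """hotel,is_canceled,deposit_type
Resort Hotel,0,No Deposit
City Hotel,1,Non Refund
City Hotel,0,No Deposit
"""


def count_bookings_by_hotel(csv_file):
    """Return a Counter mapping each value of the `hotel` column to its row count."""
    reader = csv.DictReader(csv_file)
    return Counter(row["hotel"] for row in reader)


counts = count_bookings_by_hotel(io.StringIO(SAMPLE_CSV))
total = sum(counts.values())

# Print each hotel type with its row count and share of all rows.
for hotel, n in counts.most_common():
    print(f"{hotel}: {n} rows ({n / total:.1%})")
```

To run the same tally on the downloaded file, the function can be given an open file handle instead of the in-memory sample, for example open("hotel_bookings.csv", newline=""), where the file name is an assumption.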



Graphs

To begin our analysis, we trace the distribution of our hotel guests’ countries of origin, since cultural behaviors might dictate or parallel some of the trends and customer characteristics we would like to observe. As can be seen in Graph 1, the largest number of guests seems to come from Western Europe, followed by the US, Brazil, China, and Australia. We also noticed that for a few countries in Africa and Europe, more people booked the resort hotel than the city hotel. This is notable because the dataset contains roughly twice as many city hotel records as resort hotel records.


Graph 2 uses faceted side-by-side boxplots to show the distribution of weeknights stayed by whether the customer canceled, faceted on customer type. From Graph 2, we can see that most customers tend to stay (or book to stay) for fewer than 5 weeknights. For customer types like “Contract” or “Group”, the booked lengths of stay look visibly different between customers who did not cancel and those who did; however, the boxes still overlap, so the plot alone does not show that the difference is statistically significant. We believe this plot is important for our analysis because we are interested in evaluating the decisions made by customers, and both the length of the (booked) stay and whether the booking was canceled are important aspects to consider for each customer type.
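
The center line of each box in such a plot is the group median, so the visual comparison can be backed by a small table of medians per customer type and cancellation status. The following is a minimal Python sketch, not the R code used for the report. The rows are invented for illustration, and customer_type is assumed to be the name of the customer-type column.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (customer_type, is_canceled, stays_in_week_nights) rows,
# made up for illustration; they are not taken from the dataset.
ROWS = [
    ("Contract", 0, 5), ("Contract", 0, 7),
    ("Contract", 1, 2), ("Contract", 1, 3),
    ("Transient", 0, 2), ("Transient", 0, 3),
    ("Transient", 1, 2), ("Transient", 1, 4),
]


def median_weeknights(rows):
    """Return {(customer_type, is_canceled): median weeknights} for each group."""
    groups = defaultdict(list)
    for customer_type, is_canceled, nights in rows:
        groups[(customer_type, is_canceled)].append(nights)
    return {key: median(values) for key, values in groups.items()}


medians = median_weeknights(ROWS)
for (customer_type, is_canceled), value in sorted(medians.items()):
    status = "canceled" if is_canceled else "not canceled"
    print(f"{customer_type:10s} {status:13s} median weeknights = {value}")
```

A table like this is still descriptive. Deciding whether two groups differ significantly would need a formal test, which this sketch does not attempt.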


One of the overall questions we want to answer with our dataset is what causes customers to prefer one type of hotel over the other (resort versus city). We therefore examined hotel type together with whether the guest was a repeat customer and the number of special requests, since some guests may seek fancier resorts if they are planning a more “luxurious” vacation, where they may make more special requests. From Graph 3, it seems that, relative to the resort hotel, a larger share of city hotel bookings have 0 special requests, and city hotel bookings have fewer special requests overall. It also seems that the city hotel sees relatively fewer return guests than the resort hotel, which could be because people go back to the same hotels on vacations. We can also see that almost no guests made 3 or more special requests.


Graph 4 shows the distribution of the number of days between customers’ booking dates and arrival dates across distribution channels, another characteristic of customer behavior (“TA” means “Travel Agents”, “TO” means “Tour Operators”, and “GDS” is a computer reservation system used by service providers), faceted on whether or not the customer is a repeat guest and on hotel type. The resort hotel does have a higher proportion of repeat guests than the city hotel, which is consistent with the previous graph. In addition, most guests book primarily through TA/TO, but repeat guests book through the Corporate channel more often.


Graph 5 illustrates the distribution of lead time, filled by deposit type and faceted by whether the booking ended up being canceled; these are other variables we would like to examine as characteristics of customers. As we can see, among the bookings that were eventually canceled, the lead time seems longer for those with a non-refundable deposit than for those with no deposit. Most bookings are not canceled, and no-deposit bookings account for the most cancellations, presumably because travelers lose no money by canceling.
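
When comparing cancellations across deposit types, it helps to look at both the number of canceled bookings and the cancellation rate within each type, because the type with the most bookings can have the most cancellations while having a lower rate. The Python sketch below illustrates that tabulation on made-up rows. The labels “No Deposit”, “Non Refund”, and “Refundable” are assumed spellings of the deposit_type values, and the numbers say nothing about the real dataset.

```python
from collections import defaultdict

# Hypothetical (deposit_type, is_canceled) rows, made up for illustration.
ROWS = (
    [("No Deposit", 0)] * 6 + [("No Deposit", 1)] * 2
    + [("Non Refund", 0), ("Non Refund", 1)]
    + [("Refundable", 0)]
)


def cancellation_summary(rows):
    """Return {deposit_type: (n_bookings, n_canceled, cancel_rate)}."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for deposit_type, is_canceled in rows:
        totals[deposit_type] += 1
        canceled[deposit_type] += is_canceled
    return {d: (totals[d], canceled[d], canceled[d] / totals[d]) for d in totals}


summary = cancellation_summary(ROWS)
for deposit_type, (n, n_canceled, rate) in summary.items():
    print(f"{deposit_type:11s} bookings={n:2d} canceled={n_canceled:2d} rate={rate:.0%}")
```

In this toy sample, “No Deposit” has the most cancellations by count while “Non Refund” has the higher rate, which is why the two measures are reported side by side.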


Graph 6 gives an overview of the number of check-ins over time for the two hotels. There is a clear seasonal component, with people generally booking fewer stays during winter and more during summer and around Thanksgiving. The time series line for the city hotel is almost completely above that of the resort hotel, but this is probably because the dataset contains far more city hotel records in the first place.
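
A time series like this comes from counting bookings per hotel and arrival month. The Python sketch below shows one way to build those counts. It assumes the arrival date is stored in columns named arrival_date_year and arrival_date_month (with English month names), and it treats a check-in as a booking with is_canceled equal to 0. Both are assumptions made for illustration, and the rows are made up.

```python
from collections import Counter

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
MONTH_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}

# Hypothetical rows shaped like csv.DictReader output (all values are strings).
ROWS = [
    {"hotel": "City Hotel", "is_canceled": "0",
     "arrival_date_year": "2016", "arrival_date_month": "July"},
    {"hotel": "City Hotel", "is_canceled": "0",
     "arrival_date_year": "2016", "arrival_date_month": "July"},
    {"hotel": "City Hotel", "is_canceled": "1",
     "arrival_date_year": "2016", "arrival_date_month": "July"},
    {"hotel": "Resort Hotel", "is_canceled": "0",
     "arrival_date_year": "2016", "arrival_date_month": "January"},
]


def monthly_checkins(rows):
    """Count non-canceled bookings per (hotel, year, month number)."""
    counts = Counter()
    for row in rows:
        if row["is_canceled"] == "1":
            continue  # a canceled booking never becomes a check-in
        key = (
            row["hotel"],
            int(row["arrival_date_year"]),
            MONTH_NUMBER[row["arrival_date_month"]],
        )
        counts[key] += 1
    return counts


series = monthly_checkins(ROWS)
# Sorting the keys puts each hotel's months in chronological order.
for (hotel, year, month), n in sorted(series.items()):
    print(f"{hotel} {year}-{month:02d}: {n}")
```

Because the two hotels have very different numbers of records, dividing each monthly count by that hotel's total would make the two seasonal patterns easier to compare.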


As we can see from Graph 7, at all times more of the guests who booked, stayed, and checked out came without children or babies than with them, and the two groups show different trends over the years. Guests without children or babies booked, stayed, and checked out steadily throughout the year, except for the two dips in the red line around the beginning/end of the year. Guests who brought children or babies tended to book, stay, and check out more often during the summer, around July (which would make sense, as children would be on school holidays during this time).


Graph 8 shows that, overall, the majority of bookings are of the no-deposit type, followed by non-refundable bookings (for which the activity changes drastically by year as well as by time of year). The no-deposit bookings, however, seem more stable and consistent in number throughout the year. Lastly, there is not much activity for refundable bookings, probably because hotels offer them much less often.


Conclusion

We are able to reach several conclusions in this analysis. We have a low number of guests from areas without major economies, like Antarctica and some countries in Sub-Saharan Africa, and a larger number of guests from Europe, in particular Western Europe, followed by the US, Brazil, and so on.

Most customers do not cancel their bookings. Cancellations are most common among no-deposit bookings, where travelers lose no money, and canceled no-deposit bookings tend to have shorter lead times than canceled non-refundable ones. Customers generally book fewer stays during winter and more during summer or around Thanksgiving. Customers without children or babies tended to book, stay, and check out throughout the year, except around the beginning/end of the year. Customers who brought children or babies tended to book, stay, and check out more often during the summer, around July.

Finally, it seems that customers who have more special requests tend to choose the resort hotel over the city hotel, and the resort hotel is more likely to have repeat guests than the city hotel.