Introduction

As countries around the world loosen and remove their COVID-19 restrictions, revenge tourism has grown and is expected to become increasingly widespread over the next few years. Tourism spending rose by 60% in 2022 and is projected to rise by another 30% in 2023, according to The Economist (https://www.economist.com/the-world-ahead/2022/11/14/take-that-covid-revenge-tourism-takes-off).

With the expected increase in tourism, hotel demand is also expected to increase in the coming years. For hotel businesses, it’s now more important than ever to understand how to position themselves to attract the most customers. For potential travelers, understanding when and where to place their bookings to maximize value can be key to improving their travel experience. To that end, we will attempt to answer the following research questions:

  1. What kinds of bookings are associated with the fewest cancellations?

  2. How far ahead should travelers make hotel reservations? Does this time frame depend on whether they are planning to travel in a group?

  3. Where in Europe should travelers go if they are looking for the best deals? Does this depend on whether they plan on going to a resort or a city hotel?

To answer these questions, we will look at the Hotel Bookings Demand Dataset (https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand), which contains hotel bookings and related information from 2015 to 2017, a few years before the pandemic impacted the industry.

Specifically, some variables include:

  1. City Hotel/Resort Hotel

  2. Time of Booking

  3. Number of adults, children, babies

  4. Number of available parking spaces

  5. Time between booking and arrival

We will introduce the other variables as they come up in our analysis.

General Observations

Before we begin answering our research questions, it’s important to know where we can expect our findings to be the most applicable. Since the data set contains entries from many countries, we will plot the number of records in each country on the world map.

Since the hotels dataset does not contain longitude and latitude coordinates of the countries, we will import and merge another dataset detailing average country longitude and latitudes to create the map.
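The merge described above can be sketched as a count of records per country joined to a table of country centroids. This is only a minimal Python illustration, not the code behind the report's map: the `country` field holding three-letter country codes, the sample rows, and the approximate centroid coordinates are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical booking rows; the real dataset has one row per booking.
bookings = [
    {"country": "PRT"}, {"country": "PRT"}, {"country": "PRT"},
    {"country": "GBR"}, {"country": "GBR"},
    {"country": "FRA"},
    {"country": "XXX"},  # a code with no centroid entry
]

# Assumed lookup of approximate average (latitude, longitude) per country.
centroids = {
    "PRT": (39.4, -8.2),
    "GBR": (55.4, -3.4),
    "FRA": (46.2, 2.2),
}

def records_with_coordinates(rows, lookup):
    """Count bookings per country and attach coordinates (inner join)."""
    counts = Counter(row["country"] for row in rows)
    merged = []
    for code, n in counts.most_common():
        if code in lookup:  # countries without coordinates cannot be mapped
            lat, lon = lookup[code]
            merged.append({"country": code, "n": n, "lat": lat, "lon": lon})
    return merged

map_points = records_with_coordinates(bookings, centroids)
```

Each entry of `map_points` carries what a bubble map needs: a position and a count to size the circle by. Countries missing from the lookup are silently dropped in this sketch, so it is worth checking how many records that affects before plotting.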

From the plot, we can see a clear concentration of points in Europe, where the circles are both the largest and the most numerous. Specifically, countries such as Portugal, the United Kingdom, and France have the largest number of entries. Overall, this plot tells us that most of our data comes from Europe, meaning that our insights will be most helpful for European hotel business owners and for travelers considering visiting Europe.

Second, we will observe how hotel demand changes throughout each year. This gives us two things:

  1. an understanding of approximately when our findings are most applicable.

  2. an understanding of when businesses and customers can adapt their strategies to improve sales and receive better deals, respectively.

We will first create a time series plot of daily hotel reservations to see if there exist any immediate trends.

From the time series plot, there seem to be peaks in demand in the spring and fall months and visible drops in the winter months. This tells us during which months our findings will be most relevant. It also suggests that businesses can offer deals in winter, when demand is lower, and that customers can take advantage of winter tourism, when prices are likely to be lower.

We can also be a bit more rigorous in our analysis of this time series by asking whether it is random and whether its seasonal components are significant. To do this, we will create an autocorrelation function (ACF) plot.

From this plot, we see multiple points outside the dashed 95% confidence bounds, suggesting that this time series is non-random. We also see clear seasonal fluctuations across lags. The fluctuations have a “wavelength” of around 180 days, which makes sense as it corresponds to two seasons. From this, we can conclude that there is a meaningful seasonal pattern in the data, giving us more confidence in our conclusion that summer and spring months are probably most relevant to our analysis and the best time to make and look for better deals.
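To make the reading of the ACF plot concrete, the sketch below computes the sample autocorrelation and the usual approximate 95% bound of 1.96 divided by the square root of the series length. It is a Python illustration on a synthetic series with a 180-day cycle, not the report's reservation data; the function name `sample_acf` and the synthetic values are assumptions for the example.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Synthetic stand-in for daily reservation counts: four cycles of 180 days.
period = 180
daily = [100 + 20 * math.sin(2 * math.pi * t / period)
         for t in range(4 * period)]

acf = sample_acf(daily, max_lag=2 * period)

# Approximate 95% bound for a purely random (white noise) series.
bound = 1.96 / math.sqrt(len(daily))

# Lags whose autocorrelation falls outside the bound.
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]
```

In this synthetic case the autocorrelation is strongly negative at a lag of 90 days and strongly positive again at 180 days, which is the kind of wave-like pattern described above. Real booking data would be noisier, so the peaks would be lower and less regular.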

Minimizing Cancellations

Many factors likely affect the reservation status of a hotel room, from location and services to past customer experiences. Understanding which of them most influence bookings, no-shows, and cancellations is very important for managing a hotel. The variables in the data can help a hotel anticipate, and plan for, an expected number of cancellations and no-shows.

Given the seasonality of the hotel industry, it’s crucial for hotel businesses to maximize the number of bookings that translate into check-ins and therefore cash flow. By understanding which types of customers are least likely to cancel their booking, hotels can prioritize certain guests to maximize profit.

To that end, we looked at cancellation rates across two metrics: Number of Special Requests and Guest Status (New/Repeat Guest). We chose these metrics because they are the most easily measurable and actionable for hotel owners.
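As a rough sketch of how such rates can be tabulated, the Python snippet below groups bookings by guest status and number of special requests and reports the cancellation rate alongside the group size. The field names (`is_repeated_guest`, `total_of_special_requests`, `is_canceled`) are assumed to match the dataset's columns, and the rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (is_repeated_guest, total_of_special_requests, is_canceled)
rows = [
    (0, 0, 1), (0, 0, 1), (0, 0, 0), (0, 0, 0),
    (0, 1, 1), (0, 1, 0), (0, 1, 0), (0, 1, 0),
    (1, 0, 0), (1, 0, 0),
    (1, 1, 1), (1, 1, 0),
]

def cancellation_rates(rows):
    """Return {(repeat_guest, special_requests): (rate, group_size)}."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for repeat_guest, requests, is_canceled in rows:
        key = (repeat_guest, requests)
        totals[key] += 1
        canceled[key] += is_canceled
    return {key: (canceled[key] / totals[key], totals[key]) for key in totals}

rates = cancellation_rates(rows)
```

Keeping the group size next to each rate matters here, because a rate computed from a handful of bookings is far less reliable than one computed from thousands.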

From the plot, we see that repeat guests generally have lower cancellation rates, which makes sense as guests usually return only when they enjoyed their previous stay. We also see that for new guests, cancellation rates decrease as the number of special requests increases, which can probably be partly attributed to the extra trouble of remaking the same requests at other hotels should visitors decide to stay elsewhere. However, we don’t see the same trend with returning guests: from 1 special request onwards, cancellation rates seem to steadily increase from ~4% to ~9%. While the cause is not immediately apparent, it’s not too big a worry, as the sample size for those observations is not nearly as large.

As for conclusions, hotel owners can use this information to pre-assign rooms first to repeat guests and to guests with at least one special request, as they are the least likely to cancel and cause administrative trouble. Furthermore, owners can more confidently treat certain bookings as “secure”, facilitating bookkeeping and future projections of hotel performance.

However, our findings so far only look at two variables that we thought may affect cancellation rates, and do not consider the other quantitative variables that we have. To that end, we will conduct principal component analysis (PCA) on the other variables to determine which combinations of them are associated with cancellation, no-show, or check-in.

PCA Perspective

Here, we will use PCA to determine in what ways no-shows, cancellations, and normal stays are associated with 11 other variables. These are lead time, weekend nights booked, week nights booked, number of adults, number of children, number of babies, number of previous cancellations and of previous bookings not canceled for a customer, number of booking changes, average daily rate, and number of special requests.

First, we get our principal components after subsetting and standardizing our data. Note that we had to remove four observations where children were entered as null.
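The preprocessing step can be sketched as follows: drop rows with a null entry, then center each column and scale it to unit sample standard deviation so that no variable dominates the components because of its units. This is a Python illustration on made-up rows with only three of the variables, not the report's code.

```python
from statistics import mean, stdev

# Hypothetical subset: (lead_time, adults, children); None marks a null entry.
raw = [
    (10, 2, 0),
    (200, 2, 1),
    (45, 1, None),  # dropped below: children is null
    (90, 3, 2),
    (5, 2, 0),
]

# Keep only complete rows.
complete = [row for row in raw if None not in row]

def standardize(rows):
    """Center each column at 0 and scale it to unit sample standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    scales = [stdev(col) for col in columns]  # n - 1 denominator
    return [
        tuple((value - c) / s for value, c, s in zip(row, centers, scales))
        for row in rows
    ]

scaled = standardize(complete)
```

After this step every column has variance 1, so with 11 variables the total variance across the principal components is 11.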

## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     1.3237 1.1953 1.0921 1.0537 1.01192 0.99353 0.93616
## Proportion of Variance 0.1593 0.1299 0.1084 0.1009 0.09309 0.08974 0.07967
## Cumulative Proportion  0.1593 0.2892 0.3976 0.4985 0.59162 0.68136 0.76103
##                            PC8     PC9    PC10    PC11
## Standard deviation     0.88276 0.86926 0.77690 0.70014
## Proportion of Variance 0.07084 0.06869 0.05487 0.04456
## Cumulative Proportion  0.83187 0.90057 0.95544 1.00000

Next, we make plots to visualize the first three principal components, colored by reservation status. Together, these three account for about 40% of the total variance. We limit ourselves to them for the sake of simplicity, although a more principled number to inspect could be chosen from an elbow (scree) plot of the variance explained by each component.
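The proportions in the summary table follow directly from the standard deviations: each component's variance is its standard deviation squared, and its share is that variance divided by the total. The Python sketch below reproduces this arithmetic from the printed values. The final count uses the "variance greater than 1" rule of thumb, which is an assumption added for illustration and not a criterion used in this report.

```python
# Standard deviations copied from the "Importance of components" summary.
sdev = [1.3237, 1.1953, 1.0921, 1.0537, 1.01192, 0.99353,
        0.93616, 0.88276, 0.86926, 0.77690, 0.70014]

variances = [s ** 2 for s in sdev]
total = sum(variances)  # about 11: one unit per standardized variable

# Share of the total variance carried by each component.
proportion = [v / total for v in variances]

# Running total of those shares, as in the "Cumulative Proportion" row.
cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)

# Rule of thumb (assumed here): count components with variance above 1.
above_one = sum(1 for v in variances if v > 1)
```

With these numbers, the first component carries about 15.9% of the variance and the first three together about 39.8%, matching the table. The variance is spread fairly evenly across components, so no small set of them summarizes the data especially well.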

From our PCA plots, it appears difficult to separate the three groups as a whole, since check-outs and cancellations are clustered tightly together. When it comes to the outliers, however, there are notable cancellation points forming a trend with low PC1 and moderately low PC2. A similar trend for check-outs exists for low PC1 and high PC2. Looking at the second plot, PC3 appears even more useful in separating out the deviating cancellation points, which have low PC3 and somewhat low PC2. The deviating check-out points have high PC3 values.

To identify potential variables of importance, we display a biplot for the second and third principal components, which we identified as being important. The biplot shows the linear relationships between these two principal components and our variables of interest. Overlaid on the scatter plot of the two PCs, each vector shows how a variable contributes to PC2 and PC3.

For the no-show points, the only potentially helpful variable is the number of babies. We see this from the direction of the blue ellipse relative to the green and red ones, which corresponds most closely to the babies vector.

For the normal check-out points, we can examine the cluster leading up from the main group of points. We see vectors in this same direction that correspond to the number of booking changes and the number of previous bookings not canceled. From this, customers with a history of bookings not canceled, or customers who make booking changes beforehand, are likely to check out normally.

For cancellations, there are two main areas of interest. First, we see the large cluster to the right of the main one. Three vectors point in this same direction. These correspond to the number of previous cancellations and the two variables for the number of booked nights. In addition, we see the outlier cancellation points that trend to very low PC3 values and somewhat low PC2 values. The only vector in this direction is the one corresponding to the number of adults in a booking. From these two clusters, it is likely that high numbers of nights booked, adults booked, or previous cancellations may explain cancellations.

Using these variables, hotels can try to curb the number of cancellations by limiting reservations from customers with many previous cancellations and/or reservations for many nights. Businesses reserving hotel rooms should be aware that reservations with many adults tend to be canceled, and that hotels may try to limit such reservations.

Getting Ahead of the Curve

Securing hotel bookings is crucial for potential travelers, especially during the higher-traffic summer months. To get a better idea of how far in advance travelers should make their hotel bookings, we will look at densities of lead_time (days from booking to arrival) across all hotel types (City Hotel, Resort Hotel) and customer types (Contract, Group, Transient, Transient-Party). With this information, travelers will have a general idea of the best time to finalize their plans to ensure quality bookings.

From the faceted density plot, we see that bookings from Transient and Transient-Party customer types have similar lead times, while Group bookings seem to have generally much shorter lead times for both city and resort hotels. For these three groups, the distributions are all skewed to the right. As for Contract customer types, resort hotel bookings show a roughly normal distribution with a mean of about 180 days, while city hotel bookings show a bimodal distribution with peaks at around 15 and 300 days. This plot shows how far in advance different customer types book at each hotel type, which can give potential travelers and booking agencies an idea of the best time to book to secure their spots.
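A numeric companion to the density plot is a per-group summary of lead times. The Python sketch below groups made-up rows by hotel and customer type and reports the median, the mean, and whether the mean exceeds the median, which is a rough heuristic for right skew rather than a formal test. The sample values are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (hotel, customer_type, lead_time in days) rows.
rows = [
    ("City Hotel", "Transient", 3), ("City Hotel", "Transient", 10),
    ("City Hotel", "Transient", 20), ("City Hotel", "Transient", 45),
    ("City Hotel", "Transient", 300),
    ("Resort Hotel", "Group", 1), ("Resort Hotel", "Group", 2),
    ("Resort Hotel", "Group", 4), ("Resort Hotel", "Group", 30),
]

def lead_time_summary(rows):
    """Summarize lead times for each (hotel, customer type) pair."""
    groups = defaultdict(list)
    for hotel, customer_type, lead_time in rows:
        groups[(hotel, customer_type)].append(lead_time)
    return {
        key: {
            "n": len(values),
            "median": median(values),
            "mean": mean(values),
            # Mean above median hints at a long right tail.
            "right_skewed": mean(values) > median(values),
        }
        for key, values in groups.items()
    }

summary = lead_time_summary(rows)
```

For a traveler, the median of the relevant group is probably the more useful number, since a few very early bookings pull the mean upward in right-skewed groups.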

Cheapest Locations

It is important to know where to find the best deals for travel in Europe, as the cost of accommodations and activities can greatly impact the overall budget of a trip. By understanding the most cost-effective destinations, travelers can make informed decisions on where to go and how to allocate their funds to have the best and most affordable vacation.

These plots show hotel prices in each country, split into two maps by whether the hotel is a resort or a city hotel. Each plot is a map of Europe with countries shaded by the average price of hotels in that country. The red-shaded countries have the most expensive hotels, while the green-shaded countries have the cheapest. We see that, on average, city hotels are more expensive than resort hotels. There are some exceptions to this rule, such as Portugal, where city hotels are cheaper. If a traveler were looking for the cheapest resort hotels in Europe, this map would point to Ukraine or Greece. If they were looking in cities, then the cheapest hotels are in Portugal or Lithuania.

Conclusion

The hotel bookings dataset is a rich source of information that can be used to understand how to position a hotel business to attract the most customers and how travelers can maximize the value of their hotel bookings. The goal of this analysis was to find specific information about hotel bookings to assist both travelers and hotel owners. Through our analysis, we found that bookings are cancelled less often by repeat guests and, among new guests, by those who make more special requests. One key finding for booking times is that group bookings tend to have shorter lead times than other types. Additionally, lead times vary across customer type and type of hotel. The relevant distribution of days booked ahead can be used as a good starting point for how far in advance bookings should be made. We also found a few countries where hotels are much cheaper than others, such as Portugal, but also found that prices depend on whether the hotel is a city or resort hotel, with a country like Lithuania showing a large difference. Overall, our findings highlight the importance of leveraging data in the hotel industry to improve the travel experience for both businesses and consumers.