Predicting Hotel Cancelations

GMU

In this project, I analyzed a data set on hotel reservation cancelations. The task was to carry out exploratory data analysis and build machine learning models that predict booking cancelations. I used R to analyze the data and to fit three models: logistic regression, K-nearest neighbors (KNN), and a random forest. I then evaluated the models and made recommendations based on the findings. The full project, with visualizations and analysis questions, can be found on my GitHub.

Executive Summary

Introduction

In this project, we are asked to analyze booking data from a hotel experiencing record levels of cancelations. The hotel wants to better identify the factors that lead to canceled bookings in order to reduce its losses from unoccupied rooms. Understanding a customer’s risk of cancelation and maximizing occupancy will help the hotel maintain profit and thus aid its future success.

The goal of this analysis is to identify the booking factors associated with canceled and non-canceled reservations. Knowing which factors the two groups have in common can help the hotel identify customers who are likely to keep or cancel their reservation. With these key factors in mind, we can build a model that predicts cancelations, informed by our understanding of the exploratory data analysis. Beyond predicting cancelations, the hotel can use these insights to reduce cancelations and increase profit.

Key Findings

In the exploratory data analysis, we examined how various factors relate to booking status (canceled or non-canceled). First, the proportion of canceled reservations increased from low to moderate to peak occupancy seasons. This matters because peak season has the highest potential earnings, yet almost half of its reservations cancel, reducing potential profit. Looking at booking type, about one third of reservations made online, in the app, or through a travel agency cancel, while a majority of reservations made through corporate partnerships do not. This suggests that increasing the share of reservations made through corporate partnerships could reduce cancelations. Bookings with “add-ons” such as parking or special requests were mostly not canceled, and they canceled less often than bookings without them, so the hotel can treat bookings with “add-ons” as unlikely to cancel. Customers who have previously booked with the hotel, whether they canceled or stayed before, are also unlikely to cancel. Lastly, most non-canceled bookings were reserved later, with less lead time before the stay, whereas canceled bookings showed no trend in lead time, suggesting that bookings with less lead time are less likely to cancel. Although both were analyzed, neither the price of the reservation nor the length of stay showed a relationship with booking status.
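The group comparisons above all rest on the same calculation: the share of canceled bookings within each category. The project did this in R; the following is only a minimal Python sketch of the idea. The `bookings` rows, the channel names, and the function `cancellation_rate_by_group` are hypothetical and are not taken from the project's data or code.

```python
from collections import defaultdict

# Hypothetical toy rows of (booking_channel, was_canceled); not the project's data.
bookings = [
    ("online", True), ("online", False), ("online", False),
    ("corporate", False), ("corporate", False),
    ("travel_agency", True), ("travel_agency", False), ("travel_agency", False),
]


def cancellation_rate_by_group(rows):
    """Return {group: share of bookings in that group that were canceled}."""
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for group, was_canceled in rows:
        totals[group] += 1
        canceled[group] += int(was_canceled)
    return {group: canceled[group] / totals[group] for group in totals}


rates = cancellation_rate_by_group(bookings)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f}")
```

The same function applies to any categorical factor, such as season or whether a booking includes a special request, by changing the first element of each row.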

Modeling Results

To predict future reservation cancelations, we modeled booking status as a function of the predictor variables (booking features) using logistic regression, K-nearest neighbors, and a random forest. Of the three models tested, the random forest had the highest accuracy in predicting canceled reservations. It also had the highest ROC AUC. ROC AUC reaches its maximum value of 1 when a model separates the two classes perfectly, that is, when every cancelation receives a higher predicted probability than every non-cancelation. The random forest had an ROC AUC of 0.9285 out of 1. In terms of a letter grade, we can compare this metric to an “A”. The proportion of canceled bookings correctly predicted as canceled (sensitivity) was 0.7485, and the proportion of non-canceled bookings correctly predicted as non-canceled (specificity) was 0.9406.

Looking at the confusion matrix of our predicted and actual booking states, we see that the model has a low number of false positives, that is, bookings we predicted would cancel but that did not. A low false positive rate is important for the hotel: if it predicts that a guest will cancel and overbooks in the hope of increasing occupancy and profit, it may not have a room available when that guest arrives. On the other hand, a false negative, predicting that a reservation will not cancel when it actually does, may reduce profit but will not negatively affect potential customers. As seen in the exploratory data analysis, many non-canceled reservations are made with minimal lead time, so these false negatives, while still important to keep low, could open up rooms for late-booking guests.
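To make the metrics above concrete, here is a minimal Python sketch of how sensitivity, specificity, and ROC AUC can be computed from actual outcomes and predicted probabilities. It is an illustration only: the project itself was done in R, the toy `actual` and `scores` values are hypothetical, and the 0.5 classification threshold is an assumption rather than a setting taken from the project.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN, treating 'canceled' (True) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, tn, fn


def roc_auc(actual, scores):
    """Share of (canceled, kept) pairs where the canceled booking scores higher.

    Ties count as half a win. A value of 1.0 means perfect separation.
    """
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: True means the booking was actually canceled.
actual = [True, True, True, False, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]  # predicted cancelation probability
predicted = [s >= 0.5 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)  # canceled bookings correctly flagged
specificity = tn / (tn + fp)  # kept bookings correctly left unflagged
auc = roc_auc(actual, scores)

print(f"sensitivity={sensitivity:.4f} specificity={specificity:.4f} auc={auc:.4f}")
```

Raising the threshold above 0.5 would flag fewer bookings as likely cancelations, trading sensitivity for fewer false positives, which is the direction the overbooking argument favors.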

Recommendations

Throughout this analysis, we looked at different factors that relate to booking status (canceled or non-canceled) and created a model to predict cancelations. Looking at the variable importance from our best model, the random forest, we see that lead time before the stay is highly influential. Similarly, in our exploratory data analysis, we saw that many non-canceled reservations had less lead time than canceled reservations. With this in mind, the hotel could target customers who plan to book closer to their stay with room deals or promotions. This could increase the number of bookings and, with less time before arrival, decrease cancelations, helping the hotel maintain profit. Special requests were also among the top three factors by variable importance and showed a relationship with booking status in our analysis. The hotel could advertise its special requests and accommodations to increase the number of bookings that include them, since these bookings cancel less often than bookings without special requests. With our model, the hotel can keep track of the reservations it predicts will cancel and respond in ways that protect profit, such as charging a late cancelation fee or encouraging those guests not to cancel by offering special deals on parking or meal plans. Additionally, the hotel should work on maintaining relationships with past customers, as almost 100% of them do not cancel their reservations. By implementing these recommendations, the hotel could start to see an increase in profit and a decrease in canceled reservations, supporting its future growth and success.