Publication

Customer Churn Prediction in Auto Insurance: Predicting policy cancelation for new clients


Bibliographic details
Abstract: Customer churn presents a challenge for any company, including insurers. Knowing which customers are more likely to cancel their insurance policy enables the company to act proactively to prevent such churn. To better describe the customers’ behavior and environment, new features can be created from external data. This project used data spanning January 2019 to March 2023 from the auto branch of an insurance company. Given the availability of geospatial data, two new variables were added to portray the customer’s exposure to insurance intermediaries. Different feature selections and missing-value imputation techniques were tested to build the probability model. After a literature review on churn, four types of models were considered: random forests, neural networks, LightGBM, and XGBoost. To improve results, an ensemble was constructed using a Generalized Linear Model (GLM), and isotonic regression was applied to one of the models to calibrate its predictions. The main goal is a well-calibrated model, that is, one whose predicted probabilities match the observed frequency of churn. To compare the models obtained, RMSE and LogLoss were used to measure the loss, while Expected Calibration Error (ECE) and reliability diagrams were used to assess calibration.
Main authors: Nunes, Ricardo de Sá
Subject: Customer Churn Insurance Geospatial variables Ensemble GLM Model Calibration
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL
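
The abstract names Expected Calibration Error (ECE) as one of the calibration measures. As an illustration only, a common equal-width-bin formulation of ECE can be sketched as follows. The function name, the choice of 10 bins, and the binning scheme are assumptions made for this sketch; the record does not state how the dissertation computed ECE.

```python
from typing import Sequence


def expected_calibration_error(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted mean gap between observed churn rate and mean predicted probability."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("at least one observation is required")

    prob_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    label_sums = [0.0] * n_bins  # sum of observed labels (1 = churned) per bin
    counts = [0] * n_bins

    for label, prob in zip(y_true, y_prob):
        # A prediction of exactly 1.0 is placed in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        prob_sums[idx] += prob
        label_sums[idx] += label
        counts[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        mean_prob = prob_sums[b] / counts[b]
        observed_rate = label_sums[b] / counts[b]
        ece += (counts[b] / n) * abs(observed_rate - mean_prob)
    return ece
```

Under this formulation a value of 0 means that, within every bin, the mean predicted probability equals the observed churn rate; larger values indicate that the predictions are over- or under-confident.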