Publication
Customer Churn Prediction in Auto Insurance: Predicting policy cancelation for new clients
| Abstract: | Customer churn presents a challenge for any company, including insurers. Knowing which customers are more likely to cancel their insurance policy enables the company to act proactively to prevent such churn. To better describe the customers' behavior and environment, new features can be created from external data. This project used data spanning January 2019 to March 2023 from the auto branch of an insurance company. Given the availability of geospatial data, two new variables were added to portray the customer's exposure to insurance intermediaries. Different feature selection and missing-value imputation techniques were tested to build the probability model. After a literature review on churn, four types of models were considered: random forests, neural networks, LightGBM, and XGBoost. To improve results, an ensemble was constructed using a Generalized Linear Model (GLM), and isotonic regression was applied to one of the models to calibrate its predictions. The main goal is a well-calibrated model, one whose predicted probabilities match the observed churn frequencies. To compare the models obtained, RMSE and LogLoss were used to measure the loss, while Expected Calibration Error (ECE) and reliability diagrams were used to assess calibration. |
|---|---|
| Main authors: | Nunes, Ricardo de Sá |
| Subject: | Customer Churn; Insurance; Geospatial variables; Ensemble; GLM; Model Calibration |
| Year: | 2023 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
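
The abstract names Expected Calibration Error (ECE) as one of the calibration measures. As a rough illustration only, the sketch below computes a common equal-width-bin variant of ECE for binary churn probabilities. The function name, the choice of 10 bins, and the binning scheme are assumptions made for this example; the dissertation's own implementation may differ.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE for binary outcomes (illustrative sketch).

    y_true: iterable of 0/1 labels (1 = policy cancelled, by assumption).
    y_prob: iterable of predicted churn probabilities in [0, 1].
    n_bins: number of equal-width probability bins (assumed default).
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("at least one prediction is required")

    prob_sums = [0.0] * n_bins   # sum of predicted probabilities per bin
    label_sums = [0.0] * n_bins  # sum of observed labels per bin
    counts = [0] * n_bins        # number of predictions per bin

    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        prob_sums[idx] += prob
        label_sums[idx] += label
        counts[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue  # empty bins contribute nothing
        mean_prob = prob_sums[b] / counts[b]       # average confidence
        observed_rate = label_sums[b] / counts[b]  # observed churn rate
        ece += (counts[b] / n) * abs(mean_prob - observed_rate)
    return ece
```

Under this definition, a model whose predicted probabilities match the observed churn rate in every bin scores 0, and larger values indicate a wider gap between predicted and observed frequencies.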