Publication
Enhancing Customer Feedback Analysis in the Insurance Sector through Generative AI and NLP
| Abstract: | This study presents an automated text classification system for categorizing customer reviews from Generali Tranquilidade, an insurance company, into 11 predefined topics, such as Atendimento, Preço, Documentação, Burocracia e Processo, and Ferramentas Digitais. The primary objective is to enhance the efficiency and reliability of review analysis to support data-driven decision-making. The methodology integrates Large Language Models (LLMs) for synthetic data generation to address class imbalance, Extreme Gradient Boosting (XGBoost) for classification, Conformal Prediction for uncertainty quantification, and SHAP values to interpret the model by identifying the words that most influence each classification. The dataset comprises 1,943 Portuguese customer reviews, augmented with 1,727 synthetic reviews generated by Claude 3.5. Key preprocessing steps include error correction, text cleaning, stop word removal, lemmatization, and Term Frequency-Inverse Document Frequency (TF-IDF) vectorization with custom term weighting. The XGBoost model, optimized via Bayesian optimization, achieved 59% accuracy on real data, with improved performance on minority classes when trained with synthetic data. Conformal Prediction ensured 95.6% coverage for ambiguous reviews, while SHAP analysis identified key terms driving classifications. Despite challenges such as multi-topic reviews and class imbalance, the system offers a scalable, interpretable solution for insurance feedback analysis, with implications for improving customer satisfaction and operational efficiency. Future work should explore multi-label classification and enhanced annotation strategies to better handle real-world review complexity. |
|---|---|
| Main authors: | Boumelala, Fatma Zahra |
| Subject: | Natural Language Processing (NLP); Large Language Models (LLMs); conformal prediction; SHAP (Model Interpretability); SDG 8 - Decent work and economic growth; SDG 9 - Industry, innovation and infrastructure |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
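
The abstract names Conformal Prediction as the uncertainty quantification step but does not say which variant was used. As an illustration only, the following is a minimal sketch of split conformal prediction for a multi-class classifier, assuming the nonconformity score is one minus the predicted probability of the true class. The function names, the `alpha` parameter, and the toy probabilities are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Compute the split-conformal score threshold from a calibration set.

    Assumption: nonconformity score = 1 - probability assigned to the true class.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")


def prediction_sets(test_probs, qhat):
    """For each row, keep every class whose score 1 - p is at most qhat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - row <= qhat) for row in test_probs]


if __name__ == "__main__":
    # Toy calibration probabilities for 3 classes (hypothetical values).
    cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
    cal_labels = [0, 1, 2, 1]
    qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
    for s in prediction_sets([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]], qhat):
        print(s.tolist())
```

Under this scheme an ambiguous review receives a set with more than one topic, while a clear-cut review receives a single topic; in a pipeline like the one described, the probability rows would come from the trained classifier rather than from hand-written values.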