Document details

Comparing the performance of oversampling techniques for imbalanced learning in insurance fraud detection

Author(s): Moreno, María Fernanda Osorio

Date: 2018

Persistent ID: http://hdl.handle.net/10362/33863

Origin: Repositório Institucional da UNL

Subject(s): Imbalanced datasets; Fraud; oversampling; Insurance


Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics

Although data is now generated in enormous volumes every second, there are situations where the target category is represented extremely unequally, giving rise to imbalanced datasets. Analyzing them correctly can lead to relevant decisions that produce appropriate business strategies. Fraud modeling is one example of this situation: fraudulent transactions are expected to be fewer than legitimate ones, and predicting them could be crucial for improving decisions and processes in a company. However, class imbalance has a negative effect on the traditional techniques used to deal with this problem. Many techniques have been proposed to address it, and oversampling is one of them. This work analyses the behavior of different oversampling techniques, such as random oversampling, SOMO and SMOTE, across different classifiers and evaluation metrics. The exercise uses real data from an insurance company in Colombia to predict fraudulent claims for its compulsory auto product. The conclusions of this research demonstrate the advantages of using oversampling under class imbalance, but also the importance of comparing different evaluation metrics and classifiers to obtain accurate conclusions and comparable results.

Document Type: Master thesis

Language: English

Advisor(s): Bação, Fernando José Ferreira Lucas

Contributor(s): RUN
CC Licence
