Author(s): Spada, Enrico
Date: 2018
Persistent ID: http://hdl.handle.net/10362/42452
Origin: Repositório Institucional da UNL
Subject(s): Data Science; Car Insurance; Raw Telematics Data; Clustering; Risk Knowledge
Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
This report presents the data science processes designed and implemented during an internship at the Actuarial Department of Sterling Insurance (Italy). The project developed a complete data science solution, organized according to the Cross-Industry Standard Process for Data Mining (CRISP-DM). The objective is to study raw trip telematics data in depth, for the very first time, and to discover actionable knowledge that can generate value for the business. The research is based on raw trip telematics data generated over 5 months by telematics black-box devices installed in the cars of 937 customers. The data relate solely to trips, at the finest granularity: the individual geospatial coordinate sets that compose each trajectory. The features describing each timestamped GPS coordinate set are average speed over the last second, heading, GPS quality, and meters travelled since the previous position. The data sources are semi-structured data stored in several flat files in their native format, batch-extracted from the data lake. The raw data are first studied extensively at the level of geospatial coordinate sets and enriched with additional open data sources through spatial join operations. Next, a long sequence of data preparation tasks produces the final dataset, aggregated at trip level and described by 117 features. The final dataset is fed to the k-means algorithm to discover patterns in trip characteristics. Patterns are studied across the overall portfolio, regardless of driver and intentionally neglecting historical or personal information. The study concludes by deploying the clustering results to profile customers, bringing the line of business's risk knowledge about its customers to a new level.
This discovery opens up many new possibilities; examples include improving pricing, using the results in fraud detection, and offering new services and overall risk prevention to customers.
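The pipeline summarized above (point-level data aggregated to trips, k-means over trip features, cluster results rolled up into customer profiles) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy rows in `POINTS`, the two trip features (the report uses 117), the z-score scaling step, the deterministic initialization, and the function names. It uses a plain-Python version of Lloyd's k-means iteration rather than whatever tooling the report relied on, and it omits the spatial join enrichment entirely.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical point-level rows: (customer, trip, speed_kmh, metres_since_previous)
POINTS = [
    ("c1", "t1", 30.0, 8.0), ("c1", "t1", 34.0, 9.0),
    ("c1", "t2", 110.0, 30.0), ("c1", "t2", 120.0, 33.0),
    ("c2", "t3", 28.0, 7.0), ("c2", "t3", 32.0, 9.0),
    ("c2", "t4", 35.0, 10.0), ("c2", "t4", 31.0, 8.0),
]


def aggregate_trips(points):
    """Collapse point-level rows into one (mean speed, total metres) vector per trip."""
    grouped = defaultdict(list)
    for customer, trip, speed, metres in points:
        grouped[(customer, trip)].append((speed, metres))
    return {
        key: (mean(s for s, _ in rows), sum(m for _, m in rows))
        for key, rows in grouped.items()
    }


def standardise(vectors):
    """Z-score each feature so no single unit dominates the distance."""
    columns = list(zip(*vectors))
    centres = [mean(c) for c in columns]
    spreads = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return [
        tuple((x - m) / s for x, m, s in zip(v, centres, spreads)) for v in vectors
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(v, centroids[j]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=20):
    """Lloyd's iteration; centroids start at the first k vectors (an assumption)."""
    centroids = list(vectors[:k])
    for _ in range(iterations):
        labels = assign(vectors, centroids)
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return assign(vectors, centroids)


def profile_customers(trip_keys, labels, k):
    """Describe each customer by the share of their trips in each cluster."""
    counts = defaultdict(lambda: [0] * k)
    for (customer, _trip), label in zip(trip_keys, labels):
        counts[customer][label] += 1
    return {c: [n / sum(row) for n in row] for c, row in counts.items()}


trips = aggregate_trips(POINTS)
labels = kmeans(standardise(list(trips.values())), k=2)
profiles = profile_customers(list(trips.keys()), labels, k=2)
print(profiles)
```

In this toy input the single fast, long trip ends up in its own cluster, so customer `c1` is profiled as splitting trips evenly between the two clusters while `c2` sits entirely in one. The clustering itself never sees the customer identifier, mirroring the choice to study patterns regardless of driver and to attach them to customers only at the profiling step.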