Document details

Data science for connected car insurance : use of trips raw telematics data for knowledge discovery and customers profiling

Author(s): Spada, Enrico

Date: 2018

Persistent ID: http://hdl.handle.net/10362/42452

Origin: Repositório Institucional da UNL

Subject(s): Data Science; Car Insurance; Raw Telematics Data; Clustering; Risk Knowledge


Description

Internship Report presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence

This report presents the data science processes designed and implemented during an internship at the Actuarial Department of Sterling Insurance (Italy). The project developed a complete data science solution, organized according to the Cross-Industry Standard Process for Data Mining (CRISP-DM). The objective is to study raw trip telematics data in depth, for the very first time, and to discover actionable knowledge that can generate value for the business. The research is based on raw trip telematics data generated over 5 months by telematics black-box devices installed in the cars of 937 customers. The data relate solely to trips, with granularity at the finest level: the individual geospatial coordinate sets that compose trajectories. The features describing each timestamped GPS coordinate set are average speed in the last second, heading, GPS quality, and meters travelled since the previous position. The data sources consist of semi-structured data stored in several flat files in their native format, batch extracted from the data lake. The raw data, at the granular level of geospatial coordinate sets, are extensively studied and enriched with additional open data sources through spatial join operations. Next, a complex sequence of data preparation tasks is performed to obtain the final dataset, aggregated at trip level and described by 117 features. The final dataset is fed to the k-means algorithm to discover patterns in trip characteristics. Patterns are studied across the overall portfolio, regardless of driver and intentionally neglecting historical or personal information. The study concludes by deploying the clustering results to profile customers, raising the line of business's risk knowledge of its customers to a new level.
This discovery opens up many new possibilities, including improved pricing, the use of results in fraud detection, new services, and overall risk prevention for customers.
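
The two central steps described above, aggregating point-level records into trip-level features and clustering trips with k-means, can be illustrated with a minimal sketch. It is not the report's implementation: the record layout, the three features, the sample values, and the helper names (aggregate_trips, kmeans) are assumptions made for illustration, whereas the report uses 117 features. Features are left unscaled here for brevity; a real analysis would normally standardize them first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical point records: (trip_id, speed_kmh, meters_since_previous).
# Field names and values are illustrative, not taken from the report.
points = [
    ("t1", 30.0, 8.0), ("t1", 35.0, 10.0), ("t1", 40.0, 11.0),
    ("t2", 110.0, 30.0), ("t2", 120.0, 33.0), ("t2", 115.0, 32.0),
    ("t3", 28.0, 8.0), ("t3", 32.0, 9.0),
    ("t4", 118.0, 33.0), ("t4", 122.0, 34.0),
]


def aggregate_trips(points):
    """Collapse point-level rows into one feature vector per trip:
    (mean speed, max speed, total meters travelled)."""
    by_trip = defaultdict(list)
    for trip_id, speed, meters in points:
        by_trip[trip_id].append((speed, meters))
    features = {}
    for trip_id, rows in by_trip.items():
        speeds = [s for s, _ in rows]
        total_meters = sum(m for _, m in rows)
        features[trip_id] = (mean(speeds), max(speeds), total_meters)
    return features


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; the first k vectors seed the centroids."""
    centroids = [vectors[i] for i in range(k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each trip goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


trip_features = aggregate_trips(points)
trip_ids = sorted(trip_features)
labels, centroids = kmeans([trip_features[t] for t in trip_ids], k=2)
clusters = dict(zip(trip_ids, labels))
```

In this toy data the slow, short trips (t1, t3) and the fast, long trips (t2, t4) end up in separate clusters. Profiling a customer would then amount to summarizing how that customer's trips are distributed across clusters.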

Document Type: Master thesis
Language: English
Advisor(s): Cabral, Pedro da Costa Brito; Claeys, Olivier
Contributor(s): RUN
