Publication

Modelling Real Entangled Systems Using Projections in Flat Abstract Spaces


Bibliographic details
Abstract: Credit card fraud is a significant global concern, as evidenced by Portugal’s expenditure of €1.87B in 2019 alone due to fraudulent transactions. This amount is expected to rise further once the consequent additional expenses incurred by financial institutions are considered. Existing credit card fraud detection methods still miss numerous occurrences, which drives the continuous pursuit of novel approaches in this domain. Prevailing methods rely heavily on neural networks and supervised or unsupervised learning; all of them either assume initial independence between the elements of a credit card transaction or require prior knowledge of the degree of correlation between the data features. This thesis challenges that assumption and proposes a novel fraud detection model that considers an entanglement among the transaction elements, which makes them conditionally dependent. The model rests on the premise that the data lies on a high-dimensional manifold. The interconnections within the manifold induce a dimensionality reduction to a Euclidean space, thereby upholding the independence assumption made by existing models. The reduction is achieved through word2vec, a neural word embedding technique known for handling data complexity effectively and capturing its most relevant aspects. In brief, the method combines custom data pre-processing, word2vec, and a prediction technique. Each transaction is classified according to its probability of occurrence, which is obtained from the neural network’s output. The system’s behaviour is described with the Ising model, which effectively characterizes real entangled systems, and the system’s critical point is computed. Because of the severe data imbalance, SMOTE and CT-GAN are additionally compared for the task of generating synthetic frauds.
Empirical evaluation of the novel method is conducted on a real-life credit card data set. Overall, the results are comparable to those of the leading fraud detection methods.
Main authors: Carvalho, Beatriz Lopes de
Subject: Credit Card Fraud Detection; Manifold; Word2vec; Ising Model; Entangled; Master's theses - 2023
Year: 2023
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa
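
The abstract mentions SMOTE as one of the two techniques compared for generating synthetic frauds. As a rough illustration of the idea behind SMOTE (not the thesis's implementation), the sketch below creates new minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sample`, its parameters, and the toy data are assumptions made for this example only.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points from minority-class samples.

    Hypothetical helper for illustration: each synthetic point lies on the
    line segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by distance to x and keep the k closest.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy two-feature "fraud" samples (made-up values, not from the thesis data set).
frauds = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0)]
print(smote_sample(frauds, k=2, n_new=3))
```

Because every synthetic point is a convex combination of two existing minority samples, the generated frauds stay inside the region already occupied by the minority class. A generative model such as CT-GAN does not share that constraint, which is one reason the two approaches are worth comparing.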