Publication

Evaluating the Impact of Large Language Model-Generated Synthetic Data on Recommender Systems Performance

Bibliographic details
Abstract:	The rapid expansion of digital catalogs necessitates effective Recommender Systems (RSs) to guide users to relevant items. However, less popular items often suffer from the cold-start problem in RSs. With the rise of Large Language Models (LLMs), it is now possible to generate synthetic user–item interaction data to alleviate this issue. This thesis evaluates how LLM-generated samples impact RS performance in cold-start scenarios. We distilled the Amazon Books dataset (10,000 users × 37,000 items) and used an LLM to produce synthetic interactions, augmenting two models: Neural Matrix Factorization (NeuMF) and Wide & Deep. Each model was evaluated over five cross-validation runs with different random seeds, on both augmented and non-augmented versions of the data, using Recall@10, nDCG@10, and F1-Score as evaluation metrics. A one-sided Wilcoxon signed-rank test (p < 0.05) was applied to the F1-Score to assess the statistical significance of performance differences. In cold-start settings, augmentation yielded improvements of 12 percentage points (pp) in Recall@10 and 15 pp in nDCG@10. For warm-start items, a moderate decrease was observed (6 pp in Recall@10, 10 pp in nDCG@10), indicating a performance trade-off. These results confirm that LLM-based augmentation can help mitigate cold-start challenges. Future work may explore richer LLM pipelines (e.g., Retrieval-Augmented Generation (RAG)) and benchmark against simpler content-similarity approaches.
Main authors:	Felisberto, Matheus
Subject:	Recommender Systems; Large Language Models; Cold-Start; Data Augmentation; Synthetic Data
Year:	2026
Country:	Portugal
Document type:	master's dissertation
Access type:	embargoed access
Associated institution:	Universidade Nova de Lisboa
Language:	English
Source:	Repositório Institucional da UNL
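
Note on the metrics: the abstract reports Recall@10 and nDCG@10 and expresses changes in percentage points. This record does not give the thesis's exact definitions, so the following is only a minimal sketch under the common binary-relevance assumption (an item is either relevant or not). The function names `recall_at_k`, `ndcg_at_k`, and `pp_change` are hypothetical and chosen for illustration.

```python
import math


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance nDCG: DCG of the top-k list divided by the ideal DCG."""
    # A hit at 0-based position pos contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg


def pp_change(baseline, augmented):
    """Difference between two metric values in [0, 1], in percentage points."""
    return (augmented - baseline) * 100.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]    # illustrative model ranking
    relevant = {"a", "d", "z"}       # illustrative held-out items
    print(recall_at_k(ranked, relevant))
    print(ndcg_at_k(ranked, relevant))
    print(pp_change(0.30, 0.42))
```

Under this reading, a change from 0.30 to 0.42 in Recall@10 would be a gain of about 12 pp, which is an absolute difference and not a 12% relative improvement.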
