Publicação
Applying LLM-based entity matching for hierarchical product categorization in e-commerce
| Resumo: | This research explored techniques to improve Large Language Models performance for Hi erarchical Product Classification (HPC), including optimized fine-tuning, optimal prompting techniques, taxonomy-specific Knowledge Graphs, leveraging Retrieval-Augmented Genera tion, and implementing LLM-based Entity Matching. Tested on benchmark datasets Icecat and WDC-222, these methods significantly enhanced LLMs’ ability to solve HPC tasks across var ious scenarios. Results achieved a hierarchical F1-score (hF) of 0.921, surpassing traditional DL benchmarks (0.85 hF). While not outperforming proprietary models like GPT, the proposed approaches offer a cost-efficient and effective alternative for businesses, demonstrating strong performance without reliance on expensive LLM solutions. |
|---|---|
| Autores principais: | Markwardt, Elias |
| Assunto: | Large Language Models Hierarchical classification E-Commerce In-Context Learning Fine tuning Prompt Engineering Knowledge graphs Retrieval Augmented Generation Entity matching |
| Ano: | 2025 |
| País: | Portugal |
| Tipo de documento: | dissertação de mestrado |
| Tipo de acesso: | acesso embargado |
| Instituição associada: | Universidade Nova de Lisboa |
| Idioma: | inglês |
| Origem: | Repositório Institucional da UNL |
| Resumo: | This research explored techniques to improve Large Language Models performance for Hi erarchical Product Classification (HPC), including optimized fine-tuning, optimal prompting techniques, taxonomy-specific Knowledge Graphs, leveraging Retrieval-Augmented Genera tion, and implementing LLM-based Entity Matching. Tested on benchmark datasets Icecat and WDC-222, these methods significantly enhanced LLMs’ ability to solve HPC tasks across var ious scenarios. Results achieved a hierarchical F1-score (hF) of 0.921, surpassing traditional DL benchmarks (0.85 hF). While not outperforming proprietary models like GPT, the proposed approaches offer a cost-efficient and effective alternative for businesses, demonstrating strong performance without reliance on expensive LLM solutions. |
|---|