Publication
Optimizing document reranking in a retrieval-augmented generation pipeline for Portuguese legal research
| Field | Value |
|---|---|
| Abstract: | This study explores retrieval-augmented generation (RAG) systems tailored to the Portuguese legal domain, highlighting the challenges that arise in underrepresented languages. Fixed-size chunking strategies, particularly TokenTextSplitter, were found to be the most effective, while more advanced techniques such as Recursive and Semantic splitting offered little benefit. Larger chunk sizes improved both retrieval accuracy and answer quality, though the impact of chunk overlap remains inconclusive. Although previous research has shown that reranking techniques improve retrieval, this may hold only for large and diverse datasets. |
| Main author: | Wollny, Carolyn Svea |
| Subject: | Retrieval-Augmented Generation (RAG); Large Language Models (LLM); Artificial Intelligence (AI); Hallucination; Question answering; RAG evaluation; Vector store; Chunking; Legal AI; Document reranking; Relevance ranking; Legal information retrieval; Portuguese legal retrieval |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
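The chunk-size and chunk-overlap parameters compared in the abstract can be made concrete with a minimal sketch of fixed-size chunking. This is not the dissertation's code: it assumes whitespace-separated words stand in for tokens (TokenTextSplitter-style splitters count tokens produced by a tokenizer instead), and the function name `fixed_size_chunks` is hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size tokens.

    Consecutive windows share chunk_overlap tokens. Assumption: tokens are
    whitespace-separated words, a simplification of a subword tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    # Each new window starts (chunk_size - chunk_overlap) tokens after the last.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once a window reaches the end, so no chunk is pure overlap.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

For example, `fixed_size_chunks("a b c d e f g h i j", 4, 1)` yields three windows, each repeating the last token of the previous one. Raising `chunk_size` produces fewer, longer chunks; raising `chunk_overlap` repeats more context across neighbouring chunks at the cost of more chunks to embed and store.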
Related records
GraphRAG for the Portuguese legal domain - a comparative study of graph-based document relationships and traditional RAG pipelines
by: Esteves, Patrícia Nunes Domingos
Published: (2025)
Large language models (LLMs) for legal analysis: RAG and beyond for optimizing domain adaptation in Portuguese legal domain - advanced retrieval and prompt engineering techniques for retrieval augmented generation in the Portuguese legal domain
by: Thaçi, Rita
Published: (2025)
Large language models (LLMs) for legal analysis: RAG and beyond for optimizing domain adaptation in Portuguese legal domain
by: Barros, Tiago Mendonça Alencar
Published: (2025)
Graph-based reasoning for retrieval-augmented generation: a study in the Portuguese legal domain
by: Hermenegildo, Maria Leonor Trindade
Published: (2025)
A Question-Answering System for Legal Information Retrieval
by: Quaresma, Paulo
Published: (2012)
Document retrieval for question answering: a quantitative evaluation of text preprocessing
by: Carvalho, Gracinda
Published: (2007)
Innovating public procurement: impact of a market-specific retrieval-augmented generation model for Switzerland
by: Miranda, Laura Valentina Sanabria
Published: (2025)
Enhancing product categorization with retrieval augmented generation: a comparative study of architectures, techniques and strategies
by: Marrero, Rafael Alejandro Moles
Published: (2025)
A Logic Programming-Based Approach to QA@CLEF-2005
by: Quaresma, Paulo
Published: (2012)
A Proposal for a Web Information Extraction and Question-Answer System
by: Saias, José
Published: (2009)
A question-answering system for Portuguese juridical documents
by: Quaresma, Paulo
Published: (2012)
Open Source ChatBot Prototype Using RAG and Structured Data for Academic Support
by: Almeida, Pedro
Published: (2026)
ProfAI – Agente codocente para mediação pedagógica em ambientes virtuais de aprendizagem [ProfAI: a co-teaching agent for pedagogical mediation in virtual learning environments]
by: Aragão, Gabriel
Published: (2025)
Power BI conversacional com RAG [Conversational Power BI with RAG]
by: Fernandes, João Pedro Sá
Published: (2025)
Employing retrieval augmented generation to optimize LIMS for the legal domain: evaluating methods to improve chatbot performance
by: Schumann, Lorenzo Oliver
Published: (2024)
BioKnowQA: Prompt-tuning for biomedical QA with knowledge graphs
by: Lopes, Paulo Rodrigo Coelho
Published: (2026)
Developing End-to-End, Deep Learning-Based Chatbots for Healthcare Support in Portuguese
by: Santos, Miguel Ângelo Azeitona dos
Published: (2024)
Retrieval-Augmented Generation for Biomedical Protocols: Optimising Knowledge Retrieval to Support Healthcare Technicians in Anatomical Pathology Workflows
by: Pires, Diogo Filipe Moreira
Published: (2025)
Building and exploring semantic equivalences resources
by: Carvalho, Gracinda
Published: (2012)
Time-aware Question-Answering for the Portuguese Web Archive
by: Arvana, João Manuel Coelho Barroso Varandas
Published: (2023)
Multimodal Learning for Lung Cancer Diagnosis and Management: A Deep Learning Pipeline for Classification, TNM Staging, and Treatment Protocol Generation
by: Silva, Catarina Costa Pereira Nascimento da
Published: (2025)
Tracking Context in Conversational Search: From Utterances to Neural Embeddings
by: Ferreira, Rafael André Henriques
Published: (2021)
Temporal Information Models for Real-Time Microblog Search
by: Martins, Flávio Nuno Fernandes
Published: (2018)
Exploration and refinement of language models for applications in the food industry
by: Magalhães, José Pedro Martins
Published: (2024)
A Review on Cooperative Question-Answering Systems
by: Melo, Dora
Published: (2014)
Biomedical text mining applied to document retrieval and semantic indexing
by: Lourenço, Anália
Published: (2009)
BioDR: semantic indexing networks for biomedical document retrieval
by: Lourenço, Anália
Published: (2010)
A Chatbot for Tourism in Porto: Building a Chatbot to support daily Tourists’ activities in Porto
by: Dona, Ricardo Francisco Montenegro
Published: (2025)
Ontology Mapping for a Legal Question Answering System
by: Trojahn, Cássia
Published: (2009)
Strategies to Bridge Modalities in Large Vision and Language Models
by: Simplício, Afonso Miguel Lopes
Published: (2024)
A Chatbot to Help Promoting Financial Literacy
by: Eleuterio, Davi Silva
Published: (2025)
Implementation of an Intelligent Virtual Assistant based on LLM for Irrigation Optimization
by: Chia, Henrique Duarte Mota dos Santos
Published: (2025)
Biomedical document retrieval for database curation
by: Ramos, Diogo Luís Embaixador
Published: (2024)
Dense and hybrid models for information retrieval
by: Frias, José André Lopes
Published: (2022)
Enhancing the Efficiency of Diffusion Models: A Retrieval-Based Approach
by: Kutsenko, Anton
Published: (2025)
Retrieval Augmented Generation for Enhanced Enterprise Information Availability: Nova IMS Case Study
by: Nosorowski, Jan Jerzy
Published: (2025)
Biomedical information extraction for matching patients to clinical trials
by: Araújo, Gonçalo Carmo de
Published: (2018)
Proceedings of the 9th International Workshop on Information Retrieval on Current Research Information Systems
by: Tenreiro De Magalhaes, Sérgio
Published: (2006)
Using Lucene for Developing a Question-Answering Agent in Portuguese
by: Oliveira, Hugo Gonçalo
Published: (2019)
Developing Amaia: A Conversational Agent for Helping Portuguese Entrepreneurs - An Extensive Exploration of Question-Matching Approaches for Portuguese
by: Santos, José
Published: (2020)