Document details

Optimizing document reranking in a retrieval-augmented generation pipeline for Portuguese legal research

Author(s): Wollny, Carolyn Svea

Date: 2025

Persistent ID: http://hdl.handle.net/10362/186944

Origin: Repositório Institucional da UNL

Subject(s): Retrieval-Augmented Generation; RAG; Large Language Models; LLM; Artificial Intelligence; AI; Hallucination; Question answering; RAG evaluation; Vector store; Chunking; Legal AI; Document reranking; Relevance ranking; Legal information retrieval; Portuguese legal retrieval; Domain/Scientific Area::Social Sciences::Economics and Management


Description

This study explores retrieval-augmented generation (RAG) systems tailored to the Portuguese legal domain, highlighting the challenges of working in an underrepresented language. Fixed-size chunking strategies, particularly TokenTextSplitter, were found to be the most effective, while more advanced techniques such as Recursive and Semantic splitting showed little benefit. Larger chunk sizes improved both retrieval accuracy and answer quality, whereas the impact of chunk overlap remains inconclusive. Although previous research has shown reranking techniques to improve retrieval, this benefit may hold only for large and diverse datasets.
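The fixed-size chunking evaluated above can be pictured as a sliding window over tokens, controlled by two parameters: chunk size and chunk overlap. The sketch below is a minimal illustration under stated assumptions, not the TokenTextSplitter implementation used in the thesis: `split_fixed_size` is a hypothetical helper, and whitespace splitting stands in for a real subword tokenizer.

```python
from typing import List, Sequence


def split_fixed_size(tokens: Sequence[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Hypothetical helper: split a token sequence into windows of at most
    `chunk_size` tokens, where consecutive windows share `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace splitting stands in for a real tokenizer.
    text = "a b c d e f g h i j"
    for chunk in split_fixed_size(text.split(), chunk_size=4, chunk_overlap=1):
        print(" ".join(chunk))
```

Run as a script, this prints three chunks: `a b c d`, `d e f g`, and `g h i j`, each sharing one token with its neighbour. Raising `chunk_size` yields fewer, longer chunks with more context each, which is the variable the description reports as improving retrieval accuracy and answer quality.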

Document Type: Master thesis

Language: English

Advisor(s): Han, Qiwei

Contributor(s): RUN