Document details

Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains

Author(s): Nunes, Rafael ; Santos, Joaquim ; Balreira, Dennis ; Freitas, Carla ; Olival, Fernanda ; Cameron, Helena ; Vieira, Renata

Date: 2025

Persistent ID: http://hdl.handle.net/10174/39472

Origin: Repositório Científico da Universidade de Évora

Subject(s): Named entity recognition; Portuguese language variants


Description

This paper examines how Portuguese language variants affect Large Language Models on the task of named entity recognition (NER) in specialised domains. Experiments were run on two corpora: a Brazilian Portuguese legal corpus and a European Portuguese historical corpus. The models considered are BERTimbau (PT-BR), Albertina (PT-PT and PT-BR), and XLM-R (multilingual). The effect of the variant was more evident in the European Portuguese historical corpus, where F1 scores were higher than in previous works that did not consider the same language variant. The study also underscores the impact of model architecture on performance, highlighting the critical role of both linguistic alignment and model size in improving NER in specialised domains.

This work was funded by the Portuguese Science Foundation (FCT) under the projects CEECIND/01997/2017 and UIDB/00057/2020.

Document Type: Book part

Language: English
