RCAAP - Repositórios Científicos de Acesso Aberto de Portugal

Annotating, analysing and learning named entities in Portuguese historical text...

Vieira, Renata; Olival, Fernanda; Cameron, Helena; Farrica, Fátima; Santos, Joaquim; Reyes, Daniel

This article presents a study based on 18th-century Portuguese texts, focusing on the analysis of named entities to enhance their value for historical research. For that, an annotated corpus was developed using a primary source (the Parish Memories), which was transcribed, revised, and standardised. The distribution of named entities in the source was then analysed to reflect on the variations in the defined ca...

Date: 2025 | Origin: Linguamática

Annotation, analysis and learning of Named Entities in historical texts...

Vieira, Renata; Olival, Fernanda; Cameron, Helena; Santos, Joaquim; Reyes, Daniel

This article presents a study based on 18th-century Portuguese texts, using the analysis of named entities with a view to enhancing their value for historical analysis. To this end, an annotated corpus was built from a source (the Memórias Paroquiais) that had been transcribed, revised and normalised. Subsequently, the distribution of named entities in this source was analysed, in order to reflect on ...

Date: 2025 | Origin: Repositório Científico da Universidade de Évora

Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains

Nunes, Rafael; Santos, Joaquim; Balreira, Dennis; Freitas, Carla; Olival, Fernanda; Cameron, Helena; Vieira, Renata

This paper discusses the impact of Portuguese variants in Large Language Models for the task of named entity recognition (NER) in specialised domains. The tests were run on a Brazilian Portuguese legal corpus and a European Portuguese historical corpus. The models considered are BERTimbau (PT-BR), Albertina (PT-PT and PT-BR), and XLM-R (multilingual). The impact was more evident in the Portuguese historical ...

Date: 2025 | Origin: Repositório Científico da Universidade de Évora

Notes on variation and lexical diachrony in the Parish Memories-Alentejo collec...

Cameron, Helena; Olival, Fernanda; Vieira, Renata

Memórias Paroquiais-Alentejo (1758) collects the responses of the parish priests from the largest region of Portugal (Alentejo) to a survey carried out by the Crown, asking about the state of the territory and its populations, and also about the effects of the 1755 earthquake. This article discusses the process of transforming the manuscripts into a processable digital form. We described some individual...

Date: 2025 | Origin: Repositório Científico da Universidade de Évora

Named entity recognition specialised for Portuguese 18th century History research

Santos, Joaquim; Vieira, Renata; Olival, Fernanda; Cameron, Helena; Farrica, Fatima

This paper presents the construction of a corpus and the respective models learned for the Named Entity Recognition (NER) task, specialised for historical research. The entity categories were adapted based on the objectives of the historical analysis of the 18th-century text. We trained and evaluated traditional neural networks and the new Large Language Models (LLMs) for the NER task. In total, we assessed six...

Date: 2024 | Origin: Repositório Científico da Universidade de Évora

NLP and Digital Humanities

Vieira, Renata; Cameron, Helena; Olival, Fernanda; Farrica, Fatima; Finatto, Maria José; Banza, Ana Paula; Ribeiro, Ana Sofia; Trojahn, Cassia

In the DH field, with regard to work based on textual sources, we find great variation, both in the historical periods of the sources and in their medium (paper manuscripts, printed, photographed, etc.), as well as in their stage of digitisation, which can range from digital images and PDF texts to texts digitised in other formats. All these variations add extra processing effort...

Date: 2024 | Origin: Repositório Científico da Universidade de Évora

Analysing entity distribution in an annotated 18th-century historical source

De Los Reyes, Daniel; Vieira, Renata; Olival, Fernanda; Cameron, Helena; Farrica, Fatima

This paper presents a distribution analysis of named entities in a historical source, an 18th century Portuguese text collection. The source has been transcribed, revised, normalised and annotated manually with the help of an annotation tool. The distribution analysis was carried out automatically with the help of an extraction parser applied to the annotated texts. The central question of this text is to analy...

Date: 2024 | Origin: Repositório Científico da Universidade de Évora

Digital Humanities and Portuguese Processing: a research pathway

Vieira, Renata; Banza, Ana Paula; Ribeiro, Ana Sofia; Trojahn, Cassia; Olival, Fernanda; Cameron, Helena; Vilar, Herminia; Santos, Ivo; Santos, Joaquim

This paper reflects on the whole path of work in digital humanities, in the light of the projects related to text processing under development at CIDEHUS. These projects deal with a rich heritage related to Portuguese culture, history and language. This paper reflects on the many challenges to be faced and how NLP techniques may broaden the capabilities of organising and sharing knowledge related to these r...

Date: 2022 | Origin: Repositório Científico da Universidade de Évora

Named entity annotation of an 18th-century transcribed corpus: problems and cha...

Cameron, Helena; Olival, Fernanda; Vieira, Renata; Santos, Joaquim

This paper reviews a stage of the process of annotating named entities in 18th-century texts to enrich historical research sources and link them to other databases. The categories in question are person, location and organisation, all relevant categories for historical analysis. We discuss the difficulties observed in the process and point to possible solutions. Partially supported by the Portuguese Foundation FCT, under the...

Date: 2022 | Origin: Repositório Científico da Universidade de Évora

Enriching the 1758 Portuguese Parish Memories (Alentejo) with Named Entities

Vieira, Renata; Olival, Fernanda; Cameron, Helena; Santos, Joaquim; Sequeira, Ofelia; Santos, Ivo

This work presents an enriched version of the Parish Memories (1758–1761), an essential Portuguese historical source that was manually transcribed. It is enriched with annotations of named entities of the types PERSON, LOCATION, and ORGANIZATION. The annotation was done automatically for the whole collection, and two researchers annotated a portion of it manually for evaluation purposes. In this dataset, we provide the...

Date: 2021 | Origin: Repositório Científico da Universidade de Évora

13 documents found, page 1 of 2
