RCAAP - Repositórios Científicos de Acesso Aberto de Portugal

A Galician-Portuguese Generative Model

Gamallo, Pablo; Rodríguez, Pablo; Sotelo, Susana; Miquelina, Nuno; Paniagua, Silvia; Schmidt, Daniela; de-Dios-Flores, Iria; Quaresma, Paulo

Large language models (LLMs) have revolutionized natural language processing, but their predominant focus on English has resulted in biases and performance differences across languages. This situation persists in multilingual generative models, where English continues to be the predominant language. In these models, the presence of European Portuguese is marginal and that of the Galician variety is...

Date: 2026 | Origin: Repositório Científico da Universidade de Évora

Leveraging Advanced Prompting Strategies in Llama-8b for Enhanced Hyperpartisan...

Maggini, Michele Joshua; Marino, Erik Bran; Gamallo, Pablo

This paper explores advanced prompting strategies for hyperpartisan news detection using the Llama3-8b-Instruct model, an open-source LLM developed by Meta AI. We evaluate zero-shot, few-shot, and Chain-of-Thought (CoT) techniques on two datasets: SemEval-2019 Task 4 and a headline-specific corpus. Collaborating with a political science expert, we incorporate domain-specific knowledge and structured reasoning s...

Date: 2025 | Origin: Repositório Científico da Universidade de Évora

Enhancing Large Language Models for Underrepresented Varieties: Pretraining Str...

Rodríguez, Pablo; Gamallo, Pablo; Santos, Daniel; Sotelo, Susana; Paniagua, Silvia; Pichel, José; Salgueiro, Pedro; Nogueira, Vítor; Quaresma, Paulo

This study presents a systematic exploration of strategies for pretraining generative Large Language Models (LLMs) within the Galician-Portuguese diasystem, by focusing on two underrepresented varieties of this diasystem, namely European Portuguese and Galician. We investigate the impact of combining versus separating linguistic varieties during continued pretraining, the trade-offs between large-scale noisy da...

Date: 2025 | Origin: Repositório Científico da Universidade de Évora

Desenvolvimento e avaliação de um modelo NER no domínio da análise cultural e d...

Sotelo Docío, Susana; Gamallo, Pablo; Iriarte Sanromán, Álvaro

Named Entity Recognition (NER) is an essential information extraction task in which the entities in a text are identified and classified. One of the main challenges NER systems face is the difficulty of generalizing what has been learned to types of corpora other than those used during training. This problem is accentuated by the fact that most of the training corpora u...

Date: 2023 | Origin: RepositóriUM - Universidade do Minho

Development and evaluation of a NER model in the domain of cultural analysis an...

Sotelo Docío, Susana; Gamallo, Pablo; Iriarte, Álvaro

Named Entity Recognition (NER) is an essential task in information extraction where entities in a text are identified and classified. One of the primary challenges faced by NER systems is the difficulty of generalizing what was learned to different types of corpora beyond the training data. This problem is magnified by the fact that most of the training corpora used are journalistic and therefore need...

Date: 2023 | Origin: Linguamática

Uso de tecnologias linguísticas para estudar a evolução dos sufixos -ÇOM e -VE...

Gamallo, Pablo; Ramom Pichel, José; Montero Santalha, José Martinho; Neves, Marco

The work presented in this article has two objectives. On the one hand, it describes the adaptation of two natural language processing tools to medieval Galician-Portuguese, namely a morphosyntactic analyzer and a recognizer of medieval varieties; on the other, it aims to test linguistic hypotheses about the evolution of medieval suffixes by applying these tools to historical corpora. Despite...

Date: 2021 | Origin: Linguamática

Distância diacrónica automática entre variantes diatópicas do português e do es...

Pichel, José Ramom; Gamallo, Pablo; Neves, Marco; Alegria, Iñaki

The objective of this work is to apply a perplexity-based methodology to automatically calculate the cross-lingual distance between different historical periods of diatopic language variants. This methodology is applied to an ad hoc corpus in original spelling, balanced between fiction and non-fiction, and measures the historical distance between European and Brazilian Portuguese on the one hand...

Date: 2020 | Origin: Linguamática

Exploring Unsupervised Methods to Semantic Textual Similarity

Gamallo, Pablo; Pereira-Fariña, Martín

This paper presents some unsupervised methods for detecting semantic textual similarity, which are based on distributional models and dependency parsing. The systems are evaluated using the dataset released by the ASSIN Shared Task co-located with PROPOR 2016. The more basic methods perform better than the more complex ones, which include syntactic-semantic information in sentence analysis. Finally, the ...

Date: 2019 | Origin: Linguamática

Uma utilidade para o reconhecimento de topónimos em documentos medievais

Canosa, Xavier; Gamallo, Pablo; Varela, Xavier; Taboada, José Ángel; Martínez Lema, Paulo; Garcia, Marcos

This article presents the method used to build a tool for annotating geographic entities mentioned in medieval texts. The new tool was developed from the contemporary-language modules of LinguaKit, a multilingual package of NLP tools. A collection of manually annotated corpora served as a resource for compiling a list of medieval toponyms (gazetteers) and observing patterns...

Date: 2019 | Origin: Linguamática

LinguaKit: uma ferramenta multilingue para a análise linguística e a extração d...

Gamallo, Pablo; Garcia, Marcos

This article presents LinguaKit, a multilingual suite of tools for linguistic analysis, extraction, annotation, and correction. LinguaKit supports tasks as diverse as lemmatization, morphosyntactic tagging, and syntactic parsing (among others), and also includes applications for sentiment analysis (or opinion mining), multiword term extraction, and conceptual annotation and link...

Date: 2017 | Origin: Linguamática

12 documents found, page 1 of 2
