Large language models (LLMs) have revolutionized natural language processing, but their predominant focus on English has resulted in biases and performance differences across various languages. This situation is maintained in generative multilingual models, where English continues to be the predominant language. In these models, the presence of European Portuguese is marginal and that of the Galician variety is...
The objective of this work is to apply a perplexity-based methodology to automatically calculate the cross-lingual distance between different historical periods of diatopic language variants. This methodology applies to an adhoc constructed corpus in original spelling, on a balanced basis of fiction and non-fiction, which measures the historical distance between European and Brazilian Portuguese on the one hand...
à hora de desenvolver muitas ferramentas estatísticas de Processamento da Linguagem Natural torna-se essencial a utilização de grandes quantidades de dados. Para salvar a limitação da escassez de recursos computacionais para línguas minorizadas como o galego é necessário desenhar novas estratégias. No caso do galego, importantes romanistas têm teorizado que galego e português são variantes do português europeu....