Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective sequence of words. This paper presents a deep learning ASR system optimization and evaluation for the European Portuguese language. We present a pipeline composed of several stages for data acquisition, analysis, pre-processing, model crea...
Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective sequence of words. This paper presents a deep learning ASR system optimization and evaluation for the European Portuguese language. We present a pipeline composed of several stages for data acquisition, analysis, pre-processing, model crea...
Large language models (LLMs) have revolutionized natural language processing, but their predominant focus on English has resulted in biases and performance differences across various languages. This situation is maintained in generative multilingual models, where English continues to be the predominant language. In these models, the presence of European Portuguese is marginal and that of the Galician variety is...
Natural Language Processing (NLP) research has predominantly focused on the English language, leading to a wealth of resources and advancements tailored to English. However, there is a growing need to extend these capabilities to other languages, such as European Portuguese, to ensure the inclusivity and accessibility of NLP technologies. In this study, we explore the evaluation of NLP models in the European Po...
survlab provides functions for imputing non-detect values in environmental laboratory data using survival models (including Tobit models) with automatic distribution selection. Is designed specifically for working with analytical data where measurements fall below detection limits or limits of quantification (LOQ).
Convolutional neural networks often generate multiple logits from multiple networks. In most cases, we use simple techniques like addition or column averaging for loss computation. But this allows gradients to be distributed equally among all paths. The proposed approach attempts to guide the gradients of backpropagation along the weakest branches of the neural network. A weakness score is proposed that defines...
Fine-tuning Large Language Models (LLMs) for specific tasks, such as machine translation, is a computationally expensive process that often requires substantial hardware resources. Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), offer a resource-efficient alternative by significantly reducing the number of trainable parameters and mem...
This paper presents a specialized fine-tuning approach for the Mistral-7B Large Language Model (LLM) tailored for biomedical applications. We employ Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, to adapt the model to the intricacies of biomedical language and domain-specific knowledge. By integrating LoRA, we aim to preserve the general language understanding capabilities of Mistral-7B w...
Describing land cover changes from multi-temporal remote sensing imagery requires capturing both visual transformations and their semantic meaning in natural language. Existing methods often struggle to balance visual accuracy with descriptive coherence. We propose MVLT-LoRA-CC (Multi-modal Vision Language Transformer with Low-Rank Adaptation for Change Captioning), a framework that integrates a Vision Transfor...
Plagiarism detection is essential for maintaining academic integrity, ensuring that scholarly works are original and properly cited. With the rise of online resources and AI writing tools, the risk of plagiarism has increased, making detection crucial in the academic process. Detection methods can be monolingual or cross-lingual and are classified as intrinsic or extrinsic, utilizing various techniques such as ...