This paper describes the linguistic preprocessing methods on hybrid systems provided by an international Artificial Intelligence (AI) company, Defined.ai. The startup focuses on providing high-quality data, models, and AI tools. The main goal of this work is to enhance and advance the quality of preprocessing models by applying linguistic knowledge. Thus, we focus on two introductory linguistic models in a spee...
This article describes the work conducted at VoiceInteraction, a company specializing in speech processing solutions, with a particular focus on automatic transcription using a Hybrid Automatic Speech Recognizer (ASR). The primary objective was to study the phonetic characteristics of the Russian language, encompassing four main tasks: describing the phonetic-phonological inventory, validating...
Machine Translation (MT) research has witnessed continuous growth, accompanied by an increasing demand for automated error detection and correction in textual content. In response, Unbabel has developed a hybrid approach that combines machine translation with human editors in post-editing (PE) to provide high-quality translations. To facilitate the tasks of post-editors, Unbabel has created a proprietary error ...
This paper proposes a typology of errors and linguistic structures found in the source text that have an impact on Machine Translation (MT). The main objectives of this project were, firstly, to compare error typologies and analyze them according to their suitability; to analyze annotated data and build a data-driven typology while adapting the previously existing typologies; to make a distinct...
This article describes research conducted at Unbabel, a Portuguese machine translation start-up that combines Machine Translation (MT) with human post-editing, with a focus on customer service content. Through work carried out in a real multilingual, AI-powered, human-refined MT industry setting, we aim to contribute to furthering MT quality and good practices by showing the importance of having con...
PROPOR 2020 will be the 14th edition of the biennial PROPOR conference, hosted alternately in Brazil and in Portugal. Past meetings were held in Lisbon, PT (1993); Curitiba, BR (1996); Porto Alegre, BR (1998); Évora, PT (1999); Atibaia, BR (2000); Faro, PT (2003); Itatiaia, BR (2006); Aveiro, PT (2008); Porto Alegre, BR (2010); Coimbra, PT (2012); São Carlos, BR (2014); Tomar, PT (2016); and Canela, BR (2018). ...
This book constitutes the proceedings of the 14th International Conference on Computational Processing of the Portuguese Language, PROPOR 2020, held in Évora, Portugal, in March 2020. The 36 full papers presented together with 5 short papers were carefully reviewed and selected from 70 submissions. They are grouped in topical sections on speech processing; resources and evaluation; natural language processing a...
This paper presents an acoustic-prosodic analysis of entrainment in map-task dialogues in European Portuguese. Our main goal is to analyze how turn-by-turn entrainment varies with distinct structural metadata events: types of sentence-like units (SUs) in consecutive turns (e.g. interrogatives followed by declaratives, or both declaratives), and with the presence of discourse markers, affirmative cue words, and ...
Automatic personality analysis has gained great attention in recent years as a fundamental dimension in human-machine interactions. However, the development of this technology in some domains, such as the classification of children’s personality, has been hindered by the limited number and size of the available speech corpora due to ethical concerns about collecting such corpora. To circumvent the lack of data, ...
This paper presents a global analysis of entrainment in map-task dialogues in European Portuguese, covering 48 dialogues between 24 speakers. Our main goal is to analyze the acoustic-prosodic similarities between speaker pairs, namely whether there are global entrainment cues displayed in the dialogues, whether entrainment is manifested in distinct sets of features shared amongst the speakers, whether entrainment depends on...