The following article describes the research developed at Unbabel, a Portuguese Machine-Translation start-up, that combines Machine Translation (MT) with human post-edition with a focus on customer service content. With the work carried out within a real multilingual AI powered, human-refined, MT industry, we aim to contribute to furthering MT quality and good-practices, by exposing the importance of having con...
This paper aims to analyse the effects of including gaming entities in the performance of the NER system, for the English language and in a machine translation industrial context of customer support content. To identify and classify gaming entities (by the Named Entity Recognition (NER) model), three new categories were created and added to the already used annotation typology: GAME NAME, GAME FEATURE and GAME ...
This paper presents an acoustic-prosodic analysis of entrainment in map-task dialogues in European Portuguese. Our main goal is to analyze how turn-by-turn entrainment varies with distinct structural metadata events: types of sentence-like units (SUs) in consecutive turns (e.g. interrogatives followed by declaratives, or both declaratives), and with the presence of discourse markers, affirmative cue words, and ...
Automatic personality analysis has gained great attention in the last years as a fundamental dimension in human-machine interactions. However, the development of this technology in some domains, such as the classification of children’s personality, has been hindered by the limited number and size of the available speech corpora due to ethical concerns on collecting such corpora. To circumvent the lack of data, ...
This paper presents a global analysis of entrainment in map-task dialogues in European Portuguese, including 48 dialogues, between 24 speakers. Our main goal is to analyze the acoustic-prosodic similarities between speaker pairs, namely if there are global entrainment cues displayed in the dialogues, if entrainment is manifested in distinct sets of features shared amongst the speakers, if entrainment depends on...
This paper performs a global analysis of entrainment between dyads in map-task dialogues in European Portuguese (EP), including 48 dialogues, between 24 speakers. Our main goals focus on the acoustic-prosodic similarities between speakers, namely if there are global entrainment cues displayed in the dialogues, if there are degrees of entrainment manifested in distinct sets of features shared amongst the speaker...
This work describes the discourse markers present in two corpora for European Portuguese, in different domains (university lectures and map-task dialogues). In this study, we also perform a multiclass automatic classification task based on prosodic features to verify in both corpora which words are discourse markers, which are disfluencies, and which are sentence like-units (SUs). Results show that the selectio...
The first contribution of this study is the description of the prosodic behavior of discourse markers present in two speech corpora of European Portuguese (EP) in different domains (university lectures, and map-task dialogues). The second contribution is a multiclass classification to verify, given their prosodic features, which words in both corpora are classified as discourse markers, which are disfluencies, ...
This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance for several modules. The main focus of the revision process consisted on annotating and revising structural metadata events, such as disfluencies and punctuation marks. The resultant revised data is ...
In this paper, we argue that the vocative in European Portuguese has different prosodic properties according to its syntactic distribution. In order to prove this, we have built a corpus and analyzed the intonation contours associated to the vocative and the boundary’s strength between this constituent and the sentence. Our data shows that the vocative’s distribution plays a crucial part in its prosodic charact...