Publication

Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors

Bibliographic details
Abstract: Over the last decade, machine translation (MT) has played an important role in the translation market and has become an essential tool for speeding up the translation process and reducing its time and cost. Nevertheless, the quality of MT output is not fully satisfactory, since it varies considerably depending on numerous factors. It is therefore necessary to combine MT with human intervention, by post-editing the machine-translated texts, in order to reach high-quality translations. This work describes the MT process provided by Unbabel, a Portuguese start-up that combines MT with post-editing by online editors. The main objective of the study is to contribute to improving the quality of the translated text by analyzing annotated translations from English into Italian, in order to define linguistic specifications that improve the tools used at the start-up to aid human editors and annotators. The guidelines provided to annotators to guide their work were also analyzed, a task that contributed to improving inter-annotator agreement and thus to making the annotated data reliable. Accomplishing these goals made it possible to identify and categorize the most frequent errors in translated texts, namely errors whose resolution is bound to significantly improve the efficacy and quality of the translation. The data collected identified register as the most frequent error category and also the one with the greatest impact on translation quality; for these reasons, this category is analyzed in more detail throughout the work. Based on the analysis of errors in this category, a set of rules was defined and implemented in Smartcheck, a tool used at Unbabel to automatically detect errors in the target text produced by the MT system, in order to guarantee higher-quality translated texts after post-editing.
Main authors: Testa, Ingrid
Subject: English language - Translation into Italian; English language - Machine translation; Italian language - Machine translation; Translation - Methodology; Machine translation; Master's theses - 2018
Year: 2018
Country: Portugal
Document type: master's dissertation
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa
_version_ 1866810815147933696
author Testa, Ingrid
author_facet Testa, Ingrid
author_role author
contributor_name_str_mv Mendes, Sara
Moniz, Helena
Repositório Científico de Acesso Aberto da ULisboa
country_str PT
creators_json_txt [{"Person.name":"Testa, Ingrid"}]
datacite.date.Accepted.fl_str_mv 2018-11-22T00:00:00Z
datacite.date.available.fl_str_mv 2019-01-09T10:19:18Z
datacite.date.embargoed.fl_str_mv 2019-01-09T10:19:18Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10451/36289
dc.language.none.fl_str_mv eng
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_bdcc
dirty 0
eu_rights_str_mv openAccess
format masterThesis
fulltext.url.fl_str_mv https://repositorio.ulisboa.pt/bitstreams/d71f1592-89ac-472b-924c-0e71b243d6cc/download
id ul_ad8a6943f4a39d82dbd82ef1bb52bb25
identifier.url.fl_str_mv http://hdl.handle.net/10451/36289
instacron_str ul
institution Universidade de Lisboa
instname_str Universidade de Lisboa
language eng
network_acronym_str ul
network_name_str Repositório da Universidade de Lisboa
oai_identifier_str oai:repositorio.ulisboa.pt:10451/36289
organization_str_mv urn:organizationAcronym:ul
person_str_mv Testa, Ingrid
publishDate 2018
reponame_str Repositório da Universidade de Lisboa
repository_id_str urn:repositoryAcronym:ul
service_str_mv urn:repositoryAcronym:ul
spelling urn:tid:202048268 | 2018-08-29 | 1610571 bytes | mailto:repositorio@reitoria.ulisboa.pt
status SINGLETON
title Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors
topic English language - Translation into Italian
English language - Machine translation
Italian language - Machine translation
Translation - Methodology
Machine translation
Master's theses - 2018
url http://hdl.handle.net/10451/36289
visible 1