Publication

Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors

Bibliographic details
Abstract:	Over the last decade, machine translation (MT) has played an important role in the translation market and has become an essential tool for speeding up the translation process and reducing its time and cost. Nevertheless, the quality of its output is not fully satisfactory, as it varies considerably depending on numerous factors. It is therefore necessary to combine MT with human intervention, by post-editing the machine-translated texts, in order to reach high-quality translations. This work describes the MT process provided by Unbabel, a Portuguese start-up that combines MT with post-editing by online editors. The main objective of the study is to contribute to improving the quality of translated text by analyzing annotated translations from English into Italian and defining linguistic specifications that improve the tools the start-up uses to aid human editors and annotators. The guidelines given to annotators to guide their editing process were also analyzed, a task that contributed to improving inter-annotator agreement and thus to making the annotated data reliable. Accomplishing these goals made it possible to identify and categorize the most frequent errors in translated texts, namely errors whose resolution is bound to significantly improve the efficacy and quality of the translation. The data collected identified register as the most frequent error category and also the one with the greatest impact on translation quality; for these reasons, this category is analyzed in more detail throughout the work. From the analysis of errors in this category, it was possible to define and implement a set of rules in Smartcheck, a tool used at Unbabel to automatically detect errors in the target text produced by the MT system, in order to guarantee a higher quality of the translated texts after post-editing.
Main authors:	Testa, Ingrid
Subject:	English language - Translation into Italian; English language - Machine translation; Italian language - Machine translation; Translation - Methodology; Machine translation; Master's theses - 2018
Year:	2018
Country:	Portugal
Document type:	master's thesis
Access type:	open access
Associated institution:	Universidade de Lisboa
Language:	English
Source:	Repositório da Universidade de Lisboa

_version_	1866810815147933696
author	Testa, Ingrid
author_facet	Testa, Ingrid
author_role	author
contributor_name_str_mv	Mendes, Sara; Moniz, Helena; Repositório Científico de Acesso Aberto da ULisboa
country_str	PT
creators_json_txt	[{"Person.name":"Testa, Ingrid"}]
datacite.contributors.contributor.contributorName.fl_str_mv	Mendes, Sara; Moniz, Helena; Repositório Científico de Acesso Aberto da ULisboa
datacite.creators.creator.creatorName.fl_str_mv	Testa, Ingrid
datacite.date.Accepted.fl_str_mv	2018-11-22T00:00:00Z
datacite.date.available.fl_str_mv	2019-01-09T10:19:18Z
datacite.date.embargoed.fl_str_mv	2019-01-09T10:19:18Z
datacite.rights.fl_str_mv	http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv	English language - Translation into Italian; English language - Machine translation; Italian language - Machine translation; Translation - Methodology; Machine translation; Master's theses - 2018
datacite.titles.title.fl_str_mv	Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors
dc.contributor.none.fl_str_mv	Mendes, Sara; Moniz, Helena; Repositório Científico de Acesso Aberto da ULisboa
dc.creator.none.fl_str_mv	Testa, Ingrid
dc.date.Accepted.fl_str_mv	2018-11-22T00:00:00Z
dc.date.available.fl_str_mv	2019-01-09T10:19:18Z
dc.date.embargoed.fl_str_mv	2019-01-09T10:19:18Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	http://hdl.handle.net/10451/36289
dc.language.none.fl_str_mv	eng
dc.rights.none.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv	English language - Translation into Italian; English language - Machine translation; Italian language - Machine translation; Translation - Methodology; Machine translation; Master's theses - 2018
dc.title.fl_str_mv	Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors
dc.type.none.fl_str_mv	http://purl.org/coar/resource_type/c_bdcc
dirty	0
eu_rights_str_mv	openAccess
format	masterThesis
fulltext.url.fl_str_mv	https://repositorio.ulisboa.pt/bitstreams/d71f1592-89ac-472b-924c-0e71b243d6cc/download
id	ul_ad8a6943f4a39d82dbd82ef1bb52bb25
identifier.url.fl_str_mv	http://hdl.handle.net/10451/36289
instacron_str	ul
institution	Universidade de Lisboa
instname_str	Universidade de Lisboa
language	eng
network_acronym_str	ul
network_name_str	Repositório da Universidade de Lisboa
oai_identifier_str	oai:repositorio.ulisboa.pt:10451/36289
organization_str_mv	urn:organizationAcronym:ul
person_str_mv	Testa, Ingrid
publishDate	2018
reponame_str	Repositório da Universidade de Lisboa
repository_id_str	urn:repositoryAcronym:ul
service_str_mv	urn:repositoryAcronym:ul
spelling	eng; application/pdf; Testa, Ingrid; Mendes, Sara; Moniz, Helena; HostingInstitution (Organizational): Repositório Científico de Acesso Aberto da ULisboa; e-mail: repositorio@reitoria.ulisboa.pt; URN: urn:tid:202048268; 2019-01-09T10:19:18Z; 2018-11-22; 2018-08-29; 2018-11-22T00:00:00Z; Handle: http://hdl.handle.net/10451/36289; http://purl.org/coar/access_right/c_abf2 (open access); 1610571 bytes; literature; http://purl.org/coar/resource_type/c_bdcc (master thesis); fulltext: https://repositorio.ulisboa.pt/bitstreams/d71f1592-89ac-472b-924c-0e71b243d6cc/download
status	SINGLETON
subject.fl_str_mv	English language - Translation into Italian; English language - Machine translation; Italian language - Machine translation; Translation - Methodology; Machine translation; Master's theses - 2018
title	Quality in human post-editing of machine-translated texts : error annotation and linguistic specifications for tackling register errors
url	http://hdl.handle.net/10451/36289
visible	1
