Publication
Portuguese Native Language Identification
| Abstract: | This study presents the first Native Language Identification (NLI) study for L2 Portuguese. We used a subset of the NLI-PT dataset, containing texts written by speakers of five different native languages: Chinese, English, German, Italian, and Spanish. We explore the linguistic annotations available in NLI-PT to extract a range of (morpho-)syntactic features and apply NLI classification methods to predict the native language of the authors. The best results were obtained using an ensemble combination of the features, achieving 54.1% accuracy. |
|---|---|
| Main author: | Malmasi, Shervin |
| Other authors: | del Río, Iria; Zampieri, Marcos |
| Subject: | Native language identification; Learner corpus; Portuguese |
| Year: | 2018 |
| Country: | Portugal |
| Document type: | book chapter |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Source: | Repositório da Universidade de Lisboa |
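The abstract reports that the best result came from an ensemble combination of feature types. As a rough illustration only, the sketch below fuses the label predictions of several base classifiers, one per feature type, by plurality voting, and then scores the fused predictions by accuracy. Plurality voting is one common fusion strategy, swapped in here as an assumption; this record does not say which combination method the authors used. The feature-type names (`pos_ngrams`, `function_words`, `dependencies`) and all predictions are hypothetical toy data, not results from the study.

```python
from collections import Counter


def plurality_vote(votes):
    """Return the label predicted by the most base classifiers for one text.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the label seen first in `votes`.
    """
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(model_outputs):
    """Fuse predictions from several base classifiers by plurality vote.

    model_outputs maps a (hypothetical) feature-type name to a list of
    predicted labels, one per text, in the same text order for every model.
    """
    per_text_votes = zip(*model_outputs.values())
    return [plurality_vote(votes) for votes in per_text_votes]


def accuracy(predicted, gold):
    """Fraction of texts whose predicted native language matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy predictions for four texts (not data from the study).
    outputs = {
        "pos_ngrams": ["Spanish", "German", "Chinese", "Italian"],
        "function_words": ["Spanish", "English", "Chinese", "Spanish"],
        "dependencies": ["Italian", "German", "English", "Italian"],
    }
    gold = ["Spanish", "German", "Chinese", "Spanish"]
    predicted = ensemble_predict(outputs)
    print(predicted)  # ['Spanish', 'German', 'Chinese', 'Italian']
    print(accuracy(predicted, gold))  # 0.75
```

Weighted voting or averaging class probabilities across base classifiers are alternative fusion choices; which one underlies the reported 54.1% is not stated in this record.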