Publication
On the advantages of word-frequency and contextual diversity measures extracted from subtitles: the case of Portuguese
| Abstract: | We examined the potential advantage of lexical databases based on subtitles and present SUBTLEX-PT, a new lexical database for 132,710 Portuguese words obtained from a corpus of 78 million words based on film and television series subtitles, offering word-frequency and contextual diversity measures. Additionally, we validated SUBTLEX-PT with a lexical decision study involving 1,920 Portuguese words (and 1,920 non-words) varying in length in letters (M = 6.89, SD = 2.10) and syllables (M = 2.99, SD = 0.94). Multiple regression analyses on latency and accuracy data were conducted to compare the proportion of variance explained by the Portuguese subtitle word-frequency measures with that accounted for by a recent written-word frequency database (P-PAL; Soares et al., 2014a). Like its international counterparts, SUBTLEX-PT explains approximately 15% more of the variance in the lexical decision performance of young adults than the P-PAL database. Moreover, in line with recent studies, contextual diversity accounted for approximately 2% more of the variance in participants' reading performance than the raw frequency counts obtained from subtitles. SUBTLEX-PT is freely available for research purposes at http://p-pal.di.uminho.pt/about/database. |
|---|---|
| Main authors: | Soares, Ana Paula |
| Other authors: | Machado, João F.; Costa, Ana; Iriarte Sanromán, Álvaro; Simões, Alberto; Almeida, J. J.; Comesaña, Montserrat; Perea, Manuel |
| Subject: | Word frequency; Contextual diversity; Subtitles; Portuguese |
| Year: | 2015 |
| Country: | Portugal |
| Document type: | article |
| Access type: | restricted access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |