Publication
Automatic creation of stock market lexicons for sentiment analysis using StockTwits data
| Abstract: | Sentiment analysis has been increasingly applied to the stock market domain. In particular, investor sentiment indicators can be used to model and predict stock market variables. In this context, the quality of the sentiment analysis is highly dependent on the opinion lexicon adopted. However, there is a lack of lexicons adjusted to microblogging stock market data. In this work, we propose an automatic procedure for the creation of such a lexicon by exploring a large set of labeled messages from StockTwits, a popular financial microblogging service, and using four statistical measures: adaptations of the known TF-IDF, Information Gain, Class Percentage, and a newly proposed Weighted Class Probability. The obtained lexicons are competitive when compared with a set of six reference lexicons. Moreover, we verified that it is beneficial to use continuous sentiment scores instead of sentiment labels. |
|---|---|
| Main authors: | Oliveira, Nuno |
| Other authors: | Cortez, Paulo; Areal, Nelson |
| Subject: | Sentiment analysis; Opinion mining; Stock market; Lexicon; Microblogging data; Information retrieval |
| Year: | 2014 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |