Publication
Determining the number of clusters in categorical data
| Abstract: | Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is determining the number of clusters, which is unknown and must be inferred from the data. To estimate the number of clusters, one often resorts to information criteria such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, compared with the criteria mentioned above, is its speed of execution, which is especially relevant when dealing with large datasets. |
|---|---|
| Main authors: | Silvestre, Cláudia |
| Other authors: | Cardoso, Margarida; Figueiredo, Mário |
| Subject: | Cluster analysis; Model selection; Categorical variables |
| Year: | 2013 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Lisboa |
| Language: | English |
| Source: | Repositório Científico do Instituto Politécnico de Lisboa |
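
The abstract describes an EM variant that estimates a finite mixture of multinomial distributions while selecting the number of clusters through an MML criterion. The following is only a minimal sketch of that idea, not the authors' implementation. It assumes the weight update of Figueiredo and Jain (2002), in which each component's expected count is reduced by half its number of free parameters and components whose count falls to zero are removed. The function name `fit_categorical_mixture`, its parameters, the independence of features within a cluster, and the use of a simultaneous (rather than component-wise) update are all assumptions made for illustration.

```python
import numpy as np

def fit_categorical_mixture(X, n_levels, k_max=6, n_iter=200, seed=0):
    """Sketch of EM with MML-style component annihilation (hypothetical helper).

    X        : (n, d) integer array, X[i, j] in {0, ..., n_levels[j] - 1}
    n_levels : number of categories of each of the d features
    k_max    : initial (deliberately too large) number of components
    Returns the surviving mixture weights and, per feature, a (K, L_j)
    array of category probabilities.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    # One-hot encode every feature: each entry has shape (n, L_j).
    onehot = [np.eye(L)[X[:, j]] for j, L in enumerate(n_levels)]
    # Free parameters per component: sum over features of (L_j - 1).
    n_params = sum(L - 1 for L in n_levels)

    weights = np.full(k_max, 1.0 / k_max)
    theta = [rng.dirichlet(np.ones(L), size=k_max) for L in n_levels]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row.
        log_r = np.log(weights)[None, :] + sum(
            oh @ np.log(th + eps).T for oh, th in zip(onehot, theta)
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step for the weights, with the MML penalty: a component whose
        # expected count does not exceed n_params / 2 is annihilated.
        mass = np.maximum(resp.sum(axis=0) - n_params / 2.0, 0.0)
        keep = mass > 0
        if not keep.any():
            raise ValueError("all components were annihilated; too few rows")
        weights = mass[keep] / mass[keep].sum()
        resp = resp[:, keep]

        # M-step for the category probabilities of surviving components.
        theta = []
        for oh in onehot:
            counts = resp.T @ oh
            theta.append(counts / counts.sum(axis=1, keepdims=True))

    return weights, theta
```

Calling `fit_categorical_mixture(X, n_levels)` on an integer-coded data matrix returns the weights of the components that survive, so `len(weights)` serves as the estimated number of clusters. Because model selection happens inside a single EM run, there is no need to fit one model per candidate number of clusters, as a BIC or ICL comparison would require.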
Related records
Clustering and selecting categorical features
by: Silvestre, C.
Published: 2013
Feature selection for clustering categorical data with an embedded modelling approach
by: Silvestre, Cláudia
Published: 2014
Clustering and selecting categorical features
by: Silvestre, Cláudia
Published: 2013
Feature selection for clustering categorical data with an embedded modeling approach
by: Silvestre, C.
Published: 2015
Enhancing the selection of a model-based clustering with external categorical variables
by: Baudry, Jean-Patrick
Published: 2015
Enhancing the selection of a model-based clustering with external categorical variables
by: Baudry, J.-P.
Published: 2015
Categorical data clustering using a minimum message length criterion
by: Silvestre, Cláudia
Published: 2012
The number of clusters on trust
by: Silvestre, Cláudia
Published: 2016
An MML embedded approach for estimating the number of clusters
by: Silvestre, Cláudia
Published: 2022
An MML embedded approach for estimating the number of clusters
by: Silvestre, Cláudia
Published: 2023
Mining categorical sequences from data using a hybrid clustering method
by: De Angelis, L.
Published: 2014
A Categorical Clustering of Publishers for Mobile Performance Marketing
by: Silva, Susana
Published: 2019
A clustering view on ESS measures of political interest: an EM-MML approach
by: Silvestre, Cláudia
Published: 2017
Clustering with discrete mixture models: An integrated approach for model selection
by: Silvestre, Cláudia Marisa Vasconcelos
Published: 2014
Clustering of variables with a three-way approach for health sciences
by: Bacelar-Nicolau, Helena
Published: 2014
Machine learning in incident categorization automation
by: Silva, S.
Published: 2018
Probabilistic clustering of interval data
by: Brito, Paula
Published: 2015
Probabilistic clustering of interval data
by: Brito, P.
Published: 2015
Credit scoring for the housing model
by: Bento, Viriato Gonçalo Rodrigues
Published: 2022
Identifying Special Structures in Interval-Data via Model-Base Clustering
by: Brito, Paula
Published: 2013
Determinants of Africa’s development: An exploratory study
by: Veiga, J.
Published: 2022
How do you self-categorize? Gender and sexual orientation self-categorization in homosexual/heterosexual men and women
by: Fasoli, F.
Published: 2018
Measuring similarity of complex and heterogeneous data in clustering of large data sets
by: Nicolau, Helena Bacelar
Published: 2009
Self-categorization, multiple categorization and well-being of homeless population in Lisbon
by: Solerti, Francesca
Published: 2021
Water quality monitoring using cluster analysis and linear models
by: Gonçalves, A. Manuela
Published: 2011
Less is more in incident categorization
by: Silva, S.
Published: 2018
Automatization of incident categorization
by: Silva, Sara Alexandra Teixeira da
Published: 2018
Telling things apart: the distance between response keys influences categorization times
by: Lakens, D.
Published: 2011
The impact of stimuli color in lexical decision and semantic word categorization tasks
by: Garrido, M. V.
Published: 2019
Voice matters: Social categorization and stereotyping of speakers based on sexual orientation and nationality categories
by: Fasoli, F.
Published: 2023
Norms regarding secondary victimization of bullying victims: Do they differ according to the victim's categorization?
by: Correia, I.
Published: 2010
A self-organizing map clustering approach to support territorial zoning
by: Silva, Marcos A. S. da
Published: 2024
We see the orange not the lemon: typicality effects in ultra-rapid categorization in adults with and without autism spectrum disorder
by: Costa Do Carmo, Joana
Published: 2020
Categorize readmitted patients in Intensive Medicine by means of Clustering Data Mining
by: Veloso, Rui Pedro Brás
Published: 2017
Multi-objective clustering algorithm applied to the mathE categorization problem
by: Azevedo, Beatriz Flamia
Published: 2026
On cluster analysis of complex and heterogeneous data
by: Bacelar-Nicolau, Helena
Published: 2014
Selection of variables in Discrete Discriminant Analysis
by: Marques, A.
Published: 2013
When the young think that every plant is parsley. Social variability of ethnobotanical knowledge and plant categorization in two rural areas from Portugal
by: Carvalho, Ana Maria
Published: 2009
Perceptions of quantitative methods in higher education: mapping student profiles
by: Ramos, Madalena
Published: 2011
New PCA-based category encoder for efficient data processing in IoT devices
by: Farkhari, H.
Published: 2022