Publication
Categorical data clustering using a minimum message length criterion
| Abstract: | Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on integrating model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while running faster than the BIC- and ICL-based alternatives, which is especially relevant when dealing with large data sets. |
|---|---|
| Main authors: | Silvestre, Cláudia |
| Other authors: | Cardoso, Margarida; Figueiredo, Mário |
| Subject: | Cluster analysis; Categorical data; Expectation-maximization algorithm; MML (Minimum Message Length) criterion |
| Year: | 2012 |
| Country: | Portugal |
| Document type: | conference paper |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Lisboa |
| Language: | English |
| Source: | Repositório Científico do Instituto Politécnico de Lisboa |
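
The idea summarized in the abstract, fitting the mixture and choosing the number of clusters in one run, can be illustrated with a rough sketch. This is not the authors' implementation: it assumes conditionally independent categorical features, uses a plain (not component-wise) EM loop, and borrows only the penalized weight update associated with Figueiredo and Jain (2002), in which a component whose expected count falls below half its number of free parameters is dropped. The function name `em_mml_sketch` and all of its parameters are hypothetical.

```python
import numpy as np


def em_mml_sketch(X, n_levels, k_max=5, n_iter=100, seed=0):
    """Fit a mixture of independent categorical features, pruning components.

    X: (n, d) integer array with X[i, j] in range(n_levels[j]).
    Returns (alpha, theta): surviving mixing weights and, per feature,
    a (k, n_levels[j]) array of category probabilities.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    # One-hot encode each feature once: onehots[j] has shape (n, n_levels[j]).
    onehots = [np.eye(L)[X[:, j]] for j, L in enumerate(n_levels)]
    # Free parameters per component: (L_j - 1) for each feature (assumption
    # of this sketch, following the usual multinomial parameter count).
    n_free = sum(L - 1 for L in n_levels)
    alpha = np.full(k_max, 1.0 / k_max)
    theta = [rng.dirichlet(np.ones(L), size=k_max) for L in n_levels]

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_r = np.log(alpha) + sum(
            oh @ np.log(th).T for oh, th in zip(onehots, theta)
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step for the weights: subtract half the per-component parameter
        # count and clip at zero, so weakly supported components vanish.
        w = np.maximum(0.0, r.sum(axis=0) - n_free / 2.0)
        if w.sum() == 0.0:
            break  # penalty would remove everything; keep the last estimate
        keep = w > 0.0
        alpha = w[keep] / w[keep].sum()

        # M-step for the category probabilities of surviving components.
        r = r[:, keep]
        theta = []
        for oh in onehots:
            counts = r.T @ oh + 1e-9  # tiny constant avoids log(0)
            theta.append(counts / counts.sum(axis=1, keepdims=True))
    return alpha, theta
```

Calling `em_mml_sketch(X, n_levels, k_max=K)` with a deliberately generous `K` returns however many components survive the pruning; `len(alpha)` is then the estimated number of clusters. Whether that matches the true number depends on the data and the initialization, which this sketch does not attempt to tune.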
Related records
groups O critério Minimum Message Length na estimação de modelos de mistura sobre dados mistos
by: Silvestre, Cláudia
Published: (2017)
book O critério Minimum Message Length na estimação de modelos de mistura sobre dados mistos
by: Silvestre, Cláudia
Published: (2020)
groups Clustering and selecting categorical features
by: Silvestre, Cláudia
Published: (2013)
article Feature selection for clustering categorical data with an embedded modelling approach
by: Silvestre, Cláudia
Published: (2014)
groups The number of clusters on trust
by: Silvestre, Cláudia
Published: (2016)
groups An MML embedded approach for estimating the number of clusters
by: Silvestre, Cláudia
Published: (2022)
book An MML embedded approach for estimating the number of clusters
by: Silvestre, Cláudia
Published: (2023)
groups A clustering view on ESS measures of political interest: an EM-MML approach
by: Silvestre, Cláudia
Published: (2017)
groups Model selection in discrete clustering: the EM-MML algorithm
by: Silvestre, Cláudia
Published: (2016)
groups Determining the number of clusters in categorical data
by: Silvestre, Cláudia
Published: (2013)
article Spectrometric differentiation of yeast strains using minimum volume increase and minimum direction change clustering criteria
by: Fachada, Nuno
Published: (2014)
article Enhancing the selection of a model-based clustering with external categorical variables
by: Baudry, Jean-Patrick
Published: (2015)
article A Categorical Clustering of Publishers for Mobile Performance Marketing
by: Silva, Susana
Published: (2019)
article Image Transfer over MQTT in IoT: Message Segmentation and Encryption for Remote Indicator Panels
by: Valente, David
Published: (2024)
article High performance multithreaded message passing on a myrinet cluster
by: Alves, Albano
Published: (2002)
article Age-related decline in intestinal villus length : a cross-sectional study on the human gut
by: Vara-Luiz, Francisco
Published: (2026)
groups Comunicação nas federações desportivas: um estudo quantitativo
by: Silvestre, Cláudia
Published: (2018)
article Evaluating applications performance in a multi-networked cluster
by: Alves, Albano
Published: (2003)
article Establishing a new pasteurization criterion based on Alicyclobacillus acidoterrestris spores for shelf-stable high-acidic fruit products
by: Silva, Filipa M.
Published: (2000)
article CoR's Faster Route over Myrinet
by: Pina, António
Published: (2000)
article On the impact of message brokers implementations in the choreography of microservices
by: Ahmed Gamal Ibrahim
Published: (2025)
school Self-categorization, multiple categorization and well-being of homeless population in Lisbon
by: Solerti, Francesca
Published: (2021)
school Desagregação temporal de sucessões cronológicas: uma aplicação à trimestralização do consumo privado e da formação bruta de capital fixo Caboverdianos
by: Santos, Sara Emília de Figueiredo
Published: (2009)
article Hyperspectral unmixing based on mixtures of Dirichlet components
by: Nascimento, Jose
Published: (2012)
article Anaxarchus of Abdera: Adiaphoria and Criterion of Truth on the Threshold of Hellenistic Age
by: Pajón Leyra, Ignacio
Published: (2019)
school Automatization of incident categorization
by: Silva, Sara Alexandra Teixeira da
Published: (2018)
article Multilayer perceptron network optimization for chaotic time series modeling
by: Mu Qiao
Published: (2023)
article Using mindsets to boost health: how construal level and goal pursuit shape health message effectiveness on cessation behaviors
by: Herter, M. M.
Published: (2022)
article Landscape mosaic composition and mean contributive value index
by: Casquilho, José Pinto
Published: (2010)
article Multi-objective clustering algorithm applied to the mathE categorization problem
by: Azevedo, Beatriz Flamia
Published: (2026)
article We see the orange not the lemon: typicality effects in ultra-rapid categorization in adults with and without autism spectrum disorder
by: Costa Do Carmo, Joana
Published: (2020)
groups A MAP approach to evidence accumulation clustering
by: Lourenço, André Ribeiro
Published: (2015)
draft Disentangling the minimum wage puzzle : an analysis of job accessions and separations from a longitudinal matched employer-employee data set
by: Portugal, Pedro
Published: (2001)
article Use of text messaging services to promote health behaviors in children
by: Fassnacht, Daniel Benjamin
Published: (2015)
article RoCL: a resource oriented communication library
by: Alves, Albano
Published: (2003)
school Message framing to encourage sustainable fashion consumption : the interaction of materialism and gender
by: Nägele, Christian
Published: (2018)
school IMBus: instant Messaging bus
by: Vicente, Carlos Filipe Figueira
Published: (2010)
groups Minimum social standards in Portugal: changes and adequacy from the pre to the post austerity period
by: Branco, Francisco
Published: (2017)
groups Minimum Social Standards in Portugal: changes and adequacy from the pre to the post austerity period
by: Pereira, Elvira
Published: (2017)
groups Minimum Social Standards in Portugal: changes and adequacy from the pre to the post austerity period
by: Pereira, E.
Published: (2017)