Publication

Enabling network inference methods to handle missing data and outliers


Bibliographic details
Abstract: Background: The inference of complex networks from data is a challenging problem in biological sciences, as well as in a wide range of disciplines such as chemistry, technology, economics, or sociology. The quantity and quality of the data greatly affect the results. While many methodologies have been developed for this task, they seldom take into account issues such as missing data or outlier detection and correction, which need to be properly addressed before network inference. Results: Here we present an approach to (i) handle missing data and (ii) detect and correct outliers based on multivariate projection to latent structures. The method, called trimmed scores regression (TSR), enables network inference methods to analyse incomplete datasets by imputing the missing values coherently with the latent data structure. Furthermore, it substitutes the faulty values in a dataset by proper estimations. We provide an implementation of this approach, and show how it can be integrated with any network inference method as a preliminary data curation step. This functionality is demonstrated with a state-of-the-art network inference method based on mutual information distance and entropy reduction, MIDER. Conclusion: The methodology presented here enables network inference methods to analyse a large number of incomplete and faulty datasets that could not be reliably analysed so far. Our comparative studies show the superiority of TSR over other missing data approaches used by practitioners. Furthermore, the method allows for outlier detection and correction.
Main author: Folch-Fortuny, Abel
Other authors: Villaverde, Alejandro F.; Ferrer, Alberto; Banga, Julio R.
Subject: Network inference; Missing data; Outlier detection; Projection to latent structures; Trimmed scores regression; Information theory; Mutual information
Year: 2015
Country: Portugal
Document type: article
Access type: open access
Associated institution: Universidade do Minho
Language: English
Source: RepositóriUM - Universidade do Minho
_version_ 1866876385014841344
author Folch-Fortuny, Abel
author2 Villaverde, Alejandro F.
Ferrer, Alberto
Banga, Julio R.
author2_role author
author
author
author_facet Folch-Fortuny, Abel
Villaverde, Alejandro F.
Ferrer, Alberto
Banga, Julio R.
author_role author
contributor_name_str_mv Universidade do Minho
country_str PT
creators_json_txt [{"Person.name":"Folch-Fortuny, Abel"},{"Person.name":"Villaverde, Alejandro F."},{"Person.name":"Ferrer, Alberto"},{"Person.name":"Banga, Julio R."}]
datacite.contributors.contributor.contributorName.fl_str_mv Universidade do Minho
datacite.creators.creator.creatorName.fl_str_mv Folch-Fortuny, Abel
Villaverde, Alejandro F.
Ferrer, Alberto
Banga, Julio R.
datacite.date.Accepted.fl_str_mv 2015-01-01T00:00:00Z
datacite.date.available.fl_str_mv 2015-09-10T14:36:34Z
datacite.date.embargoed.fl_str_mv 2015-09-10T14:36:34Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
datacite.titles.title.fl_str_mv Enabling network inference methods to handle missing data and outliers
dc.contributor.none.fl_str_mv Universidade do Minho
dc.creator.none.fl_str_mv Folch-Fortuny, Abel
Villaverde, Alejandro F.
Ferrer, Alberto
Banga, Julio R.
dc.date.Accepted.fl_str_mv 2015-01-01T00:00:00Z
dc.date.available.fl_str_mv 2015-09-10T14:36:34Z
dc.date.embargoed.fl_str_mv 2015-09-10T14:36:34Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://hdl.handle.net/1822/37041
dc.language.none.fl_str_mv eng
dc.publisher.none.fl_str_mv BioMed Central (BMC)
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
dc.title.fl_str_mv Enabling network inference methods to handle missing data and outliers
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_6501
dirty 0
eu_rights_str_mv openAccess
format article
fulltext.url.fl_str_mv https://prod-dspace.uminho.pt/bitstreams/797ef8c2-05dc-4710-9ad4-6d82fbed271f/download
id rum_ac08099e2c02f8d6360ca62e6dd8c3cd
identifier.url.fl_str_mv https://hdl.handle.net/1822/37041
instacron_str repositorium
institution Universidade do Minho
instname_str Universidade do Minho
language eng
network_acronym_str rum
network_name_str RepositóriUM - Universidade do Minho
oai_identifier_str oai:repositorium.uminho.pt:1822/37041
organization_str_mv urn:organizationAcronym:repositorium
person_str_mv Folch-Fortuny, Abel
Villaverde, Alejandro F.
Ferrer, Alberto
Banga, Julio R.
publishDate 2015
publisher.none.fl_str_mv BioMed Central (BMC)
reponame_str RepositóriUM - Universidade do Minho
repository_id_str urn:repositoryAcronym:rum
service_str_mv urn:repositoryAcronym:rum
spelling eng; BioMed Central (BMC); por; application/pdf; por; Enabling network inference methods to handle missing data and outliers; Folch-Fortuny, Abel; Villaverde, Alejandro F.; Ferrer, Alberto; Banga, Julio R.; HostingInstitution (Organizational): Universidade do Minho; e-mail: repositorium@usdb.uminho.pt; ISSN (IsPartOf): 1471-2105; DOI (IsPartOf): 10.1186/s12859-015-0717-7; 2015-09-10T14:36:34Z; 2015; 2015-09-07T13:35:35Z; 2015-01-01T00:00:00Z; Handle: https://hdl.handle.net/1822/37041; http://purl.org/coar/access_right/c_abf2 (open access); Network inference; Missing data; Outlier detection; Projection to latent structures; Trimmed scores regression; Information theory; Mutual information; 873026 bytes; literature; http://purl.org/coar/resource_type/c_6501 (journal article); http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext: https://prod-dspace.uminho.pt/bitstreams/797ef8c2-05dc-4710-9ad4-6d82fbed271f/download
spellingShingle Enabling network inference methods to handle missing data and outliers
Folch-Fortuny, Abel
Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
status SINGLETON
subject.fl_str_mv Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
title Enabling network inference methods to handle missing data and outliers
title_full Enabling network inference methods to handle missing data and outliers
title_fullStr Enabling network inference methods to handle missing data and outliers
title_full_unstemmed Enabling network inference methods to handle missing data and outliers
title_short Enabling network inference methods to handle missing data and outliers
title_sort Enabling network inference methods to handle missing data and outliers
topic Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
topic_facet Network inference
Missing data
Outlier detection
Projection to latent structures
Trimmed scores regression
Information theory
Mutual information
url https://hdl.handle.net/1822/37041
visible 1