Publication

Robust identification of target genes and outliers in triple-negative breast cancer data

Bibliographic details
Abstract: Correct classification of breast cancer subtypes is of high importance, as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, we illustrate that classical statistical methods may fail to identify outliers because the outliers heavily influence the classical estimates themselves, which motivates the use of robust statistics. Using robust sparse logistic regression, we obtain 36 relevant genes, of which ca. 60% have previously been reported as biologically relevant to triple-negative breast cancer, reinforcing the validity of the method. The remaining 14 genes are new potential biomarkers for triple-negative breast cancer. Of these, JAM3, SFT2D2, and PAPSS1 have previously been associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the recently proposed DetectDeviatingCells outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between triple-negative and non-triple-negative breast cancer data. The individual role of FOXA1 in triple-negative and non-triple-negative breast cancer, and the strong FOXA1-AGR2 connection in triple-negative breast cancer, stand out. The goal of our paper is to contribute to the understanding and management of breast cancer, and of triple-negative breast cancer in particular. At the same time, it demonstrates that robust regression and outlier detection constitute key strategies for coping with high-dimensional clinical data such as omics data.
Main author: Segaert, Pieter
Other authors: Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J.
Subject: Logistic regression; Cellwise outliers; Gene networks; Sparsity
Year: 2018
Country: Portugal
Document type: article
Access type: open access
Associated institution: Universidade de Lisboa
Language: English
Source: Repositório da Universidade de Lisboa
ORCID iDs: Lopes, Marta B. (0000-0002-4135-1857); Casimiro, Sandra (0000-0002-6917-4477); Vinga, Susana (0000-0002-1954-5487)
Accepted: 2018-07-04
Available in repository: 2022-09-23
Format: PDF
Handle: http://hdl.handle.net/10451/54573
Publisher: SAGE
License: CC BY-NC 4.0 (http://creativecommons.org/licenses/by-nc/4.0/)
Full text: https://repositorio.ulisboa.pt/bitstreams/70d461a0-2f74-4207-a438-45a39d80e55e/download
Funding: European Commission (H2020: Statistical multi-Omics UNDerstanding of Patient Samples); Fundação para a Ciência e a Tecnologia (6817 - DCRRNI ID: Associate Laboratory of Energy, Transports and Aeronautics; 6817 - DCRRNI ID: Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa; 3599-PPCDT: PERSEIDS - Personalizing cancer therapy through integrated modeling and decision; Investigador FCT: Integrative computational physiology)
Journal: Statistical Methods in Medical Research, vol. 28, no. 10-11, pp. 3042-3056 (ISSN 0962-2802)
DOI: 10.1177/0962280218794722
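The abstract's point that classical methods can miss outliers because the outliers distort the very estimates used to detect them (the masking effect) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' analysis: it does not implement robust sparse logistic regression, and it reproduces only the column-wise robust standardization idea behind DetectDeviatingCells, not that method's prediction of each cell from correlated variables. The function names (`classical_z`, `robust_z`, `flag`, `flag_cells`), the data (`expression`, `matrix`) and the cutoff sqrt(chi^2_{1,0.99}) ≈ 2.576 are hypothetical choices made for illustration.

```python
import statistics

# Hypothetical cutoff sqrt(chi^2_{1, 0.99}), which equals the 0.995 quantile
# of the standard normal distribution (about 2.576).
CUTOFF = statistics.NormalDist().inv_cdf(0.995)


def classical_z(values):
    """Standardize with the mean and sample standard deviation (not robust)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def robust_z(values):
    """Standardize with the median and the MAD (robust to a minority of outliers)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    # 1.4826 makes the MAD a consistent estimate of the SD for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust scale is undefined")
    return [(v - med) / scale for v in values]


def flag(scores, cutoff=CUTOFF):
    """Return the indices whose absolute standardized score exceeds the cutoff."""
    return [i for i, z in enumerate(scores) if abs(z) > cutoff]


def flag_cells(matrix, cutoff=CUTOFF):
    """Return (row, column) pairs of cells flagged after robust column-wise standardization."""
    flagged = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        flagged.extend((i, j) for i in flag(robust_z(column), cutoff))
    return flagged


# Hypothetical expression values of one gene in ten samples; the last two are contaminated.
expression = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9, 12.0, 12.5]
print(flag(classical_z(expression)))  # prints [] (masked: the outliers inflate the SD)
print(flag(robust_z(expression)))     # prints [8, 9]

# Hypothetical samples-by-genes matrix with one deviating cell at row 4, column 1.
matrix = [
    [1.0, 10.0, 100.0],
    [1.1, 10.2, 101.0],
    [0.9, 9.8, 99.0],
    [1.0, 10.1, 100.5],
    [1.2, 30.0, 99.5],
    [0.8, 9.9, 100.0],
]
print(flag_cells(matrix))  # prints [(4, 1)]
```

With two contaminated values out of ten, the mean is pulled upward and the standard deviation is inflated to about 3.06, so the largest classical z-score is only about 1.98 and nothing is flagged; the median and MAD are barely affected, so both contaminated values receive robust z-scores above 30. Flagging cells rather than whole rows mirrors the cellwise view in the keywords: only the single deviating entry of sample 4 is marked, while its other measurements are kept.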