Publication
Robust identification of target genes and outliers in triple-negative breast cancer data
| Abstract: | Correct classification of breast cancer subtypes is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, it is illustrated that classical statistical methods may fail to identify outliers due to their heavy influence, prompting the need for robust statistics. Using robust sparse logistic regression we obtain 36 relevant genes, of which ca. 60% have been previously reported as biologically relevant to triple-negative breast cancer, reinforcing the validity of the method. The remaining 14 genes identified are new potential biomarkers for triple-negative breast cancer. Out of these, JAM3, SFT2D2, and PAPSS1 were previously associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the new DetectDeviatingCells outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between triple-negative breast cancer and non-triple-negative breast cancer data. The individual role of FOXA1 in triple-negative breast cancer and non-triple-negative breast cancer, and the strong FOXA1-AGR2 connection in triple-negative breast cancer stand out. The goal of our paper is to contribute to the understanding and management of breast cancer and triple-negative breast cancer. At the same time, it demonstrates that robust regression and outlier detection constitute key strategies to cope with high-dimensional clinical data such as omics data. |
|---|---|
| Main author: | Segaert, Pieter |
| Other authors: | Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J. |
| Subject: | Logistic regression; Cellwise outliers; Gene networks; Sparsity |
| Year: | 2018 |
| Country: | Portugal |
| Document type: | article |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Source: | Repositório da Universidade de Lisboa |
| _version_ | 1866810844180905984 |
|---|---|
| author | Segaert, Pieter |
| author2 | Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J. |
| author2_role | author; author; author; author |
| author_facet | Segaert, Pieter; Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J. |
| author_role | author |
| contributor_name_str_mv | Repositório Científico de Acesso Aberto da ULisboa |
| country_str | PT |
| creators_json_txt | [{"Person.name":"Segaert, Pieter"},{"Person.name":"Lopes, Marta B.","Person.identifier.orcid":"0000-0002-4135-1857"},{"Person.name":"Casimiro, Sandra","Person.identifier.orcid":"0000-0002-6917-4477"},{"Person.name":"Vinga, Susana","Person.identifier.orcid":"0000-0002-1954-5487"},{"Person.name":"Rousseeuw, Peter J."}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | Repositório Científico de Acesso Aberto da ULisboa |
| datacite.creators.creator.creatorName.fl_str_mv | Segaert, Pieter; Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J. |
| datacite.date.Accepted.fl_str_mv | 2018-07-04T00:00:00Z |
| datacite.date.available.fl_str_mv | 2022-09-23T14:09:22Z |
| datacite.date.embargoed.fl_str_mv | 2022-09-23T14:09:22Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | Logistic regression Cellwise outliers Gene networks Sparsity |
| datacite.titles.title.fl_str_mv | Robust identification of target genes and outliers in triple-negative breast cancer data |
| dc.contributor.none.fl_str_mv | Repositório Científico de Acesso Aberto da ULisboa |
| dc.creator.none.fl_str_mv | Segaert, Pieter; Lopes, Marta B.; Casimiro, Sandra; Vinga, Susana; Rousseeuw, Peter J. |
| dc.date.Accepted.fl_str_mv | 2018-07-04T00:00:00Z |
| dc.date.available.fl_str_mv | 2022-09-23T14:09:22Z |
| dc.date.embargoed.fl_str_mv | 2022-09-23T14:09:22Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10451/54573 |
| dc.language.none.fl_str_mv | eng |
| dc.publisher.none.fl_str_mv | SAGE |
| dc.rights.cclincense.fl_str_mv | http://creativecommons.org/licenses/by-nc/4.0/ |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | Logistic regression Cellwise outliers Gene networks Sparsity |
| dc.title.fl_str_mv | Robust identification of target genes and outliers in triple-negative breast cancer data |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_6501 |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | article |
| fulltext.url.fl_str_mv | https://repositorio.ulisboa.pt/bitstreams/70d461a0-2f74-4207-a438-45a39d80e55e/download |
| funding.funder.alternateName_str_mv | EC FCT FCT FCT FCT |
| funding.funder.identifier_str_mv | http://doi.org/10.13039/501100008530 http://doi.org/10.13039/501100001871 http://doi.org/10.13039/501100001871 http://doi.org/10.13039/501100001871 http://doi.org/10.13039/501100001871 |
| funding.funder.name_str_mv | European Commission Fundação para a Ciência e a Tecnologia Fundação para a Ciência e a Tecnologia Fundação para a Ciência e a Tecnologia Fundação para a Ciência e a Tecnologia |
| funding.name_str_mv | H2020 6817 - DCRRNI ID 6817 - DCRRNI ID 3599-PPCDT Investigador FCT |
| id | ul_a593fc0e12a4840bcb4f2bf11b9bef2e |
| identifier.url.fl_str_mv | http://hdl.handle.net/10451/54573 |
| instacron_str | ul |
| institution | Universidade de Lisboa |
| instname_str | Universidade de Lisboa |
| language | eng |
| network_acronym_str | ul |
| network_name_str | Repositório da Universidade de Lisboa |
| oai_identifier_str | oai:repositorio.ulisboa.pt:10451/54573 |
| organization_str_mv | urn:organizationAcronym:ul |
| person_str_mv | Segaert, Pieter Lopes, Marta B. Lopes, Marta B. https://www.ciencia-id.pt/FD16-A07F-7B12 FD16-A07F-7B12 http://orcid.org/0000-0002-4135-1857 0000-0002-4135-1857 Casimiro, Sandra Casimiro, Sandra https://www.ciencia-id.pt/0F12-5181-0B22 0F12-5181-0B22 http://orcid.org/0000-0002-6917-4477 0000-0002-6917-4477 Vinga, Susana Vinga, Susana https://www.ciencia-id.pt/9713-F74D-4805 9713-F74D-4805 http://orcid.org/0000-0002-1954-5487 0000-0002-1954-5487 Rousseeuw, Peter J. |
| publishDate | 2018 |
| publisher.none.fl_str_mv | SAGE |
| reponame_str | Repositório da Universidade de Lisboa |
| repository_id_str | urn:repositoryAcronym:ul |
| service_str_mv | urn:repositoryAcronym:ul |
| spelling | eng; SAGE; pt_PT; application/pdf; Robust identification of target genes and outliers in triple-negative breast cancer data; Segaert, Pieter; Personal: Lopes, Marta B. (DSpace http://dspace.org/items/5a511048-f9f4-471c-93d0-5c637df97ac8; B. Lopes, Marta; Ciência ID https://www.ciencia-id.pt FD16-A07F-7B12; ORCID http://orcid.org 0000-0002-4135-1857; Researcher ID https://www.researcherid.com F-5378-2011; Scopus Author ID https://www.scopus.com 55489480400, 7202369144); Personal: Casimiro, Sandra (DSpace http://dspace.org/items/fd665de1-a7e6-4638-94c3-ecc645f84607; Cara de Anjo Casimiro, Sandra Cristina; Ciência ID https://www.ciencia-id.pt 0F12-5181-0B22; ORCID http://orcid.org 0000-0002-6917-4477; Scopus Author ID https://www.scopus.com 14043403400); Personal: Vinga, Susana (DSpace http://dspace.org/items/d7e30d0b-702b-4588-8bb1-5dcd3ee6f6e7; Vinga, Susana; Ciência ID https://www.ciencia-id.pt 9713-F74D-4805; ORCID http://orcid.org 0000-0002-1954-5487; Scopus Author ID https://www.scopus.com 55893670600); Rousseeuw, Peter J.; HostingInstitution, Organizational: Repositório Científico de Acesso Aberto da ULisboa (e-mail mailto:repositorio@reitoria.ulisboa.pt); ISSN (IsPartOf) 0962-2802; DOI (IsPartOf) 10.1177/0962280218794722; 2022-09-23T14:09:22Z; 2018-07-04; 2018-07-04T00:00:00Z; Handle http://hdl.handle.net/10451/54573; http://purl.org/coar/access_right/c_abf2; open access; Logistic regression; Cellwise outliers; Gene networks; Sparsity; 609162 bytes; European Commission: Statistical multi-Omics UNDerstanding of Patient Samples, H2020 (Crossref Funder ID http://doi.org/10.13039/501100008530); Fundação para a Ciência e a Tecnologia: Associate Laboratory of Energy, Transports and Aeronautics, 6817 - DCRRNI ID (Crossref Funder ID http://doi.org/10.13039/501100001871); Fundação para a Ciência e a Tecnologia: Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa, 6817 - DCRRNI ID (Crossref Funder ID http://doi.org/10.13039/501100001871); Fundação para a Ciência e a Tecnologia: PERSEIDS - Personalizing cancer therapy through integrated modeling and decision, 3599-PPCDT (Crossref Funder ID http://doi.org/10.13039/501100001871); Fundação para a Ciência e a Tecnologia: Integrative computational physiology, Investigador FCT (Crossref Funder ID http://doi.org/10.13039/501100001871); literature; http://purl.org/coar/resource_type/c_6501; journal article; 2018-07-04; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext https://repositorio.ulisboa.pt/bitstreams/70d461a0-2f74-4207-a438-45a39d80e55e/download; Statistical Methods in Medical Research; 2810-1130423056 |
| spellingShingle | Robust identification of target genes and outliers in triple-negative breast cancer data Segaert, Pieter Logistic regression Cellwise outliers Gene networks Sparsity |
| status | SINGLETON |
| subject.fl_str_mv | Logistic regression Cellwise outliers Gene networks Sparsity |
| title | Robust identification of target genes and outliers in triple-negative breast cancer data |
| topic | Logistic regression Cellwise outliers Gene networks Sparsity |
| topic_facet | Logistic regression Cellwise outliers Gene networks Sparsity |
| url | http://hdl.handle.net/10451/54573 |
| visible | 1 |