Publication
Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
| Abstract: | Seafood traceability, needed to regulate food safety, control fisheries, combat fraud, and protect public health from harvesting in polluted locations, depends heavily on predicting the geographic origin of seafood. When the datasets available to study traceability are high-dimensional, standard classical statistical models fail. Under these circumstances, alternative methods are needed to accurately predict the geographic origin of seafood. In this study, we propose an analytical approach that combines regularization methods and resampling techniques to overcome the high-dimensionality problem. In particular, we compare the Ridge regression, LASSO, and Elastic net penalty-based approaches. These methods were applied to predict the origin of the saltwater clam Ruditapes philippinarum, a non-indigenous and commercially important marine bivalve species that commonly occurs in European estuaries. Further, Monte Carlo Cross-Validation was implemented as the resampling method to overcome challenges related to the small sample size. The results of the three methods were compared. For full reproducibility, an R Markdown file and the dataset used are provided. We conclude by highlighting the insights this methodology may bring to modeling a multi-categorical response based on a high-dimensional dataset with highly correlated explanatory variables, and to combating the mislabeling of the geographic origin of seafood. |
|---|---|
| Main authors: | Yokochi, Clara |
| Other authors: | Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| Subject: | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| Year: | 2023 |
| Country: | Portugal |
| Document type: | article |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
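
The three penalties named in the abstract can be written as one family. The record itself does not give the formulation, so the following is only the standard elastic-net parameterization for a multinomial response (the one used by common software such as glmnet), shown here as an assumption. The symbols $N$ (samples), $K$ (origin classes), $\lambda$ (penalty strength), and $\alpha$ (mixing parameter) are introduced for illustration.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{N}\sum_{i=1}^{N} \log \Pr\left(y_i \mid x_i;\, \beta_0, \beta\right)
  + \lambda \sum_{k=1}^{K}
    \left[ \frac{1-\alpha}{2}\,\lVert \beta_k \rVert_2^2
           + \alpha\,\lVert \beta_k \rVert_1 \right]
\right\},
\qquad
\Pr(y_i = k \mid x_i) =
\frac{\exp\left(\beta_{0k} + x_i^{\top}\beta_k\right)}
     {\sum_{l=1}^{K} \exp\left(\beta_{0l} + x_i^{\top}\beta_l\right)}
\]
```

Under this parameterization, $\alpha = 0$ gives Ridge regression, $\alpha = 1$ gives the LASSO, and $0 < \alpha < 1$ gives the Elastic net. The $\ell_1$ term can set coefficients exactly to zero, while the $\ell_2$ term shrinks correlated predictors together, which is why the family suits high-dimensional data with highly correlated explanatory variables.
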
| _version_ | 1865920616224260096 |
|---|---|
| author | Yokochi, Clara |
| author2 | Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| author2_role | author; author; author |
| author_facet | Yokochi, Clara; Yokochi, Clara; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| author_role | author |
| contributor_name_str_mv | CMA - Centro de Matemática e Aplicações; DM - Departamento de Matemática; Taylor & Francis; RUN |
| country_str | PT |
| creators_json_str | [{"Person.name":"Yokochi, Clara"},{"Person.name":"Bispo, Regina"},{"Person.name":"Ricardo, Fernando"},{"Person.name":"Calado, Ricardo"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | CMA - Centro de Matemática e Aplicações; DM - Departamento de Matemática; Taylor & Francis; RUN |
| datacite.creators.creator.creatorName.fl_str_mv | Yokochi, Clara; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| datacite.date.Accepted.fl_str_mv | 2023-09-01T00:00:00Z |
| datacite.date.available.fl_str_mv | 2024-02-23T23:55:28Z |
| datacite.date.embargoed.fl_str_mv | 2024-02-23T23:55:28Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| datacite.titles.title.fl_str_mv | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| dc.contributor.none.fl_str_mv | CMA - Centro de Matemática e Aplicações; DM - Departamento de Matemática; Taylor & Francis; RUN |
| dc.creator.none.fl_str_mv | Yokochi, Clara; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| dc.date.Accepted.fl_str_mv | 2023-09-01T00:00:00Z |
| dc.date.available.fl_str_mv | 2024-02-23T23:55:28Z |
| dc.date.embargoed.fl_str_mv | 2024-02-23T23:55:28Z |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/10362/164075 |
| dc.language.none.fl_str_mv | eng |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.subject.none.fl_str_mv | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| dc.title.fl_str_mv | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_6501 |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | article |
| fulltext.url.fl_str_mv | https://run.unl.pt/bitstreams/3876048b-867c-4d9a-840c-992adf229060/download |
| funding.funder.alternateName_str_mv | FCT; FCT |
| funding.funder.identifier_str_mv | http://doi.org/10.13039/501100001871; http://doi.org/10.13039/501100001871 |
| funding.funder.name_str_mv | Fundação para a Ciência e a Tecnologia; Fundação para a Ciência e a Tecnologia |
| funding.name_str_mv | 6817 - DCRRNI ID; 6817 - DCRRNI ID |
| id | run_6fc31eef8e60fa3dface1ccbdb40288a |
| identifier.url.fl_str_mv | http://hdl.handle.net/10362/164075 |
| instacron_str | unl |
| institution | Universidade Nova de Lisboa |
| instname_str | Universidade Nova de Lisboa |
| language | eng |
| network_acronym_str | run |
| network_name_str | Repositório Institucional da UNL |
| oai_identifier_str | oai:run.unl.pt:10362/164075 |
| organization_str_mv | urn:organizationAcronym:unl |
| person_str_mv | Yokochi, Clara; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo |
| publishDate | 2023 |
| reponame_str | Repositório Institucional da UNL |
| repository_id_str | urn:repositoryAcronym:run |
| service_str_mv | urn:repositoryAcronym:run |
| spelling | eng; en; application/pdf; en; Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability; Yokochi, Clara; Bispo, Regina; Ricardo, Fernando; Calado, Ricardo; CMA - Centro de Matemática e Aplicações; DM - Departamento de Matemática; Taylor & Francis; HostingInstitution; Organizational; RUN; e-mail: mailto:run@unl.pt (run@unl.pt); ISSN (IsPartOf): 1559-8608; URN (IsPartOf): PURE: 83886940; URN (IsPartOf): PURE UUID: d2a2cff3-7736-4810-816e-7d4516483f67; URN (IsPartOf): Scopus: 85169556874; URN (IsPartOf): WOS: 001058567900001; URN (IsPartOf): ORCID: /0000-0002-6723-2557/work/153923453; DOI (IsPartOf): 10.1007/s42519-023-00341-8; 2024-02-23T23:55:28Z; 2023-09; 2023-09-01T00:00:00Z; Handle: http://hdl.handle.net/10362/164075; http://purl.org/coar/access_right/c_abf2; open access; Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water; 704949 bytes; Fundação para a Ciência e a Tecnologia; Center for Mathematics and Applications; 6817 - DCRRNI ID; Crossref Funder ID: http://doi.org/10.13039/501100001871; Fundação para a Ciência e a Tecnologia; Center for Mathematics and Applications; 6817 - DCRRNI ID; Crossref Funder ID: http://doi.org/10.13039/501100001871; literature; http://purl.org/coar/resource_type/c_6501; journal article; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext: https://run.unl.pt/bitstreams/3876048b-867c-4d9a-840c-992adf229060/download |
| spellingShingle | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability; Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability; Yokochi, Clara; Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water; Yokochi, Clara; Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| status | NEW |
| subject.fl_str_mv | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| title | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| title_full | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| title_fullStr | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability; Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| title_full_unstemmed | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability; Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| title_short | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| title_sort | Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability |
| topic | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| topic_facet | Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water |
| url | http://hdl.handle.net/10362/164075 |
| visible | 1 |
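
The abstract mentions Monte Carlo Cross-Validation as the resampling method used to cope with the small sample size. The sketch below illustrates the general idea only: repeated random train/test splits with accuracy averaged over repeats. It is not the authors' code (which the abstract says is provided as an R Markdown file). The function names, the 30% test fraction, the number of repeats, the site labels, and the majority-class placeholder classifier are all assumptions made for illustration. In practice a regularized multinomial model would take the placeholder's place.

```python
import random
from collections import Counter


def monte_carlo_splits(n_samples, n_repeats=100, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Unlike k-fold CV, each repeat draws a fresh random test set, so the
    same sample may appear in the test set of several repeats.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def mccv_accuracy(labels, fit_predict, **split_kwargs):
    """Average test accuracy of `fit_predict` over Monte Carlo splits.

    `fit_predict(train_idx, test_idx)` must return one predicted label
    per test index (hypothetical interface, assumed for this sketch).
    """
    accuracies = []
    for train_idx, test_idx in monte_carlo_splits(len(labels), **split_kwargs):
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical origin labels for 20 specimens from three sites.
labels = ["site_A"] * 10 + ["site_B"] * 5 + ["site_C"] * 5


def majority_baseline(train_idx, test_idx):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)


mean_accuracy = mccv_accuracy(labels, majority_baseline, n_repeats=50, seed=1)
print(f"Mean accuracy over 50 random splits: {mean_accuracy:.2f}")
```

The splits here are not stratified by class. With few specimens per origin, a stratified variant may be preferable so that every site appears in each training set.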