Publication

Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability


Bibliographic details
Abstract:Seafood traceability, needed to regulate food safety, control fisheries, combat fraud, and prevent public health from being jeopardized by harvesting in polluted locations, depends heavily on predicting the geographic origin of seafood. When the datasets available for studying traceability are high-dimensional, standard classical statistical models fail. Under these circumstances, suitable alternative methods are needed to predict the geographic origin of seafood accurately. In this study, we propose an analytical approach that combines regularization methods and resampling techniques to overcome the high-dimensionality problem. In particular, we compare the Ridge regression, LASSO, and Elastic net penalty-based approaches. These methods were applied to predict the origin of the saltwater clam Ruditapes philippinarum, a non-indigenous and commercially very relevant marine bivalve species that commonly occurs in European estuaries. Further, the resampling method of Monte Carlo cross-validation was implemented to overcome challenges related to the small sample size. The results of the three methods were compared. For full reproducibility, an R Markdown file and the dataset used are provided. We conclude by highlighting the insights that this methodology may bring to modeling a multi-categorical response based on a high-dimensional dataset with highly correlated explanatory variables, and to combating the mislabeling of the geographic origin of seafood.
Main authors:Yokochi, Clara
Other authors:Bispo, Regina; Ricardo, Fernando; Calado, Ricardo
Subject:Elastic net; LASSO; Regularization; Ridge regression; Traceability; Statistics and Probability; SDG 3 - Good Health and Well-being; SDG 14 - Life Below Water
Year:2023
Country:Portugal
Document type:article
Access type:open access
Associated institution:Universidade Nova de Lisboa
Language:English
Source:Repositório Institucional da UNL
_version_ 1865920616224260096
author Yokochi, Clara
author2 Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
author2_role author
author
author
author_facet Yokochi, Clara
Yokochi, Clara
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
author_role author
contributor_name_str_mv CMA - Centro de Matemática e Aplicações
DM - Departamento de Matemática
Taylor & Francis
RUN
country_str PT
creators_json_str [{\"Person.name\":\"Yokochi, Clara\"},{\"Person.name\":\"Bispo, Regina\"},{\"Person.name\":\"Ricardo, Fernando\"},{\"Person.name\":\"Calado, Ricardo\"}]
datacite.contributors.contributor.contributorName.fl_str_mv CMA - Centro de Matemática e Aplicações
DM - Departamento de Matemática
Taylor & Francis
RUN
datacite.creators.creator.creatorName.fl_str_mv Yokochi, Clara
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
datacite.date.Accepted.fl_str_mv 2023-09-01T00:00:00Z
datacite.date.available.fl_str_mv 2024-02-23T23:55:28Z
datacite.date.embargoed.fl_str_mv 2024-02-23T23:55:28Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
datacite.titles.title.fl_str_mv Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
dc.contributor.none.fl_str_mv CMA - Centro de Matemática e Aplicações
DM - Departamento de Matemática
Taylor & Francis
RUN
dc.creator.none.fl_str_mv Yokochi, Clara
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
dc.date.Accepted.fl_str_mv 2023-09-01T00:00:00Z
dc.date.available.fl_str_mv 2024-02-23T23:55:28Z
dc.date.embargoed.fl_str_mv 2024-02-23T23:55:28Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv http://hdl.handle.net/10362/164075
dc.language.none.fl_str_mv eng
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.subject.none.fl_str_mv Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
dc.title.fl_str_mv Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_6501
description Seafood traceability, needed to regulate food safety, control fisheries, combat fraud, and prevent public health from being jeopardized by harvesting in polluted locations, depends heavily on predicting the geographic origin of seafood. When the datasets available for studying traceability are high-dimensional, standard classical statistical models fail. Under these circumstances, suitable alternative methods are needed to predict the geographic origin of seafood accurately. In this study, we propose an analytical approach that combines regularization methods and resampling techniques to overcome the high-dimensionality problem. In particular, we compare the Ridge regression, LASSO, and Elastic net penalty-based approaches. These methods were applied to predict the origin of the saltwater clam Ruditapes philippinarum, a non-indigenous and commercially very relevant marine bivalve species that commonly occurs in European estuaries. Further, the resampling method of Monte Carlo cross-validation was implemented to overcome challenges related to the small sample size. The results of the three methods were compared. For full reproducibility, an R Markdown file and the dataset used are provided. We conclude by highlighting the insights that this methodology may bring to modeling a multi-categorical response based on a high-dimensional dataset with highly correlated explanatory variables, and to combating the mislabeling of the geographic origin of seafood.
dirty 0
eu_rights_str_mv openAccess
format article
fulltext.url.fl_str_mv https://run.unl.pt/bitstreams/3876048b-867c-4d9a-840c-992adf229060/download
funding.funder.alternateName_str_mv FCT
FCT
funding.funder.identifier_str_mv http://doi.org/10.13039/501100001871
http://doi.org/10.13039/501100001871
funding.funder.name_str_mv Fundação para a Ciência e a Tecnologia
Fundação para a Ciência e a Tecnologia
funding.name_str_mv 6817 - DCRRNI ID
6817 - DCRRNI ID
id run_6fc31eef8e60fa3dface1ccbdb40288a
identifier.url.fl_str_mv http://hdl.handle.net/10362/164075
instacron_str unl
institution Universidade Nova de Lisboa
instname_str Universidade Nova de Lisboa
language eng
network_acronym_str run
network_name_str Repositório Institucional da UNL
oai_identifier_str oai:run.unl.pt:10362/164075
organization_str_mv urn:organizationAcronym:unl
person_str_mv Yokochi, Clara
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
publishDate 2023
reponame_str Repositório Institucional da UNL
repository_id_str urn:repositoryAcronym:run
service_str_mv urn:repositoryAcronym:run
spelling eng
en
Seafood traceability, needed to regulate food safety, control fisheries, combat fraud, and prevent public health from being jeopardized by harvesting in polluted locations, depends heavily on predicting the geographic origin of seafood. When the datasets available for studying traceability are high-dimensional, standard classical statistical models fail. Under these circumstances, suitable alternative methods are needed to predict the geographic origin of seafood accurately. In this study, we propose an analytical approach that combines regularization methods and resampling techniques to overcome the high-dimensionality problem. In particular, we compare the Ridge regression, LASSO, and Elastic net penalty-based approaches. These methods were applied to predict the origin of the saltwater clam Ruditapes philippinarum, a non-indigenous and commercially very relevant marine bivalve species that commonly occurs in European estuaries. Further, the resampling method of Monte Carlo cross-validation was implemented to overcome challenges related to the small sample size. The results of the three methods were compared. For full reproducibility, an R Markdown file and the dataset used are provided. We conclude by highlighting the insights that this methodology may bring to modeling a multi-categorical response based on a high-dimensional dataset with highly correlated explanatory variables, and to combating the mislabeling of the geographic origin of seafood.
application/pdf
en
Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
Yokochi, Clara
Bispo, Regina
Ricardo, Fernando
Calado, Ricardo
CMA - Centro de Matemática e Aplicações
DM - Departamento de Matemática
Taylor & Francis
HostingInstitution
Organizational
RUN
e-mail
mailto:run@unl.pt
run@unl.pt
ISSN
IsPartOf
1559-8608
URN
IsPartOf
PURE: 83886940
URN
IsPartOf
PURE UUID: d2a2cff3-7736-4810-816e-7d4516483f67
URN
IsPartOf
Scopus: 85169556874
URN
IsPartOf
WOS: 001058567900001
URN
IsPartOf
ORCID: /0000-0002-6723-2557/work/153923453
DOI
IsPartOf
10.1007/s42519-023-00341-8
2024-02-23T23:55:28Z
2023-09
2023-09-01T00:00:00Z
Handle
http://hdl.handle.net/10362/164075
http://purl.org/coar/access_right/c_abf2
open access
Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
704949 bytes
Fundação para a Ciência e a Tecnologia
Center for Mathematics and Applications
6817 - DCRRNI ID
Crossref Funder ID
http://doi.org/10.13039/501100001871
Fundação para a Ciência e a Tecnologia
Center for Mathematics and Applications
6817 - DCRRNI ID
Crossref Funder ID
http://doi.org/10.13039/501100001871
literature
http://purl.org/coar/resource_type/c_6501
journal article
http://purl.org/coar/access_right/c_abf2
application/pdf
fulltext
https://run.unl.pt/bitstreams/3876048b-867c-4d9a-840c-992adf229060/download
spellingShingle Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
Yokochi, Clara
Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
Yokochi, Clara
Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
status NEW
subject.fl_str_mv Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
title Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
title_full Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
title_fullStr Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
title_full_unstemmed Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
title_short Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
title_sort Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability
topic Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
topic_facet Elastic net
LASSO
Regularization
Ridge regression
Traceability
Statistics and Probability
SDG 3 - Good Health and Well-being
SDG 14 - Life Below Water
url http://hdl.handle.net/10362/164075
visible 1