Publication
Improving auditing and annotation of DNA barcode reference libraries of animal COI sequences for Molecular Ecology applications
| Abstract: | The discovery and description of Earth’s biodiversity remain an ongoing and incomplete chapter of the scientific endeavour. Fortunately, biodiversity studies have benefited greatly from molecular tools such as DNA (meta)barcoding, which provide efficient means of identification for biomonitoring and conservation programmes. The accuracy of species-level assignments, and the taxonomic span of the identifications, rely on comprehensive DNA barcode reference libraries. However, errors in library records may compromise the accuracy of species assignments; these can arise from operational flaws in the generation of the barcodes, from taxonomic uncertainty, or from undescribed diversity. This study describes a web-accessible R-based application, BAGS (Barcode, Audit & Grade System), that performs automated auditing and annotation of cytochrome c oxidase subunit I (COI) sequence libraries retrieved from the Barcode of Life Data System (BOLD) for a given taxonomic group of animals. Several initial quality-filtering steps are implemented, as well as optional filtering of species by their presence in marine and non-marine habitats. The barcode records for each species in the library are then audited and sorted into five qualitative grades (A to E) that depend on the attributes of the data and on the congruence of species names with sequences clustered in Barcode Index Numbers (BINs). Finally, BAGS’ reporting tool allows researchers to quickly audit and set aside the most useful and reliable data from the reference libraries, highlighting and segregating records according to their congruence. To verify BAGS’ performance and accuracy in grade annotation, it was successfully tested on three large datasets comprising a) marine fishes of the world, b) Chironomidae of Europe (Insecta), and c) marine Amphipoda of the world (Crustacea). BAGS has the potential to fill a significant gap in the current landscape of DNA barcoding research tools by quickly screening reference libraries to gauge the congruence status of the data and to facilitate the triage of ambiguous data for subsequent review. BAGS may thereby become a valuable addition to forthcoming DNA (meta)barcoding studies, contributing in the long term to improving the quality and reliability of public reference libraries globally. |
|---|---|
| Main authors: | Fontes, João Tadeu Silva |
| Subject: | DNA barcoding; DNA metabarcoding; Reference libraries; BOLD; R |
| Year: | 2020 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |