Publication
Development of computational tools for the analysis of 2D-nuclear magnetic resonance data
| Abstract: | Metabolomics is one of the omics sciences and has been attracting growing interest because of its potential to correlate an organism’s biochemical activity with its phenotype. Its applications are expanding as new techniques reveal new information on metabolic profiles and molecules, thus elucidating biological, chemical and functional knowledge. The main data collection techniques are based on mass spectrometry and nuclear magnetic resonance (NMR) spectroscopy. NMR has the advantage of analyzing a sample in vivo without damaging it, and while its sensitivity is pointed out as a disadvantage, multidimensional NMR offers a solution to this issue. It adds layers of information, generating new data that require advanced bioinformatics methods to extract biological meaning. Since multidimensional NMR encompasses several different approaches, the need to establish an integrated framework that allows researchers to load their data and extract relevant knowledge has become more pressing over the years. In addition, establishing common data analysis pipelines for one-dimensional and multidimensional NMR remains a challenge in current research, hindering reproducibility across research groups. In recent work from the host group, specmine, an R package for metabolomics and spectral data analysis/mining, was developed to wrap and deliver key metabolomics methods that allow a researcher to perform a complete analysis. In this dissertation, tools integrated in specmine were developed to read, visualize and analyze two-dimensional (2D) NMR data. A new specmine structure was created for this type of data, easing interpretation and data visualization. For visualization, a novel approach based on three-dimensional environments lets users interact with their data, allowing peak hovering and identification of resonance-rich regions. When the user does not specify which samples to plot, the selection is based on a signal-to-noise ratio scale, and samples at opposite ends of that scale are plotted. A peak detection method for 2D NMR based on local maximum search was implemented to obtain a data structure that benefits most from specmine’s functionalities. These include preprocessing, univariate and multivariate analysis, machine learning and feature selection methods. The 2D NMR functions were validated using experimental data from two scientific papers, available in metabolomics databases, applying the necessary preprocessing steps to compare spectra and results. These data gave rise to two case studies from different NMR sources, Bruker and Varian, which reinforces specmine’s flexibility. The case studies were carried out mainly with specmine, with other packages used for specific processing steps such as probabilistic quotient normalization. A pipeline for analyzing 2D NMR data was added to specmine in the form of a vignette, to provide a guideline for the newly developed functionalities. |
|---|---|
| Main authors: | Pereira, Bruno Miguel Marques |
| Subject: | 2D NMR; Metabolomics; Multivariate analysis; Nuclear Magnetic Resonance (NMR); Univariate analysis |
| Year: | 2021 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |
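
The abstract mentions peak detection on 2D NMR based on local maximum search. The idea can be illustrated with a minimal sketch, shown here in Python rather than R and not taken from specmine: the function name `detect_peaks_2d`, the 8-neighbour rule, the intensity `threshold` parameter and the toy matrix are all assumptions made for illustration only.

```python
from typing import List, Tuple

# (row index, column index, intensity); hypothetical return type for this sketch
Peak = Tuple[int, int, float]


def detect_peaks_2d(spectrum: List[List[float]], threshold: float = 0.0) -> List[Peak]:
    """Return points above `threshold` that are strictly greater than
    every existing neighbour in the surrounding 3x3 window."""
    n_rows = len(spectrum)
    n_cols = len(spectrum[0]) if n_rows else 0
    peaks: List[Peak] = []
    for i in range(n_rows):
        for j in range(n_cols):
            value = spectrum[i][j]
            if value <= threshold:
                continue  # skip low-intensity points (assumed noise floor)
            is_peak = True
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    ni, nj = i + di, j + dj
                    inside = 0 <= ni < n_rows and 0 <= nj < n_cols
                    if inside and spectrum[ni][nj] >= value:
                        is_peak = False
            if is_peak:
                peaks.append((i, j, value))
    # Strongest peaks first
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks


# Toy intensity matrix standing in for a 2D spectrum (made-up values)
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 5.0, 0.3, 0.1],
    [0.1, 0.3, 0.2, 2.0],
    [0.0, 0.1, 0.4, 0.3],
]
peaks = detect_peaks_2d(toy, threshold=1.0)
print(peaks)
```

In a real spectrum the row and column indices would be mapped to chemical shift values on the two axes, and the threshold would typically be derived from an estimate of the noise level; both steps are left out of this sketch.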