Publication

Pheno-metabolomics: integrative bioinformatics for yeast molecular biotechnology

Bibliographic details
Abstract:	Pheno-metabolomics is a bioinformatic field of study concerned with establishing links between metabolic data, genotype and phenotype, all generated by high-throughput methods. Knowledge obtained in this field has contributed substantially to understanding the vast genetic diversity of Saccharomyces cerevisiae strains, which have adapted to different ecological niches and are used in very diverse biotechnological applications. Only a holistic approach covering molecular biology, phenotypic characterisation, analytical chemistry, signal processing and bioinformatics can provide detailed information on the vast and dynamic relationships between genomics, phenomics and metabolomics. The main objectives of this thesis are to explore the genetic, phenotypic and metabolic diversity of an S. cerevisiae strain collection and to assess the available bioinformatic and computational approaches for subsequent data fusion. We assembled a collection of 172 S. cerevisiae strains of worldwide geographical origins and technological uses (winemaking, including commercial strains and natural isolates; brewing; bakery; distillery, including sake and cachaça; laboratory strains; and strains from particular environments, namely pathogenic strains and isolates from fruits, soil and oak exudates). Their phenotype was screened for 30 physiological traits of oenological relevance. Principal component analysis (PCA) showed that growth in the presence of potassium bisulphite, growth at 40 °C and resistance to ethanol were the phenotypes contributing most to strain variability. The Mann-Whitney test revealed significant associations between phenotypic results and the strains' technological group.
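As a rough illustration of the Mann-Whitney comparison mentioned above, the following Python sketch tests whether the scores on one phenotypic test differ between two technological groups. It is a minimal, standard-library-only implementation that assumes ordinal growth scores and uses a tie-corrected normal approximation without continuity correction; the function names and the example scores are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a tie-corrected normal
    approximation (no continuity correction). Returns (U, p_value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # tie correction: sum of (t^3 - t) over each group of t tied values
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all observations identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / math.sqrt(variance)  # z <= 0 because u is the smaller U
    return u, min(1.0, 2 * NormalDist().cdf(z))


# Hypothetical growth scores (0 = no growth ... 3 = strong growth) on one test
commercial = [3, 3, 2, 3, 2, 3]
natural = [1, 0, 2, 1, 1, 0]
u, p = mann_whitney_u(commercial, natural)
print(f"U = {u}, p = {p:.4f}")
```

With 30 tests and several technological groups, a comparison like this would be repeated per test and per pair of groups, so some form of multiple-testing control would normally be applied on top.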
A naïve Bayesian classifier identified three of the 30 phenotypic tests as the most informative for assigning an isolate to the group of commercial strains: growth in the presence of iprodione (0.05 mg/mL), cycloheximide (0.1 μg/mL) and potassium bisulphite (150 mg/L). These results show the usefulness of computational approaches for simplifying strain selection procedures. For the subsequent genetic analysis, we evaluated the usefulness of interdelta sequence amplification for characterising our strain collection. Experiments were carried out in two laboratories, using different combinations of Taq DNA polymerases and thermal cyclers, to analyse 12 S. cerevisiae strains. Data were obtained by microfluidic electrophoresis, and the reproducibility of the technique was evaluated with non-parametric statistical tests. We showed that the source of Taq DNA polymerase and technical differences between laboratories had the greatest impact on reproducibility. We also concluded that the comparative analysis of interdelta patterns was more reliable and reproducible when fragment sizes were compared and when it was based on a smaller subset of bands of intermediate size, between 100 and 1000 bp. To obtain more reproducible genetic data, 11 polymorphic microsatellites were then used to characterise the 172 S. cerevisiae strains of our collection, and these data were computationally related to the previously obtained results of the 30 phenotypic tests. We found 280 alleles; microsatellite ScAAT1 contributed the most to intra-strain variability, together with alleles 20, 9 and 16 of microsatellites ScAAT4, ScAAT5 and ScAAT6, respectively. Computational models were developed and cross-validated to predict a strain's technological group from its microsatellite allelic profile.
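The naïve Bayesian assignment described above can be sketched as a categorical naïve Bayes model with Laplace smoothing. This is an illustration under stated assumptions rather than the classifier used in the thesis: the encoding of test results as categorical values, the smoothing parameter `alpha`, the function names and the example data are all hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with add-alpha (Laplace) smoothing.

    rows: one tuple of categorical test results per strain (hypothetical encoding)
    labels: the group label of each strain
    """
    class_counts = Counter(labels)
    n_tests = len(rows[0])
    # value_counts[c][j][v] = number of strains in class c with result v on test j
    value_counts = {c: [Counter() for _ in range(n_tests)] for c in class_counts}
    observed_values = [set() for _ in range(n_tests)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            observed_values[j].add(v)
    return class_counts, value_counts, observed_values, alpha


def log_posteriors(model, row):
    """Unnormalised log posterior per class, assuming tests are independent given the class."""
    class_counts, value_counts, observed_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, v in enumerate(row):
            k = len(observed_values[j])  # distinct results seen for test j
            score += math.log((value_counts[c][j][v] + alpha) / (n_c + alpha * k))
        scores[c] = score
    return scores


def predict(model, row):
    scores = log_posteriors(model, row)
    return max(scores, key=scores.get)


# Hypothetical binary results (1 = growth) on three tests, e.g. iprodione,
# cycloheximide and potassium bisulphite; values are illustrative only.
rows = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
labels = ["commercial"] * 3 + ["other"] * 3
model = train_naive_bayes(rows, labels)
print(predict(model, (1, 1, 1)))  # prints: commercial
```

Because each test contributes its own log-likelihood term, comparing those terms between classes is one simple way to see which tests carry the most information for separating commercial strains from the rest.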
Associations between microsatellites and specific phenotypes were scored using the information gain ratio, and significant findings were confirmed by permutation tests and estimation of false discovery rates. The phenotypes associated with the highest number of alleles were resistance to sulphur dioxide and galactosidase activity. Our results demonstrated the capacity of computational modelling to estimate, from microsatellite allelic combinations, both the phenotype of a strain and its membership of a given technological group. The genomic constitution of S. cerevisiae was shaped by multiple independent rounds of domestication and by microevolutionary changes that enabled adaptation to environmental conditions. We evaluated genome variation among four isolates of the commercial winemaking strain S. cerevisiae Zymaflore VL1. These isolates were recovered from vineyards surrounding wineries where this strain had been used for several years, and they were compared with the commercial reference strain. Comparative genomic hybridisation showed amplification, in the recovered isolates, of 14 genes related to mitosis, meiosis, lysine biosynthesis, and galactose and asparagine catabolism. DNA sequencing supported the occurrence of microevolutionary changes, revealing 1198 SNPs and 113 InDels. Phenotypic screening revealed 14 traits that distinguished the recovered isolates from the reference strain, which was unable to grow at 18 °C but showed some growth in the presence of CuSO4 (5 mM) and SDS (0.01% v/v). The metabolite profiles differed in the production of succinic acid, benzeneethanol, 2-methyl-1-butanol and isobutanol. Our approach was then expanded to include metabolic analysis. Individual must fermentations were performed with the 172 strains and, based on the combined fibre-optic spectroscopy, physiological and molecular data, a sub-group of 24 strains was chosen.
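The association-scoring pipeline mentioned above (information gain ratio, permutation tests and false discovery rate estimation) might look like the sketch below. It assumes the gain ratio as defined for C4.5 decision trees (information gain divided by the split information of the allele variable), a label-shuffling permutation test, and the Benjamini-Hochberg procedure for FDR control; the thesis may have used different variants, and all function names and data here are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, target):
    """Information gain of target given feature, divided by the feature's split information."""
    n = len(target)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(target) - conditional
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


def permutation_pvalue(feature, target, n_perm=999, seed=0):
    """One-sided permutation p-value: how often shuffled labels score at least as high."""
    rng = random.Random(seed)
    observed = gain_ratio(feature, target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gain_ratio(feature, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, keeping the q-values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data for 20 strains: presence (1) or absence (0) of two alleles,
# and a binary sulphur dioxide resistance phenotype (R = resistant, S = sensitive).
phenotype = ["R"] * 10 + ["S"] * 10
allele_a = [1] * 10 + [0] * 10   # tracks the phenotype perfectly
allele_b = [1, 0] * 10           # carries no information about the phenotype
pvals = [permutation_pvalue(a, phenotype) for a in (allele_a, allele_b)]
qvals = benjamini_hochberg(pvals)
print(pvals, qvals)
```

Normalising by split information penalises microsatellites with many alleles, which would otherwise tend to score a high raw information gain simply by fragmenting the strains into small groups.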
High-performance liquid chromatography (HPLC) analysis revealed variable results, with glucose, fructose and acetic acid contributing the most to inter-strain variability. Metabolites relevant to aroma profiles were determined by gas chromatography-mass spectrometry (GC-MS), and PCA showed substantial variation in the amounts of alcohols and esters produced. Partial least squares regression (PLS-R) was used in pairwise comparisons to predict the strains' metabolic profiles from phenotypic and genetic data, and relevant associations were identified for 9 of the 24 metabolites. The data were then projected onto a common coordinate system, revealing a subset of 17 statistically relevant multi-dimensional modules (md-modules), each combining a set of the most correlated features, of notable biological importance. The combination of PLS-R and md-module identification proved to be a successful approach for better understanding the S. cerevisiae pheno-metabolome.
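A single-response PLS regression (PLS1, fitted with the NIPALS algorithm) gives a feel for how one metabolite level can be predicted from phenotypic or genetic features, as in the PLS-R analysis described above. This is a minimal pure-Python sketch that assumes one metabolite at a time and no scaling of the predictors; it is not the thesis implementation, and the function names and data are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit single-response PLS regression with NIPALS on mean-centred data."""
    n, m = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xr = [[row[j] - x_means[j] for j in range(m)] for row in X]  # residual X
    yr = [v - y_mean for v in y]                                 # residual y
    components = []
    for _ in range(n_components):
        # weight vector: direction of maximal covariance between X and y
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:  # y already fully explained; stop early
            break
        w = [v / norm for v in w]
        t = [sum(Xr[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(yr[i] * t[i] for i in range(n)) / tt  # y loading
        # deflate X and y by the part explained by this component
        Xr = [[Xr[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_means, y_mean, components


def pls1_predict(model, x):
    """Predict y for one sample by repeating the centring and deflation steps."""
    x_means, y_mean, components = model
    xr = [a - b for a, b in zip(x, x_means)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(a * b for a, b in zip(xr, w))
        xr = [a - t * b for a, b in zip(xr, p)]
        y_hat += q * t
    return y_hat


# Hypothetical predictors (e.g. two phenotype scores) and one metabolite level
X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 5]]
y = [2 * a - b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [5, 2]), 6))
```

Because y in the toy data is an exact linear function of the two predictors, two components (the rank of the centred predictor matrix) reproduce the ordinary least-squares fit, so the prediction for [5, 2] should be 2*5 - 2 + 1 = 9 up to rounding error. With many correlated predictors and few strains, keeping fewer components than predictors is what distinguishes PLS-R from ordinary regression.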
Main authors:	Duarte, Ricardo Franco
Subject:	Natural Sciences::Biological Sciences
Year:	2014
Country:	Portugal
Document type:	doctoral thesis
Access type:	open access
Associated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
