Publication

Evaluation and development of algorithms and computational tools for metabolic pathway optimization

Bibliographic details
Abstract:	Metabolic engineering exploits microorganisms to build cell factories that produce valuable compounds using their enzymatic machinery. It involves selecting an organism, along with a set of genetic modifications that optimize the process. Information about biological mechanisms is scattered across the literature. Metabolic databases provide a centralized platform that compiles existing biological data into a catalog of all known enzymatic transformations across all domains of life. Genome-scale metabolic models expose all the biochemical transformations that an organism can offer. Computer algorithms use these models to exploit the capabilities and limitations of organisms. Constraint-based modeling approaches predict the phenotype that results from modifications to the network. In recent years, the number of available models has increased significantly, and for certain organisms several models have been built. The accuracy of these methods depends in many cases on the quality of the models, which is limited by the information available in the literature (or in databases). This thesis improves existing methods by developing better data management strategies for the metabolic modeling community. Metabolic databases are usually the input for modeling tools, so the quality of the solutions depends on the quality of the databases. Several metabolic databases currently exist, most of them sharing a common set of information, and a centralized system is needed to take full advantage of their content. However, each database adopts its own naming system to catalog its entries, which in many cases makes them difficult to compare with one another.
An integration pipeline is designed here to fuse metabolic databases into a common namespace, enabling better analysis of the entire metabolic catalog across several databases, and different methods are explored to reconcile the metabolites and reactions these databases contain. In the second part of this work, the Systems Biology Markup Language, the most common medium for storing and representing genome-scale metabolic models, is analyzed. Like databases, models adopt their own nomenclatures for reactions and compounds. Here, methods to annotate the metabolites and reactions in models are developed, connecting models with database entries and thus allowing a single naming system to be adopted for their entities. The purpose of the methods is to standardize the entire model; therefore, other entities, such as genes, compartments, and simulation media, are also considered in unifying these models. The standardization methods were implemented in the KBase platform, improving its compatibility with models built with external tools. In the last part of this thesis, the pathway enumeration problem is revisited. Synthetic biology explores cellular modifications to produce valuable products by inserting enzymatic capabilities from other organisms. Selecting a suitable set of genes is highly combinatorial, since in many cases there are several alternative routes to the target product. A common limitation of most existing methods is their inability to fully explore this combinatorial space. In this work, (hyper)graph methods are analyzed and improved to fully enumerate biological pathways. As a result, two existing algorithms were improved with respect to scalability, allowing them to fully enumerate larger solution sets.
Main authors:	Liu, Filipe Alexandre Wang
Subject:	Natural Sciences::Computer and Information Sciences
Year:	2018
Country:	Portugal
Document type:	doctoral thesis
Access type:	open access
Associated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
