Publication
Development of algorithms for the analysis and data mining of chemical compound prices
| Abstract: | Products derived from the biotechnology industry have become highly valuable in the world market. It is therefore advantageous to understand how the prices of the chemical compounds needed for biotechnological processes behave in the bioeconomy. The SISBI project was developed to retrieve and collect prices associated with chemical compounds from different available sources and databases. With access to this information, behaviours and patterns can be detected in the price variations, which may indicate other relevant knowledge, such as the biotechnological interest of a compound. However, SISBI data, although relevant, contain inconsistencies that hinder efficient analysis, namely duplicates, differing units, and problems in price integration. This study therefore developed algorithms to identify and solve these problems and to analyze compound prices as time series. To evaluate the data effectively, a new database, bioanalysis, was built from the SISBI project data. Several preprocessing methods were then applied, including elimination of duplicates, conversion of units, and removal of defective and inconsistent prices, which resolved the complications encountered. Once the data were prepared for analysis, the prices of two specific metabolites, 4-aminopyridine and methane, were examined. Price variations over time were compared between different configurations (quantity + unit) of the same metabolite and between different metabolites. These variations were broken down by price provider to identify any relationship or pattern depending on where the data originate. However, no particularly cheap provider was detected across 4-aminopyridine configurations or across the two metabolites. The only association found was between certain methane configurations. In addition, the price variations analyzed are mostly constant, and when they are not, they show no pattern or seasonality. These results show that, using only the prices available to date, no correlation could be established that identifies providers associated with low prices when comparing different metabolites or configurations. |
|---|---|
| Main authors: | Faria, Sofia Maria Alves |
| Subject: | Biotechnology; Chemical compounds; Time series; Algorithms; Preprocessing; Engineering and Technology::Industrial Biotechnology |
| Year: | 2020 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |
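
The preprocessing steps named in the abstract (elimination of duplicates, conversion of units, removal of defective prices) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the `PriceRecord` fields, the `TO_GRAMS` conversion table, and the `preprocess` function are all hypothetical names chosen for illustration, and the abstract does not say which units or schema the SISBI or bioanalysis databases actually use.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical conversion factors to grams. The units really present in
# the SISBI data are not listed in the abstract; this table is an assumption.
TO_GRAMS = {"mg": 1e-3, "g": 1.0, "kg": 1e3}


@dataclass(frozen=True)  # frozen makes records hashable, so a set can spot duplicates
class PriceRecord:
    """Hypothetical shape of one collected price (not the real schema)."""
    compound: str
    provider: str
    quantity: float
    unit: str
    price: float
    day: date


def preprocess(records):
    """Return (record, price_per_gram) pairs after basic cleaning.

    Drops exact duplicates, records with unknown units, and records
    whose quantity or price is not strictly positive.
    """
    seen = set()
    cleaned = []
    for record in records:
        if record in seen:  # exact duplicate of an earlier record
            continue
        seen.add(record)
        factor = TO_GRAMS.get(record.unit.lower())
        if factor is None or record.quantity <= 0 or record.price <= 0:
            continue  # unit cannot be converted, or the price is defective
        grams = record.quantity * factor
        cleaned.append((record, record.price / grams))
    return cleaned
```

Normalizing every record to a price per gram is one way to make different configurations (quantity + unit) of the same compound comparable before grouping them by provider and date for a time-series view.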