Search results

Publications Catalogue - All

  1.

    Java development platform for real-time applications in multi-core architectures

    Publication
    by Anjos, José Serafim Gouveia
    The increasing complexity of modern real-time systems has motivated many software developers to shift from languages traditionally used for real-time work, such as C and Ada, to real-time Java. In its earliest form Java seemed irrelevant to the real-time community, until the Real-Time Specification for Java (RTSJ) changed that. The RTSJ standard aimed to extend the “Java Language Specification” and “The Java Virtual Machine” so that real-time applications could be written in Java. For some time now, multi-core architectures have been the answer to the shortcomings shown by single-core processors. The idea of physically parallel processing seems attractive, but migrating from current sequential processing models to parallel ones is not trivial. ARINC 653 (Avionics Application Standard Software Interface) is a software specification for space and time partitioning. It defines an API for avionics software, following the Integrated Modular Avionics architecture. It is part of the ARINC 600-Series Standards for Digital Aircraft & Flight Simulators. This document presents a view of how to make the transition from standard real-time sequential applications written in more traditional languages to real-time Java applications on a multi-core architecture, using a practical example. The case is made by taking an existing real-time application written in C, an avionics communication application that uses the ARINC 653 standard. The idea is to produce a Java version and a parallel processing version of the application and to show that this is not only possible but also brings several benefits. The work is supported by the research being carried out by the JEOPARD project and by the tools that the project is developing.
    2009 master's dissertation Portugal restricted access
  2.

    Accelerating the irradiance cache through parallel component-based rendering

    Publication
    by Debattista, Kurt
    Other Authors: Santos, Luís Paulo; Chalmers, Alan
    The irradiance cache is an acceleration data structure which caches indirect diffuse samples within the framework of a distributed ray-tracing algorithm. Previously calculated values can be stored and reused in future calculations, resulting in an order of magnitude improvement in computational performance. However, the irradiance cache is a shared data structure and so it is notoriously difficult to parallelise over a distributed parallel system. The hurdle to overcome is when and how to share cached samples. This sharing incurs communication overheads and yet must happen frequently to minimise cache misses and thus maximise the performance of the cache. We present a novel component-based parallel algorithm implemented on a cluster of computers, whereby the indirect diffuse calculations are performed on a subset of nodes in the cluster. This method exploits the inherent spatially coherent nature of the irradiance cache; by reducing the set of nodes amongst which cached values must be shared, the sharing frequency can be kept high, thus decreasing both communication overheads and cache misses. We demonstrate how our new parallel rendering algorithm significantly outperforms traditional methods of distributing the irradiance cache.
    2006 conference paper Portugal open access
  3.

    Towards a faster and accurate supertree inference

    Publication
    by Neves, Diogo Telmo
    Other Authors: Sobral, João Luís Ferreira
    Phylogenetic inference is one of the most challenging and important problems in computational biology. However, computing evolutionary links on data sets containing only a few thousand taxa easily becomes a daunting task. Moreover, recent advances in next-generation sequencing technologies are making this problem even harder, in terms of both complexity and scale. Therefore, phylogenetic inference requires new algorithms and methods to handle the unprecedented growth of biological data. In this paper, we identify several types of parallelism that are available while refining a supertree. We also present four improvements that we made to SuperFine, a state-of-the-art supertree (meta)method, which add support: i) to use FastTree as the inference tool; ii) to use a parallel version of FastTree, or RAxML, as the inference tool; iii) to exploit intra-polytomy parallelism within the so-called polytomy refinement phase; and iv) to exploit, at the same time, inter-polytomy and intra-polytomy parallelism within the polytomy refinement phase. Together, these improvements allow an efficient and transparent exploitation of hybrid-polytomy parallelism. Additionally, we pinpoint how future contributions should enhance the performance of such applications. Our studies show groundbreaking results in terms of the achieved speedups, especially when using biological data sets. Moreover, we show that the new parallel strategy, which exploits the hybrid-polytomy parallelism within the polytomy refinement phase, exhibits good scalability, even in the presence of asymmetric sets of tasks. Furthermore, the achieved results show that the radical improvement in performance does not impair tree accuracy, which is a key issue in phylogenetic inference.
    2016 conference paper Portugal open access
  4.

    Solving linear equation systems using parallel processing

    Publication
    by Pais, Jorge C.
    Other Authors: Delgado, Raimundo
    This paper shows the capabilities of parallel processing in the solution of linear equation systems. Solving linear equation systems is one of the most time-consuming tasks in the analysis of structural problems in civil engineering. This is most evident in finite element analysis, where the solving phase takes almost the whole analysis time. To reduce this cost, the use of parallel processing in the solution of the equation systems is proposed. The Gaussian elimination method, the Cholesky factorization method and the Conjugate Gradient iterative method were chosen. For these methods, the sequential time, the parallel time, the speedup and the efficiency of the parallel algorithm relative to the sequential algorithm were analysed. Parallel times were obtained for 2 to 16 processors because this work was developed on a parallel computer with 16 IMS T800-20 transputers, each with 2 MBytes of RAM.
    1995 conference paper Portugal open access
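
The speedup and efficiency named in the abstract above are conventionally defined as S = T_seq / T_par and E = S / p for p processors. The following is a minimal Python sketch assuming those standard definitions, which the abstract does not spell out. The function names and the timings are hypothetical and purely illustrative; they are not results from the paper.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup S = T_seq / T_par (standard definition, assumed here)."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    """Efficiency E = S / p, the fraction of ideal linear speedup achieved."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_sequential, t_parallel) / processors


# Hypothetical timings in seconds, keyed by processor count (illustration only).
T_SEQ = 100.0
parallel_timings = {2: 52.0, 4: 27.0, 8: 15.0, 16: 10.0}

for p, t_par in parallel_timings.items():
    s = speedup(T_SEQ, t_par)
    e = efficiency(T_SEQ, t_par, p)
    print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

Efficiency below 1 in such a table typically reflects communication and synchronisation overhead, which is why both metrics are usually reported together.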
  5.

    Configuring and executing ETL tasks on GRID environments - requirements and specificities

    Publication
    by Santos, Vasco
    Other Authors: Oliveira, Bruno; Silva, Rui; Belo, O.
    Data warehouses store integrated and consistent data in a subject-oriented data repository dedicated especially to supporting business intelligence processes. Nevertheless, in order to keep a data warehouse up to date, data-intensive tasks regularly retrieve specialized information from specific preselected information sources, transforming and conforming it according to specific business requirements provided by decision-makers. Such tasks, commonly known as Extract-Transform-Load (ETL) processes, have a limited time window in which to execute extremely complex operations over an ever-increasing amount of data. The common approach to dealing with the need for more computational power is the acquisition of new and more powerful hardware. This expensive approach disregards the unused computational resources available in desktop computers already present in most enterprises’ computational environments. This paper intends to define a different approach to dealing with ETL processes, taking advantage of parallel processing over a GRID environment and using XML data as an effective support for data storage and communication, demonstrating that GRID environments could be a real alternative for the implementation of low-cost data warehouses.
    2011 conference paper Portugal restricted access
  6.

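    Análise do rendimento computacional de mecanismos de processamento paralelo suportados pelo OpenCV e C++ [Analysis of the computational performance of parallel processing mechanisms supported by OpenCV and C++]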

    Publication
    by Mendes, Manuel João Fernandes
    This project investigates the performance of parallel processing algorithms, with a specific focus on using the Open Computer Vision (OpenCV) library as a base and on its implementation in C++. The research covers a wide range of parallelization methods, from a detailed discussion of existing techniques to the most recent challenges and advances in the field. The experimental methodology consisted of implementing and evaluating parallelization mechanisms such as Intel Threading Building Blocks (TBB), Open Multi-Processing (OpenMP) and POSIX Threads (Pthreads) in different test scenarios, using a diverse data set and statistical metrics. The implementation section details the development of the parallel algorithms and describes the project decisions, including when to take measurements and where to filter the samples; the optimizations applied to the algorithms, such as tuning the number of threads and creating custom data structures; and considerations specific to the C++ implementation, such as preparing different OpenCV builds, which allows different parallelization libraries to be added. The results obtained with the various parallelization methods reveal significant performance gains; the transform method stands out, being 3.43 times more efficient than sequential processing for the average image size. Very small images, however, can show losses due to the overhead associated with parallelization and to the capacity of the scheduler. The analysis of the results identifies the most effective parallelization techniques for different image sizes, which shows the importance of choosing the right algorithm to optimize performance.
    The conclusions indicate the need to consider image size and hardware characteristics when selecting parallelization methods, and point to future research towards a hybrid approach that uses both the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) to maximize parallel processing performance.
    2025 master's dissertation Portugal open access
  7.

    Hyperspectral compressive sensing: a comparison of embedded GPU and ARM implementations

    Publication
    by Nascimento, Jose
    Other Authors: Véstias, Mário
    Hyperspectral imaging involves the sensing of a large amount of spatial information across several adjacent wavelengths. Typically, hyperspectral images can be represented by a three-dimensional data cube. The collected data cube is too large to be transmitted from the satellite/airborne platform to the ground station. Compressive sensing (CS) is an emerging technique that directly acquires the compressed signal instead of acquiring the full data set. This reduces the amount of data that needs to be measured, transmitted and stored in the first place. In this paper, a comparison of a CS method implementation for an ARM and for a GPU is conducted. This study takes into account the accuracy, the performance, and the power consumption of both implementations. The 256-core GPU of a Jetson TX2 board, the dual-core ARM Cortex-A9 of a ZYNQ-7000 SoC FPGA and the quad-core ARM Cortex-A53 of a ZYNQ UltraScale SoC FPGA are the target platforms used for experimental validation. The obtained results indicate that the embedded GPU is faster but uses more power. Therefore, the most appropriate platform depends on the performance and power constraints of the project.
    2019 conference document Portugal restricted access
  8.

    A many-core co-processor for embedded parallel computing on FPGA

    Publication
    by José, Wilson
    Other Authors: Neto, Horácio; Véstias, Mário
    Single-processor architectures are unable to provide the performance required by high-performance embedded systems. Parallel processing based on general-purpose processors can achieve this performance, but at the cost of a considerable increase in required resources. However, in many cases, simplified optimized parallel cores can be used instead of general-purpose processors, achieving better performance at lower resource utilization. In this paper, we propose a configurable many-core architecture to serve as a co-processor for high-performance embedded computing on Field-Programmable Gate Arrays. The architecture consists of an array of configurable simple cores with support for floating-point operations, interconnected by a configurable interconnection network. For each core it is possible to configure the size of the internal memory, the supported operations and the number of interfacing ports. The architecture was tested on a ZYNQ-7020 FPGA in the execution of several parallel algorithms. The results show that the proposed many-core architecture achieves better performance than that achieved with a parallel general-purpose processor and that up to 32 floating-point cores can be implemented in a ZYNQ-7020 SoC FPGA.
    2015 conference document Portugal restricted access
  9.

    A parallel algorithm for statistical multiword term extraction from very large corpora

    Publication
    by Gonçalves, Carlos
    Other Authors: Silva, Joaquim F.; Cunha, Jose Alberto C.
    Multi-word Relevant Expressions (REs) can be defined as sequences of words (n-grams) with strong semantic meaning, such as "ice melting" and "Ministere des Affaires Etrangeres", useful in Information Retrieval, Document Clustering or Classification and Indexing of Documents. The need to extract REs in several languages has led research towards statistical approaches rather than symbolic methods, since the former allow language independence. Based on the assumption that REs have strong cohesion between their consecutive n-grams, the LocalMaxs algorithm is a language-independent approach that extracts REs. Despite its good precision, this extractor is time-consuming and becomes unusable for Big Data if implemented sequentially. This paper presents the first parallel and distributed version of this algorithm, achieving almost linear speedup and sizeup when processing corpora of up to 1 billion words, using up to 54 virtual machines in a public cloud. This parallel version of the algorithm exploits statistical knowledge of the n-grams in the corpus to promote locality of reference.
    2015 conference document Portugal restricted access
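
The n-grams that the abstract above builds on are contiguous sequences of n words. A minimal Python sketch of extracting and counting them follows, assuming simple lower-casing and whitespace tokenisation (an assumption made here for brevity). It does not implement LocalMaxs or its cohesion measure, and the helper names `ngrams` and `count_ngrams` are hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous sequence of n tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def count_ngrams(text: str, max_n: int) -> Counter:
    """Count all n-grams of sizes 1..max_n in a whitespace-tokenised text."""
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


counts = count_ngrams("ice melting raises sea level as ice melting accelerates", 2)
print(counts[("ice", "melting")])  # prints 2
```

Frequency tables of this kind are the raw input to statistical extractors; the cost of building and querying them over very large corpora is what motivates a parallel and distributed design.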
  10.

    An n-gram cache for large-scale parallel extraction of multiword relevant expressions with LocalMaxs

    Publication
    by Gonçalves, Carlos
    Other Authors: Silva, Joaquim F.; Cunha, José C.
    LocalMaxs extracts relevant multiword terms based on their cohesion but is computationally intensive, a critical issue for very large natural language corpora. The corpus properties concerning n-gram distribution determine the algorithm complexity and were empirically analyzed for corpora up to 982 million words. A parallel LocalMaxs implementation exhibits almost linear relative efficiency, speedup, and sizeup, when executed with up to 48 cloud virtual machines and a distributed key-value store. To reduce the remote data communication, we present a novel n-gram cache with cooperative-based warm-up, leading to reduced miss ratio and time penalty. A cache analytical model is used to estimate the performance of cohesion calculation of n-gram expressions, based on corpus empirical data. The model estimates agree with the real execution results.
    2017 conference document Portugal restricted access
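
The abstract above describes a local n-gram cache placed in front of a distributed key-value store, with warm-up to lower the miss ratio. The Python sketch below illustrates only that general idea, under loud assumptions: the remote store is stood in for by a plain dictionary, the warm-up is a simple preload rather than the paper's cooperative scheme, and the class and method names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

NGram = Tuple[str, ...]


class NGramCache:
    """Local cache in front of a (simulated) remote n-gram count store."""

    def __init__(self, remote_store: Dict[NGram, int]) -> None:
        self._remote = remote_store          # stand-in for a distributed store
        self._local: Dict[NGram, int] = {}
        self.hits = 0
        self.misses = 0

    def warm_up(self, keys: Iterable[NGram]) -> None:
        """Preload counts before use; preloads are not counted as lookups."""
        for key in keys:
            self._local[key] = self._remote.get(key, 0)

    def get(self, key: NGram) -> int:
        """Return the count for key, fetching remotely only on a miss."""
        if key in self._local:
            self.hits += 1
        else:
            self.misses += 1
            self._local[key] = self._remote.get(key, 0)
        return self._local[key]

    @property
    def miss_ratio(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# Hypothetical counts and lookup sequence, for illustration only.
remote = {("ice",): 2, ("melting",): 2, ("ice", "melting"): 2}
lookups = [("ice",), ("melting",), ("ice", "melting"), ("ice",)]

cold = NGramCache(remote)
for key in lookups:
    cold.get(key)

warm = NGramCache(remote)
warm.warm_up([("ice",), ("melting",)])
for key in lookups:
    warm.get(key)

print(cold.miss_ratio, warm.miss_ratio)  # prints 0.75 0.25
```

In this toy run the preloaded cache misses only on the bigram, which shows in miniature why warming the cache with likely keys reduces remote communication.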