5 documents found, page 1 of 1

Block size, parallelism and predictive performance: finding the sweet spot in d...

Oliveira, Filipe; Carneiro, Davide Rua; Guimarães, Miguel; Oliveira, Óscar; Novais, Paulo

As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We ev...


Overcoming traditional ETL systems architectural problems using a service-orien...

Oliveira, Bruno; Oliveira, Óscar; Belo, Orlando

Developing analytical systems imposes several challenges related not only to the amount and heterogeneity of the involved data but also to the constant need to readapt and evolve to overcome new business challenges. Data are a determinant factor in the success of analytical and decision-making applications, and their nature, availability, and quality are crucial aspects for planning and structuring populating anal...


Dynamic management of distributed machine learning projects

Oliveira, Filipe; Alves, André; Moço, Hugo; Monteiro, José; Oliveira, Óscar; Carneiro, Davide Rua; Novais, Paulo

Given the new requirements of Machine Learning problems in recent years, especially concerning the volume, diversity, and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of blocks, replication, and balancing. CEDEs trains models in a distributed manner ...


Predicting model training time to optimize distributed machine learning applica...

Guimarães, Miguel; Carneiro, Davide; Palumbo, Guilherme; Oliveira, Filipe; Oliveira, Óscar; Alves, Victor; Novais, Paulo

Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed data...


An ETL pattern for log configuration and analysis

Oliveira, Bruno; Oliveira, Óscar; Matos, Telmo; Santos, Vasco; Belo, Orlando

In many scenarios, such as the ones related to Data Warehousing Extract-Transform-Load (ETL) processes, logging techniques are usually applied for capturing event metrics across system levels for system auditing and system recovery. The diversity of strategies and architectures of the toolset used to support the ETL implementation introduces another layer of complexity, both for system development and audit. Al...

