As distributed and multi-organization Machine Learning emerges, new challenges must be solved, such as diverse and low-quality data or real-time delivery. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimum block size and the best heuristic to create distributed Ensembles. We ev...
Developing analytical systems poses several challenges related not only to the amount and heterogeneity of the data involved but also to the constant need to readapt and evolve to overcome new business challenges. Data are a determining factor in the success of analytical and decision-making applications; their nature, availability, and quality are crucial aspects for planning and structuring populating anal...
Given the new requirements of Machine Learning problems in recent years, especially regarding the volume, diversity, and speed of data, new approaches are needed to deal with the associated challenges. In this paper we describe CEDEs, a distributed learning system that runs on top of a Hadoop cluster and takes advantage of blocks, replication, and balancing. CEDEs trains models in a distributed manner ...
Despite major advances in recent years, the field of Machine Learning continues to face research and technical challenges. Mostly, these stem from big data and streaming data, which require models to be frequently updated or re-trained, at the expense of significant computational resources. One solution is the use of distributed learning algorithms, which can learn in a distributed manner, from distributed data...
In many scenarios, such as those related to Data Warehousing Extract-Transform-Load (ETL) processes, logging techniques are usually applied to capture event metrics across system levels for system auditing and recovery. The diversity of strategies and architectures in the toolset used to support the ETL implementation introduces another layer of complexity, for both system development and auditing. Al...