Document details

Singing voice resynthesis using concatenative-based techniques

Author(s): Fonseca, Nuno

Date: 2011

Persistent ID: http://hdl.handle.net/10400.8/540

Origin: IC-online

Subject(s): Resynthesis; Singing; Voice


Description

Dissertation submitted to the Faculdade de Engenharia da Universidade do Porto (Faculty of Engineering of the University of Porto) in partial fulfilment of the requirements for the degree of Doctor in Informatics Engineering.

Singing plays an important role in our lives. Although synthesizers have been trying to replicate every musical instrument for decades, it was only during the last nine years that commercial singing synthesizers started to appear, making it possible to merge music and text, i.e., singing. These solutions can produce realistic results in some situations, but they require time-consuming processes and experienced users. The goal of this research work is to develop, create or adapt techniques that allow the resynthesis of the singing voice, i.e., that let users directly control a singing voice synthesizer with their own voice. The synthesizer should replicate, as closely as possible, the same melody, the same phonetic sequence and the same musical performance.

Initially, some work was developed on resynthesizing piano recordings with an evolutionary approach based on Genetic Algorithms, in which a population of individuals (candidate solutions), each representing a sequence of musical notes, evolves over time to match an original audio stream. The focus then returned to the singing voice, exploring techniques such as Hidden Markov Models and Self-Organizing Maps (neural networks), among others.

Finally, a Concatenative Unit Selection approach was chosen as the core of a singing voice resynthesis system. Energy, pitch and phonetic information (MFCC, LPC) are extracted from the input and used within a Viterbi-based Unit Selection system driven by phonetic similarity, which chooses a sequence of frames from an internal sound library to replicate the original audio performance. Although audio artifacts still exist, preventing its use in professional applications, the work establishes the concept of a new audio tool with high potential for future work, not only for the singing voice but also in other musical or speech domains.
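The unit-selection step described above can be illustrated with a generic Viterbi search over library frames. This is a minimal sketch under stated assumptions, not the author's implementation: it assumes each frame has already been reduced to a numeric feature vector (for example pitch, energy and a few MFCCs), uses a plain Euclidean distance as the target cost, and uses a hypothetical concatenation cost that is zero when consecutive library frames are kept together and a fixed `jump_penalty` otherwise. The names `unit_select`, `target_cost`, `concat_cost` and `jump_penalty` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def target_cost(target: Vector, unit: Vector) -> float:
    """Assumed target cost: Euclidean distance between a target frame and a library frame."""
    return math.sqrt(sum((t - u) ** 2 for t, u in zip(target, unit)))


def concat_cost(prev_idx: int, idx: int, jump_penalty: float) -> float:
    """Assumed concatenation cost: free when continuing with the next library frame,
    otherwise a fixed penalty for jumping elsewhere in the library."""
    return 0.0 if idx == prev_idx + 1 else jump_penalty


def unit_select(targets: List[Vector], library: List[Vector],
                jump_penalty: float = 1.0) -> List[int]:
    """Return one library frame index per target frame, minimising the summed
    target and concatenation costs with a Viterbi (dynamic programming) search."""
    n_lib = len(library)
    # cost[j]: cheapest accumulated cost of any path ending at library frame j
    cost = [target_cost(targets[0], library[j]) for j in range(n_lib)]
    back: List[List[int]] = []  # back-pointers for every target frame after the first
    for t in range(1, len(targets)):
        new_cost: List[float] = []
        ptr: List[int] = []
        for j in range(n_lib):
            # best predecessor i for candidate frame j
            best_i = min(range(n_lib),
                         key=lambda i: cost[i] + concat_cost(i, j, jump_penalty))
            new_cost.append(cost[best_i] + concat_cost(best_i, j, jump_penalty)
                            + target_cost(targets[t], library[j]))
            ptr.append(best_i)
        cost = new_cost
        back.append(ptr)
    # Start from the cheapest final frame and follow back-pointers to the beginning
    j = min(range(n_lib), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

For example, `unit_select([[1.0], [2.0], [3.0]], [[0.0], [1.0], [2.0], [3.0]])` returns `[1, 2, 3]`, reusing a contiguous run of library frames. This search costs O(T·N²) for T target frames and N library frames, so a practical system with a large sound library would likely need pruning or a restricted set of candidate frames.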

This dissertation had the kind support of FCT (Portuguese Foundation for Science and Technology, an agency of the Portuguese Ministry for Science, Technology and Higher Education) under grant SFRH / BD / 30300 / 2006, and was carried out in connection with research project PTDC/SAU-BEB/104995/2008 (Assistive Real-Time Technology in Singing), whose objectives include developing interactive technologies that support the teaching and learning of singing.

Document Type: Doctoral thesis
Language: English
Contributor(s): IC-Online