A uniform phase representation for the harmonic model in speech synthesis applications

Degottex, Gilles; Erro Eslava, Daniel

dc.contributor.author	Degottex, Gilles
dc.contributor.author	Erro Eslava, Daniel
dc.date.accessioned	2015-10-16T12:55:51Z
dc.date.available	2015-10-16T12:55:51Z
dc.date.issued	2014-10-16
dc.identifier.citation	Journal on Audio, Speech and Music Processing 2014 : (2014) // Article ID 38	es
dc.identifier.issn	1687-4722
dc.identifier.uri	http://hdl.handle.net/10810/15924
dc.description.abstract	Feature-based vocoders, e.g., STRAIGHT, offer a way to manipulate the perceived characteristics of the speech signal in speech transformation and synthesis. For the harmonic model, which provides excellent perceived quality, features for the amplitude parameters already exist (e.g., Line Spectral Frequencies (LSF), Mel-Frequency Cepstral Coefficients (MFCC)). However, because of the wrapping of the phase parameters, phase features are more difficult to design. To randomize the phase of the harmonic model during synthesis, a voicing feature is commonly used, which distinguishes voiced and unvoiced segments. However, voice production allows smooth transitions between voiced and unvoiced states, which sometimes makes the voicing segmentation tricky to estimate. In this article, two phase features are suggested to represent the phase of the harmonic model in a uniform way, without a voicing decision. The synthesis quality of the resulting vocoder has been evaluated, using subjective listening tests, in the context of resynthesis, pitch scaling, and Hidden Markov Model (HMM)-based synthesis. The experiments show that the suggested signal model is comparable to STRAIGHT or even better in some scenarios. They also reveal some limitations of the harmonic framework itself in the case of high fundamental frequencies.	es
dc.description.sponsorship	G. Degottex has been funded by the Swiss National Science Foundation (SNSF) (grants PBSKP2_134325, PBSKP2_140021), Switzerland, and the Foundation for Research and Technology-Hellas (FORTH), Heraklion, Greece. D. Erro has been funded by the Basque Government (BER2TEK, IE12-333) and the Spanish Ministry of Economy and Competitiveness (SpeechTech4All, TEC2012-38939-C03-03).	es
dc.language.iso	eng	es
dc.publisher	Springer International Publishing	es
dc.rights	info:eu-repo/semantics/openAccess	es
dc.subject	speech synthesis	es
dc.subject	harmonic model	es
dc.subject	phase modeling	es
dc.subject	voice transformation	es
dc.subject	parametric speech synthesis	es
dc.subject	group delay functions	es
dc.subject	spectral envelope	es
dc.subject	time-scale	es
dc.subject	vocoder	es
dc.subject	HMM	es
dc.subject	extraction	es
dc.subject	instants	es
dc.subject	sounds	es
dc.subject	audio	es
dc.subject	wave	es
dc.title	A uniform phase representation for the harmonic model in speech synthesis applications	es
dc.type	info:eu-repo/semantics/article	es
dc.rights.holder	© 2014 Degottex and Erro; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited	es
dc.relation.publisherversion	http://www.asmp.eurasipjournals.com/content/2014/1/38/abstract	es
dc.identifier.doi	10.1186/s13636-014-0038-1
dc.departamentoes	Ingeniería de comunicaciones	es_ES
dc.departamentoeu	Komunikazioen ingeniaritza	es_ES
dc.subject.categoria	ACOUSTICS
dc.subject.categoria	ELECTRICAL AND ELECTRONIC ENGINEERING

Files in this item

Name: s13636-014-0038-1-1.pdf
Size: 2.815 MB
Format: PDF

This item appears in the following collection(s)

Articles