The AHOLAB Text-to-Speech system for Blizzard Challenge 2021

García Romillo, Víctor; Hernáez Rioja, Inmaculada; Navas Cordón, Eva

Date

2021-10-29

Author

García Romillo, Víctor

Hernáez Rioja, Inmaculada

Navas Cordón, Eva

The Blizzard Challenge 2021 : 64-69 (2021)

URI

http://hdl.handle.net/10810/72195

Abstract

In this paper we present the Text-to-Speech synthesis system proposed for the 2021 Blizzard Challenge by the Aholab Signal Processing Group. The goal of this challenge is to build a synthetic voice from a provided speech corpus recorded in European Spanish. The challenge comprises two tasks: synthesising text containing only Spanish words and synthesising Spanish texts containing a small number of English words. Our system uses Tacotron-2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as the neural vocoder to obtain the audio signals from the spectrograms. A Spanish linguistic front-end module was used to transform grapheme sequences into phoneme sequences. To improve the robustness of the system and make the alignments easier to learn, a prior-knowledge-based loss was added to the acoustic model. Evaluation shows that our systems performed well on both tasks.

Collections

Comunicaciones