The AHOLAB Text-to-Speech system for Blizzard Challenge 2021
Date
2021-10-29
Author
García Romillo, Víctor
The Blizzard Challenge 2021 : 64-69 (2021)
Abstract
In this paper we present the Text-to-Speech synthesis system proposed by the Aholab Signal Processing Group for the Blizzard Challenge 2021. The goal of this challenge is to build a synthetic voice from a provided speech corpus recorded in European Spanish. The challenge comprises two tasks: synthesising texts containing only Spanish words, and synthesising Spanish texts containing a small number of English words. Our system uses Tacotron-2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as a neural vocoder to obtain the audio signals from the spectrograms. A Spanish linguistic front-end module was used to transform grapheme sequences into phoneme sequences. To improve the robustness of the system and make the alignments in the acoustic model easier to learn, a prior-knowledge-based loss was added to the acoustic model. The evaluation shows that our systems performed well on both tasks.
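The abstract does not state which form the prior-knowledge-based loss takes. One commonly used option for attention-based acoustic models is a guided attention loss, which encodes the prior that text and speech advance roughly in step, so attention mass far from the diagonal is penalised. The following is a minimal sketch under that assumption, not the authors' implementation; the function names and the width parameter `g = 0.2` are hypothetical.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Penalty matrix W[t][n]: 0 on the diagonal, approaching 1 far from it.

    Assumption: a Gaussian-shaped diagonal prior with width g (hypothetical value).
    """
    return [
        [
            1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
            for n in range(n_text)
        ]
        for t in range(n_frames)
    ]


def guided_attention_loss(attention, g=0.2):
    """Mean of the attention weights multiplied by the off-diagonal penalty.

    `attention` is a list of rows, one per decoder frame, each row holding
    the attention weights over the input (phoneme) positions.
    """
    n_frames = len(attention)
    n_text = len(attention[0])
    weights = guided_attention_weights(n_text, n_frames, g)
    total = sum(
        a * w
        for a_row, w_row in zip(attention, weights)
        for a, w in zip(a_row, w_row)
    )
    return total / (n_frames * n_text)


# Toy alignments over 4 input symbols and 4 output frames.
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
anti_diagonal = [[1.0 if n == 3 - t else 0.0 for n in range(4)] for t in range(4)]

loss_diagonal = guided_attention_loss(diagonal)
loss_anti_diagonal = guided_attention_loss(anti_diagonal)
```

In this sketch a perfectly monotonic alignment incurs no penalty, while a reversed alignment is penalised. In training, such a term would typically be added to the spectrogram reconstruction loss with a weighting factor.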