Basque and Spanish Multilingual TTS Model for Speech-to-Speech Translation
Date
2023-06-30
Author
De Zuazo Oteiza, Xabier
Abstract
[EN] Recently, multiple Text-to-Speech models that use deep neural networks to synthesize
audio from text have emerged. In this work, a state-of-the-art multilingual,
multi-speaker Text-to-Speech model was trained on Basque, Spanish, Catalan, and
Galician. The research consisted of gathering the datasets, pre-processing their audio and
text data, training the model on the languages in successive steps, and evaluating the
results at each step. Training used a transfer learning approach, starting from a model
already trained in three languages: English, Portuguese, and French. The final model
created here therefore supports a total of seven languages. These models also support
zero-shot voice conversion, using an input audio file as a reference. Finally, a prototype
application for Speech-to-Speech Translation was created by combining the models trained
here with other models from the community. Along the way, some Deep Speech
Speech-to-Text models were also produced for Basque and Galician.
[EU] Azkenaldian, Text-to-Speech eredu anitz sortu dira sare neuronal sakonak erabiliz,
testutik audioa sintetizatzeko. Lan honetan, state-of-the-art Text-to-Speech eredu
eleaniztun eta hiztun anitzeko eredua landu da euskaraz, gaztelaniaz, katalanez eta
galegoz. Ikerketa honetan datu-multzoak bildu, haien audio- eta testu-datuak aldez
aurretik prozesatu, eredua hizkuntzetan entrenatu da urrats desberdinetan eta emaitzak
puntu bakoitzean ebaluatu dira. Entrenatze-urratserako, ikaskuntza-transferentzia
teknika erabili da dagoeneko hiru hizkuntzatan trebatutako eredu batetik abiatuta:
ingelesa, portugesa eta frantsesa. Beraz, hemen sortutako azken ereduak zazpi hizkuntza
onartzen ditu guztira. Gainera, eredu hauek zero-shot ahots bihurketa ere egiten dute,
sarrerako audio fitxategi bat erreferentzia gisa erabiliz. Azkenik, Speech-to-Speech
Translation egiteko prototipo aplikazio bat sortu da hemen entrenatutako ereduak eta
komunitateko beste eredu batzuk elkartuz. Bide horretan, Deep Speech Speech-to-Text
eredu batzuk sortu dira euskararako eta galegorako.