dc.contributor.author | Zubiaga Amar, Irune | |
dc.contributor.author | Justo Blanco, Raquel | |
dc.contributor.author | De Velasco Vázquez, Mikel | |
dc.contributor.author | Torres Barañano, María Inés | |
dc.date.accessioned | 2023-01-24T18:10:09Z | |
dc.date.available | 2023-01-24T18:10:09Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Proceedings of IberSPEECH : 186-190 (2022) | es_ES |
dc.identifier.uri | http://hdl.handle.net/10810/59459 | |
dc.description.abstract | Emotion recognition from speech is an active field of study that can help build more natural human-machine interaction systems. Even though advances in deep learning technology have brought improvements to this task, it remains very challenging. For instance, in real-life scenarios, factors such as a tendency toward neutrality or the ambiguous definition of emotion can make labeling difficult, causing the dataset to be severely imbalanced and not very representative. In this work we considered a real-life scenario to carry out a series of emotion classification experiments. Specifically, we worked with a labeled corpus consisting of a set of audios from Spanish TV debates and their respective transcriptions. First, an analysis of the emotional information within the corpus was conducted. Then, different data representations were analyzed in order to choose the best one for our task: spectrograms and UniSpeech-SAT were used for audio representation, and DistilBERT for text representation. As a final step, multimodal machine learning was used with the aim of improving the obtained classification results by combining acoustic and textual information. | es_ES
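The abstract describes a multimodal pipeline that pairs UniSpeech-SAT audio embeddings with DistilBERT text embeddings. The following is a minimal sketch of such a fusion setup, not the authors' code: the specific checkpoints (microsoft/unispeech-sat-base, distilbert-base-multilingual-cased), the mean/CLS pooling choices, and the four-class classification head are all assumptions made for illustration.

```python
# Hedged sketch of a late-fusion emotion classifier: pooled UniSpeech-SAT
# and DistilBERT embeddings are concatenated and fed to a small MLP head.
import torch
import torch.nn as nn
from transformers import AutoFeatureExtractor, AutoModel, AutoTokenizer

# Assumed checkpoints; the paper does not specify which variants were used.
audio_encoder = AutoModel.from_pretrained("microsoft/unispeech-sat-base")
audio_fe = AutoFeatureExtractor.from_pretrained("microsoft/unispeech-sat-base")
text_encoder = AutoModel.from_pretrained("distilbert-base-multilingual-cased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-multilingual-cased")

def fuse(waveform, transcript):
    # waveform: 1-D float array at 16 kHz; transcript: str (Spanish)
    a_in = audio_fe(waveform, sampling_rate=16000, return_tensors="pt")
    t_in = tokenizer(transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Mean-pool the audio frames; take the [CLS] token for text.
        a = audio_encoder(**a_in).last_hidden_state.mean(dim=1)  # (1, 768)
        t = text_encoder(**t_in).last_hidden_state[:, 0]         # (1, 768)
    return torch.cat([a, t], dim=-1)                             # (1, 1536)

# Hypothetical emotion head; the number of classes is an assumption.
classifier = nn.Sequential(nn.Linear(1536, 256), nn.ReLU(), nn.Linear(256, 4))
logits = classifier(fuse(waveform=[0.0] * 16000, transcript="hola"))
```

Concatenating pooled embeddings is only one possible fusion strategy; the paper's actual combination of acoustic and textual information may differ.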
dc.description.sponsorship | The research presented in this paper was conducted as part of the AMIC PdC project, which received funding from the Spanish Ministry of Science under grants TIN2017-85854-C4-3-R, PID2021-126061OB-C42 and PDC2021-120846-C43, and it was also partially funded by the European Union's Horizon 2020 research and innovation program under grant agreement No. 823907 (MENHIR). | es_ES
dc.language.iso | eng | es_ES |
dc.publisher | ISCA | es_ES |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/823907 | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/TIN2017-85854-C4-3-R | es_ES
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.subject | Acoustic Signal | es_ES |
dc.subject | Textual Information | es_ES |
dc.subject | Multimodal Machine Learning | es_ES
dc.subject | Emotion Recognition | es_ES |
dc.title | Speech emotion recognition in Spanish TV Debates | es_ES |
dc.type | info:eu-repo/semantics/conferenceObject | es_ES |
dc.rights.holder | (c) 2022 ISCA | es_ES |
dc.relation.publisherversion | https://www.isca-speech.org/archive/smm_2022/zubiaga22_smm.html | es_ES |
dc.identifier.doi | 10.21437/IberSPEECH.2022-38 | |
dc.contributor.funder | European Commission | |
dc.departamentoes | Ciencia de la computación e inteligencia artificial | es_ES |
dc.departamentoeu | Konputazio zientziak eta adimen artifiziala | es_ES |