Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation
Bengoetxea Azurmendi, Jaione
Counter Narratives (CN) are responses to Hate Speech (HS) that combine non-negative feedback with fact-bound arguments, with the aim of de-escalating potentially hateful debates. With the growing presence of the online world, however, the amount of HS has grown exponentially, and automatic CN generation has recently been deemed necessary to deal with these hateful comments. Although research in this area has gained considerable interest in recent years, most studies have focused on English. The aim of this thesis is therefore to provide preliminary research on CN generation in Spanish and Basque, using CONAN, a dataset of HS-CN pairs. This dataset was machine translated (MT) into both Spanish and Basque, and each translated dataset was also manually post-edited. These datasets were used to conduct monolingual as well as crosslingual experiments, all of which were assessed through both quantitative and qualitative evaluation. Quantitatively, the model trained on the Spanish post-edited dataset performed best, while for Basque the model trained on the MT dataset obtained the best results, although this outcome was strongly influenced by training size. In the crosslingual experiments, the multilingual Basque model appears to slightly improve on its monolingual baseline. Furthermore, the qualitative evaluation indicated that automatic metrics did not correlate well with human judgement: manual evaluation showed a clear preference not only for the Spanish post-edited model but also for the Basque post-edited experiment. This highlighted the importance of a manual evaluation step in text generation tasks.