Entailment for Zero- and Few-Shot Text Classification in Catalan: Monolingual vs. Multilingual Resources and Task Transfer

Baucells de la Peña, Irene

dc.contributor.advisor	López de Lacalle Lecuona, Oier
dc.contributor.author	Baucells de la Peña, Irene
dc.date.accessioned	2023-06-30T14:49:08Z
dc.date.available	2023-06-30T14:49:08Z
dc.date.issued	2023-06-30
dc.identifier.uri	http://hdl.handle.net/10810/61819
dc.description.abstract	As the field of NLP continues to evolve and expand in industry, new tasks and languages emerge for which task-specific data for fine-tuning is often scarce or unavailable. Against this background, zero- and few-shot methods are gaining ground. However, most of them have typically been studied in the context of English and often rely on leveraging extensive pre-existing resources, raising questions about their applicability to less-resourced languages. The current study investigates the application of one of these approaches, based on transforming the target task into a Natural Language Inference (NLI) task and using an NLI (or entailment) model to solve it, in the context of Catalan, a medium-sized language. Specifically, we address a multi-class text classification problem and ask whether (smaller) monolingual resources can compete with (larger) multilingual resources in such a framework, experimenting with different combinations of pre-trained language models (LMs) and NLI datasets to gain further insight into the contribution of each. In addition, we explore task transfer learning for potential performance improvements. Our results show that the larger size and richness of multilingual NLI datasets, and to a lesser extent the amount of text seen during LM pre-training, are key to the superior performance of multilingual models in the zero-shot setting, yet the monolingual LM seems to gain significance when the task requires a finer-grained classification. In contrast, in the few-shot setting, the weight of the base NLI dataset appears to decrease considerably and the monolingual LM becomes a stronger option. In turn, task transfer learning significantly improves the monolingual results in the zero-shot scenario, but becomes less relevant in the few-shot scenario.
Overall, our study demonstrates the potential and limitations of the approach in resource-limited settings, providing insights into the factors influencing the entailment models’ performance and highlighting areas for future improvement.
dc.language.iso	eng	es_ES
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Entailment for Zero- and Few-Shot Text Classification in Catalan: Monolingual vs. Multilingual Resources and Task Transfer	es_ES
dc.type	info:eu-repo/semantics/masterThesis
dc.date.updated	2023-02-09T11:17:24Z
dc.language.rfc3066	es
dc.rights.holder	© 2023, the author
dc.contributor.degree	Máster Universitario en Análisis y Procesamiento del Lenguaje
dc.contributor.degree	Hizkuntzaren Azterketa eta Prozesamendua Unibertsitate Masterra
dc.identifier.gaurregister	128866-1075449-05	es_ES
dc.identifier.gaurassign	148169-1075449	es_ES

Files in this item

Name: TFM_Irene_Baucells.pdf
Size: 3.610Mb
Format: PDF

This item appears in the following Collection(s)

Máster Universitario en Análisis y Procesamiento del Lenguaje
