dc.contributor.advisor: López de Lacalle Lecuona, Oier
dc.contributor.author: Baucells de la Peña, Irene
dc.date.accessioned: 2023-06-30T14:49:08Z
dc.date.available: 2023-06-30T14:49:08Z
dc.date.issued: 2023-06-30
dc.identifier.uri: http://hdl.handle.net/10810/61819
dc.description.abstract: As the field of NLP continues to evolve and expand in industry, new tasks and languages emerge for which task-specific data for fine-tuning is often scarce or unavailable. Against this background, zero- and few-shot methods are gaining ground. However, most of them have typically been studied in the context of English and often rely on leveraging extensive pre-existing resources, raising questions about their applicability to less resourced languages. The current study investigates the application of one of these approaches, based on transforming the target task into a Natural Language Inference (NLI) task and using an NLI (or entailment) model to solve it, in the context of Catalan, a medium-sized language. Specifically, we address a multi-class text classification problem and ask whether (smaller) monolingual resources can compete with (larger) multilingual resources in such a framework, experimenting with different combinations of pre-trained language models (LMs) and NLI datasets to gain further insight into the contribution of each. In addition, we explore task transfer learning for potential performance improvements. Our results show that the larger size and richness of multilingual NLI datasets, and to a lesser extent the amount of text seen during LM pre-training, are key to the superior performance of multilingual models in the zero-shot setting, yet the monolingual LM seems to gain significance when the task requires a finer-grained classification. In contrast, in the few-shot setting, the weight of the base NLI dataset appears to decrease considerably and the monolingual LM becomes a stronger option. In turn, task transfer learning significantly improves the monolingual results in the zero-shot scenario, but becomes less relevant in the few-shot scenario.
Overall, our study demonstrates the potential and limitations of the approach in resource-limited settings, providing insights into the factors influencing the entailment models’ performance and highlighting areas for future improvement.
dc.language.iso: eng [es_ES]
dc.rights: info:eu-repo/semantics/openAccess
dc.title: Entailment for Zero- and Few-Shot Text Classification in Catalan: Monolingual vs. Multilingual Resources and Task Transfer [es_ES]
dc.type: info:eu-repo/semantics/masterThesis
dc.date.updated: 2023-02-09T11:17:24Z
dc.language.rfc3066: es
dc.rights.holder: © 2023, the author
dc.contributor.degree: Máster Universitario en Análisis y Procesamiento del Lenguaje
dc.contributor.degree: Hizkuntzaren Azterketa eta Prozesamendua Unibertsitate Masterra
dc.identifier.gaurregister: 128866-1075449-05 [es_ES]
dc.identifier.gaurassign: 148169-1075449 [es_ES]