Few-shot Learning for Argumentation in the Medical Domain

Manzanal Martín, Jon

dc.contributor.advisor	Agerri Gascón, Rodrigo
dc.contributor.author	Manzanal Martín, Jon
dc.date.accessioned	2023-06-30T14:54:00Z
dc.date.available	2023-06-30T14:54:00Z
dc.date.issued	2023-06-30
dc.identifier.uri	http://hdl.handle.net/10810/61823
dc.description.abstract	[EU] In recent years, the medical field has paid more attention to techniques related to Artificial Intelligence that help doctors resolve questions more easily and quickly. This is especially important in Evidence-Based Medicine, since doctors have to use a great deal of unstructured information to make decisions in time. In this context, Argument Mining helps to identify argumentative components and the relations between them in texts containing deliberation processes and medical explanations. Although there is a fairly good body of work on Argument Mining, most datasets have been developed for English, and currently only one exists for the medical domain. Because of this lack of available annotated data, in this thesis we examine prompting and fine-tuning techniques to establish the best strategy for argument mining in the medical domain, in a few-shot setting, for a language other than English. Our results empirically demonstrate that sequence labelling methods based on few-shot prompting are highly sensitive to the sampling method used to generate the training data. Indeed, and contrary to published work, with an alternative data sampling, fine-tuning methods outperform prompting techniques in few-shot evaluation settings. More precisely, 40% of the training data for Argument Mining in the medical domain is enough to obtain state-of-the-art results. Moreover, using only 10-20% of the training data (that is, 15 hours of manual labelling work per person) makes it possible to achieve highly competitive performance.
dc.description.abstract	[EN] In recent years, the medical field has paid increasing attention to Artificial Intelligence techniques that help doctors answer questions more simply and quickly. This is particularly relevant in Evidence-based Medicine, where doctors must deal with large amounts of unstructured information to make timely decisions. In this context, Argument Mining helps to identify argumentative components and the relations between them in texts containing medical deliberation and explanatory processes. Although there is a relatively good body of work on Argument Mining, the large majority of datasets have been developed for English, and only one currently exists for the medical domain. Due to this lack of annotated data, in this thesis we explore prompting and fine-tuning techniques to establish the best strategy for argument mining in the medical domain, in a few-shot setting, for a target language other than English. Our results empirically demonstrate that few-shot prompting approaches for sequence labelling are highly sensitive to the sampling method used to generate the training data. In fact, and contrary to published work, we show that with an alternative data sampling, fine-tuning methods outperform prompting techniques in few-shot evaluation settings. More specifically, we establish that 40% of the training data for Argument Mining in the medical domain is enough to obtain state-of-the-art results. Furthermore, using just 10-20% of the training data (which amounts to 15 hours of manual labelling work per person) is enough to obtain highly competitive performance.
dc.language.iso	eng	es_ES
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Few-shot Learning for Argumentation in the Medical Domain	es_ES
dc.type	info:eu-repo/semantics/masterThesis
dc.date.updated	2023-02-09T11:17:24Z
dc.language.rfc3066	es
dc.rights.holder	© 2023, the author
dc.contributor.degree	Máster Universitario en Análisis y Procesamiento del Lenguaje
dc.contributor.degree	Hizkuntzaren Azterketa eta Prozesamendua Unibertsitate Masterra
dc.identifier.gaurregister	128865-882077-05	es_ES
dc.identifier.gaurassign	148195-882077	es_ES

Files in this item

Name: TFM-JonManzanalMartin.pdf
Size: 640.3Kb
Format: PDF

This item appears in the following Collection(s)

Máster Universitario en Análisis y Procesamiento del Lenguaje