Towards zero-shot cross-lingual named entity disambiguation

Barrena Madinabeitia, Ander; Soroa Echave, Aitor; Agirre Bengoa, Eneko

dc.contributor.author	Barrena Madinabeitia, Ander
dc.contributor.author	Soroa Echave, Aitor
dc.contributor.author	Agirre Bengoa, Eneko
dc.date.accessioned	2021-12-15T09:24:55Z
dc.date.available	2021-12-15T09:24:55Z
dc.date.issued	2021-12-01
dc.identifier.citation	Expert Systems with Applications 184 : (2021) // Article ID 115542	es_ES
dc.identifier.issn	0957-4174
dc.identifier.issn	1873-6793
dc.identifier.uri	http://hdl.handle.net/10810/54482
dc.description.abstract	[EN]In Cross-lingual Named Entity Disambiguation (XNED) the task is to link named entity mentions in text in some native language to English entities in a knowledge graph. XNED systems usually require training data for each native language, which limits their application to low-resource languages with small amounts of training data. Prior work has proposed so-called zero-shot transfer systems, which are trained only on English training data but require native prior probabilities of entities with respect to mentions; these had to be estimated from native training examples, limiting their practical interest. In this work we present a zero-shot XNED architecture where, instead of a single disambiguation model, we have a model for each possible mention string, thus eliminating the need for native prior probabilities. Our system improves over prior work on XNED datasets in Spanish and Chinese by 32 and 27 points, respectively, and matches the systems that do require native prior information. We experiment with different multilingual transfer strategies, showing that a purpose-built multilingual pre-training method yields better results than state-of-the-art generic multilingual models such as XLM-R. We also found, surprisingly, that English is not necessarily the most effective zero-shot training language for XNED into English. For instance, Spanish is more effective when training a zero-shot XNED system that disambiguates Basque mentions with respect to an English knowledge graph.	es_ES
dc.description.sponsorship	This work has been partially funded by the Basque Government (IXA excellence research group (IT1343-19) and DeepText project), Project BigKnowledge (Ayudas Fundacion BBVA a equipos de investigacion cientifica 2018) and via the IARPA BETTER Program contract 2019-19051600006 (ODNI, IARPA activity). Ander Barrena enjoys a post-doctoral grant ESPDOC18/101 from the UPV/EHU and also acknowledges the support of the NVIDIA Corporation with the donation of a Titan V GPU used for this research. The author thankfully acknowledges the computer resources at CTE-Power9 + V100 and technical support provided by Barcelona Supercomputing Center (RES-IM-2020-1-0020).	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	cross-lingual named entity disambiguation	es_ES
dc.subject	cross-lingual entity linking	es_ES
dc.subject	zero-shot learning	es_ES
dc.subject	transfer learning	es_ES
dc.subject	pre-trained language models	es_ES
dc.subject	low-resource languages	es_ES
dc.title	Towards zero-shot cross-lingual named entity disambiguation	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	© 2021 The Author(s). This is an open access article under the CC BY-NC-ND license	es_ES
dc.rights.holder	Atribución-NoComercial-SinDerivadas 3.0 España	*
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S0957417421009490?via%3Dihub	es_ES
dc.identifier.doi	10.1016/j.eswa.2021.115542
dc.departamentoes	Ciencia de la computación e inteligencia artificial	es_ES
dc.departamentoes	Lenguajes y sistemas informáticos	es_ES
dc.departamentoeu	Hizkuntza eta sistema informatikoak	es_ES
dc.departamentoeu	Konputazio zientziak eta adimen artifiziala	es_ES

Files in this item

Name: 1-s2.0-S0957417421009490-main.pdf
Size: 1.042Mb
Format: PDF
Description: Article

Name: license_rdf
Size: 811 bytes
Format: application/rdf+xml

This item appears in the following Collection(s)

Artículos