Bilingual Lexicon Induction through Unsupervised Machine Translation

Artetxe Zurutuza, Mikel; Labaka Intxauspe, Gorka; Agirre Bengoa, Eneko

dc.contributor.author	Artetxe Zurutuza, Mikel
dc.contributor.author	Labaka Intxauspe, Gorka
dc.contributor.author	Agirre Bengoa, Eneko
dc.date.accessioned	2024-10-16T15:45:36Z
dc.date.available	2024-10-16T15:45:36Z
dc.date.issued	2019
dc.identifier.citation	Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics : 5002-5007 (2019)	es_ES
dc.identifier.uri	http://hdl.handle.net/10810/69979
dc.description.abstract	A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on recent work on unsupervised machine translation. Instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state of the art on the standard MUSE dataset.	es_ES
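The abstract measures its gains against nearest neighbor and CSLS retrieval, the two standard ways of reading a lexicon directly off cross-lingual embeddings. Below is a minimal pure-Python sketch of those two baselines (not of the proposed MT-based method), assuming the source and target embeddings have already been mapped into a shared space. The function names, the toy 2-D vectors, and the neighborhood size `k` are illustrative assumptions. CSLS is written in its usual form, 2*cos(x, y) - r_T(x) - r_S(y), where each r term is the mean cosine between a word and its k nearest neighbors in the other language.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products are cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(src: Sequence[Sequence[float]], tgt: Sequence[Sequence[float]]) -> Matrix:
    """sim[i][j] = cosine similarity between source word i and target word j."""
    s = [normalize(v) for v in src]
    t = [normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(x, y)) for y in t] for x in s]


def mean_top_k(values: Sequence[float], k: int) -> float:
    """Mean of the k largest values: the neighborhood density used by CSLS."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda j: scores[j])


def nearest_neighbor(sim: Matrix) -> List[int]:
    """Translate each source word as its most similar target word."""
    return [argmax(row) for row in sim]


def csls(sim: Matrix, k: int = 10) -> List[int]:
    """Translate each source word by maximizing 2*cos(x, y) - r_T(x) - r_S(y).

    k=10 is an assumed default; the toy demo below uses k=2 (only 3 target words).
    """
    r_t = [mean_top_k(row, k) for row in sim]        # r_T(x): density around x among target words
    r_s = [mean_top_k(col, k) for col in zip(*sim)]  # r_S(y): density around y among source words
    return [
        argmax([2 * c - r_t[i] - r_s[j] for j, c in enumerate(row)])
        for i, row in enumerate(sim)
    ]


if __name__ == "__main__":
    # Hypothetical toy embeddings: target word 2 is a "hub" close to every source word.
    src = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
    tgt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    sim = cosine_matrix(src, tgt)
    print("NN:  ", nearest_neighbor(sim))  # [2, 2, 2]: every source word retrieves the hub
    print("CSLS:", csls(sim, k=2))         # [0, 1, 2]: the hub penalty yields distinct translations
```

In this toy case plain nearest neighbor sends every source word to the hub, while CSLS subtracts the hub's high neighborhood density and recovers a distinct translation for each word. The choice of `k` here is only an assumption sized for three target words.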
dc.description.sponsorship	This research was partially supported by the Spanish MINECO (UnsupNMT TIN2017-91692-EXP and DOMINO PGC2018-102041-B-I00, cofunded by EU FEDER), the BigKnowledge project (BBVA foundation grant 2018), the UPV/EHU (excellence research group), and the NVIDIA GPU grant program. Mikel Artetxe was supported by a doctoral grant from the Spanish MECD.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	ACL	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.title	Bilingual Lexicon Induction through Unsupervised Machine Translation	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.holder	(c)2019 The Association for Computational Linguistics, licensed under a Creative Commons Attribution 4.0 International License	es_ES
dc.relation.publisherversion	https://doi.org/10.18653/v1/P19-1494	es_ES
dc.identifier.doi	10.18653/v1/P19-1494
dc.departamentoes	Lenguajes y sistemas informáticos (Computer Languages and Systems)	es_ES
dc.departamentoeu	Hizkuntza eta sistema informatikoak (Computer Languages and Systems)	es_ES

Files in this item

Name:: P19-1494.pdf
Size:: 254.7Kb
Format:: PDF
Description:: Paper

Name:: license_rdf
Size:: 914 bytes
Format:: application/rdf+xml

This item appears in the following Collection(s)

Comunicaciones (Communications)
