
dc.contributor.author: Artetxe Zurutuza, Mikel
dc.contributor.author: Labaka Intxauspe, Gorka
dc.contributor.author: Agirre Bengoa, Eneko
dc.date.accessioned: 2024-10-16T15:45:36Z
dc.date.available: 2024-10-16T15:45:36Z
dc.date.issued: 2019
dc.identifier.citation: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics : 5002-5007 (2019) [es_ES]
dc.identifier.uri: http://hdl.handle.net/10810/69979
dc.description.abstract: A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on the recent work on unsupervised machine translation. This way, instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase-table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state-of-the-art in the standard MUSE dataset. [es_ES]
dc.description.sponsorship: This research was partially supported by the Spanish MINECO (UnsupNMT TIN2017-91692-EXP and DOMINO PGC2018-102041-B-I00, cofunded by EU FEDER), the BigKnowledge project (BBVA foundation grant 2018), the UPV/EHU (excellence research group), and the NVIDIA GPU grant program. Mikel Artetxe was supported by a doctoral grant from the Spanish MECD. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: ACL [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.title: Bilingual Lexicon Induction through Unsupervised Machine Translation [es_ES]
dc.type: info:eu-repo/semantics/conferenceObject [es_ES]
dc.rights.holder: (c)2019 The Association for Computational Linguistics, licensed on a Creative Commons Attribution 4.0 International License [es_ES]
dc.relation.publisherversion: https://doi.org/10.18653/v1/P19-1494 [es_ES]
dc.identifier.doi: 10.18653/v1/P19-1494
dc.departamentoes: Lenguajes y sistemas informáticos [es_ES]
dc.departamentoeu: Hizkuntza eta sistema informatikoak [es_ES]
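
Note on the retrieval baselines named in the abstract: the record does not define nearest neighbor or CSLS retrieval, so the following is a minimal sketch of those two baselines only, not of the paper's proposed machine-translation pipeline. It assumes the commonly used CSLS definition, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity of x to its k nearest target-side neighbors and r_S(y) is the analogous source-side term. The function names, the toy vectors, and the choice of k are hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_top_k(values, k):
    """Mean of the k largest values (or of all values if fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def retrieve(src_emb, tgt_emb, use_csls=False, k=10):
    """Map each source word to its best-scoring target word.

    src_emb and tgt_emb are dicts from word to vector, assumed to live
    in the same (already aligned) cross-lingual space.
    """
    # Full cosine similarity table between source and target words.
    sim = {s: {t: cosine(sv, tv) for t, tv in tgt_emb.items()}
           for s, sv in src_emb.items()}

    if use_csls:
        # Average similarity of each word to its k nearest neighbors
        # in the other language; used to penalize "hub" words.
        r_src = {s: mean_top_k(sim[s].values(), k) for s in src_emb}
        r_tgt = {t: mean_top_k((sim[s][t] for s in src_emb), k)
                 for t in tgt_emb}

    def score(s, t):
        if use_csls:
            return 2 * sim[s][t] - r_src[s] - r_tgt[t]
        return sim[s][t]  # plain nearest neighbor

    return {s: max(tgt_emb, key=lambda t: score(s, t)) for s in src_emb}


# Hypothetical toy embeddings, for illustration only.
SRC = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
TGT = {"perro": [0.9, 0.1], "gato": [0.1, 0.9]}

nn_lexicon = retrieve(SRC, TGT)
csls_lexicon = retrieve(SRC, TGT, use_csls=True)
```

On such a tiny, well-separated example both scoring rules select the same translations; the two only diverge when some target words are close to many source words at once, which is the situation the CSLS penalty terms are meant to correct.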





Except where otherwise noted, this item's license is described as (c)2019 The Association for Computational Linguistics, licensed on a Creative Commons Attribution 4.0 International License