dc.contributor.author: Artetxe Zurutuza, Mikel
dc.contributor.author: Labaka Intxauspe, Gorka
dc.contributor.author: Agirre Bengoa, Eneko
dc.date.accessioned: 2024-10-16T18:30:00Z
dc.date.available: 2024-10-16T18:30:00Z
dc.date.issued: 2018
dc.identifier.citation: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing : 3632-3642 (2018)
dc.identifier.uri: http://hdl.handle.net/10810/69988
dc.description.abstract: While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further, yielding, for instance, 14.08 and 26.22 BLEU points in WMT 2014 English-German and English-French, respectively, an improvement of more than 7-10 BLEU points over previous unsupervised systems, and closing the gap with supervised SMT (Moses trained on Europarl) down to 2-5 BLEU points. Our implementation is available at https://github.com/artetxem/monoses.
dc.description.sponsorship: This research was partially supported by the Spanish MINECO (TUNER TIN2015-65308-C51-R, MUSTER PCIN-2015-226 and TADEEP TIN2015-70214-P, cofunded by EU FEDER), the UPV/EHU (excellence research group), and the NVIDIA GPU grant program. Mikel Artetxe enjoys a doctoral grant from the Spanish MECD.
dc.language.iso: eng
dc.publisher: ACL
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.title: Unsupervised Statistical Machine Translation
dc.type: info:eu-repo/semantics/conferenceObject
dc.rights.holder: (c) 2018 The authors under the Creative Commons Attribution 4.0 International (CC BY 4.0)
dc.relation.publisherversion: https://doi.org/10.18653/v1/D18-1399
dc.identifier.doi: 10.18653/v1/D18-1399
dc.departamentoes: Lenguajes y sistemas informáticos
dc.departamentoeu: Hizkuntza eta sistema informatikoak


Except where otherwise noted, this item's license is described as (c) 2018 The authors under the Creative Commons Attribution 4.0 International (CC BY 4.0)