Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations

Artetxe Zurutuza, Mikel; Labaka Intxauspe, Gorka; Agirre Bengoa, Eneko

Date

2018

Authors

Artetxe Zurutuza, Mikel

Labaka Intxauspe, Gorka

Agirre Bengoa, Eneko

Proceedings of the AAAI Conference on Artificial Intelligence 32(1) : 5012-5019 (2018)

URI

http://hdl.handle.net/10810/70464

Abstract

Using a dictionary to map independently trained word embeddings to a shared space has been shown to be an effective approach to learning bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening, and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and to design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open-source project.
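
The orthogonal transformation named in the abstract as the core step can be illustrated with a minimal sketch. It assumes the standard closed-form solution of the orthogonal Procrustes problem (an SVD of the cross-covariance of the dictionary-aligned embeddings) preceded by length normalization. The function names `length_normalize` and `orthogonal_map` and the toy data are hypothetical. They are not taken from the released software, and the whitening, re-weighting, de-whitening, and dimensionality reduction steps are omitted.

```python
import numpy as np

def length_normalize(m):
    """Scale each row (one word vector) to unit Euclidean length."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def orthogonal_map(x, z):
    """Return the orthogonal W minimizing ||x @ W - z||_F.

    x and z hold source and target embeddings for dictionary pairs,
    one pair per row. With x.T @ z = U S V^T, the solution is W = U V^T.
    """
    u, _, vt = np.linalg.svd(x.T @ z)
    return u @ vt

# Toy data (an assumption for illustration): the target embeddings are an
# exact rotation of the source ones, so the mapping should recover it.
rng = np.random.default_rng(0)
x = length_normalize(rng.standard_normal((50, 5)))
q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # hidden orthogonal map
z = x @ q

w = orthogonal_map(x, z)
```

Because `w` is orthogonal, it preserves vector lengths and dot products within the source language, so only the alignment between the two spaces changes.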

Collections

Communications