Principled Paraphrase Generation with Parallel Corpora

Ormazabal Oregi, Aitor; Artetxe Zurutuza, Mikel; Soroa Echave, Aitor; Labaka Intxauspe, Gorka; Agirre Bengoa, Eneko

dc.contributor.author	Ormazabal Oregi, Aitor
dc.contributor.author	Artetxe Zurutuza, Mikel
dc.contributor.author	Soroa Echave, Aitor
dc.contributor.author	Labaka Intxauspe, Gorka
dc.contributor.author	Agirre Bengoa, Eneko
dc.date.accessioned	2024-10-15T18:00:03Z
dc.date.available	2024-10-15T18:00:03Z
dc.date.issued	2022
dc.identifier.citation	Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics 1 : 1621-1638 (2022)	es_ES
dc.identifier.uri	http://hdl.handle.net/10810/69967
dc.description.abstract	Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision. In this paper, we formalize the implicit similarity function induced by this approach, and show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation. Based on these insights, we design an alternative similarity metric that mitigates this issue by requiring the entire translation distribution to match, and implement a relaxation of it through the Information Bottleneck method. Our approach incorporates an adversarial term into MT training in order to learn representations that encode as much information about the reference translation as possible, while keeping as little information about the input as possible. Paraphrases can be generated by decoding back to the source from this representation, without having to generate pivot translations. In addition to being more principled and efficient than round-trip MT, our approach offers an adjustable parameter to control the fidelity-diversity trade-off, and obtains better results in our experiments.	es_ES
dc.description.sponsorship	Aitor Ormazabal, Gorka Labaka, Aitor Soroa and Eneko Agirre were supported by the Basque Government (excellence research group IT1343-19 and DeepText project KK-2020/00088) and the Spanish MINECO (project DOMINO PGC2018-102041-B-I00 MCIU/AEI/FEDER, UE). Aitor Ormazabal was supported by a doctoral grant from the Spanish MECD. Computing infrastructure funded by UPV/EHU and Gipuzkoako Foru Aldundia.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	ACL	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.title	Principled Paraphrase Generation with Parallel Corpora	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.holder	(c)2022 The Association for Computational Linguistics, licensed on a Creative Commons Attribution 4.0 International License.	es_ES
dc.relation.publisherversion	https://doi.org/10.18653/v1/2022.acl-long.114	es_ES
dc.identifier.doi	10.18653/v1/2022.acl-long.114
dc.departamentoes	Lenguajes y sistemas informáticos	es_ES
dc.departamentoeu	Hizkuntza eta sistema informatikoak	es_ES
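
Illustrative note on the abstract: the weakness it describes, non-paraphrase pairs that share a single ambiguous translation, can be shown with a toy example. The sketch below is not the authors' implementation. It assumes that the "implicit similarity function" of round-trip MT can be read as the probability of reaching a candidate sentence by translating to a pivot and back, summed over pivots. All sentence labels (`x_a`, `x_b`, `y_shared`, ...) and probabilities are hypothetical. Total variation distance stands in here for "comparing the entire translation distribution"; it is chosen for simplicity and is not the metric defined in the paper.

```python
# Toy forward model P(y | x) and backward model P(x | y).
# All labels and probabilities are hypothetical, for illustration only.
FORWARD = {
    "x_a": {"y_shared": 0.5, "y_a": 0.5},
    "x_b": {"y_shared": 0.5, "y_b": 0.5},
}
BACKWARD = {
    "y_shared": {"x_a": 0.5, "x_b": 0.5},
    "y_a": {"x_a": 1.0},
    "y_b": {"x_b": 1.0},
}


def round_trip_score(source, candidate):
    """Sum over pivots y of P(y | source) * P(candidate | y) (assumed reading)."""
    return sum(
        p_y * BACKWARD[y].get(candidate, 0.0)
        for y, p_y in FORWARD[source].items()
    )


def total_variation(source, candidate):
    """Distance between the two full translation distributions (0 means identical)."""
    p, q = FORWARD[source], FORWARD[candidate]
    pivots = set(p) | set(q)
    return 0.5 * sum(abs(p.get(y, 0.0) - q.get(y, 0.0)) for y in pivots)


print(round_trip_score("x_a", "x_b"))  # 0.25
print(total_variation("x_a", "x_b"))   # 0.5
```

In this toy setup `x_a` and `x_b` receive a nonzero round-trip score (0.25) purely because of the shared pivot `y_shared`, while their translation distributions are clearly different (distance 0.5). A criterion that requires the whole distribution to match would therefore not treat them as paraphrases.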

Files in this item

Name: license_rdf
Size: 914 bytes
Format: application/rdf+xml

Name: 2022.acl-long.114.pdf
Size: 497.9 Kb
Format: PDF
Description: Paper

This item appears in the following Collection(s)

Comunicaciones