A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art

Lastra Díaz, Juan José; Goikoetxea Salutregi, Josu; Taieb, Mohamed Ali Hadj; García Serrano, Ana; Ben Aouicha, Mohamed; Agirre Bengoa, Eneko

dc.contributor.author	Lastra Díaz, Juan José
dc.contributor.author	Goikoetxea Salutregi, Josu
dc.contributor.author	Taieb, Mohamed Ali Hadj
dc.contributor.author	García Serrano, Ana
dc.contributor.author	Ben Aouicha, Mohamed
dc.contributor.author	Agirre Bengoa, Eneko
dc.date.accessioned	2024-11-12T16:43:10Z
dc.date.available	2024-11-12T16:43:10Z
dc.date.issued	2019-10
dc.identifier.citation	Engineering Applications of Artificial Intelligence 85 : 645-665 (2019)	es_ES
dc.identifier.issn	1873-6769
dc.identifier.uri	http://hdl.handle.net/10810/70430
dc.description.abstract	Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the development of methods for estimating the degree of similarity and relatedness between words and concepts has been a very active line of research in artificial intelligence, information retrieval and natural language processing, among other fields. The main approaches proposed in the literature can be grouped into two large families: (1) Ontology-based semantic similarity Measures (OM) and (2) distributional measures, whose most recent and successful methods are based on Word Embedding (WE) models. However, the lack of an in-depth analysis of both families of methods slows down progress in this line of research and its applications. This work introduces the largest reproducible and detailed experimental survey of OM measures and WE models reported in the literature, based on the evaluation of both families of methods on the same software platform, with the aim of elucidating the state of the problem. We show that WE models which combine distributional and ontology-based information get the best results. In addition, we show for the first time that a simple average of two best-performing WE models with other ontology-based measures or WE models improves the state of the art by a large margin. Finally, we provide a very detailed reproducibility protocol, together with a collection of software tools and datasets as supplementary material, to allow the exact replication of our results.	es_ES
dc.description.sponsorship	This work has been partially supported by the Spanish Ministry of Economy and Competitiveness VEMODALEN project (TIN2015-71785-R), the UPV/EHU (excellence research group) and the Spanish Research Agency LIHLITH project (PCIN-2017-118/AEI) in the framework of EU ERA-Net CHIST-ERA.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	ontology-based semantic similarity measures	es_ES
dc.subject	word embedding models	es_ES
dc.subject	information content models	es_ES
dc.subject	WordNet experimental survey	es_ES
dc.subject	HESML	es_ES
dc.title	A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)	es_ES
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S0952197619301745	es_ES
dc.identifier.doi	10.1016/j.engappai.2019.07.010
dc.departamentoes	Lenguajes y sistemas informáticos	es_ES
dc.departamentoeu	Hizkuntza eta sistema informatikoak	es_ES
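
The abstract's central finding is that a simple average of two similarity methods can outperform either one alone. As a minimal illustration only, the Python sketch below averages two per-pair similarity score lists and scores each list against human judgements with Spearman rank correlation, a common evaluation metric for word similarity benchmarks. The min-max rescaling step, all function names and the four-pair dataset are hypothetical assumptions made for this example: this record does not state how the authors normalised scores or which metrics they report, and the numbers below are not results from the paper.

```python
from statistics import mean
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def min_max(scores: Sequence[float]) -> list[float]:
    """Rescale scores to [0, 1]; this normalisation is an assumption."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)  # constant measure carries no ordering
    return [(s - lo) / (hi - lo) for s in scores]


def linear_combination(*score_lists: Sequence[float]) -> list[float]:
    """Unweighted pair-by-pair average of several rescaled measures."""
    scaled = [min_max(scores) for scores in score_lists]
    return [mean(pair_scores) for pair_scores in zip(*scaled)]


# Hypothetical toy benchmark of four word pairs (not data from the paper).
HUMAN = [1.0, 2.0, 3.0, 4.0]           # gold similarity judgements
MEASURE_A = [0.1, 0.3, 0.2, 0.9]       # e.g. WE cosine scores
MEASURE_B = [10.0, 20.0, 35.0, 30.0]   # e.g. an OM measure on another scale

if __name__ == "__main__":
    combined = linear_combination(MEASURE_A, MEASURE_B)
    print(f"A:       {spearman(MEASURE_A, HUMAN):.3f}")  # prints 0.800
    print(f"B:       {spearman(MEASURE_B, HUMAN):.3f}")  # prints 0.800
    print(f"average: {spearman(combined, HUMAN):.3f}")   # prints 1.000
```

On this toy data each measure misorders a different pair, so their average recovers the human ordering. That is the intuition behind a linear combination, not evidence for the paper's reported results.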

Files in this item

Name: eaai_2019.pdf
Size: 2.285 MB
Format: PDF

This item appears in the following Collection(s)

Artículos (Articles)
