dc.contributor.author: Lastra Díaz, Juan José
dc.contributor.author: Goikoetxea Salutregi, Josu
dc.contributor.author: Taieb, Mohamed Ali Hadj
dc.contributor.author: García Serrano, Ana
dc.contributor.author: Ben Aouicha, Mohamed
dc.contributor.author: Agirre Bengoa, Eneko
dc.date.accessioned: 2024-11-12T16:43:10Z
dc.date.available: 2024-11-12T16:43:10Z
dc.date.issued: 2019-10
dc.identifier.citation: Engineering Applications of Artificial Intelligence 85 : 645-665 (2019) [es_ES]
dc.identifier.issn: 1873-6769
dc.identifier.uri: http://hdl.handle.net/10810/70430
dc.description.abstract: Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, methods for estimating the degree of similarity and relatedness between words and concepts have been a very active line of research in artificial intelligence, information retrieval and natural language processing, among other fields. The main approaches proposed in the literature fall into two large families: (1) Ontology-based semantic similarity Measures (OM) and (2) distributional measures, whose most recent and successful methods are based on Word Embedding (WE) models. However, the lack of a deep analysis of both families of methods slows down the advance of this line of research and its applications. This work introduces the largest reproducible and detailed experimental survey of OM measures and WE models reported in the literature, based on the evaluation of both families of methods on the same software platform, with the aim of elucidating the state of the problem. We show that WE models which combine distributional and ontology-based information obtain the best results. In addition, we show for the first time that a simple average of the two best-performing WE models with other ontology-based measures or WE models is able to improve the state of the art by a large margin. We also provide a very detailed reproducibility protocol, together with a collection of software tools and datasets as supplementary material, to allow the exact replication of our results. [es_ES]
dc.description.sponsorship: This work has been partially supported by the Spanish Ministry of Economy and Competitiveness VEMODALEN project (TIN2015-71785-R), the UPV/EHU (excellence research group) and the Spanish Research Agency LIHLITH project (PCIN-2017-118/AEI) in the framework of EU ERA-Net CHIST-ERA. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: ontology-based semantic similarity measures [es_ES]
dc.subject: word embedding models [es_ES]
dc.subject: information content models [es_ES]
dc.subject: WordNet experimental survey [es_ES]
dc.subject: HESML [es_ES]
dc.title: A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.holder: © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/) [es_ES]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0952197619301745 [es_ES]
dc.identifier.doi: 10.1016/j.engappai.2019.07.010
dc.departamentoes: Lenguajes y sistemas informáticos [es_ES]
dc.departamentoeu: Hizkuntza eta sistema informatikoak [es_ES]



Except where otherwise noted, this item's license is described as © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)