
dc.contributor.author   Goikoetxea Salutregi, Josu
dc.contributor.author   Lastra Díaz, Juan José
dc.contributor.author   Agirre Bengoa, Eneko
dc.contributor.author   Taieb, Mohamed Ali Hadj
dc.contributor.author   García Serrano, Ana
dc.contributor.author   Ben Aouicha, Mohamed
dc.contributor.author   Sánchez, David
dc.date.accessioned   2024-12-03T17:08:13Z
dc.date.available   2024-12-03T17:08:13Z
dc.date.issued   2020-09-30
dc.identifier.citation   Information Systems 96 : (2021) // Article ID 101636   es_ES
dc.identifier.issn   0306-4379
dc.identifier.issn   1873-6076
dc.identifier.uri   http://hdl.handle.net/10810/70754
dc.description.abstract   This work is a companion reproducibility paper for the experiments and results reported in Lastra-Díaz et al. (2019a), based on the evaluation of a companion reproducibility dataset with the HESML V1R4 library and the long-term reproducibility tool ReproZip. Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorization, memory, decision-making and reasoning. For this reason, research on methods for estimating the degree of similarity and relatedness between words and concepts has received considerable attention in artificial intelligence and the cognitive sciences. However, despite this extensive research effort, the field lacks a self-contained, reproducible and extensible collection of benchmarks that could become a de facto standard for large-scale experimentation in this line of research. To bridge this reproducibility gap, this work introduces a set of reproducible experiments on word similarity and relatedness, providing a detailed reproducibility protocol together with a set of software tools and a self-contained reproducibility dataset that allow all experiments and results in our aforementioned work to be reproduced exactly. Our primary work introduces the largest, most detailed and most reproducible experimental survey on word similarity and relatedness reported in the literature, based on the implementation of all evaluated methods on the same software platform. Our reproducible experiments evaluate most of the methods in the families of ontology-based semantic similarity measures and word embedding models. We also detail how to extend our experiments to evaluate experimental setups not considered here. Finally, we provide a corrigendum for a mismatch in the MC28 similarity scores used in our original experiments.   es_ES
dc.description.sponsorship   This work has been partially supported by the Spanish project VEMODALEN (TIN2015-71785-R), the Basque Government (type A IT1343-19), the BBVA BigKnowledge project, and the Spanish Research Agency LIHLITH project (PCIN-2017-118/AEI) in the framework of EU ERA-Net CHIST-ERA.   es_ES
dc.language.iso   eng   es_ES
dc.publisher   Elsevier   es_ES
dc.rights   info:eu-repo/semantics/openAccess   es_ES
dc.rights.uri   http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title   A large reproducible benchmark of ontology-based methods and word embeddings for word similarity   es_ES
dc.type   info:eu-repo/semantics/article   es_ES
dc.rights.holder   © 2020 Elsevier under CC BY-NC-ND license   es_ES
dc.relation.publisherversion   https://doi.org/10.1016/j.is.2020.101636   es_ES
dc.identifier.doi   10.1016/j.is.2020.101636
dc.departamentoes   Lenguajes y sistemas informáticos   es_ES
dc.departamentoeu   Hizkuntza eta sistema informatikoak   es_ES
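The abstract above describes a reproduction workflow built on the ReproZip long-term reproducibility tool. As a rough illustration of how a ReproZip package of this kind is typically unpacked and re-executed, here is a minimal Python sketch driving reprounzip's Docker unpacker. It is not taken from the companion dataset itself: the package filename and target directory are hypothetical placeholders, and the sketch assumes the reprounzip-docker package is installed.

    import subprocess

    # Hypothetical names: the real .rpz package and its declared outputs
    # ship with the companion reproducibility dataset described above.
    PACKAGE = "hesml_v1r4_experiments.rpz"
    TARGET_DIR = "wordsim_experiments"

    # Unpack the traced experiments into a Docker-based environment.
    subprocess.run(["reprounzip", "docker", "setup", PACKAGE, TARGET_DIR], check=True)

    # Re-run the packaged experiments exactly as they were originally traced.
    subprocess.run(["reprounzip", "docker", "run", TARGET_DIR], check=True)

    # Retrieve every output file declared in the package for inspection.
    subprocess.run(["reprounzip", "docker", "download", TARGET_DIR, "--all"], check=True)

Comparing the downloaded output files against the published result tables is then how exact reproduction would be verified.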

