Statistical model for the reproducibility in ranking based feature selection

Urkullu Villanueva, Ari; Pérez Martínez, Aritz; Calvo Molinos, Borja

Technical_report_17-1.pdf (2.753Mb)

Date

2017-10-25

URI

http://hdl.handle.net/10810/23240

Abstract

Recently, concerns about the reproducibility of scientific studies have been growing among the scientific community, mainly due to the large number of irreproducible results. This has reached such an extent that the perception of a reproducibility crisis has spread through the scientific community (Baker, 2016). Among other causes, researchers point to “insufficient replication in the lab, poor oversight or low statistical power” as the reasons behind this crisis. Indeed, the American Statistical Association (ASA) warned almost two years ago that the problem derived from an inappropriate use of some statistical tools (Wasserstein & Lazar, 2016). Motivated to work on this reproducibility problem, in this paper we present a framework for modeling reproducibility in ranking-based feature subset selection problems. In that context, among n features that could be relevant for a given objective, an attempt is made to choose the best subset of a predetermined size i ∈ {1,..., n} through a method capable of ranking the features. In this situation, we analyze the reproducibility of a given method, defined as the consistency of the selection across different repetitions of the same experiment.
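
To make the setting in the abstract concrete, the following is a minimal Python sketch of ranking-based selection repeated over several simulated experiments. It is not the statistical model proposed in the report. The scoring scheme (a hypothetical true effect plus Gaussian noise), the function names, and the overlap measure used as a stand-in for "consistency of the selection" are all assumptions made for illustration.

```python
import random
from itertools import combinations


def top_i(scores, i):
    """Return the set of indices of the i highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:i])


def simulate_selection(true_effects, i, noise_sd, rng):
    """One repetition of the experiment (assumed setup, not from the report):
    each feature's score is its true effect plus Gaussian noise, and the
    i top-ranked features are selected."""
    scores = [effect + rng.gauss(0.0, noise_sd) for effect in true_effects]
    return top_i(scores, i)


def mean_pairwise_overlap(selections, i):
    """Illustrative consistency measure: the average fraction of features
    shared by two selections, taken over all pairs of repetitions."""
    pairs = list(combinations(selections, 2))
    return sum(len(a & b) for a, b in pairs) / (i * len(pairs))


if __name__ == "__main__":
    rng = random.Random(0)
    n, i, repetitions = 50, 5, 20
    # Hypothetical ground truth: the first i features are relevant.
    true_effects = [1.0] * i + [0.0] * (n - i)

    for noise_sd in (0.1, 1.0, 5.0):
        selections = [
            simulate_selection(true_effects, i, noise_sd, rng)
            for _ in range(repetitions)
        ]
        overlap = mean_pairwise_overlap(selections, i)
        print(f"noise_sd={noise_sd}: mean pairwise overlap = {overlap:.3f}")
```

Under these assumptions, an overlap of 1 means every repetition selects the same subset, and lower values mean the selection changes more from one repetition to the next.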

Collections

Technical Reports and Working Papers