Statistical model for the reproducibility in ranking based feature selection
Date
2017-10-25
Abstract
Recently, concerns about the reproducibility of scientific studies have been growing in the scientific community, mainly because of the large number of irreproducible results. This has reached such an extent that the perception of a reproducibility crisis has spread through the community (Baker, 2016). Among other causes, researchers point to “insufficient replication in the lab, poor oversight or low statistical power” as reasons behind this crisis. Indeed, the American Statistical Association (ASA) warned almost two years ago that the problem derived from an inappropriate use of some statistical tools (Wasserstein & Lazar, 2016). Motivated by this reproducibility problem, in this paper we present a framework for modeling reproducibility in ranking-based feature subset selection problems. In that context, among n features that could be relevant for a given objective, an attempt is made to choose the best subset of a prefixed size i ∈ {1,..., n} by means of a method capable of ranking the features. In this setting, we analyze the reproducibility of a given method, defined as the consistency of its selection across repetitions of the same experiment.
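To make the notion of consistency across repetitions concrete, the following is a minimal illustrative sketch and not the statistical model proposed in the paper. It assumes a hypothetical setting in which each repetition observes the true relevance of every feature plus Gaussian noise, ranks the features by the noisy score, keeps the top i, and measures reproducibility as the mean pairwise overlap between the selected subsets. The function names (`top_i`, `mean_pairwise_overlap`, `simulate`), the noise model, and the overlap measure are all assumptions introduced for illustration.

```python
import itertools
import random


def top_i(scores, i):
    """Return the indices of the i highest-scoring features as a frozenset."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:i])


def mean_pairwise_overlap(selections, i):
    """Average fraction of shared features over all pairs of selections.

    Assumes at least two selections, each of size i.
    """
    pairs = list(itertools.combinations(selections, 2))
    return sum(len(a & b) for a, b in pairs) / (i * len(pairs))


def simulate(true_relevance, i, repetitions, noise_sd, seed=0):
    """Repeat a noisy ranking experiment and report selection consistency.

    Hypothetical setup: each repetition scores feature j as its true
    relevance plus independent Gaussian noise with standard deviation
    noise_sd, then selects the top-i features.
    """
    rng = random.Random(seed)
    selections = []
    for _ in range(repetitions):
        scores = [r + rng.gauss(0.0, noise_sd) for r in true_relevance]
        selections.append(top_i(scores, i))
    return mean_pairwise_overlap(selections, i)


if __name__ == "__main__":
    relevance = [3.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # illustrative values only
    for sd in (0.0, 0.5, 2.0):
        print(sd, simulate(relevance, i=2, repetitions=50, noise_sd=sd))
```

With no noise, every repetition selects the same subset and the overlap equals 1; as the noise grows relative to the gaps between feature relevances, the selected subsets diverge and the overlap falls. Other consistency measures could be substituted for the pairwise overlap without changing the structure of the sketch.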