Development and validation of prediction models for complex sampling data
Date
2024-03-08
Author
Iparragirre Letamendi, Amaia
Abstract
Complex survey data are becoming increasingly familiar to researchers from different fields, including the social and health sciences. This type of data is obtained by sampling the target population through a complex sampling design. One feature that distinguishes this kind of data from simple random samples is the sampling weights, which indicate the number of population units that each sampled observation represents. However, the role of sampling weights when modelling complex survey data has been widely debated over the years. In this thesis, we analyze the impact that sampling weights have on the development of prediction models for survey data obtained through complex sampling designs. In particular, we have made advances in the estimation of logistic regression model parameters, variable selection, estimation of the discrimination ability, and classification of individuals. The validity of the new design-based proposals has been analyzed by means of extensive simulation studies in which we compare their performance to that of the traditional unweighted techniques. In addition, the design-based proposals have been applied to real complex survey data and implemented in two freely available R packages (wlasso and wROC).