An investigation of imputation methods for discrete databases and multi-variate time series

Garciarena Hualde, Unai

dc.contributor.advisor	Santana Hermida, Roberto	es
dc.contributor.author	Garciarena Hualde, Unai
dc.date.accessioned	2016-10-03T09:46:22Z
dc.date.available	2016-10-03T09:46:22Z
dc.date.issued	2016-10-03
dc.identifier.uri	http://hdl.handle.net/10810/19052
dc.description.abstract	In real-life datasets, parts of the data are often unavailable. This problem can arise for various reasons and therefore follows different patterns. In the literature, it is known as Missing Data. The issue can be handled in various ways, from discarding incomplete observations, to guessing what the missing values originally were, or simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interaction exists between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete means that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The results showed that the missing data pattern strongly influences the outcome produced by a classifier. Also, in some cases, the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time restricting the previous problem to a special kind of dataset, the multivariate Time Series. We designed new imputation techniques for this particular domain and combined them with some of the established strategies tested in the previous chapter of this thesis. The time series were also subjected to processes involving missing data and imputation, in order to finally propose an overall better imputation method. In the final chapter of this work, a real-world example is presented, describing a water quality prediction problem. 
The databases that characterized this problem had their own original latent values, providing a real-world benchmark to test the algorithms developed in this thesis.	es
dc.language.iso	eng	es
dc.relation.ispartofseries	2016;3
dc.rights	info:eu-repo/semantics/openAccess	es
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	missing data	es
dc.subject	imputation method	es
dc.subject	time series	es
dc.subject	supervised classification	es
dc.title	An investigation of imputation methods for discrete databases and multi-variate time series	es
dc.type	info:eu-repo/semantics/masterThesis	es
dc.rights.holder	Attribution-NonCommercial-NoDerivatives 4.0 International	*