Analysis of imputation methods for data gaps in high resolution smart meters in buildings
Missing data is one of the most common problems with raw data in data analysis. Missingness can be ignored if it is not expected to have a significant impact on the analysis. Otherwise, imputation methods are applied, because the quality of machine learning models trained on data with missing values may decrease drastically. This thesis aims to determine the accuracy of single and multiple imputation methods on energy data and to consider the impact that weather variables have on that accuracy. To test the methods, a case study was conducted on four separate smart energy meter data sets from residential buildings located in Tartu, Estonia; each data set also comprised weather variables collected independently by the University of Tartu. Artificial missing values were introduced into the clean data so that the output of each imputation technique could be compared with the original complete data set. The results demonstrated higher accuracy for multiple imputation methods than for univariate analysis, and showed the importance of highly correlated variables for predicting missing points. We conclude that increasing the number of variables used to predict the missing values is likely to increase the accuracy of the method as well. Although multiple imputation appears to have the best accuracy, the challenges related to concurrent missing values across all variables coming from the same sensor should be considered.
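The evaluation procedure described above (remove known values from a complete series, impute them, and score the result against the originals) can be sketched as follows. This is a minimal illustration, not the thesis's actual pipeline. The data are synthetic, the function names are hypothetical, and the two imputers are simple stand-ins: mean imputation for a univariate method, and least-squares regression on a single correlated weather variable for a method that uses additional variables. It is not a full multiple imputation procedure.

```python
import math
import random


def make_synthetic(n=200, seed=0):
    """Hypothetical data: energy use falls linearly with temperature, plus noise."""
    rng = random.Random(seed)
    temperature = [rng.uniform(-10.0, 25.0) for _ in range(n)]
    energy = [5.0 - 0.15 * t + rng.gauss(0.0, 0.2) for t in temperature]
    return temperature, energy


def mask_at_random(values, fraction, seed=1):
    """Copy the series, set a random fraction of entries to None, return the masked indices too."""
    rng = random.Random(seed)
    masked = sorted(rng.sample(range(len(values)), int(fraction * len(values))))
    gappy = list(values)
    for i in masked:
        gappy[i] = None
    return gappy, masked


def impute_mean(gappy):
    """Univariate stand-in: fill every gap with the mean of the observed values."""
    observed = [v for v in gappy if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in gappy]


def impute_regression(gappy, covariate):
    """Stand-in using a second variable: fit y = a + b*x on observed pairs, predict the gaps."""
    pairs = [(x, y) for x, y in zip(covariate, gappy) if y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(covariate, gappy)]


def rmse(truth, imputed, indices):
    """Root-mean-square error, evaluated only at the artificially removed points."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in indices) / len(indices))


temperature, energy = make_synthetic()
gappy, masked = mask_at_random(energy, fraction=0.2)
rmse_mean = rmse(energy, impute_mean(gappy), masked)
rmse_reg = rmse(energy, impute_regression(gappy, temperature), masked)
print(f"mean imputation RMSE:       {rmse_mean:.3f}")
print(f"regression imputation RMSE: {rmse_reg:.3f}")
```

Because the synthetic energy series is constructed to depend strongly on temperature, the regression-based fill recovers the removed points more closely than the mean fill. This only illustrates the role of a highly correlated variable and says nothing about the magnitudes found in the case study.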