Analysis of the Complexity of the Automatic Pipeline Generation Problem
2018 IEEE Congress on Evolutionary Computation (CEC) (pp. 1-8). IEEE.
Abstract
Strategies to automate the selection of Machine Learning algorithms and their parameters have gained popularity in recent years, to the point that the term Automated Machine Learning has been coined. The most general version of this problem is pipeline optimization, which seeks an optimal combination of preprocessors and classifiers, along with their respective parameters. In this paper, we address the pipeline generation problem from a broader perspective: understanding the complexity of the problem before proposing a solution, a step we consider critical. The main contribution of this work is an analysis of the characteristics of the fitness landscape. Furthermore, a recently introduced tool for pipeline generation is used to investigate how an automatic method behaves in the previously studied landscape. Results show the high complexity of the pipeline optimization problem, as it can contain several dispersed optima and suffers from a severe lack of generality. Results also suggest that, depending on the dimensions of the search, the model quality target, and the data being modeled, basic search methods can produce results that match the user's expectations.