Daniel Parra Rodríguez presents his Master’s Thesis entitled “Optimal variable selection by means of Evolutionary Computation for classification algorithms” within the research lines of the GenObIA project.

The full title of the thesis, developed within the research lines of the GenObIA project, is “Optimal variable selection by Evolutionary Computation for classification algorithms. Application to the identification of individuals at risk of developing overweight”. In this work, a feature selection system for classifiers based on Evolutionary Computation has been designed. In particular, several configurations of a genetic algorithm have been investigated, and a specific structure for the selection process, which yields promising results, is proposed. The algorithm's task is to select the most suitable set of variables, or features, for a classification algorithm. It uses a direct binary encoding in which each individual marks with a 1 the variables that the classifier will use, which allows individuals to be evaluated efficiently. To identify the best set of variables, each individual is evaluated by the accuracy (correct predictions divided by the total number of cases) that the target classifier obtains on a reduced data set.
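A minimal Python sketch of this scheme, under stated assumptions: each individual is a list of 0/1 bits (one per variable), its fitness is the accuracy the classifier obtains using only the columns marked 1, and the population evolves with binary tournament selection, one-point crossover, bit-flip mutation and elitism. These operators, the parameter values and the helper names (`evolve`, `fitness`, `select_columns`, `nearest_centroid`) are illustrative assumptions, not the configurations studied in the thesis. The data passed in plays the role of the reduced data set, split into a training part and a held-out part, which is also an assumption about how accuracy is measured. A trivial nearest-centroid classifier stands in for Gradient Boosting or a decision tree so that the example runs with the standard library only.

```python
import random
from typing import Callable, List, Tuple

Row = List[float]
# Any classifier with this shape can be plugged in: (train_X, train_y, test_X) -> predictions.
Classifier = Callable[[List[Row], List[int], List[Row]], List[int]]
Data = Tuple[List[Row], List[int], List[Row], List[int]]


def nearest_centroid(train_X: List[Row], train_y: List[int], test_X: List[Row]) -> List[int]:
    """Stand-in classifier (assumption): predict the class whose mean vector is closest."""
    sums: dict = {}
    counts: dict = {}
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def sq_dist(a: Row, b: Row) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]


def select_columns(X: List[Row], mask: List[int]) -> List[Row]:
    """Decode an individual: keep only the variables whose bit is 1."""
    return [[v for v, bit in zip(row, mask) if bit] for row in X]


def fitness(mask: List[int], data: Data, clf: Classifier) -> float:
    """Accuracy = correct predictions / total cases on the held-out part of the data."""
    if not any(mask):
        return 0.0  # an individual that selects no variable cannot classify
    train_X, train_y, test_X, test_y = data
    preds = clf(select_columns(train_X, mask), train_y, select_columns(test_X, mask))
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)


def evolve(data: Data, n_features: int, clf: Classifier, pop_size: int = 20,
           generations: int = 30, p_cross: float = 0.9, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    p_mut = 1.0 / n_features  # expected one bit flip per child
    cache: dict = {}

    def score(ind: List[int]) -> float:
        key = tuple(ind)
        if key not in cache:  # memoise: identical bit strings are evaluated only once
            cache[key] = fitness(ind, data, clf)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> List[int]:
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])  # bit-flip mutation
        pop = children
        best = max(pop, key=score)
    return best, score(best)


# Synthetic example: variable 0 separates the two classes, variables 1-5 are noise.
data_rng = random.Random(42)


def make(n: int) -> Tuple[List[Row], List[int]]:
    X, y = [], []
    for _ in range(n):
        label = data_rng.randint(0, 1)
        X.append([3.0 * label + data_rng.gauss(0, 0.5)] + [data_rng.gauss(0, 5) for _ in range(5)])
        y.append(label)
    return X, y


train_X, train_y = make(60)
test_X, test_y = make(40)
data = (train_X, train_y, test_X, test_y)
best, acc = evolve(data, n_features=6, clf=nearest_centroid)
print("selected variables:", [i for i, bit in enumerate(best) if bit], "accuracy:", acc)
```

Because fitness depends only on the bit string, memoising it (the `cache` dictionary) avoids retraining the classifier when the same individual reappears; this is one simple way in which a direct binary coding keeps evaluation cheap. Swapping in another classifier only requires a function with the same `(train_X, train_y, test_X)` signature.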

This system has been applied to the GenObIA-CM project data, although its design allows it to be applied to any other problem whose input data follow the standard format for classification problems. GenObIA is a project carried out by a consortium of 20 institutions, hospitals and companies, funded by the European Social Fund and the Community of Madrid. Its goal is to use artificial intelligence to design predictive algorithms that identify people at risk of becoming overweight or obese and of developing the associated pathologies. This work used a database of 1,179 individuals, provided by the consortium, that records information on lifestyle habits and adherence to the Mediterranean diet.

The work focuses on selecting the variables that provide the most information for correctly classifying users into two groups: those whose data indicate that they will not become overweight, and those with a higher probability of doing so. This required a thorough understanding of both the data and the tools used for the selection. The evolutionary selection algorithm was successfully applied to Gradient Boosting and decision tree classifiers, increasing accuracy by up to 8% and reaching values of 75%. The design also allows it to be applied to future data from the consortium, which will include genetic information on each individual as well as a larger number of cases.
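The input format described as usual for classification problems is assumed here to be a table with one row per individual, one column per variable and one column holding the class label. The following Python sketch loads such a table and maps a binary individual back to the variable names it selects; the column names, the toy values, the `label` column and the helpers `load_table` and `selected_variables` are all hypothetical. It also illustrates why richer future data fits the same method: a new column, such as a genetic variable, only adds one bit to every individual, and more cases only add rows.

```python
import csv
import io
from typing import List, Tuple


def load_table(text: str, label_col: str) -> Tuple[List[str], List[List[float]], List[int]]:
    """Read a CSV with one row per individual, one column per variable and a class-label column."""
    reader = csv.DictReader(io.StringIO(text))
    names = [c for c in reader.fieldnames if c != label_col]
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in names])
        y.append(int(row[label_col]))
    return names, X, y


def selected_variables(names: List[str], mask: List[int]) -> List[str]:
    """Translate a binary individual back into the names of the variables it selects."""
    if len(mask) != len(names):
        raise ValueError("the individual must have one bit per variable")
    return [n for n, bit in zip(names, mask) if bit == 1]


# Hypothetical toy table: column names and values are made up for illustration.
CSV_TEXT = """habit_1,habit_2,diet_score,label
1,0,7.5,0
0,1,3.0,1
1,1,5.5,0
"""

names, X, y = load_table(CSV_TEXT, label_col="label")
print(selected_variables(names, [1, 0, 1]))  # prints ['habit_1', 'diet_score']

# Adding a (hypothetical) genetic variable only adds one bit to every individual.
CSV_WITH_GENE = """habit_1,habit_2,diet_score,gene_1,label
1,0,7.5,0,0
0,1,3.0,2,1
1,1,5.5,1,0
"""

names_ext, X_ext, y_ext = load_table(CSV_WITH_GENE, label_col="label")
print(len(names), "->", len(names_ext), "bits per individual")  # prints 3 -> 4 bits per individual
```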