Decision tree pruning via multi-objective evolutionary computation
Sciavicco, Guido
2017
Abstract
To date, decision trees are among the most widely used classification models. They owe their popularity to their efficiency in both the learning and the classification phases and, above all, to the high interpretability of the learned classifiers. This latter aspect is of primary importance in domains in which understanding and validating the decision process is as important as the accuracy of the prediction. Pruning is a common technique used to reduce the size of decision trees, thereby improving their interpretability and possibly reducing the risk of overfitting. In the present work, we investigate the integration of evolutionary algorithms and decision tree pruning, presenting a decision tree post-pruning strategy based on the well-known multi-objective evolutionary algorithm NSGA-II. Our approach is compared with the default pruning strategies of the decision tree learners C4.5 (J48, on which the proposed method is based) and C5.0. We empirically show that evolutionary algorithms can be profitably applied to the classical problem of decision tree pruning, as the proposed strategy is capable of generating a more varied set of solutions than both J48 and C5.0; moreover, the trees produced by our method tend to be smaller than the best candidates produced by the classical tree learners, while preserving most of their accuracy and sometimes improving it.

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.
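The abstract frames pruning as a two-objective optimization: a candidate pruned tree is scored on both its size and its error, and a multi-objective algorithm such as NSGA-II keeps the set of non-dominated (Pareto-optimal) trade-offs rather than a single tree. The following minimal sketch illustrates only this Pareto-filtering idea on hypothetical (size, error) pairs; it is not the paper's implementation, and the candidate values are invented for illustration.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b: a is no worse in
    both objectives and strictly better in at least one (both minimized).
    Candidates are (tree_size, error_rate) pairs."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep only the non-dominated candidates (the Pareto front)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical prunings of one tree: (number of leaves, validation error)
candidates = [(25, 0.08), (15, 0.09), (15, 0.12),
              (7, 0.14), (7, 0.20), (30, 0.08)]
front = pareto_front(candidates)
# The front contains the size/accuracy trade-offs no other candidate beats
# on both objectives at once: (7, 0.14), (15, 0.09), (25, 0.08).
```

A single-tree pruner like J48's or C5.0's default strategy returns one point from such a space; the multi-objective view instead exposes the whole front, which is what lets the evolutionary approach offer a more varied set of solutions.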