Unsupervised feature selection for interpretable classification in behavioral assessment of children

SCIAVICCO, Guido
Last author
2017

Abstract

In this paper, we consider a data set obtained by administering the Behavior Assessment System for Children test to 157 subjects, and we approach the problem of clustering and classifying the subjects in an interpretable fashion. Because the Behavior Assessment System for Children test originally consists of 149 questions (152 in the particular version used for this experiment), we first propose a feature selection wrapper model composed of a multi‐objective evolutionary algorithm, the iterative clustering method expectation–maximization, and the classifier C4.5. The wrapper performs unsupervised feature selection for the classification of the data with two objectives: maximizing the likelihood of the clustering model and maximizing the accuracy of the resulting classifier. We propose a methodology that integrates feature selection for unsupervised classification, model evaluation, decision‐making (choosing the most satisfactory model through an a posteriori process in a multi‐objective context), and testing. The data set resulting from this process, in which each instance is labeled with its class, is then used for supervised learning via both C4.5 and a novel evolutionary computation‐based fuzzy classifier to obtain interpretable rules. We discuss and compare the behavior of two evolutionary algorithms, ENORA (Evolutionary NOn‐dominated Radial slots based Algorithm) and NSGA‐II (Non‐dominated Sorting Genetic Algorithm II), at different levels: as search strategies for feature selection, as search strategies for fuzzy classification, and in terms of quality of the results. ENORA yields better results in the feature selection phase (its selection shows higher accuracy under C4.5 after cross‐validation) and again in the fuzzy classification phase, in terms of both hypervolume evolution and interpretability of the results. Throughout the entire process, the solutions are validated by the psychologists who collected the data.
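
The a posteriori decision step mentioned in the abstract starts from a set of non-dominated models with respect to the two objectives (clustering likelihood and classifier accuracy). The following is a minimal sketch of that non-dominance filter only, not the authors' ENORA or NSGA-II implementation. The `Candidate` type, the function names, and all scores are hypothetical and made up for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical feature subset with its two objective values."""
    features: frozenset[int]  # indices of the selected questionnaire items
    log_likelihood: float     # objective 1: clustering likelihood (maximize)
    accuracy: float           # objective 2: classifier accuracy (maximize)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.log_likelihood >= b.log_likelihood and a.accuracy >= b.accuracy
    better = a.log_likelihood > b.log_likelihood or a.accuracy > b.accuracy
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only (not results from the paper).
candidates = [
    Candidate(frozenset({1, 5, 9}), -120.0, 0.91),
    Candidate(frozenset({2, 5}), -110.0, 0.88),
    Candidate(frozenset({3, 7, 8}), -130.0, 0.85),  # dominated by the first one
    Candidate(frozenset({4}), -105.0, 0.80),
]
front = pareto_front(candidates)
```

A decision maker (here, presumably the psychologists together with the analysts) would then pick one model from `front`, trading likelihood against accuracy; the filter itself does not express any preference between the surviving candidates.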
Jiménez, Fernando; Jódar, Rosalia; Martín, Maria del Pilar; Sánchez, Gracia; Sciavicco, Guido
Files in this item:
File: expsys2017.pdf
Access: repository administrators only
Description: Publisher's full text
Type: Full text (publisher's version)
License: NOT PUBLIC - Private/restricted access
Size: 749.31 kB
Format: Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2374146
Citations
  • PMC: ND
  • Scopus: 8
  • ISI: 11