J48S: a Sequence Classification Approach to Speech Analysis based on Decision Trees

Sciavicco, Guido
Last author
2018

Abstract

Sequences play a major role in the extraction of information from data. In business intelligence, for example, they can be used to track the evolution of customer behavior over time or to model relevant relationships. In this paper, we focus on the domain of contact centers, where sequential data typically take the form of oral or written interactions and word sequences often play a major role in text classification, and we investigate the connections between sequential data and text mining techniques. The main contribution of the paper is a new machine learning algorithm, called J48S, that associates semantic knowledge with telephone conversations. The proposed solution is based on the well-known C4.5 decision tree learner and can natively mix static data (numeric or categorical) with sequential data, such as texts, for classification purposes. Evaluated in a real business setting, the algorithm is shown to provide classification performance competitive with classical approaches, while generating highly interpretable models and effectively reducing the data preparation effort.
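
As a rough illustration of what mixing static and sequential data in one tree can mean, the sketch below scores two kinds of candidate split on the same toy data: a numeric threshold and the presence of a word pattern in a text. It is not the authors' implementation. The attribute names, the toy calls, the candidate patterns, and the use of plain information gain as the selection criterion are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels, mask):
    """Entropy reduction obtained by splitting `labels` with a boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def contains_subsequence(words, pattern):
    """True if the tokens of `pattern` occur in `words` in order (gaps allowed)."""
    remaining = iter(words)
    return all(token in remaining for token in pattern)


# Hypothetical toy data: one static attribute and one text per call.
CALLS = [
    {"duration": 120, "text": "i want to cancel my contract today", "label": "churn"},
    {"duration": 300, "text": "please cancel the contract", "label": "churn"},
    {"duration": 150, "text": "i want to renew my contract", "label": "stay"},
    {"duration": 280, "text": "the contract is fine thanks", "label": "stay"},
]


def best_split(instances, threshold=200, patterns=(("cancel", "contract"), ("renew",))):
    """Score one numeric test and several pattern tests; return the best one."""
    labels = [inst["label"] for inst in instances]
    # Static candidate: a threshold test on a numeric attribute.
    candidates = {
        f"duration <= {threshold}": [inst["duration"] <= threshold for inst in instances]
    }
    # Sequential candidates: does the text contain the word pattern?
    for pattern in patterns:
        name = "text contains <" + " ... ".join(pattern) + ">"
        candidates[name] = [
            contains_subsequence(inst["text"].split(), pattern) for inst in instances
        ]
    scored = {name: information_gain(labels, mask) for name, mask in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    print(best_split(CALLS))
```

On this toy data the pattern test separates the two classes while the duration threshold does not, so a tree node would test the text rather than the numeric attribute. A real learner would also have to generate the candidate patterns and thresholds itself, which this sketch leaves out.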
ISBN: 978-3-319-99971-5
Files in this item:

J48S.pdf
Access: archive managers only
Description: Pre-print
Type: Pre-print
License: NOT PUBLIC - Private/restricted access
Size: 287.04 kB
Format: Adobe PDF

J48S A Sequence Classification Approach to Text Analysis Based on Decision Trees.pdf
Access: archive managers only
Description: Publisher's full text
Type: Full text (publisher's version)
License: NOT PUBLIC - Private/restricted access
Size: 599.47 kB
Format: Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2392501
Citations
  • PMC: N/A
  • Scopus: 5
  • ISI: 5