SFERA Archivio dei prodotti della Ricerca dell'Università di Ferrara

We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. Such approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurement. Results show that articulatory features generated by this approach can correlate up to 0.59 Pearson's product-moment correlation with measured articulatory features.

Improving Generalization of Vocal Tract Feature Reconstruction: From Augmented Acoustic Inversion to Articulatory Feature Reconstruction without Articulatory Data

Turrisi R.^{Primo

Formal Analysis};

2018

Abstract

We address the problem of reconstructing articulatory movements, given audio and/or phonetic labels. The scarce availability of multi-speaker articulatory data makes it difficult to learn a reconstruction that generalizes to new speakers and across datasets. We first consider the XRMB dataset where audio, articulatory measurements and phonetic transcriptions are available. We show that phonetic labels, used as input to deep recurrent neural networks that reconstruct articulatory features, are in general more helpful than acoustic features in both matched and mismatched training-testing conditions. In a second experiment, we test a novel approach that attempts to build articulatory features from prior articulatory information extracted from phonetic labels. Such approach recovers vocal tract movements directly from an acoustic-only dataset without using any articulatory measurement. Results show that articulatory features generated by this approach can correlate up to 0.59 Pearson's product-moment correlation with measured articulatory features.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2018
			
	ISBN
	
				9781538643341
			
	Parole chiave
	
				acoustic inversion
Articulatory features
deep learning
tract variables
XRMB
			
	Appare nelle tipologie:
	
				04.2 Contributi in atti di convegno (in Volume)

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in SFERA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11392/2573913

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

1

1

social impact