Thresholding Procedure via Barzilai-Borwein Rules for the Steplength Selection in Stochastic Gradient Methods
V. Ruggiero; I. Trombini
2022
Abstract
A crucial aspect in designing a learning algorithm is the selection of the hyperparameters (parameters that are not trained during the learning process). In particular, the effectiveness of stochastic gradient methods strongly depends on the steplength selection. In recent papers [9, 10], Franchini et al. propose to adopt an adaptive selection rule borrowed from the full-gradient scheme known as the Limited Memory Steepest Descent method [8] and suitably tailored to the stochastic framework. This strategy is based on the computation of the eigenvalues (Ritz-like values) of a suitable matrix obtained from the gradients of the most recent iterations, and it provides an estimate of the local Lipschitz constant of the gradient of the objective function at the current iterate, without resorting to line-search techniques. The possible increase of the size of the sub-sample used to compute the stochastic gradient is controlled by an augmented inner product test [3]. The whole procedure makes the tuning of the parameters less expensive than the selection of a fixed steplength, although it still depends on the choice of the threshold values bounding the variability of the steplength sequence. The contribution of this paper is to exploit a stochastic version of the Barzilai-Borwein formulas [1] to adaptively select the endpoints of the range within which the Ritz-like values are confined. Numerical experiments on some convex loss functions show that the proposed procedure remains stable and that the tuning of the hyperparameters becomes less expensive.
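For reference, a minimal sketch of the classical Barzilai-Borwein steplength rules cited in [1], stated here in their deterministic, full-gradient form; the paper employs a stochastic variant of these formulas (built from mini-batch gradients) to adaptively set the thresholds, with details given therein:
\[
\alpha_k^{\mathrm{BB1}} = \frac{s_{k-1}^{\top} s_{k-1}}{s_{k-1}^{\top} y_{k-1}},
\qquad
\alpha_k^{\mathrm{BB2}} = \frac{s_{k-1}^{\top} y_{k-1}}{y_{k-1}^{\top} y_{k-1}},
\qquad
s_{k-1} = x_k - x_{k-1},
\quad
y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1}).
\]
Both rules approximate the inverse of the local curvature (and hence of the local Lipschitz constant of the gradient), which is why they are natural candidates for bounding the interval in which the Ritz-like steplengths are allowed to vary.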