SFERA: Research Products Archive of the University of Ferrara

OCR Correction for Corpus-assisted Discourse Studies: a Case Study of Old Newspapers

Dario Del Fante (first author)

2021

Abstract

Using OCR software to convert printed characters into digital text is fundamental to diachronic approaches within Corpus-Assisted Discourse Studies. However, OCR output is not fully accurate, and the resulting error rate may compromise qualitative analysis. This paper proposes a mixed qualitative-quantitative approach to OCR error detection and correction, with the aim of developing a methodology for compiling historical corpora. We present a case study of early 20th-century newspapers, compiled for the linguistic analysis of metaphors representing immigrants.

Year of publication: 2021
ISBN: 9788894253559
Keywords: corpus-assisted discourse studies, OCR detection, OCR correction
Publication type: 04.3 Conference abstract in journal/volume

Files in this record:

File	Description	Type	License	Size	Format
AIUCD 2021 - Del Fante.pdf (open access)	Publisher's version	Full text (publisher's version)	Creative Commons	660.13 kB	Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2500972
