Naming or shaming? Presentations of the self in specialised weblog discourse
Richard Chapman
2017
Abstract
The present paper aims to explore methodological approaches to the examination of corpus linguistic data through various tools deriving from discourse analysis. During the examination of data collected from a small, specialised corpus, interesting and unexpected elements were identified which invited further analysis. A corpus of around 100,000 tokens was assembled over a single day in 2014 by collecting contributions to a weblog discussion of the Israel-Palestine conflict. The initial aims of the research were to observe linguistic behaviour in English over a very limited timescale and involving a specific and highly controversial topic, with a secondary aim being the examination of methodological issues concerned with the creation of small corpora and how they should be interrogated. Data were analysed both quantitatively and qualitatively, but there was a stated intention from the outset to employ a hands-on approach as much as possible. Various problems emerged during the study, perhaps the most salient being that of attribution and tagging, and the most suggestive being the employment of names and highly pragmatic discourse features in curious and perhaps unexpected ways. A rereading of the corpus concentrating on names used and other forms of self-presentation, along with observations of attempts to impose identities on others, suggested a need for discourse-level analysis of linguistic behaviour in weblogs and discussion pages, since the naming devices often appear as phrases rather than individual words (even if they present as a single token with no spaces) and they presuppose a textual environment and a form of dialogue or interaction. Prevalent instances of metaphor and a repeated use of highly varied informal discourse markers (again often with apparently self-identifying pragmatic purposes) encouraged examination of cohesion over significant textual distances (cf. progressive relatedness), while issues surrounding coherence again challenged the potentially limited quantitative notions of what corpus linguistic analysis entails. In the paper, some fundamental assumptions in corpus linguistics are questioned, including the concept of repetition and so of the seemingly obvious binary contrast of token and type, the parsing of items, the use and interpretation of metaphor, metonymy and intertextuality, and the sociolinguistic and pragmatic elements inherent in all utterances. The complexity and richness of corpus linguistic data are seen to render qualitative analysis very demanding, but of unquestionable potential significance. Discourse-level analysis is deemed a necessary tool, and the paper concludes with the suggestion that the future of corpus linguistic studies should indeed be twofold, with constant comparison and triangulation of data from large-scale general language corpora and small-scale, specialised ones.
File | Size | Format | |
---|---|---|---|
Chapman_20_2017.pdf (open access; main article; full text, publisher's version; Creative Commons licence) | 349.02 kB | Adobe PDF | |
Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.