Linguistics 203: Handout
Week 1: September 25, 2002

Elizabeth Traugott

PROBING HISTORICAL CORPORA

Assuming that synchronic variation is the outcome of historical change, one of the main purposes of using historical corpora is to determine how the variation developed over time.

ICAME (International Computer Archives of Modern English, Bergen: Norwegian Computing Centre for the Humanities) contains a large number of corpora of varying size and quality. Some are synchronic (including transcribed spoken language), others diachronic. See the excerpt from Kytö and Rissanen 1997 outlining what historical materials are available, most of them on ICAME. For details of the latter, visit the ICAME website at http://www.hd.uib.no/icame.html (the site includes contemporary spoken and written as well as historical corpora).

We will sample only two of these:

  1. The Helsinki Corpus of English Texts, Diachronic Part (Early Modern period only, for our purposes).
    The corpus consists of about 1.5 million words in three parts: Old English (texts from c. 750 to 1150; files starting with "co"), Middle English (texts from 1150 to 1500; files starting with "cm"), and Early Modern English (texts from 1500 to 1710; files starting with "ce"). Each of these is subdivided into three periods, which can often be identified by the numbers 1, 2, 3. However, since each text has a header with information about dates, it is best to check there for dating. Tagged versions of the Old and Middle English portions are now available.
  2. The Lampeter Corpus of seventeenth-century formal letters.
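
When working with a directory of Helsinki Corpus files, the file-naming convention described under item 1 can be used for a first rough sort by period. The following is a minimal Python sketch, not part of the corpus distribution. The function name classify is hypothetical, and the sketch assumes that the sub-period digit (1, 2, or 3), when present, appears in the file name after the two-letter prefix. The header of each text remains the authority for dating.

```python
import re
from pathlib import Path

# Prefix-to-period mapping taken from the corpus description in this handout.
PERIOD_BY_PREFIX = {
    "co": "Old English",
    "cm": "Middle English",
    "ce": "Early Modern English",
}


def classify(filename):
    """Guess (period, subperiod) from a Helsinki Corpus file name.

    period is None if the name does not start with co, cm, or ce.
    subperiod is 1, 2, or 3 if such a digit follows the prefix, else None.
    Assumption: the sub-period digit is part of the file name itself.
    """
    stem = Path(filename).stem.lower()
    period = PERIOD_BY_PREFIX.get(stem[:2])
    if period is None:
        return None, None
    # Look for the first 1, 2, or 3 after the two-letter prefix.
    match = re.search(r"[123]", stem[2:])
    subperiod = int(match.group()) if match else None
    return period, subperiod
```

For example, with made-up file names, classify("ceexample2.txt") yields the Early Modern English period with sub-period 2, while classify("cmexample") yields Middle English with no sub-period, in which case the text header has to be consulted.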

Historical corpora present the following opportunities (among others):

Historical corpora present the following challenges (among others):