Linguistics 203: Handout
Week 1: September 25, 2002
Elizabeth Traugott
PROBING HISTORICAL CORPORA
If we assume that synchronic variation is the outcome of historical change, then one of the main purposes of using historical corpora is to determine how that variation developed over time.
ICAME (International Computer Archives of Modern English, Bergen: Norwegian Computing Centre for the Humanities) contains a large number of corpora of varying size and quality. Some are synchronic (including transcribed spoken language), others diachronic. For an outline of the historical materials available, most of them on ICAME, see the excerpt from Kytö and Rissanen 1997. For details of the ICAME corpora, visit the ICAME website at
http://www.hd.uib.no/icame.html
(the site covers contemporary spoken and written corpora as well as historical ones).
We will sample only two of these:
- The Helsinki Corpus of English Texts, Diachronic Part (Early Modern period only, for our purposes). The corpus consists of about 1.5 million words in three parts: Old English (texts from c. 750-1150; files starting with "co"), Middle English (texts from 1150-1500; files starting with "cm"), and Early Modern English (texts from 1500-1710; files starting with "ce"). Each part is subdivided into sub-periods (four each for Old and Middle English, three for Early Modern English). Early Modern sub-periods can often be identified by a 1, 2, or 3 in the file name, but since each text has a header with information about its date, it is best to check the header for dating. Tagged versions of the Old and Middle English portions are now available.
- The Lampeter Corpus of seventeenth-century formal letters.
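The file-naming convention for the Helsinki Corpus can be used to pick out the Early Modern files before searching. The Python sketch below relies only on the "co"/"cm"/"ce" prefixes given above; the file names in the usage lines are hypothetical. Because a period digit in the file name is only a frequent convention, the sketch treats it as a guess to be checked against each text's header, and it only attempts that guess for "ce" files, the part we are using.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# File-name prefixes for the three parts of the Helsinki Corpus (Diachronic Part).
PART_BY_PREFIX = {
    "co": "Old English (c. 750-1150)",
    "cm": "Middle English (1150-1500)",
    "ce": "Early Modern English (1500-1710)",
}


def classify(filename: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (part, guessed sub-period) for a corpus file name.

    The sub-period is read from a trailing 1, 2 or 3, and only for "ce"
    (Early Modern) files; it is a guess, so check the text header for dating.
    """
    stem = Path(filename).stem.lower()
    prefix = stem[:2]
    part = PART_BY_PREFIX.get(prefix)
    period = None
    if prefix == "ce":
        match = re.search(r"[123]$", stem)
        if match:
            period = int(match.group(0))
    return part, period


def early_modern_files(filenames: List[str]) -> List[str]:
    """Keep only the Early Modern English ("ce") files, in their original order."""
    return [name for name in filenames if classify(name)[0] == PART_BY_PREFIX["ce"]]


# Usage with hypothetical file names:
for name in ["coexample.txt", "cmexample.txt", "ceexample2.txt", "notes.txt"]:
    print(name, classify(name))
```

A file whose name does not begin with one of the three prefixes comes back as (None, None), so stray files in a download directory are simply ignored rather than misassigned.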
Historical corpora present the following opportunities (among others):
- Like other corpora, they provide empirical evidence with which to challenge preconceptions, especially claims based on theoretical assumptions. (See the brief case study of the development of subjecthood in English, September 25.)
- They provide insight into the types of context in which changes occur. Much historical work up to c. 1990 was done without attention to context. For example, it was repeatedly observed that verbs of motion, especially verbs meaning GO, become futures (cf. English be going to). The assumption was that this change arose via metaphor (mapping space onto time, i.e. using concrete experience of space to express the more abstract concept of time). Corpora force us to rethink how and why changes of this sort occur by providing evidence of the contexts in which be going to appears to have primarily (or exclusively) future meaning. (See the case study of be going to using the Lampeter Corpus, to be discussed October 9.)
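- They underline the fact that language change involves variation. Despite the way we tend to talk about change, A does not simply turn into B (A > B). Change always passes through a stage in which the old and new forms coexist (~ marks variation; the parenthesized loss of A may or may not follow):
A > A ~ B (> B)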
Historical corpora present the following challenges (among others):
- Materials are written only. This raises two questions:
  - What is the relationship of writing to language change?
  - To what extent can we get access to language that approximates spoken language? Texts such as private diaries, personal letters, sermons, depositions of witnesses, and plays are thought to be closest to spoken language.
  An example from a private diary of a member of the House of Commons in 1621 (cited in Matti Rissanen. 1986. Variation and the study of English historical syntax. In David Sankoff, ed., Diversity and Diachrony, 97-109. Amsterdam: Benjamins).
- Standardized spelling, punctuation, and syntax as we now understand them developed in the eighteenth century (and differ between the US and the UK). This means that when we search corpora we have to be sensitive to spelling variation. What does spelling represent: phonemic, phonetic, or morphophonemic structure?
- Since our prime goal is to trace variation over time, how accurate a picture of change do we get if a corpus includes texts of different genres, and the mix of genres differs from period to period?
- All corpora are edited and excerpted, and some are normalized. We should not let easy access to corpora lead us to ignore complete texts and especially manuscripts (MSS).
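The point about spelling variation can be made concrete with a search that tolerates variant spellings. The Python sketch below uses a regular expression for a few plausible spellings of going to (going, goyng, goeing, goinge, goynge); this variant list is an illustrative assumption, not a survey of attested forms, and the sample sentence is invented rather than taken from any corpus. The names GOING_TO and concordance are hypothetical.

```python
import re
from typing import List

# Hypothetical pattern for spelling variants of "going to": "go", an optional
# "e", then "i" or "y", then "ng" with an optional final "e", followed by "to".
# It covers forms such as going, goyng, goeing, goinge, goynge; the variant
# set is an assumption and should be revised after inspecting the corpus.
GOING_TO = re.compile(r"\bgoe?[iy]nge?\s+to\b", re.IGNORECASE)


def concordance(text: str, pattern: re.Pattern = GOING_TO, window: int = 30) -> List[str]:
    """Return each match with up to `window` characters of context on either side."""
    lines = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        # Collapse line breaks and runs of spaces so each hit is one concordance line.
        lines.append(" ".join(text[start:end].split()))
    return lines


# Usage with an invented sample (not corpus data):
sample = "I am goyng to London. He was Goeing\nto bed. She is going towards home."
for line in concordance(sample):
    print(line)
```

The pattern only finds the spellings someone has thought to include, so it is worth loosening it and reading through the hits; the trailing \b keeps forms like "going towards" out of the results.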