
While my broad research interest is in language and cognition, my recent work has focused on phonetic variation and its relationship to phonological and psycholinguistic theory. The amount of variation in the speech signal is astounding, yet listeners overcome it with apparently little difficulty. A single speaker can produce many acoustically distinct utterances of any given word. Different speakers may also produce the same word differently, depending on speaker-specific indexical characteristics (e.g., gender, age) or on more systematic phonetic characteristics (e.g., dialect, native-language phonology).
Consider the word card. A listener could plausibly encounter productions such as [khart] from a native speaker of Polish (or of any other language with word-final devoicing), [kha:d] from a Bostonian, or [kard] from a native speaker of Spanish. Each of these differs only minimally from the production [khard]. But in language, minimal differences are what make two words distinct. Was the Polish speaker talking about a cart or a card? Did the man from Boston say cod or card? And, since English listeners categorically perceive voiceless unaspirated stops as voiced, would they know whether the Spanish speaker said guard or card?
Understanding how listeners overcome this variation, whether by storing detailed acoustic information, by repeated exposure, by storing speaker-specific information, or by learning to use different phonetic cues for different speakers, will ultimately impact and inform theoretical models of phonetics, phonology, and spoken word recognition.
Current Research
My current project, “The learning and generalization of new contrastive cues,” is still in its infancy. It examines how listeners use and store non-native phonetic cues to adapt to non-native speakers of a language. Individual and language-specific variation already make understanding speech difficult, and listeners face added difficulty when processing non-native speech. Consider a non-native instructor of a large lecture class (and the students trying to understand the material). A native Polish speaker of English, again, may produce the English words cart and card in a way that is indistinguishable to native English listeners. In this case, the speaker is not substituting one sound for another (e.g., [t] for [d]) but is substituting an L1 cue (e.g., glottal pulsing) for an English cue (e.g., the vowel-to-consonant duration ratio, or V/C ratio). This is potentially an important domain of study, since cues generally treated as non-contrastive, such as glottal pulsing into closure, may prove to be contrastive in the processing of non-native speech.
Phonetic variation is an important area of investigation for both linguistics and psychology. A question central to both fields is: How do listeners accommodate and represent the phonetic variation found in non-native speech? The immediate goals of this project are to examine (a) whether listeners map specific phonetic information onto abstract representations or store exemplars containing this critical acoustic detail, (b) whether listeners use subtle acoustic cues in the processing of non-native speech, and the extent to which factors such as word frequency and exposure duration affect processing, and (c) whether explicit phonetic training of non-native English speakers improves native English listeners’ performance (e.g., as measured by higher accuracy and faster reaction times in behavioral experiments).
The results of this research may impact both linguistics and psychology. In linguistics, this research could inform a long-standing debate over abstract versus concrete representations, as well as the ways in which listeners use acoustic cues (e.g., gross pattern recognition and mapping onto abstract representations vs. storage of subtle phonetic detail). One possible outcome is that cues generally thought to be non-contrastive turn out to be contrastive. Such a result would force a reconsideration of general categories in phonology, as well as of the assumptions that underlie current models of phonological theory. In psychology, results from this project will inform models of spoken word recognition, which are currently unable to cope with the variation found in non-native speech.
Past Projects
Within-Language Variation
Recently, in collaboration with Arthur Samuel, I examined how listeners handle within-language variation in the short term and the long term, and found that variants are processed differently depending on the lag between prime and target presentation. We examined three regular variants of word-final /t/ in English in a set of semantic and long-term priming experiments. In English, coda /t/ can be produced in at least three ways: (1) as a fully released alveolar stop [t]; (2) as a glottalized, unreleased stop [ʔt]; and (3) as a released glottal stop [ʔ]. In the New York City dialect of American English (the dialect of the participants), all three variants are produced regularly, with the glottalized, unreleased stop being the most frequent. In two semantic priming experiments, we found that all three variants of final /t/ (e.g., in the word flute: [flut], [fluʔt], [fluʔ]) equally primed a semantically related word (e.g., music). A minimally different pseudoword (the “mismatch” condition, e.g., [flus]) did not prime the same target. In two long-term repetition priming experiments, a clear effect emerged only when the fully articulated [t] was presented in the first block of trials, about 15 minutes earlier, and the same variant was presented again in the second block. Although we found evidence of episodic traces for each variant, the strongest facilitation was found when a canonical [t] target was preceded by an identical prime.
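Throughout these studies, “priming” or “facilitation” means faster responses to a target after a related prime than after a control prime, and “interference” means slower responses. As a minimal sketch, assuming a simple difference of mean reaction times (the actual statistical analyses are not described here), the effect can be computed as follows; `priming_effect` is a hypothetical helper, not the analysis code used in these studies.

```python
from statistics import mean


def priming_effect(related_rts_ms, unrelated_rts_ms):
    """Hypothetical helper: priming effect in milliseconds.

    Computed as mean RT after control (unrelated) primes minus mean RT
    after related primes. Positive values indicate facilitation;
    negative values indicate interference (slower responses).
    """
    if not related_rts_ms or not unrelated_rts_ms:
        raise ValueError("both conditions need at least one reaction time")
    return mean(unrelated_rts_ms) - mean(related_rts_ms)
```

A simple mean difference keeps the sign convention transparent; a real analysis would also model variability across participants and items.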
Our results suggested that regular variants of a word are equally able to activate semantically related targets, whereas deviant forms, as in the mismatch condition, produced no priming at all. In the short term, then, listeners accommodate regular within-language variation very well. In the long term, however, not all variants are equal: prior exposure to the canonical [t] produced a much stronger effect than exposure to the other variants. We concluded that featural or gestural information can be used immediately by listeners but is not helpful in evoking a stored representation over a longer period. That listeners cope so well with immediate variation may not be surprising. After all, differing levels of formality, speaking rates, and speaker characteristics produce a large amount of variation in the speech signal. Variation is not limited to these sources, though: listeners also encounter speakers from different language backgrounds and dialects. One would hope that in these situations, too, spoken word recognition is able to accommodate variation.
Dialectal Variation
Building on this work, I have recently completed a project examining the perception and representation of dialectal variation. Its results bear directly on how exposure to variants shapes both perception and representation. We ultimately argue that a dialect is not only something spoken but also something internalized by listeners. In addition, we found a major difference between dialect listeners and quasi-dialect listeners (the Overt-NY and Covert-NY listeners, respectively, described below), along with evidence for an intermediate dialect stage in which a quasi-dialect listener processes variants quite efficiently but does not represent them in the long term as a dialect listener does.
We examined r-dropping in the New York City dialect of American English (NY) to understand the consequences of dialectal variation for spoken word processing and representation, and how these consequences depend on the dialect background of the participants. In this dialect, the -er sound at the end of words is produced as a schwa (similar to the final sound in the word panda), so the word mother sounds like moth-uh. In three experiments (semantic priming, form priming, and long-term repetition priming), we used four speakers: two female speakers produced the primes in all three experiments, and two male speakers produced the targets. One of the female speakers and one of the male speakers had a very strong NY accent; the remaining two speakers spoke General American (GA).
Through a series of post-experiment interviews and questionnaires, we identified three listener groups: GA listeners, Overt-NY listeners, and Covert-NY listeners. GA listeners were born and raised outside the New York City area. Overt-NY listeners were born and raised in the New York City area and exhibited r-dropping in their own speech during post-experiment interviews. Covert-NY listeners were also born and raised in the New York City area but exhibited no r-dropping during the interviews. Interestingly, the main difference between the two NY groups was where their parents and/or grandparents were born and raised: nearly all of the Overt-NY listeners were third-generation New York City residents, compared with only 3% of the Covert-NY listeners.
In the immediate processing paradigms (semantic and form priming), we found an asymmetry between GA listeners on the one hand and Overt- and Covert-NY listeners on the other. GA listeners treated the NY r-dropped variants as arbitrary, much like the mismatch items in our study of final-/t/ variation, whereas both Overt- and Covert-NY listeners processed GA and NY variants effectively. The long-term picture was quite different. When primes and targets were separated by a 10–15-minute lag, GA listeners showed facilitation only for GA primes followed by GA targets, while Overt-NY listeners showed facilitation for both GA and NY primes, suggesting that both are encoded effectively. The Covert-NY listeners, however, showed facilitation for all GA primes, whether followed by GA or NY targets, but showed no facilitation for targets preceded by NY primes.
We conclude from these results that while Overt-NY listeners store variants from both dialects, Covert-NY listeners store only GA variants. Covert-NY listeners must nonetheless have some flexibility in variant processing, absent in GA listeners, that allows them to process NY variants as a regular component of the language. This result, while unexpected, suggests a strong influence of language exposure on the development of both representation and perception.
Non-Native Language Variation
To some, non-native speech may appear arbitrary. However, many of the characteristics that make speech sound ‘non-native’ are in fact regular and systematic. Consider the English speech of a native speaker of Spanish. In Spanish, voiceless stops are unaspirated, while voiceless stops in English are aspirated, and English listeners generally perceive unaspirated voiceless stops as voiced. Yet recognition does not fail outright when listeners encounter speakers from different language backgrounds.
Although it differs from the native pattern, a Spanish speaker’s production of English stops is systematic: voiceless stops are produced with a voice onset time (VOT) near zero, and voiced stops with a large negative VOT. At some point, we would expect the word recognition system to overcome this variation, either by resetting phonetic categories for a particular speaker (which may take time) or by using a cue available in the speech signal each time a stop is produced. In our preliminary studies, however, we found that this was not the case. Native English listeners consistently mapped Spanish voiced and voiceless stops onto English voiced stops across a range of tasks and training exercises. Experience with a speaker actually expanded the listener’s existing phonetic category. This inflexibility is consistent with research on second language acquisition suggesting that while new categories may be formed, shifts in existing category boundaries are uncommon.
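The mapping problem can be illustrated with a deliberately simplified, single-cue model in which an English listener labels a stop by comparing its VOT with one category boundary. The 25 ms boundary and the token VOTs below are illustrative assumptions, not measurements from these studies, and real listeners use more than VOT alone. Under these assumptions, both the prevoiced and the short-lag stops of a Spanish speaker fall on the English voiced side of the boundary.

```python
# Assumed English voiced/voiceless boundary for a labial stop, in ms of VOT.
# Illustrative value only; real boundaries vary with place of articulation.
ENGLISH_BOUNDARY_MS = 25.0


def english_category(vot_ms, boundary_ms=ENGLISH_BOUNDARY_MS):
    """Label a stop as English 'voiced' or 'voiceless' from its VOT alone.

    Single-cue, single-boundary simplification: any VOT below the boundary
    (prevoiced or short-lag) is heard as voiced; long-lag VOT as voiceless.
    """
    return "voiced" if vot_ms < boundary_ms else "voiceless"


# Illustrative (not measured) VOTs for stops produced by a Spanish speaker.
spanish_tokens = {
    "Spanish voiced stop (prevoiced)": -80.0,
    "Spanish voiceless stop (short lag)": 5.0,
}

for label, vot in spanish_tokens.items():
    print(f"{label}: {vot:+.0f} ms -> English {english_category(vot)}")
```

Because the two Spanish categories sit on the same side of the assumed English boundary, a listener relying on this one boundary has no basis for telling them apart.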
These results were particularly intriguing because a surge of recent research suggests that perceptual learning is robust and relatively easy to induce. One difference between the perceptual learning literature and the shift required in the English–Spanish situation described above is that recent perceptual learning experiments have shown subtle shifts in category boundaries after exposure to ambiguous sounds (e.g., a sound ambiguous between [s] and [sh]), whereas the English–Spanish situation requires a listener to shift a category on the basis of a sound that is perceptually unambiguous (e.g., [p]). While subtle shifts may help listeners adjust to idiosyncratic differences (e.g., potential ambiguities between [s] and [th] in the speech of a speaker with a lisp), they do not help an English listener who must tell apart two sounds that are unambiguously perceived as one.
Our next step in this project is to reconcile these two sets of results by modeling the gross category shift that may eventually occur in the processing of non-native speech. One hypothesis is that a gross category shift is actually the sum of many subtle adjustments made over time. A second hypothesis is that listeners never make a gross category shift; instead, perception improves over time because listeners adjust to other properties of the speech, such as intonation, and come to process other sounds more accurately. Both hypotheses are currently under investigation.
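The first hypothesis can be made concrete with a toy model, offered only as an illustrative assumption and not as the model under development: on each exposure, the category boundary moves a small fraction (`rate`) of the way toward a hypothetical target boundary. No single step is large, yet the steps accumulate: after n exposures the boundary sits at target + (start − target)(1 − rate)^n. The function name and all parameter values are hypothetical.

```python
def adapt_boundary(start_ms, target_ms, rate, n_exposures):
    """Hypothetical toy model of 'many subtle adjustments'.

    Each exposure shifts the boundary by rate * (target - current), so no
    single step is large, but the steps accumulate into a large overall shift.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    boundary = start_ms
    for _ in range(n_exposures):
        boundary += rate * (target_ms - boundary)
    return boundary


# Hypothetical values: an English-like boundary at 25 ms drifting toward a
# boundary at -30 ms, which would fall between prevoiced and short-lag stops.
for n in (1, 10, 50, 200):
    print(n, round(adapt_boundary(25.0, -30.0, 0.05, n), 2))
```

Under this model, the question of whether a gross shift is "really" many small ones becomes a question about the size of `rate` and the number of exposures a listener receives.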
Lexical Inhibition and Sublexical Facilitation
A secondary focus of my research is the organization and activation of lexical and sublexical information (see Sumner & Samuel, to appear, appended). When a listener hears a word like tape, current theories of spoken word recognition assert that recognition involves the activation of both lexical (‘tape’) and sublexical (e.g., /t/, /e/, /p/) representations. In contrast, when an unfamiliar utterance (dape) is heard, no lexical representation can be settled on. Using a long-term priming paradigm, we examined lexical decision times for nonwords (e.g., dape) as a function of the words or nonwords heard 10–20 minutes earlier. In four experiments, we found that identifying a nonword as a nonword took longer if a similar word had been heard 10–20 minutes before (e.g., tape – dape, job – jop); there was no such delay if the nonword itself had previously been heard (e.g., dape – dape, jop – jop), and there was significant facilitation if a similar nonword had been heard earlier (e.g., tup – dup, jub – jup). The delay suggests that the word’s lexical representation remains active and competes with the nonword during its recognition. This interference is found both for items sharing onsets (job – jop) and for items sharing offsets (tape – dape). The equivalence of these two cases supports word recognition models in which a word’s lexical neighborhood determines the set of lexical competitors. In addition, the facilitation for similar nonwords (e.g., jub – jup), which converged across experiments, suggests that sublexical (not necessarily syllabic) CV and VC chunks are active during processing.
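The neighborhood account can be sketched in code. The sketch assumes the common definition of a lexical neighbor as a form differing by exactly one phoneme substitution, addition, or deletion, and uses a hypothetical mini-lexicon with ARPAbet-style transcriptions; neither comes from the studies described above. Under this definition, dape and jop are both neighbors of real words, whether the shared material is the offset (tape) or the onset (job). The hypothetical `cv_vc_chunks` helper, which assumes CVC forms with a single-phoneme onset and coda, shows the overlapping CV chunk in jub – jup and VC chunk in tup – dup.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one phoneme
    substitution, addition, or deletion (assumed neighbor definition)."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # Deleting one phoneme from the longer form must yield the shorter.
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False


def neighbors(form, lexicon):
    """Return the sorted words in lexicon that are neighbors of form."""
    target = tuple(form.split())
    return sorted(word for word, phones in lexicon.items()
                  if is_neighbor(target, tuple(phones.split())))


def cv_vc_chunks(form):
    """Split a CVC form into (CV, VC) chunks; assumes one-phoneme onset/coda."""
    onset, vowel, coda = form.split()
    return (onset, vowel), (vowel, coda)


# Hypothetical mini-lexicon with ARPAbet-style transcriptions.
LEXICON = {
    "tape": "t ey p",
    "cape": "k ey p",
    "tap": "t ae p",
    "job": "jh aa b",
    "jab": "jh ae b",
}

print(neighbors("d ey p", LEXICON))   # nonword "dape" -> ['cape', 'tape']
print(neighbors("jh aa p", LEXICON))  # nonword "jop"  -> ['job']
```

Position of the mismatch plays no role in `is_neighbor`, which mirrors the finding that onset-sharing and offset-sharing pairs produced equivalent interference.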