Expressive content and the semantics of contexts
NSF Grant No. BCS-0642752

Data

UMass Amherst Linguistics Sentiment Corpora
N-gram counts extracted from over 700,000 online product reviews in Chinese, English, German, and Japanese. The files are UTF-8 encoded text, formatted so that they can be read directly into R as data frames, but they can also be processed easily with other tools.
Embedded appositives
An annotated collection of 278 sentences containing appositives that are syntactically embedded in the complements of propositional attitude predicates and verbs of saying, drawn from 177 million words of novels, newspaper articles, and TV transcripts. Intended to inform work on appositives, conventional implicatures, and textual entailment. Includes a JavaScript interface, an XML corpus, and a short write-up describing the data and their theoretical relevance.
Wait a minute! What kind of discourse strategy is this?
A lightly annotated collection of 439 examples of the phrase "Wait a minute", drawn from 77 million words of CNN television transcripts. Intended to inform work on presuppositions. Includes a JavaScript interface, an XML corpus, and a short write-up describing the data and their theoretical relevance.