Christopher Potts: Code & Data
Linguistic sentiment analysis
- The 50,000-review subset of IMDB (100MB download) that we use in Learning word vectors for sentiment analysis (ACL 2011).
- Data viewer/visualizer for 'On the negativity of negation': An interactive webpage that lets users visualize unigrams from a large vocabulary (more than 250K words). The page also links to a data and code distribution for rerunning the calculations in the paper. [bibtex]
- Developing adjectival scales: Raw experimental data from my attempts to develop and evaluate methods for using naturally occurring metadata (star ratings on service and product reviews) to inform WordNet annotators in constructing modifier scales.
- Data and code for 'Affective demonstratives': The data and code from the paper with Florian Schwarz, including functions to generate all the plots and re-run the main experiments. [bibtex]
- UMass Amherst Linguistics Sentiment Corpora: N-gram counts extracted from over 700,000 online product reviews in Chinese, English, German, and Japanese. The files are UTF-8-encoded text formatted to be read in as R data frames, though they can easily be manipulated with other tools. [bibtex]
The Cards corpus of collaborative task-oriented dialogues
A highly structured corpus of 744 task-oriented dialogues collected with the goal of informing models of pragmatics and discourse. The corpus distribution includes Python and R code for working with the corpus, as well as a slideshow documenting its properties and reporting on some pilot studies. [bibtex]
PragBank
Extends FactBank with reader-based veridicality annotations at the level of utterance meaning. [bibtex]
Linguistic Oddities
This collection of examples consists mostly of oddities I found while reading. The emphasis is on example-types that would be very hard to find using standard search techniques. Includes a form for submitting new examples! [bibtex]
Embedded appositives
An annotated collection of 278 sentences containing appositives syntactically embedded in the complement of propositional attitude predicates and verbs of saying, drawn from 177 million words of novels, newspaper articles, and TV transcripts. Intended to inform work on appositives, conventional implicatures, and textual entailment. Includes a JavaScript interface, an XML corpus, and a short write-up describing the data and their theoretical relevance. [bibtex]
Wait a minute! What kind of discourse strategy is this?
A lightly annotated collection of 439 examples involving the phrase 'Wait a minute', drawn from 77 million words of CNN television transcripts. Intended to inform work on presuppositions. Includes a JavaScript interface, an XML corpus, and a short write-up describing the data and their theoretical relevance. [bibtex]
Computational phonology
- HaLP: Harmonic grammar with linear programming: A Java implementation of the Harmonic Grammar solver described in Harmonic grammar with linear programming. Joint work with Joe Pater, Rajesh Bhatt, and Michael Becker. [bibtex]
- OT-Help: A downloadable Java program for solving large-scale phonological systems in Optimality Theory and Harmonic Grammar (using the algorithms from HaLP). Includes a robust typology calculator. Joint work with Joe Pater, Rajesh Bhatt, Michael Becker, and a bunch of other researchers at UMass Amherst. [bibtex]
Calculators
- Computing LOI: A tool for teaching Jeroen Groenendijk's Logic of Interrogation.
- Computing Inquisitive Semantics: A tool for calculating possibilities in Inquisitive Semantics.
- Computing PLA: A tool for teaching Paul Dekker's Predicate Logic with Anaphora.


