With Christopher Potts and Sven Lauer, I developed the Card Corpus, a highly structured corpus of 744 task-oriented dialogues collected to inform models of pragmatics and discourse. The corpus distribution includes Python and R code for working with the corpus, as well as a slideshow documenting its properties and reporting on some pilot studies. Version 2 can be found here. Some papers associated with this project are listed below:
I developed the House Proceedings Corpus (HPC), a highly structured corpus of complete congressional House proceedings that contains over 2,700 transcripts, tagged for part of speech (POS) using the Stanford POS tagger. The HPC is composed of individual .JSON files, which helps avoid data corruption and makes the corpus easy to import into MongoDB. The HPC has 181,648,994 tokens and a vocabulary of 314,031 words. The corpus itself is available upon request, and a Python wrapper for working with the corpus (as well as other tools) is available here.
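As a rough illustration of how token and vocabulary counts like those above could be computed from a directory of per-transcript JSON files, here is a minimal sketch. The file layout is an assumption made for the example: each file is taken to hold a `"tokens"` list of `[word, pos_tag]` pairs, and the function name `corpus_stats` is hypothetical. This is not the HPC's actual schema or its Python wrapper.

```python
import json
from collections import Counter
from pathlib import Path


def corpus_stats(corpus_dir):
    """Return (token_count, vocabulary_size) for a directory of JSON transcripts.

    Assumed (hypothetical) layout: every file contains
    {"tokens": [[word, pos_tag], ...]}.
    """
    counts = Counter()
    for path in sorted(Path(corpus_dir).iterdir()):
        # Accept both .json and .JSON extensions.
        if path.suffix.lower() != ".json":
            continue
        with open(path, encoding="utf-8") as handle:
            transcript = json.load(handle)
        # Case-fold words so "The" and "the" count as one vocabulary item.
        counts.update(word.lower() for word, _tag in transcript["tokens"])
    return sum(counts.values()), len(counts)
```

Whether the vocabulary should be case-folded, as it is here, is a design choice; the figures reported above may have been computed differently.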
With Eric Acton and Christopher Potts, I am currently investigating the partisan use of attitude verbs and demonstratives. More to come.
I became interested in the Finnish case system in 2010, when I took a class offered by Paul Kiparsky, Lauri Karttunen and Arto Anttila. In Finnish, the object of a transitive verb is case-marked in one of two ways: with the accusative or with the partitive. Which case is assigned is a function of the lexical semantic properties of the verb, those of its object, and the way in which their meanings are put together. The interesting question is: which lexical semantic properties trigger which case assignment? I argue that Finnish states (in the sense of Vendler (1967)) that lexically encode the property of existential commitment assign accusative case to their objects, whereas those that lack it assign partitive. This is consistent with Barbara Partee's observation regarding the relation between intensionality and the genitive of negation in Russian.
I have reconceived Optimality Theory (OT) in algebraic or, equivalently, propositional terms. For an arbitrary set of constraints and a candidate (i.e., an input/output pair), one can determine the set of all grammars over that constraint set that make the candidate optimal. It is therefore natural to think of a candidate in terms of the set of grammars that make it optimal. In this way, notions like 'candidate entailment' can be defined naturally, allowing the OT theorist to reason about datasets in strictly algebraic or logical terms.
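To make the idea concrete, here is a small brute-force sketch of what "the set of grammars that make a candidate optimal" can look like. It assumes that grammars are total rankings of the constraints, that candidates are given as violation profiles, and that one candidate entails another when its grammar set is a subset of the other's. These are assumptions chosen for illustration, with hypothetical function names; this is not the algebraic construction from the papers.

```python
from itertools import permutations


def optimal_grammars(winner, rivals, constraints):
    """Total rankings (highest-ranked constraint first) under which `winner`
    strictly beats every rival.

    Candidates are dicts mapping constraint names to violation counts.
    """
    grammars = set()
    for ranking in permutations(constraints):
        def profile(candidate):
            # Violations read off in ranking order; comparing these tuples
            # lexicographically is standard OT evaluation.
            return tuple(candidate[c] for c in ranking)

        if all(profile(winner) < profile(rival) for rival in rivals):
            grammars.add(ranking)
    return grammars


def entails(grammars_a, grammars_b):
    """Assumed reading of entailment: every grammar selecting a also selects b."""
    return grammars_a <= grammars_b


constraints = ["A", "B", "C"]
rival = {"A": 1, "B": 0, "C": 0}
# x wins whenever A outranks B; y wins only when A outranks both B and C.
x = optimal_grammars({"A": 0, "B": 1, "C": 0}, [rival], constraints)
y = optimal_grammars({"A": 0, "B": 1, "C": 1}, [rival], constraints)
print(entails(y, x), entails(x, y))
```

Enumerating all rankings is factorial in the number of constraints, so this only serves to illustrate the definitions on toy examples.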
With Arto Anttila, I have solved the ranking problem in Partial Order Optimality Theory (PoOT), which can be stated as follows: allowing for free variation, given a finite set of input/output pairs (i.e., a dataset) that a speaker knows to be part of some language, how can one learn the set of all PoOT grammars under some constraint set that are compatible with that dataset?
For an arbitrary dataset, we provide set-theoretic means for constructing the set of all PoOT grammars compatible with that dataset. Specifically, we determine the set of all strict orders of constraints that are compatible with the dataset. As every strict total order is in fact a strict order, our solution is applicable in both PoOT and classical optimality theory (COT), showing that the ranking problem in COT is a special instance of a more general one in PoOT. Currently, Arto and I are developing a web application implementing the solution laid out below.
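The order-theoretic fact used above can be checked on small examples. The following sketch is not the solution to the ranking problem; it only illustrates, with hypothetical helper names, that a strict order over constraints (an irreflexive, transitive set of pairs) can be viewed as the set of total rankings that extend it, and that a total ranking is the special case with exactly one such extension.

```python
from itertools import permutations


def is_strict_order(pairs):
    """True if `pairs`, a set of (higher, lower) tuples, is irreflexive and transitive."""
    irreflexive = all(a != b for a, b in pairs)
    transitive = all(
        (a, d) in pairs for a, b in pairs for c, d in pairs if b == c
    )
    return irreflexive and transitive


def linear_extensions(constraints, pairs):
    """All total rankings (highest first) that respect every (higher, lower) pair."""
    result = []
    for ranking in permutations(constraints):
        position = {c: i for i, c in enumerate(ranking)}
        if all(position[a] < position[b] for a, b in pairs):
            result.append(ranking)
    return result


def total_order_pairs(ranking):
    """The set of (higher, lower) pairs induced by a total ranking."""
    return {(a, b) for i, a in enumerate(ranking) for b in ranking[i + 1:]}


constraints = ["A", "B", "C"]
partial = {("A", "B")}                       # only A >> B is fixed
total = total_order_pairs(("A", "B", "C"))   # A >> B >> C
print(linear_extensions(constraints, partial))
print(is_strict_order(total), linear_extensions(constraints, total))
```

As with any brute-force enumeration over permutations, this is only practical for a handful of constraints.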
More to come.
The role of inference in natural language semantics has often been neglected. Recently, some NL semanticists have moved away from the heavy machinery of Montagovian-style semantics toward a more proof-based approach. This move reflects a belief that derivability plays as central a role in NL semantics as entailment does. Proof-theoretic semantics is a rich domain for developing computationally feasible models of natural language inference. I began work in this area by logicizing certain aspects of Bill MacCartney's algorithmic approach to natural logic and proving a completeness theorem for the resulting system. Code associated with this project can be found here. Some papers and talks associated with this project are listed below: