Linguist 278: Programming for Linguists
Sentence boundaries
The task of extracting sentences from unstructured text is surprisingly challenging. Write a function that uses regular expressions to split text into sentences, along with a corresponding evaluator that measures how well it does.
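As a starting point, here is a minimal sketch of one possible approach, not a reference solution. It assumes that a boundary is whitespace preceded by `.`, `!`, or `?` and followed by a capital letter or digit, and it repairs over-splitting with a small, hypothetical abbreviation list. The names `split_sentences`, `evaluate`, and `ABBREVIATIONS` are made up for illustration, and the evaluator assumes you have a hand-segmented gold list to compare against.

```python
import re

# Hypothetical abbreviation list; a real solution would need a much larger one.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "etc.", "e.g.", "i.e."}

# Candidate boundary: whitespace after ., !, or ? and before a capital or digit.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Split text at candidate boundaries, then undo splits after abbreviations."""
    text = text.strip()
    if not text:
        return []
    sentences = []
    for piece in BOUNDARY.split(text):
        if sentences and sentences[-1].split()[-1] in ABBREVIATIONS:
            # The previous piece ended in an abbreviation: glue the pieces back
            # together (the original whitespace is replaced by a single space).
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


def evaluate(predicted, gold):
    """Return (precision, recall, F1), counting exact-match sentences only."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Exact-match scoring is strict: a sentence that is off by one token counts as wrong for both precision and recall. Scoring boundary positions instead of whole sentences is a reasonable alternative.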
Propositional logic formulae and interpretation
Create a Formula class and a Model class, to yield a complete implementation of PL. Optional extensions include adding new connectives and building TruthTable objects.
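One possible shape for the two classes is sketched below. Everything about the design is an assumption made for illustration: formulas are stored as an operator name plus subformulas, a model wraps a dictionary from atom names to truth values, and `truth_table` is a plain function standing in for the optional TruthTable object.

```python
from itertools import product


class Formula:
    """A PL formula: an atom, or a connective applied to subformulas."""

    # New connectives can be added by extending this table.
    CONNECTIVES = {
        "not": lambda a: not a,
        "and": lambda a, b: a and b,
        "or": lambda a, b: a or b,
        "implies": lambda a, b: (not a) or b,
    }

    def __init__(self, op, *args):
        self.op = op      # "atom" or a key of CONNECTIVES
        self.args = args  # an atom name, or the subformulas

    @classmethod
    def atom(cls, name):
        return cls("atom", name)

    def atoms(self):
        """Return the set of atom names occurring in the formula."""
        if self.op == "atom":
            return {self.args[0]}
        return set().union(*(arg.atoms() for arg in self.args))


class Model:
    """An interpretation: a mapping from atom names to truth values."""

    def __init__(self, valuation):
        self.valuation = dict(valuation)

    def evaluate(self, formula):
        if formula.op == "atom":
            return self.valuation[formula.args[0]]
        func = Formula.CONNECTIVES[formula.op]
        return func(*(self.evaluate(arg) for arg in formula.args))


def truth_table(formula):
    """Return one (values, result) row per assignment to the formula's atoms."""
    names = sorted(formula.atoms())
    rows = []
    for values in product([True, False], repeat=len(names)):
        model = Model(zip(names, values))
        rows.append((values, model.evaluate(formula)))
    return rows
```

With `p = Formula.atom("p")` and `q = Formula.atom("q")`, the call `Model({"p": True, "q": False}).evaluate(Formula("implies", p, q))` evaluates the conditional in a model where it fails.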
What (if anything) makes the "100 Most Beautiful Words" special?
I took the 100 Most Beautiful Words (of which there are 107) and enriched them with CMU Pronouncing Dictionary representations, Celex morphological representations, and frequencies according to the Google N-gram Corpus. I then did the same for a list of 107 randomly selected words that are not proper names. The goal is to write code that explores what, if anything, distinguishes the two lists, orthographically, phonologically, morphologically, ...
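A first orthographic comparison might look like the sketch below. The format of the enriched data files is not described here, so the code assumes only that each list has been read into a plain Python list of word strings. The function names are hypothetical, and the same pattern could be repeated over phoneme or morpheme sequences.

```python
from collections import Counter


def orthographic_profile(words):
    """Mean word length and relative letter frequencies (assumes a non-empty list)."""
    lengths = [len(w) for w in words]
    letters = Counter(ch for w in words for ch in w.lower() if ch.isalpha())
    total = sum(letters.values())
    return {
        "mean_length": sum(lengths) / len(lengths),
        "letter_freq": {ch: n / total for ch, n in letters.items()},
    }


def most_distinctive_letters(words_a, words_b, k=5):
    """Return the k letters whose relative frequency differs most between the lists."""
    freq_a = orthographic_profile(words_a)["letter_freq"]
    freq_b = orthographic_profile(words_b)["letter_freq"]
    diffs = {
        ch: freq_a.get(ch, 0.0) - freq_b.get(ch, 0.0)
        for ch in set(freq_a) | set(freq_b)
    }
    return sorted(diffs, key=lambda ch: abs(diffs[ch]), reverse=True)[:k]
```

With only 107 words per list, raw frequency differences are noisy, so any pattern found this way is a lead to investigate rather than a result.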
Interactive command-line programming
Use the raw text version of choose_your_career_in_linguistics (see assignment 1 for the links; Trey Jones's web version) as the basis for an interactive command-line program.
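The core of such a program is a loop that prints a passage, lists the options, and reads the user's choice. The sketch below assumes the raw text has already been parsed into a dictionary of nodes; that parsing step depends on the file's layout and is not shown. The `STORY` contents are placeholder text rather than passages from the actual file, and `play` is a hypothetical name.

```python
# Placeholder story graph; each node has text and a dict of choices
# mapping the key the user types to a (label, next node id) pair.
STORY = {
    "start": {
        "text": "You finish your degree. What next?",
        "choices": {"1": ("Apply to grad school", "grad"),
                    "2": ("Look for a job", "job")},
    },
    "grad": {"text": "You start a PhD. The end.", "choices": {}},
    "job": {"text": "You find a job. The end.", "choices": {}},
}


def play(story, start, read=input, write=print):
    """Walk the story from start until a node with no choices; return the path taken."""
    node_id = start
    path = [node_id]
    while True:
        node = story[node_id]
        write(node["text"])
        if not node["choices"]:
            return path
        for key, (label, _) in sorted(node["choices"].items()):
            write(f"  [{key}] {label}")
        answer = read("> ").strip()
        if answer not in node["choices"]:
            # Invalid input: complain and show the same node again.
            write("Please enter one of the listed options.")
            continue
        node_id = node["choices"][answer][1]
        path.append(node_id)
```

Calling `play(STORY, "start")` starts a session in the terminal. Passing `read` and `write` as parameters is a design choice that lets the loop be tested with scripted input instead of a live user.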
Set-theoretic closure operations
Develop methods for calculating powersets, permutations, n-tuples, partitions, cross products, and so forth. Aim for full generality, and perhaps calculate cardinalities first, so that you know in advance whether one of these crazy exponential computations would run for millennia on the user's input.
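The idea of computing the cardinality before doing the work can be sketched as follows for two of the operations. A powerset of n items has 2^n members, and the number of partitions of n items is the n-th Bell number. The `DEFAULT_LIMIT` threshold and all function names are assumptions chosen for illustration.

```python
from itertools import chain, combinations

DEFAULT_LIMIT = 10**6  # arbitrary cutoff; tune to taste


def check_size(size, limit):
    """Refuse to start a computation whose output would exceed limit items."""
    if size > limit:
        raise ValueError(f"result would have {size} members (limit {limit})")


def powerset(items, limit=DEFAULT_LIMIT):
    """Return an iterator over all subsets of items, as tuples."""
    items = list(items)
    check_size(2 ** len(items), limit)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def bell_number(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


def _partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in _partitions(rest):
        # Put the first item into each existing block in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part


def partitions(items, limit=DEFAULT_LIMIT):
    """Return an iterator over all partitions of items, as lists of blocks."""
    items = list(items)
    check_size(bell_number(len(items)), limit)
    return _partitions(items)
```

Because both functions return lazy iterators, the size check is the only work done at call time. `powerset(range(64))` raises a `ValueError` immediately instead of starting an enumeration that could never finish. Permutations (n!), n-tuples (n^k), and cross products (the product of the set sizes) can be guarded the same way.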