Sharon Goldwater

I am a postdoctoral scholar in the Department of Linguistics at Stanford University. I completed my Ph.D. at Brown University in September 2006.

You can email me (sgwater) at stanford.

Research Interests

One of the great motivating factors in the development of modern linguistic theory is the astonishing ability of children to attain linguistic proficiency in only a few years, with apparently impoverished input. My interests lie in exploring the extent to which this ability can be explained by appealing to probabilistic notions of language and learning. I consider questions such as: What kinds of structures are considered by the learning mechanism? How much and what sort of evidence is necessary to produce generalizations? Are there innate constraints that are specific to language acquisition, or can language be learned successfully using only general learning biases? I investigate these questions by implementing explicit computational models of language acquisition within a Bayesian statistical framework. To date, my research has focused on developing models of morphological and phonological acquisition. I am also interested in the problem of unsupervised learning in general (i.e. learning without access to the "correct" answers), and in adapting and applying unsupervised machine learning techniques in a cognitively plausible way.

Resources

Publications

A Bayesian Framework for Word Segmentation: Exploring the Effects of Context. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. In submission. ( PDF )
[NOTE: results in this paper are based on a newer version of the code used in the ACL06 and BUCLD07 word segmentation papers and chapter 5 of my thesis. The new version corrects a small bug in the implementation of the bigram (HDP) model. Please cite results from this paper in future publications.]

Which Words Are Hard to Recognize? Prosodic, Lexical, and Disfluency Factors That Increase ASR Error Rates. Sharon Goldwater, Dan Jurafsky, and Christopher D. Manning. Proceedings of ACL, 2008. ( PDF )

Modeling Human Performance on Statistical Word Segmentation Tasks. Michael C. Frank, Sharon Goldwater, Vikash Mansinghka, Tom Griffiths, and Joshua Tenenbaum. Proceedings of the 29th Annual Meeting of the Cognitive Science Society, 2007. ( PDF )

A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging. Sharon Goldwater and Thomas L. Griffiths. Proceedings of ACL, 2007. ( PDF )

Bayesian Inference for PCFGs via Markov Chain Monte Carlo. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Proceedings of NAACL, 2007. ( PDF )

Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Advances in Neural Information Processing Systems 19, 2007. ( PDF )

Distributional Cues to Word Segmentation: Context is Important. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of the 31st Boston University Conference on Language Development, 2007. ( PostScript , PDF ) If you plan to cite results from this paper, see this note.

Nonparametric Bayesian Models of Lexical Acquisition. Sharon Goldwater. Ph.D. thesis, Brown University, 2006. Tree-saving version (single spaced with minimal front matter, 115 pages), Official version (double spaced with all front matter, 176 pages). If you plan to cite results on word segmentation, see this note.

Contextual Dependencies in Unsupervised Word Segmentation. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of Coling/ACL, Sydney, 2006. ( PostScript , PDF ) Code is available on request. If you plan to cite results from this paper, see this note.

Interpolating between Types and Tokens by Estimating Power-Law Generators. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Advances in Neural Information Processing Systems 18, 2006. ( PostScript , PDF ) [NOTE: this is a corrected version.]

Improving Statistical MT Through Morphological Analysis. Sharon Goldwater and David McClosky. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Vancouver, 2005. ( PostScript , PDF )

Representational Bias in Unsupervised Learning of Syllable Structure. Sharon Goldwater and Mark Johnson. Proceedings of the 9th Conference on Computational Natural Language Learning (CONLL), Ann Arbor, 2005. ( PostScript , PDF )

Priors in Bayesian Learning of Phonological Rules. Sharon Goldwater and Mark Johnson. Proceedings of the 7th Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON), Barcelona, 2004. ( PostScript , PDF )

A Type System for Statically Detecting Spreadsheet Errors. Yanif Ahmad, Tudor Antoniu, Sharon Goldwater, and Shriram Krishnamurthi. Proceedings of the 18th IEEE International Symposium on Automated Software Engineering, 2003. ( PostScript , PDF )

Learning OT Constraint Rankings Using a Maximum Entropy Model. Sharon Goldwater and Mark Johnson. Proceedings of the Workshop on Variation within Optimality Theory, Stockholm University, 2003. ( PostScript , PDF )

Building a Robust Dialog System with Limited Data. Sharon Goldwater, Elizabeth Owen Bratt, Jean-Mark Gawron, and John Dowding. Proceedings of the Workshop on Conversational Systems at NAACL, 2000. (PostScript)

Interpreting Language in Context in CommandTalk. John Dowding, Elizabeth Owen Bratt, and Sharon Goldwater. Communicative Agents Workshop, Seattle, WA, 1999. (PostScript)

Edge-Based Best-First Chart Parsing. Eugene Charniak, Sharon Goldwater, and Mark Johnson. Proceedings of the Sixth Workshop on Very Large Corpora at COLING-ACL, 1998. (PostScript)

Personal Information

My admittedly impoverished personal web page is available for those who are inclined to be nosy.
Last modified: Thu Jan 31 15:49:13 PST 2008