The finite-state paradigm is increasingly popular, and natural-language applications based on the theory are elegant, robust, and efficient. This book is a practical guide to finite-state theory and to the use of the Xerox finite-state programming languages LexC and xfst. It explains how to write morphological analyzers/generators and tokenizers for natural languages such as English, French, Arabic, Finnish, Hungarian, Malay, and Korean. The text provides graded introductions, examples, and exercises suitable for individual study or formal courses.
Natural-language words are typically formed by concatenating morphemes, as in un+guard+ed+ly and over+critic+al, but some languages also exhibit non-concatenative processes such as interdigitation and reduplication. When morphemes combine into new words, they often show alternations in pronunciation or spelling: swim+ing becomes swimming, take+ing becomes taking, and die+ing becomes dying. Finite-state morphology assumes that both the word-formation rules (morphotactics) and the morphophonological alternation rules can be modeled as finite-state machines.
The LexC and xfst applications are widely tested, having been used commercially by Xerox and its partners and in research by over 80 licensees. The book includes a non-commercial license and a CD-ROM with the Xerox finite-state software compiled for the Solaris, Linux, Windows, and Mac OS X operating systems.
For updates, corrections, and software support, please visit the Finite State Morphology website.