Finnish Noun Inflection

The goal of this exercise is to build a lexicon that contains Finnish nouns inflected for number and case. To keep the problem reasonably small, we consider only three cases (Nominative, Genitive, and Partitive) and two optional clitics. For the same reason, we consider only two types of nouns: monosyllabic stems and some bisyllabic stems. Even with these limitations, the problem is challenging. The shape of the endings depends on the shape of the stem, and the words are subject to several regular alternations, such as Vowel Harmony and Consonant Gradation, as well as a number of other phenomena. This exercise assumes that the student is very familiar with xfst replace rules and the syntax of lexc source files.

The Facts

A Finnish noun begins with a stem. In all of the cases below, the stem is identical to the nominative singular. A plural marker, if any, immediately follows the stem. After the stem and the possible plural marker comes one of several possible case endings. We consider only three cases: Nominative, Partitive, and Genitive. For the purpose of this exercise, we assume that the plural and case endings are the following:


Table 1: Plural, Genitive, and Partitive Markers

  t      Nominative Plural marker
  I      Plural marker for other cases, realized as i or j
  n      Genitive Singular marker
  Ten    Genitive Plural marker, realized as ten, den, or en
  TA     Partitive marker, realized as ta, tä, a, or ä

There are numerous clitics that attach to the end of the word after the case ending. Here we only consider two of them.



Table 2: Clitics

  kO     Question clitic, realized as ko or kö
  hAn    Politeness clitic, realized as han or hän

The following table illustrates some of the possible combinations with the noun valo 'light'.



Table 3: Examples illustrating morphotactics with valo 'light'.

  valo            Nominative Singular. No case ending.
  valot           Nominative Plural. The general plural marker I is not used in the nominative.
  valoa           Partitive Singular. The partitive marker TA is realized here as a.
  valon           Genitive Singular.
  valoja          Partitive Plural. The plural marker I is realized as j here. The partitive marker TA is realized here as a.
  valojen         Genitive Plural. The plural marker I is realized as j here. The Genitive Plural marker Ten is realized as en.
  valoko          Nominative Singular with the question clitic.
  valonhan        Genitive Singular with the politeness clitic.
  valoakohan      Partitive Singular with the question and politeness clitics.
  valojenkohan    Genitive Plural with the question and politeness clitics.

Note that the order of the two optional clitics is fixed.  The form valohanko is a possible word in Finnish but it is interpretable only as a compound noun: valo 'light' + hanko 'pitchfork'.

Finnish is a language with vowel harmony. The realization of A and O in the first two tables depends on the previous "harmonizing vowel". Finnish has eight vowels: a, e, i, o, u, y, ä, and ö. The back vowels a, o, u and the front vowels ä, ö, y are harmonizing vowels; the two remaining ones, e and i, are neutral. For the purpose of this exercise, assume the following Vowel Harmony rule:

    A -> a, O -> o || [a|o|u] ~$[ä|ö|y] _
    .o.
    A -> ä, O -> ö
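The exercise itself calls for xfst, but the logic of this rule can be checked quickly in Python. This is a minimal sketch under our own assumptions: the function name `vowel_harmony` is hypothetical, and the input is an underlying string in which the archiphonemes A and O appear as uppercase letters, as in the tables above. It uses the fact that the context `[a|o|u] ~$[ä|ö|y] _` holds exactly when the last harmonizing vowel before A or O is a back vowel. In every other case, including words with only neutral vowels such as tie, the front realization applies.

```python
BACK = set("aou")
FRONT = set("äöy")
# Archiphoneme -> (back realization, front realization)
HARMONY = {"A": ("a", "ä"), "O": ("o", "ö")}


def vowel_harmony(form: str) -> str:
    """Realize A and O according to the last harmonizing vowel to their left."""
    out = []
    last_is_back = False  # no harmonizing vowel seen yet: front by default
    for ch in form:
        if ch in HARMONY:
            back, front = HARMONY[ch]
            out.append(back if last_is_back else front)
            continue
        if ch in BACK:
            last_is_back = True
        elif ch in FRONT:
            last_is_back = False
        # e and i are neutral and leave last_is_back unchanged
        out.append(ch)
    return "".join(out)


print(vowel_harmony("valoakOhAn"))  # valoakohan
print(vowel_harmony("tiekO"))       # tiekö
```

Only the input characters update `last_is_back`, which mirrors the xfst rule, whose left context is matched on the upper side.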

The realization of the T in the Genitive Plural and Partitive markers depends on the syllable structure of the word. After a monosyllabic stem such as maa 'earth', the T is realized as t, as in maata. After a bisyllabic stem such as valo 'light', the T disappears, as in valoa. Where the T of the Genitive Plural marker is realized as t, it is subject to the consonant gradation rules and surfaces as d in most cases, as in maiden.
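The syllable condition can be sketched the same way. Every assumption here is our own. Syllables are approximated by counting maximal runs of vowel letters to the left of T. The plural marker I counts as a vowel, so maI (maa + I after Vowel Shortening) is one syllable. The weakening of t to d in Ten is hard-wired as a stand-in for the Gradation script. The function name `realize_T` is hypothetical.

```python
import re

# Assumption: a syllable is approximated by a maximal run of vowel letters;
# the plural marker I counts as a vowel, so "maI" is one syllable.
VOWEL_RUN = re.compile(r"[aeiouyäöI]+")


def realize_T(form: str) -> str:
    """Realize the T of the markers TA and Ten."""
    def repl(m):
        syllables = len(VOWEL_RUN.findall(form[: m.start()]))
        if syllables > 1:
            return m.group(1)  # bisyllabic: T disappears (valoTA -> valoA)
        # Monosyllabic: t, weakened to d in Ten (stand-in for gradation)
        return ("d" if m.group(1) == "en" else "t") + m.group(1)

    return re.sub(r"T(en|A)", repl, form)


print(realize_T("maaTA"))   # maatA
print(realize_T("valoTA"))  # valoA
print(realize_T("maITen"))  # maIden
```

The A and I left in the output are handled by the Vowel Harmony rule and the rule for the plural marker.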

The plural marker I is realized as j between two vowels, otherwise it is realized as i.
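A Python sketch of this rule follows. It assumes that T has already been realized or deleted. Otherwise the I in valoITA would not yet stand between two vowels. The function name is hypothetical. On the right, the archiphonemes A and O also count as vowels, because they always surface as vowels.

```python
import re

VOWEL = "aeiouyäö"


def realize_I(form: str) -> str:
    """Plural marker I: j between two vowels, i everywhere else."""
    # The right context also admits the archiphonemes A and O, which become vowels
    form = re.sub(rf"(?<=[{VOWEL}])I(?=[{VOWEL}AO])", "j", form)
    return form.replace("I", "i")


print(realize_I("valoIA"))  # valojA
print(realize_I("maItA"))   # maitA
print(realize_I("kukkIA"))  # kukkiA
```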

With this information you should be able to write the rules that correctly realize the endings in Tables 1 and 2 in all environments.

The remaining problem is that the stem of the noun also undergoes alternations. Consonant Gradation is one that we have already seen and solved. The other stem alternations, Vowel Rounding, Vowel Lowering, Vowel Dropping, and Vowel Shortening are illustrated in Table 4 below.


Table 4: Stem and suffix alternations

  Nom Sg   Gloss      Nom Pl    Gen Sg    Part Sg   Gen Pl      Part Pl
  puu      tree       puut      puun      puuta     puiden      puita
  maa      earth      maat      maan      maata     maiden      maita
  pää      head       päät      pään      päätä     päiden      päitä
  syy      reason     syyt      syyn      syytä     syiden      syitä
  pii      silicon    piit      piin      piitä     piiden      piitä
  suo      swamp      suot      suon      suota     soiden      soita
  työ      work       työt      työn      työtä     töiden      töitä
  tie      road       tiet      tien      tietä     teiden      teitä

  tikka    dart       tikat     tikan     tikkaa    tikkojen    tikkoja
  pappi    priest     papit     papin     pappia    pappien     pappeja
  hytti    cabin      hytit     hytin     hyttiä    hyttien     hyttejä
  kukka    flower     kukat     kukan     kukkaa    kukkien     kukkia
  tutti    pacifier   tutit     tutin     tuttia    tuttien     tutteja
  kauppa   shop       kaupat    kaupan    kauppaa   kauppojen   kauppoja
  kuoppa   hole       kuopat    kuopan    kuoppaa   kuoppien    kuoppia

  jalka    foot       jalat     jalan     jalkaa    jalkojen    jalkoja
  härkä    ox         härät     härän     härkää    härkien     härkiä
  linko    sling      lingot    lingon    linkoa    linkojen    linkoja
  kyky     talent     kyvyt     kyvyn     kykyä     kykyjen     kykyjä

  sopu     harmony    sovut     sovun     sopua     sopujen     sopuja
  kampa    comb       kammat    kamman    kampaa    kampojen    kampoja
  piispa   bishop     piispat   piispan   piispaa   piispojen   piispoja

  vahti    guard      vahdit    vahdin    vahtia    vahtien     vahteja
  ilta     evening    illat     illan     iltaa     iltojen     iltoja
  sota     war        sodat     sodan     sotaa     sotien      sotia
  häntä    tail       hännät    hännän    häntää    häntien     häntiä

Vowel Rounding

Short a is rounded to o in front of the plural marker I. Examples: tikkaa Partitive Singular, tikkoja Partitive Plural; kamman Genitive Singular, kampojen Genitive Plural; kauppaa Partitive Singular, kauppoja Partitive Plural. This does not happen if the vowel nucleus of the preceding syllable begins with a rounded vowel (o or u); see the rule for Vowel Dropping.
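A Python approximation of this rule, for experimentation only, follows. The regular expression and the function name `round_a` are our own hypothetical choices. A short a is identified as an a preceded by consonants, which excludes the long aa of maa. The nucleus of the preceding syllable is taken to be the vowel run in front of those consonants.

```python
import re

# Group 1: nucleus of the preceding syllable; group 2: the consonants after it.
# Requiring consonants before the a restricts the rule to short a (not maa).
ROUNDING = re.compile(r"([aeiouyäö]+)([^aeiouyäöI]+)a(?=I)")


def round_a(form: str) -> str:
    """Round short stem-final a to o in front of the plural marker I."""
    def repl(m):
        nucleus, consonants = m.group(1), m.group(2)
        if nucleus[0] in "ou":
            return m.group(0)  # kuoppa, sota: left for Vowel Dropping
        return nucleus + consonants + "o"

    return ROUNDING.sub(repl, form)


print(round_a("tikkaITA"))    # tikkoITA
print(round_a("kauppaITen"))  # kauppoITen
print(round_a("kuoppaITA"))   # kuoppaITA (unchanged)
```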

Vowel Lowering

Short i is lowered to e in front of the plural marker I. Examples: vahtia Partitive Singular, vahteja Partitive Plural; pappia Partitive Singular, pappeja Partitive Plural. This does not happen in the Genitive Plural; see the rule for Vowel Dropping.
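A similar Python sketch covers lowering. The function name is hypothetical. "Short" is again approximated as "preceded by a consonant," so the long ii of pii is left alone. A negative lookahead keeps the rule out of the Genitive Plural sequence ITen.

```python
import re

# Short i = i preceded by a consonant; the lookahead excludes the Genitive Plural
LOWERING = re.compile(r"(?<=[^aeiouyäöI])i(?=I(?!Ten))")


def lower_i(form: str) -> str:
    """Lower short stem-final i to e in front of the plural marker I."""
    return LOWERING.sub("e", form)


print(lower_i("pappiITA"))   # pappeITA
print(lower_i("pappiITen"))  # pappiITen (Genitive Plural: unchanged)
print(lower_i("piiITA"))     # piiITA (long ii: left to Vowel Shortening)
```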

Vowel Dropping

A short a is deleted in front of the plural marker I if the nucleus of the preceding syllable consists of, or begins with, a rounded vowel (u or o). Note the different behavior of kuoppa, where the a is dropped, and kauppa, where the a is rounded to o in the plural. Short ä is always deleted in front of the plural marker I. Examples: kukan Genitive Singular, kukkien Genitive Plural; sotaa Partitive Singular, sotia Partitive Plural; kuoppaa Partitive Singular, kuoppia Partitive Plural; härän Genitive Singular, härkien Genitive Plural. Stem-final i is dropped in the Genitive Plural. Examples: papin Genitive Singular, pappien Genitive Plural; vahdin Genitive Singular, vahtien Genitive Plural.
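The three deletions above can be sketched as three substitutions, one per case. The function name `drop_vowel` is hypothetical. As in the other sketches, a vowel counts as "short" when a consonant precedes it. This keeps the long ää of pää and the long ii of pii out of the rule.

```python
import re

V = "aeiouyäö"
# A short a after consonants, with the preceding nucleus captured in group 1
SHORT_A = re.compile(rf"([{V}]+)([^{V}I]+)a(?=I)")


def drop_vowel(form: str) -> str:
    """Delete stem-final a, ä, or i in the environments described above."""
    def drop_a(m):
        # a drops only if the preceding nucleus begins with o or u
        return m.group(1) + m.group(2) if m.group(1)[0] in "ou" else m.group(0)

    form = SHORT_A.sub(drop_a, form)
    form = re.sub(rf"(?<=[^{V}I])ä(?=I)", "", form)     # short ä: always
    form = re.sub(rf"(?<=[^{V}I])i(?=ITen)", "", form)  # i: Genitive Plural only
    return form


print(drop_vowel("kuoppaITA"))  # kuoppITA
print(drop_vowel("härkäITen"))  # härkITen
print(drop_vowel("pappiITen"))  # pappITen
```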

Vowel Shortening

In front of the plural marker I, the long vowels aa, ee, ii, oo, uu, yy, ää, and öö are shortened to a, e, i, o, u, y, ä, and ö, respectively. The diphthongs uo, yö, and ie shorten to o, ö, and e, respectively. Examples: puuta Partitive Singular, puita Partitive Plural; tietä Partitive Singular, teitä Partitive Plural; työn Genitive Singular, töiden Genitive Plural.
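In Python, shortening reduces to two substitutions: a backreference for doubled vowels and a small lookup table for the three diphthongs. The function name is hypothetical. Like the other sketches, this is a stand-in for an xfst replace rule.

```python
import re

DIPHTHONGS = {"uo": "o", "yö": "ö", "ie": "e"}


def shorten(form: str) -> str:
    """Shorten long vowels and the diphthongs uo, yö, ie in front of I."""
    form = re.sub(r"([aeiouyäö])\1(?=I)", r"\1", form)  # puuI -> puI
    return re.sub(r"(uo|yö|ie)(?=I)", lambda m: DIPHTHONGS[m.group(1)], form)


print(shorten("puuITA"))   # puITA
print(shorten("tieITA"))   # teITA
print(shorten("työITen"))  # töITen
```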

The Task

Your task is to write a lexicon, a source file for lexc, that includes the 27 words, the suffixes of Table 1, and the 2 clitics mentioned above, and assembles them into morphotactically correct underlying Finnish forms. Use the tags +Sg, +Pl, +Nom, +Gen, and +Part for marking number and case on the lexical side, and +Q and +P for the question and politeness clitics. Compile the lexicon using lexc or the command read lexc in xfst. At this point your network should contain pairs such as

kukka+Sg+Part    tutti+Pl+Gen    jalka+Pl+Part   härkä+Pl+Gen    tie+Pl+Part
kukka    TA      tutti I  Ten    jalka I  TA     härkä I  Ten    tie I  TA
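Before writing the lexc file, it can help to tabulate what each tag combination should map to. The following Python sketch is only a stand-in for lexc continuation classes. The names `SUFFIXES`, `CLITICS`, and `underlying` are hypothetical. The function concatenates the markers of Tables 1 and 2 in the order stem, number and case, question clitic, politeness clitic. It returns the same kind of upper/lower pairs as above, without the spaces.

```python
# Number + case tags -> underlying markers from Table 1
SUFFIXES = {
    ("+Sg", "+Nom"): "",
    ("+Sg", "+Gen"): "n",
    ("+Sg", "+Part"): "TA",
    ("+Pl", "+Nom"): "t",
    ("+Pl", "+Gen"): "ITen",
    ("+Pl", "+Part"): "ITA",
}
# Clitics from Table 2, listed in their fixed surface order
CLITICS = [("+Q", "kO"), ("+P", "hAn")]


def underlying(stem, number, case, clitics=()):
    """Return the (lexical, underlying) pair for one inflected noun."""
    upper = stem + number + case
    lower = stem + SUFFIXES[(number, case)]
    for tag, marker in CLITICS:  # fixed order, regardless of the input order
        if tag in clitics:
            upper += tag
            lower += marker
    return upper, lower


print(underlying("kukka", "+Sg", "+Part"))              # ('kukka+Sg+Part', 'kukkaTA')
print(underlying("valo", "+Pl", "+Gen", ("+P", "+Q")))  # ('valo+Pl+Gen+Q+P', 'valoITenkOhAn')
```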

Second, write replace rules for realizing the plural marker, the case endings, and the four stem-changing rules sketched above. Use the Vowel Harmony rule shown above. Combine the suffix realization rules with the stem alternation rules. For consonant gradation, use the rule given in the Gradation script. Think about how to order the rules: it matters.
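Consider pii to see why the order matters. Suppose Vowel Shortening applies before Vowel Lowering. Then the long ii of piiITA becomes a short i in front of I and is wrongly lowered, yielding *peitä instead of piitä. The Python sketch below shows the two orders side by side, with both rules reduced to the parts relevant here. All names are hypothetical. Applying functions one after another stands in for composing the rules with .o. in xfst.

```python
import re


def lower_i(form: str) -> str:
    # Short i (after a consonant) -> e before I, except in the Genitive Plural
    return re.sub(r"(?<=[^aeiouyäöI])i(?=I(?!Ten))", "e", form)


def shorten(form: str) -> str:
    # Long vowel -> short vowel before I (diphthongs omitted in this sketch)
    return re.sub(r"([aeiouyäö])\1(?=I)", r"\1", form)


def cascade(rules, form: str) -> str:
    """Apply rules left to right, like composing them with .o. in xfst."""
    for rule in rules:
        form = rule(form)
    return form


print(cascade([lower_i, shorten], "piiITA"))    # piITA (surfaces as piitä)
print(cascade([shorten, lower_i], "piiITA"))    # peITA (would surface as *peitä)
print(cascade([lower_i, shorten], "pappiITA"))  # pappeITA
```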

Finally, create an xfst script that reads in the lexicon, compiles the rules, and composes the lexicon with the rules leaving the result on the stack.  The final result should contain pairs such as

kukka+Sg+Part    tutti+Pl+Gen    jalka+Pl+Part    härkä+Pl+Gen    tie+Pl+Part
kukka     a      tutt  i   en    jalko j   a      härk  i   en    te  i  tä
etc.

Verify that the lower side of the resulting network contains the properly inflected surface forms by terminating the script with the command

    print random-lower