Accessing NLTK on Stanford's corn machines

NLTK is a lightweight, easy-to-use NLP toolkit for Python. If you're new to Python, the NLTK book (Natural Language Processing with Python) provides a gentle introduction to both the Python language and the NLTK toolkit.

This document explains how to access NLTK on the Stanford FarmShare corn machines.

(If you find any problems with these instructions, please let us know by emailing cs224u-win1213-staff@lists.stanford.edu.)

First, ssh into a corn machine:

$ ssh corn.stanford.edu

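Next, create a symbolic link called nltk_data in your home directory that points to the class's shared copy of the NLTK data. NLTK looks in ~/nltk_data by default, so this is how it finds its data files: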

$ ln -s /afs/ir/class/cs224u/nltk_data ~/nltk_data
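If you want to confirm that the link was created correctly, a small script like the one below can check it. This is a sketch using only the standard library; the function name check_nltk_data_link is our own (hypothetical) and is not part of NLTK. It verifies three things: the path is a symlink, it points at the class directory, and the target is actually reachable.

```python
import os


def check_nltk_data_link(link_path, expected_target):
    """Return True if link_path is a symlink to expected_target that resolves."""
    # islink() is False for a missing path or for an ordinary directory.
    if not os.path.islink(link_path):
        return False
    # readlink() returns the target string exactly as it was passed to ln -s.
    if os.readlink(link_path) != expected_target:
        return False
    # isdir() follows the link, so this is False if the target is unreachable.
    return os.path.isdir(link_path)


if __name__ == "__main__":
    link = os.path.expanduser("~/nltk_data")
    target = "/afs/ir/class/cs224u/nltk_data"
    if check_nltk_data_link(link, target):
        print("OK: %s -> %s" % (link, target))
    else:
        print("Problem: %s is not a working link to %s" % (link, target))
```

The same script runs under Python 2.7 and Python 3, so you can run it before or after the step below.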

Now start Python. Typing plain python on the corn machines appears to start Python 2.4, which is too old to run NLTK, so specify Python 2.7 explicitly:

$ python2.7
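Since typing python may give you the older interpreter, it can be worth confirming the version from inside Python before importing NLTK. This sketch uses only the standard sys module; the helper name is_recent_enough is hypothetical, and the (2, 7) minimum simply mirrors the advice above rather than a documented NLTK requirement.

```python
import sys


def is_recent_enough(version_info, minimum=(2, 7)):
    """Return True if version_info (major, minor, ...) is at least minimum."""
    # Compare only as many fields as the minimum specifies, e.g. (major, minor).
    return tuple(version_info[:len(minimum)]) >= tuple(minimum)


if __name__ == "__main__":
    print(sys.version)
    if not is_recent_enough(sys.version_info):
        print("This interpreter is too old; exit and run python2.7 instead.")
```

At the >>> prompt, evaluating sys.version directly gives the same information.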

Now you're in Python and ready to boogie. First, let's try tokenizing and part-of-speech tagging a simple sentence:

>>> import nltk
>>> sentence = "He didn't arrive until eight o'clock"
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['He', 'did', "n't", 'arrive', 'until', 'eight', "o'clock"]
>>> tagged = nltk.pos_tag(tokens)
>>> tagged
[('He', 'PRP'), ('did', 'VBD'), ("n't", 'RB'), ('arrive', 'VB'), ('until', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ')]
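The tagger returns an ordinary Python list of (word, tag) tuples, so standard list processing applies. As a small illustration, the sketch below pulls out the words whose Penn Treebank tag begins with VB (the verb tags) and counts how often each tag occurs. The tagged list is pasted in from the session above so the snippet runs without NLTK; collections.Counter needs Python 2.7 or later.

```python
from collections import Counter

# Output of nltk.pos_tag(tokens) from the session above, pasted in as data.
tagged = [('He', 'PRP'), ('did', 'VBD'), ("n't", 'RB'), ('arrive', 'VB'),
          ('until', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ')]

# Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, VBN, VBP, VBZ).
verbs = [word for (word, tag) in tagged if tag.startswith('VB')]

# Count how often each tag occurs in the sentence.
tag_counts = Counter(tag for (_, tag) in tagged)

print(verbs)  # ['did', 'arrive']
print(tag_counts.most_common(3))
```

In a live session you would apply the same comprehension directly to the tagged variable returned by nltk.pos_tag.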

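And let's get a taste of NLTK's WordNet interface. (In NLTK 3 and later, definition is a method rather than an attribute, so there you would write definition() instead.)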

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets("forest")
[Synset('forest.n.01'), Synset('forest.n.02'), Synset('afforest.v.01')]
>>> wn.synset("forest.n.02").definition
'land that is covered with trees and shrubs'
>>> wn.synset("forest.n.02").hypernyms()
[Synset('land.n.04'), Synset('biome.n.01')]
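Each call to hypernyms() climbs only one level. To collect every ancestor of a synset, you can apply it repeatedly. The sketch below walks the hypernym graph breadth-first and works on anything with a hypernyms() method that returns a list, such as an NLTK Synset; it uses a set to skip repeats (WordNet allows multiple parents), so the objects must be hashable. The function all_hypernyms and the StubSynset class are hypothetical names introduced here so the sketch can be tried without WordNet installed.

```python
from collections import deque


def all_hypernyms(start):
    """Return every ancestor of start, in breadth-first order, each once."""
    seen = set()
    ordered = []
    queue = deque(start.hypernyms())
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # reached again through a second parent; skip it
        seen.add(node)
        ordered.append(node)
        queue.extend(node.hypernyms())
    return ordered


class StubSynset(object):
    """Hypothetical stand-in for a Synset, so the sketch runs without WordNet."""

    def __init__(self, name, parents=()):
        self.name = name
        self._parents = list(parents)

    def hypernyms(self):
        return self._parents

    def __repr__(self):
        return "StubSynset(%r)" % self.name


if __name__ == "__main__":
    # A toy "diamond": leaf has two parents that share one root.
    root = StubSynset("root")
    leaf = StubSynset("leaf", [StubSynset("a", [root]), StubSynset("b", [root])])
    print(all_hypernyms(leaf))  # [StubSynset('a'), StubSynset('b'), StubSynset('root')]
```

With NLTK loaded as above, you would call all_hypernyms(wn.synset("forest.n.02")) instead; the synsets you get back depend on the WordNet version installed.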

Remember that in Python, you can get help on any object using help().

>>> help(wn)

If you want to install NLTK on your own machine, you can find instructions here.