NLTK is a lightweight, easy-to-use NLP toolkit for Python. If you're new to Python, the NLTK book (Natural Language Processing with Python) provides a gentle introduction to both the Python language and the NLTK toolkit.
This document explains how to access NLTK on the Stanford FarmShare corn machines.
(If you find any problems with these instructions, please let us know by emailing firstname.lastname@example.org.)
First, ssh into a corn machine:
$ ssh corn.stanford.edu
Now create a symlink in your home directory so that NLTK can find its data files:
$ ln -s /afs/ir/class/cs224u/nltk_data ~/nltk_data
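If NLTK later complains that it cannot find a data resource, the first thing to rule out is a missing or broken link. The following is a minimal, stdlib-only sketch for that check. The helper name check_nltk_data_link is hypothetical (it is not part of NLTK), and it assumes the link lives at ~/nltk_data as created by the ln -s command above.

```python
import os


def check_nltk_data_link(link_path="~/nltk_data"):
    """Return the resolved target of the nltk_data symlink.

    Returns None if the path is not a symlink, or if the link is dangling
    (os.path.isdir follows the link, so a broken target fails that test).
    """
    path = os.path.expanduser(link_path)
    if not os.path.islink(path) or not os.path.isdir(path):
        return None
    return os.path.realpath(path)


if __name__ == "__main__":
    target = check_nltk_data_link()
    if target is None:
        print("~/nltk_data is missing, not a symlink, or points nowhere; re-run the ln -s command.")
    else:
        print("~/nltk_data -> " + target)
```

Run it with python2.7 (or any later Python) from your home directory; it only inspects the file system and does not import NLTK.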
Now start Python. If you just type python on the corn machines, you seem to get Python 2.4, which is too old to run NLTK. So let's specify Python 2.7 explicitly:
$ python2.7
Now you're in Python, and ready to boogie. First, let's try tokenizing and part-of-speech tagging a simple sentence:
>>> import nltk
>>> sentence = "He didn't arrive until eight o'clock"
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['He', 'did', "n't", 'arrive', 'until', 'eight', "o'clock"]
>>> tagged = nltk.pos_tag(tokens)
>>> tagged
[('He', 'PRP'), ('did', 'VBD'), ("n't", 'RB'), ('arrive', 'VB'), ('until', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ')]
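The tagger's output is an ordinary Python list of (word, tag) tuples, with tags drawn from the Penn Treebank tag set, so plain list processing is enough to work with it. Here is a small sketch that pulls out every verb form (all Penn verb tags start with VB) and counts tags. The helper words_with_tag_prefix is hypothetical, and the tagged list is simply copied from the session output above so the example runs without NLTK.

```python
from collections import Counter

# Output of nltk.pos_tag(tokens) from the session above: (word, tag) tuples.
tagged = [('He', 'PRP'), ('did', 'VBD'), ("n't", 'RB'), ('arrive', 'VB'),
          ('until', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ')]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose tag starts with prefix (e.g. 'VB' matches VB, VBD, VBZ, ...)."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


verbs = words_with_tag_prefix(tagged, "VB")    # ['did', 'arrive']
tag_counts = Counter(tag for _, tag in tagged)  # one count per distinct tag here
print(verbs)
print(tag_counts.most_common(3))
```

Matching on a tag prefix rather than an exact tag is a deliberate choice: it groups the inflected verb tags (VB, VBD, and so on) without listing each one.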
And let's get a taste of NLTK's WordNet interface:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('forest')
[Synset('forest.n.01'), Synset('forest.n.02'), Synset('afforest.v.01')]
>>> wn.synset('forest.n.01').definition()
'land that is covered with trees and shrubs'
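The synset names in that output follow WordNet's lemma.pos.nn convention: a lemma, a one-letter part of speech (n for noun, v for verb), and a two-digit sense number. In practice you would usually ask the Synset object itself for these pieces; this stdlib-only sketch, using a hypothetical helper parse_synset_name, just makes the convention explicit.

```python
def parse_synset_name(name):
    """Split a WordNet synset name such as 'forest.n.01' into (lemma, pos, sense_number)."""
    # Split from the right, at most twice, so a lemma that itself contains
    # dots is kept intact.
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


for name in ["forest.n.01", "forest.n.02", "afforest.v.01"]:
    print(parse_synset_name(name))
```

Note that sense numbers are counted separately for each lemma and part of speech, which is why both forest.n.01 and afforest.v.01 end in 01.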
Remember that in Python, you can get help on any object using the built-in help() function:
>>> help(nltk.pos_tag)
If you want to install NLTK on your own machine, you can find instructions here.