Thursday, March 15, 2012

How to install the NLTK package for Python 2.7 on Windows

These instructions describe how to install NLTK for Python 2.7 on Windows 7. I tested them on Windows 7 Professional with SP1.

Note that I am using Python 2.7 (32-bit) for these instructions. There is a known bug (unresolved since 2009!!) with the installers for the 64-bit version of Python, so I removed it and installed 32-bit Python instead. I also did not use Python 3.2, as only a limited number of packages are compatible with it at this point in time (15th March 2012). This will most likely change in the future, but for now it's 2.7.

These instructions assume you have already installed Python 2.7 (32-bit) and set the PATH variable to include 'C:\Python27\' (or wherever you installed Python).
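Before starting, it can help to confirm that the interpreter really is the 32-bit build and that the Python directory is on the PATH. The following is only a rough sketch of such a check: the helper names python_bitness and dir_on_path are made up for this post, and C:\Python27 is simply the install location assumed above.

```python
import os
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


def dir_on_path(directory, path_value, sep=os.pathsep):
    """Return True if `directory` is one of the entries in a PATH-style string.

    The comparison ignores case, surrounding quotes and trailing slashes,
    which is how Windows treats PATH entries in practice.
    """
    def normalise(entry):
        return entry.strip().strip('"').rstrip("\\/").lower()

    wanted = normalise(directory)
    return any(normalise(entry) == wanted for entry in path_value.split(sep))


if __name__ == "__main__":
    print("Python %d.%d, %d-bit"
          % (sys.version_info[0], sys.version_info[1], python_bitness()))
    # C:\Python27 is the location assumed in this post; change it if yours differs.
    print("On PATH: %s" % dir_on_path(r"C:\Python27", os.environ.get("PATH", "")))
```

The same dir_on_path helper can be pointed at any other directory that needs to be on the PATH.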

Here are the steps.

1. The best way to install NLTK is with easy_install, which is part of the setuptools package (http://pypi.python.org/pypi/setuptools). Download setuptools and install it. [My install used: setuptools-0.6c11.win32-py2.7.exe]

Be sure to add the easy_install directory to your PATH. If you're like me you might have skipped through the instructions in the install windows, but read this part: "Once installation is complete, you will find an ``easy_install.exe`` program in your Python ``Scripts`` subdirectory. Be sure to add this directory to your ``PATH`` environment variable, if you haven't already done so."

2. Install NumPy. Download from: http://numpy.scipy.org/. [My install used: numpy-1.6.1-win32-superpack-python2.7.exe]

3. Install PyYAML. Download from: . [My install used: PyYAML-3.10.win32-py2.7.exe]

4. Use easy_install to install NLTK. Type something similar to the following: "\Python27\lib\site-packages\easy_install.py nltk". If the Scripts directory from step 1 is on your PATH, "easy_install nltk" should work as well.

5. Install all the corpora required using the following command: "python -m nltk.downloader -d D:\Python27\nltk_data all"

Note that I've used -d D:\Python27\nltk_data because I wanted the data in that location. You can leave this option out if you just want to install to the default directory, C:\nltk_data.
Find more info here: http://www.nltk.org/data
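Once the steps above are done, a quick way to confirm that everything landed where Python can find it is to try importing each package. This is just a sketch: check_imports is a name invented here, and it assumes the packages expose a __version__ attribute (if one doesn't, a placeholder string is reported instead).

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return {name: version-or-None}.

    None means the import failed, i.e. the package is not installed
    for the interpreter running this script.
    """
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__",
                                    "installed (version unknown)")
    return results


if __name__ == "__main__":
    # 'yaml' is the import name of the PyYAML package.
    for name, status in sorted(check_imports(["numpy", "yaml", "nltk"]).items()):
        print("%-6s %s" % (name, status if status else "MISSING"))
```

If any line says MISSING, re-run the matching installer with the same Python that runs this script.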

If any package fails to install, just press n at the retry prompt, as shown:

[nltk_data] | Downloading package 'punkt' to
[nltk_data] | D:\Python27\nltk_data...
[nltk_data] | Unzipping tokenizers\punkt.zip.
[nltk_data] | Error with downloaded zip file
Error installing package. Retry? [n/y/e]
n

Then download the failed package individually from the Python prompt:

>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
Identifier> punkt
Downloading package 'punkt' to D:\python27\nltk_data...
Unzipping tokenizers\punkt.zip.

---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
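If several packages failed, the same retry can be scripted instead of typed into the interactive downloader. The sketch below is my own helper, not part of NLTK: download_packages is a made-up name, and it assumes that nltk.download accepts a package id plus a download_dir argument and returns True on success, which is how I understand it to behave.

```python
def download_packages(package_ids, download_dir=None, downloader=None):
    """Download each NLTK data package by id and return the ids that failed.

    `downloader` is any callable taking (package_id, download_dir=...) and
    returning True on success. It defaults to nltk.download (assumed API).
    """
    if downloader is None:
        import nltk  # imported lazily so the helper itself loads without NLTK
        downloader = nltk.download

    failed = []
    for package_id in package_ids:
        try:
            ok = downloader(package_id, download_dir=download_dir)
        except Exception:
            # Treat any error (network, bad zip, ...) as a failed download.
            ok = False
        if not ok:
            failed.append(package_id)
    return failed


# Example call (not executed here). 'punkt' is the package that failed in the
# transcript above, and the directory matches the -d option used in step 5:
#
#   leftover = download_packages(["punkt"], download_dir=r"D:\Python27\nltk_data")
#   print("Still failing: %s" % (leftover or "none"))
```

Anything still listed in the returned value can be retried later or downloaded by hand.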

Good Luck!