Question

I have been trying to make the NLTK (Natural Language Toolkit) work on the Google App Engine. The steps I followed are:

  1. Download the installer and run it (a .dmg file, as I am using a Mac).
  2. Copy the nltk folder out of the Python site-packages directory and place it as a sub-folder in my project folder.
  3. Create a Python module in the folder that contains the nltk sub-folder and add the line: from nltk.tokenize import *

Unfortunately, after launching it I get this error (note that this error is raised deep within NLTK and I'm seeing it for my system installation of python as opposed to the one that is in the sub-folder of the GAE project):

 <type 'exceptions.ImportError'>: No module named nltk
Traceback (most recent call last):
  File "/base/data/home/apps/xxxx/1.335654715894946084/main.py", line 13, in <module>
    from lingua import reducer
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/reducer.py", line 11, in <module>
    from nltk.tokenizer import *
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/nltk/__init__.py", line 73, in <module>
    from internals import config_java
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/nltk/internals.py", line 19, in <module>
    from nltk import __file__

Note: this is how the error looks in the logs when uploaded to GAE. If I run it locally I get the same error (except it seems to originate inside my site-packages instance of NLTK ... so no difference there). And "xxxx" signifies the project name.

So in summary:

  • Is what I am trying to do even possible? Will NLTK even run on the App Engine?
  • Is there something I missed? That is: copying "nltk" to the GAE project isn't enough?

EDIT: fixed typo and removed unnecessary step

Solution

The problem here is that nltk is attempting a circular import: when nltk/__init__.py is imported, it imports nltk/internals.py, which then attempts to import nltk again. Since nltk is still in the middle of being imported itself, the import fails with a (rather unhelpful) error. Whatever they're doing is pretty weird anyway - it's unsurprising that something like from nltk import __file__ breaks.
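To make the general failure mode concrete, here is a minimal sketch of a circular import. It is not NLTK's code: the package name demo_pkg, the name VERSION, and the helper functions are all hypothetical, and it assumes a current Python 3 interpreter rather than the Python 2 runtime in the traceback, so the exact error text will differ from the one in the question.

```python
import importlib
import os
import sys
import tempfile


def build_circular_package(root, name="demo_pkg"):
    """Write a tiny package whose __init__ imports a submodule
    that imports the package back (hypothetical layout)."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # internals.py runs before VERSION has been assigned.
        f.write("from . import internals\nVERSION = '1.0'\n")
    with open(os.path.join(pkg_dir, "internals.py"), "w") as f:
        # Imports the parent package while it is only half initialised.
        f.write("from %s import VERSION\n" % name)
    return name


def demo():
    """Return the ImportError message, or None if the import succeeded."""
    root = tempfile.mkdtemp()
    name = build_circular_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return None
    except ImportError as exc:
        return str(exc)
    finally:
        sys.path.remove(root)


if __name__ == "__main__":
    print(demo())
```

The usual ways out are to move the shared name into a module that imports nothing from the package, or to defer the inner import until it is actually needed.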

This looks like a problem with nltk itself - does it work when imported directly from a Python console? If so, they must be doing some sort of trickery in the installed version. I'd suggest asking on the nltk groups what they're up to and how to work around it.

OTHER TIPS

oakmad has managed to work through deploying SEVERAL NLTK modules to GAE. Hope this helps. But, to be honest, I'm still not convinced it works, even after reading the post.

I've forked NLTK 2.0.3 on GitHub to run it on App Engine; tokenizing and simple POS tagging work with the MaxEnt Treebank tagger.

NLTK, I believe, does try its best to be pure-Python as a fallback (graceful degradation) when it can't have the C-coded accelerator extensions it would like. However, you always need to move with great care when injecting such a rich package into your own project (recursively zipping up all of the .py files and using zipimport might be less flaky).
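The zipping idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: zip_python_sources is a hypothetical helper, not part of NLTK or the App Engine SDK, and the paths in the usage comments are placeholders. It packs only .py files, as suggested above, so any data files a package needs would have to be handled separately.

```python
import os
import zipfile


def zip_python_sources(package_dir, zip_path):
    """Recursively add every .py file under package_dir to zip_path.

    The package's own folder name is kept as the top-level entry in the
    archive, so the zip file can be placed directly on sys.path.
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(package_dir):
            for filename in filenames:
                if filename.endswith(".py"):
                    full = os.path.join(dirpath, filename)
                    # Store the path relative to the package's parent folder.
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


# Hypothetical usage (paths are placeholders, not taken from the question):
#   zip_python_sources("site-packages/nltk", "nltk.zip")
#   import sys
#   sys.path.insert(0, "nltk.zip")   # Python's zipimport machinery handles it
#   import nltk
```

Putting the archive on sys.path is enough for Python to import from it; there is no need to call the zipimport module directly.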

My installed NLTK (0.95, I believe) has no nltk.tokenizer. It does have an nltk.tokenize, with no trailing R, but obviously even the most minute typo of that kind is 100% intolerable when you're trying to tell a computer exactly what you want. So I assume this is not a typo on your part but rather your use of a completely different and incompatible release of NLTK. Which release is it that has a subpackage named tokenizer rather than tokenize?

If you find a zero-tolerance policy for one-char typos hard to bear, computers and their programming are unlikely to be tolerable to you...;-)

Licensed under: CC-BY-SA with attribution