Question

I am writing a Python program on Google App Engine that calculates tf-idf using TfidfVectorizer from sklearn.

I have added the sklearn library and import it as:

from sklearn.feature_extraction.text import TfidfVectorizer

However, it fails with "No module named _check_build", even though that module is in the library I added.

Note: the same code works fine in plain Python, so there is nothing wrong with the syntax or the imports; the problem only appears on GAE.

Do you know of any way to solve this issue?


Solution 2

If you are not using any GAE-specific tools, try deploying your app on Heroku. It lets you deploy a whole virtual environment with all of its installed libraries. In particular, scikit-learn works on Heroku just fine. Check this GitHub repo for an example.

OTHER TIPS

You can't. sklearn has a lot of C-based dependencies, and a module whose name starts with a leading underscore is typically a compiled binary module, which the GAE sandbox will not load.

That is why you are getting the "No module named _check_build" error.
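To see whether a given module is compiled or pure Python, you can ask the import system where it comes from. This is only a rough sketch for a regular Python 3 interpreter, not something specific to GAE; the function name `module_kind` and its return labels are made up for illustration.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify a module as 'source', 'extension', 'builtin', or 'missing'.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or itself fails to import.
        return "missing"
    if spec is None:
        return "missing"

    origin = spec.origin or ""
    if origin in ("built-in", "frozen"):
        # Compiled into (or frozen with) the interpreter itself.
        return "builtin"
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        # A compiled extension file such as .so or .pyd.
        return "extension"
    # Anything else is treated as ordinary Python source or bytecode.
    return "source"


if __name__ == "__main__":
    for name in ("json", "sys", "sklearn.__check_build._check_build"):
        print(name, "->", module_kind(name))
```

On a machine with scikit-learn installed, the last entry would be expected to come back as an extension; without it, it comes back as missing.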

I seriously doubt you will get it to run even if you fake some of the C libraries, unless they have pure-Python analogues.

I have done this in the past with libraries that shipped a C-based performance version alongside a pure-Python one.
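The usual shape of that pattern is a guarded import with a pure-Python fallback. A minimal sketch follows; `_fast_tokenize` is a hypothetical compiled module invented for this example, and nothing here applies to sklearn itself, which has no such fallback.

```python
try:
    # Hypothetical compiled accelerator; assumed not to be installed.
    from _fast_tokenize import tokenize
    USING_C_VERSION = True
except ImportError:
    USING_C_VERSION = False

    def tokenize(text):
        """Pure-Python fallback: lowercase the text and split on whitespace."""
        return text.lower().split()


if __name__ == "__main__":
    print("C version in use:", USING_C_VERSION)
    print(tokenize("Hello GAE World"))
```

Callers just use `tokenize` and never need to know which implementation was picked, which is why this only helps when the library author provided both versions.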

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow