Question

I am working on a project to identify emotions in tweets. My dataset contains about half a million tweets, and I am using weka.classifiers.functions.SMO as the machine learning classifier. I have exactly 10577 feature words, so every tweet's feature vector has 10577 attributes plus one more for the CLASS.
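For a rough sense of scale, here is a minimal sketch of the arithmetic. It assumes every attribute value is held in memory as an 8-byte double in a dense (non-sparse) layout; that layout is an assumption for illustration, and a sparse representation that stores only non-zero values would need far less. The class and method names are hypothetical.

```java
public class DenseFootprint {
    // Bytes needed if every attribute of every instance is stored
    // as an 8-byte double (assumed dense layout, not measured).
    public static long denseBytes(long instances, long attributes) {
        return Math.multiplyExact(Math.multiplyExact(instances, attributes), 8L);
    }

    public static void main(String[] args) {
        // 500,000 tweets, 10577 feature attributes plus 1 class attribute.
        long bytes = denseBytes(500_000L, 10_578L);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("Dense footprint: %d bytes (about %.1f GiB)%n", bytes, gib);
    }
}
```

Under that assumption the full dataset would not fit in 16 GB of RAM, whatever the heap setting.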

I am working in a Windows environment and training on a system with 16 GB of RAM, but I still get an "OutOfMemoryError: Java heap space" error. The size of my training set is around 8 MB. I have tried increasing the heap size in Weka's runconfiguration.ini as well as with the -Xmx option in Java. Is there any way to train an SMO classifier on a large dataset, or is it possible to train the SMO classifier incrementally?

Solution

I had similar problems while using Weka; I guess the standard JVM cannot handle such huge memory requirements. There may be other ways, but when I googled it I saw someone recommend using Oracle JRockit as the JVM. When I installed it, my problem was solved instantly. Maybe you can give it a try.
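Before switching JVMs, it may be worth checking that the -Xmx setting actually reaches the process. The sketch below is not part of Weka; it only prints the maximum heap that the running JVM reports through the standard Runtime API. The class name HeapCheck is hypothetical.

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB,
    // as reported by Runtime.maxMemory().
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same java executable and the same option used for Weka, for example `java -Xmx8g HeapCheck`, shows whether the larger heap is in effect. If the reported value stays small, one possible cause is a 32-bit JVM, which on Windows typically cannot be given a multi-gigabyte heap regardless of installed RAM.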

Licensed under: CC-BY-SA with attribution