Question

I'm trying to use the Stanford POS tagger (http://nlp.stanford.edu/downloads/tagger.shtml#Download), but when I try to initialize the tagger with

MaxentTagger tagger = new MaxentTagger("english-left3words-distsim.tagger");

I always get this error:

Reading POS tagger model from stanford-postagger-2013-11-12/models/english-left3words-distsim.tagger ... Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
  at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3443)
  at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3250)
  at java.io.ObjectInputStream.readString(ObjectInputStream.java:1628)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1320)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
  at java.util.HashMap.readObject(HashMap.java:1029)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:979)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1873)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1685)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1323)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1970)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1895)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1777)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1329)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:349)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readExtractors(MaxentTagger.java:582)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:808)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(MaxentTagger.java:755)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:289)
  at edu.stanford.nlp.tagger.maxent.MaxentTagger.<init>(MaxentTagger.java:253)

I'm running this in Eclipse, and I'm allocating 4 GB of heap space to the JVM as per the linked instructions from http://nlp.stanford.edu/downloads/pos-tagger-faq.shtml#oom (using -vmargs -Xms4096M -Xmx4096M -mx4096m).

While searching I found this bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6525563, which seems related, but the Stanford tagger is so widely used that I doubt I would be the first to hit this OutOfMemoryError if that bug were the cause...

Update: it seems that Eclipse is not actually giving the program the memory I'm trying to allocate. Runtime.getRuntime().maxMemory() reports that only 123 MB is available, while other projects in the same workspace have 1 GB available.
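
For reference, a minimal sketch of the check I used to see how much heap the launched JVM actually gets (the class name and output format are just illustrative):

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class HeapCheck {
    public static void main(String[] args) {
        // Reports the maximum heap the running JVM will attempt to use,
        // regardless of what the Eclipse settings appear to say.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap available to this JVM: %.1f MB%n",
                maxBytes / (1024.0 * 1024.0));
    }
}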


Solution

I forgot to add the VM arguments to the project's run configuration.

Adding -Xms512m -Xmx1024m -mx512m to the VM arguments section of the project's Run Configuration allows the tagger model to be read properly.
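
With the heap configured, a minimal sketch of loading and using the tagger; the model path and sample sentence are just examples, so adjust the path to wherever your model file actually lives:

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TaggerDemo {
    public static void main(String[] args) {
        // Assumes the model from the 2013-11-12 distribution sits at this relative path.
        MaxentTagger tagger = new MaxentTagger(
                "stanford-postagger-2013-11-12/models/english-left3words-distsim.tagger");

        // tagString returns the input with a POS tag appended to each token, e.g. "This_DT".
        String tagged = tagger.tagString("This is a sample sentence.");
        System.out.println(tagged);
    }
}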
