Does anybody have any experience with using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) through rJava in R? I’ve been struggling to get it to work for two days now, and think I’ve exhausted Google and previous questions on StackOverflow.

Essentially I’m trying to use the Stanford NLP libraries from within R. I have zero Java experience, but I do have experience with other languages, so I understand the basics about classes, objects, etc.

From what I can see, the demo .java file that comes with the libraries seems to show that to use the classes from within Java, you’d import the libraries and then create a new object, along the lines of:

import java.io.*;
import java.util.*;

import edu.stanford.nlp.io.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.*;

public class demo {

    // etc.

    StanfordCoreNLP pipeline = new StanfordCoreNLP();

    // etc.
}

From within R, I’ve tried calling some standard Java functions, and this works fine, which makes me think the issue is the way I’m trying to access the Stanford libraries.

I extracted the Stanford ZIP to h:\stanfordcore, so the .jar files are all in the root of this directory. As well as the various other files contained in the zip, it contains the main .jar files:

  • joda-time.jar
  • stanford-corenlp-1.3.4.jar
  • stanford-corenlp-1.3.4-javadoc.jar
  • stanford-corenlp-1.3.4-models.jar
  • joda-time-2.1-sources.jar
  • jollyday-0.4.7-sources.jar
  • stanford-corenlp-1.3.4-sources.jar
  • xom.jar
  • jollyday.jar

If I try to access the NLP tools from the command line, it works fine.

From within R, I initialized the JVM and set the classpath variable:

.jinit(classpath = " h:/stanfordcore", parameters = getOption("java.parameters"),silent = FALSE, force.init = TRUE)

After this, if I use the command

.jclassPath() 

This shows that the directory containing the required .jar files has been added and gives this output in R:

[1] "H:\RProject-2.15.1\library\rJava\java" "h:\ stanfordcore"

However, when I try to create a new object (not sure if this is the right Java terminology), I get an error.

I’ve tried creating the object in dozens of different ways (basically shooting in the dark), but the most promising (simply because it seems to actually find the class) is:

pipeline <- .jnew(class="edu/stanford/nlp/pipeline/StanfordCoreNLP",check=TRUE,silent=FALSE)

I know this finds the class, because if I change the class parameter to something not listed in the API, I get a cannot find class error.

As it stands, however, I get the error:

Error in .jnew(class = "edu/stanford/nlp/pipeline/StanfordCoreNLP", check = TRUE, : java.lang.NoClassDefFoundError: Could not initialize class edu.stanford.nlp.pipeline.StanfordCoreNLP

My Googling indicates that this might be something to do with not finding a required .jar file, but I’m completely stuck. Am I missing something obvious?

If anyone can point me even a little in the right direction, I’d be incredibly grateful.

Thanks in advance!

Peter

Solution

Your classpath is wrong: you are using a directory, but you have JAR files. You either have to unpack all the JAR files into the directory you specify (unusual) or add each JAR file to the class path (more common). [And you'll have to fix your typos, obviously, but I assume those come from the fact that you were not using copy/paste.]
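To illustrate the second option, here is a minimal Java sketch that collects every .jar file in a directory and joins the paths into one classpath string. The class name JarClasspath and its two methods are made up for this example; only standard java.nio calls are used. It is a sketch of the idea, not something taken from rJava or CoreNLP.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: turns "a directory full of jars" into explicit classpath entries.
public class JarClasspath {

    /** Returns the absolute paths of all .jar files directly inside dir, sorted. */
    static List<String> listJars(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p.toAbsolutePath().toString());
            }
        }
        Collections.sort(jars);
        return jars;
    }

    /** Joins the jar paths with the platform separator (";" on Windows, ":" elsewhere). */
    static String buildClasspath(Path dir) throws IOException {
        return String.join(File.pathSeparator, listJars(dir));
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no argument is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(buildClasspath(dir));
    }
}
```

On the R side, the same idea would presumably be a character vector of the individual .jar paths passed to .jinit(classpath = ...) or .jaddClassPath(), instead of the bare directory name.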

PS: please use stats-rosuda-devel mailing list if you want more timely answers.

Other answers

Success!

After hours of tinkering, I managed to find a work-around. If anyone is interested, this is what I did:

  • Using Eclipse, I started a new project.

  • I then created a directory called ‘lib’ under the root of the project and copied all the Stanford .jar files into this directory.

  • After this, I edited the properties of the project in Eclipse, went to ‘Java Build Path’, and clicked the ‘Libraries’ tab.

  • I then chose to import the Java system libraries.

  • I also clicked ‘Add External Jars’ and selected all the Stanford jars from the lib directory.

  • I then created intermediary Java classes to call the Stanford classes (rather than trying to call them directly from R).

Example:

import java.util.Properties;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class NLP {

    public static void main(String[] args) {
        // Configure the pipeline to run only the tokenizer.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize");

        // Constructing the pipeline loads the annotators listed above.
        StanfordCoreNLP coreNLP = new StanfordCoreNLP(props);
    }

}

This doesn’t return anything, but shows how a Stanford object can be created.

  • Build the project using Eclipse.

  • From within R, set the working directory to the Java project's /bin directory (this isn’t strictly necessary, as you can add that directory to the classpath instead, but it simplifies things).

Then the object can be created in R with:

library(rJava)
.jinit(classpath = ".")   # This initializes the JVM
obj <- .jnew("NLP")

After this, any methods you’ve created within the intermediary Java classes can be called with:

Name_of_var_to_store_return_value = .jcall(object or class name, return type signature, method name, parameters)
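As a rough illustration of what such an intermediary class might look like, here is a self-contained sketch. NlpBridge, tokenize and countTokens are hypothetical names, and the whitespace split is only a stand-in for the real CoreNLP calls, so that the example compiles without the Stanford jars. The point is that the methods take and return simple types, which keeps the .jcall signature short.

```java
// Hypothetical intermediary class; the tokenizer is a placeholder, not CoreNLP.
public class NlpBridge {

    /** Stand-in for a real pipeline call: splits the text on runs of whitespace. */
    public String[] tokenize(String text) {
        String trimmed = (text == null) ? "" : text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Returns a plain int, which maps to the JNI return signature "I". */
    public int countTokens(String text) {
        return tokenize(text).length;
    }

    public static void main(String[] args) {
        NlpBridge bridge = new NlpBridge();
        System.out.println(bridge.countTokens("Stanford CoreNLP from R")); // prints 4
    }
}
```

Assuming the compiled class is on the classpath, the R side would then be something like obj <- .jnew("NlpBridge") followed by .jcall(obj, "I", "countTokens", "some text"), where "I" is the JNI notation for an int return value.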

I still didn’t figure out why I can’t call the Stanford classes directly from R, but this method works. I suspect that @ChristopherManning is right and my problem is down to calling the external jar from R. By building it from scratch, the Stanford jars are linked during the build, so I guess that’s what fixed it.
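For what it's worth, the wording of the original error is informative in itself. "NoClassDefFoundError: Could not initialize class X" is what the JVM reports when X was found but its static initialization already failed on an earlier attempt; a missing dependency is one common cause. The sketch below reproduces that behaviour with a made-up class (Fragile) whose static initializer always throws. It does not diagnose what actually went wrong in the rJava session above.

```java
public class InitFailureDemo {

    /** Stand-in for a class whose static setup fails, e.g. because a dependency is missing. */
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated missing dependency");
        }
    }

    /** Tries to use Fragile and returns the simple name of whatever was thrown. */
    static String touch() {
        try {
            return "ok: " + Fragile.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: ExceptionInInitializerError
        System.out.println(touch()); // later uses: NoClassDefFoundError
    }
}
```

If the second message is the only one you see, it usually means an earlier attempt in the same JVM session hit the real failure, so restarting the session and looking at the very first error is often the quickest way to find the underlying cause.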

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow