Question

I'm trying to use a Java library called langdetect hosted here. It couldn't be easier to use:

Detector detector;
String langDetected = "";
try {
    String path = "C:/Users/myUser/Desktop/jars/langdetect/profiles";
    DetectorFactory.loadProfile(path);
    detector = DetectorFactory.create();
    detector.append(text);
    langDetected = detector.detect();
} 
catch (LangDetectException e) {
    throw e;
}

return langDetected;

The one exception is the DetectorFactory.loadProfile method. The library works great when I pass it an absolute file path, but ultimately I think I need to package my code and langdetect's companion profiles directory inside the same JAR file:

myapp.jar/
    META-INF/
    langdetect/
        profiles/
            af
            bn
            en
            ...etc.
    com/
        me/
            myorg/
                LangDetectAdaptor --> is what actually uses the code above

I will make sure that the LangDetectAdaptor inside myapp.jar is supplied with both the langdetect.jar and jsonic.jar dependencies that langdetect needs at runtime. However, I'm confused about what I need to pass to DetectorFactory.loadProfile for this to work:

  • The langdetect JAR ships with the profiles directory, but you need to initialize it from inside your own JAR. So do I copy the profiles directory into my JAR (as shown above), or is there a way to keep it inside langdetect.jar and still access it from my code?

Thanks in advance for any help here!

Edit: I think the problem here is that langdetect ships with this profiles directory, but then wants you to initialize it from inside your JAR. The API would probably benefit from treating profiles as its own configuration, and from providing methods like DetectorFactory.loadProfiles().except("fr") for cases where you don't want it to initialize French, etc. But this still doesn't solve my problem!


Solution

It looks like the library only accepts files. You can either change the code and try submitting the changes upstream, or write your resources to a temporary file or directory and have the library load that.
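The temp-file route could be sketched roughly as follows. This is only an illustration, not part of langdetect: the ProfileExtractor class and the extractToTempDir method are hypothetical names, and the list of profile file names is passed in explicitly because a classpath directory cannot be listed portably. It assumes the profiles sit under a resource directory such as langdetect/profiles, as in the layout from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper (not part of langdetect): copies classpath resources
// into a temporary directory so that a file-only API can read them.
public class ProfileExtractor {

    /**
     * Copies each named resource found under resourceDir on the given
     * class loader into a fresh temporary directory and returns that directory.
     */
    public static Path extractToTempDir(ClassLoader loader, String resourceDir,
                                        List<String> names) throws IOException {
        Path tempDir = Files.createTempDirectory("langdetect-profiles");
        for (String name : names) {
            String resource = resourceDir + "/" + name;
            try (InputStream in = loader.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resource);
                }
                Files.copy(in, tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return tempDir;
    }

    // Assumed usage with the loadProfile(String) call from the question:
    //   Path dir = ProfileExtractor.extractToTempDir(
    //           LangDetectAdaptor.class.getClassLoader(),
    //           "langdetect/profiles", profileNames);
    //   DetectorFactory.loadProfile(dir.toString());
}
```

Note that the sketch never deletes the temporary directory; a real application would probably want to clean it up on shutdown.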

OTHER TIPS

I had the same problem. You can load the profiles from the langdetect JAR using JarURLConnection and JarEntry. Note that this example uses Java 7 try-with-resources.

    // IOUtils comes from Apache Commons IO.
    String dirname = "profiles/";
    Enumeration<URL> en = Detector.class.getClassLoader().getResources(
            dirname);
    List<String> profiles = new ArrayList<>();
    if (en.hasMoreElements()) {
        URL url = en.nextElement();
        JarURLConnection urlcon = (JarURLConnection) url.openConnection();
        try (JarFile jar = urlcon.getJarFile()) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Skip the directory entry itself; only the files hold profile JSON.
                if (entry.getName().startsWith(dirname) && !entry.isDirectory()) {
                    try (InputStream in = jar.getInputStream(entry)) {
                        // Read with an explicit charset (assumes the profiles are UTF-8).
                        profiles.add(IOUtils.toString(in, StandardCharsets.UTF_8));
                    }
                }
            }
        }
    }

    DetectorFactory.loadProfile(profiles);
    Detector detector = DetectorFactory.create();
    detector.append(text);
    String langDetected = detector.detect();
    System.out.println(langDetected);

Since no Maven support was available, and the mechanism for loading profiles was not ideal (you need to specify files instead of resources), I created a fork that solves both problems:

https://github.com/galan/language-detector

I emailed the original author so that he could fork/maintain the changes, but had no luck; the project seems to be abandoned.

Here is an example of how to use it now (own profiles can be written where necessary):

DetectorFactory.loadProfile(new DefaultProfile()); // SmProfile is also available
Detector detector = DetectorFactory.create();
detector.append(input);
String result = detector.detect();
// maybe work with detector.getProbabilities()

I don't like the static approach that DetectorFactory uses, but I won't rewrite the whole project; for that you'll have to create your own fork or pull request :)

The solution provided by Mark Butler is still valid and solved my problem, but dirname needs to be updated because the JAR's contents have changed. The problem was reported by Deepak, but I have insufficient reputation to reply in the comments. Here are the two declarations you need.

In order to load short profiles:

String dirname = "profiles/shorttext/";

In order to load long profiles:

String dirname = "profiles/longtext/";

Setting the working directory fixed the problem for me.

 String workingDir = System.getProperty("user.dir");
 DetectorFactory.loadProfile(workingDir+"/profiles/");
Licensed under: CC-BY-SA with attribution